Spatial Cognition VI
Learning, Reasoning,
and Talking about Space
Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany
Volume Editors
Christian Freksa
SFB/TR 8 Spatial Cognition
Universität Bremen, Bremen, Germany
E-mail: freksa@sfbtr8.uni-bremen.de
Nora S. Newcombe
James H. Glackin Distinguished Faculty Fellow
Temple University, Philadelphia, PA, USA
E-mail: newcombe@temple.edu
Peter Gärdenfors
Lund University Cognitive Science
Lund, Sweden
E-mail: peter.gardenfors@lucs.lu.se
Stefan Wölfl
Department of Computer Science
University of Freiburg, Freiburg, Germany
E-mail: woelfl@informatik.uni-freiburg.de
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2008
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper SPIN: 12519798 06/3180 543210
Preface
This is the sixth volume in a series of books dedicated to basic research in spatial
cognition. Spatial cognition research investigates relations between the physical
spatial world, on the one hand, and the mental world of humans, animals, and
artificial agents, on the other hand. Cognitive agents – natural or artificial – make
use of spatial and temporal information about their environment and about their
relation to the environment to move around, to behave intelligently, and to make
adaptive decisions in the pursuit of their goals. More specifically, cognitive agents
process various kinds of spatial knowledge for learning, reasoning, and talking
about space.
From a cognitive point of view, a central question is how our brains represent
and process spatial information. When designing spatial representation systems,
usability will be increased if the external and internal forms of representation
are aligned as much as possible. A particularly interesting feature is that many
of the internal representations of the meanings of words seem to have a spatial
structure. This also holds when we are not talking about space as such. The
spatiality of natural semantics will impose further requirements on the design
of information systems. An elementary example is that “more” of something is
often imagined as “higher” on a vertical dimension: consequently, a graphical
information system that associates “more” with “down” will easily be misunder-
stood. Another example concerns similarity relations: features that are judged
to be similar in meaning are best represented as spatially close in a graphical
information system.
In addition to the question of how this information is represented and used
– which was the focus of the previous Spatial Cognition volumes – an impor-
tant question is whether spatial abilities are innate (“hard wired”) or whether
these abilities can be learned and trained. The hypothesis that spatial cogni-
tion is malleable, and hence that spatial learning can be fostered by effective
technology and education, is based on recent evidence from multiple sources.
Developmental research now indicates that cognitive growth is not simply the
unfolding of a maturational program but instead involves considerable learning;
new neuroscience research indicates substantial developmental plasticity; and
cognitive and educational research has shown us significant effects of experience
on spatial skill.
Because an informed citizen in the 21st century must be fluent at process-
ing spatial abstractions including graphs, diagrams, and other visualizations,
research that reveals how to increase the level of spatial functioning in the pop-
ulation is vital. In addition, such research could lead to the reduction of gender
and socioeconomic status differences in spatial functioning and thus have an im-
portant impact on social equity. We need to understand spatial learning and to
use this knowledge to develop programs and technologies that will support the
capability of all children and adolescents to develop the skills required to com-
pete in an increasingly complex world. To answer these questions, we need to
understand structures and mechanisms of abstraction and we must develop and
test models that instantiate our insights into the cognitive mechanisms studied.
Today, spatial cognition is an established research area that investigates a
multitude of phenomena in a variety of domains on many different levels of ab-
straction involving a palette of disciplines with their specific methodologies. One
of today’s challenges is to connect and relate these different research areas. In
pursuit of this goal, the Transregional Collaborative Research Center SFB/TR 8
Spatial Cognition (Bremen and Freiburg) and the Spatial Intelligence and Learn-
ing Center (Philadelphia and Chicago) co-organized Spatial Cognition 2008 in
the series of the biennial international Spatial Cognition conferences. This con-
ference brought together researchers from both centers and from other spatial
cognition research labs, from all over the world. This proceedings volume con-
tains 27 papers that were selected for oral presentation at the conference in
a thorough peer-review process to which 54 papers had been submitted; each
paper was reviewed and commented on by at least three Program Committee
members. Many high-quality contributions could not be accepted due to space
limitations in the single-track conference program.
The Program Chairs invited three prominent scientists to deliver keynote
lectures at the Spatial Cognition 2008 conference: Heinrich H. Bülthoff spoke on
“Virtual Reality as a Valuable Research Tool for Investigating Different Aspects
of Spatial Cognition”, Laura Carlson’s talk was about “On the ‘Whats’ and
‘Hows’ of ‘Where’: The Role of Salience in Spatial Descriptions”, and Dedre
Gentner addressed the topic “Learning about space”. Abstracts of the keynote
presentations are also printed in this volume.
Spatial Cognition 2008 took place at Schloss Reinach near Freiburg (Ger-
many) in September 2008. Besides the papers for oral presentation, more than
30 poster contributions were selected for presenting work in progress. The con-
ference program also featured various tutorials, workshops, and a doctoral col-
loquium to promote an exchange of research experience of young scientists and
knowledge transfer at an early stage of project development. Immediately before
the conference, a workshop sponsored by the American National Science Foun-
dation (NSF) was organized by the SILC consortium in cooperation with the
SFB/TR 8 at the University of Freiburg. This workshop included lab visits at
the Freiburg site of the SFB/TR 8.
Many people contributed to the success of the Spatial Cognition 2008 con-
ference. First of all, we thank the authors for preparing excellent contributions.
This volume presents contributions by 61 authors on a large spectrum of interdis-
ciplinary work on descriptions of space, on spatial mental models and maps, on
spatio-temporal representation and reasoning, on route directions, wayfinding in
natural and virtual environments, and spatial behavior, and on robot mapping
and piloting. Our special thanks go to the members of the Program Committee
for carefully reviewing and commenting on these contributions. Thorough
reviews by peers are one of the most important sources of feedback to the
authors, connecting them to still-unknown territory and helping them to
improve their work and to secure a high-quality scientific publication.
We thank Thomas F. Shipley for organizing, and Kenneth D. Forbus, Alexan-
der Klippel, Marco Ragni, and Niels Krabisch for offering tutorials. For orga-
nizing workshops we owe thanks to Kenny Coventry and Jan M. Wiener as
well as Alexander Klippel, Stephen Hirtle, Marco Ragni, Holger Schultheis,
Thomas Barkowsky, Ronan O’Ceallaigh, and Wolfgang Stürzl. Further thanks
go to Christoph Hölscher for organizing the poster session, and Sven Bertel and
Marco Ragni, who were responsible for organizing the doctoral colloquium and
for allocating travel grants to PhD students.
We thank the members of our support staff, namely, Ingrid Schulz, Dagmar
Sonntag, Roswitha Hilden, Susanne Bourjaillat, and Ulrich Jakob for profes-
sionally arranging many details. Special thanks go to Thomas Barkowsky, Eva
Räthe, Lutz Frommberger, and Matthias Westphal for the close cooperation on
both sites of the SFB/TR 8.
We thank Wolfgang Bay and the SICK AG for the generous sponsorship
for this conference and the continuous support of scientific activities in and
around Freiburg. We thank Daniel Schober and the ESRI Geoinformatik GmbH
for sponsoring the travel grants to PhD students participating in the doctoral
colloquium.
We thank the Deutsche Forschungsgemeinschaft and the National Science
Foundation and their program directors Bettina Zirpel, Gerit Sonntag, and Soo-
Siang Lim for their continued support of our research and for encouraging and
enhancing our international research cooperation.
For the review process and for the preparation of the conference proceed-
ings we used the EasyChair conference management system, which we found
convenient to use.
Finally, we thank Alfred Hofmann and his staff at Springer for their contin-
uing support of our book series as well as for sponsoring the Spatial Cognition
2008 Best Paper Award.
Program Chairs
Christian Freksa
Nora S. Newcombe
Peter Gärdenfors
Local Organization
Stefan Wölfl
Program Committee
Additional Reviewers
Daniel Beck
Kirsten Bergmann
Roberta Ferrario
Alexander Ferrein
Stefan Schiffer
Related Book Publications
1. Winter, S., Duckham, M., Kulik, L., Kuipers, B. (eds.): COSIT 2007. LNCS,
vol. 4736. Springer, Heidelberg (2007)
2. Fonseca, F., Rodríguez, M.A., Levashkin, S. (eds.): GeoS 2007. LNCS, vol. 4853.
Springer, Heidelberg (2007)
3. Barkowsky, T., Knauff, M., Ligozat, G., Montello, D.R. (eds.): Spatial Cognition
2007. LNCS (LNAI), vol. 4387. Springer, Heidelberg (2007)
4. Barker-Plummer, D., Cox, R., Swoboda, N. (eds.): Diagrams 2006. LNCS (LNAI),
vol. 4045. Springer, Heidelberg (2006)
5. Raubal, M., Miller, H.J., Frank, A.U., Goodchild, M.F. (eds.): GIScience 2006.
LNCS, vol. 4197. Springer, Heidelberg (2006)
6. Cohn, A.G., Mark, D.M. (eds.): COSIT 2005. LNCS, vol. 3693. Springer, Heidel-
berg (2005)
7. Rodríguez, M.A., Cruz, I., Levashkin, S., Egenhofer, M.J. (eds.): GeoS 2005. LNCS,
vol. 3799. Springer, Heidelberg (2005)
8. Meng, L., Zipf, A., Reichenbacher, T. (eds.): Map-based mobile services — Theo-
ries, methods and implementations. Springer, Berlin (2005)
9. Freksa, C., Knauff, M., Krieg-Brückner, B., Nebel, B., Barkowsky, T. (eds.): Spatial
Cognition IV. LNCS (LNAI), vol. 3343. Springer, Heidelberg (2005)
10. Blackwell, A.F., Marriott, K., Shimojima, A. (eds.): Diagrams 2004. LNCS (LNAI),
vol. 2980. Springer, Heidelberg (2004)
11. Egenhofer, M.J., Freksa, C., Miller, H.J. (eds.): GIScience 2004. LNCS, vol. 3234.
Springer, Heidelberg (2004)
12. Gero, J.S., Tversky, B., Knight, T. (eds.): Visual and spatial reasoning in design
III, Key Centre of Design Computing and Cognition. University of Sydney (2004)
13. Freksa, C., Brauer, W., Habel, C., Wender, K.F.: Spatial Cognition III. LNCS
(LNAI), vol. 2685. Springer, Heidelberg (2003)
14. Kuhn, W., Worboys, M.F., Timpf, S. (eds.): COSIT 2003. LNCS, vol. 2825.
Springer, Heidelberg (2003)
15. Hegarty, M., Meyer, B., Narayanan, N.H. (eds.): Diagrams 2002. LNCS (LNAI),
vol. 2317. Springer, Heidelberg (2002)
16. Egenhofer, M.J., Mark, D.M. (eds.): GIScience 2002. LNCS, vol. 2478. Springer,
Heidelberg (2002)
17. Barkowsky, T.: Mental Representation and Processing of Geographic Knowledge.
LNCS (LNAI), vol. 2541. Springer, Heidelberg (2002)
18. Renz, J.: Qualitative Spatial Reasoning with Topological Information. LNCS
(LNAI), vol. 2293. Springer, Heidelberg (2002)
19. Coventry, K., Olivier, P. (eds.): Spatial language: Cognitive and computational
perspectives. Kluwer, Dordrecht (2002)
20. Montello, D.R. (ed.): COSIT 2001. LNCS, vol. 2205. Springer, Heidelberg (2001)
21. Gero, J.S., Tversky, B., Purcell, T. (eds.): Visual and spatial reasoning in design
II. Key Centre of Design Computing and Cognition. University of Sydney (2001)
22. Habel, C., Brauer, W., Freksa, C., Wender, K.F. (eds.): Spatial Cognition 2000.
LNCS (LNAI), vol. 1849. Springer, Heidelberg (2000)
23. Habel, C., von Stutterheim, C. (eds.): Räumliche Konzepte und sprachliche Struk-
turen. Niemeyer, Tübingen (2000)
24. Freksa, C., Mark, D.M. (eds.): COSIT 1999. LNCS, vol. 1661. Springer, Heidelberg
(1999)
25. Gero, J.S., Tversky, B. (eds.): Visual and spatial reasoning in design. Key Centre
of Design Computing and Cognition. University of Sydney (1999)
26. Habel, C., Werner, S. (eds.): Special issue on spatial reference systems, Spatial
Cognition and Computation, vol. 1(4) (1999)
27. Freksa, C., Habel, C., Wender, K.F. (eds.): Spatial Cognition 1998. LNCS (LNAI),
vol. 1404. Springer, Heidelberg (1998)
28. Hirtle, S.C., Frank, A.U. (eds.): COSIT 1997. LNCS, vol. 1329. Springer, Heidel-
berg (1997)
29. Kuhn, W., Frank, A.U. (eds.): COSIT 1995. LNCS, vol. 988. Springer, Heidelberg
(1995)
Table of Contents
Invited Talks
Virtual Reality as a Valuable Research Tool for Investigating Different
Aspects of Spatial Cognition (Abstract) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Heinrich H. Bülthoff, Jennifer L. Campos, and Tobias Meilinger
Spatial Orientation
Does Body Orientation Matter When Reasoning about Depicted or
Described Scenes? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Marios N. Avraamides and Stephanie Pantelidou
Spatial Navigation
Map-Based Spatial Navigation: A Cortical Column Model for Action
Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Louis-Emmanuel Martinet, Jean-Baptiste Passot, Benjamin Fouque,
Jean-Arcady Meyer, and Angelo Arleo
Spatial Learning
Route Learning Strategies in a Virtual Cluttered Environment . . . . . . . . . 104
Rebecca Hurlebaus, Kai Basten, Hanspeter A. Mallot, and
Jan M. Wiener
Conflicting Cues from Vision and Touch Can Impair Spatial Task
Performance: Speculations on the Role of Spatial Ability in Reconciling
Frames of Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
Madeleine Keehner
Spatial Communication
Epistemic Actions in Science Education . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
Kim A. Kastens, Lynn S. Liben, and Shruti Agrawal
Spatial Language
Tiered Models of Spatial Language Interpretation . . . . . . . . . . . . . . . . . . . . 233
Robert J. Ross
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 1–3, 2008.
© Springer-Verlag Berlin Heidelberg 2008
in any number of ways (both open- and closed-loop). Finally, a new, state-of-the-
art omni-directional treadmill now offers observers the opportunity to experience
unrestricted, limitless walking throughout large-scale VEs.
When moving through space, both dynamic visual information (i.e., optic flow)
and body-based information (i.e., proprioceptive/efference-copy and vestibular cues)
jointly specify the magnitude of a distance travelled. Relatively little is currently
known about how these cues are integrated when simultaneously present. In a series
of experiments, we investigated participants’ ability to estimate travelled distances
under a variety of sensory/motor conditions. Visual information presented via an
HMD was combined with body-based cues that were provided either by walking
in a fully-tracked, free-walking space, by walking on a large linear treadmill, or by
being passively transported in a robotic wheelchair. Visually-specified distances
were either congruent or incongruent with distances specified by body-based cues.
Responses reflect a combined effect of both visual and body-based information,
with an overall higher weighting of body-based cues during walking and a relatively
equal weighting of inertial and visual cues during passive movement. The charac-
teristics of self-motion perception have also been investigated using a novel
continuous pointing method. This task requires participants to view a target and
point continuously towards it as they move past it along a straight, forward
trajectory. By using arm angle, we are able to measure perceived location and,
hence, perceived self-velocity during the entire trajectory. We have compared the
natural characteristics of continuous pointing during sighted walking with those
during reduced sensory/motor cue conditions, including: blind-walking, passive
transport, and imagined walking. The specific characteristics of self-motion per-
ception during passive transport have also been further evaluated through the use
of a robotic wheelchair and the MPI Motion Simulator.
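The differential weighting of visual and body-based cues reported above is often formalized as a reliability-weighted average, the standard cue-integration model. The following sketch is illustrative only: the estimates and variances are hypothetical, and this is not the specific model fitted in these studies.

```python
# Reliability-weighted cue combination: each cue's distance estimate is
# weighted by its reliability (the inverse of its variance). Hypothetical
# numbers for illustration; not the model or data from the studies above.

def combine_cues(est_visual, var_visual, est_body, var_body):
    """Combine two noisy distance estimates, weighting by reliability."""
    w_visual = (1.0 / var_visual) / (1.0 / var_visual + 1.0 / var_body)
    w_body = 1.0 - w_visual
    combined = w_visual * est_visual + w_body * est_body
    combined_var = 1.0 / (1.0 / var_visual + 1.0 / var_body)
    return combined, combined_var

# During walking, body-based cues may be more reliable (lower variance)
# and therefore dominate the combined estimate:
est, var = combine_cues(est_visual=10.0, var_visual=4.0,
                        est_body=12.0, var_body=1.0)
# body-cue weight = (1/1) / (1/4 + 1/1) = 0.8, so est = 0.2*10 + 0.8*12 = 11.6
```

A roughly equal weighting of the two cues, as observed during passive movement, would correspond to the two variances being comparable.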
Additional research programs have focused on understanding particular as-
pects of spatial memory when navigating through visually rich, complex envi-
ronments. In one study that investigated route memory, participants navigated
through virtual Tübingen while it was projected onto a 220◦ field-of-view, curved
screen display. Participants learned two routes while they were simultaneously
required to perform a visual, spatial, or verbal secondary task. In the subsequent
wayfinding phase the participants were asked to locate and “virtually travel”
along the two routes again (via joystick manipulation). During this wayfinding
phase a number of dependent measures were recorded. The results indicate that
encoding wayfinding knowledge interfered with the verbal and spatial secondary
tasks. These interferences were even stronger than the interference of wayfinding
knowledge with the visual secondary task. These findings are consistent with a
dual-coding approach of wayfinding knowledge. This dual coding approach was
further examined in our fully-tracked, free-walking space. In this case, partic-
ipants walked a route through a virtual environment and again were required
to remember the route. For 50% of the intersections they encountered, they
were asked to associate it with an arbitrary name they heard via headphones (e.g.,
“Goethe place”). For the other 50% of the intersections they were asked to re-
member the intersection by the local environmental features and not associate
References
1. Berger, D.R., Terzibas, C., Beykirch, K., Bülthoff, H.H.: The role of visual cues
and whole-body rotations in helicopter hovering control. In: Proceedings of the
AIAA Modeling and Simulation Technologies Conference and Exhibit (AIAA 2007),
Reston, VA, USA. American Institute of Aeronautics and Astronautics (2007)
2. Bülthoff, H.H., van Veen, H.A.H.C.: Vision and action in virtual environments:
Modern psychophysics in spatial cognition research. In: Jenkin, M., Harris, M.L.
(eds.) Vision and Attention, pp. 233–252. Springer, Heidelberg (2000)
3. Campos, J.L., Butler, J.S., Mohler, B.J., Bülthoff, H.H.: The contributions of visual
flow and locomotor cues to walked distance estimation in a virtual environment. In:
Proceedings of the 4th Symposium on Applied Perception in Graphics and Visual-
ization, p. 146. ACM Press, New York (2007)
4. Meilinger, T., Knauff, M., Bülthoff, H.H.: Working memory in wayfinding - a dual
task experiment in a virtual city. Cognitive Science 32, 755–770 (2008)
5. Mohler, B.J., Campos, J.L., Weyel, M., Bülthoff, H.H.: Gait parameters while walk-
ing in a head-mounted display virtual environment and the real world. In: Proceed-
ings of Eurographics 2007, Eurographics Association, pp. 85–88 (2007)
6. Teufel, H.J., Nusseck, H.-G., Beykirch, K.A., Butler, J.S., Kerger, M., Bülthoff,
H.H.: MPI motion simulator: Development and analysis of a novel motion simulator.
In: Proceedings of the AIAA Modeling and Simulation Technologies Conference and
Exhibit (AIAA 2007), Reston, VA, USA. American Institute of Aeronautics and
Astronautics (2007)
On the “Whats” and “Hows” of “Where”:
The Role of Salience in Spatial Descriptions
(Abstract)
Laura A. Carlson
According to Clark [1] language is a joint activity between speaker and listener,
undertaken to accomplish a shared goal. In the case of spatial descriptions, one
such goal is for a speaker to assist a listener in finding a sought-for object. For
example, imagine misplacing your keys on a cluttered desktop, and asking your
friend if s/he knows where they are. In response, there are a variety of spatial
descriptions that your friend can select that vary in complexity, ranging from
a simple deictic expression such as “there” (and typically accompanied by a
pointing gesture), to a much more complicated description such as “its on the
desk, under the shelf, to the left of the book and in front of the phone.” Between
these two extremes are descriptions of the form “The keys are by the book”,
consisting of three parts: the located object that is being sought (i.e., the keys);
the reference object from which the location of the located object is specified
(i.e., the book) and the spatial term that conveys the spatial relation between
these two objects (i.e., by). For inquiries of this type (“where are my keys?”), the
located object is pre-specified, but the speaker needs to select an appropriate
spatial term and an appropriate reference object. My research focuses on the
representations and processes by which a speaker selects these spatial terms
and reference objects, and the representations and processes by which a listener
comprehends these ensuing descriptions.
The “Whats”
With respect to selection, one important issue is understanding why particular
terms and particular reference objects are chosen. For a given real-world scene,
there are many possible objects that stand in many possible relations with respect
to a given located object. On what basis might a speaker make his/her selection?
Several researchers argue that reference objects are selected on the basis of prop-
erties that make them salient relative to other objects [2,3,4]. Given the purpose of
the description as specifying the location of the sought-for object, it would make
sense that the reference object be easy to find among the other objects in the dis-
play. However, there are many different properties that could define salience, in-
cluding spatial features, perceptual properties, and conceptual properties.
With respect to spatial features, certain spatial relations are preferred over oth-
ers. For example, objects that stand in front/back relations to a given located ob-
ject are preferred to objects that stand in left/right relations [5]. This is consistent
with well-known differences in the ease of processing different terms [6,7]. In ad-
dition, distance may play an important role, with objects that are closer to the
located object preferred to those that are more distant [8]. Thus, all else being
equal, a reference object may be selected because it is closest to the located object
and/or stands in a preferred relation with respect to the located object.
With respect to perceptual features, Talmy [4] identified size and movability
as key dimensions, with larger and immovable objects preferred as reference
objects. In addition, there may be a preference to select more geometrically
complex objects as reference objects. Blocher and Stopp [9] argued for color,
shape and size as critical salient dimensions. Finally, de Vega et al. [2] observed
preferences for reference objects that are inanimate, more solid, and whole rather
than parts of objects.
Finally, with respect to conceptual features, reference objects are considered
“given” objects, less recently mentioned into the discourse [4]. In addition, there
may be a bias to select reference objects that are functionally related to the
located object [10,11].
In this talk I will present research from my lab in which we systematically
manipulate spatial, conceptual and perceptual features, and ask which dimen-
sions are influential in reference object selection, and how priorities are assigned
across the spatial, perceptual and conceptual dimensions. Both production and
comprehension measures will be discussed. This work will provide a better sense
of how salience is being defined with respect to selecting a reference object for
a spatial description.
The “Hows”
Implicit in the argument that the salience of an object is computed across these di-
mensions is the idea that such computation requires that multiple objects are eval-
uated and compared among each other along these dimensions. That is, to say an
object stands out relative to other objects (for example, a red object among black
objects) requires that the color of all objects (black and red) be computed and com-
pared, and that on the basis of this comparison, the unique object (in this case, red)
stands out (among black). Put another way, an object can only stand out relative
to a contrast set [12]. Research in my lab has examined how properties of various
objects are evaluated and compared during production and comprehension, and
in particular, the point in processing at which properties of multiple objects ex-
ert their influence. For example, we have shown that the presence, placement and
properties of surrounding objects have a significant impact during comprehension
and production [13,11]. I will discuss these findings in detail, and will present
electrophysiological data that illustrate, within the time course of processing, the
point at which these features and dimensions have an impact on processing spatial
descriptions. Implications for other tasks and other types of spatial descriptions
will be discussed.
References
1. Clark, H.H.: Using language. Cambridge University Press, Cambridge (1996)
2. de Vega, M., Rodrigo, M.J., Ato, M., Dehn, D.M., Barquero, B.: How nouns and
prepositions fit together: An exploration of the semantics of locative sentences.
Discourse Processes 34, 117–143 (2002)
3. Miller, G.A., Johnson-Laird, P.N.: Language and perception. Harvard University
Press, Cambridge (1976)
4. Talmy, L.: How language structures space. In: Pick, H.L., Acredolo, L.P. (eds.)
Spatial orientation: Theory, research, and application, pp. 225–282. Plenum, New
York (1983)
5. Craton, L.G., Elicker, J., Plumert, J.M., Pick Jr., H.L.: Children’s use of frames of
reference in communication of spatial location. Child Development 61, 1528–1543
(1990)
6. Clark, H.H.: Space, time, semantics, and the child. In: Moore, T.E. (ed.) Cognitive
development and the acquisition of language. Academic Press, New York (1973)
7. Fillmore, C.J.: Santa Cruz lectures on deixis. Indiana University Linguistics Club,
Bloomington (1971)
8. Hund, A.M., Plumert, J.M.: What counts as by? Young children’s use of relative
distance to judge nearbyness. Developmental Psychology 43, 121–133 (2007)
9. Blocher, A., Stopp, E.: Time-dependent generation of minimal sets of spatial de-
scriptions. In: Olivier, P., Gapp, K.P. (eds.) Representation and processing of spa-
tial relations, pp. 57–72. Erlbaum, Mahwah (1998)
10. Carlson-Radvansky, L.A., Tang, Z.: Functional influences on orienting a reference
frame. Memory & Cognition 28, 812–820 (2000)
11. Carlson, L.A., Hill, P.L.: Processing the presence, placement and properties of a
distractor in spatial language tasks. Memory & Cognition 36, 240–255 (2008)
12. Olson, D.: Language and thought: Aspects of a cognitive theory of semantics.
Psychological Review 77, 143–184 (1970)
13. Carlson, L.A., Logan, G.D.: Using spatial terms to select an object. Memory &
Cognition 29, 883–892 (2001)
Learning about Space
(Abstract)
Dedre Gentner
Spatial cognition is important in human learning, both in itself and as a major substrate
of learning in other domains. Although some aspects of spatial cognition may be in-
nate, it is clear that many important spatial concepts must be learned from experience.
For example, Dutch and German use three spatial prepositions—op, aan, and om in
Dutch—to describe containment and support relations, whereas English requires just
one preposition—on—to span this range. How do children learn these different ways
of partitioning the world of spatial relations? More generally, how do people come to
understand powerful spatial abstractions like parallel, convergent, proportionate, and
continuous?
I suggest that two powerful contributors to spatial learning are analogical mapping—
structural alignment and abstraction—and language, especially relational language,
which both invites and consolidates the insights that arise from analogical processes.
I will present evidence that (1) analogical processes are instrumental in learning new
spatial relational concepts; and, further, that (2) spatial relational language fosters ana-
logical processing. I suggest that mutual bootstrapping between structure-mapping pro-
cesses and relational language is a major contributor to spatial learning in humans.
1 Introduction
While moving around in the environment people are able to keep track of how ego-
centric spatial relations (i.e., self-to-object directions and distances) change as a result
of their movement [1-4]. To try out an example, choose one object from your imme-
diate surroundings (e.g., a chair), and point to it. Then, close your eyes and take a few
steps forward and/or rotate yourself by some angle. As soon as you finish moving, but
before opening your eyes, point to the object again. It is very likely that you pointed
very accurately and without taking any time to contemplate where the object might be
as a result of your movement. This task, which humans can carry out with such
remarkable efficiency and speed, entails rather complex mathematical computations. It
requires that the egocentric location of an object is initially encoded and then continu-
ously updated while moving in the environment. The mechanism that allows people to
update egocentric relations and stay oriented within their immediate surroundings is
commonly known as spatial updating.
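The geometry behind these "complex mathematical computations" can be made concrete with a small sketch: after a forward step and a body rotation, the object's egocentric coordinates can be recomputed by a translation followed by a rotation. This is an illustrative formalization only, not a claim about how the brain implements updating.

```python
import math

def update_egocentric(obj_x, obj_y, step_forward, turn_deg):
    """Return the object's new egocentric (x, y) after the observer walks
    `step_forward` along its facing axis (+y) and then turns `turn_deg`
    degrees counterclockwise (to the left)."""
    # Translation: stepping forward shifts the object backward in the
    # egocentric frame.
    x, y = obj_x, obj_y - step_forward
    # Rotation: turning the body by +turn_deg rotates the scene by -turn_deg
    # in the egocentric frame.
    a = math.radians(-turn_deg)
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a))

# An object 2 m straight ahead, after a 1 m step forward and a 90° left turn,
# ends up 1 m to the observer's right (+x):
x, y = update_egocentric(0.0, 2.0, step_forward=1.0, turn_deg=90.0)
# x ≈ 1.0, y ≈ 0.0
```

The pointing task in the example above amounts to reading off the direction of this continuously updated egocentric vector with closed eyes.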
Several studies have suggested that spatial updating takes place automatically with
physical movement because such movement provides the input that is necessary for
* The presented experiments were conducted as part of an undergraduate thesis by
Stephanie Pantelidou.
updating [2, 4]. In the case of non-visual locomotion this input consists of kinesthetic
cues, vestibular feedback, and copies of efferent commands. The importance of
physical movement is corroborated by empirical findings showing that participants
point to a location equally fast and accurately from an initial standpoint and a novel
standpoint they adopt by means of physical movement (as in the example above). In
contrast, when the novel standpoint is adopted by merely imagining the movement,
participants are faster and more accurate to respond from their initial than their novel
(imagined) standpoint [5]. This is particularly the case when an imagined rotation is
needed to adopt the novel standpoint.
The traditional account for spatial updating [4, 6] posits that spatial relations are
encoded and updated on the basis of an egocentric reference frame (i.e., a reference
frame that is centered on one's body). Because egocentric relations are continuously
updated when moving, reasoning from one’s physical perspective is privileged as it
can be carried out on the basis of relations that are directly represented in memory.
In contrast, reasoning from imagined perspectives is deliberate and effortful, as it entails performing "off-line" mental transformations to compute the correct response. Recently, May proposed the sensorimotor interference account, which places the exact
locus of difficulty for responding from imagined perspectives at the presence of con-
flicts between automatically-activated sensorimotor codes that specify locations rela-
tive to the physical perspective and cognitive codes that define locations relative to
the imagined perspective [7, 8]. According to this account, while responding from an actual physical perspective is facilitated by compatible sensorimotor codes, in order to
respond from an imagined perspective, the incompatible sensorimotor codes must be
inhibited while an alternative response is computed. The presence of conflicts reduces
accuracy and increases reaction time when reasoning from imagined perspectives. In
a series of elegant experiments, May provided support for the facilitatory and interfer-
ing effects of sensorimotor codes [7].
Recently, Kelly, Avraamides, and Loomis [9] dissociated the influence of sensori-
motor interference in spatial reasoning from effects caused by the organizational
structure of spatial memory (see also [10]). In one condition of the study participants
initially examined a spatial layout of 9 objects from a fixed standpoint and perspec-
tive. Then, they were asked to rotate 90° to their left or right to adopt a novel perspec-
tive. From this perspective participants carried out a series of localization trials that
involved pointing to object locations from various imagined perspectives. This para-
digm allowed dissociating the orientation of the testing perspective from that of the
perspective adopted during learning. This dissociation is deemed necessary in light of
evidence from several studies showing that spatial memories are stored with a pre-
ferred direction that is very often determined by the learning perspective [11]. Results
revealed that responding from imagined perspectives that coincided with either the
learning or the testing perspective was more efficient compared to responding from
other perspectives. A similar result was obtained in the earlier study of Mou, McNa-
mara, Valiquette, and Rump [10] which suggested that independent effects attributed
to the orientation of the body of the observer at test and the preferred storage orienta-
tion of spatial memory can be obtained in spatial cognition experiments. Kelly et al. termed the former the sensorimotor alignment effect and the latter the memory-encoding alignment effect.
10 M.N. Avraamides and S. Pantelidou
By comparing Experiments 1 and 2 of the present study, we will be able to assess the degree of functional equivalence between scenes learned through pictures and through language. Based on evidence from previous studies that examined the organization of spatial memories
derived from maps and linguistic descriptions [22, 23], we expect that similar patterns
of findings will be found in the two experiments.
For the present experiments we adopted the paradigm used by Waller, Montello,
Richardson, and Hegarty [24] and previously by Presson and Hazelrigg [15]. In these
studies participants first learned various 4-point paths and then made judgments of
relative direction by adopting imagined perspectives within the paths. Trials could be
classified as aligned (i.e., the orientation of the imagined perspective matched the
physical perspective of the participant) or as contra-aligned (i.e., the imagined per-
spective deviated 180° from the physical perspective of the participant). The typical
result when participants carry out the task without moving from the learning stand-
point/perspective (Stay condition in [24]) is that performance is more efficient in
aligned than contra-aligned trials. This finding is commonly referred to as an align-
ment effect. Additional conditions of interest were included in the study by Waller et al. In their Experiment 2, a Rotate condition was included. In this condition, participants per-
formed the task after having physically rotated 180°. The rationale was that if the
alignment effect is caused primarily by the learning orientation then a similar align-
ment effect to that of the Stay condition would be obtained. However, if the alignment
effect is caused by the influence of the orientation of the body at the time of test, a
reverse-alignment effect should be expected. Results, however, revealed no alignment
effect (see also [25]). Two additional conditions, namely the Rotate-Update and the
Rotate-Ignore, provided important results. In the Rotate-Update condition participants
were instructed to physically rotate 180° in place and imagine that the spatial layout
was behind them (i.e., they updated their position relative to the learned layout). In
the Rotate-Ignore condition participants also rotated by 180° but were asked to imag-
ine that the learned layout had rotated along with them. Results revealed a normal
alignment effect in the Rotate-Ignore condition but a reverse-alignment effect in the Rotate-Update condition. Overall, these findings suggest that the orientation of the
body is important when reasoning about immediate environments.
In the present experiments we adopted the rationale of Waller et al. to examine the
presence of normal vs. reverse alignment effects in Stay and Rotate conditions. How-
ever, in contrast to Waller et al., the paths we used were not directly experienced by participants. Instead, they were presented on a computer monitor as either
pictures (Experiment 1) or text route descriptions (Experiment 2). If the orientation
of the body of the participant at the time of test influences performance, a normal
alignment effect should be found in Stay conditions and a reverse alignment effect
should be obtained in Rotate conditions. However, if the learning perspective domi-
nates performance then a normal alignment effect should be expected in both Stay and
Rotate conditions. Finally, a third possibility is that both the learning and physical
perspectives influence performance, as shown by Kelly et al. [9] for immediate environments. In that case, if the two effects are of equal magnitude, then no alignment effect
should be expected in Rotate conditions as the two effects would cancel each other
out. However, without making any assumptions about the magnitude of the two ef-
fects, we should at least expect a reduced alignment effect in Rotate conditions, if
indeed both learning and physical perspectives influence reasoning.
2 Experiment 1
In Experiment 1 participants encoded paths that were depicted on a computer screen
and then carried out judgments of relative direction (JRDs). A Stay condition and a
Rotate condition (in which neither update nor ignore instructions were given) were
included. Based on previous findings documenting that the orientation of one’s body
does not typically influence spatial reasoning about non-immediate environments, we
predict that a normal alignment effect would be present in both the Stay and Rotate
conditions. We also expect that overall performance will be equal in the Stay and
Rotate conditions.
2.1 Method
Participants
Twenty-two students from an introductory psychology course at the University of
Cyprus participated in the experiment in exchange for course credit. Twelve were
assigned to the Stay condition and ten to the Rotate condition.
Design
A 2 (observer position: Stay vs. Rotate) x 3 (imagined perspective: aligned 0°, misaligned 90°, contra-aligned 180°) mixed factorial design was used. Observer position was manipulated between subjects, while imagined perspective varied within subjects.
2.2 Procedure
Prior to the beginning of the experiment participants were shown example paths on
paper and were instructed on how to perform JRDs. JRDs involve responding to statements of the form "Imagine being at x, facing y. Point to z," where x, y, and z are objects/landmarks from the studied layout. Before the experiment began, participants performed several practice JRD trials using campus
landmarks as targets and responding both with their arms and the joystick. Then,
participants were seated in front of one of the monitors and were asked to study the
first path. They were instructed to visualize themselves moving on the path. The
initial direction of imagined movement was to the left for two paths and to the right in
the other two (e.g., Figure 1). This was done to avoid confounding the initial movement direction with either the orientation of the body or its opposite. Participants were given unlimited time to memorize the path and then proceeded to perform
the experimental trials. Each trial instructed them to imagine adopting a perspective
within the memorized path (e.g., Imagine standing at 1 facing 2) and point from it
with the joystick toward a different position in the path (e.g., Point to 3). Participants
in the Stay condition performed the trials on the same monitor on which they had
previously viewed the path. Those in the Rotate condition were asked to rotate 180°
and perform the pointing trials on the other monitor. Participants were instructed to
respond as fast as possible but without sacrificing accuracy. Sixteen trials for each
path were included, yielding a total of 64 trials per subject. Four imagined perspec-
tives (i.e., aligned 0°, misaligned 90° left, misaligned 90° right, and contra-aligned
180°) were equally represented in the 64 trials. Furthermore, correct pointing re-
sponses, which could be 45°, 90°, and 135° to the left and right of the forward joy-
stick position, were equally distributed across the four imagined perspectives. The
order of trials within each path was randomized. Also, the order in which the four
paths were presented to participants varied randomly.
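The correct response for each JRD trial follows from simple vector geometry. A minimal sketch, assuming the path positions are available as (x, y) coordinates; the function name and the sign convention (negative angles to the left, positive to the right of straight ahead) are illustrative assumptions, not part of the original procedure:

```python
import math

def jrd_angle(stand, face, target):
    """Correct pointing response for the statement
    'Imagine standing at `stand`, facing `face`. Point to `target`.'
    Positions are (x, y) tuples. Returns degrees in (-180, 180]:
    negative = left of straight ahead, positive = right."""
    # Direction the imagined observer is facing.
    heading = math.atan2(face[1] - stand[1], face[0] - stand[0])
    # Direction from the observer to the target.
    bearing = math.atan2(target[1] - stand[1], target[0] - stand[0])
    deg = math.degrees(heading - bearing)
    # Wrap into (-180, 180].
    return (deg + 180) % 360 - 180

# Facing "north" from the origin, a target due east requires
# a 90 deg rightward pointing response.
angle = jrd_angle((0, 0), (0, 1), (1, 0))
```

Such a computation also makes explicit why, in the experiments, correct responses could be restricted to 45°, 90°, and 135° on either side: the rectilinear paths with 10-m segments only produce those relative directions.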
2.3 Results
Separate analyses for pointing accuracy and latency for correct responses were carried
out. In order to classify responses as correct or incorrect, joystick deflection angles were quantized as follows. Responses between 22.5° and 67.5° from the forward position of the joystick were classified as 45° responses to the left or right, depending on the side of deflection. Similarly, responses that fell between 67.5° and 112.5° were considered 90° responses to the left or right. Finally, responses between 112.5° and 157.5° were marked as 135° responses. Initial analyses of accuracy and latency in-
volving all four imagined perspectives revealed no differences between the 90° left
and the 90° right perspectives in either the Stay or Rotate condition. Therefore, data for
these two perspectives were averaged to form a misaligned 90° condition. A 2 (ob-
server position) x 3 (imagined perspective) mixed-model Analysis of Variance
(ANOVA) was conducted for both accuracy and latency data.
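The quantization scheme described above can be sketched as follows. The function name is illustrative, and because the text does not state how exact boundary values (e.g., a deflection of precisely 67.5°) were scored, the half-open intervals below are an assumption:

```python
def classify_response(deflection_deg):
    """Quantize a signed joystick deflection into a 45°, 90°, or 135°
    response. `deflection_deg` is negative for deflections to the left
    and positive for deflections to the right of the forward position.
    Returns (magnitude, side), or None for unscored deflections."""
    mag = abs(deflection_deg)
    side = 'left' if deflection_deg < 0 else 'right'
    # Bands from the text: 22.5-67.5 -> 45, 67.5-112.5 -> 90,
    # 112.5-157.5 -> 135. Boundary handling is assumed half-open.
    if 22.5 <= mag < 67.5:
        return 45, side
    if 67.5 <= mag < 112.5:
        return 90, side
    if 112.5 <= mag < 157.5:
        return 135, side
    return None
```

A response was then counted as correct when the quantized magnitude and side matched the angle computed from the memorized path layout.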
Accuracy
The analysis revealed that overall accuracy was somewhat higher in the Stay (79.9%) than in the Rotate (73.9%) condition. However, this difference did not reach statistical significance, F(1,20)=.92, p=.35, η²=.04. A significant main effect for imagined perspective was obtained, F(2,40)=8.44, p<.001, η²=.30. As seen in Table 1, accuracy was highest for the aligned 0° perspective (84.4%), intermediate for the misaligned 90° perspective (76.2%), and lowest for the contra-aligned 180° perspective (70.2%). Within-subject contrasts verified that all pair-wise differences were significant, ps<.05. Importantly, this pattern was obtained in both the Stay and Rotate conditions, as evidenced by the absence of a significant interaction, F(2,40)=.40, p=.68, η²=.02.
Table 1. Accuracy (%) in Experiment 1 as a function of observer position and imagined per-
spective. Values in parentheses indicate standard deviations.
Latency
The analysis of latencies yielded findings similar to those for the accuracy data. No differences were obtained between the Stay (11.63 s) and the Rotate (11.45 s) conditions, F(1,20)=.03, p=.87, η²=.001. However, a significant main effect was obtained for imagined perspective, F(2,40)=19.96, p<.001, η²=.50.
As seen in Figure 2, pointing was fastest in the aligned 0° condition (9.80 s), intermediate in the misaligned 90° condition (11.47 s), and slowest in the contra-aligned 180° condition (13.35 s). All pair-wise comparisons were significant, ps<.01.
Fig. 2. Latency for pointing responses as a function of observer position and imagined perspec-
tive in Experiment 1. Error bars represent standard errors.
Finally, the interaction between observer position and imagined perspective was not
significant, F(2,40)=.72, p=.50, η²=.04.
2.4 Discussion
3 Experiment 2
Experiment 2 was identical to Experiment 1, except that the paths were presented as route descriptions rather than pictures. Previous studies
with route descriptions have documented the presence of a strong influence of the
orientation of the first travel segment of the path on spatial performance [26]; this
suggests that the way the path is represented in memory determines the ease of spatial reasoning. Based on these findings, we expect that no influence of body orientation will be evident in our experiment. As in Experiment 1, we predict the presence of a normal alignment effect in both the Stay and Rotate conditions.
3.1 Method
Participants
Twenty-two students, none of whom had participated in Experiment 1, took part in
the experiment in exchange for course credit. Half were randomly assigned to the
Stay condition and the other half to the Rotate condition.
Design
As in Experiment 1, the design was a 2 (observer position: Stay vs. Rotate) x 3 (imagined perspective: aligned 0°, misaligned 90°, contra-aligned 180°) mixed factorial, with observer position as a between-subjects factor and imagined perspective as a within-subjects factor.
In contrast to Experiment 1, the paths were learned through text descriptions pre-
sented on the screen. These descriptions were presented in Greek, the native language
of all participants. Prior to the experiment, participants were shown a picture like the one in Figure 1, but without a path. They were told that this was an environment in which they should imagine themselves standing. The text descriptions described the same paths as in Experiment 1. An English translation of an example description reads as follows:
Imagine standing at the beginning of a path. The position that you are
standing at is position 1. Without moving from this position, you turn
yourself to the left. Then, you walk straight for 10 meters and you reach
position 2. As soon as you get there you turn towards the left again and
you walk another 10 meters to reach position 3. At this position, you
turn to your right and walk another 10 meters to position 4. Finally, you
turn again to your right and walk another 10 meters towards position 5
which is the endpoint of the path.
3.2 Procedure
The procedure was identical to that of Experiment 1. Prior to reading the descriptions
participants were instructed to visualize themselves moving along the described path
and imagine turning 90° whenever a turn was described. As in Experiment 1, the
initial movement direction was to the left for two paths and to the right for the other
two. Participants in the Rotate condition carried out a physical 180° turn prior to be-
ginning the test trials.
3.3 Results
As in Experiment 1, no differences were obtained between the 90° left and the 90°
right imagined perspective in either accuracy or latency. Therefore, data were aver-
aged across these two perspectives to form a 90° misaligned perspective condition.
Separate 2 x 3 mixed-model ANOVAs were then conducted for accuracy and latency.
Accuracy
The ANOVA on accuracy data revealed that overall performance was equivalent between the Stay (68.7%) and the Rotate (70.3%) conditions, F(1,20)=.40, p=.84, η²=.002. A significant main effect for imagined perspective was obtained, F(2,40)=17.60, p<.001, η²=.47. As seen in Table 2, accuracy was highest for the aligned 0° perspective (77.1%), intermediate for the misaligned 90° perspective (69.8%), and lowest for the contra-aligned 180° perspective (61.7%). Within-subject contrasts verified that all pair-wise differences were significant, ps<.05. These differences among perspectives were present in both the Stay and Rotate conditions, as suggested by the lack of a significant interaction, F(2,40)=.22, p=.81, η²=.01.
Table 2. Accuracy (%) in Experiment 2 as a function of observer position and imagined per-
spective. Values in parentheses indicate standard deviations.
Fig. 3. Latency for pointing responses as a function of observer position and imagined perspective in Experiment 2. Error bars represent standard errors.
Latency
The analysis revealed no difference in performance between the Stay (12.39 s) and the Rotate (11.79 s) conditions, F(1,20)=.12, p=.74, η²=.006. A significant main effect was present for imagined perspective, F(2,40)=24.22, p<.001, η²=.55.
As seen in Figure 3, participants pointed fastest in the aligned 0° condition (10.51 s), intermediate in the misaligned 90° condition (11.82 s), and slowest in the contra-aligned 180° condition (13.94 s). All pair-wise comparisons were significant, ps<.05. Finally, the interaction between observer position and imagined perspective was not significant, F(2,40)=.41, p=.67, η²=.02.
4 General Discussion
The experiments presented here provide evidence for the lack of sensorimotor influ-
ence for reasoning about spatial relations contained in depicted or described environ-
ments. The current findings deviate from those obtained from experiments with real
visual scenes in which the influence of body orientation was substantial [9, 10].
While our findings suggest that reasoning through symbolic media might not al-
ways be equivalent to reasoning about actual environments, in our opinion, the criti-
cal variable is not whether the environments are experienced directly through our
senses or indirectly through symbolic media but rather whether the spatial relations
they contain are immediate or not (see [9]). We believe that reasoning about remote
locations is free of sensorimotor facilitation/interference. Because symbolic media are
previous studies have suggested that people have difficulty in maintaining misaligned
imagined perspectives [26].
At this point it should be pointed out that while we claim that no egocentric rela-
tions between the self and the elements of the path were formed, we acknowledge that
the transient egocentric systems of participants would have been used to encode and
update egocentric relations to objects from the laboratory, including the two computer
monitors used to present stimuli. Moreover, spatial relations between each path loca-
tion and an imagined representation of the self within the path could have been
formed. However, such relations could more easily be classified as allocentric rather than
egocentric if the self in the imagined path is regarded as just another location in the
layout.
A secondary goal of our study was to assess the degree of functional equivalence
between spatial representations created from depicted and described scenes. An im-
portant result is that the same pattern of findings (i.e., a normal alignment effect) was
observed in the two experiments. While performance was somewhat more accurate
for depicted than described scenes, our cross-experiment analysis revealed that the
difference was not significant. The difference in mean accuracy is not surprising
given findings from previous studies showing that it takes longer to reach the same
level of learning when encoding spatial layouts through language than vision [17-19].
In the current study we have used no learning criterion. Instead, participants were
provided with unlimited time to study the layouts in the two experiments. The accu-
racy and fidelity of their spatial representations was, however, not assessed prior to
testing. It is possible, then, that the overall performance difference between described and depicted scenes was caused by differences in encoding. Previous studies suggest that
functional equivalence for representations acquired from different modalities is
achieved after equating conditions in terms of encoding differences [3, 17]. A future direction for research would thus be to examine functional equivalence for representations of remote environments after taking into account the differences that may exist across modalities in terms of encoding.
Acknowledgments. We are grateful to all the students who participated in the study.
References
1. Amorim, M.A., et al.: Updating an object’s orientation and location during nonvisual navi-
gation: a comparison between two processing modes. Percept. Psychophys. 59(3), 404–418
(1997)
2. Farrell, M.J., Thomson, J.A.: On-line updating of spatial information during locomotion without vision. J. Mot. Behav. 31(1), 39–53 (1999)
3. Loomis, J.M., et al.: Spatial updating of locations specified by 3-d sound and spatial lan-
guage. J. Exp. Psychol. Learn. Mem. Cogn. 28(2), 335–345 (2002)
4. Rieser, J.J.: Access to knowledge of spatial structure at novel points of observation. J. Exp.
Psychol. Learn. Mem. Cogn. 15(6), 1157–1165 (1989)
5. Presson, C.C., Montello, D.R.: Updating after rotational and translational body move-
ments: coordinate structure of perspective space. Perception 23(12), 1447–1455 (1994)
6. Wang, R.F., Spelke, E.S.: Updating egocentric representations in human navigation. Cog-
nition 77(3), 215–250 (2000)
1 Introduction
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 22–38, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Spatial Memory and Spatial Orientation 23
locations are inherently relative, objects contained in this long-term spatial memory
must be specified in the context of a spatial reference system. For example, a football
fan might remember the location of his or her car in the stadium parking lot
with respect to the rows and columns of cars, or possibly with respect to the car’s
location relative to the main stadium entrance. In either case, the car’s location must
be represented relative to some reference frame, which is likely to be centered on the
environment.
Much of the experimental work on the organization of long-term spatial memories
has focused on the cues that influence the selection of one spatial reference system
over the infinite number of candidate reference systems. In these experiments, par-
ticipants learn the locations of objects on a table, within a room, or throughout a city,
and are later asked to retrieve inter-object spatial relationships from the remembered
layout. A variety of spatial memory retrieval tasks have been employed, including
map drawing, picture recognition, and perspective taking. These retrieval tasks are
commonly performed after participants have been removed from the learning envi-
ronment, to ensure that spatial memories are being retrieved from the long-term repre-
sentation and not from the sensorimotor representation. Here we focus primarily on
results from perspective taking tasks, where participants point to locations from imag-
ined perspectives within the remembered environment. A consistent finding from
these experiments is that long-term spatial memories are typically represented with
respect to a small number of reference directions, centered on the environment and
selected during learning (see McNamara, 2003, for a review). During spatial memory
retrieval, inter-object spatial relationships aligned with those reference directions are
readily accessible because they are directly represented in the spatial memory. In
contrast, misaligned spatial relationships must be inferred from other represented
relationships, and this inference process is cognitively effortful (e.g., Klatzky, 1998).
The pattern of response latencies and pointing errors across a sample of imagined
perspectives is interpreted as an indicator of the reference directions used to organize
the spatial memory, and a large body of work has focused on understanding the cues
that influence the selection of one reference direction over another during acquisition
of the memory.
Because we use our bodies to sense environmental information and also to act on
the environment, the body’s position during learning seems likely to have a large
influence on selecting a reference direction. Consistent with this thinking, early evi-
dence indicated that perspectives aligned with experienced views are facilitated rela-
tive to non-experienced views. This facilitation fell off as a function of angular dis-
tance from the experienced views (Diwadkar & McNamara, 1997; Roskos-Ewoldsen,
McNamara, Shelton & Carr, 1999; Shelton & McNamara, 1997), and this pattern of
facilitation holds true for up to three learned perspectives. These findings resonate
with similar findings from object recognition (Bülthoff & Edelman, 1992), but are
complicated by two other sets of findings. First, Kelly et al. (2007; see also
Avraamides & Kelly, 2005) had participants learn a layout of eight objects within an
octagonal room using immersive virtual reality. Participants freely turned and ex-
plored the virtual environment during learning, but the initially experienced perspec-
tive was held constant. After learning this layout, imagined perspectives aligned with
the initially experienced perspective were facilitated, and this pattern persisted even
after extensive experience with other perspectives misaligned with that initial view.
The authors concluded that participants established a reference direction upon first
experiencing the environment, and that this organization was not updated even after
learning from many other views. Second, Shelton and McNamara (2001; also see
Hintzman, O’Dell & Arndt, 1981) found that the environmental shape has a profound
impact on selecting reference directions. In one of their experiments, participants
learned a layout of objects on the floor of a rectangular room. Learning occurred
from two perspectives, one parallel with the long axis of the room and one misaligned
with the room axis. Perspective taking performance was best when imagining the
aligned learning perspective and performance on the misaligned learning perspective
was no better than on non-experienced perspectives. The authors concluded that
reference directions are selected based on a combination of egocentric experience and
environmental structure, and that the rectangular room served as a cue to selecting a
reference direction consistent with that structure. This finding is supported by other
work showing facilitated retrieval of inter-object relationships aligned with salient
environmental features like city streets, large buildings, and lakes (McNamara, Rump
& Werner, 2003; Montello, 1991; Werner & Schmidt, 1999).
Other work has shown that selection of reference directions is influenced not only
by features external to the learned layout, but also by the structure of the learned lay-
out itself. For example, the reference directions used to remember the locations of
cars in a stadium parking lot might be influenced by the row and column structure of
the very cars that are being learned. Mou and McNamara (2002) demonstrated the
influence of this intrinsic structure by having participants study a rectilinear object
array. The experimenter pointed out the spatial regularity of the layout, which con-
tained rows and columns oblique to the viewing perspective during learning. Subse-
quent perspective taking performance was best for perspectives aligned with the
intrinsic axes defined by the rows and columns of objects, even though those perspec-
tives were never directly experienced during learning. Furthermore, this influence of
the intrinsic object structure is not dependent on experimenter instructions like those
provided in Mou and McNamara’s experiments. Instead, an axis of bilateral symme-
try within the object array can induce the same organization with respect to an intrin-
sic frame of reference, defined by the symmetry axis (Mou, Zhao & McNamara,
2007).
To summarize the findings reviewed so far, the reference directions used to organ-
ize long-term spatial memories are known to be influenced by egocentric experience,
extrinsic environmental structures like room walls (extrinsic to the learned layout),
and intrinsic structures like rows and columns of objects or symmetry axes (intrinsic
to the learned layout). While these cues have each proven influential in cases where
only one or two cues are available, real world environments typically contain a whole
host of cues, including numerous extrinsic and intrinsic cues like sidewalks, tree lines,
waterfronts, and mountain ranges. A recent set of experiments reported by Kelly &
McNamara (2008) sought to determine whether one particular cue type is dominant in
a more representative scene, where egocentric experience, extrinsic structure, and
intrinsic structure all provided potential cues to selecting a reference direction. In the
first of two experiments using immersive virtual reality, participants learned a layout
of seven virtual objects from two perspectives. The objects were arranged in rows and
columns which were oblique to the walls of a surrounding square room (termed the
incongruent environment, since intrinsic and extrinsic environmental structures
26 J.W. Kelly and T.P. McNamara
Fig. 1. Stimuli and results from Kelly and McNamara (2008). Plan views of the incongruent
(top) and congruent (bottom) environments appear as insets within each panel. In the plan
views, open circles represent object locations, solid lines represent room walls, and arrows
represent viewing locations during learning. Pointing error is plotted as function of imagined
perspective, separately for the two viewing orders (0° then 135° or 135° then 0°). After learn-
ing the incongruent environment (top), where intrinsic and extrinsic structures were incongru-
ent with one another, performance was best on the initially experienced view. After learning
the congruent environment (bottom), where intrinsic and extrinsic structures were congruent
with one another, performance was best for perspectives aligned with the redundant environ-
mental structures, regardless of viewing order.
were incongruent with one another; see Figure 1, top panel). One of the learned per-
spectives (0°) was aligned with the intrinsic object structure, and the other (135°) was
aligned with the extrinsic room structure. Learning occurred from both views, and
viewing order was manipulated. If the intrinsic structure was more salient than extrin-
sic structure, then participants should have selected a reference direction from the 0°
view (aligned with the rows and columns of the layout).
Spatial Memory and Spatial Orientation 27
However, if extrinsic structure was more salient than intrinsic structure, then participants should have selected a
reference direction from the 135° view (aligned with the walls of the room). Finally, if
the competing intrinsic and extrinsic structures negated one another’s influence, then
participants should have selected a reference direction from the initially experienced
view, regardless of its alignment with a particular environmental structure. In fact,
spatial memories of the incongruent environment (top panel of Figure 1) were based on
the initially experienced view, and the pattern of facilitation was well predicted by the
viewing order. Neither the intrinsic structure of the objects nor the extrinsic structure of
the room was more salient when the two were placed in competition.
In the second experiment reported by Kelly and McNamara (2008), the intrinsic and
extrinsic structures were placed in alignment with one another (termed the congruent
environment; see inset in Figure 1, bottom panel), and learning occurred from two
perspectives, one aligned and one misaligned with the congruent environmental struc-
tures. Spatial memories of the congruent environment (bottom panel of Figure 1) were
organized around the redundantly defined environmental axes. Performance was best
for perspectives aligned with the congruent intrinsic and extrinsic structures, and was
no better on the misaligned experienced view than on other misaligned views that were
never experienced. The results of these two experiments fit well with those reported by
Shelton and McNamara (2001), where multiple misaligned extrinsic structures (a
rectangular room and a square mat on the floor) resulted in egocentric selection of
reference directions, but aligned extrinsic structures resulted in environment-based
selection. Taken together, these findings indicate that intrinsic and extrinsic structures
are equally salient, and can serve to reinforce or negate the influences of one another as
cues to the selection of reference directions. Everyday environments typically contain
multiple intrinsic and extrinsic structures like roads, waterfronts, and tree lines, and
these structures often define incongruent sets of environmental axes. As such, it
is possible that reference directions are most commonly selected on the basis of
egocentric experience.
Experiments on long-term spatial memory have regularly provided evidence that
long-term representations are orientation-dependent, allowing for privileged access to
spatial relations aligned with a reference direction centered on the environment. How-
ever, the evidence reviewed thus far is based primarily on imagined perspective tak-
ing performance, and experiments using scene recognition indicate that there may be
more than one long-term representation. Valiquette and McNamara (2007; also see
Shelton & McNamara, 2004) had participants learn a layout of objects from two per-
spectives, one aligned and one misaligned with the extrinsic structure of the environ-
ment (redundantly defined by the room walls and a square mat on the floor). As in
other experiments (e.g., Kelly & McNamara, 2008; Shelton & McNamara, 2001),
perspective taking performance was better when imagining the aligned learning per-
spective than when imagining the misaligned learning perspective, which was no
better than when imagining other misaligned perspectives that were never experi-
enced. In contrast, scene recognition performance was good on both the aligned and
misaligned learning perspectives, and fell off as a function of angular distance from
the learned perspectives. So while imagined perspective taking performance indicated
that the misaligned learning view was not represented in long-term memory, scene
recognition performance indicated that the misaligned view was represented. The
authors interpreted this as evidence for two long-term representations, one used for
locating self-position (active during the scene recognition test) and the other for locat-
ing goal locations after establishing self-position (active during the perspective taking
task). Importantly, both representations were found to be orientation-dependent, but
the reference directions used to organize the two types of representations were differ-
ent. The influence of these reference directions on navigation is still unclear. One
possibility is that spatial relationships are more accessible when the navigator is
aligned with a reference direction in long-term memory. As a result, a navigator’s
ability to locate and move toward a goal location might be affected by his or her ori-
entation within the remembered space. Additionally, experiments presented in
Section 4 suggest that spatial updating occurs with respect to the same reference di-
rections used to organize spatial memories.
Fig. 2. Results of Kelly, Avraamides and Loomis (2007). Response latency is plotted as a
function of test block and imagined perspective. After learning a layout of objects, participants
walked into a neighboring room and performed Block 1 of the perspective-taking task. Results
indicate that performance was best for the originally experienced learning perspective, but
was unaffected by the disparity between the orientation of the imagined perspective and
the orientation of the participants' bodies during testing. After completing Block 1, partici-
pants returned to the empty learning room and performed Block 2. Results indicate that per-
formance was facilitated on the originally experienced perspective, and also on the perspective
aligned with the body during testing.
For the second block of trials, participants returned to the empty learning room (the
learned objects had been removed after learning), and performed the exact same per-
spective-taking task as before. Performance was again facilitated when participants
imagined the initially experienced perspective, but also when they imagined their
actual perspective, compared to performance on the misaligned perspective (see
Figure 2). Despite the fact that participants did not view the learned objects upon
returning to the learning room, their sensorimotor representations of the objects were
reactivated, causing sensorimotor interference when imagining perspectives mis-
aligned with the body. This indicates that walking back into the empty learning room
was sufficient to reinstantiate the sensorimotor representation of the learned objects,
even though several minutes had passed since they were last seen. Renewal of the
sensorimotor representation must have drawn on the long-term representation, be-
cause the objects themselves were not experienced upon returning to the empty
learning room. In sum, Kelly et al.’s experiment suggests that the sensorimotor repre-
sentation is less sensitive to elapsed time than previously thought, and instead is de-
pendent on perceived self-location. The sensorimotor representation appears to be
context dependent, and moving from one room to another changes the context and
therefore also changes the contents of the sensorimotor representation.
4 Spatial Orientation
Staying oriented during movement through a remembered space and reorienting after
becoming lost are critical spatial abilities. With maps and GPS systems, getting lost
on one’s drive home might not present a life or death situation, but the same was not
true for our ancestors, whose navigation abilities were necessary for survival. Accord-
ing to Gallistel (1980; 1990), spatial orientation is achieved, in part, by relating prop-
erties of the perceived environment (i.e., the sensorimotor representation) with those
same properties in the remembered environment (i.e., the long-term representation),
and is also informed by perceived self-position as estimated by integrating self-
motion cues during locomotion, a process known as path integration. The importance
of information from path integration becomes particularly clear when navigating
within an ambiguous environment, such as an empty rectangular room in which two
orientations provide the exact same perspective of the room (e.g., Hermer & Spelke,
1994). In this case, one’s true orientation can only be known by using path integra-
tion to distinguish between the two potentially correct matches between sensorimotor
and long-term representations.
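Gallistel's matching process can be sketched in a few lines: the perceived room shape narrows self-orientation to a set of geometrically congruent headings, and the heading carried by path integration selects among them. The Python sketch below is purely illustrative; the function names and the two-heading rectangular-room example are ours, not from the chapter.

```python
import math

def circular_distance(a, b):
    """Smallest angular distance between two headings (radians)."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def disambiguate(congruent_headings, path_integration_estimate):
    """Among the headings consistent with the perceived room geometry,
    pick the one closest to the path-integration heading estimate."""
    return min(congruent_headings,
               key=lambda h: circular_distance(h, path_integration_estimate))

# An empty rectangular room is consistent with two headings 180° apart;
# a (noisy) path-integration estimate of 20° resolves the ambiguity.
print(disambiguate([0.0, math.pi], math.radians(20)))  # -> 0.0
```

The same tie-breaking logic applies whenever the environment offers several geometrically equivalent matches, which is exactly the situation in the Hermer and Spelke rectangular room.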
From time to time, the matching between perceived and remembered environments
can produce grossly incorrect estimates of self-position. Jonsson (2002; also see Gal-
listel, 1980) describes several such experiences. In one case, he describes arriving in
Cologne by train. Because his origin of travel was west of Cologne, he assumed that
the train was facing eastward upon its arrival at Cologne Central Station. The train
had, in fact, traveled past Cologne and turned around to enter the station from the
east, and was therefore facing westward upon its arrival. Jonsson’s initial explora-
tions of the city quickly revealed his orienting error, and he describes the disorienting
experience of rotating his mental representation of the city 180° into alignment with
the visible scene. Experiences such as these are typically preceded by some activity
that disrupts the path integration system (like riding in a subway, or falling asleep on
a train), which would have normally prevented such an enormous error.
Much of the experimental work on the topic of human spatial orientation has focused
on the cues used to reorient after explicit disorientation. In particular, those studies
Fig. 3. Reorientation performance in four rooms, varying in their rotational symmetry. Partici-
pants learned to identify one of twelve possible object locations, and then attempted to locate
the learned location after disorientation.
The perspective visible from within a rectangular room (without featural cues) can be exactly reproduced by rotating the room 180°.
Because there are two orientations that produce the same perspective, the rectangular
room is two-fold rotationally symmetric. A square room is four-fold rotationally
symmetric, and so on. In our experiment, we tested reorientation performance within
environments of 1-fold (trapezoidal), 2-fold (rectangular), 4-fold (square) and ∞-fold
(circular) rotational symmetry. Participants memorized one of twelve possible target
locations within the room, and then attempted to re-locate the target position after
explicit disorientation. Reorientation performance (see Figure 3) was inversely pro-
portional to room rotational symmetry across the range of rotational symmetries
tested. This can be considered an effect of geometric ambiguity, with the greater
ambiguity of the square room compared to the trapezoidal room leading to compara-
tively poorer reorientation performance in the square room. The same analysis can be
applied to featural cues, which have traditionally been operationalized as unambigu-
ous indicators of self-location (e.g., Hermer & Spelke, 1996), but need not be
unambiguous.
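The notion of n-fold rotational symmetry can be made concrete with a short computation: count the rotations that map a room's corner layout onto itself. The sketch below is ours, not from the study; it treats a room as a set of corner points, so the circular room (no corners, infinite symmetry) falls outside it.

```python
import numpy as np

def rotational_symmetry_order(corners, tol=1e-6):
    """Number of rotations in (0°, 360°], checked in 1° steps, that map
    the corner set onto itself, i.e. the room's rotational symmetry."""
    pts = np.asarray(corners, dtype=float)
    pts = pts - pts.mean(axis=0)          # rotate about the centroid
    order = 0
    for deg in range(1, 361):
        a = np.deg2rad(deg)
        rot = np.array([[np.cos(a), -np.sin(a)],
                        [np.sin(a),  np.cos(a)]])
        rotated = pts @ rot.T
        # congruent iff every rotated corner lands on some original corner
        dists = np.linalg.norm(rotated[:, None, :] - pts[None, :, :], axis=2)
        if dists.min(axis=1).max() < tol:
            order += 1
    return order

print(rotational_symmetry_order([(1, 1), (-1, 1), (-1, -1), (1, -1)]))  # square: 4
print(rotational_symmetry_order([(2, 1), (-2, 1), (-2, -1), (2, -1)]))  # rectangle: 2
print(rotational_symmetry_order([(2, 1), (-1, 1), (-2, -1), (2, -1)]))  # trapezoid: 1
```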
Even in the absence of environmental cues, humans can maintain a sense of spatial
orientation through path integration. Path integration is the process of updating per-
ceived self-location and orientation using internal motion cues such as vestibular and
proprioceptive cues, and external motion cues such as optic flow, and integrating
those motion signals over time to estimate self-location and orientation (for a review,
see Loomis, Klatzky, Golledge & Philbeck, 1999).
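In its simplest form, path integration is a running sum of rotations and translations. The sketch below is our illustration, not code from the review just cited: a 2-D pose is updated from sensed turns and distances.

```python
import math

def integrate_step(x, y, heading, turn, distance):
    """One path-integration update: add the sensed turn to the current
    heading, then translate by the sensed distance along the new heading."""
    heading = (heading + turn) % (2 * math.pi)
    x += distance * math.cos(heading)
    y += distance * math.sin(heading)
    return x, y, heading

# Noise-free example: walking a unit square (four 90° left turns)
# returns the walker to the origin with the starting heading.
pose = (0.0, 0.0, 0.0)
for _ in range(4):
    pose = integrate_step(*pose, turn=math.pi / 2, distance=1.0)
```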
The path integration process is noisy, and errors accrue with increased walking and
turning. In an experiment by Klatzky et al. (1990), blindfolded participants were led
along an outbound path consisting of one to three path segments, and each segment
was separated by a turn. After reaching the end of the path, participants were first
asked to turn and face the path origin and then to walk to the location of the path
origin. Turning errors and walked-distance errors increased with the number of path
segments, demonstrating that path integration is subject to noise. Errors that accumu-
late during path integration cannot be corrected for without perceptual access to envi-
ronmental features, such as landmarks or geometry.
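This accumulation of error can be illustrated with a small Monte Carlo simulation. The code is ours and the noise magnitudes are illustrative, not fitted to Klatzky et al.'s data: a walker executes random multi-segment unit-length paths while integrating noisy copies of each turn and distance, and the gap between the true and the estimated endpoint grows with the number of segments.

```python
import math
import random

def mean_endpoint_error(n_segments, turn_sd=0.1, dist_sd=0.05,
                        trials=2000, seed=1):
    """Mean distance between the true endpoint and the endpoint implied
    by noisy path integration, over random n-segment unit-length paths."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        tx = ty = th = 0.0   # true pose
        ex = ey = eh = 0.0   # pose estimated from noisy self-motion cues
        for _ in range(n_segments):
            turn = rng.uniform(-math.pi, math.pi)
            th += turn
            tx += math.cos(th)
            ty += math.sin(th)
            eh += turn + rng.gauss(0.0, turn_sd)      # noisy turn sense
            step = 1.0 + rng.gauss(0.0, dist_sd)      # noisy distance sense
            ex += step * math.cos(eh)
            ey += step * math.sin(eh)
        total += math.hypot(ex - tx, ey - ty)
    return total / trials

errors = [mean_endpoint_error(n) for n in (1, 2, 3)]
# errors grow with path length, mirroring the pattern Klatzky et al. observed
```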
Only occasionally are we faced with a pure reorientation task or a pure path integration
task. More commonly, environmental cues and path integration are both available as we
travel through a remembered space. In a recent experiment, we investigated the role of
environmental geometry in spatial orientation when path integration was also available.
Participants performed a spatial updating task, where they learned a location within a
room and attempted to keep track of that location while walking along an outbound path.
At the end of the path they were asked to point to the remembered location. The path was
defined by the experimenter and varied in length from two to six path segments, and
participants actively guided themselves along this path. The task was performed in envi-
ronments of 1-fold (trapezoidal), 2-fold (rectangular), 4-fold (square) and ∞-fold (circu-
lar) rotational symmetry. If rotational symmetry affects spatial updating performance like
it affected reorientation performance (see Section 4.1, above), then performance should
degrade as room shape becomes progressively more ambiguous. The effect of room
rotational symmetry was expected to be particularly noticeable at long path lengths, when
self-position estimates through path integration become especially error-prone (Klatzky
et al., 1990; Rieser & Rider, 1991), and people are likely to become lost and require
reorientation. Contrary to these predictions, spatial updating performance was quite
good, and was unaffected by increasing path length in all three angled environments
(square, rectangular and trapezoidal; see Figure 4). This is in stark contrast to perform-
ance in the circular room, where errors increased with increasing path length. Participants
were certainly using path integration to stay oriented when performing the task. Other-
wise, performance would have been completely predicted by room rotational symmetry
(like the reorientation experiment discussed above in Section 4.1). Participants were also
certainly using room shape cues, when available. Otherwise, pointing errors in all envi-
ronments would have increased with increasing path length, as they did in the circular
room.
To explain these results, we draw on previous work showing that long-term spatial
memories are represented with respect to a small number of reference directions (see
Section 2). Of particular relevance, Mou et al. (2007) showed that reference directions
often correspond to an axis of environmental symmetry. Based on this finding, we
believe that participants in the spatial updating task represented each environment
(including the room itself and the to-be-remembered locations within the room) with
respect to a reference direction, coincident with an environmental symmetry axis.
Perceived self-position was updated with respect to this reference direction (see
Cheng & Gallistel, 2005, for a similar interpretation of experiments on reorientation
by rats). In the circular room, any error in estimating self-position relative to the ref-
erence direction directly resulted in pointing error, because the environment itself
offered no information to help participants constrain their estimates of the orientation
of the reference direction. However, geometric cues in the three angled environments
at least partially defined the reference direction, which we believe corresponded to an
environmental symmetry axis. For example, the square environment defined the
symmetry axis within ±45°. If errors in perceived heading ever exceeded this ±45°
threshold, then participants would have mistaken a neighboring symmetry axis for
the selected reference direction. The rectangular and trapezoidal environments were
even more forgiving, as the environmental geometries defined those symmetry axes
within ±90° and ±180°, respectively. Furthermore, participants in the angled
environments could use the environmental geometry to reduce heading errors during
locomotion, thereby preventing those errors from exceeding the threshold allowed by
a given rotational symmetry.
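This interpretation can be made explicit: the walls of an n-fold symmetric room specify heading only up to multiples of 360°/n, so a noisy heading estimate can be corrected by snapping it to the nearest congruent axis, and correction succeeds exactly when the error stays within ±180°/n. The Python sketch below is our illustration of that account, not code from the study.

```python
import math

def snap_to_symmetry_axis(perceived_heading, n_fold):
    """Correct a heading estimate using room geometry: with n-fold
    rotational symmetry the walls define heading only up to 2*pi/n,
    so the estimate is snapped to the nearest congruent axis."""
    period = 2 * math.pi / n_fold
    return (round(perceived_heading / period) * period) % (2 * math.pi)

# Square room (4-fold): a 30° heading error is pulled back to the correct
# axis (0°), but a 60° error is captured by the neighbouring axis (90°),
# producing exactly the kind of confusion described in the text.
print(math.degrees(snap_to_symmetry_axis(math.radians(30), 4)))
print(math.degrees(snap_to_symmetry_axis(math.radians(60), 4)))
```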
The experiments described in this section demonstrate how ambiguous environ-
mental cues and noisy self-motion cues can be combined to allow for successful
spatial orientation. During normal navigation, we typically have information from
multiple sources, all of which may be imperfect indicators of self-position. By com-
bining those information sources, we can stay oriented with respect to the remem-
bered environment, a crucial step toward successful navigation.
Fig. 4. Pointing error in a spatial updating task as a function of walked path length, plotted
separately for the four different surrounding environments. Pointing errors increased with
increased walking distance in the round room. In comparison, performance was unaffected by
path length in the square, rectangular, and trapezoidal rooms.
in the visual field) can be added to the long-term, environment-centered spatial mem-
ory. However, the nature of these interactions between long-term and sensorimotor
spatial memories remains poorly understood, and warrants further research.
The experiments on spatial orientation presented in Section 4 represent a step to-
ward understanding this interaction between sensorimotor and long-term representa-
tions. Participants in those experiments are believed to have monitored self-position
and orientation relative to the reference direction used to structure the long-term
memory of the environment, and the selected reference direction most likely corre-
sponded to an axis of environmental symmetry. Path integration helped participants
keep track of the selected reference direction and avoid confusion with neighboring
symmetry axes. This conclusion underscores the importance of the reference direc-
tions used in long-term memory, not just for retrieving inter-object relationships, but
also for staying oriented within remembered spaces and updating those spaces during
self-motion. A more complete understanding of spatial orientation should be in-
formed by further studies of the interaction between long-term spatial memory, sen-
sorimotor spatial memory, and path integration.
References
1. Avraamides, M.N., Kelly, J.W.: Imagined perspective-changing within and across novel
environments. In: Freksa, C., Nebel, B., Knauff, M., Krieg-Brückner, B. (eds.) Spatial
Cognition IV. LNCS (LNAI), pp. 245–258. Springer, Berlin (2005)
2. Avraamides, M.N., Kelly, J.W.: Multiple systems of spatial memory and action. Cognitive
Processing 9, 93–106 (2008)
3. Bridgeman, B., Lewis, S., Heit, G., Nagle, M.: Relation between cognitive and motor-
oriented systems of visual position perception. Journal of Experimental Psychology: Hu-
man Perception and Performance 5, 692–700 (1979)
4. Bülthoff, H.H., Edelman, S.: Psychophysical support for a two-dimensional view interpo-
lation theory of object recognition. Proceedings of the National Academy of Sci-
ences 89(1), 60–64 (1992)
5. Cheng, K.: A purely geometric module in the rat’s spatial representation. Cognition 23,
149–178 (1986)
6. Cheng, K., Gallistel, C.R.: Shape parameters explain data from spatial transformations:
Comment on Pearce et al (2004) and Tommasi and Polli (2004). Journal of Experimental
Psychology: Animal Behavior Processes 31(2), 254–259 (2005)
7. Cheng, K., Newcombe, N.S.: Is there a geometric module for spatial orientation? Squaring
theory and evidence. Psychonomic Bulletin & Review 12(1), 1–23 (2005)
8. Diwadkar, V.A., McNamara, T.P.: Viewpoint dependence in scene recognition. Psycho-
logical Science 8(4), 302–307 (1997)
9. Gallistel, C.R.: The Organization of Action: A New Synthesis. Erlbaum, Hillsdale (1980)
10. Gallistel, C.R.: The Organization of Learning. MIT Press, Cambridge (1990)
11. Hermer, L., Spelke, E.S.: A geometric process for spatial reorientation in young children.
Nature 370, 57–59 (1994)
12. Hermer, L., Spelke, E.S.: Modularity and development: The case of spatial reorientation.
Cognition 61(3), 195–232 (1996)
13. Hintzman, D.L., O’Dell, C.S., Arndt, D.R.: Orientation in cognitive maps. Cognitive Psy-
chology 13, 149–206 (1981)
14. Hodgson, E., Waller, D.: Lack of set size effects in spatial updating: Evidence for offline
updating. Journal of Experimental Psychology: Learning, Memory, & Cognition 32, 854–
866 (2006)
15. Jonsson, E.: Inner Navigation: Why we Get Lost in the World and How we Find Our Way.
Scribner, New York (2002)
16. Kelly, J.W., Avraamides, M.N., Loomis, J.M.: Sensorimotor alignment effects in the learn-
ing environment and in novel environments. Journal of Experimental Psychology: Learn-
ing, Memory & Cognition 33(6), 1092–1107 (2007)
17. Kelly, J.W., McNamara, T.P.: Spatial memories of virtual environments: How egocentric
experience, intrinsic structure, and extrinsic structure interact. Psychonomic Bulletin &
Review 15(2), 322–327 (2008)
18. Klatzky, R.L.: Allocentric and egocentric spatial representations: Definitions, distinctions,
and interconnections. In: Freksa, C., Habel, C., Wender, K.F. (eds.) Spatial Cognition, pp.
1–17. Springer, Berlin (1998)
19. Klatzky, R.L., Loomis, J.M., Golledge, R.G., Cicinelli, J.G., Doherty, S., Pellegrino, J.W.:
Acquisition of route and survey knowledge in the absence of vision. Journal of Motor Be-
havior 22(1), 19–43 (1990)
20. Loomis, J.M., Klatzky, R.L., Golledge, R.G., Philbeck, J.W.: Human navigation by path
integration. In: Golledge, R.G. (ed.) Wayfinding: Cognitive mapping and other spatial
processes, pp. 125–151. Johns Hopkins, Baltimore (1999)
21. May, M.: Imaginal perspective switches in remembered environments: Transformation
versus interference accounts. Cognitive Psychology 48, 163–206 (2004)
22. McNamara, T.P.: How are the locations of objects in the environment represented in mem-
ory? In: Freksa, C., Brauer, W., Habel, C., Wender, K.F. (eds.) Spatial cognition III. LNCS
(LNAI), pp. 174–191. Springer, Berlin (2003)
23. McNamara, T.P., Rump, B., Werner, S.: Egocentric and geocentric frames of reference in
memory of large-scale space. Psychonomic Bulletin & Review 10(3), 589–595 (2003)
24. Milner, A.D., Goodale, M.A.: The visual brain in action. Oxford University Press, Oxford
(1995)
25. Montello, D.R.: Spatial orientation and the angularity of urban routes: A field study. Envi-
ronment and Behavior 23(1), 47–69 (1991)
26. Mou, W., McNamara, T.P.: Intrinsic frames of reference in spatial memory. Journal of Ex-
perimental Psychology: Learning, Memory, and Cognition 28(1), 162–170 (2002)
27. Mou, W., McNamara, T.P., Valiquette, C.M., Rump, B.: Allocentric and egocentric updat-
ing of spatial memories. Journal of Experimental Psychology: Learning, Memory, and
Cognition 30(1), 142–157 (2004)
28. Mou, W., Zhao, M., McNamara, T.P.: Layout geometry in the selection of intrinsic frames
of reference from multiple viewpoints. Journal of Experimental Psychology: Learning,
Memory, and Cognition 33, 145–154 (2007)
29. Presson, C.C.: The development of spatial cognition: Secondary uses of spatial informa-
tion. In: Eisenberg, N. (ed.) Contemporary Topics in Developmental Psychology, pp. 87–
112. Wiley, New York (1987)
30. Presson, C.C., Montello, D.R.: Updating after rotational and translational body move-
ments: Coordinate structure of perspective space. Perception 23, 1447–1455 (1994)
31. Rieser, J.J.: Access to knowledge of spatial structure at novel points of observation. Jour-
nal of Experimental Psychology: Learning, Memory, and Cognition 15(6), 1157–1165
(1989)
32. Rieser, J.J., Rider, E.A.: Young children’s spatial orientation with respect to multiple tar-
gets when walking without vision. Developmental Psychology 27(1), 97–107 (1991)
33. Roskos-Ewoldsen, B., McNamara, T.P., Shelton, A.L., Carr, W.: Mental representations of
large and small spatial layouts are orientation dependent. Journal of Experimental Psy-
chology: Learning, Memory, and Cognition 24(1), 215–226 (1999)
34. Rump, B., McNamara, T.P.: Updating in models of spatial memory. In: Barkowsky, T.,
Knauff, M., Montello, D.R. (eds.) Spatial cognition V. LNCS (LNAI), pp. 249–269.
Springer, Berlin (2007)
35. Schneider, G.E.: Two visual systems. Science 163, 895–902 (1969)
36. Shelton, A.L., McNamara, T.P.: Multiple views of spatial memory. Psychonomic Bulletin
& Review 4(1), 102–106 (1997)
37. Shelton, A.L., McNamara, T.P.: Systems of spatial reference in human memory. Cognitive
Psychology 43(4), 274–310 (2001)
38. Shelton, A.L., McNamara, T.P.: Orientation and perspective dependence in route and sur-
vey learning. Journal of Experimental Psychology: Learning, Memory, and Cognition 30,
158–170 (2004)
39. Sholl, M.J.: Cognitive maps as orienting schemata. Journal of Experimental Psychology:
Learning, Memory, and Cognition 13(4), 615–628 (1987)
40. Sholl, M.J.: The role of a self-reference system in spatial navigation. In: Montello, D. (ed.)
Spatial information theory: Foundations of geographic information science, pp. 217–232.
Springer, Berlin (2001)
41. Valiquette, C., McNamara, T.P.: Different mental representations for place recognition and
goal localization. Psychonomic Bulletin & Review 14(4), 676–680 (2007)
42. Waller, D., Hodgson, E.: Transient and enduring spatial representations under disorienta-
tion and self-rotation. Journal of Experimental Psychology: Learning, Memory, & Cogni-
tion 32, 867–882 (2006)
43. Wang, R.F., Crowell, J.A., Simons, D.J., Irwin, D.E., Kramer, A.F., Ambinder, M.S.,
Thomas, L.E., Gosney, J.L., Levinthal, B.R., Hsieh, B.B.: Spatial updating relies on an
egocentric representation of space: Effects of the number of objects. Psychonomic Bulletin
& Review 13, 281–286 (2006)
44. Wang, R.F., Spelke, E.S.: Updating egocentric representations in human navigation. Cog-
nition 77, 215–250 (2000)
45. Werner, S., Schmidt, K.: Environmental reference systems for large-scale spaces. Spatial
Cognition and Computation 1(4), 447–473 (1999)
Map-Based Spatial Navigation:
A Cortical Column Model for Action Planning
1 Introduction
Spatial cognition calls upon the ability to learn neural representations of the
spatio-temporal properties of the environment, and to employ them to achieve
goal-oriented navigation. Similar to other high-level functions, spatial cognition
involves parallel information processing mediated by a network of brain struc-
tures that interact to promote effective spatial behaviour [1,2]. An extensive
body of experimental work has investigated the neural bases of spatial cogni-
tion, and a significant amount of evidence points towards a prominent role of
the hippocampal formation (see [1] for recent reviews). This limbic region has
been thought to mediate spatial learning functions ever since location-selective
neurones — namely hippocampal place cells [3], and entorhinal grid cells [4] —
and orientation-selective neurones — namely head-direction cells [5] — were
found by means of electrophysiological recordings from freely moving rats.
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 39–55, 2008.
© Springer-Verlag Berlin Heidelberg 2008
40 L.-E. Martinet et al.
Hippocampal place cells, grid cells, and head-direction cells are likely to subserve
spatial representations in allocentric (i.e., world centred) coordinates, thus
providing cognitive maps [3] to support spatial behaviour. Yet, to perform flexible
navigation (i.e., to plan detours and/or shortcuts) two other components are nec-
essary: goal representation, and target-dependent action sequence planning [6].
The role of the hippocampal formation in these two mechanisms remains unclear.
On the one hand, the hippocampus has been proposed to encode topological-like
representations suitable for action sequence learning [6]. This hypothesis mainly
relies on the recurrent dynamics generated by the CA3 collaterals of the hip-
pocampus [7]. On the other hand, the hippocampal space code is likely to be
highly redundant and distributed [8], which does not seem adequate for learning
compact topological representations of high-dimensional spatial contexts. Also,
the experimental evidence for high-level spatial representations mediated by a
network of neocortical areas (e.g., the posterior parietal cortex [9], and the pre-
frontal cortex [10]) suggests the existence of an extra-hippocampal action plan-
ning system shared among multiple brain regions [11]. This hypothesis postulates
a distributed spatial cognition system in which (i) the hippocampus would take
part in the action planning process by conveying redundant (and robust) spa-
tial representations to higher associative areas, (ii) a cortical network would
elaborate more abstract and compact representations of the spatial context (ac-
counting for motivation-dependent memories, action cost/risk constraints, and
temporal sequences of goal-directed behavioural responses). Among the corti-
cal areas involved in map building and action planning, the prefrontal cortex
(PFC) may play a central role, as suggested by anatomical PFC lesion studies
showing impaired navigation planning in rats [12]. Also, the anatomo-functional
properties of the PFC seem appropriate to encode abstract contextual memo-
ries not merely based on spatial correlates. The PFC receives direct projections
from sub-cortical structures (e.g., the hippocampus [13], the amygdala [14], and
the ventral tegmental area [15]), and indirect connections from the basal ganglia
through the basal ganglia - thalamocortical loops [16]. These projections provide
the PFC with a multidimensional context, including emotional and motivational
inputs [17], reward-dependent modulation [18], and action-related signals [16].
The PFC seems then well suited to (i) process manifold spatial information
[19], (ii) encode the motivational values associated with spatio-temporal events
[6], and (iii) perform supra-modal decisions [20]. Also, the PFC may be involved
in integrating events in the temporal domain at multiple time scales [21]. The
PFC recurrent dynamics regulated by the modulatory action of dopaminergic
afferents [22] may make it possible to maintain patterns of activity over long time scales.
Finally, the PFC is likely to be critical to detecting cross-temporal contingencies,
which is relevant to the temporal organisation of behavioural responses, and to
the encoding of retrospective and prospective memories [21].
sub-cortical structures (mainly the thalamus); and layer VI, sending outputs to
sub-cortical brain areas (e.g., to the striatum and the thalamus). Layers II-III
and V-VI constitute the so called supragranular and infragranular layers, respec-
tively. The anatomo-functional properties of cortical columns have been widely
investigated [24]. Neuroanatomical findings have indicated that columns can be
divided into several minicolumns, each of which is composed of a population
of interconnected neurones [25]. Thus, a column can be seen as an ensemble of
interrelated minicolumns receiving inputs from cortical areas and other struc-
tures. It processes these afferent signals and projects the responses both within
and outside the cortical network. This twofold columnar organisation has been
suggested to subserve efficient computation and information processing [24].
2 Methods
2.1 Single Neurone Model
The elementary computational units of the model are artificial firing-rate neurones $i$, whose mean discharge $r_i \in [0, 1]$ is given by

$$r_i(t) = f(V_i(t)) \cdot (1 \pm \eta), \qquad (1)$$

where $V_i(t)$ is the membrane potential at time $t$, $f$ is the transfer function, and $\eta$ is a random noise uniformly drawn from $[0, 0.01]$. $V_i$ varies according to

$$\tau_i \cdot \frac{dV_i(t)}{dt} = -V_i(t) + I_i(t), \qquad (2)$$

where $\tau_i = 10$ ms is the membrane time constant, and $I_i(t)$ is the synaptic drive generated by all the inputs. Eq. 2 is integrated by using a time step $\Delta t = 1$ ms.
Both the synaptic drive Ii (t) and the transfer function f are characteristic of
the different types of model units, and they will be defined thereafter.
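As an illustration, the unit dynamics of Eqs. 1–2 can be sketched with a forward Euler integrator. The clipping transfer function `f` and the sign convention for the noise are assumptions made here for the sketch, since the paper defines the transfer function per unit type:

```python
import random

class RateNeuron:
    """Sketch of the firing-rate unit of Eqs. 1-2. The clipping transfer
    function f and the noise sign convention are assumptions; the paper
    defines f per unit type."""

    def __init__(self, tau=10.0, dt=1.0):
        self.tau = tau   # membrane time constant tau_i (ms)
        self.dt = dt     # integration step Delta t (ms)
        self.V = 0.0     # membrane potential V_i

    def f(self, V):
        # Hypothetical transfer function clipping V into [0, 1].
        return min(max(V, 0.0), 1.0)

    def step(self, I):
        # Forward Euler step of tau_i * dV_i/dt = -V_i + I_i (Eq. 2).
        self.V += (self.dt / self.tau) * (-self.V + I)
        # Multiplicative noise eta drawn uniformly from [0, 0.01] (Eq. 1).
        eta = random.uniform(0.0, 0.01) * random.choice((-1.0, 1.0))
        return self.f(self.V) * (1.0 + eta)

# With a constant drive I = 1, V_i relaxes towards 1 with a ~10 ms time
# constant, and the rate saturates near the top of its [0, 1] range.
n = RateNeuron()
rates = [n.step(1.0) for _ in range(100)]
```

After 100 steps of constant unit drive the membrane potential has essentially converged, so the reported rate fluctuates only through the small multiplicative noise term.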
Fig. 1. The cortical model and the implementation of the activation-diffusion process.
(A) Columns ($c$ and $c'$) consist of sets of minicolumns (vertical grey regions), each of
which contains a supragranular (SL) and an infragranular (IL) layer unit. (B) Top:
back-propagation of the motivational signal through the network of SL neurones. Bot-
tom: forward-propagation of the goal-directed action signal through the IL neurones.
where $i$ indexes other SL neurones of the cortical network; $w_{im}$ and $r_m$ are the
weight and the intensity of the motivational signal, respectively. In the current
44 L.-E. Martinet et al.
$$w^l_{j'j} = w^u_{i'i} = \beta_{LTP} \quad \forall i', j' \in c'. \qquad (7)$$
with $\beta_{LTP} = 0.9$. Finally, in order to adapt the topological representation online,
a synaptic potentiation-depression mechanism can modify the lateral projections
$w^l$ and $w^u$. For example, if a new obstacle prevents the animat from achieving
a previously learnt transition from column $c$ to $c'$ (i.e., if the activation of the
IL unit $j \in c$ is not followed in time by the activation of all IL units
$j' \in c'$), then a depression of the $w^l_{j'j}$ synaptic efficacy occurs:
Fig. 2. (A) Tolman & Honzik’s maze (adapted from [36]). The gate near the second
intersection prevented rats from going from right to left. (B) The simulated maze
and robot. The dimensions of the simulated maze were taken so as to maintain the
proportions of Tolman & Honzik's setup. Bottom-left inset: the real e-puck mobile
robot has a diameter of 70 mm and is 55 mm tall.
When the goal signal reaches the minicolumns selective for the current position $s$, a coincidence event occurs, which triggers the forward propagation of a goal-directed
path signal through the projections $w^l$ (Fig. 1B, bottom). Goal-directed trajectories are generated by reading out the successive activations of IL neurones.
Action selection calls upon a competition between the minicolumns encoding
the $(s, a_{1 \cdots N}) \in S \times A$ pairs, where $s$ is the current location, and $a_{1 \cdots N}$ are the
transitions from $s$ to adjacent positions $s'$. For the sake of robustness, competition
occurs over a 10-timestep cycle. Notice that each SL synaptic relay attenuates
the goal signal by a factor $w^u_{i'i}$ (Eq. 3). Thus, the smaller the number of synaptic
relays, the stronger the goal signal received by the SL neurone corresponding to
the current location $s$. As a consequence, because the model column receptive
fields are distributed rather uniformly over the environment, the intensity of the
goal signal at a given location $s$ is correlated with the distance between $s$ and the
target position $s_g$.
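The activation-diffusion principle described above can be sketched on a toy topological graph: a goal signal spreads backwards, attenuated at each relay, and the path is read out by greedily ascending the resulting gradient. The attenuation constant `W_U`, the graph, and the node names are illustrative, not taken from the model:

```python
# Hedged sketch of the activation-diffusion planning principle: a goal
# signal back-propagates through the topological map, attenuated by a
# factor W_U at each synaptic relay, so its strength decreases with
# distance from the goal; the trajectory is read out by greedy ascent.
from collections import deque

W_U = 0.8  # per-relay attenuation (illustrative value, not from the paper)

def diffuse_goal_signal(graph, goal):
    """Breadth-first back-propagation of a unit goal signal."""
    signal = {goal: 1.0}
    frontier = deque([goal])
    while frontier:
        node = frontier.popleft()
        for nb in graph[node]:
            attenuated = signal[node] * W_U
            if attenuated > signal.get(nb, 0.0):
                signal[nb] = attenuated
                frontier.append(nb)
    return signal

def read_out_path(graph, signal, start):
    """Follow increasing signal strength until the goal is reached."""
    path, node = [start], start
    while signal.get(node, 0.0) < 1.0:
        node = max(graph[node], key=lambda nb: signal.get(nb, 0.0))
        path.append(node)
    return path

# Toy maze: two routes from 'S' to 'G', one shorter (via 'A').
maze = {'S': ['A', 'B'], 'A': ['S', 'G'], 'B': ['S', 'C'],
        'C': ['B', 'G'], 'G': ['A', 'C']}
path = read_out_path(maze, diffuse_goal_signal(maze, 'G'), 'S')  # ['S', 'A', 'G']
```

Because every relay multiplies the signal by the same factor, the signal at a node decays monotonically with its hop distance to the goal, which is exactly the distance-correlation property noted in the text.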
Fig. 2B shows a simulated version of Tolman & Honzik's apparatus, and
the simulated robot. We emulated the experimental protocol designed by Tolman & Honzik to assess the animats' navigation performance. The overall protocol consisted of a training period followed by a probe test. Both training and
probe trials were stopped when the animat had found the goal.
Training period: it lasted 14 days with 12 trials per day. The animats could
explore the maze and learn their navigation policy.
– During Day 1, a series of 3 forced runs was carried out, in which additional
doors were used to force the animats to go successively through P1, P2, and
P3. Then, during the remaining 9 runs, all additional doors were removed,
and the subjects could explore the maze freely. At the end of the first training
day, a preference for P1 was expected to be already developed [36].
– From Day 2 to 14, a block was introduced at place A (Fig. 2B) to require
a choice between P2 and P3. Specifically, additional doors were used to close
the entrances to P2 and P3 so as to force the animats to go first to Block A.
Then, the doors were removed, and the subjects were forced to decide between
P2 and P3 on their way back to the first intersection. Each day, there were
10 “Block at A” runs mixed with 2 non-successive free runs to maintain the
preference for P1.
Probe test period: it lasted 1 day (Day 15), and it involved 7 runs with a block at
position B to interrupt the common section (Fig. 2B). The animats were forced
to decide between P2 and P3 when returning to the first intersection point.
For these experiments, Tolman & Honzik used 10 rats with no previous training. In our simulations, we used a population of 100 animats, and we assessed
the statistical significance of the results by means of ANOVA (the significance
threshold was set at $10^{-2}$, i.e. p < 0.01 was considered significant).
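For reference, the one-way ANOVA F statistic underlying all of the path-preference comparisons below can be computed as the ratio of between-group to within-group mean squares. The transit counts in this sketch are invented for illustration, not taken from the simulations:

```python
# Hedged sketch of the one-way ANOVA F statistic used for the
# path-preference comparisons (group = path, observation = number of
# transits per animat); the data values are made up for illustration.

def anova_f(groups):
    """F = MS_between / MS_within for a list of observation groups."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total observations
    grand = sum(sum(g) for g in groups) / n  # grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical transit counts through P1/P2/P3 for five animats.
p1, p2, p3 = [8, 9, 7, 8, 10], [3, 2, 4, 3, 2], [2, 3, 3, 2, 2]
f = anova_f([p1, p2, p3])
```

A large F (relative to the F distribution with the corresponding degrees of freedom) indicates that the path preferences differ significantly, which is how the p-values reported in the Results section are obtained.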
with $r_j(s)$ being the response of the neurone $j$ when the animat is visiting the
location $s \in S$, and $\sigma_J(s)$ representing the standard deviation of the population
activity distribution for a given stimulus $s$.
Another measure, the kurtosis function, was used to analyse the neural responses. This measure is defined as the normalised fourth central moment of a
probability distribution, and estimates its degree of peakedness. If applied to a
neural response distribution, the kurtosis can be used to measure its degree of
sparseness across both population and time [41]. We employed an average population kurtosis measure $\bar{k}_1 = \langle k_1(s) \rangle_{s \in S}$ to estimate how many neurones $j$ of a
population $J$ were, on average, responding simultaneously to a given stimulus $s$.
The kurtosis $k_1(s)$ was taken as:
with $\bar{r}_J(s) = \langle r_j(s) \rangle_{j \in J}$. Similarly, an average lifetime kurtosis $\bar{k}_2 = \langle k_2(j) \rangle_{j \in J}$
was employed to assess how rarely a neurone $j$ responded across time. The $k_2(j)$
function was given by:
with $\bar{r}_j = \langle r_j(s) \rangle_{s \in S}$, and $\sigma_j$ being the standard deviation of the cell activity $r_j$.
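A minimal sketch of the two sparseness measures, assuming the standard excess-kurtosis definition (normalised fourth central moment minus 3); the paper's exact Eqs. 11–12 are not reproduced in the text, so the normalisation here is an assumption:

```python
# Hedged sketch of the population and lifetime kurtosis measures; the
# excess-kurtosis normalisation ("- 3") is a common convention assumed
# here, since Eqs. 11-12 are not given explicitly in the text.

def kurtosis(values):
    """Excess kurtosis: normalised fourth central moment minus 3."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return sum((v - mean) ** 4 for v in values) / (n * var ** 2) - 3.0

def population_kurtosis(rates):
    """k1 averaged over locations; rates[j][s] is the rate of neurone j
    at location s."""
    n_loc = len(rates[0])
    ks = [kurtosis([row[s] for row in rates]) for s in range(n_loc)]
    return sum(ks) / n_loc

def lifetime_kurtosis(rates):
    """k2 averaged over neurones: kurtosis of each unit across locations."""
    ks = [kurtosis(row) for row in rates]
    return sum(ks) / len(rates)

# A maximally sparse toy code: each of 10 units fires at exactly one of
# 10 locations, giving a high positive kurtosis both ways.
sparse = [[1.0 if j == s else 0.0 for s in range(10)] for j in range(10)]
pk = population_kurtosis(sparse)
```

On this toy code the population and lifetime kurtosis coincide by symmetry; a denser code (many units moderately active everywhere) would yield a much lower value.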
Finally, we used an information theoretic analysis [42] to characterise the neu-
ral codes of our cortical and hippocampal populations. The mutual information
$MI(S;R)$ between neural responses $R$ and spatial locations $S$ was computed:

$$MI(S;R) = \sum_{s \in S} \sum_{r \in R} P(r,s) \log_2 \frac{P(r,s)}{P(r)P(s)}, \qquad (13)$$

where $r \in R$ indicates firing rates, $P(r,s)$ is the joint probability of the animat
visiting a region $s \in S$ while a response $r$ is recorded, $P(s)$ the a priori
probability computed as the ratio between the time spent at place $s$ and the total
time, and $P(r) = \sum_{s \in S} P(r,s)$ the probability of observing a neural response $r$.
The continuous output space of a neurone, i.e. R = [0, 1], was discretized via a
binning procedure (bin-width equal to 0.1). The M I(S; R) measure allowed us
to quantify the spatial information content of a neural code, i.e. how much could
be learnt about the animat’s position s by observing the neural responses r.
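Eq. 13 can be estimated from paired (location, rate) samples via a joint histogram. The binning below follows the 0.1 bin-width mentioned above, while the toy data are hypothetical:

```python
# Hedged sketch of the spatial-information measure (Eq. 13): mutual
# information, in bits, between binned firing rates and visited
# locations, estimated from a joint histogram of paired samples.
import math
from collections import Counter

def mutual_information(locations, rates, bin_width=0.1):
    """MI(S;R) in bits from paired samples of location and rate."""
    n_bins = int(1 / bin_width)
    # Discretize the continuous rate space R = [0, 1] into bins.
    bins = [min(int(r / bin_width), n_bins - 1) for r in rates]
    n = len(locations)
    p_joint = Counter(zip(locations, bins))   # -> P(r, s)
    p_s = Counter(locations)                  # -> P(s)
    p_r = Counter(bins)                       # -> P(r)
    mi = 0.0
    for (s, r), c in p_joint.items():
        p_sr = c / n
        mi += p_sr * math.log2(p_sr / ((p_s[s] / n) * (p_r[r] / n)))
    return mi

# Perfectly place-selective toy unit: the rate fully determines which of
# three equally visited locations the animat occupies.
locs = ['A', 'A', 'B', 'B', 'C', 'C']
rs = [0.95, 0.95, 0.05, 0.05, 0.55, 0.55]
mi = mutual_information(locs, rs)  # = log2(3), about 1.585 bits
```

A rate that is constant across locations carries no spatial information (MI = 0), whereas the perfectly selective unit above attains the ceiling log2 of the number of locations.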
3 Results
Day 1. During the first 12 training trials, the animats learnt the topology of
the maze and planned their navigation trajectory in the absence of both block
A and B. Similar to Tolman & Honzik’s findings, our results show that the
model learnt to select the shortest goal-directed pathway P1 significantly more
frequently than the alternative trajectories P2, P3 (ANOVA, F2,297 = 168.249,
Cortical Model for Navigation Planning 49
Fig. 3. Behavioural results. Top row: mean number of transits through P1, P2, and
P3 (averaged over 100 animats). Bottom row: occupancy grid maps. (A) During the
first 12 training trials (day 1) the simulated animals developed a significant preference
for P1 (no significant difference was observed between P2 and P3). (B) During the
following 156 training trials (days 2-14, in the presence of block A, Fig. 2B) P2 was
selected significantly more frequently than P3. (C) During the last 7 trials (day 15, test
phase), the block A was removed whereas the block B was introduced. The animats
exhibited a significant preference for P3 compared to P2.
p < 0.0001). The quantitative and qualitative analyses reported in Fig. 3A describe the path selection performance averaged over 100 animats.
Days 2-14. During this training phase (consisting of 156 trials), a block was
introduced at location A, which forced the animats to update their topological
maps dynamically, and to plan a detour to the goal. The results reported by Tol-
man & Honzik provided strong evidence for a preference for the shortest detour
path P2. Consistently, in our simulations (Fig. 3B) we observed a significantly
larger number of transits through P2 compared to P3 (ANOVA, F1,198 = 383.068,
p < 0.0001); P1 was ignored in this analysis (as in Tolman & Honzik's analysis)
because it was blocked.
Day 15. Seven probe trials were performed during the 15th day of the simu-
lated protocol, by removing the block A and adding a new block at location B.
This manipulation aimed at testing the “insight” working hypothesis: after a first
run through the shortest path P1 and after having encountered the unexpected
block B, will animats try P2 (wrong behaviour) or will they go directly through
P3 (correct behaviour)? According to Tolman & Honzik’s results, rats behaved
as predicted by the insight hypothesis, i.e. they tended to select the longer but
Fig. 4. Comparison between a learning and a randomly behaving agent. (A) Error
distribution of learning (black histogram) versus random (grey line) animats. (B) Mean
number of errors made by the model and by a randomly behaving agent.
effective P3. The authors concluded that rats were able to inhibit the previously
learnt policy (i.e., the “habit behaviour” consisting of selecting P2 after a fail-
ure of P1 during the 156 previous training trials). Our probe test simulation
results are shown in Fig. 3C. Similar to rats, the animats exhibited a signifi-
cant preference for P3 compared to P2 (ANOVA, F1,198 = 130.15, p < 0.0001).
Finally, in order to further assess the mean performance of the system during
the probe trials, we compared the action selection policy of learning animats
with that of randomly behaving (theoretical) animats. Fig. 4A provides the re-
sults of this comparison by showing the error distribution over the population of
learning agents (black histogram) and randomly behaving agents (grey curve).
The number of errors per individual is displayed in the boxplot of Fig. 4B.
These findings indicate a significantly better performance of learning animats
compared to random agents (ANOVA, F1,196 = 7.4432, p < 0.01).
Fig. 5A contrasts the mean spatial density (Eq. 10) of the HP receptive fields
with that of cortical column receptive fields. It is shown that, compared to
the upstream hippocampal space code, the cortical column model reduced the
redundancy of the learnt spatial code significantly (ANOVA, F1,316 = 739.2,
p < 0.0001). Fig. 5B shows the probability distribution representing the number
of active column units (solid curve) and active HP cells (dashed line) per spatial
location s ∈ S. As shown by the inset boxplots, the distribution kurtosis was
significantly higher for column units than for HP cells (ANOVA, F1,198 = 6057,
p < 0.0001). To further investigate this property, we assessed the average pop-
ulation kurtosis k̄1 (Eq. 11) of both columnar and HP cell activities (Fig. 5C).
Again, the columnar population activity exhibited a significantly higher kurtosis
Fig. 5. (A) Spatial density of the receptive fields of HP cells and cortical column units.
(B) Probability distribution of the number of active column units (solid line) and ac-
tive HP cells (dashed line) per spatial location s ∈ S. Inset boxplots: kurtosis measures
for the two distributions. (C) Population kurtosis of columnar and hippocampal as-
semblies.
than the HP cell activity (ANOVA, F1,3128 = 14901, p < 0.0001). These results
suggest that, in the model, the cortical column network was able to provide a
sparser state-space population coding than HP units.
In a second series of analyses, we focused on the activity of single cells, and
we compared the average lifetime kurtosis k̄2 (Eq. 12) of cortical and HP units.
As reported on Fig. 6A, we found that the kurtosis across time did not differ
significantly between cortical and HP units (ANOVA, F1,2356 = 2.2699, p = 0.13).
This result suggests that, on average, single cortical and HP units tended
to respond to a comparable number of stimuli (i.e., spatial locations) over their
lifetimes. Along the same line, we recorded the receptive fields of the two types
of units. Figs. 6B,C display some samples of place fields of cortical and HP
cells, respectively. As expected, we found a statistical anticorrelation between
the lifetime kurtosis and the size of the receptive fields. The example of Fig. 6D
shows that, for a randomly chosen animat performing the whole experimental
protocol (15 days), the size of hippocampal place fields was highly anticorrelated
to the HP cells’ lifetime kurtosis (correlation coefficient = −0.94). These results
add to those depicted in Fig. 5 in that the increase of sparseness at the level
of the cortical population (compared to HP cells) was not merely due to an
enlargement of the receptive fields (or, equivalently, to a decrease of the lifetime
stimulus-dependent activity).
Despite their less redundant code, were cortical columns able to provide a
representation comparable to that of HP cells in terms of spatial information
content? The results of our information theoretic analysis (Eq. 13) suggest that
this was indeed the case. Fig. 6E shows that, for a randomly chosen animat,
Fig. 6. (A) Lifetime kurtosis for column and HP units. (B, C) Samples of receptive
fields of three column units and four HP cells. (D) Correlation between the lifetime
kurtosis and the size of receptive fields. (E) Mutual information M I(S; R) between
the set of spatial locations S and the activity R for both cortical and HP units.
the average amount of spatial information conveyed by cortical units was not
significantly lower than that of HP cells (ANOVA, F1,140 = 0.8034, p = 0.3716).
4 Discussion
goal signal. Yet, these two models rely on discretized state spaces (with predefined grid units coding for places), whereas our model uses a place field population providing a continuous representation of the environment [38]. Also, our
model simultaneously learns topological maps coding for the state-action space.
In the model by Samsonovich and Ascoli [43] no topological information is rep-
resented, but only a distance measure between each visited place and a set of
potential goals. Likewise, in Hasselmo’s model states and actions are not jointly
represented, which generates a route-based rather than a map-based navigation
system [1].
We adopted a three-fold working hypothesis according to which (i) the hip-
pocampus would play a prominent role in encoding spatial information; (ii)
higher-level cortical areas, particularly the PFC, would mediate multidimen-
sional contextual representations (e.g., coding for motivation-dependent memo-
ries and action cost/risk constraints) grounded on the hippocampal spatial code;
(iii) neocortical representations would facilitate the temporal linking of multi-
ple contexts, and the sequential organisation (e.g., planning) of behavioural re-
sponses. The preliminary version of the model presented here enabled us to focus
on some basic computational properties, such as the ability of the columnar or-
ganisation to learn a compact topological representation, and the efficiency of the
activation-diffusion planning mechanism. Further efforts will be devoted to integrating
multiple sources of information. For example, the animat should be able to learn
maps that encode reward (subjective) values, and action-cost constraints. Also,
these maps should be suitable to represent multiple spatio-temporal scales to
overcome the intrinsic limitation of the activation-diffusion mechanism in large
scale environments. Additionally, these multiscale maps should allow the model
to infer high-level shortcuts to bypass low-level environmental constraints.
The neurocomputational approach presented here aims at generating cross-
disciplinary insights that may help to systematically explore potential connections
between findings on the neuronal level (e.g., single-cell discharge patterns), and
observations on the behavioural level (e.g., spatial navigation). Mathematical representations make it possible to describe both the space and time components characterising the couplings between neurobiological processes. Models can help to scale up
from single cell properties to the dynamics of neural populations, and generate
novel hypotheses about their interactions to produce complex behaviour.
Acknowledgments. This work was supported by the EC Project ICEA (Integrating
Cognition, Emotion and Autonomy), IST-027819-IP.
References
1. Arleo, A., Rondi-Reig, L.: Multimodal sensory integration and concurrent navi-
gation strategies for spatial cognition in real and artificial organisms. J. Integr.
Neurosci. 6(3), 327–366 (2007)
2. Dollé, L., Khamassi, M., Girard, B., Guillot, A., Chavarriaga, R.: Analyzing in-
teractions between navigation strategies using a computational model of action
selection. In: Freksa, C., et al. (eds.) SC 2008. LNCS (LNAI), vol. 5248, pp. 71–86.
Springer, Heidelberg (2008)
3. O’Keefe, J., Nadel, L.: The Hippocampus as a Cognitive Map. Oxford University
Press, Oxford (1978)
4. Hafting, T., Fyhn, M., Molden, S., Moser, M.B., Moser, E.I.: Microstructure of a
spatial map in the entorhinal cortex. Nature 436(7052), 801–806 (2005)
5. Wiener, S.I., Taube, J.S.: Head Direction Cells and the Neural Mechanisms of
Spatial Orientation. MIT Press, Cambridge (2005)
6. Poucet, B., Lenck-Santini, P.P., Hok, V., Save, E., Banquet, J.P., Gaussier, P.,
Muller, R.U.: Spatial navigation and hippocampal place cell firing: the problem of
goal encoding. Rev. Neurosci. 15(2), 89–107 (2004)
7. Amaral, D.G., Witter, M.P.: The three-dimensional organization of the hippocam-
pal formation: a review of anatomical data. Neurosci. 31(3), 571–591 (1989)
8. Wilson, M.A., McNaughton, B.L.: Dynamics of the hippocampal ensemble code
for space. Science 261, 1055–1058 (1993)
9. Nitz, D.A.: Tracking route progression in the posterior parietal cortex. Neu-
ron. 49(5), 747–756 (2006)
10. Hok, V., Save, E., Lenck-Santini, P.P., Poucet, B.: Coding for spatial goals in
the prelimbic/infralimbic area of the rat frontal cortex. Proc. Natl. Acad. Sci.
USA. 102(12), 4602–4607 (2005)
11. Knierim, J.J.: Neural representations of location outside the hippocampus. Learn.
Mem. 13(4), 405–415 (2006)
12. Granon, S., Poucet, B.: Medial prefrontal lesions in the rat and spatial navigation:
evidence for impaired planning. Behav. Neurosci. 109(3), 474–484 (1995)
13. Jay, T.M., Witter, M.P.: Distribution of hippocampal CA1 and subicular efferents
in the prefrontal cortex of the rat studied by means of anterograde transport of
Phaseolus vulgaris-leucoagglutinin. J. Comp. Neurol. 313(4), 574–586 (1991)
14. Kita, H., Kitai, S.T.: Amygdaloid projections to the frontal cortex and the striatum
in the rat. J. Comp. Neurol. 298(1), 40–49 (1990)
15. Thierry, A.M., Blanc, G., Sobel, A., Stinus, L., Golwinski, J.: Dopaminergic ter-
minals in the rat cortex. Science 182(4111), 499–501 (1973)
16. Uylings, H.B.M., Groenewegen, H.J., Kolb, B.: Do rats have a prefrontal cortex?
Behav. Brain. Res. 146(1-2), 3–17 (2003)
17. Aggleton, J.: The amygdala: neurobiological aspects of emotion, memory, and men-
tal dysfunction. Wiley-Liss, New York (1992)
18. Schultz, W.: Predictive reward signal of dopamine neurons. J. Neurophysiol. 80(1),
1–27 (1998)
19. Jung, M.W., Qin, Y., McNaughton, B.L., Barnes, C.A.: Firing characteristics of
deep layer neurons in prefrontal cortex in rats performing spatial working memory
tasks. Cereb. Cortex 8(5), 437–450 (1998)
20. Otani, S.: Prefrontal cortex function, quasi-physiological stimuli, and synaptic plas-
ticity. J. Physiol. Paris 97(4-6), 423–430 (2003)
21. Fuster, J.M.: The prefrontal cortex–an update: time is of the essence. Neu-
ron. 30(2), 319–333 (2001)
22. Cohen, J.D., Braver, T.S., Brown, J.W.: Computational perspectives on dopamine
function in prefrontal cortex. Curr. Opin. Neurobiol. 12(2), 223–229 (2002)
23. Mountcastle, V.B.: Modality and topographic properties of single neurons of cat’s
somatic sensory cortex. J. Neurophysiol. 20(4), 408–434 (1957)
24. Mountcastle, V.B.: The columnar organization of the neocortex. Brain 120, 701–
722 (1997)
25. Buxhoeveden, D.P., Casanova, M.F.: The minicolumn hypothesis in neuroscience.
Brain 125(5), 935–951 (2002)
26. Hampson, S.: Connectionist problem solving. In: The Handbook of Brain Theory
and Neural Networks, pp. 756–760. The MIT Press, Cambridge (1998)
27. Meyer, J.A., Filliat, D.: Map-based navigation in mobile robots - II. A review of
map-learning and path-planning strategies. J. Cogn. Syst. Res. 4(4), 283–317 (2003)
28. Burnod, Y.: An adaptive neural network: the cerebral cortex. Masson (1989)
29. Bieszczad, A.: Neurosolver: a step toward a neuromorphic general problem solver.
Proc. World. Congr. Comput. Intell. WCCI94 3, 1313–1318 (1994)
30. Frezza-Buet, H., Alexandre, F.: Modeling prefrontal functions for robot navigation.
IEEE Int. Jt. Conf. Neural. Netw. 1, 252–257 (1999)
31. Hasselmo, M.E.: A model of prefrontal cortical mechanisms for goal-directed be-
havior. J. Cogn. Neurosci. 17(7), 1115–1129 (2005)
32. Schmajuk, N.A., Thieme, A.D.: Purposive behavior and cognitive mapping: a neu-
ral network model. Biol. Cybern. 67(2), 165–174 (1992)
33. Dehaene, S., Changeux, J.P.: A hierarchical neuronal network for planning behav-
ior. Proc. Natl. Acad. Sci. USA. 94(24), 13293–13298 (1997)
34. Banquet, J.P., Gaussier, P., Quoy, M., Revel, A., Burnod, Y.: A hierarchy of asso-
ciations in hippocampo-cortical systems: cognitive maps and navigation strategies.
Neural Comput. 17, 1339–1384 (2005)
35. Fleuret, F., Brunet, E.: Dea: an architecture for goal planning and classification.
Neural Comput 12(9), 1987–2008 (2000)
36. Tolman, E.C., Honzik, C.H.: ”Insight” in rats. Univ. Calif. Publ. Psychol. 4(14),
215–232 (1930)
37. Arleo, A., Gerstner, W.: Spatial orientation in navigating agents: modeling head-
direction cells. Neurocomputing 38–40, 1059–1065 (2001)
38. Arleo, A., Smeraldi, F., Gerstner, W.: Cognitive navigation based on nonuniform
gabor space sampling, unsupervised growing networks, and reinforcement learning.
IEEE Trans. Neural. Netw. 15(3), 639–651 (2004)
39. Rao, S.G., Williams, G.V., Goldman-Rakic, P.S.: Isodirectional tuning of adjacent
interneurons and pyramidal cells during working memory: evidence for microcolumnar organization in PFC. J. Neurophysiol. 81(4), 1903–1916 (1999)
40. Triesch, J.: Synergies between intrinsic and synaptic plasticity mechanisms. Neural
Comput. 19(4), 885–909 (2007)
41. Willmore, B., Tolhurst, D.J.: Characterizing the sparseness of neural codes. Netw.
Comput. Neural Syst. 12(3), 255–270 (2001)
42. Bialek, W., Rieke, F., de Ruyter van Steveninck, R., Warland, D.: Reading a neural
code. Science 252(5014), 1854–1857 (1991)
43. Samsonovich, A., Ascoli, G.: A simple neural network model of the hippocampus
suggesting its pathfinding role in episodic memory retrieval. Learn. Mem. 12, 193–
208 (2005)
Efficient Wayfinding in Hierarchically
Regionalized Spatial Environments
T. Reineking, C. Kohlhagen, and C. Zetzsche
Cognitive Neuroinformatics
University of Bremen
28359 Bremen, Germany
{trking,ckohlhag,zetzsche}@informatik.uni-bremen.de
1 Introduction
Agents situated in spatial environments must be capable of autonomous navigation
using prior learned representations. There exists a wide variety of approaches for
representing environments, ranging from metrical maps [1] to topological graphs [2].
In the context of large-scale wayfinding topological models seem cognitively more
plausible because they are robust with regard to global consistency, and because
they permit abstracting from unnecessary details, enabling higher level planning
[3]. Topological graph-based representations of space can be divided into those uti-
lizing single “flat” graphs and those that employ hierarchies of graphs for different
layers of granularity. Single-graph schemes are limited in the case of large domains,
since action selection by an agent may take unacceptably long due to huge search
spaces. Hierarchical approaches, on the other hand, decompose these spaces and are
therefore significantly more efficient, but their solutions are not always guaranteed to
be optimal.
One possibility for a hierarchical representation is to assume graph-subgraph
structures in which higher levels form subsets of lower, more detailed levels.
This approach has been particularly popular in geographical information sys-
tems (GIS), where this technique can be used to eliminate unwanted details.
Based on this idea a domain-specific path planning algorithm for street maps was
proposed in [4]. The authors approximated street maps by connection grids and
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 56–70, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Fig. 1. Small indoor environment with superimposed region hierarchy. Local connec-
tivity graphs are used for efficient near-optimal path planning (not shown).
constructed each hierarchical layer as a subset of the next lower layer. Another
way of defining topological hierarchies is having nodes at higher levels represent
sets of nodes at lower levels. A hierarchical extension of the A∗ algorithm (HPA∗ )
can be found in [5]. In this approach a topological graph is abstracted from an
occupancy gridmap which facilitates restricting the search space. In [6] the D∗
algorithm for robot path planning is modified in order to support hierarchies
consisting of different node classes. Unlike the other approaches the hierarchical
D∗ algorithm is guaranteed to generate optimal paths, however, it requires the
offline computation and storage of partial paths. Furthermore, it depends on the
availability of exact metrical information. In addition to wayfinding, hierarchical
representations have been successfully applied to related spatial reasoning prob-
lems, such as the traveling salesman problem [7,8] or the automated generation
of route directions [9].
In this paper we introduce a cognitively motivated hierarchical representa-
tion which is based on regions. We provide a formal data structure for this
representation, and we develop an efficient wayfinding algorithm that exploits
its specific properties. Many natural and man-made environments exhibit an
intrinsic regionalization that can be directly exploited for building hierarchical
representations and there is strong experimental evidence that humans actually
make use of region-based hierarchical representations in the context of navi-
gation [10,11,12,13,14]. Research on cognitive maps supports this idea and has
identified the hierarchical nature of spatial representations as one of their crucial
properties [15,16].
58 T. Reineking, C. Kohlhagen, and C. Zetzsche
Our work originated from a project on a cognitive agent which explores and
navigates through an indoor environment by means of a hierarchical region-
based representation [17]. The architecture was based on a biologically inspired
approach for recognizing regions by sensorimotor features [18]. Figure 1 shows
a small example of a typical environment with a superimposed hierarchy. In
this representation, regions were formed by means of visual scene analysis [19]
and intrinsic connectivity structure. However, this solution depends on specific
properties of the environment. In the current paper we hence use a quadtree
approach for building hierarchies, to enable comparability with other approaches
and to ease complexity analysis.
In our model smaller regions are grouped to form new regions at the next hi-
erarchy level which yields a tree-like structure. This hierarchical representation
is augmented by local connectivity graphs for each subtree. Explicit metrical
information is completely disregarded, instead the regions themselves constitute
a qualitative metric at different levels of granularity. By combining the hierar-
chically organized connectivity graphs with this qualitative metric we were able
to develop an efficient online wayfinding algorithm that produces near-optimal
paths while drastically reducing the search space. Abstract paths are determined
at the top of subtrees and recursively broken down to smaller regions. By only
relaxing the first element of an abstract path the next action can be obtained in
logarithmic time depending on the number of regions while generating complete
paths can be done with almost linear complexity.
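The refinement idea above can be made concrete with a hedged two-level sketch: plan a coarse route on the region graph, then expand only the first abstract region into a concrete move; deeper hierarchies would recurse. All maps and names here are illustrative, and the within-region search is not restricted to local subgraphs as in the full algorithm:

```python
# Hedged two-level sketch of hierarchical plan refinement: a coarse
# route over regions is computed first, and only its first element is
# relaxed into a room-level move. Deeper hierarchies would recurse.
from collections import deque

def bfs_path(graph, start, goal):
    """Shortest path by breadth-first search on an adjacency dict."""
    prev, frontier = {start: None}, deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nb in graph[node]:
            if nb not in prev:
                prev[nb] = node
                frontier.append(nb)
    return None

# Rooms grouped into regions (child -> parent map) ...
parent = {'r1': 'R1', 'r2': 'R1', 'r3': 'R2', 'r4': 'R2'}
# ... a coarse connectivity graph over regions ...
region_graph = {'R1': ['R2'], 'R2': ['R1']}
# ... and room-level connectivity, including the door r2 -> r3.
room_graph = {'r1': ['r2'], 'r2': ['r1', 'r3'], 'r3': ['r2', 'r4'],
              'r4': ['r3']}

def next_action(start_room, goal_room):
    """Refine only the first abstract step into a concrete move."""
    regions = bfs_path(region_graph, parent[start_room], parent[goal_room])
    if len(regions) == 1:  # goal lies within the current region
        return bfs_path(room_graph, start_room, goal_room)[1]
    # Otherwise head towards any room of the next region on the route.
    exits = [r for r, p in parent.items() if p == regions[1]]
    best = min((bfs_path(room_graph, start_room, e) for e in exits), key=len)
    return best[1]
```

Because each refinement step only touches one subtree's local graph, the search space per decision stays small, which is the source of the efficiency claimed above.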
The idea of region-based wayfinding has been introduced as a heuristic in [13],
but without an explicit computational model. The aforementioned hierarchical
wayfinding algorithms do not consider the topological notion of regions as an
integral part of the underlying representation, nor can they be easily adapted to
regionalized environments in general.
The paper is structured into two parts. The first explains the hierarchical
region representation, its properties, and how it can be constructed. The second
introduces the wayfinding algorithm utilizing this data structure. We analyze its
algorithmic complexity and compare it to existing path planning approaches.
We conclude with a short discussion of the advantages of the proposed model and
give hints towards future extensions.
2 Region Hierarchy
In this section we describe the region hierarchy as a formal data structure and
demonstrate how valid hierarchies can be constructed. We argue that most real-
world environments are inherently regionalized in that they are comprised of
areas that form natural units. These units form larger regions at higher levels of
granularity, resulting in a hierarchy of regions.
Most approaches in the context of path planning rely on metric information for
estimating distances. As mentioned in the introduction the proposed representa-
tion enables an agent to obtain qualitative distances based on regions, thus mak-
ing quantitative information an optional addition. A qualitative metric is given if
regions of one hierarchy level are of similar size and approximately convex. The
length of a path can then be assessed by the number of crossed regions at a given
level. This is especially useful since it allows an agent to estimate distances with
near-arbitrary precision depending on the considered hierarchy level.
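To make this concrete, the qualitative length of a path at a chosen level is simply the number of regions the path crosses at that level. The sketch below uses an encoding of our own invention (regions as (level, x, y) tuples over a 4×4 grid with a quadtree-style parent map); it is an illustration, not the paper's implementation:

```python
def lift(r, parent, level):
    """Coarsen an atomic region up to the given hierarchy level."""
    while r[0] < level:          # regions are (level, x, y) tuples
        r = parent[r]
    return r

def qualitative_length(atomic_path, parent, level):
    """Distance estimate: number of regions crossed at `level`."""
    regions = [lift(r, parent, level) for r in atomic_path]
    return 1 + sum(1 for a, b in zip(regions, regions[1:]) if a != b)

# 4x4 grid of atomic regions at level 0, grouped 2x2 into level-1 regions.
parent = {(0, x, y): (1, x // 2, y // 2) for x in range(4) for y in range(4)}
path = [(0, 0, 0), (0, 1, 0), (0, 2, 0), (0, 3, 0)]  # straight walk east
assert qualitative_length(path, parent, 0) == 4  # fine-grained estimate
assert qualitative_length(path, parent, 1) == 2  # coarse estimate
```

Choosing a coarser level trades precision for a cheaper, more abstract estimate, which is exactly the near-arbitrary precision mentioned above.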
In order to derive a computational model of the representation it is necessary
to make assumptions about how regions are modeled. First we assume that
each region is fully contained by exactly one region at the next coarser level of
granularity, that there is a single region containing all other ones, and that
the set of all regions therefore constitutes a tree. Humans may use a fuzzier
representation with fluid region boundaries and less strict containment relations
[20], but a tree of regions seems a reasonable approximation. Second, we demand
the descendants of a region to be reachable from each other without leaving the
parent region. This asserts the existence of a path within the parent region which
can be used as a clustering criterion for constructing valid hierarchies.
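The containment assumptions can be made concrete with a plain parent-pointer tree. The following is an illustrative sketch (the class and function names are ours, not the paper's), realizing the in and in* predicates over parent links:

```python
class Region:
    """A node in the region tree: exactly one parent, a single root."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def in_direct(r, p):
    """in(r, p): r is directly contained in p."""
    return r.parent is p

def in_star(r, p):
    """in*(r, p): r is transitively contained in p."""
    while r.parent is not None:
        r = r.parent
        if r is p:
            return True
    return False

# A toy indoor hierarchy: building -> floor -> rooms.
building = Region("building")
floor1 = Region("floor1", building)
room_a = Region("room_a", floor1)
room_b = Region("room_b", floor1)

assert in_direct(room_a, floor1)
assert in_star(room_a, building)
assert not in_star(room_a, room_b)
```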
Unlike a flat connectivity graph, a hierarchy allows the representation of
connectivity at multiple levels of detail. We propose imposing region connectivity
on the tree structure by storing connections in the smallest subtree only, thus
decomposing a global connectivity graph into local subgraphs. This limits the
problem space of a wayfinding task to the corresponding subtree, thus exclud-
ing large parts of the environment at each recursion step. This is in fact the
underlying idea of the wayfinding algorithm described in the next section.
In order to determine the smallest subtree in which two nodes r1, r2 are located, it
is necessary to determine their first common ancestor (FCA) f , i.e., the subtree’s
root node:
∀f, r1, r2 : fca(f, r1, r2) ⇔ in(r1, f) ∧ in(r2, f)
∧ ¬∃p : (in∗(p, f) ∧ in(r1, p) ∧ in(r2, p)). (5)
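Definition (5) can be computed directly from parent pointers by walking up the two ancestor paths. A minimal sketch, with an invented toy hierarchy for illustration:

```python
def ancestors(r, parent):
    """Path from r up to the root, inclusive of r itself."""
    path = [r]
    while r in parent:
        r = parent[r]
        path.append(r)
    return path

def fca(r1, r2, parent):
    """First common ancestor: the lowest node containing both r1 and r2."""
    anc1 = set(ancestors(r1, parent))
    for node in ancestors(r2, parent):
        if node in anc1:
            return node
    return None

# Toy hierarchy: root contains c1 and c2; c1 contains a1, a2; c2 contains a3.
parent = {"a1": "c1", "a2": "c1", "a3": "c2", "c1": "root", "c2": "root"}
assert fca("a1", "a2", parent) == "c1"
assert fca("a1", "a3", parent) == "root"
```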
The hierarchy is composed of two kinds of nodes: atomic regions and complex
regions. Atomic regions correspond to places in the environment that are not
further divided while complex regions comprise atomic regions or other complex
regions, and therefore represent the area covered by the non-empty set of all
their descendants. Regions do not intersect other than by containment and the
set of atomic regions exhaustively covers the environment. (In case of the indoor
domain an atomic region could be a room, whereas a complex region might
represent a hallway along with all its neighboring rooms.) An atomic region a
therefore contains no other region r:
∀a, r : atomic(a) ⇔ ¬in∗(r, a). (6)
The connectivity of atomic regions is given by the environment. In our model
an atomic connection is simply a tuple of atomic regions a1, a2:
∀a1, a2 : con(a1, a2) ⇒ atomic(a1) ∧ atomic(a2). (7)
Further information like the specific action necessary for reaching the next region
could be represented as well but for the sake of simplicity we stick to region
tuples as connections. Note that the connection predicate is non-transitive and
irreflexive, thus disallowing a region to be connected with itself.
The global connectivity graph is hierarchically decomposed by storing atomic
connections in the root of the smallest subtree given by two regions a1, a2.
Therefore each region f carries a set of all atomic connections between its
descendants provided that this node is their FCA:
∀f, a1, a2 : cons_a(f, a1, a2) ⇔ fca(f, a1, a2) ∧ con(a1, a2). (8)
This connection set is later used for obtaining crossings between regions at the
atomic level.
Alongside atomic connections the hierarchy also needs to represent the con-
nectivity of complex regions. For this purpose each region has a second set con-
taining complex connections. A complex connection is a tuple of two regions
c1, c2 sharing the same parent f. A complex connection exists if a region a1
contained by (or equal to) c1 is atomically connected to a region a2 contained
by (or equal to) c2:
∀f, c1, c2 : cons_c(f, c1, c2) ⇔ ∃a1, a2 : cons_a(f, a1, a2) ∧ child(c1, f, a1) ∧ child(c2, f, a2). (9)
A complex connection therefore exists if and only if the set of atomic connec-
tions contains a corresponding entry. The set of complex connections defines
a connected graph by interpreting region tuples as edges. This graph enables
searching for paths between complex regions, whereas the atomic connection set
yields the actual (atomic) crossings between the complex regions.
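The decomposition described by (8) and (9) can be sketched as follows: each atomic connection is stored at the FCA of its endpoints, and a complex connection is derived between the FCA children containing those endpoints. The hierarchy and all names below are invented for illustration:

```python
from collections import defaultdict

def ancestors(r, parent):
    """Path from r up to the root, inclusive of r itself."""
    path = [r]
    while r in parent:
        r = parent[r]
        path.append(r)
    return path

def fca_and_children(a1, a2, parent):
    """Return (fca, child of fca containing a1, child of fca containing a2)."""
    p1, p2 = ancestors(a1, parent), ancestors(a2, parent)
    s1 = set(p1)
    f = next(n for n in p2 if n in s1)
    return f, p1[p1.index(f) - 1], p2[p2.index(f) - 1]

def decompose(atomic_cons, parent):
    cons_a = defaultdict(set)   # (8): atomic connections stored at the FCA
    cons_c = defaultdict(set)   # (9): complex connections between FCA children
    for a1, a2 in atomic_cons:
        f, c1, c2 = fca_and_children(a1, a2, parent)
        cons_a[f].add((a1, a2))
        cons_c[f].add((c1, c2))
    return cons_a, cons_c

parent = {"a1": "c1", "a2": "c1", "a3": "c2", "c1": "root", "c2": "root"}
cons_a, cons_c = decompose([("a1", "a2"), ("a2", "a3")], parent)
assert cons_c["root"] == {("c1", "c2")}   # region crossing at the complex level
assert cons_a["c1"] == {("a1", "a2")}     # local connection stays inside c1
```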
The existence of a path between two arbitrary regions s, d is conditioned on
a third region r that completely encompasses the path. Hence for all nodes x
along the path the in predicate must be fulfilled with respect to r:
Finally we state the connectivity criterion that a valid hierarchy must satisfy.
We require that all regions r1 , r2 located in a common parent region p must be
reachable from each other without leaving p:
2.2 Clustering
The problem of imposing a hierarchy onto an environment is essentially a matter
of clustering regions hierarchically. Humans seem to be able to do this effort-
lessly and there is evidence that the acquisition of region knowledge happens
very early during the exploration of an environment [14]. Some suggestions on
principles for the automated hierarchical clustering of spatial environments can
be found in [21]. However, automatically generating hierarchies similar to the
ones constructed by humans for arbitrary spatial configurations is an unsolved
problem. We briefly describe a domain-specific clustering approach for indoor
environments. For the purpose of auditability and comparability of our wayfinding
algorithm's performance, we first present the more generic, albeit artificial,
quadtree as one possibility for generating hierarchies.
While humans seem to use various criteria for grouping regions, we focus on
the connectivity aspect, since it is essential for navigation. We require a proper
hierarchy to fulfill four properties. The first two are similarity of region size at
each hierarchy level and convexity as mentioned above. The third is given by
(11) and asserts the existence of a path within a bounding region. The fourth
property concerns the hierarchy’s shape. The tree should be nearly balanced and
its depth must be logarithmically dependent on the number of atomic regions.
This excludes “flat” hierarchies as well as “deformed” ones with arbitrary depths.
Note that the third requirement is necessary for correctness of the wayfinding
algorithm described in the next section while the hierarchy’s shape merely affects
62 T. Reineking, C. Kohlhagen, and C. Zetzsche
the algorithm's computation time. Size and convexity of regions determine the
accuracy of qualitative distance estimates.
Generating proper clusters becomes significantly easier if one makes assump-
tions about the connectivity structure of an environment. In the spatial domain
it is popular to approximate place connectivity, i.e., connections between atomic
regions, by assuming grid-like connections where each region is connected to its
four neighboring regions. In this case a simple quadtree can be applied in which
a set of four adjacent regions at level k corresponds to one region at level k + 1.
The resulting hierarchy would indeed satisfy the connectivity property defined
by (11) and with a constant branching factor of b = 4 its depth would be loga-
rithmically bounded. Similar region size and convexity are asserted due to the
uniformity of grid cells. However, the applicability of the quadtree approach is
limited in case of environments with less regular and more restricted connectiv-
ity since this could easily violate the connectedness of regions. This connectivity
restriction is especially predominant in indoor environments.
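The quadtree construction can be sketched as a parent map over a 2^k × 2^k grid, where four adjacent cells at each level form one region at the next level; the encoding and function names are our own choices:

```python
def quadtree_parent_map(size):
    """Build parent pointers for a size x size grid (size a power of two).

    Atomic regions are (0, x, y); a region (k, x, y) groups the four
    level-(k-1) regions it covers, yielding a balanced tree with b = 4.
    """
    parent = {}
    level, n = 0, size
    while n > 1:
        for x in range(n):
            for y in range(n):
                parent[(level, x, y)] = (level + 1, x // 2, y // 2)
        level, n = level + 1, n // 2
    return parent

parent = quadtree_parent_map(8)   # 64 atomic regions, as in the simulation below
depth, node = 0, (0, 0, 0)
while node in parent:
    node = parent[node]
    depth += 1
assert depth == 3                 # log_4(64) = 3 levels above the atomic one
assert parent[(0, 5, 6)] == (1, 2, 3)
```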
A first approach towards modeling human grouping mechanisms for indoor
environments has been proposed in [17]. It applied a classical metric clustering
algorithm to local quantitative representations and combined this with domain-
specific connectivity-based heuristics for the topological level. Indoor environ-
ments are especially suited for the hierarchical clustering of regions because they
offer natural region boundaries in the form of walls and because they are char-
acterized by local connectivity agglomerations. The latter can be exploited by
defining regions based on places of high connectivity, e.g., a hallway could form
a region together with its connected rooms. This not only asserts connectedness
of regions, it also leads to intuitively shaped hierarchies with convex regions like
the one shown in figure 1. Similar region sizes at each hierarchy level are a direct
result of the high degree of symmetry found in many indoor environments.
3 Wayfinding
3.1 Algorithm
The basic idea of the algorithm is to limit the search space to the region given
by the minimum subtree containing the source and destination region at each
recursion step. This is possible because (11) guarantees the connectedness of such
a region. Within this region an “abstract” path is constructed using a shortest
path algorithm. The search space is the local connectivity graph composed of the
direct descendants of the subtree’s root and their connections. Start and goal of
the abstract path are given by the regions in which the source and destination
region are located at the corresponding level. For each connection of the abstract
path a corresponding one from the set of atomic connections is selected and used
as a new destination for the next recursion. This process is repeated until the
atomic level is reached and a complete path has been constructed. Alternatively
only the first element from each abstract path is relaxed, which yields only the
first atomic region crossing while keeping the other crossings abstract.
Figure 2 illustrates how the region hierarchy is used to find a path from one
atomic region to another by means of complex and atomic connections.
 1  find_way(s, d, h)
 2    if in(s, d, h) then
 3      return []                    // empty path
 4
 5    fca := fca(s, d, h)
 6    cs_a := con_a(fca, h)          // atomic connections
 7    cs_c := con_c(fca, h)          // complex connections
 8    p_c := Dijkstra(cs_c, child(s, fca), child(d, fca))
 9
10    cur := s                       // current (atomic) region
11    p_a := []                      // (atomic) path
12    for each c_c from p_c
13      c_a := select(cur, c_c, cs_a)
14      p_a := p_a + find_way(cur, source(c_a), h)
15      p_a := p_a + c_a
16      cur := destination(c_a)
17
18    return p_a + find_way(cur, d, h)
The algorithm has three input parameters: s denotes the atomic source region,
d the atomic or complex destination region and h is a properly clustered region
hierarchy. First the trivial case of s being located in (or equal to) d is covered
which results in an empty path (lines 2-3).
If this is not the case, the FCA (defined by (5)) of s and d is determined
(line 5). Given this region, the set of complex connections cs_c between the
direct descendants of fca and the set of corresponding atomic connections cs_a
(given by (9) and (8)) are obtained (lines 6-7). The former is used to construct
an abstract path p_c from s to d composed of the FCA's direct descendants by
applying Dijkstra's shortest path algorithm (line 8).
Next the current region cur is initialized with the source region s (line 10)
and the algorithm iterates through all complex connections in the abstract path
(line 12). From the set of atomic connections cs_a an element corresponding to
the current complex connection is randomly selected (line 13). Corresponding
means that the source region of the atomic connection must be located in (or
Fig. 2. For finding a path from region R1 to region R5 the FCA of both nodes is first
determined. This node contains a connectivity graph composed of complex connections
between direct descendants (C1-R3, R3-C2), which is used for finding the abstract
path from C1 to C2 (C1→R3→C2). The actual region crossings obtained from the set
of corresponding atomic connections (R2-R3, R3-R4) form new destinations for the
subsequent recursion steps (first R2, then R4).
equal to) the complex connection’s source region and the same must be true
for the destination region. This atomic connection c_a can be thought of as a
waypoint that the agent has to pass in order to reach the next region, even if
the complete path has not been determined yet. This intermediate goal then
becomes the new destination for the recursion step; the new source is given by
the current region cur (line 14). The result is a partial atomic path which is
concatenated to the path variable p_a along with the atomic connection c_a
(lines 14-15). Afterwards the current region is updated accordingly (line 16).
Finally, the path from the current region to the actual destination d is obtained
recursively and the combined path is returned (line 18).
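The walkthrough above can be turned into a small executable sketch. The data layout (a parent map plus per-FCA connection sets) and all names are our own choices; Dijkstra's algorithm is reduced to breadth-first search since only unit edge costs are assumed, and the random selection of line 13 is replaced by a deterministic first match:

```python
from collections import defaultdict, deque

def ancestors(r, parent):
    """Path from r up to the root, inclusive of r itself."""
    path = [r]
    while r in parent:
        r = parent[r]
        path.append(r)
    return path

def fca_with_children(r1, r2, parent):
    """FCA of r1, r2 plus the FCA children containing them.
    Assumes neither region contains the other."""
    p1, p2 = ancestors(r1, parent), ancestors(r2, parent)
    s1 = set(p1)
    f = next(n for n in p2 if n in s1)
    return f, p1[p1.index(f) - 1], p2[p2.index(f) - 1]

def build(parent, atomic_cons):
    """Store each atomic connection (both directions) at the FCA of its
    endpoints (8) and derive the complex connection between the FCA's
    children containing the endpoints (9)."""
    cons_a, cons_c = defaultdict(set), defaultdict(set)
    for a1, a2 in atomic_cons:
        for u, v in ((a1, a2), (a2, a1)):
            f, c1, c2 = fca_with_children(u, v, parent)
            cons_a[f].add((u, v))
            cons_c[f].add((c1, c2))
    return cons_a, cons_c

def shortest_path(edges, start, goal):
    """Unit-cost Dijkstra (i.e., BFS), returning the traversed edges."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    prev, frontier = {start: None}, deque([start])
    while frontier and goal not in prev:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                frontier.append(v)
    path, node = [], goal
    while prev[node] is not None:
        path.append((prev[node], node))
        node = prev[node]
    return path[::-1]

def find_way(s, d, parent, cons_a, cons_c):
    """Complete-path mode of the listing above; returns atomic crossings."""
    if d in ancestors(s, parent):           # s located in (or equal to) d
        return []
    f, cs, cd = fca_with_children(s, d, parent)
    p_c = shortest_path(cons_c[f], cs, cd)  # abstract path (line 8)
    cur, p_a = s, []
    for c1, c2 in p_c:                      # relax each complex connection
        c_a = next((a1, a2) for a1, a2 in cons_a[f]
                   if c1 in ancestors(a1, parent)
                   and c2 in ancestors(a2, parent))
        p_a += find_way(cur, c_a[0], parent, cons_a, cons_c)
        p_a.append(c_a)
        cur = c_a[1]
    return p_a + find_way(cur, d, parent, cons_a, cons_c)

# Toy environment: corridor a1-a2-a3-a4 clustered as c1={a1,a2}, c2={a3,a4}.
parent = {"a1": "c1", "a2": "c1", "a3": "c2", "a4": "c2",
          "c1": "root", "c2": "root"}
cons_a, cons_c = build(parent, [("a1", "a2"), ("a2", "a3"), ("a3", "a4")])
assert find_way("a1", "a4", parent, cons_a, cons_c) == \
    [("a1", "a2"), ("a2", "a3"), ("a3", "a4")]
```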
As mentioned, the algorithm can be operated in two modes. In the form in
which it is stated above a complete path containing all atomic steps is con-
structed. If the agent is only interested in generating the next action towards
the goal, the iteration over all complex connections can be omitted. Instead
only the first connection at each hierarchy level is recursively broken down to
the atomic level while the remaining paths are kept abstract. This guarantees
a decrease of the hierarchy level by at least one in each recursion because the next
intermediate destination given by c_a is always located in the same complex
region (one level below the current FCA) as the current region.
3.2 Complexity
Here we investigate the time and space complexity of our algorithm. Space com-
plexity is equal to that of a flat connectivity graph apart from a slight overhead.
The number of atomic regions n and atomic connections is exactly the same.
There is an additional number of n/(b−1) complex regions, assuming the tree
is balanced and that it has a constant branching factor b. The number of com-
plex connections depends on the specific connectivity of an environment but it
is obviously lower than the number of atomic connections, since complex con-
nections are only formed for atomic connections defining a region crossing at the
complex level. The overall space complexity is thus of the same class as that of
a flat connectivity graph.
If the algorithm is operated such that it only determines the next action, it
exhibits a time complexity of O(log n). The construction of an abstract path at a
given level lies in O(1), if the branching factor has an upper bound independent of
n (e.g., b = 4 for quadtrees). The selection of an atomic connection is of constant
time complexity as well because, given a complex connection, it is possible to
immediately obtain a corresponding atomic connection, for instance by using a lookup table.
By expanding only the first element of each abstract path, the current FCA
basically moves from the initial FCA down to the source region which leads to
log_b n visited nodes in the worst case. This is because the tree's depth is assumed
to be logarithmically dependent on n and because each recursion step leads to a
decrease of the current hierarchy level by at least one. Even though the current
subtree has to be redetermined for each new destination the complexity stays
the same. The overall complexity of determining the FCAs for all recursion
steps is O(log n) as well since there are only logarithmically many candidates.
Implementation-wise the FCA search could be accomplished by using a simple
loop apart from the recursion. By combining the number of visited nodes with
the complexity for determining the current subtree additively, the complexity
for obtaining the next action is O(log n).
For planning complete paths the worst-case complexity is O(n log(log n)). In
the unlikely case of having to visit every single atomic region, each complex
region becomes the FCA once. In case of a properly shaped hierarchy the number
of complex regions is given by n/(b−1) and therefore lies in O(n). Determining
the FCA of two arbitrary regions lies in O(log(log n)) because it involves comparing
two hierarchy paths of logarithmic length. The comparison itself can be performed
by a binary search over these paths, taking O(log(log n)) steps. Again, the
construction of an abstract path and the
retrieval of an atomic connection take a constant amount of time, which yields an
overall complexity of O(n log(log n)) for generating complete (atomic) paths.
3.3 Comparison
We compare the properties of our algorithm to the HPA∗ [5] path planning algo-
rithm and the hierarchical D∗ algorithm [6] as well as Dijkstra’s classical shortest
path algorithm [22]. We show that our approach outperforms all of these in terms of
time complexity while at the same time yielding reasonable approximations of
optimal solutions.
When implemented with an efficient priority queue, the time complexity of
Dijkstra's algorithm is O(|E| + n log n), with |E| denoting the number of edges
in a given connectivity graph [23]. In the case of sparse graphs, i.e., graphs for which
n log n is not dominated by the number of edges, the complexity reduces to
O(n log n). This sparseness is generally satisfied in the spatial domain, since
connectivity is limited to the immediate neighborhood, which is certainly true
for regionalized environments. In practice, HPA∗ and hierarchical D∗
are both significantly faster than Dijkstra's algorithm; however, their worst-case
complexity offers no improvement over that of Dijkstra's algorithm.
In contrast, our algorithm exhibits a complexity of only O(n log(log n)) for
planning complete paths. Furthermore, the number of expanded regions is n/b
instead of n in the case of Dijkstra's algorithm because atomic regions behave
trivially as FCAs. The complexity of O(log n) for determining the next action cannot be
compared, since Dijkstra’s algorithm has to generate all shortest paths before
being able to state the first step.
Despite being efficient an algorithm obviously also needs to produce useful
paths. Like Dijkstra’s algorithm the hierarchical D∗ algorithm generates optimal
paths, however, it does so at the expense of having to calculate partial solutions
offline, leading to increased storage requirements. HPA∗ on the other hand yields
near-optimal results, if an additional path smoothing is applied, while also using
precomputed partial solutions. Since both approaches make use of the A∗ search
algorithm, they both require the availability of metrical information in order
to obtain an admissible distance heuristic [24], which makes them unsuited for
cases where such knowledge cannot be provided.
For the purpose of analyzing our algorithm in terms of path lengths we set up
a simulation in which we tested the proposed algorithm against optimal solutions
obtained via Dijkstra’s shortest path algorithm. Since we did not consider metric
knowledge the length of a path was measured by the number of visited atomic
regions. The environment was a quadratic grid with 64 atomic regions in which
each cell was connected to its four neighbors. We chose a grid-like connectivity
since it works well as a general approximation for many environments and since
it allowed us to avoid using domain-specific clustering criteria. A simple quadtree
was therefore applied to obtain a region hierarchy.
In 1000 iterations we randomly selected two cells as source and destination
regions for the wayfinding task and we compared the path lengths obtained via
our hierarchical algorithm to the ones of Dijkstra’s algorithm. On average the
produced paths contained 20.5% more atomic regions. Although these results
are not sufficient for an in-depth analysis, the example demonstrates that the
resulting paths are not arbitrarily longer than shortest paths. For domains with
more restricted connectivity such as indoor environments we observed better
performance, typically equal to optimal solutions.
The main source of error resulted from the selection of atomic connections
between two regions, because the regions themselves do not offer any information
that would permit the derivation of a useful selection heuristic. The discussion
points to some work that could improve this behavior. The error is considerably
reduced if the hierarchy’s branching factor is increased. In fact there is a direct
Fig. 3. A grid-like environment with source S and destination D and two paths, one
optimal (dashed), the other obtained by the hierarchical algorithm (continuous). The
different gray levels of hierarchy nodes indicate the different search levels: white nodes
are completely disregarded, gray nodes are part of a local search and black nodes are
part of a search and recursively expanded. The gray cells at the bottom visualize the
atomic search space.
trade-off between efficiency and average path length, because higher branching
factors lead to larger local search spaces for which optimal solutions are obtained.
Besides time complexity and path optimality, it is worth taking a look
at the search space of the proposed algorithm. Unlike Dijkstra's algorithm, which
blindly expands nodes in all directions, our algorithm limits the set of possible
paths for each recursion by excluding solutions at the parent level. Figure 3 shows
the environment used during the simulation along with two exemplary paths be-
tween two regions S and D, one optimal, the other one constructed hierarchically.
On the atomic level only 7/16 of the regions are considered by our approach while
Dijkstra's algorithm visits each region. This ratio decreases further with more
atomic regions and tends to zero as the number of regions tends to infinity.
4 Discussion
the next action is done in O(log n). Aside from time complexity the hierarchical
organization leads to drastically reduced search spaces for most domains.
There are two main sources of error that can cause a suboptimality of com-
puted paths. One is the estimated path length based on regions at higher levels.
This estimate becomes less accurate the more region size varies and the less
convex regions are shaped. However, this problem affects human judgment of
distance as well, as has been shown in [10]. The other type of error results from
the way atomic connections between regions are selected. In the current
implementation this is done by a first-match mechanism which is problematic for
environments with many region crossings, as in case of the mentioned grid maps.
This problem could be reduced if suitable selection heuristics were available. One
possibility is the use of relative directions for which hierarchical region-based ap-
proaches already exist [25].
In addition to efficient wayfinding the suggested hierarchical region-based rep-
resentation could provide an agent with other useful skills. For example, it en-
ables an abstraction from unnecessary details, and it can also overcome the
problem of asserting a common reference frame. Obviously all this requires that
real environments are actually regionalized in an inherently hierarchical fashion.
We believe that this is indeed the case for most real-world scenarios. Finally, we
expect hierarchical representations to facilitate the communication about spatial
environments between artificial agents and humans [9,26] and spatial problem
solving in general.
Acknowledgements
This work was supported by the DFG (SFB/TR 8 Spatial Cognition, project
A5-[ActionSpace]).
References
1. Thrun, S.: Robotic mapping: A survey (2002)
2. Werner, S., Krieg-Brückner, B., Herrmann, T.: Modelling navigational knowledge
by route graphs. In: Habel, C., Brauer, W., Freksa, C., Wender, K.F. (eds.) Spatial
Cognition II 2000. LNCS (LNAI), vol. 1849, pp. 295–317. Springer, Heidelberg
(2000)
3. Kuipers, B.: The spatial semantic hierarchy. Technical Report AI99-281 (29, 1999)
4. Car, A., Frank, A.: General principles of hierarchical reasoning - the case of
wayfinding. In: SDH 1994, Sixth Int. Symposium on Spatial Data Handling, Edin-
burgh, Scotland (September 1994)
5. Botea, A., Müller, M., Schaeffer, J.: Near optimal hierarchical path-finding. Journal
of Game Development 1(1), 7–28 (2004)
6. Cagigas, D.: Hierarchical D* algorithm with materialization of costs for robot path
planning. Robotics and Autonomous Systems 52(2-3), 190–208 (2005)
7. Graham, S., Joshi, A., Pizlo, Z.: The traveling salesman problem: a hierarchical
model. Memory & Cognition 28(7), 1191–1204 (2000)
8. Pizlo, Z., Stefanov, E., Saalweachter, J., Li, Z., Haxhimusa, Y., Kropatsch, W.:
Traveling salesman problem: a foveating pyramid model. Journal of Problem Solv-
ing 1, 83–101 (2006)
9. Tomko, M., Winter, S.: Recursive construction of granular route directions. Journal
of Spatial Science 51(1), 101–115 (2006)
10. Stevens, A., Coupe, P.: Distortions in judged spatial relations. Cognitive Psychol-
ogy 10, 526–550 (1978)
11. Hirtle, S.C., Jonides, J.: Evidence of hierarchies in cognitive maps. Memory and
Cognition 13(3), 208–217 (1985)
12. McNamara, T.P.: Mental representations of spatial relations. Cognitive Psychol-
ogy 18, 87–121 (1986)
13. Wiener, J., Mallot, H.: ’Fine-to-coarse’ route planning and navigation in regional-
ized environments. Spatial Cognition and Computation 3(4), 331–358 (2003)
14. Wiener, J., Schnee, A., Mallot, H.: Navigation strategies in regionalized environ-
ments. Technical Report 121 (January 2004)
15. Voicu, H.: Hierarchical cognitive maps. Neural Networks 16(5-6), 569–576 (2003)
16. Thomas, R., Donikian, S.: A model of hierarchical cognitive map and human mem-
ory designed for reactive and planned navigation. In: 4th International Space Syn-
tax Symposium, Londres (June 2003)
17. Gadzicki, K., Gerkensmeyer, T., Hünecke, H., Jäger, J., Reineking, T., Schult, N.,
Zhong, Y., et al.: Project MazeXplorer. Technical report, University of Bremen
(2007)
18. Schill, K., Zetzsche, C., Wolter, J.: Hybrid architecture for the sensorimotor rep-
resentation of spatial configurations. Cognitive Processing 7, 90–92 (2006)
19. Schill, K., Umkehrer, E., Beinlich, S., Krieger, G., Zetzsche, C.: Scene analysis with
saccadic eye movements: Top-down and bottom-up modeling. Journal of Electronic
Imaging 10(1), 152–160 (2001)
20. Montello, D.R., Goodchild, M.F., Gottsegen, J., Fohl, P.: Where’s downtown?: Be-
havioral methods for determining referents of vague spatial queries. Spatial Cog-
nition & Computation 3(2-3), 185–204 (2003)
21. Han, J., Kamber, M., Tung, A.K.H.: Spatial clustering methods in data mining. In:
Geographic Data Mining and Knowledge Discovery, pp. 188–217. Taylor & Francis,
Inc., Bristol (2001)
22. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numerische
Mathematik 1(1), 269–271 (1959)
23. Barbehenn, M.: A note on the complexity of dijkstra’s algorithm for graphs with
weighted vertices. IEEE Trans. Comput. 47(2), 263 (1998)
24. Junghanns, A.: Pushing the limits: new developments in single-agent search. PhD
thesis, University of Alberta (1999)
25. Papadias, D., Egenhofer, M.J., Sharma, J.: Hierarchical reasoning about direction
relations. In: GIS 1996: Proceedings of the 4th ACM international workshop on
Advances in geographic information systems, pp. 105–112. ACM, New York (1996)
26. Maaß, W.: From vision to multimodal communication: Incremental route descrip-
tions. Artificial Intelligence Review 8(2), 159–174 (1994)
Analyzing Interactions between Navigation
Strategies Using a Computational Model of
Action Selection
1 Introduction
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 71–86, 2008.
c Springer-Verlag Berlin Heidelberg 2008
72 L. Dollé et al.
However, it is not yet clear whether these learning systems are independent
or whether they interact for action control in a competitive or in a coopera-
tive manner. The competition implies that inactivation of one system enhances
the learning of the remaining functional system, while the cooperation states
that learning in one system would compensate the limitations of the other one
[7,8,9,10,11]. The present work aims at investigating such interactions using a
computational model of spatial navigation based on the selection between the
map-based and map-free strategies [12]. Besides a qualitative reproduction of
the experimental results obtained in animals, the modelling approach allows
us to further characterize the competitive or cooperative nature of interactions
between the two strategies.
Following our previous modelling efforts [12], we study the interaction between
the navigation strategies in the experimental paradigm proposed by Pearce et
al. (1998) [13]. In this paradigm, which is a modification of the Morris Hidden
Water Maze task [14], two groups of rats (“Control” group of intact animals and
“Hippocampal” group of animals with damaged hippocampus) had to reach a
hidden platform indicated by a landmark located at a fixed distance and ori-
entation from the platform. After four trials, the platform and its associated
landmark were moved to another location and a new session started. The au-
thors observed that both groups of animals were able to learn the location of the
hidden platform, but at the start of each new session the hippocampal animals
were significantly faster in finding the platform than controls. Moreover, only
the control rats were able to decrease their escape latencies within a session.
From these results, the authors concluded that rats could simultaneously learn two
navigation strategies. On the one hand, a map-based strategy encodes a spatial
representation of the environment based on visual extra-maze landmarks and
self-movement information. On the other hand, a map-free strategy (called by
the authors “heading vector strategy”) encodes the goal location based on its
proximity and direction with respect to the intra-maze cue [15]. Based on these
conclusions, the decrease in the escape latency within sessions could be explained
by the learning of a spatial representation by intact animals. Furthermore, such
learning also suggests that when the platform is displaced at the start of a new
session, intact rats would swim to the previous (wrong) location of the platform
based on the learned map, whereas hippocampal animals would swim directly
to the correct location.
For the modelling purposes, the results of this experiment can be summarized
as follows: (i) both groups of rats could decrease their escape latencies across
sessions, but only the control rats improved their performance within sessions;
(ii) the improvement in the performance within each session, observed in the
control group, could be attributed to the use of a map-based strategy by these
rats; and (iii) higher performance of hippocampal rats relative to the controls
at the start of each session could be due to the use of the map-free strategy
(the only strategy that could be used by the lesioned animals). In other words,
the process of choosing the best strategy (i.e. the competition) performed by
the control, but not the hippocampal, animals, decreased the performance of
controls relative to that of lesioned animals.
We have shown previously that the computational model used in the present
study is able to reproduce the behaviour of rats in the experiment of Pearce et al.
[12]. In the present paper, we extend these results by performing a further anal-
ysis of the interactions between both learning systems at different stages of
the experiment, taking into account the three points formulated above. In the
following section, we describe the model, the simulated environment and the
experimental protocol. Then we present the results and the analyses. Finally, we
discuss the results in terms of interactions between systems.
Strategy Learning. Both experts learn the association between their inputs
and the actions leading the robot to the platform, using a direct mapping be-
tween inputs (either SI or PC) and directions of movement (i.e., actions). Move-
ments are encoded by a population of 36 action cells (AC). The policy is learned
by both experts by means of a neural implementation of the Q-learning algorithm [17].
Fig. 1. The computational model of strategy selection [12]. The gating network receives
the inputs of both experts and their reward prediction errors in order to compute their
reliability according to their performance (i.e., gating values g_k). Gating values are
then used together with the action values A_k to compute the probability of each expert
being selected. The direction Φ proposed by the winning expert is then performed. See
text for further explanations.
Fig. 2. (a) A simplified view of ad hoc place cells. Each circle represents a place cell
and is located at the cell's preferred position (i.e., the place where the cell is most
active). Cell activity is color coded from white (inactive cells) to black (highly active
cells). (b) The environment used in our simulation (open circles: platform locations,
stars: landmarks).
In this algorithm, the value of every state-action pair is learned by updating the
synaptic weight w_ij linking input cell i to action cell j:

Δw_ij = η h_k δ e_ij ,  (1)

where η is the learning rate and δ the reward prediction error. The scaling
factor h_k ensures that the learning module updates its weights according to
its reliability (in all the following equations, k denotes either the MBe or the MFe). Its
computation is detailed further below. The eligibility trace e allows the expert
to reinforce the state-action couples previously chosen during the trajectory:

e_ij ← λ e_ij + r_j^pre r̄_i ,  (2)

where r_j^pre is the activity of the pre-synaptic cell j, λ a decay factor, and r̄_i the
activity of the action cell i in the generalization phase. Generalization in the
action space is achieved by reinforcing every action weighted by a Gaussian of
standard deviation σ_ac centered on the chosen action. Each expert suggests a
direction of movement Φ_k:

Φ_k = arctan( Σ_i a_i^k sin(φ_i) / Σ_i a_i^k cos(φ_i) ) ,  (3)

where a_i^k is the action value of the discrete direction of movement φ_i. The
corresponding action value A_k is computed by linear interpolation of the two nearest
discrete actions [17].
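The weight update with eligibility trace and the population-vector readout described above can be sketched in NumPy as follows. The cell counts, parameter values, and function names are illustrative assumptions for this sketch, not the authors' implementation:

```python
import numpy as np

N_IN, N_AC = 100, 36            # input cells (PC or SI) and action cells; sizes illustrative
phi = np.linspace(0, 2 * np.pi, N_AC, endpoint=False)   # preferred directions of the AC

def update_expert(w, e, r_in, chosen, delta, h_k,
                  eta=0.1, lam=0.9, sigma_ac=0.5):
    """One learning step of an expert: eligibility-trace Q-learning.

    w      : (N_IN, N_AC) synaptic weights w_ij
    e      : (N_IN, N_AC) eligibility traces e_ij
    r_in   : (N_IN,) input-cell activities
    chosen : index of the executed action cell
    delta  : reward prediction error
    h_k    : reliability-based scaling factor of this expert
    """
    # Gaussian generalization in action space, centred on the chosen action
    d = np.angle(np.exp(1j * (phi - phi[chosen])))       # wrapped angular distance
    r_ac = np.exp(-d**2 / (2 * sigma_ac**2))
    e = lam * e + np.outer(r_in, r_ac)                   # trace decay + pre/post activity
    w = w + eta * h_k * delta * e                        # reliability-scaled update
    return w, e

def proposed_direction(w, r_in):
    """Population-vector readout of the proposed direction (Eq. 3)."""
    a = r_in @ w                                         # action-cell values
    return np.arctan2(np.sum(a * np.sin(phi)), np.sum(a * np.cos(phi)))
```

After a single rewarded step towards, say, action cell 9 (preferred direction π/2), the readout points in that direction, illustrating how reinforcement shapes the proposed heading.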
Action Selection. In order to select the direction Φ of the next robot movement,
the model uses a gating scheme such that the probability of an expert being selected
depends not only on the Q-values of its actions (A_k), but also on a gating value
g_k. Gating values are updated so as to quantify the expert's reliability given
the current inputs. They are computed by a network linking the inputs
(place cells and sensory inputs) to the gating values g_k, as a weighted
sum:

g_k = z_k^PC r^PC + z_k^SI r^SI ,  (4)

where z_k^PC is the synaptic weight linking the PC, with activation r^PC, to the
gate k (idem for z_k^SI). Weights are updated so that g_k approaches
h_k = g_k c_k / Σ_i g_i c_i, where c_k = e^(−ρ δ_k²) (ρ > 0), according to the following rule:

Δz_kj^PC,SI = ξ (h_k − g_k) r_j^PC,SI .  (5)

The probability for expert k to be selected is then:

P(Φ = Φ_k) = g_k A_k / Σ_i g_i A_i .  (6)
If both experts have the same gating value (i.e., reliability), then the expert
with the highest action value will be chosen. In contrast, if both experts have
the same action value, the most reliable expert, i.e., the one with highest gating
value, will be chosen.
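Under the same caveats (hypothetical dimensions and parameter values, not the authors' code), the gating update and the selection probability for two experts might look like:

```python
import numpy as np

def gating_step(z, r, delta, A, rho=1.0, xi=0.05):
    """One gating-network step for two experts (Eqs. 4-6).

    z     : (2, N) weights from the input activities to the two gates
    r     : (N,) concatenated place-cell and sensory-input activities
    delta : (2,) reward prediction errors of the experts
    A     : (2,) action values proposed by the experts
    Returns the updated weights and the selection probabilities P(Phi = Phi_k).
    """
    g = z @ r                          # gating values, Eq. (4)
    c = np.exp(-rho * delta**2)        # per-expert confidence from prediction error
    h = g * c / np.sum(g * c)          # reliability target
    z = z + xi * np.outer(h - g, r)    # gate-weight update, Eq. (5)
    p = g * A / np.sum(g * A)          # selection probabilities, Eq. (6)
    return z, p
```

With equal gating values and action values in ratio 2:1, the first expert is selected two thirds of the time, matching the tie-breaking behaviour described in the text.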
3 Results
3.1 Learning across and within Sessions
Our model qualitatively reproduces the results obtained in animals (Fig. 3a).
As shown in Fig. 3b, both Control and Hippocampal groups are able to learn
the task, i.e., their escape latencies decrease with training. Moreover, the perfor-
mance of the Control group improves within each session, as there is a significant
decrement of the escape latency between the first and fourth trials (p<0.001).
Finally, as was the case with rats, escape latencies of the Hippocampal group
in the first trial are smaller than those of the Control group (p<0.001). Concerning the
Striatal group, Fig. 3c shows a significant improvement within session for this
group, but no learning is achieved across sessions, suggesting a key role of the
MFe in the performance improvement across sessions of the Control group.
3.2 Role of Interactions between the MFe and the MBe in the
Control Group
First trials: Increase of MFe Selection Across Sessions and Competition
Between the MFe and the MBe Within Trials
In the first trial of every session, the platform is relocated, so that the strategy
learned by the MBe in the previous session is no longer relevant. Accordingly,
the selection of the MFe expert near the current platform location increases
from the early to the late sessions (p<0.05), strongly suggesting a role of the MFe
in the latency decrease across sessions that occurs in the Control group (Fig. 4a).
Fig. 4a also shows that the MBe is often selected near the previous platform lo-
cation, suggesting the existence of a competition between both experts. MBe
Fig. 3. Mean escape latencies measured during the first and the fourth trial of each
session. (a) Results of the original experiment with rats, reproduced from [13]. (b)
Hippocampal group (MFe only) versus Control group (provided with both a MFe and
a MBe). (c) Striatal group (MBe only) versus Control group. See text for explanations.
Fig. 4. First trials: (a) Selection rates of MBe (empty boxes) and MFe (full boxes) near
the current and the previous platform in early (top) and late (bottom) sessions. (b)
Selection rates of MBe and current goal occupation within trial in early (top) and late
(bottom) sessions.
preference does not change within a trajectory, and the MBe is on average selected
less often than the MFe (Fig. 4b).
The trajectories (Fig. 5a and 5b) confirm the existence of a competition: the
MBe tends to lead the robot to the previous location of the platform – as shown
in the navigational maps of this expert (Fig. 5c and 5d) – whereas the MFe has
Fig. 5. First trials: (a) Trajectory of the robot for the 3rd session. (b) Trajectory of
the robot for the 9th session. (c) Navigational map of the MBe for the 3rd session. (d)
Navigational map of the MBe for the 9th session. (e) Navigational map of the MFe for
the 3rd session. (f) Navigational map of the MFe for the 9th session.
learned to orient the robot towards the appropriate direction, i.e., to the South
of the landmark (Fig. 5e and 5f). This result is consistent with the explanation
provided by Pearce and colleagues and shows that the competition between the
MBe and the MFe is mainly responsible for the poor performance of the Control
group in the first trials.
Fourth trials: Cooperation Between the MFe and the MBe Within
Trials. At the end of a session, the platform location has remained stable for
four trials, allowing the MBe to learn it. According to Pearce's hypothesis,
the rats' behaviour depends mainly on the map-based strategy (involving
the hippocampus), which has learned the platform location for this session.
However, simulation results show that the Striatal group, controlled by the MBe
only, is outperformed by both the Hippocampal and the Control groups, despite
a high improvement within sessions (cf. Fig. 3c). This suggests that the
performance of the Control group on the fourth trials cannot be explained exclu-
sively by the MBe expert. Indeed, although this expert leads the agent towards
the current goal position, it also leads to the previous goal location as illustrated
by its selection rate on both sites (Fig. 6a). In addition, selection rates within a
trajectory show a strategy change from the MFe –which is preferred at the be-
ginning of a trial– towards a preference for the MBe at the end of the trajectory
(Fig. 6b).
Fig. 6. Fourth trials: (a) Selection rates of MBe (empty boxes) and MFe (full boxes)
near the current and the previous platform in early (top) and late (bottom) sessions.
(b) Selection rates of MBe and current goal occupation within trial in early (top) and
late (bottom) sessions.
This sequence is visible in typical trajectories (Fig. 7a and 7b). The navigational
maps of each expert reveal that the MFe orients the robot towards the
South of the landmark (Fig. 7e and 7f), whereas the MBe leads it to the precise
location of the platform, but only when the robot is in its vicinity (Fig. 7c and 7d).
This suggests that the experts cooperate, each adequately participating
in the resolution of the task depending on its reliability at a specific
point of the journey. Our findings, pointing out a cooperative interaction at the
end of each session, extend Pearce's hypothesis of MBe dominance in behaviour
control.
Fig. 7. Fourth trials: (a) Trajectory of the robot for the 3rd session. (b) Trajectory of
the robot for the 11th session. (c) Navigational map of the MBe for the 3rd session. (d)
Navigational map of the MBe for the 11th session. (e) Navigational map of the MFe for
the 3rd session. (f) Navigational map of the MFe for the 11th session.
Fig. 8. Average heading error near the current platform for the three groups. Zero
means the expert is pointing at the platform; one means a difference of π. (a) Results
in the Control group (MBe and MFe activated). (b) Hippocampal group (MFe only). (c)
Striatal group (MBe only).
The MBe is able to reach every possible platform location, but only when the robot
is in its vicinity. This suggests that a cooperation between the MFe, leading the robot
to the neighbourhood of the current platform, and the MBe, finding the precise
location once the robot is there, would perform well and enhance the performance
of the robot. Therefore, this particular configuration of the MBe is impaired in the
Fig. 9. (a) Navigational map of the MBe in the Striatal group in the last session (fourth
trial): there are no centers of attraction at platform locations other than the current
and the previous ones.
(b) Navigational map of the MBe in the Hippocampal group in the last session (fourth
trial): the MBe has learned the ways to go to the four possible locations of the platform.
(c) Navigational map of the MFe in the Striatal group in the last session (fourth trial):
it has learned the same kind of map as in the Hippocampal and the Control groups.
(d) Navigational map of the MFe in the Hippocampal group in the last session (fourth
trial): the learned policy is very close to the one in the Striatal group.
case where the MBe should perform the trajectory alone, but enhanced in the case
of a cooperation with the MFe.
We observe that the behaviour of the robot when controlled by the MFe
strongly influences the MBe. In contrast, the MBe-based behaviour has less
influence on the improvement of the MFe strategy. Remarkably, activation of both
experts (i.e., the Control group) does not impair the simultaneous learning of both
strategies and allows the MBe to achieve better performance than when this
expert is the only one available.
4 Discussion
Fig. 10. (a) Trajectory at the fourth trial of the 7th session: as the simulated robot
mainly approached this platform from the South, directions to the North were reinforced,
even to the North of the platform.
(b) Trajectory at the first trial of the 8th session: starting from the North, the robot
then needs a longer trial to readjust its direction towards the current platform.
(c) Navigational map of the MFe at the fourth trial of the 7th session: directions to the
North were reinforced, even to the North of the platform.
(d) Navigational map of the MFe at the first trial of the 8th session.
taken into account. The place-response strategy currently used in the model
associates locations to actions that lead to a single goal location. Therefore,
when the platform is relocated, the strategy has to be relearned. An alternative
map-based strategy could be proposed in which the relations between different
locations are learned irrespective of the goal location (e.g. a topographical
map of the environment). Planning strategies can then be used to find the new goal
location without relearning [3]. The use of computational models of planning
(e.g. [20,21]) as the map-based strategy in our model could yield further insights
into the use of spatial information in these types of tasks.
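As an illustration of why a goal-independent map avoids relearning, the sketch below runs a breadth-first search over a hypothetical topological map; it is a toy example, not the planning models of the cited work:

```python
from collections import deque

def plan_path(adj, start, goal):
    """Breadth-first search on a topological map (nodes = places,
    edges = learned adjacencies). Relocating the goal only changes
    the query, not the learned map, so no relearning is needed."""
    frontier = deque([start])
    parent = {start: None}
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:       # walk back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                frontier.append(nxt)
    return None                            # goal unreachable
```

The same learned adjacency structure answers queries for any goal location, which is exactly the property the place-response strategy lacks.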
5 Conclusion
What stands out from our results is that our model allowed us to analyze the
selection changes between both learning systems, while providing information
that is not directly accessible in experiments with animals (e.g., strategy selection
rate, expert reliability). This information can be used to elaborate predictions
and to propose new experiments towards the two-fold goal of further improving
our models and expanding our knowledge of animal behaviour. Our results also
showed that opposing interactions can occur within a single experiment, depending
mainly on contextual contingencies and practice, as has been suggested by recent
work (e.g., [22,23]).
The coexistence of several spatial learning systems allows animals to dynamically
select the navigation strategy that is most appropriate to achieve their
behavioural goals. Furthermore, interaction among these systems may improve
performance, either by speeding up learning through the collaboration of different
strategies, or through competitive processes that prevent sub-optimal strategies
from being applied. Moreover, a better understanding of these interactions in
animals through the modelling approach described in this paper also contributes
to the improvement of autonomous robot navigation systems. Indeed, several
bio-inspired studies have begun exploring the robotic use of multiple navigation
strategies [12,24,25,26]; the topic, however, is far from being fully explored.
Acknowledgment
References
1. Trullier, O., Wiener, S.I., Berthoz, A., Meyer, J.A.: Biologically-based artificial
navigation systems: review and prospects. Progress in Neurobiology 83(3), 271–
285 (1997)
2. Filliat, D., Meyer, J.A.: Map-based navigation in mobile robots - i. a review of
localisation strategies. Journal of Cognitive Systems Research 4(4), 243–282 (2003)
3. Meyer, J.A., Filliat, D.: Map-based navigation in mobile robots - ii. a review of map-
learning and path-planning strategies. Journal of Cognitive Systems Research 4(4),
283–317 (2003)
4. Arleo, A., Rondi-Reig, L.: Multimodal sensory integration and concurrent navi-
gation strategies for spatial cognition in real and artificial organisms. Journal of
Integrative Neuroscience 6, 327–366 (2007)
5. Packard, M., McGaugh, J.: Double dissociation of fornix and caudate nucleus le-
sions on acquisition of two water maze tasks: Further evidence for multiple memory
systems. Behavioral Neuroscience 106(3), 439–446 (1992)
6. White, N., McDonald, R.: Multiple parallel memory systems in the brain of the
rat. Neurobiology of Learning and Memory 77, 125–184 (2002)
7. Kim, J., Baxter, M.: Multiple brain-memory systems: The whole does not equal
the sum of its parts. Trends in Neurosciences 24(6), 324–330 (2001)
8. Poldrack, R., Packard, M.: Competition among multiple memory systems: Con-
verging evidence from animal and human brain studies. Neuropsychologia 41(3),
245–251 (2003)
9. McIntyre, C., Marriott, L., Gold, P.: Patterns of brain acetylcholine release pre-
dict individual differences in preferred learning strategies in rats. Neurobiology of
Learning and Memory 79(2), 177–183 (2003)
10. McDonald, R., Devan, B., Hong, N.: Multiple memory systems: The power of
interactions. Neurobiology of Learning and Memory 82(3), 333–346 (2004)
11. Hartley, T., Burgess, N.: Complementary memory systems: Competition, cooper-
ation and compensation. Trends in Neurosciences 28(4), 169–170 (2005)
12. Chavarriaga, R., Strosslin, T., Sheynikhovich, D., Gerstner, W.: A computational
model of parallel navigation systems in rodents. Neuroinformatics 3(3), 223–242
(2005)
13. Pearce, J., Roberts, A., Good, M.: Hippocampal lesions disrupt navigation based
on cognitive maps but not heading vectors. Nature 396(6706), 75–77 (1998)
14. Morris, R.: Spatial localisation does not require the presence of local cues. Learning
and Motivation 12, 239–260 (1981)
15. Doeller, C.F., King, J.A., Burgess, N.: Parallel striatal and hippocampal systems
for landmarks and boundaries in spatial memory. Proceedings of the National
Academy of Sciences of the United States of America 105(15), 5915–5920 (2008)
16. Arleo, A., Gerstner, W.: Spatial cognition and neuro-mimetic navigation: A model
of hippocampal place cell activity. Biological Cybernetics 83(3), 287–299 (2000)
17. Strösslin, T., Sheynikhovich, D., Chavarriaga, R., Gerstner, W.: Robust self-
localisation and navigation based on hippocampal place cells. Neural Net-
works 18(9), 1125–1140 (2005)
18. Devan, B., White, N.: Parallel information processing in the dorsal striatum: Re-
lation to hippocampal function. Neural Computation 19(7), 2789–2798 (1999)
19. Hamilton, D., Rosenfelt, C., Whishaw, I.: Sequential control of navigation by locale
and taxon cues in the morris water task. Behavioural Brain Research 154(2), 385–
397 (2004)
20. Martinet, L.E., Passot, J.B., Fouque, B., Meyer, J.A., Arleo, A.: Map-based spatial
navigation: A cortical column model for action planning. In: Spatial Cognition (in
press, 2008)
21. Filliat, D., Meyer, J.: Global localization and topological map-learning for robot
navigation. In: Proceedings of the Seventh International Conference on Simulation
of Adaptive Behavior (From Animals to Animats 7), pp. 131–140 (2002)
22. Pych, J., Chang, Q., Colon-Rivera, C., Haag, R., Gold, P.: Acetylcholine release
in the hippocampus and striatum during place and response training. Learning &
Memory 12(6), 564–572 (2005)
23. Martel, G., Blanchard, J., Mons, N., Gastambide, F., Micheau, J., Guillou, J.: Dy-
namic interplays between memory systems depend on practice: The hippocampus
is not always the first to provide solution. Neuroscience 150(4), 743–753 (2007)
24. Meyer, J., Guillot, A., Girard, B., Khamassi, M., Pirim, P., Berthoz, A.: The
Psikharpax project: Towards building an artificial rat. Robotics and Autonomous
Systems 50(4), 211–223 (2005)
25. Guazzelli, A., Corbacho, F.J., Bota, M., Arbib, M.A.: Affordances, motivations,
and the world graph theory, pp. 435–471. MIT Press, Cambridge (1998)
26. Girard, B., Filliat, D., Meyer, J.A., Berthoz, A., Guillot, A.: Integration of nav-
igation and action selection in a computational model of cortico-basal ganglia-
thalamo-cortical loops. Adaptive Behavior 13(2), 115–130 (2005)
A Minimalistic Model of Visually Guided
Obstacle Avoidance and Path Selection Behavior
1 Introduction
Selecting a path to approach a goal while avoiding obstacles is a fundamental spa-
tial behavior. Surprisingly few studies have investigated the underlying mechanisms
and strategies in animals or humans (but see [1,2]). In the robotics community,
in contrast, obstacle avoidance and path selection is an active field of research and
several models have been developed (for an overview see [3,4]). These models
usually require rich spatial information: for example, the distances and direc-
tions to the goal and the obstacles have to be known and often a 2d map of
the environment has to be generated to select a trajectory to the goal. We
believe that in many situations successful navigation behavior can also be achieved
using very sparse spatial information obtained directly from vision, without map-
like representations of space. In this article, we present a series of minimalistic
visually guided models that closely predict empirical results on path selection
and obstacle avoidance behavior in rats.
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 87–103, 2008.
c Springer-Verlag Berlin Heidelberg 2008
88 L. Gerstmayr, H.A. Mallot, and J.M. Wiener
Obstacle avoidance methods related to the models proposed in the
following can be divided into two main categories: the first group models the goal as
an attractor, whereas each obstacle is modelled as a repellor. Thus, the position of
each obstacle has to be known and the model’s complexity depends on the number
of obstacles. This group of methods is influenced by potential field methods [3,4]
which treat the robot as a particle moving in a vector field. The combination of
attractive and repulsive forces can be used to guide the agent towards the goal
while avoiding obstacles. Potential fields suffer from several limitations: the agent can
get trapped in local minima, lateral obstacles can have a large influence on the
agent’s path towards the goal, and the approach predicts oscillating trajectories
in narrow passages [5]. Several improvements of the original method have been
proposed to overcome these drawbacks [6]. Potential fields have also been used to
model prey-approaching behavior in toads [7].
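The attractive/repulsive scheme can be sketched as a simple gradient step; the parameter values and the particular repulsion profile below are illustrative assumptions, not the formulation of the cited methods:

```python
import numpy as np

def potential_step(pos, goal, obstacles, k_att=1.0, k_rep=200.0, d0=20.0):
    """One unit-length step of a basic attractive/repulsive potential field.
    Obstacles repel only within an influence distance d0 (cm)."""
    pos, goal = np.asarray(pos, float), np.asarray(goal, float)
    force = k_att * (goal - pos)                       # attraction towards the goal
    for obs in obstacles:
        diff = pos - np.asarray(obs, float)
        d = np.linalg.norm(diff)
        if 0 < d < d0:
            # repulsion grows steeply as the obstacle gets closer
            force += k_rep * (1.0 / d - 1.0 / d0) / d**2 * (diff / d)
    step = force / max(np.linalg.norm(force), 1e-9)    # normalize to unit step
    return pos + step
```

With no obstacles, each step moves the agent straight towards the goal; placing an obstacle on the direct line deflects the path, and a symmetric arrangement can produce exactly the local-minimum trap mentioned above.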
The task of goal approaching and obstacle avoidance can also be formulated as
a dynamical system [8]. The movement decision is generated by solving a system
of differential equations. Again, the goal is represented as an attractor, whereas each
obstacle is modelled as a repellor. The model has been used to explain data obtained
in human path selection experiments [2]. In this model, route selection emerges
from on-line steering rather than from explicit path planning. In comparison
with the potential field method, the dynamical approach predicts smoother paths
and does not get trapped in local minima. A further extension of the model was
tested in real robot experiments [6].
The second class of obstacle avoidance methods relies only on distance information
at the agent's current position and does not assume that the exact position
of each obstacle is known. The family of vector-field histogram (VFH) methods
[9,10,11] uses an occupancy grid as a representation of the agent's environment.
In a first processing step, obstacle information is condensed into a 1d polar
histogram. In this representation, candidate corridors are identified and the corridor
closest to the goal direction is selected. The VFH+ method additionally
considers the robot's dynamics: corridors which cannot be reached due to the
robot's movement constraints are rejected [10]. The VFH* method incorporates
a look-ahead verification based on a map of the agent's environment to prevent
trap situations due to dead ends [11]. To determine the movement decision, it
takes the consequences of candidate movement decisions into account.
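The corridor-selection step of the basic VFH idea can be sketched as follows; the threshold, minimum corridor width, and the omission of the 0°/360° wrap-around are simplifying assumptions of this sketch:

```python
import numpy as np

def select_corridor(hist, goal_dir, threshold=0.5, min_width=3):
    """Pick a heading from a 1d polar obstacle histogram (VFH-style):
    find free corridors (runs of bins below threshold) and return the
    centre of the corridor closest to the goal direction, or None.

    hist     : (360,) obstacle densities, one bin per degree
    goal_dir : desired heading in degrees
    """
    free = hist < threshold
    n = len(hist)
    best, best_err = None, np.inf
    start = None
    for i in range(n + 1):                 # extra step closes a trailing corridor
        if i < n and free[i]:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_width:     # wide enough to pass through
                centre = (start + i - 1) / 2.0
                err = abs((centre - goal_dir + 180) % 360 - 180)
                if err < best_err:
                    best, best_err = centre, err
            start = None
    return best
```

For a histogram that is blocked everywhere except a 21-degree opening around 90°, the function returns the centre of that opening.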
In the following, we present an exploratory empirical study investigating ob-
stacle avoidance and path selection behavior in rats (Sec. 2). We then (Sec. 3)
present a series of minimalistic visually guided models of obstacle avoidance and
path selection that could account for the empirical data. In contrast to the mod-
els introduced above, the proposed models (1) act purely on visual input, (2) do
not require map-like representations of space, and (3) the most basic 1d model
does not require any distance information to generate obstacle avoidance and
path selection behavior. Based on our findings, we finally conclude (Sec. 4) that
reliable obstacle avoidance is possible with such minimalistic models and that
our models implicitly solve the path-planning problem.
In this part we present an exploratory study examining path selection and
obstacle avoidance behavior in rats. For this, animals were trained to receive a food
reward at a landmark visible from the entire experimental arena. Rats were
repeatedly released into this arena, and their paths approaching this landmark
were recorded. Placing obstacles between the start position and the landmark
allowed us to systematically investigate how rats reacted to the obstacles during
target approach. The behavioral results from this experiment are the basis for
the development of a series of visually guided models of obstacle avoidance and
path selection behavior in the second part of this manuscript.
Apparatus. The apparatus consisted of an open area (140 × 140 cm), separated
from the remaining laboratory by white barriers (height 40 cm) and surrounded
by a black curtain (see Fig. 1). Within this area up to 6 obstacles (brown 0.5 l
bottles) were distributed. Food was available from a small feeder that was placed
directly under a black-white striped cylinder (25 cm diameter, 80 cm in height).
The cylinder was suspended from the ceiling about 40 cm above the ground and
was visible in the entire arena. A transparent start box was placed in one of
the corners of the arena. At the beginning of each trial, rats were released by
opening the door of the start box. Their trajectories were recorded by a tracking
system registering the position of a small reflector foil that was attached to a
soft leather harness the animals were wearing (sampling rate: 50 Hz).
Procedure. Prior to the experiments, the animals were familiarized with the
experimental setup by repeatedly placing them in the arena, allowing them to
explore it for 180 sec. The obstacles, feeder and landmark, and the start box were
randomly repositioned within the arena on a daily basis. After familiarization,
the animals were trained to receive cocoa flavored cereal (Kellogg’s) that was
located in the feeder under the landmark. For each trial, the feeder (incl. land-
mark) as well as the obstacles were randomly distributed in the arena. Before
each trial, the rats were placed in the start box and released after 15 seconds.
A trial lasted until food reward was found or until 600 seconds passed. Animals
were given 4 training-trials each day for a period of 10 days. In addition to the
food reward during the experiments, the rats received 15g rodent food pellets
per day.
The procedure during the test phase was identical to the training phase, but
animals received 8 trials per day for a given test-configuration of obstacles and
feeder. For each trial, rats’ trajectories were recorded until food reward was ob-
tained. Each day, the release box was randomly repositioned in one of the corners
of the arena. The positions of feeder and obstacles were adjusted accordingly.
Each rat was tested in each of the 20 test-configurations (see Figs. 2, 3, and 7).
The rats were subdivided into 2 groups of 3 animals each that were exposed
to the test-configurations in a different order.
2.2 Results
Figs. 2, 3, and 7 display the entire set of trajectories for all configurations as
well as the relative frequencies for passing the obstacles on the left side or the
right side, or passing through the gap. It is apparent from these figures that the
rats adapted their path selection behavior to the specific configuration.
In the following, we present a detailed analysis of how the configurations influenced
path selection behavior. Of specific interest are the questions of whether
animals minimized path length, how animals reacted to gaps of different sizes,
and whether animals built up individual preferences to pass obstacle configurations
on the left or right side. Finally, we extract motion parameters from the
recorded trajectories that will be used in the second part, in which we present a
visually guided model of path selection and obstacle avoidance behavior.
Fig. 2. Asymmetric configurations: rats' chosen trajectories are displayed in the upper
row, the predictions of the 1d-model (see Sec. 3) are displayed below. The black and
gray horizontal bars depict the animals' (upper) or the model's (lower) behavior with
respect to passing the obstacles on the left (black) or the right (light gray) side.
Gap size. In configurations 13 to 16 the obstacles were arranged such that they
created a gap (see Fig. 3). The width of the gap was either 32 cm (configurations
13 and 14) or 14 cm (configurations 15 and 16). Rats’ behavior in choosing the
path through the gap depended on the width of the gap. In configurations 13
and 14 (wide gap) they ran through the gap in 83.76 % of the runs as compared
to 36.20 % of the runs for the configurations 15 and 16 (narrow gap; t-test:
t(5)=3.00, p=0.03).
Fig. 3. Symmetric configurations: rats' chosen trajectories are displayed in the upper
row, the predictions of the 1d-model (see Sec. 3) are displayed below. The gray shaded
horizontal bars depict the animals' (upper) or the model's (lower) behavior with respect
to passing the obstacles on the left (black) or the right (light gray) side, or passing
through the gap (middle gray).
Fig. 4. Left: histogram of rats’ distance covered between 2 subsequent samples; right:
histogram of change in orientation between 2 subsequent samples. The vertical gray
lines mark the mean distance and the mean orientation change, respectively.
2.3 Discussion
In this part of the work we presented an exploratory study examining rats' path
selection and obstacle avoidance behavior. Animals were released from a start
box into an open arena with a number of obstacles and a feeder marked by a
large landmark. Rats avoided the obstacles and approached the feeder quickly and
efficiently. In fact, in over 75 % of the trials in which path alternatives differed
in length, the animals showed a preference for the shorter alternative. These
empirical results demonstrate that the animals reacted to the specific target
configurations. The fact that animals minimized path length is remarkable to
some extent, as the additional energy expenditure when taking detours or sub-
optimal paths is estimated to be rather small in this scenario. Nevertheless, the
animals did not adopt a general strategy that could account for the entire set
of configurations, such as moving towards and along the walls, but they decided
on the exact path on a trial by trial basis. It has to be noted, however, that for
the symmetric configurations (see Fig. 3) rats built up rather strong individual
preferences to pass obstacles on the right or the left side. Such preferences can
be explained by motor programs. Rats are well-known to develop stereotyped
locomotory behavior when repeatedly exposed to the same situation (for an
overview see [12]), such as being released from the start box. In other words, the
animals made movement decisions already in the start box that were independent
of the specific obstacle configuration. However, at some point on their trajectory,
animals reacted to the configuration. Otherwise no variance in behavior would
have been observed.
94 L. Gerstmayr, H.A. Mallot, and J.M. Wiener
Visual input. As input, our models use a panoramic image with a horizontal field
of view of 360°. The vertical field of view covers 90° below the horizon. The rats'
visual field above the horizon is neglected as it does not (at least for our setup)
contain information necessary for obstacle avoidance. The angular resolution
of the images is 1◦ per pixel. Images are generated by a simple raycaster assuming
that the rats’ eye level is 5 cm above the ground plane.
The process of image generation is sketched in Fig. 5. For each direction of
sight, the raycaster computes the distance from the current robot position to the
corresponding obstacle (modeled as cylinders, black pixels) or the walls of the
arena (white pixels). Since the object's distance to the agent is directly linked
to the elevation under which it is imaged, a 2d view of the environment can be
computed (Fig. 5, middle). Close-by objects are imaged both larger and under
a larger elevation than distant objects. Based on the 2d images, 1d images can
be obtained by taking slices of constant elevation ε below the horizon (gray
horizontal lines) out of the 2d image. Depending on the elevation of the slice,
the resulting 1d view only contains obstacle information up to a certain distance
(Fig. 5, right). If the slice is taken along the horizon (top right), even objects
at a very large distance are imaged. In this case, no depth cue is available
because we do not analyze the angular extent of the obstacles and we do not
compute optical flow between two consecutive input images.
Since the input images only contain visual information about the obstacles, the
goal direction (Fig. 5, vertical dark gray line) w.r.t. the agent’s current heading
direction (vertical light gray line) is provided as another input parameter.
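The image-generation step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the cylinder-footprint representation, and the binary black/white encoding of obstacle pixels are our choices.

```python
import numpy as np

def render_1d_view(agent_xy, obstacles, eye_height=5.0, slice_elev_deg=30.0,
                   resolution_deg=1.0):
    """Render a binary panoramic 1d view: True where an obstacle is visible
    in the slice taken at slice_elev_deg below the horizon.

    obstacles: list of (x, y, radius) cylinder footprints (cm).
    A slice at elevation e below the horizon hits the ground plane at
    d_max = eye_height / tan(e); obstacles farther away do not appear in it.
    """
    azimuths = np.deg2rad(np.arange(-180.0, 180.0, resolution_deg))
    d_max = eye_height / np.tan(np.deg2rad(slice_elev_deg))
    view = np.zeros_like(azimuths, dtype=bool)
    ax, ay = agent_xy
    for ox, oy, r in obstacles:
        dx, dy = ox - ax, oy - ay
        dist = np.hypot(dx, dy)
        if dist - r > d_max:      # beyond the slice's range: not imaged
            continue
        center = np.arctan2(dy, dx)                          # obstacle azimuth
        half_width = np.arcsin(min(1.0, r / max(dist, r)))   # angular radius
        delta = np.angle(np.exp(1j * (azimuths - center)))   # wrapped difference
        view |= np.abs(delta) <= half_width
    return view  # True = obstacle pixel (black), False = free (white)
```

A view taken closer to the horizon (smaller elevation) yields a larger d_max, so more distant obstacles enter the 1d image, in line with the description above.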
Fig. 5. Image generation. For detailed explanations see the text; for visualization pur-
poses, the 1d views are stretched in the y-direction.
classify whether image regions lie in the ground plane or not (for reviews see
[13,14]). The ground plane could then be assumed to be free space, whereas
other image regions would be interpreted as obstacles.
Assuming an image preprocessing step is also reasonable from the viewpoint of
visual information processing in the brain: lower processing stages usually trans-
form the visual information into a representation which facilitates further process-
ing by higher-level stages [15]. As the goal direction could be derived in earlier
stages, we think it is reasonable to pass it as input parameter to our models.
The behavior is modeled in discrete time steps, each time step corresponding
to one sampling cycle of the tracker used. The models also neglect dynamic
aspects of the moving rats. The simulated agents move with a constant
velocity of 2.1 cm per time step and a maximum turning rate of ±9.25° per time
step (compare Sec. 2.2). By limiting the maximum turning rate, aspects of the
agent's kinematics are at least partially considered [10], though the simplifications
could complicate a real robot implementation.
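These movement constraints translate directly into a per-step update rule. The following is a minimal sketch; the function name and the degree-based representation are our choices:

```python
import math

def step(x, y, heading_deg, desired_heading_deg,
         speed=2.1, max_turn_deg=9.25):
    """One discrete time step (one tracker sampling cycle): clamp the turn
    to the maximum turning rate, then move forward at constant speed."""
    # signed angular difference, wrapped into (-180, 180]
    diff = (desired_heading_deg - heading_deg + 180.0) % 360.0 - 180.0
    turn = max(-max_turn_deg, min(max_turn_deg, diff))
    heading_deg += turn
    x += speed * math.cos(math.radians(heading_deg))
    y += speed * math.sin(math.radians(heading_deg))
    return x, y, heading_deg
```

The clamping of the turn is what makes sharp evasive maneuvers impossible and causes the occasional collisions discussed in Sec. 3.2.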
Fig. 6. Left: sketch of the 1d model. For explanations see the description above. Right:
optimization residuals E depending on the enlargement parameter δ. For details see
the section below.
the left side, the right side or through the middle. Depending on the configuration
c and the enlargement δ, a vector

    hsim(c, δ) = (hL(c, δ), hM(c, δ), hR(c, δ))        (1)

of relative frequencies for passing on the left side, on the right side, or passing
through the middle is computed. In order to determine the optimal value of δ, the
following dissimilarity measure

    E(δ) = Σ_{c=1}^{20} SSD(hsim(c, δ), hrat(c))       (2)

was minimized. The measure computes the sum of squared differences (SSD)
between the vectors of relative frequencies hsim and hrat for the simulation and
the rats’ data, respectively. The best fit (Fig. 6, right) was obtained for δ = 6◦
with an optimization residual of E = 0.989. The resulting trajectories are shown
in Fig(s). 2, 3, and 7; the configurations depicted in Fig. 7 were solely used for
adjusting the model parameters.
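The fit of Eq. (2) amounts to a one-dimensional search over δ. A sketch of such a fit is shown below; the grid search and the dictionary interface are our assumptions, as the paper does not state which optimizer was used:

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two frequency vectors."""
    return float(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def fit_delta(simulate, h_rat, deltas):
    """Grid search for the obstacle-enlargement parameter delta.

    simulate(c, delta) -> (h_L, h_M, h_R) relative frequencies for config c;
    h_rat: dict mapping config -> observed (h_L, h_M, h_R).
    Returns (best_delta, residual E), cf. Eq. (2)."""
    best = (None, float("inf"))
    for d in deltas:
        E = sum(ssd(simulate(c, d), h_rat[c]) for c in h_rat)
        if E < best[1]:
            best = (d, E)
    return best
```

With a residual surface as smooth as the one in Fig. 6 (right), a coarse grid over δ suffices to locate the minimum near δ = 6°.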
[Fig. 7: configurations 17 to 20]
Fig. 7. Further configurations used only for the parameter optimization. Rats' chosen
trajectories are displayed in the upper row, the predictions of the 1d-model (see Sec. 3)
are displayed below. The gray shaded horizontal bars depict the animals' (upper) or
the models' (lower) behavior with respect to passing the obstacles on the left (black)
or the right (light gray) side, or passing through the gap (middle gray).
Correlation between simulation and behavioral data. To assess how well the
model fits the behavioral data, we correlated the relative frequencies hrat(c) and
hsim(c) (for 1 ≤ c ≤ 20). Of the 9 possible combinations, the correlations
rL,L = 0.919, rM,M = 0.947, and rR,R = 0.935 are the most relevant for our
purposes; the mixed combinations all exhibit negative correlations around −0.45.
However, this analysis does not distinguish between configurations with and
without a gap: the correlation rM,M is influenced because hM = 0 is assumed
for all configurations without a gap. To overcome this drawback, we separately
correlated the relative frequencies for the configurations with and without a gap.
For the first class (configurations 13 to 20) we obtained correlations rL,L = 0.816,
rM,M = 0.943, and rR,R = 0.969; for the second class (configurations 1 to 12)
we obtained rL,L = rR,R = 0.934 (as for these configurations hR equals 1 − hL ,
the correlations rL,L and rR,R are identical).
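The analysis above is a plain Pearson correlation between the simulated and observed frequency vectors. Note that for configurations without a gap, hM is constant (zero), so correlations involving it are undefined for that class; a self-contained sketch makes this explicit:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns NaN for a constant series
    (e.g. h_M = 0 for every configuration without a gap, as noted above)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)
```

Splitting the configurations into the gap and no-gap classes before correlating, as done above, avoids inflating rM,M with the constant zeros of the no-gap class.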
Trap situations. In order to test our model with obstacle configurations other
than those tested in the behavioral experiments, we performed tests (Fig. 8)
with cluttered environments (configurations 1, 2), a U-shaped obstacle configu-
ration (3), and a configuration for which the agent is completely surrounded by
obstacles (4). For each test run, the agents were initialized at the same start
position; the initial orientation was varied in steps of 15°. After initialization,
the simulated agent was moved forward for one step. Afterwards, the model was
applied to predict the agent’s path.
The results for configurations 1 and 2 show that almost every simulated tra-
jectory reaches the goal position. Some of the paths are not as short as possible
because the model also tries to avoid distant obstacles which do not directly block
the agent's way towards the goal. In the cases where the agent hit an obstacle, it
could not turn fast enough to avoid it due to the limited turning rate. Our 1d model
is also able to reach the goal in test configuration 3. This is a test situation for
which many obstacle avoidance methods relying on depth information (e.g. poten-
tial field approaches) fail due to local minima [5]. Our model fails for condition 4:
in this case, no movement decision can be derived because the agent is completely
surrounded by obstacles (resulting in a completely black 1d view).
[Fig. 8: test configurations 1 to 4]
3.3 Discussion
Our model is capable of producing smooth trajectories that reach the goal position
without crashing into the obstacles. Since our model does not contain any noise,
the simulated trajectories look much smoother than the rats' trajectories. Com-
paring whether the agent passed on the left side, on the right side, or through
the gap with the corresponding behavioral data reveals that the model
covers several aspects we outlined in Sec. 2.2. These aspects will be discussed in
the following paragraphs in more detail; afterwards, we outline the limitations
of the 1d model.
A Minimalistic Model of Visually Guided Obstacle Avoidance 99
Gap size. For the behavioral data we observed that the rats more frequently pass
through larger gaps. Comparing configurations 13 to 16 (Figure 3) reveals that
all simulated trajectories pass through the gap if the gap is large (in comparison
to 83.76% for the rats’ trajectories). In case the gap is small, only 15.71% of the
simulated trajectories pass through the gap (rats: 36.20%).
Model limitations. Although the model is capable of reproducing the results ob-
tained from the behavioral experiments, the comparison between the simulated
and the rats' trajectories reveals several aspects which are due to the lack of
depth information in our model: (1) the model seems to react earlier to obsta-
cles than the rats, (2) the simulated trajectories pass closer to obstacles than the
rats' trajectories, and (3) our model cannot solve the trap configuration 4, which
rats could definitely solve. The latter aspect is due to neglecting the
agent's dynamics in our simulation.
(2) Distance while passing obstacles. Comparing the model's and the rats' tra-
jectories also reveals that the simulated agent passes closer to obstacles than
the rats. This can also be explained with the lack of depth information: indepen-
dent of the distance to the obstacle, the obstacle is enlarged by δ. If the agent
is far away from the obstacle, the enlargement δ is large compared to the size of
the obstacles. In case the agent is close to the obstacle, this enlargement is small
compared to the size of the obstacles imaged on the agent’s retina. For this rea-
son, the agent passes very close to obstacles. For larger δ, gaps between obstacles
get closed due to obstacle growing. For these cases, the agent could no longer
pass through gaps. These model properties could be avoided by introducing an
enlargement mechanism which depends on the distance to the obstacle.
1.5d model. Except for the input image, our 1.5d model is identical to the 1d
model described above. In contrast to our 1d model, the 1.5d model uses a view
taken out of the 2d image at a constant elevation ε > 0 (Figure 5, right). Hence,
it only takes obstacles up to a certain distance from the agent's current position
into account. Monocular vision as a cue for depth information has received attention
in the context of robot soccer. A method called “visual sonar” [16,14] searches
along radial scan lines in the camera image. In case an obstacle is encountered
along the scan line, its distance can be computed. This information can then be
used for further navigation capabilities such as obstacle avoidance, path planning
or mapping. Like the proposed 1.5d model, the “visual sonar” relies on elevation
as a cue for depth information [17]. This depth cue can also be used by frogs
and humans [18,19].
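The elevation-as-depth cue can be made explicit: on a flat ground plane, an obstacle whose base is imaged at elevation ε below the horizon lies at ground distance d = h / tan(ε), where h is the eye height (5 cm in our setup). A minimal sketch, with the function name our own:

```python
import math

def distance_from_elevation(elev_deg, eye_height=5.0):
    """Recover ground distance from the elevation angle (in degrees) below
    the horizon at which an obstacle's base is imaged: d = h / tan(elev).
    Assumes a flat ground plane and obstacles resting on it."""
    return eye_height / math.tan(math.radians(elev_deg))
```

For the ε = 30° slice used below, this gives a maximal sensing range of about 8.7 cm at an eye height of 5 cm.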
Fig. 9 visualizes the results obtained for testing the 1.5d model with the trap
situations described in Sec. 3.2. For the experiments, horizontal views taken
at ε = 30° were used. For the cluttered environments (configurations 1 and
2), the model predicts paths which are shorter than the paths predicted by
the 1d model. Since the 1.5d model does not consider distant obstacles, there
are situations in which the 1.5d model approaches the goal, whereas the 1d
model avoids distant obstacles. Hence, the 1.5d model is able to predict shorter
paths. For test configuration 3, our model suffers from the same problems as many
reactive obstacle avoidance methods incorporating depth information: due to
local minima, the simulated agents head towards the goal. When the obstacle
in front of the agent comes into sight, it starts to avoid the obstacle. However,
it then cannot reach the goal position any more. Related work tries to solve
this problem by map-building and look-ahead path-planning algorithms [11].
Fig. 10. Left: sketch of the 2d model. Right: trap situations for the 2d model.
Since the model incorporates depth information, it can solve test condition
4, at least if the initial orientation points towards the open side of the U-shaped
obstacle. Due to the restricted movement parameters, the model cannot turn
fast enough and hits the obstacles for other initial orientations.
2d model. The 2d model (Fig. 10) we are currently working on uses a 2d view
of the environment as shown in Fig. 5. For a set of n horizontal directions of
sight ϕi (1 ≤ i ≤ n), the distance di towards the visible obstacle is computed
based on the elevation under which the obstacle is imaged. By this step, the
2d image information is reduced to a 1d depth profile. At each direction of sight
ϕi , a periodic and unimodal curve (comparable to the von Mises distribution) is
placed. The curve’s height is weighted by the inverse of di . By summing over all
the von Mises curves, a repelling profile is computed. Goal attraction is modeled
by an attracting profile with a minimum at the goal direction. Both profiles are
summed up and a minimization process searches the profile’s minimum in the
range ±ρ around the agent’s heading direction. The direction of the minimum α
is used as movement direction. The polar obstacle representation is recomputed
in each iteration and is not updated from step to step.
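The 2d model's direction choice can be sketched as follows. The grid resolution, the von Mises concentration κ, the cosine-shaped attracting profile, and the goal weighting are our assumptions; the text specifies only the qualitative construction (repelling bumps weighted by 1/d_i, an attracting profile with its minimum at the goal direction, and a search within ±ρ of the current heading):

```python
import numpy as np

def choose_direction(phis_deg, dists, goal_deg, heading_deg,
                     rho_deg=30.0, kappa=8.0, w_goal=1.0):
    """2d-model sketch: sum von-Mises-shaped repelling bumps (height ~ 1/d)
    over all viewing directions with a visible obstacle, add an attracting
    profile with its minimum at the goal direction, and return the minimum
    of the combined profile within +/- rho of the current heading.
    Assumes all distances d > 0."""
    grid = np.arange(-180.0, 180.0, 1.0)          # candidate directions (deg)
    rad = np.deg2rad
    repel = np.zeros_like(grid)
    for phi, d in zip(phis_deg, dists):
        repel += np.exp(kappa * np.cos(rad(grid - phi))) / d  # von Mises bump
    attract = -w_goal * np.cos(rad(grid - goal_deg))          # minimum at goal
    profile = repel + attract
    # restrict the search to +/- rho around the current heading
    diff = np.abs((grid - heading_deg + 180.0) % 360.0 - 180.0)
    profile[diff > rho_deg] = np.inf
    return grid[int(np.argmin(profile))]
```

Without obstacles the minimum of the combined profile lies in the goal direction; a nearby obstacle in that direction pushes the minimum away from it, producing the avoidance behavior described above.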
Fig. 10 also visualizes the model’s trajectories obtained for trap configurations
3 and 4. Although the agent gets trapped in configuration 3, many more test
trials than for the 1.5d model successfully reach the goal. Since the trajectories
were simulated with a relatively large σ, objects are passed at a comparatively
large distance. We are currently working on improving the distance weighting
as well as the interplay between repelling and attractive profiles. By this means,
we expect that our 2d model performs better than the other models.
4 Conclusion
In this work we presented an exploratory study examining obstacle avoidance
and path selection behavior in rats and a minimalistic visually-guided model
that could account for the empirical data. The particular appeal of the model is
its simplicity, neither requiring map-like representations of the goal and obsta-
cles nor does it incorporate depth information. These results demonstrate that
reliable obstacle avoidance can be achieved with only two basic building blocks:
(1) the ability to approach the goal and (2) the ability to detect if the course
towards the goal is blocked by an obstacle and to avoid the obstacle. While the
proposed basic 1d-model is capable of reproducing the results of the behavioral ex-
periment described in Sec. 2, a detailed comparison of the simulated trajectories
with the empirical data suggests that the rats probably used depth information.
This can be concluded from the fact that rats seem to react to obstacles
only once they are within a certain distance, and that rats passed by obstacles
at a comparatively large distance. Both of these aspects cannot be reproduced
by our 1d model. In order to explain these findings, we have presented first ideas
(1.5d and 2d model) of how depth information can be integrated in our model
in a sparse and biologically inspired fashion.
References
1. Fajen, B., Warren, W.: Behavioral dynamics of steering, obstacle avoidance, and
route selection. Journal of Experimental Psychology: Human Perception and
Performance 29(2), 343–362 (2003)
2. Fajen, B., Warren, W., Temizer, S., Kaelbling, L.P.: A dynamical model of visually-
guided steering, obstacle avoidance, and route selection. International Journal of
Computer Vision 54(1–3), 13–34 (2003)
3. Choset, H., Lynch, K., Hutchinson, S., Kantor, G., Burgard, W., Kavraki, L.,
Thrun, S.: Principles of Robot Motion. MIT Press, Cambridge (2005)
4. Siegwart, R., Nourbakhsh, I.: Introduction to Autonomous Mobile Robots. MIT
Press, Cambridge (2004)
5. Koren, Y., Borenstein, J.: Potential field methods and their inherent limitations
for mobile robot navigation. In: Proceedings of the IEEE Conference on Robotics
and Automation, pp. 1398–1404 (1991)
6. Huang, W., Fajen, B., Finka, J., Warren, W.: Visual navigation and obstacle avoid-
ance using a steering potential function. Robotics and Autonomous Systems 54(4),
288–299 (2006)
7. Arbib, M., House, D.: Depth and Detours: An Essay on Visually Guided Behav-
ior. In: Vision, Brain, and Cooperative Computations, pp. 129–163. MIT Press,
Cambridge (1987)
8. Schöner, G., Dose, M., Engels, C.: Dynamics of behavior: Theory and applications
for autonomous robot architectures. Robotics and Autonomous Systems 16(2–4),
213–245 (1995)
9. Borenstein, J., Koren, Y.: The vector field histogram – fast obstacle avoidance for
mobile robots. IEEE Journal of Robotics and Automation 7(3), 278–288 (1991)
10. Ulrich, I., Borenstein, J.: VFH+: Reliable obstacle avoidance for fast mobile robots.
In: Proceedings of the IEEE Conference on Robotics and Automation (1998)
11. Ulrich, I., Borenstein, J.: VFH*: Local obstacle avoidance with look-ahead verifica-
tion. In: Proceedings of the International Conference on Robotics and Automation
(2000)
12. Gallistel, C.R.: The Organization of Learning. MIT Press, Bradford Books, Cam-
bridge (1990)
13. Chen, Z., Pears, N., Liang, B.: Monocular obstacle detection using reciprocal-polar
rectification. Image and Vision Computing 24(12), 1301–1312 (2006)
14. Lenser, S., Veloso, M.: Visual sonar: Fast obstacle avoidance using monocular vi-
sion. In: Proceedings of the IEEE Conference on Intelligent Robots and Systems,
pp. 886–891 (2003)
15. Simoncelli, E.P., Olshausen, B.A.: Natural image statistics and neural representa-
tion. Annual Review of Neuroscience 24, 1193–1216 (2001)
16. Horswill, I.D.: Visual collision avoidance by segmentation. In: Proceedings of the
IEEE Conference on Robotics and Autonomous Systems, pp. 901–909 (1994)
17. Hoffmann, J., Jüngel, M., Lötzsch, M.: A vision based system for goal-directed
obstacle avoidance. In: Nardi, D., Riedmiller, M., Sammut, C., Santos-Victor, J.
(eds.) RoboCup 2004. LNCS (LNAI), vol. 3276, pp. 418–425. Springer, Heidelberg
(2005)
18. Collett, T.S., Udin, S.B.: Frogs use retinal elevation as a cue to distance. Journal
of Comparative Physiology A 163(5), 677–683 (1988)
19. Ooi, T.L., Wu, B., He, Z.J.: Distance determined by the angular declination below
the horizon. Nature 414(6860), 197–200 (2001)
Route Learning Strategies in a Virtual Cluttered
Environment
1 Introduction
Finding the way between two locations is an essential and frequent wayfinding
task for both animals and humans. Typical examples include the way from the
nest to a feeding site or the route between your home and the office. While several
navigation studies, both in real and virtual environments, investigated the form
and content of route knowledge (e.g., [1,2,3]), empirical studies investigating the
route learning process itself are rather limited (but see [4,5]).
A very influential theoretical framework of spatial knowledge acquisition pro-
poses three stages when learning a novel environment [6]. First, landmark knowl-
edge, i.e., knowledge about objects or views that allows places to be identified, is acquired.
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 104–120, 2008.
c Springer-Verlag Berlin Heidelberg 2008
In the second stage, landmarks are combined to form route knowledge. With in-
creasing experience in the environment, survey knowledge (i.e. knowledge about
distances and direction between landmarks) emerges. According to this model, the
mental representation of a route can be conceived as a chain of landmarks or places
with associated movement directives (e.g. turn right at red house, turn left at the
street lights). This landmark-to-route-to-survey-knowledge theory of spatial learn-
ing has not remained unchallenged: recent findings, for example, demonstrate
that repeated exposure to a route did not necessarily result in improved metric
knowledge between landmarks encountered on the route [5]. Most participants
either had accurate knowledge from the first exposure or they never acquired
it. Furthermore, results from route learning experiments in virtual reality sug-
gest two spatial learning processes that act in parallel rather than sequentially
[4]: (1) a visually dominated strategy for the recognition of routes (i.e., chains
of places with associated movement directives) and (2) a spatially dominated
strategy integrating places into a survey map. The latter strategy requires no
prior connection of places to routes. Support for parallel rather than sequential
learning processes also comes from experiments with rats: depending on the exact
training and reinforcement procedure, rats can be trained to approach positions
that are defined by the configuration of extramaze cues (c.f. spatially dominated
strategy), to follow local visual beacons (c.f. visually dominated strategy), or to
execute motor responses (e.g., turn right at intersection; [7,8]). Evidence for a
functional distinction of spatial memories also comes from experiments demon-
strating that participants who learned a route by navigation performed better
on route perspective tasks, while participants who learned a route from a map
performed better on tasks analysing survey knowledge [9].
In any case, route knowledge is usually described as a chain of stimulus-
response pairs [10,11], in which the recognition of a place stimulates or triggers
a response (i.e., a direction of motion). Places along a route can be recognized
by objects but also by views or scenes [12]. Evidence for this concept of route
memory mostly comes from experiments in mazes, buildings, or urban environ-
ments, in which decision points were well defined (e.g. [1,13,3]). Furthermore,
distinct objects (i.e., unique landmarks) are usually presented at decision points.
Route learning in open environments, in contrast, has received little attention
in humans, but has been convincingly demonstrated in ants [2]. The desert ant
Melophorus bagoti is a singly foraging ant and its environment is characterized
by a clutter of small, scattered grass tussocks. The ants establish idiosyncratic
routes while shuttling back and forth between a feeder and their nest. Each
individual ant follows a constant route for inbound runs (feeder to nest) and
outbound runs (nest to feeder). Usually both routes differ from each other and
show a high directionality [14]. In contrast, wood ants can learn bi-directional
routes when briefly reversing direction and tracing their path for a short distance
[15]. For both ant species, view-dependent learning is essential for route learning
in open cluttered environments [16]. View-dependent representations [17] and
view-dependent recognition of places have also been demonstrated in humans and
has been shown to be relevant for navigation [12].
106 R. Hurlebaus et al.
Fig. 1. (a) The virtual environment from the perspective of the participant at the
home position. The sphere was visible only in close proximity. The text specified the
current task (here: "Search for the feeder!"). A distal landmark (a large column) is
visible in the background; (b) A map of the environment: the positions of home and
feeder are marked by asterisks; the crossed circles indicate the positions of the colored
columns (for illustration columns were plotted closer to the center of the environment);
(c) The no-local-objects condition; (d) The fog condition.
2.4 Procedure
General Experimental Task and Procedure. The general experimental task
was to repeatedly navigate between two target locations, the home (blue sphere)
and the feeder (red sphere). During navigation, the target (home or feeder) for
the current run was indicated by a text message (e.g., "Go home!"). As soon as
the participant moved over the current target (e.g., the blue sphere indicating the
home location), the respective text message changed (e.g., "Search for the feeder!").
Runs from home to feeder are referred to as outbound runs, runs from the feeder
to home are referred to as inbound runs. Experimental sessions always started at
the home position. As participants were naive with respect to the environment,
the experiment had an extensive training phase prior to the test phase.
minute (for inbound and outbound runs) after 5 training sessions, they advanced
to the test phase. The maximum number of training sessions was 9.
2: Fog Condition. The fog condition was identical to a training session but the
visibility conditions were altered by introducing fog in the virtual environment:
the visibility of the environment decreased with increasing distance. Beyond a
distance of 2.0 units only fog but no other information was perceptible. By these
means, global landmarks as well as structures such as view axes or corridors
arising from obstacle constellations were eliminated from the visual scene. The
fog also covered the ground plane such that it provided no optic flow during
navigation. In this modified environment participants had to rely only on local
objects in their close proximity to find their way between home and feeder. All
participants had 20 minutes to complete as many runs as possible. After that
time the fog test stopped with the first visit back at home. Unfortunately, data
of three participants had to be excluded from the final data set due to technical
problems with the software.
3 Results
3.1 Training Phase
Route Learning Performance. 19 out of the total of 20 participants were
able to solve the task: they learned to efficiently navigate between the home
and the feeder. One participant was removed from the final data set, as he did
not reach the learning criterion. This participant also reported to be clueless
about the positions of the home and the feeder. For the remaining participants,
the time to reach the learning criterion differed: four participants reached it
after 5 training sessions, 6 participants after 6 sessions, 2 after 7 sessions, 2
after 8 sessions, and 5 participants needed 9 training session (6.9 sessions on
average). The increase in navigation performance was highly significant for both
inbound and outbound runs (see Figure 2, paired t-test first vs. last training
session, inbound: t(18) = 14.26; p < .001; outbound: t(18) = 10.76; p < .001).
Figure 3 shows examples of how route knowledge evolved with
increasing training sessions for two participants. At the end of the training phase,
all remaining 19 participants solved the task of navigating between home and
[Figure 2: navigation performance over training sessions 1 to 9]
the feeder reliably and efficiently. For this and all other results, we did not
find any significant gender differences. Given the small group sizes (10 female
and 11 male), small differences, if present, are not ascertainable.
Outbound Runs and Inbound Runs. Participants showed better naviga-
tion performance (runs/min) on inbound runs as compared to outbound runs
(Wilcoxon signed rank test: p < .01, see Figure 2). In other words, participants
found the way from feeder to home faster than the way from home to feeder. It
appears that this difference increases with an increasing number of sessions. Note,
however, that some participants reached the learning criterion already after 5
sessions and proceeded to the test phase. The number of participants therefore
decreases in later sessions, which explains the increasing variation in later
sessions and could account for the saturation effect.
Constant and Variable Routes. Analysing the chosen trajectories in the last
training session in detail reveals remarkable inter-individual differences. While
some participants were very conservative in their route choices (see right column
in Figure 3), other participants showed a large variability in their choices (see
left column in Figure 3). The calculated mean route similarity ranged from .19
for very variable routes to 1.0 for constant routes (mean = .67, std = .24). Figure 4
displays the route similarity values for all participants revealing a continuum
rather than distinct groups. Navigation performance (runs/min) in the last train-
ing session was significantly correlated with route similarity. Specifically, with
higher route similarity the navigation performance increased (r = .47, p < .05).
Neither navigation speed during the last training session, nor the number of
sessions needed to reach the performance criterion to enter the test phase sig-
nificantly correlated with the route similarity values of the last training session.
[Fig. 3 panels: participant AS25, sessions 1, 5, and 8 (left column); participant FW23, sessions 1, 4, and 7 (right column)]
Fig. 3. Evolving route knowledge of two participants. Left column: variable routes with
a similarity of 0.55 (mean of outbound and inbound runs of the last session, compare to
Fig. 4 and see text); right column: constant route with a similarity of 1.0 (mean of outbound
and inbound runs of the last session, compare to Fig. 4 and see text). The lower left
corner of each panel gives the measuring unit, participant, and session number.
[Fig. 4: route similarity values per participant]
[Figure: homing error vs. route similarity; r = −0.13, n.s.]
In this part of the experiment, participants could only see obstacles in close
proximity; spatial information at larger distances was masked by fog (Fig. 1d).
Individual performance (runs per minute) during the fog test was compared with the
performance of the last training session (expressed as change in performance).
As expected, most participants show a performance decrease in the fog test (in-
bound: 14 of 16, outbound: 13 of 16). More interestingly, we found significant
correlations between the change in performance and the route similarity in the
last training session: participants with low route similarity values show stronger
(negative) changes in performance as compared to participants with higher route
similarity values (see Figure 7). These correlations were significant for both in-
bound runs (r = .50, p < .02) and outbound runs (r = .75, p < .001).
4 Discussion
Fig. 6. Four examples of behavior in the last training session (left column) and homing
behavior in the no-local-objects test (right column). The two top rows show results from
a participant with low route similarity values, the lower two rows show examples from
a participant with high route similarity values.
most earlier studies on route learning and route knowledge (e.g., [3,20,21]), the
current environment did not feature a road network with predefined places, junc-
tions (decision points), and unique local landmarks. The environment was made
up of many similarly shaped objects with identical texture and height that were
the entire training phase revealed differences for outbound runs (home to target)
and inbound runs (target to home): specifically, participants found their way
faster on inbound runs. This could be explained by the specific significance of
the home location, which may result from the fact that each training session
started at the home/nest. In central place foragers, like the desert ants, the
importance of the nest and its close surrounding is well documented [22]. An
alternative explanation for this effect is that the local surrounding of nest and
feeder were different (i.e. the spatial distribution of the surrounding obstacles):
the nest, for example, was positioned at a larger open space, surrounded by
fewer objects, as compared to the feeder. By these means, the nest position
might have been recognized from larger distances, hence resulting in an increased
performance. Further experiments will have to show whether semantic or spatial
(configurational) effects were responsible for the described effect.
The most important result of the training phase is that participants greatly
differed with respect to their route choices: using a novel method to compare
trajectories (see Section 2.5) we obtained descriptions of the similarity of the
traveled paths during the last training session. While some participants were
very conservative, selecting the same outbound path and the same inbound path
on most runs, others showed a high variability, navigating along many different
paths (for examples, see Figure 3). Participants’ route similarity values of the
last training session were correlated with their navigation performance during
that session: participants who established fixed routes during training showed
better navigation performance than participants who showed higher variability
in their route choices. How can these inter-individual differences in route
similarity and navigation performance be interpreted? Did different participants
employ different navigation or learning strategies, relying on different spatial
information?
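The trajectory-comparison method of Section 2.5 is not reproduced here, but its core idea, scoring how well two discretized paths align, can be sketched with a Needleman-Wunsch-style global alignment (cf. [18,19]). The sequence encoding, scoring values, and normalization below are illustrative assumptions, not the authors' actual parameters.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-1):
    # Needleman-Wunsch global alignment score between two symbol sequences,
    # e.g. routes discretized into strings of visited grid cells.
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, cols):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[-1][-1]

def route_similarity(a, b):
    # Normalize to [0, 1]: 1.0 for identical routes, 0.0 for fully dissimilar ones.
    return max(0.0, align_score(a, b) / max(len(a), len(b)))
```

A participant who repeats the same cell sequence on every run scores near 1.0 against their own previous runs; highly variable route choices yield values near 0.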
Results from the test phase, in which the availability of different types of spatial
information was systematically manipulated, allowed for first answers. In the fog
condition (see Figure 1d) only obstacles in close proximity were visible. By these
means, global spatial information was erased (i.e., distal global landmarks and
spatial information emerging from lined-up obstacles, such as visual gateways or
corridors). We observed correlations between participants’ route
similarity values and their performance in the fog condition. Specifically, indi-
viduals showing a high variability in route choices showed a clear reduction of
navigation performance during the fog condition as compared to the last training
session. Individuals with a low variability in route choices, on the other hand,
were largely unaffected by the fog. These results suggest that participants with
variable route choice behavior strongly relied on distal or global spatial infor-
mation, while participants exhibiting conservative route choice behavior rather
relied on proximal spatial information, as provided by the close-by obstacles or
obstacle configurations. A straightforward assumption is that the latter group
learned local views (obstacle configurations) and corresponding movement
decisions (cf. [23]) during the training phase that were also available during
the fog condition. In other words, route knowledge for these participants would
be best described as a sequence of recognition triggered responses [1,3].
If, in fact, participants exhibiting conservative route choice behavior relied on
recognition triggered responses, and participants showing variable route choice
behavior primarily relied on distal, global spatial information or knowledge, the
following behavior would be predicted for the no local obstacle condition: if all
local obstacles disappeared after reaching the feeder and only the distal global
landmarks remained, returning home should be impossible for participants
relying on recognition triggered responses only. Participants relying on
global information, on the other hand, should be able to solve the task. In con-
trast to these predictions, results demonstrate that all participants were able to
solve the task with a certain accuracy (see Figures 5 and 6). Furthermore, virtu-
ally no correlation (r=-.13) was found between participants’ route similarities in
the last training session and their homing performance in the no local obstacle
condition. This disproves the explanations given above: apparently participants
showing conservative route choice behavior did not solely rely on stored views
and remembered movement decisions (i.e., recognition triggered responses), but
had additional spatial knowledge allowing them to solve the homing task. A
detailed inspection of their homing trajectories revealed that some participants
reproduced the overall form of their habitual routes from the last training session
(see Figure 6). There are two ways of achieving such behavior: (1) participants
learned a motor program during training that was replayed during the no local
obstacle condition, or (2) they possessed a metric representation of the estab-
lished routes. While this experiment does not allow distinguishing between these
alternatives, informal interviews with participants after the experiment support the
latter explanation.
Taken together, we have shown that participants could learn to efficiently nav-
igate between two locations in a complex cluttered virtual environment, lacking
predefined places, decision points, and road networks. In such unstructured en-
vironments a route is best described as a sequence of places defined by views or
object configurations [3], rather than as a sequence of places defined by unique
single objects. Analyzing participants’ navigation behavior, we could show strong
interindividual differences that could be related to different navigation or orien-
tation strategies taking different kinds of spatial information into account. Specif-
ically, participants showing a high variability in their route choices depended on
distal spatial information, suggesting that they learned global directions and
distances between relevant locations. Participants who established fixed routes
instead relied on proximal obstacles to guide their movements. However, even
if such local spatial information was not available, some were able to reproduce
the overall form of their preferred paths. Apparently they learned more than
reflex-like recognition triggered responses during training, presumably generat-
ing a metric representation of their preferred paths. These results are not in
line with the dominant landmark to route to survey knowledge framework of
spatial knowledge acquisition [6], stating that survey knowledge emerges not un-
til route knowledge is established. Apparently some participants were able to
Route Learning Strategies in a Virtual Cluttered Environment 119
learn about distances and directions in the environment without first establishing
route knowledge (cf. [5]). The fact that participants’ route similarities of
their last training session did not fall into two distinct clusters but constituted a
continuum, furthermore, suggests that the two learning strategies sketched above
are not exclusive but complementary, existing in parallel (cf. [4]), and that different
participants weighted them differently. It is highly likely that these weights
are adapted during the course of learning.
Further research is needed to answer questions arising from this exploratory
study. For example, what triggers the usage of which strategy? How are the
strategies related to each other? And, how is metric information entangled with
the strategies applied?
References
1. Janzen, G., van Turennout, M.: Selective neural representation of objects relevant
for navigation. Nature Neuroscience 7(6), 572–574 (2004)
2. Kohler, M., Wehner, R.: Idiosyncratic route-based memories in desert ants,
Melophorus bagoti: How do they interact with path-integration vectors? Neuro-
biol. Learn. Mem. 83, 1–12 (2005)
3. Mallot, H., Gillner, S.: Route navigation without place recognition: What is recognized
in recognition-triggered responses? Perception 29, 43–55 (2000)
4. Aginsky, V., Harris, C., Rensink, R., Beusmans, J.: Two strategies for learning
a route in a driving simulator. Journal of Environmental Psychology 17, 317–331
(1997)
5. Ishikawa, T., Montello, D.: Spatial knowledge acquisition from direct experience
in the environment: Individual differences in the development of metric knowledge
and the integration of separately learned places. Cognitive Psychology 52, 93–129
(2006)
6. Siegel, A., White, S.: The development of spatial representations of large-scale
environments. Advances in child development and behavior 10, 9–55 (1975)
7. Restle, F.: Discrimination cues in mazes: A resolution of the ’place-vs-response’
question. Psychological Review 64(4), 217–228 (1957)
8. Leonard, B., McNaughton, B.: Spatial representation in the rat: conceptual, behavioural
and neurophysiological perspectives. In: Kesner, R., Olton, D.S. (eds.) Comparative
Cognition and Neuroscience: Neurobiology of Comparative Cognition. Hillsdale,
New Jersey (1990)
9. Taylor, H., Naylor, S., Chechile, N.: Goal-specific influences on the representation
of spatial perspective. Memory and Cognition 27, 309–319 (1999)
10. Trullier, O., Wiener, S., Berthoz, A., Meyer, J.A.: Biologically based artificial nav-
igation systems: review and prospects. Progress in Neurobiology 51(5), 483–544
(1997)
11. Kuipers, B.: The spatial semantic hierarchy. Artificial Intelligence 119, 191–233
(2000)
12. Gillner, S., Mallot, H.: Navigation and acquisition of spatial knowledge in a virtual
maze. Journal of Cognitive Neuroscience 10, 445–463 (1998)
13. Hölscher, C., Meilinger, T., Vrachliotis, G., Brösamle, M., Knauff, M.: Up the down
staircase: Wayfinding strategies and multi-level buildings. Journal of Environmen-
tal Psychology 26(4), 284–299 (2006)
14. Wehner, R., Boyer, M., Loertscher, F., Sommer, S., Menzi, U.: Ant navigation:
One-way routes rather than maps. Current Biology 16, 75–79 (2006)
15. Graham, P., Collett, T.: Bi-directional route learning in wood ants. Journal of
Experimental Biology 209, 3677–3684 (2006)
16. Judd, S., Collett, T.S.: Multiple stored views and landmark guidance in ants. Na-
ture 392, 710–714 (1998)
17. Diwadkar, V., McNamara, T.: Viewpoint dependence in scene recognition. Psycho-
logical Science 8, 302–307 (1997)
18. Needleman, S., Wunsch, C.: A general method applicable to the search for similar-
ities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970)
19. Basten, K., Mallot, H.: Building blocks for trail analysis (in preparation, 2008)
20. Gaunet, F., Vidal, M., Kemeny, A., Berthoz, A.: Active, passive and snapshot
exploration in a virtual environment: influence on scene memory, reorientation and
path memory. Cognitive Brain Research 11(3), 409–420 (2001)
21. Münzer, S., Zimmer, H., Schwalm, M., Baus, J., Aslan, I.: Computer-assisted navi-
gation and the acquisition of route and survey knowledge. Journal of Environmental
Psychology 26(4), 300–308 (2006)
22. Bisch-Knaden, S., Wehner, R.: Landmark memories are more robust when acquired
at the nest site than en route: experiments in desert ants. Naturwissenschaften 90,
127–130 (2003)
23. Christou, C.G., Bülthoff, H.H.: View dependence in scene recognition after active
learning. Memory and Cognition 27(6), 996–1007 (1999)
Learning with Virtual Verbal Displays: Effects of
Interface Fidelity on Cognitive Map Development
1 Introduction
Most research investigating verbal spatial learning has focused on comprehension of
route directions or the mental representations developed from reading spatial texts
[1-4]. Owing to this research emphasis, much less is known about the efficacy
of verbal information for supporting real-time spatial learning and navigation. What dis-
tinguishes a real-time auditory display from other forms of spatial verbal information
is the notion of dynamic updating. In a dynamically-updated auditory display, the
presentation of information about a person’s position and orientation in the environ-
ment changes in register with physical movement. For example, rather than receiving
* The authors thank Jack Loomis for insightful comments on the manuscript, Maryann Betty
for experimental preparation, Brandon Friedman for assistance in running participants, and
Masaki Miyanohara for help with running participants and data analysis. This work was
supported by an NRSA grant to the first author, #1F32EY015963-01.
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 121–137, 2008.
© Springer-Verlag Berlin Heidelberg 2008
122 N.A. Giudice and J.D. Tietz
a sequential list of all the distances and turns at the beginning of a route, as is done
with traditional verbal directions, a real-time display provides the user with context-
sensitive information with respect to their current location/heading state as they
progress along the route. Vehicle-based navigation systems utilizing GPS and speech-
based route directions represent a good example of these dynamic displays. Dynamic
auditory interfaces also have relevance in navigation systems for the blind, and in this
context, they have proven extremely effective in supporting real-time route guidance
[see 5 for a review].
Rather than addressing route navigation, the current research uses free exploration
of computer-simulated training layouts to investigate environmental learning. The
training environments are explored using a non-visual interface called a virtual verbal
display (VVD). The VVD is based on dynamically-updated geometric descriptions,
verbal messages which provide real-time orientation and position information as well
as a description of the local layout geometry [see 6 for details]. A sample output
string is: “You are facing West, at a 3-way intersection, there are hallways ahead, left,
and behind.” If a user executed a 90° left rotation at this T-junction, the VVD would
return an updated message to reflect that he/she was now facing South, with hallways
extending ahead, left, and right.
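A minimal sketch of this updating logic, with quantized 90° rotations and egocentric hallway labels that rotate with the user (our own simplification; the actual VVD [6] handles full layouts and richer message wording):

```python
# Compass headings in clockwise order.
HEADINGS = ["North", "East", "South", "West"]

def rotate(heading, direction):
    # Quantized 90° rotation; direction is "left" or "right".
    i = HEADINGS.index(heading)
    return HEADINGS[(i - 1) % 4] if direction == "left" else HEADINGS[(i + 1) % 4]

def remap(hallways, direction):
    # Egocentric hallway labels after a turn: following a left turn,
    # what was "left" is now "ahead", what was "behind" is now "left", etc.
    left_map = {"left": "ahead", "behind": "left", "right": "behind", "ahead": "right"}
    m = left_map if direction == "left" else {v: k for k, v in left_map.items()}
    return {m[d] for d in hallways}

def describe(heading, hallways):
    # Geometric description of the current intersection.
    order = ["ahead", "left", "right", "behind"]
    open_dirs = [d for d in order if d in hallways]
    return (f"You are facing {heading}, at a {len(open_dirs)}-way intersection, "
            f"there are hallways {', '.join(open_dirs)}.")

# The T-junction example from the text: facing West, then a 90° left rotation.
h, halls = "West", {"ahead", "left", "behind"}
h2, halls2 = rotate(h, "left"), remap(halls, "left")  # South; ahead, left, right
```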
We know that geometric-based displays are extremely effective for supporting free
exploration (open search) in both real and computer-based layouts [7-9]. However, their
efficacy for supporting cognitive map development is unclear: participants who
trained using a virtual verbal display to search computer-based environments performed
significantly worse on subsequent wayfinding tests in the corresponding real environ-
ment [7, 8] than subjects who trained and tested exclusively in real environments [9].
These findings suggest that training with a virtual verbal display results in impoverished
environmental learning and cognitive map development compared to use of the same
verbal information for searching real building layouts. This deficit cannot be attributed
to environmental transfer more generally, as previous studies have demonstrated that
learning in virtual environments (VEs) transfers to accurate real-world navigation, even
with perceptually sparse visual displays similar to our geometric verbal display [10-12].
The current studies investigate several factors of interface fidelity which may ac-
count for problems in spatial knowledge acquisition with the VVD. As described by
Waller and colleagues [13], interface fidelity refers to how the input and output of
information from the virtual display is used, i.e. how one’s physical actions affect
movement in the VE and how well feedback from the system supports normal percep-
tual-motor couplings. These interactions can be distinguished from factors relating to
environment fidelity, which refers to how well the information rendered in the VE
resembles the real environment, e.g. sensory richness, spatial detail, surface features,
and field of view [13]. Our previous work with VVDs dealt with environment fidelity,
investigating whether describing more of the layout from a given vantage point, called
“verbal view depth,” would facilitate learning of global structure and aid subsequent
wayfinding behavior. However, the lackluster environmental transfer performance
with three levels of verbal view depth, ranging from local to global descriptions,
demonstrated that deficits in cognitive map development were not due to availability
of environmental information but to the interface itself [7, 8].
The current experiments hold environmental variables constant and manipulate
several factors relating to interface fidelity. Experiment 1 compares traditional verbal
descriptions, where the message is delivered as a monaural signal to both ears, with
Learning with VVDs: Effects of Interface Fidelity on Cognitive Map Development 123
spatialized audio descriptions, where the message is heard as coming from a specific
direction, e.g. a hallway to the left would be heard as a description emanating from
the navigator’s left side. Experiment 2 addresses the influence of body-based informa-
tion, e.g. physical rotation vs. imagined rotation. Experiment 3 follows the same
design of the first two verbal studies but uses a visual display as a control. All ex-
periments incorporate training in computer-based layouts and environmental transfer
requiring wayfinding in the corresponding real environment. Our focus is on the
transfer tests, as they provide the best index of environmental learning and cognitive
map development.
2 Experiment 1
In this study, blindfolded participants are given a training period where they use ver-
bal descriptions to freely explore unfamiliar computer-based floors of university
buildings and seek out four target locations. At test, they must find routes between
target pairs in the corresponding real environment. This design is well-suited for ad-
dressing environmental learning, as theories of cognitive map development have long
emphasized the importance of free exploration and repeated environmental exposure
[14, 15]. The wayfinding test represents a good measure of cognitive map accuracy,
as the task cannot be accomplished using a route-matching strategy. Since no
routes are specified during training, accurate wayfinding behavior requires subjects to
form a globally coherent representation of the environment, i.e. the trademark of a
cognitive map [16].
Our previous work with virtual verbal displays was based exclusively on spatial lan-
guage (SL), i.e. consistent, unambiguous terminology for describing spatial relations
[17]. The problem with any purely linguistic display is that the information provided is
symbolic. A description of a door at 3 o’clock in 10 feet has no intrinsic spatial content
and requires cognitive mediation to interpret the message. By contrast, a spatialized
audio (SA) display is perceptual, directly conveying spatial information about the envi-
ronment by coupling user movement with the distance and direction of object locations
in 3-D space. For instance, rather than describing the location of the door, the person
simply hears its name as coming from that location in the environment.
Several lines of research support the benefit of spatialized auditory displays. Ex-
periments comparing different non-visual displays with a GPS-based navigation sys-
tem for the blind have shown that performance on traversing novel routes, finding
landmarks, and reaching a goal state is superior when guided with spatialized audio
versus spatial language [18-20]. Research has also shown that spatialized auditory
displays are beneficial as a navigation aid during real-time flight [21] and for provid-
ing non-visual information to pilots in the cockpit of flight simulators [22]. It is pre-
dicted that spatialized audio displays will have similar benefits on cognitive map
development, especially when training occurs in computer-based environments as are
used here. Spatial updating and environmental learning are known to be more
cognitively effortful in VEs than in real spaces [23, 24]. However, recent work suggests
that SA is less affected by cognitive load than SL during guidance of virtual routes,
yielding faster and more accurate performance in the presence of a concurrent distrac-
tor task [25]. These findings indicate that the use of SA in the VVD may reduce the
working memory demands associated with virtual navigation, thus increasing re-
sources available for cognitive map development.
2.1 Method
Fig. 1. Experimental layout with target locations denoted. What is heard upon entering an
intersection (listed above and below the layout) is depicted in gray. Each arrow represents the
orientation of a user at this location.
Participants navigated the virtual environments using the arrow keys on a USB
numberpad. Pushing the up arrow (8) translated the user forward and the left (4) and
right (6) arrows rotated them in place left or right respectively. Forward movements
were made in discrete "steps," with each forward key press virtually translating the
navigator ahead one corridor segment (approximately ten feet in the real environ-
ment). Left-right rotations were quantized to 90 degrees. Pressing the numpad 5 key
repeated the last verbal message spoken and the 0 key served as a "shut-up" function
by truncating the active verbal message.
Verbal descriptions, based on a female voice, were generated automatically upon
reaching an intersection or target location, and rotation at any point returned an
updated heading, e.g. “facing north”. A footstep sound was played for every forward
move when navigating between hallway junctions. Movement transitions took ap-
proximately 750 ms. The Vizard 3-D rendering application (www.worldviz.com) was
used to coordinate the verbal messages, present a visual map of what was being heard
for experimenter monitoring, and to log participant search trajectories for subsequent
analyses.
A within-subjects design was used, with each participant completing one spatial
language condition and one spatialized audio condition, counterbalanced across the
two experimental environments. The experiment comprised three phases. During
practice, the movement behavior was demonstrated and participants were familiarized
with the speech output from the VVD using a visual map depicting what would be
spoken for each type of intersection.
Training Phase. To start the trial, blindfolded participants stood in the center of a one
meter radius circle with four three inch RadioShack speakers mounted on tripods (at a
height of 152 cm) placed on the circumference at azimuths of 0° (ahead), 90° (right),
180° (behind) and 270° (left). In the SL conditions, the verbal message was
simultaneously presented from the left and right speaker only. With the SA
conditions, the participant heard the verbal message as coming from any of the four
speakers based on the direction of the hallway being described. The spatialized audio
messages were generated by sending the signal from the speaker outputs on the
computer’s sound card (Creative Labs Audigy2 Platinum) to a four-channel
multiplexer, which routed the audio to the relevant speaker. The input device was
affixed via Velcro to an 88 cm stand positioned directly in front of the participant.
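The SL/SA contrast amounts to selecting output channels by described direction. A sketch under assumed speaker indexing (the actual system switched a hardware multiplexer; function names are ours):

```python
# Egocentric azimuths of the four tripod-mounted speakers.
SPEAKER_AZIMUTH = {"ahead": 0, "right": 90, "behind": 180, "left": 270}

def active_speakers(message_direction, mode):
    # SL: the message plays monaurally from the left and right speakers only.
    # SA: the message is routed to the speaker matching the described hallway.
    if mode == "SL":
        return [SPEAKER_AZIMUTH["left"], SPEAKER_AZIMUTH["right"]]
    return [SPEAKER_AZIMUTH[message_direction]]
```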
Subjects were started from an origin position in the layout, designated as "start",
and instructed to freely explore the environment, using the verbal descriptions to
apprehend the space and the input device to effect movement. Their task for the training
period was to cover the entire layout during their search and to seek out four hidden
target locations. Although no explicit instructions were given about search strategy or
specific routes, they were encouraged to try to learn the global configuration of the
layout and to be able to navigate a route from any target to any other target. The train-
ing period continued until the number of forward moves in their search trajectory
equaled three times the number of segments comprising the environment. Participants
were alerted when 50% and 75% of their moves were exhausted.
Testing Phase. Upon completion of the training period, participants performed the
transfer tests. Blindfolded, they were led via a circuitous route to the corresponding
physical floor and started at one of the target locations. After removing the blindfold,
participants were told they were now facing north, standing at target X and requested
to walk the shortest route to target Y. They performed this wayfinding task using
vision; no verbal descriptions about the environment or target locations were given.
Participants indicated that they had reached the destination by speaking the target’s
name (e.g., “I have reached target dog”). To reduce accumulation of error between
trials, they were brought to the actual target location for incorrectly localized targets
before proceeding. Participants found routes between four target pairs, the order of
which was counterbalanced.
Analysis. Although our focus was on transfer performance, three measures of search
behavior were also analyzed from the training phase in all experiments:
1. Floor coverage percent: the number of unique segments traversed during train-
ing divided by the total number of segments in the environment.
2. Unique targets percent: ratio of unique targets encountered during training to the
total number of target locations (4).
3. Shortest routes traversed: sum of all direct routes taken between target locations
during the search period. A shortest route equals the route between target loca-
tions with the minimum number of intervening segments.
Two wayfinding test measures were analyzed for all studies during the transfer phase
in the real building:
1. Target localization accuracy percent: ratio of target locations correctly found at
test to the total number of target localization trials (four).
2. Route efficiency: length of the shortest route between target locations divided by
length of the route traveled (only calculated for correct target localization trials).
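Assuming segment and target identifiers for a given floor, these measures can be sketched as follows (function names are ours, not the authors'):

```python
def floor_coverage_pct(visited_segments, total_segments):
    # Training measure 1: unique segments traversed / total segments, in percent.
    return 100.0 * len(set(visited_segments)) / total_segments

def unique_targets_pct(targets_found, n_targets=4):
    # Training measure 2: unique targets encountered / total targets, in percent.
    return 100.0 * len(set(targets_found)) / n_targets

def route_efficiency(shortest_length, traveled_length):
    # Transfer measure 2: shortest-route length / traveled-route length,
    # computed only for correctly localized targets.
    return shortest_length / traveled_length
```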
As predicted, training performance using both VVD display modes revealed accurate
open search behavior. Collapsing across SL and SA conditions, participants covered
97.3% of the segments comprising each floor, found 97.3% of the target locations and
traveled an average of 9.9 shortest routes between targets. By comparison, the theo-
retical maximum number of shortest routes traveled during the training period, given
100% floor coverage with the same number of moves, is 14.5 (averaged across
floors). Results from the inferential tests are consistent with the near-identical
performance observed between display modes; none of the one-way repeated measures
ANOVAs conducted for each training measure revealed reliable differences between
SL and SA conditions, all ps > .1. Indeed, performance on the training measures was
almost identical for all conditions across experiments (see table 1 for comparison of
all means and standard errors). These findings indicate that irrespective of training
condition, subjects adopted a broadly distributed, near optimal route-finding search
strategy.
Table 1. Training Measures of Experiments 1-3 by Condition. Each cell represents the mean (±
SEM) on three measures of search performance for participants in experiments 1-3. No signifi-
cant differences were observed between any of the dependent measures.
Experiment 2 (N=16):
  Rotation + Spatialized Audio: floor coverage 98.99 (0.85); unique targets 100.00 (0.00);
    shortest routes 12.06 (0.99)
  Rotation + Spatial Language: floor coverage 99.14 (0.50); unique targets 98.44 (1.56);
    shortest routes 10.94 (1.11)
with target localization performance of 80% observed in a previous study after verbal
learning in real buildings [9]. This similarity is important as it shows that the same
level of spatial knowledge acquisition is possible from learning in real and virtual
environments. Our results are consistent with the advantage of spatialized auditory
displays over spatial language found for route guidance [18-20, 25] and extend the
efficacy of spatialized audio displays to supporting cognitive mapping and
wayfinding behavior.
3 Experiment 2
Experiment 2 was designed to assess the contribution of physical body movement
during virtual verbal learning on cognitive map development. Navigation with our
virtual verbal display, as with most desktop virtual environment technologies, lacks
the idiothetic information which is available during physical navigation, i.e. body-
based movement cues such as proprioceptive, vestibular, and biomechanical feed-
back. VEs incorporating these cues have greater interface fidelity as the sensorimotor
contingencies are more analogous to real-world movement [26]. Various spatial
behaviors requiring access to an accurate cognitive map show improved performance
when idiothetic information is included. For instance, physical rotation during VE
learning vs. imagined rotation benefits tasks requiring pointing to previously learned
targets [27, 28], estimation of unseen target distances [29] and updating self orienta-
tion between multiple target locations [30]. Path integration is also better in VEs pro-
viding proprioceptive and visual information specifying rotation compared to visual
information in isolation [31]. The inclusion of idiothetic information has also led to
improved performance on cognitive mapping tasks similar to the current experiment,
where VE learning is tested during transfer to real-world navigation [32, 33].
Where the previous work has addressed the role of body-based cues with visual
displays, Experiment 2 investigates whether similar benefits for verbal learning mani-
fest when physical body rotation is included in the VVD. As with Experiment 1,
participants use the VVD to explore computer-based training environments and then
perform wayfinding tests in the corresponding real environment. However, rather than
using arrow keys to effect imagined rotations and translations during training,
participants physically turn in place whenever they wish to execute a change of heading.
Translations are still done via the keypad as the benefit of physical translation on VE
learning is generally considered nominal. This is consistent with studies in real envi-
ronments showing that pointing to target locations is faster and more accurate after
actual than imagined rotations, whereas errors and latencies tend not to differ between
real and imagined translations [34].
We predict that inclusion of idiothetic information in the VVD will yield marked
improvements in spatial knowledge acquisition and cognitive map development. In
addition to the previous evidence supporting body-based cues, we believe the conver-
sion of linguistic operators into a spatial form in memory is a cognitively effortful
process, facilitated by physical movement. Evidence from several studies supports this
movement hypothesis. Avraamides and colleagues (Experiment 3, 2004) showed that
mental updating of allocentric target locations learned via spatial language was
impaired until the observer was allowed to physically move before making their judg-
ments, presumably inducing the spatial representation. Updating object locations
learned from a text description is also improved when the reader is allowed to physi-
cally rotate to the perspective described by the text [35], with egocentric direction
judgments made faster and more accurately after physical, rather than imagined
rotation [36].
To test our prediction, this experiment adds real rotation to the spatialized audio
and spatial language conditions of Experiment 1. If the inclusion of rotational infor-
mation is critical for supporting environmental learning from verbal descriptions,
wayfinding performance during real-world transfer should be better after training with
both physical rotation conditions of the current experiment than was observed in the
analogous conditions with imagined rotation of Experiment 1. Furthermore, assuming
some level of complementarity between rotation and spatialization, the rota-
tion+spatialized audio (R+SA) condition is predicted to show superior performance to
the rotation+spatial language (R+SL) condition.
3.1 Method
Sixteen blindfolded-sighted participants, nine female and seven male, ages 18-24 (mean = 19.6), took part in the two-hour study.
Experiment 2 employs the same spatial language and spatialized audio conditions as Experiment 1 and adopts the same within-Ss design using two counterbalanced condi-
tions, each including a practice, training, and transfer phase. The only difference from
Experiment 1 is that during the training phase, participants used real body rotation in
the VVD instead of imagined rotation via the arrow keys. Since all intersections were right-angled, left and right rotations always required turning 90° in place. An automatically updated heading description was generated when the participant's facing direction was aligned with the orthogonal corridor. They could then either continue translating by
means of the keypad or request an updated description of intersection geometry.
Heading changes were tracked using a three degree-of-freedom (DOF) inertial orientation tracker, the 3D-Bird (Ascension Technology Corporation: http://www.ascension-tech.com/products/3dbird.php).
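The alignment logic just described, generating a heading description once the facing direction lines up with an orthogonal corridor, can be sketched as follows. This is a minimal illustration with hypothetical names and an assumed ±10° alignment tolerance, not the authors' implementation:

```python
CARDINALS = {0: "north", 90: "east", 180: "south", 270: "west"}
TOLERANCE_DEG = 10.0  # assumed alignment tolerance, for illustration only

def corridor_heading(yaw_deg):
    """Return the corridor-aligned heading name when the tracked yaw is
    within tolerance of one of the four orthogonal directions, else None."""
    yaw = yaw_deg % 360.0
    snapped = int(round(yaw / 90.0)) % 4 * 90
    error = min(abs(yaw - snapped), 360.0 - abs(yaw - snapped))
    return CARDINALS[snapped] if error <= TOLERANCE_DEG else None

def heading_message(yaw_deg):
    """Generate the updated heading description once the user faces a corridor."""
    name = corridor_heading(yaw_deg)
    return None if name is None else f"You are now facing {name}."
```

Mid-turn readings (say, 45°) produce no message; once the tracked yaw comes within tolerance of a corridor direction, the description fires.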
Fig. 2. Comparison of mean target localization accuracy (± SEM) between Experiments 1 and
2. Note: Both experiments compared SL and SA conditions but Experiment 1 used imagined
rotation and Experiment 2 (gray bars) used body rotation.
The finding that idiothetic information did not benefit transfer performance was
unexpected given previous literature showing that physical body movement during
and after verbal learning significantly improves latency and error performance at test
[35-37]. Differences in task demands likely contribute to these findings. In the previ-
ous studies, subjects learned a series of target locations from text or speech descrip-
tions and then were tested using a pointing-based spatial updating task. The increased
weighting of physical movement demonstrated in those studies may be less important
with the free exploration paradigm and transfer tests used here, as these tasks do not
force updating of Euclidean relations between targets. Thus, the addition of a pointing task between target locations might have revealed a greater benefit of physical rotation than was evident from our wayfinding task. This needs to be addressed in future ex-
periments as it cannot be resolved from the current data.
4 Experiment 3
Experiment 3 followed the same design as the previous two studies but subjects
learned the computer-based training environments from a visual display rather than a
verbal display. The main goal of Experiment 3 was to provide a good comparison
benchmark with the previous two verbal experiments. Specifically, we wanted to
investigate whether learning with verbal and visual displays leads to comparable environmental transfer performance, a finding which would provide proof of the VVD's efficacy. Our previous experiments, which used an almost identical design to the current studies, found that wayfinding performance during environmental transfer was sig-
nificantly worse after learning from a virtual verbal display than from a visual display
[8, Experiment 3, 10]. However, those studies only compared visual learning with a
spatial language condition, analogous to that used in Experiment 1. By contrast, the significantly improved transfer performance of the spatialized audio conditions is on par with our previous findings with the visual display. Likewise, the SA conditions in
the first two experiments provide perceptual information about the direction of hall-
ways which is better matched with what is apprehended from a visual display. Since
the visual display and movement behavior in the previous studies differed slightly
from the information and movement of the VVD used here, Experiment 3 was run to
serve as a more valid comparison.
4.1 Method
Fourteen normally sighted participants, six females and eight males, ages 18-21 (mean = 19.2), took part in the one-hour study.
The experimental procedure was identical to the previous studies except that sub-
jects only learned one environment and trained with a visual display instead of the
VVD. During training, participants saw the same geometric “views” of the layout on
the computer monitor (Gateway VX700, 43.18 cm diagonal) as were previously de-
scribed with each message from the VVD. The environment was viewed from the
center of the monitor and movement was performed via the keypad’s arrow keys, as
described earlier. Figure 3 shows an example of what would be seen from a 3-way
intersection. With each translation, the participant heard the footstep sound and the
132 N.A. Giudice and J.D. Tietz
Fig. 3. Sample 3-way intersection as seen on a visual display. Information seen from each view
is matched to what would be heard in the corresponding message from the VVD.
next corridor segment(s) was displayed with an animated arrow indicating forward
movement. With rotations, they saw the viewable segments rotate in place and an
updated indication of heading was displayed. In addition, they heard the target names,
starting location noise, footstep sound, and percent of training time elapsed via mon-
aural output through the same speakers. This design ensured the visual display was
equivalent in information content to what was available in the auditory conditions of
Experiments 1 and 2.
Performance on the transfer tests after visual learning was quite good, resulting in
target localization accuracy of 78.6% (SE = 8.6) and route efficiency of 95.6% (SE =
2.4). Given our interest in comparing learning performance between the visual display
and the VVD, independent samples t-tests were used to evaluate how wayfinding
performance after visual learning compared to the same tests after verbal learning in
Experiments 1 and 2. As the presence or absence of spatialized information was the
only factor that reliably affected verbal learning performance, the visual learning data
was only compared to the combined performance from the spatial language and spati-
alized audio conditions of the previous experiments, collapsing across imagined and
real rotation. Note that these between-subjects comparisons were based on partici-
pants drawn from a similar background and who fell within the same range of spatial
abilities as measured by the SBSOD scale. As can be seen in Figure 4, the 78.6% (SE
= 8.6) target localization performance observed after visual learning was significantly
better than the 54.6% (SE = 5.4) performance of the spatial language conditions, t(28)
= 2.345, p=.027. By contrast, target localization accuracy in the spatialized audio
conditions, 77.5% (SE = 4.1), was almost identical to performance in the visual condi-
tion, t(26) = .116, p = .908. In agreement with the previous studies, route efficiency did not differ reliably between any of the conditions, ps > .1.
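These between-group comparisons can be roughly reproduced from the summary statistics alone. The sketch below combines standard errors Welch-style, so it only approximates the pooled-variance tests reported above; the names are illustrative:

```python
import math

def t_from_summary(mean1, se1, mean2, se2):
    """Approximate two-sample t statistic from group means and standard
    errors (Welch-style combination; variances are not pooled)."""
    return (mean1 - mean2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Target localization accuracy (percent correct), from the text above
t_vis_vs_sl = t_from_summary(78.6, 8.6, 54.6, 5.4)  # ~2.36, near the reported 2.345
t_vis_vs_sa = t_from_summary(78.6, 8.6, 77.5, 4.1)  # ~0.12, near the reported 0.116
```

The small discrepancies from the reported values reflect the pooled-variance computation used in the actual analyses.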
Experiment 3 was run to benchmark performance with the VVD against visual
learning. Replicating earlier work, transfer performance after learning from a visual
display was significantly better than learning with spatial language with a VVD [8,
Experiment 3, 10]. However, target localization accuracy in the spatialized audio conditions and the visual condition was nearly identical. This finding suggests that learning with a spatialized audio display and with an information-matched visual display builds up a spatial representation in memory that can be acted on in a functionally equivalent manner.
Fig. 4. Comparison of mean target localization accuracy (± SEM) across all experiments. “Spa-
tial language” represents combined data from the two language conditions of Experiments 1
and 2, collapsing across imagined and real rotation. “Spatialized audio” represents the same
combined data from the two spatialized conditions of Experiments 1 and 2.
5 General Discussion
The primary motivation of these experiments was to investigate verbal learning and
cognitive map development using a new type of non-visual interface, called a virtual
verbal display. Previous research has demonstrated that VVDs support efficient
search behavior of unfamiliar computer-based environments but lead to inferior cog-
nitive map development compared to verbal learning in real environments or learning
in visually rendered VEs. The aim of this research was to understand what could ac-
count for these differences. Deficits in spatial knowledge acquisition with the VVD
were postulated as stemming from inadequacies of the interface. To test this prediction, two factors influencing interface fidelity, spatialized audio and physical rotation, were compared on a wayfinding task requiring access to an accurate cognitive map.
Results showing almost identical performance on the training measures for all con-
ditions across experiments (see Table 1) but widely varying wayfinding accuracy
during transfer tests in the real building are informative. Indeed, these findings sup-
port the hypothesis that deficits in cognitive map development are related to factors of
interface fidelity, rather than use of ineffective search strategies with the VVD. The
most important finding from these studies is that conveying information about layout geometry as a spatialized verbal description rather than as spatial language leads to a dramatic improvement in cognitive map development. These
findings are congruent with previous studies showing an advantage of 3-D spatial
displays vs. spatial language during route guidance [18-20, 25].
The current results extend the efficacy of spatialized audio: from providing perceptual access to specific landmarks in the surrounding environment for use in route navigation, to specifying environmental structure during free exploration in support of cognitive mapping. Of note to the motivations of the current work, wayfinding performance
during transfer after learning in the SA conditions in the VVD was on par with performance after learning with an information-matched visual display (Experiment 3), and with verbal learning in real buildings [9]. The similarity of these results suggests
that virtual verbal displays incorporating spatialized information can support equiva-
lent spatial knowledge acquisition and cognitive map development. Although
comparisons between verbal and visual learning were made between subjects in the
current paper, these results are consistent with previous findings demonstrating func-
tionally equivalent spatial representations built up after learning target arrays between
the same conditions [38]. Interestingly, the benefit of SA seems to be magnified for
open search exploration of large-scale environments vs. directed guidance along
routes, as the 50% improvement for spatialized information observed in the current
study is much greater than the marginal advantage generally found in the previous
real-world route guidance studies. This finding is likely due to the increased cognitive effort known to accompany learning and updating in VEs [23, 24] being offset by the decreased working memory demands of processing spatialized audio vs. spatial language [25].
The effects of including physical rotation vs. imagined rotation in the VVD were
investigated in Experiment 2. We expected this factor to have the greatest influence
on virtual verbal learning given the importance attributed to idiothetic cues from the
inclusion of physical rotation in visually rendered VEs [27, 29, 31, 33], and the im-
portance of physical movement on updating verbally learned target locations [35, 36].
Surprisingly, the inclusion of physical rotation during training with the VVD did not
lead to a significant advantage on subsequent wayfinding performance. Indeed, comparison of transfer performance between Experiments 1 and 2 shows that conditions
employing spatialized descriptions led to the best verbal learning performance and did
not reliably differ whether they employed real or imagined rotation. As discussed in
Experiment 2, this finding may relate to our experimental design, and more research is needed before definitive conclusions can be drawn.
References
1. Taylor, H.A., Tversky, B.: Spatial mental models derived from survey and route descrip-
tions. Journal of Memory and Language 31, 261–292 (1992)
2. Denis, M., et al.: Spatial Discourse and Navigation: An analysis of route directions in the
city of Venice. Applied Cognitive Psychology 13, 145–174 (1999)
3. Lovelace, K., Hegarty, M., Montello, D.: Elements of good route directions in familiar and
unfamiliar environments. In: Freksa, C., Mark, D.M. (eds.) Spatial information theory:
Cognitive and computational foundations of geographic information science, pp. 65–82.
Springer, Berlin (1999)
4. Tversky, B.: Spatial perspective in descriptions. In: Bloom, P., et al. (eds.) Language and
Space, pp. 463–492. MIT Press, Cambridge (1996)
5. Loomis, J.M., et al.: Assisting wayfinding in visually impaired travelers. In: Allen, G.L.
(ed.) Applied spatial cognition: From research to cognitive technology, pp. 179–202. Erl-
baum, Mahwah (2007)
6. Giudice, N.A.: Navigating novel environments: A comparison of verbal and visual learn-
ing, Unpublished dissertation, University of Minnesota, Twin Cities (2004)
7. Giudice, N.A.: Wayfinding without vision: Learning real and virtual environments using
dynamically-updated verbal descriptions. In: Conference and Workshop on Assistive
Technologies for Vision and Hearing Impairment, Kufstein, Austria (2006)
8. Giudice, N.A., et al.: Spatial learning and navigation using a virtual verbal display. ACM
Transactions on Applied Perception (in revision)
9. Giudice, N.A., Bakdash, J.Z., Legge, G.E.: Wayfinding with words: Spatial learning and
navigation using dynamically-updated verbal descriptions. Psychological Research 71(3),
347–358 (2007)
10. Giudice, N.A., Legge, G.E.: Comparing verbal and visual information displays for learning
building layouts. Journal of Vision 4(8), 889 (2004)
11. Ruddle, R.A., Payne, S.J., Jones, D.M.: Navigating buildings in “desk-top” virtual envi-
ronments: Experimental investigations using extended navigational experience. Journal of
Experimental Psychology: Applied 3(2), 143–159 (1997)
12. Bliss, J.P., Tidwell, P., Guest, M.: The effectiveness of virtual reality for administering
spatial navigation training to firefighters. Presence 6(1), 73–86 (1997)
13. Waller, D., Hunt, E., Knapp, D.: The transfer of spatial knowledge in virtual environment
training. Presence 7, 129–143 (1998)
14. Piaget, J., Inhelder, B., Szeminska, A.: The child’s conception of geometry. Basic Books,
New York (1960)
15. Siegel, A., White, S.: The development of spatial representation of large scale environ-
ments. In: Reese, H. (ed.) Advances in Child Development and Behavior. Academic Press,
New York (1975)
16. O’Keefe, J., Nadel, L.: The hippocampus as a cognitive map. Oxford University Press,
London (1978)
17. Ehrlich, K., Johnson-Laird, P.N.: Spatial descriptions and referential continuity. Journal of
Verbal Learning & Verbal Behavior 21, 296–306 (1982)
18. Loomis, J.M., et al.: Personal guidance system for people with visual impairment: A com-
parison of Spatial Displays for route guidance. Journal of Visual Impairment & Blind-
ness 99, 219–232 (2005)
19. Loomis, J.M., Golledge, R.G., Klatzky, R.L.: Navigation system for the blind: Auditory
display modes and guidance. Presence 7, 193–203 (1998)
20. Marston, J.R., et al.: Evaluation of spatial displays for navigation without sight. ACM
Transactions on Applied Perception 3(2), 110–124 (2006)
21. Simpson, B.D., et al.: Spatial audio as a navigation aid and attitude indicator. In: Human
Factors and Ergonomics Society 49th Annual Meeting, Orlando, Florida (2005)
22. Oving, A.B., Veltmann, J.A., Bronkhorst, A.W.: Effectiveness of 3-D audio for warnings
in the cockpit. Int. Journal of Aviation Psychology 14, 257–276 (2004)
23. Richardson, A.E., Montello, D.R., Hegarty, M.: Spatial knowledge acquisition from maps
and from navigation in real and virtual environments. Memory & Cognition 27(4), 741–
750 (1999)
24. Wilson, P.N., Foreman, N., Tlauka, M.: Transfer of spatial information from a virtual to a
real environment. Human Factors 39(4), 526–531 (1997)
25. Klatzky, R.L., et al.: Cognitive load of navigating without vision when guided by virtual
sound versus spatial language. Journal of Experimental Psychology: Applied 12(4), 223–
232 (2006)
26. Lathrop, W.B., Kaiser, M.K.: Acquiring spatial knowledge while traveling simple and
complex paths with immersive and nonimmersive interfaces. Presence 14(3), 249–263
(2005)
27. Lathrop, W.B., Kaiser, M.K.: Perceived orientation in physical and virtual environments:
Changes in perceived orientation as a function of idiothetic information available. Presence
(Camb) 11(1), 19–32 (2002)
28. Bakker, N.H., Werkhoven, P.J., Passenier, P.O.: The effects of proprioceptive and visual
feedback on geographical orientation in virtual environments. Presence 8(1), 36–53 (1999)
29. Ruddle, R.A., Payne, S.J., Jones, D.M.: Navigating large-scale virtual environments: What
differences occur between helmet-mounted and desk-top displays. Presence 8(2), 157–168
(1999)
30. Wraga, M., Creem-Regehr, S.H., Proffitt, D.R.: Spatial updating of virtual displays during
self- and display rotation. Mem. and Cognit. 32(3), 399–415 (2004)
31. Klatzky, R.L., et al.: Spatial updating of self-position and orientation during real, imag-
ined, and virtual locomotion. Psychological Science 9(4), 293–299 (1998)
32. Grant, S.C., Magee, L.E.: Contributions of proprioception to navigation in virtual envi-
ronments. Human Factors 40(3), 489–497 (1998)
33. Farrell, M.J., et al.: Transfer of route learning from virtual to real environments. Journal of
Experimental Psychology: Applied 9(4), 219–227 (2003)
34. Presson, C.C., Montello, D.R.: Updating after rotational and translational body move-
ments: Coordinate structure of perspective space. Perception 23(12), 1447–1455 (1994)
35. de Vega, M., Rodrigo, M.J.: Updating spatial layouts mediated by pointing and labelling
under physical and imaginary rotation. European Journal of Cognitive Psychology 13,
369–393 (2001)
36. Avraamides, M.N.: Spatial updating of environments described in texts. Cognitive Psy-
chology 47(4), 402–431 (2003)
37. Chance, S.S., et al.: Locomotion mode affects the updating of objects encountered during
travel: The Contribution of vestibular and proprioceptive inputs to path integration. Pres-
ence 7(2), 168–178 (1998)
38. Klatzky, R.L., et al.: Encoding, learning, and spatial updating of multiple object locations
specified by 3-D sound, spatial language, and vision. Experimental Brain Research 149(1),
48–61 (2003)
Cognitive Surveying: A Framework for Mobile
Data Collection, Analysis, and Visualization of
Spatial Knowledge and Navigation Practices
Drew Dara-Abrams
1 Introduction
Much has been established about how people learn and navigate the physical
world thanks to controlled experiments performed in laboratory settings. Such
studies have articulated the fundamental properties of “cognitive maps,” the
fidelity of the sensory and perceptual systems we depend on, and the set of
decisions we make in order to reach a novel destination, for instance. Clear and
precise findings certainly, but in the process what has often been controlled
away is the worldly context of spatial cognition. To divorce internal mental
processes from the external influences of the world is to tell an incomplete story,
This work has been generously supported by the National Science Foundation
through the Interactive Digital Multimedia IGERT (grant number DGE-0221713)
and a Graduate Research Fellowship. Many thanks to Martin Raubal, Daniel Mon-
tello, Helen Couclelis, and, of course, Alec and Benay Dara-Abrams for suggestions.
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 138–153, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Cognitive Surveying 139
for what sort of thinking is more intimately and concretely tied to our physical
surroundings?
And, forgetting the real world also means forgetting that spatial cognition
research can be valuable to professionals and ordinary people alike. Building
construction and city planning projects are oftentimes so complex that concerns
about engineering or budget take all attention away from the impact that the
environments will ultimately have on the people that will live and work there.
Ecological concerns are now addressed with environmental-impact reports. Spa-
tial cognition research has already identified techniques that could be used to
produce similarly useful reports on how a building floor plan can allow visitors to
successfully navigate or how a neighborhood can be designed to naturally draw
residents together in communal spaces. Non-professionals may not be charged
with designing their surroundings, but still, many would appreciate having an
opportunity to contribute. Spatial cognition methodology can be used to collect,
aggregate, and analyze their input. Also, the same approach can be applied to as-
sist individuals: In-car navigation systems and other location-based services are
notoriously inflexible and would certainly be improved if they took into account
end-users’ spatial knowledge and other subjective tendencies.
What is needed are techniques for precisely measuring spatial cognition in
real-world settings and analyzing the behavioral data in an automated fashion
so that results are consistent and immediately available to act on. Surveying en-
gineers have perfected this sort of measurement and analysis; the only difference
is that whereas a surveyor maps the physical world more or less as it exists,
spatial cognition researchers attempt to characterize the world as it is used and
remembered by people.
Such a sentence is almost always followed by a reference to The Image of the
City, that slim and intriguing volume by Kevin Lynch (1960). He and his fel-
low urban planners come to understand three very different American cities by
studying and interviewing residents, and by aggregating the results for each city,
they produce “images” that represent how an average person might remember
Boston, Jersey City, or downtown Los Angeles. These “images” are composed
of five elements (paths, edges, districts, nodes, and landmarks), which Lynch
details in terms at once understandable to an urban designer and meaningful
to a spatial cognition researcher. Unfortunately, it’s less clear how to go about
collecting “images of the city” on your own with any precision or consistency,
since a fair amount of expert interpretation appears to be involved.
Toward that end, let me propose cognitive surveying as an umbrella under
which we can pursue the goal of characterizing people’s spatial knowledge and
navigation practices in more carefully defined computational and behavioral
terms, while still producing results that are understandable to researchers and
laypeople alike. In this paper, I will specify the architecture of such a system for
behavioral data collection, analysis, and visualization. Much relevant research
already exists in spatial cognition, surveying engineering, geographic informa-
tion science, and urban planning; this framework of cognitive surveying ought
to serve well to integrate the pieces.
140 D. Dara-Abrams
2 Cognitive Surveying
2.1 Hardware
The tools of a surveyor have been combined, in recent years, into the single
package of a total station, which contains a theodolite to optically measure an-
gles, a microwave or infrared system to electronically measure distances, and a
computer interface to record measurements (for surveying background, see An-
derson & Mikhail, 1998). Some total stations also integrate GPS units to take
measurements in new territory or when targets are out of sight. The equipment
remains bulky, yet its functions can also now be approximated with portable,
consumer-grade hardware: a GPS unit, an electronic compass, and a mobile com-
puter (as illustrated in Figure 1). If the mobile computer has its own wireless
Internet connection, it can automatically upload measurements to a centralized
server for analysis. Otherwise, the user can download the measurements to a
PC that has an Internet connection. Although it is a somewhat more involved
process, asking the user to connect the mobile device to a PC at home, work,
or in a lab provides a further opportunity to also assess their spatial knowledge
away from the environment, by presenting tasks, like a map arrangement, on
the big screen of a PC. (More on these tasks and other measurements momen-
tarily.) Ultimately, cellular phones may be the platform of choice for real-world
measurement, since they are connected, ready at hand, and kept charged.
With their equipment, surveyors make only a few types of measurements, but by
repeating elementary measurements they are able to perform complex operations
like mapping property boundaries. The measurement techniques of cognitive
surveying are similarly elementary, already widely used in the spatial cognition
literature (if rarely used together), and become interesting when repeated ac-
cording to a plan (detailed in Figure 2). Since all this data is collected in the
field, the most fundamental measurement is the user’s position, which can be
captured to within a few meters by GPS, cleaned to provide a more accurate
fix, and recorded to a travel log (see Shoval & Isaacson, 2006). The GPS in-
formation can be supplemented with status information provided by the user
(e.g., “lost” or “in car”). The travel log alone allows us to begin to evaluate navigation practices (to be discussed in Section 3).
Fig. 1. Portable cognitive surveying hardware: an electronic compass affixed to a mobile computer; measurements can be uploaded over the Internet to a server for analysis, output, and visualization, or downloaded via USB to the user’s PC for the map arrangement task.
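A cleaned travel log of this kind is straightforward to sketch. The illustration below (all names and thresholds hypothetical, not part of any proposed system) drops GPS fixes whose implied speed from the previous kept fix is physically implausible before appending them to the log:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude fixes."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def clean_fixes(fixes, max_speed_mps=50.0):
    """Append fixes to the travel log only when the speed implied by the
    previous kept fix is plausible; each fix is (timestamp_s, lat, lon, status)."""
    log = []
    for fix in fixes:
        if log:
            t0, lat0, lon0, _ = log[-1]
            t1, lat1, lon1, _ = fix
            if t1 <= t0:
                continue  # out-of-order or duplicate timestamp
            if haversine_m(lat0, lon0, lat1, lon1) / (t1 - t0) > max_speed_mps:
                continue  # implied speed is implausible: likely a GPS glitch
        log.append(fix)
    return log

fixes = [
    (0, 34.4140, -119.8489, "walking"),
    (10, 34.4141, -119.8488, "walking"),
    (20, 35.0000, -119.0000, "walking"),  # multipath jump
    (30, 34.4143, -119.8486, "in car"),
]
travel_log = clean_fixes(fixes)  # the spurious third fix is dropped
```

The status strings piggyback on each kept fix, so the user-provided annotations survive the cleaning step alongside the positions.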
More complex measurements are required to assess spatial knowledge. These
are landmarks, other point measures, direction estimates, and distance estimates
(to be discussed in Section 4). Landmarks are points of personal importance
labeled by the user. She may label any point of her choosing; an algorithm can
be used to suggest potential landmarks to her as well. Her knowledge of the
landmarks’ relative locations is measured by direction and distance questions.
From one landmark, she is asked to point to other landmarks, using an electronic
compass. (Compass readings will need to be corrected against models of magnetic
declination, which are available on-line.) Also, she is asked to judge the distance
from her current location to other landmarks by keying in numerical estimates.
In addition to labeling landmarks, the user can be asked to provide other point
measures. For instance, she can be asked “What’s the name of the neighborhood
you’re currently in?” The algorithms that decide when to ask these questions
can call on a base map of the environment, not to mention the user’s travel log.
Finally, when the user is sitting at a larger computer screen—back home at her
PC, say—she can be asked to again consider her landmarks by arranging them
to make a map. For all of these tasks, both the user’s answer and her reaction
time can be recorded.
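Scoring such a pointing trial means correcting the compass reading for declination and comparing it to the true bearing between the two landmarks. A minimal sketch follows; in practice the declination would come from an on-line geomagnetic model, so the 13° figure and the coordinates here are purely illustrative:

```python
import math

def true_bearing(magnetic_deg, declination_deg):
    """Correct a magnetic compass reading to a true (geographic) bearing;
    declination is positive when magnetic north lies east of true north."""
    return (magnetic_deg + declination_deg) % 360.0

def actual_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing, in degrees, from point 1 to point 2."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    y = math.sin(dl) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return math.degrees(math.atan2(y, x)) % 360.0

def pointing_error(estimate_deg, target_deg):
    """Signed angular error, wrapped into (-180, 180]."""
    return (estimate_deg - target_deg + 180.0) % 360.0 - 180.0

# One pointing trial: the compass reads 78 deg magnetic; local declination 13 deg E
est = true_bearing(78.0, 13.0)  # 91 deg true
tgt = actual_bearing(34.4140, -119.8489, 34.4141, -119.8379)  # bearing to the target
err = pointing_error(est, tgt)  # signed error to record for this trial
```

Recording the signed error (rather than its absolute value) preserves any systematic bias in a participant's pointing for later analysis.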
From these measurements will come a comprehensive data set on an individ-
ual’s spatial knowledge and navigation practices to analyze and visualize.
Lynch only produced “images” for groups, but from this point, angle, and dis-
tance data can come both individual and aggregate analyses (see Figure 3). Of
particular interest will be the routes that people take, the accuracy of their spa-
tial knowledge, and the contents of their spatial knowledge (all of which will
be discussed below). While quantitative output will be necessary to run behav-
ioral studies, also important will be visualizations, which are oftentimes much
more effective at quickly conveying, for instance, the distorted nature of spatial
knowledge or the cyclical nature of a person’s movement day after day (see also
Dykes & Mountain, 2003; Kitchin, 1996a).
3 Navigation Practices
People move, doing so for any number of reasons, through any number of spaces,
at any number of scales. As such, a number of research traditions consider human
movement. At the scale of cities and other large environments, time geography
studies how the constraints of distance limit an individual, and transportation ge-
ography studies the travel of masses between origins and destinations (Golledge
& Stimson, 1996). Cognition is certainly involved in both, but memory and cog-
nitive processes are most evident, and most studied, in navigation. More specif-
ically, spatial cognition research often decomposes navigation into locomotion
and wayfinding (Montello, 2005), the former being perceptually guided move-
ment through one’s immediate surrounds (walking through a crowded square
and making sure to avoid obstacles, say) and the latter being route selection
between distant locations (figuring out how to travel from Notre Dame to the
Luxembourg Gardens, say). When attempting to understand how an individual
Fig. 3. Analyses of cognitive surveying data. Inputs include landmark map arrangements and environmental data sources (a base map of the environment). Aggregate analyses of spatial knowledge contents include computing the regularity of landmark use across multiple participants and evaluating the distributions of point measures (say, to identify neighborhood areas) by drawing polygons around regions of similar points (similar to Montello, Goodchild, Gottsegen, & Fohl, 2003).
4 Spatial Knowledge
Spatial knowledge is the stored memories we call on when orienting ourselves in
a familiar environment, navigating toward a known destination, writing route
directions for a visitor from out of town, and so on. Like other aspects of cog-
nition, spatial knowledge can be modeled in a computational fashion (Kuipers,
1978). That, however, is not the goal of cognitive surveying, which is focused
on measuring spatial knowledge. In fact, the “read-out” provided by a cogni-
tive surveying system should be of much interest to cognitive scientists who are
developing and calibrating computational models of spatial knowledge.
Therefore, what is needed for the purposes of cognitive surveying is not a the-
ory of spatial knowledge itself but simply a working abstraction that can be used
to measure spatial knowledge. Lynch’s five elements are one such abstraction,
but some are too subjective. By borrowing from surveying engineering, we can
fashion a more computationally precise set of elements to use to measure spatial
knowledge: landmarks, direction estimates, distance estimates, and regions.
[Figure: four strategies for identifying landmark measurement points.
– User input (see Marmasse & Schmandt, 2000): prompt the user, "please press this button whenever you're near a landmark that you'd like to label."
– GPS signal loss: (1) watch for loss of GPS signal; (2) when the signal degrades and then disappears, assume the user has entered a building; (3) ask the user if the obstructing location is worth labeling.
– GPS point clustering (after Ashbrook & Starner, 2003): (1) filter based on speed to separate moving points from pause points; (2) when pause points cluster together, ask the user if this is a meaningful landmark worth labeling.
– Environmental analysis (after Raubal & Winter, 2002): (1) ahead of time, identify potential landmarks by analyzing base maps, taking into account visual attraction (facade area, shape, color, visibility) and semantic attraction (cultural and historical importance, signage); (2) when the user nears a potential landmark, ask if it is personally meaningful and worth labeling.]
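The speed-filter-and-cluster procedure is simple enough to sketch directly. This is a simplified reading of the Ashbrook & Starner-style heuristic with made-up thresholds; `track` is assumed to be a list of `(t, x, y)` samples in seconds and meters:

```python
import math

def pause_clusters(track, speed_thresh=0.5, radius=10.0):
    """Step 1: speed-filter the track into pause points.
    Step 2: group pause points lying within `radius` of an existing
    cluster's first point into candidate-landmark clusters."""
    pauses = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        speed = math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
        if speed < speed_thresh:
            pauses.append((x1, y1))
    clusters = []
    for p in pauses:
        for c in clusters:
            if math.hypot(p[0] - c[0][0], p[1] - c[0][1]) <= radius:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters  # each cluster is a place to ask the user about
```

Each returned cluster marks a candidate location at which the system would ask the user whether a meaningful landmark is present.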
Once a surveyor has identified measurement points, the next task is to deter-
mine the directions and distances among the set. Using triangulation and other
trigonometric techniques may mean that some measurements can be inferred
from others. People certainly rely on shortcuts and heuristics, too, but whereas
Cognitive Surveying 145
[Figure: four surveying operations for measuring directions and distances among landmarks.
– Traversing: (1) estimate the distance (d) to the previous landmark visited; (2) estimate the angle (α) to the previous landmark visited. Two advantages: easy computations; not too many estimates required. A disadvantage: perhaps only useful for measuring routes (see Cornell, Heth, & Rowat, 1992).
– Triangulation (direction only): (1) estimate directions to other landmarks so that each landmark sits at the vertex of a triangle; (2) fit together the direction estimates to determine the relative position of each landmark. An advantage: can measure survey knowledge for large areas. A disadvantage: depends on a large number of estimates. A question: How does triangle size affect the data collected? What would be the difference between using ∆BCD and ∆BCE?
– Trilateration: (1) estimate distances from landmark to landmark (note that A-B can be perceived to be a different distance than B-A); (2) combine the distance estimates using multidimensional scaling to produce a best-fit arrangement of landmarks. Advantages, disadvantages, and questions are similar to triangulation. Another issue: distance knowledge is often poorer than direction knowledge (Waller & Haun, 2003).
– Triangulation (direction and distance): (1) estimate directions and distances from landmark to landmark; (2) combine the estimates using direction/distance scaling to produce a best-fit arrangement of landmarks. An advantage: a more comprehensive measure of survey knowledge. A disadvantage: computationally intensive. A question: Can missing estimates be approximated based on others that were in fact taken?]
proper. Thus, triangulation is probably best used for cognitive surveying when
performed without any distance estimates or with a number of distance estimates
distributed around the triangle network. If using only distance measurements,
trilateration—as is performed in more complex forms to derive positions from
GPS signals—can be used instead.
In any case, the cognitive surveyor has an advantage over the land surveyor:
Precise base maps already exist for most urban and natural environments, and so
we can use information about streets, paths, and other environmental features
to guide the sampling design that determines when to ask users to estimate
directions and distances. (More on sampling design in the next section.)
As these direction and distance estimates are collected, they can be integrated
in order to approximate the user’s overall spatial knowledge and to propagate
error among repeated measurements. Multidimensional scaling, or MDS, is one
such technique often used to turn pairwise distance and direction estimates into
a best-fit two-dimensional configuration (Waller & Haun, 2003). When apply-
ing MDS to distance estimates alone, the optimization procedure is effectively
trilateration repeated many times.
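The best-fit step can be sketched as plain gradient descent on the "stress" between estimated and configured distances. This is only an illustrative least-squares variant, not any particular MDS implementation; `est[i][j]` holds the pairwise distance estimates and `X` an initial guessed configuration:

```python
import math

def mds_fit(est, X, iters=3000, lr=0.05):
    """Gradient descent on the raw stress
    sum_{i<j} (||x_i - x_j|| - est[i][j])**2,
    nudging the configuration toward a best fit of the estimates."""
    n, dim = len(X), len(X[0])
    X = [list(p) for p in X]
    for _ in range(iters):
        grad = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                diff = [X[i][k] - X[j][k] for k in range(dim)]
                d = math.sqrt(sum(t * t for t in diff)) or 1e-9
                g = 2.0 * (d - est[i][j]) / d
                for k in range(dim):
                    grad[i][k] += g * diff[k]
                    grad[j][k] -= g * diff[k]
        for i in range(n):
            for k in range(dim):
                X[i][k] -= lr * grad[i][k]
    return X
```

The recovered configuration is only determined up to rotation, reflection, and translation, which is all a best-fit map of someone's spatial knowledge needs to be.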
4.3 Learning
People do not come by their spatial knowledge of an environment instantaneously—
we learn over time from repeated exposure and novel experience. The most widely
accepted theory of spatial microgenesis (as one’s acquisition of spatial knowledge
for an environment is called) proposes that people first learn the locations of point-
like landmarks, then learn the linear routes that connect pairs of landmarks, and
finally learn how the landmarks and routes fit into an overall configuration, known
as survey knowledge (Siegel & White, 1975). If people follow discrete stages in this
manner, they will not begin to acquire metric knowledge, like the direction between
a pair of landmarks, until the final stage. Yet longitudinal studies suggest that spa-
tial microgenesis may progress in a continuous manner, without qualitatively differ-
ent stages (Ishikawa & Montello, 2006). The consistent, automated data collection
that a cognitive surveying system offers will be invaluable for studying how people
learn an environment over time.
4.4 Regions
One way by which we learn environments is to subdivide them into meaningful
regions (Hirtle, 2003). In the case of cities, these regions are usually neighbor-
hoods, districts, wards, barrios, and so on. Some are official while others are
informally defined—the City of London versus the Jets’ turf. Even if a region
name is in common parlance, its boundary is still likely vague (Montello, Good-
child, Gottsegen, & Fohl, 2003).
Regions may be areas, but like any other polygon, their extents can be ap-
proximated by point measures. In other words, users can be asked occasionally
“What’s the name of this neighborhood?” and around that sampling of points,
polygons of a certain confidence interval can be drawn. As with direction and
distance estimates, there is the question of when to ask point measurement
questions. A number of sampling designs can be used (as in Figure 6): wait for
user input; ask questions at preset temporal intervals; ask questions at uniform,
preset spatial locations; and, ask questions at preset spatial locations whose se-
lection has been informed by base maps of the environment in question. The
best approach is likely a combination of all four.
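One simple way to turn such point samples into a region polygon is a convex hull over all points that received the same label. The sketch below uses Andrew's monotone chain algorithm, with the caveat that real neighborhoods are rarely convex and drawing confidence polygons would need a probabilistic treatment:

```python
def convex_hull(points):
    """Andrew's monotone chain: return hull vertices, in
    counter-clockwise order, of the (x, y) samples that all
    carried the same region label."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]
```

Running this per label over the sampled "What's the name of this neighborhood?" answers yields one rough polygon per region name.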
Other point measures may be collected and analyzed in a similar manner. For
example, in one of the more clever demonstrations of GPS tracking, Christian
Nold has logged people’s position and their galvanic skin response with the goal
of mapping the physical arousal associated with different parts of a city (see
biomapping.net).
This sort of subjective spatial data is highly personal, yet when aggregated it
can be of use. Again, take the example of regions. Mapping firms now compete
to provide data sets of city neighborhoods to Web search engines, real estate
firms, and others who want to organize their spatial data in a more intuitive
manner. (For example, see Zillow.com and EveryBlock.com.) A cognitive sur-
veying system is one means by which an individual’s spatial knowledge can be
measured and aggregated for these sorts of purposes.
Using this sort of subjective spatial data may very well help location-based
services (LBS) become easier to use and more useful for end users.
7 The Environment
So far we have focused on the knowledge that people carry in their heads and the
cognitive processes that they use to navigate. Both are, by definition, intimately
tied to the physical world, the information that it offers and the constraints
that it imposes. Spatial cognition researchers wish to understand the interplay,
while the designers and planners who are charged with creating and enhancing
built environments want to understand how those places are used. As cognitive
surveying can be performed accurately in real-world settings, such a system can
be effective in both cases.
The behavioral data being collected and analyzed here is perfectly suited for
comparison with computational models of environmental form. In short, an en-
vironmental form model captures the patterns of accessibility or visibility for a
physical setting like a building interior or a university campus using grid cells
(Turner, Doxa, O’Sullivan, & Penn, 2001), lines (Hillier & Hanson, 1984), or
other geometric building blocks. Quantitative measures can be computed and
extracted for a certain location, a path, or an entire region. Certain environmen-
tal form measures have been found to predict the number of pedestrians walking
on city streets (Hillier, Penn, Hanson, & Xu, 1993) and the accuracy of students’
spatial knowledge for their university campus (Dara-Abrams, 2008). A cognitive
surveying system will help further research on the relationship between human
behavior/cognition and models of environmental form.
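The grid-cell flavor of such a model can be sketched as a crude visibility count: a toy stand-in for visibility-graph analysis (cf. Turner et al., 2001), using sampled line of sight rather than an exact algorithm:

```python
def visible(grid, a, b, samples=50):
    """Line of sight between cell centers a and b on a 0/1 grid
    (1 = wall), checked by sampling points along the segment."""
    for s in range(samples + 1):
        t = s / samples
        x = round(a[0] + t * (b[0] - a[0]))
        y = round(a[1] + t * (b[1] - a[1]))
        if grid[y][x] == 1:
            return False
    return True

def visibility_counts(grid):
    """For each open cell, count how many other open cells it can
    see: a simple 'how visually connected is this location' measure."""
    cells = [(x, y) for y in range(len(grid))
             for x in range(len(grid[0])) if grid[y][x] == 0]
    return {c: sum(visible(grid, c, d) for d in cells if d != c)
            for c in cells}
```

Locations with high counts are the visually well-connected spots that, per the studies cited above, tend to attract pedestrian movement and be remembered more accurately.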
Even without the specificity of an environmental form model, the data collec-
tion and analysis of cognitive surveying can inform the work of architects, urban
designers, and city planners. Lynch demonstrated that collecting “images of the
city” identifies design flaws to remediate, captures reasons underlying residents’
attitudes toward development, and reveals which places are attractive to resi-
dents and which are not. These, among other practical outcomes of measuring
spatial knowledge and navigation practices, are details that can guide not just
the mechanics of design but also the way in which projects are presented and
framed to the public. Collecting “images” depends on trained experts, but a cog-
nitive surveying system could be deployed and used by architects and planners,
as well as expert cognitive scientists.
between those landmarks, and so on—but from these, complex descriptions can
be constructed and theoretically interesting questions addressed, including:
– When people are allowed to freely travel through an environment, does their
spatial knowledge contain the same sort of systematic errors that have been
found in lab-based studies?
– When people repeatedly explore an environment, how does their spatial
knowledge develop over time? Does their learning follow a fixed set of qual-
itative stages or instead progressively increase from the beginning?
– How do spatial abilities relate to other factors that may also cause individual
differences in spatial knowledge and navigation practices (e.g., regular travel
extent, confidence in spatial abilities, sex, demographics)?
– What are the optimal surveying operations and sampling designs for mea-
suring spatial knowledge? Are particular parameters more appropriate for
certain circumstances and studies than others? For instance, is knowledge for
a long route best tested using a different set of parameters than knowledge
for a neighborhood?
– Can models of environmental form predict where people are likely to travel,
which features they are likely to remember, and how accurate that spatial
knowledge will likely be? If so, can these models be used to better under-
stand which particular properties of real-world environments influence peo-
ple’s spatial knowledge and navigation practices?
– How can the automated collection of this subjective data improve location-
based services and assist the users of other electronic services?
– Will summaries and visualizations of people’s spatial knowledge and navi-
gation practices make for the beginnings of a “psychological-impact report”
for environmental design projects?
Cognitive surveying will better enable us to pursue all of these research
questions.
This paper’s contribution is the framework of cognitive surveying. In the fu-
ture, I intend to present implemented systems along with results that begin to
address the preceding questions. Even as a conceptual framework, cognitive sur-
veying can already help us take spatial cognition research into the real world.
We now know what sort of questions to ask of a person and what sort of mea-
surements to record, when to ask each question and when to alternate methods,
how to synthesize all these measurements and how to present them for analysis.
In addressing such issues, cognitive surveying will allow us to characterize the
world as it is remembered and used by people—if not with absolute accuracy, at
least with consistency and ease.
References
Anderson, J.M., Mikhail, E.M.: Surveying: Theory and practice. WCB/McGraw-Hill,
Boston (1998)
Ashbrook, D., Starner, T.: Using GPS to learn significant locations and predict move-
ment across multiple users. Personal Ubiquitous Computing 7, 275–286 (2003)
Montello, D.R.: Spatial orientation and the angularity of urban routes: A field study.
Environment and Behavior 23, 47–69 (1991a)
Montello, D.R.: The measurement of cognitive distance: Methods and construct valid-
ity. Journal of Environmental Psychology 11, 101–122 (1991b)
Montello, D.R.: Navigation. In: Shah, P., Miyake, A. (eds.) The Cambridge handbook
of visuospatial thinking, pp. 257–294. Cambridge University Press, UK (2005)
Montello, D.R., Goodchild, M.F., Gottsegen, J., Fohl, P.: Where’s downtown?: Behav-
ioral methods for determining referents of vague spatial queries. Spatial Cognition
and Computation 3, 185–204 (2003)
Montello, D.R., Richardson, A.E., Hegarty, M., Provenza, M.: A comparison of methods
for estimating directions in egocentric space. Perception 28, 981–1000 (1999)
Nothegger, C., Winter, S., Raubal, M.: Selection of salient features for route directions.
Spatial Cognition and Computation 4, 113–136 (2004)
Nurmi, P., Koolwaaij, J.: Identifying meaningful locations. In: The 3rd Annual In-
ternational Conference on Mobile and Ubiquitous Systems: Networks and Services
(MobiQuitous), San Jose, CA (2006)
Raubal, M., Miller, H.J., Bridwell, S.A.: User-centered time geography for location-
based services. Geografiska Annaler-B 86, 245–265 (2004)
Raubal, M., Winter, S.: Enriching wayfinding instructions with local landmarks. In:
Egenhofer, M., Mark, D. (eds.) Geographic Information Science, pp. 243–259.
Springer, Heidelberg (2002)
Sadalla, E.K., Burroughs, W.J., Staplin, L.J.: Reference points in spatial cognition.
Journal of Experimental Psychology: Human Memory and Learning 5, 516–528
(1980)
Shoval, N., Isaacson, M.: Application of tracking technologies to the study of pedestrian
spatial behavior. The Professional Geographer 58, 172–183 (2006)
Siegel, A.W., White, S.H.: The development of spatial representations of large-scale
environments. In: Advances in child development and behavior, vol. 10, pp. 9–55.
Academic, New York (1975)
Sorrows, M.E., Hirtle, S.C.: The nature of landmarks for real and electronic spaces.
In: Freksa, C., Mark, D. (eds.) Spatial Information Theory, pp. 37–50. Springer,
Heidelberg (1999)
Srinivas, S., Hirtle, S.C.: Knowledge-based schematization of route directions. In:
Barkowsky, T., Knauff, M., Ligozat, G., Montello, D.R. (eds.) Spatial cognition
V: Reasoning, action, interaction, pp. 346–364. Springer, Berlin (2006)
Stevens, A., Coupe, P.: Distortions in judged spatial relations. Cognitive Psychology 10,
422–437 (1978)
Turner, A., Doxa, M., O’Sullivan, D., Penn, A.: From isovists to visibility graphs: A
methodology for the analysis of architectural space. Environment and Planning B:
Planning and Design 28, 103–121 (2001)
Tversky, B.G.: Distortions in cognitive maps. Geoforum 23, 131–138 (1992)
Waller, D.A., Beall, A., Loomis, J.M.: Using virtual environments to assess directional
knowledge. Journal of Environmental Psychology 24, 105–116 (2004)
Waller, D.A., Haun, D.B.M.: Scaling techniques for modeling directional knowledge.
Behavior Research Methods, Instruments, and Computers 35, 285–293 (2003)
What Do Focus Maps Focus On?
1 Introduction
Maps are a dominant medium to communicate spatial information. They are
omnipresent in our daily life. In news and ads they point out where specific
places are, often in relation to other places; they link events, dates, and other
data to locations to illustrate, for example, commercial, historical, or sports de-
velopments. For planning holidays or trips to unknown places inside or outside
our hometown we often grab a map—or, nowadays, we turn to Internet plan-
ners, like Google Maps, or (car) navigation systems. And if we ask someone for
directions, we may well end up with a sketch map illustrating the way to take.
All these maps display different information for different purposes. Often, they
are intended for a specific task. However, the design of maps does not always reflect
this task-specificity. The depicted information may be hard to extract, either be-
cause of visual clutter, i.e., a lot of excess information, or because the map user
is not properly guided to the relevant information. In this paper, we discuss the
concept of focus maps, which is an approach to designing maps that guide a map
user in reading information off a map. Using simple graphical and geometric oper-
ations, the constructed maps focus a user's attention on the information relevant
for a given task. This way, we are able to design maps that are not only tailored
to the intended task, but also assist a map user in reading them.
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 154–170, 2008.
© Springer-Verlag Berlin Heidelberg 2008
In the next section, we present approaches to map-based assistance in spatial
tasks and illustrate the fundamental concepts underlying our approach, namely
schematization and a computational approach to constructing schematic maps.
Section 3 explains the concept of focus maps previously presented by [1] in
a restricted scope and discusses its generalized aim and properties. Section 4
introduces a toolbox for map construction and the relevant components needed
for designing focus maps. This section also shows examples of different kinds of
focus maps. In Section 5 we provide empirical evidence supporting our approach;
in Section 6 we outline how the concept of focus maps may be transferred to
the construction of 3D virtual worlds. The paper ends with conclusions and an
outlook on future work in Section 7.
Maps and map-like representations have been used by humans since ancient
times [2]. There is evidence that they are used universally, i.e., across cultures
[3]. That is, maps are (or have become) a form of representing space used by
almost every human being, just like natural language. Over time, maps have become
an everyday product. However, often there is a mismatch between what the map
designer has intended and how the map reader actually uses the map [4]. This
problem persists even though maps are rarely purely graphical representations,
but usually also contain (explanatory) verbal elements [5]. And this problem
increases with the increasing use of map-like representations in electronic form.
While there is a rich set of rules and guidelines for the generation of paper-based
cartographic maps (e.g., [6,7]), these rules are mostly missing for electronic maps
presented on websites or on mobile devices.
This can be observed in approaches for automatic assistance in spatial tasks.
Maps play a major role here; in addition to verbal messages almost all Internet
route planners and car navigation systems also provide information on the way to
take in graphical form. In research, for example in the areas of human-computer
interaction and context awareness, several approaches exist that deal with map-
based assistance (e.g., [8,9,10]). Most of these approaches employ mobile devices
to present maps; the maps are used as a means of interaction in location-based
services [8,11,9]. Accordingly, this research aims at an automatic adaptation of the
maps to the given medium and situation [12]. Questions of context awareness and
adaptation to context play an important role [13] (see also the next section).
Our work is based on ideas presented by Berendt et al. [14]. They develop a
computational approach to constructing maps they term schematic. Schematic
maps are representations that are intentionally simplified beyond technical needs
to achieve cognitive adequacy [15]. They represent the specific knowledge needed
for a given task; accordingly, the resulting maps are task-specific maps [16]. Three
156 K.-F. Richter et al.
As we have detailed in the last section, maps are important in our everyday life.
They are a prime means of communicating spatial information; reading maps is a
recurring task. Consequently, assistance systems that use maps as communication means
should not only assist in the given spatial task, but also provide assistance in
reading the maps. This holds especially since the advent of mobile devices with
their small displays as a platform for these assistance systems.
In line with the aspect maps approach (see last section), maps as assistance
means should concentrate on the relevant information. This serves to reduce
cognitive load of the users; they should not need to process spatial information
that is not needed for the task at hand. At the same time, however, these maps
should also guide their reading. This serves to speed up information processing;
by the design of the map, map users should be drawn to the relevant information.
We term this design principle of reader guidance a focus map. The focus ef-
fect is a specific form of schematization. While it does not reduce information
represented in a map homogeneously by, for example, removing objects or sim-
plifying geometry over all objects, it reduces the information to be processed by
funneling a reader’s attention to the relevant information.
Since schematic maps are task-specific [16], what information focus maps fo-
cus on is dependent on the task at hand. When the task is to guide a user from
location A to location B, maps need to be designed differently from maps that
present points of interest in the depicted environment. That is, map design is
context dependent; the appearance of the generated map depends on the envi-
ronment depicted, on the selected information, and on the intended task. Other
than the approaches listed in Section 2 and other “traditional” approaches to
context (e.g., [19,20]) that define context by (non-exhaustive) lists of factors
whose parametrization is supposed to result in context-adaptive behavior, we
take a process-oriented approach to context [21]. Figure 1 provides a diagram-
matic view on this approach. It distinguishes between the environment at hand,
the environment’s representation (in the context of this paper this is the focus
map), and an agent using the representation to interact with the environment—
here, this is the map user. Between these three constituents, processes determine
the interactions going on to solve a given task. For example, map reading and
interpretation processes determine what information the agent extracts from the
map, while processes of selection and schematization determine what information
gets depicted in the map by the map designer, i.e., determine the representation.
These processes, finally, are determined by the task at hand. The designer selects
and schematizes information with a specific task in mind, the map user reads
information off the map to solve a specific task. This way of handling context is
also flexible with respect to task changes—be it the kind of task or the concrete
task at hand. Thus, it may well be the basis for flexibly producing different kinds
of maps using the same data basis, for example, in mobile applications.
What to Focus on
The term focus map stands for representations that guide a map user’s reading
processes to the relevant information. However, as just explained, depending on
the context there is a great variety of what this relevant information might be.
Accordingly, different kinds of maps can be summarized under the term focus
map. It is important to note that what is generally depicted on a map, i.e., the
(types of) objects shown, are selected in a previous step (see Section 4). The
selected features depend on the kind of task as illustrated above; focusing then
highlights specific instances of these features, namely those specifically relevant
for the actual task. For example, for a wayfinding map the street network as
well as landmark features may be selected for depiction; the route connecting
origin and destination and those landmarks relevant for the route then may be
highlighted using focus effects.
Broadly, we can distinguish between maps that focus on specific objects (or ob-
ject types) and maps that focus on specific areas of the depicted environment (cf.
also the distinction between object- and space-schematization in [22]). Focusing
on objects can be achieved by using symbols to represent the relevant objects, for
example, landmarks [23,24]. It may also be achieved by object-based schematiza-
tion, i.e., by altering the appearance of specific objects to either increase or de-
crease their visibility (see Section 4.2).
When focusing on specific areas, all objects in these areas are in focus, in-
dependent of their type. Objects in the focused area are highlighted, all other
objects are diminished. Such maps may, for example, focus on the route between
some origin and destination, funneling a wayfinder’s attention to the route to
take [1]. Several different areas, possibly disconnected, can be in focus at the
same time. This also holds for focusing on multiple routes at the same time
to, for example, indicate alternative detours next to the proposed main route.
For all the different kinds of focus maps, graduated levels of focus are possible,
i.e., it is possible to define several levels of varying focus. In a way, this corre-
sponds to the depictional precedence explained in Section 2; different types of
information may be highlighted to different degrees. This may be used to either
depict “next-best” information along with the most important information, or to
increase the funneling effect by having several layers of increasing focus around
an area. With these graduated levels of focus, we can distinguish strong and weak
focus. Using a strong focus, there is an obvious, hard difference in presenting
features in focus and those that are not. Features in focus are intensely high-
lighted, those that are not are very much diminished. A weak focus provides a
smoother transition between those features in focus and those that are not.
The kinds of focus maps presented so far all focus on either objects or areas,
i.e., on parts of the depicted environment. They emphasize structural informa-
tion [25]. However, maps may also be designed such that they emphasize the
actions to be performed. Such maps focus on functional information. Wayfind-
ing choreme maps [26] are an example of this kind of maps. In designing such
maps, the visual prototypes identified by Klippel [25] that represent turning ac-
tions at intersections emphasize the incoming and outgoing route-segments at
intersections, i.e., the kind of turn due at an intersection. This way, they ease
understanding which action to perform, reducing ambiguity and fostering con-
ceptualization of the upcoming wayfinding situations. Combining structural and
functional focus, for example, as in chorematic focus maps [27], then results in
maps that focus on the relevant information in the relevant areas.
Combining structural and functional focus is also employed in generating per-
sonalized wayfinding maps. Here, different levels of focus are used in that maps
depict information in different degrees of detail (focus) depending on how well
known an area is to the wayfinder [28]. Such maps that show transitions between
known and unknown parts of an environment are a good example for using mul-
tiple levels of focus. The maps consist of three classes of elements of different
semantics and reference frames:
– One or more familiar areas; these are the most restricted elements of the
map: only the previously traveled path and prominent places or landmarks
along the path are selected and depicted on the resulting map.
– Transition points; they describe the transition from familiar to unfamiliar
areas and also define the transition between the individual reference frame
and a geographic frame of reference. For reasons of orientation and localiza-
tion, elements of the known part at the transition points are selected and
added to the map.
– One or more unfamiliar areas; all elements of these areas belong to a ge-
ographic frame of reference. This means focus effects can only sensibly be
applied to unfamiliar environments, as is further explained below.
We apply focus effects differently for each of the three classes of elements. The
familiar paths are highly schematized, chorematized (all angles are replaced by
conceptual prototypes; see [26]), and scaled down. No focusing is applied to these
parts of the map, as there is no additional environmental information depicted
that could distract the attention of the wayfinder. These paths only serve as
connections between familiar and unfamiliar environments.
The purpose of maps based on previous knowledge is to highlight the unknown
parts of a route. Accordingly, the transition areas are subject to focus. To enable
localization, a transition point has to be clearly oriented and identifiable. This
requires resolving ambiguities that may arise. To this end, elements in direct
vicinity of the transition points that belong to the known parts of a route are
selected and displayed. We apply a strong focus function to these points. This
enables a smooth reading of the transition between the different parts. In unfa-
miliar parts, we display much more environmental information to provide more
spatial context. To focus a wayfinder’s attention on the route to travel, we apply
focus effects on the route as explained above (see also Section 4.3).
4 Implementation
Focus maps, as a specific kind of schematic maps, are part of the toolbox for
schematic map design developed in project I2-[MapSpace] of the Transregional
Collaborative Research Center SFB/TR 8 Spatial Cognition.1 In this section, we
will briefly introduce the basics of this toolbox and the underlying operations for
generating focus maps. Section 4.3 then introduces a generic way of generating
focus maps and shows examples of the different kinds of focus maps discussed
so far.
1 http://www.sfbtr8.spatial-cognition.de/project/i2/
vector-based geometry and building up the required data structures (e.g., ex-
tracting a graph from a given street network). The toolbox is able to deal with
data given in different formats, for instance, as EDBS- or GML-files.2 There is
also functionality provided to export data again, which is also used as one way to
communicate between different parts of the toolbox. The main part of the tool-
box, though, is the provision of operations for the graphical and geometric ma-
nipulation of spatial objects (features represented as points, lines or polygons).
These operations form the basis for the different implemented schematization
principles; those operations required for focus maps are explained in more detail
in the next subsection.
The toolbox is implemented in Lisp. Maps can be produced as Scalable Vector
Graphics (SVG)3 or in Flash format4 . SVG is an XML-based graphics format
that is highly portable across different platforms and applications. Flash allows
for a simple integration of interaction means in the map itself and can be dis-
played by most modern browsers.
Mostly, the operations of the data processing part can be used independently
from each other; there is no predefined order of execution. The context model
presented in Section 3 (see also Fig. 1) may be used to implement a control mod-
ule that determines the execution order given a task, agent, and environment.
As for any schematic map to be constructed, the spatial information (e.g., objects
or spatial relations) to be depicted needs to be selected. Specific to focus maps,
the selection operation also involves determining which parts of this information
are to be highlighted. The concrete operation achieving this focus depends on
the kind of focus effect aimed for. Focusing on specific objects, for example, is
realized simply by type comparison of the objects in the database with the target
type. In focusing on specific areas, on the other hand, for every object a focus
factor is calculated that depends on the object’s distance to the focus area.
The most important operation for designing focus maps is the adapted coloring of depicted graphical objects. This operation determines the visual appearance of the map; it works on a perceptual level and is used for every kind of focus map described in Section 3. The coloring operation manipulates the color—the RGB values—of objects before they are depicted. Objects that are in focus are depicted in full color to make them salient. The color of objects not in focus, in contrast, is shifted towards white, which renders them less visible, as they are depicted in a lighter, more grayish color. The non-shifted objects thus stick out, putting them in focus. Additionally, the geometry of objects not in focus may be simplified, which further diminishes their visual prominence, as demonstrated in [1]. To this end, the toolbox implements line simplification based on discrete curve evolution [29].
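As an illustration of this simplification step, here is a minimal Python sketch of discrete curve evolution (the toolbox itself is written in Lisp; the function and variable names are ours, not the toolbox's):

```python
import math

def dce_simplify(points, target):
    """Polyline simplification by discrete curve evolution (sketch).

    Repeatedly removes the interior vertex with the smallest relevance
    K(v) = beta(v) * l1 * l2 / (l1 + l2), where beta is the turn angle
    at v and l1, l2 are the lengths of its incident segments; the
    endpoints are always kept.
    """
    pts = list(points)

    def relevance(i):
        a, b, c = pts[i - 1], pts[i], pts[i + 1]
        l1, l2 = math.dist(a, b), math.dist(b, c)
        ang1 = math.atan2(b[1] - a[1], b[0] - a[0])
        ang2 = math.atan2(c[1] - b[1], c[0] - b[0])
        beta = abs(math.remainder(ang2 - ang1, 2 * math.pi))  # turn angle
        return beta * l1 * l2 / (l1 + l2)

    while len(pts) > max(target, 2):
        i = min(range(1, len(pts) - 1), key=relevance)  # least relevant vertex
        del pts[i]
    return pts
```

Nearly collinear vertices have a turn angle close to zero and are removed first, so the overall shape of the line is preserved while detail is reduced.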
2 EDBS: http://www.atkis.de; GML: http://www.opengis.net/gml/
3 http://www.w3.org/Graphics/SVG/
4 http://www.adobe.com/support/documentation/en/flash/documentation.html
What Do Focus Maps Focus On? 161
Fig. 2. A focus map emphasizing a specific object type. In this example, tramways (the thick black lines) are highlighted, while water bodies (the light gray areas) are strongly diminished.
Focusing on a specific area is realized by adding a distance-dependent factor to each color component. The distance d is defined as the minimal distance between a coordinate c and the focus area f. The three new color components r′, g′, b′ are then calculated as the minimum of 230,5 and the sum of the old color component (r, g, b, respectively) and the distance d multiplied by a factor k, which determines how quickly colors fade out (i.e., corresponds to a strong or weak focus). This sum is normalized by the size s of the environment:

d = |c − f|
r′ = min(230, r + kd/s)
g′ = min(230, g + kd/s)
b′ = min(230, b + kd/s)
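The color shift can be sketched in Python (the toolbox itself is implemented in Lisp; the function signature here is illustrative):

```python
import math

def fade_color(rgb, c, f, s, k):
    """Shift a color toward white depending on distance to the focus area.

    rgb is the object's (r, g, b) triple, c a coordinate of the object,
    f the nearest point of the focus area, s the size of the
    environment, and k the focus strength. Components are capped at
    230, a light grey that is still visible on a white background.
    """
    d = math.dist(c, f)                                 # d = |c - f|
    return tuple(min(230, round(v + k * d / s)) for v in rgb)
```

Objects inside the focus area (d = 0) keep their original color, while colors of distant objects fade toward light grey at a rate controlled by k.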
When multiple areas a0, ..., an are present, the secondary areas are integrated as focus objects such that they decrease the added sum again. This is achieved by calculating an additional distance value n that gives the minimal distance (nearness) of a coordinate c to the nearest additional area. However, to restrict the influence of additional areas, we only take those coordinates into account that are nearer to the areas than the average distance between all objects and the main focus object (this average distance is denoted by p). The value n is additionally modified by another focus factor j that determines the strength of the additional areas' influence.
5 An RGB color of (230, 230, 230) corresponds to a light grey that is still visible on a white background.
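The text does not spell out the secondary-area adjustment as an explicit formula; the following Python sketch is one plausible reading of it, with all names and the exact form of the reduction being our assumptions rather than the toolbox's implementation:

```python
import math

def white_shift(c, f, secondary, s, k, j, p):
    """Distance-dependent white-shift with secondary focus areas (sketch).

    The basic shift k*d/s (d: distance of coordinate c to the main
    focus f) is reduced for coordinates whose distance n to the nearest
    secondary area falls below the average-distance threshold p; the
    factor j scales the secondary areas' influence. This is an
    illustrative reading, not the paper's exact formula.
    """
    d = math.dist(c, f)
    shift = k * d / s                                   # basic fade toward white
    if secondary:
        n = min(math.dist(c, a) for a in secondary)     # nearness to 2nd areas
        if n < p:                                       # only nearby coordinates
            shift = max(0.0, shift - j * (p - n) / s)
    return shift
```

With this reading, a coordinate lying directly on a secondary area gets the largest reduction, so secondary areas remain partially in focus.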
5 Empirical Results
In the literature and in our own work, we find several arguments for why focus maps as discussed in the previous sections are beneficial for map reading and, consequently, for task performance. Li and Ho [30], for example, discuss maps for navigation systems that highlight the area a wayfinder is currently in. A user study demonstrates that people consider this highlighting beneficial, especially if a strong focus function is used, i.e., if only the area in the immediate vicinity is highlighted. In a similar vein, the resource-adaptive navigation system developed at Universität Saarbrücken [8,11] adapts the presentation of information to the available display space and the time a user has to extract the required information: the less time there is, the more focused the presentation.
164 K.-F. Richter et al.
6 Focus in 3D
Schematization methods, including focus effects, can also be transferred to the generation of 3D virtual environments (VEs). Nowadays, such environments are increasingly utilized, for example, to visualize geospatial data [33]. Some of these geospatial virtual environments remodel real environments, for example cities, as in Google Earth. One reason for this recent trend is the huge amount of 3D data available for producing high-quality virtual environments. These virtual cities can be used not only for entertainment; they can also provide a new medium for tourism and can be used for training people, for example, in rescue scenarios.
A virtual environment "[...] offers the user a more naturalistic medium in which to acquire spatial information, and potentially allows to devote less cognitive effort to learning spatial information than by maps" ([34], p. 275). While this is an important aspect of using VEs for getting acquainted with an environment, several navigational problems have also been identified [34]. Compared to navigating a real environment, people navigating a virtual environment receive less feedback and information on their movement. This is because virtual environments are often presented only on a desktop, with movement controlled by a joystick or a mouse; vestibular and proprioceptive stimuli are missing in this case [35]. Therefore, people have severe problems orienting themselves and acquiring survey knowledge. Accordingly, there has been a lot of research trying to improve navigational performance in VEs (e.g., [36,37]). Nevertheless, there are contradictory results regarding how well survey knowledge can be learned from a virtual environment [34] and which ways of presenting spatial information in VEs are most efficient.
6 None of these maps were focus maps.
We believe that a transfer of schematization principles from 2D to 3D repre-
sentations is a promising way to ease the extraction of the relevant information
in VEs and, hence, a promising way to improve navigation performance. One
example of this transfer is the use of focus effects in 3D by, for example, fading
colors away from the relevant areas and using simplified geometry in areas that
are not in focus—similar to the maps depicted in Figure 3. This way, we can form regions of interest, such as a specific route (see Fig. 5 for a sketch of this effect). This focus effect may be used to form several regions of interest by highlighting different features and using different levels of detail. Forming such regions may help users gain a better sense of orientation [38].
7 Conclusions
We have discussed and generalized the concept of focus maps previously presented in [1]. Focus maps are specific kinds of schematic maps. The concept covers a range of different kinds of maps that all have in common that they guide map reading to the information relevant for a given task. We can distinguish between maps that focus on specific (types of) objects and those that focus on specific areas. We have illustrated their properties and design principles, and how they relate to our context model. We have introduced a toolbox for the design of schematic maps and shown example maps constructed with this toolbox. We have also outlined how navigation in 3D virtual environments may benefit from a transfer of the concept of focus maps to these representations.
In addition to the transfer of focus effects from 2D to 3D representations explained in Section 6, we plan to employ the concept of focus maps in maps that, while primarily presenting the route to take, also provide information on how to recover from accidental deviations from that route. Here, those decision points (intersections) considered especially prone to errors may be highlighted, and further environmental information, i.e., the surrounding area, may be displayed in more detail than is used for the rest of the map.
We reported some empirical studies that support the claims of our approach.
Further analyses and empirical studies are required, though, to better understand
the properties of focus maps and wayfinding performance with diverse types of
maps. For example, we plan to perform eye-tracking studies to determine whether users' map reading is guided as predicted by the employed design principles. We will also further analyze the performance of map users in different
wayfinding tasks, such as route following or self-localization, where they are
assisted by different types of maps, for example, tourist maps or focus maps.
These studies will help to improve design principles for schematic maps and will
lead to a detailed model of map usage. Finally, we will evaluate the consequences
of transferring focus maps to 3D environments on navigation performance in
these environments.
Acknowledgments
This work has been supported by the Transregional Collaborative Research Cen-
ter SFB/TR 8 Spatial Cognition, which is funded by Deutsche Forschungsge-
meinschaft (DFG). Fruitful discussions with Jana Holsanova, University of Lund,
helped to sharpen the ideas presented in this paper. We would also like to thank the participants of a project seminar held by C. Hölscher and G. Strube at Universität Freiburg for providing their empirical results (see [32]).
References
1. Zipf, A., Richter, K.-F.: Using focus maps to ease map reading — developing
smart applications for mobile devices. KI Special Issue Spatial Cognition 02(4),
35–37 (2002)
2. Tversky, B.: Some ways that maps and diagrams communicate. In: Freksa, C.,
Brauer, W., Habel, C., Wender, K.F. (eds.) Spatial Cognition II - Integrating
Abstract Theories, Empirical Studies, Formal Methods, and Practical Applications,
pp. 72–79. Springer, Berlin (2000)
3. Stea, D., Blaut, J.M., Stephens, J.: Mapping as a cultural universal. In: Portu-
gali, J. (ed.) The Construction of Cognitive Maps, pp. 345–358. Kluwer Academic
Publishers, Dordrecht (1996)
4. Mijksenaar, P.: Maps as public graphics: About science and craft, curiosity and
passion. In: Zwaga, H.J., Boersema, T., Hoonhout, H.C. (eds.) Visual Informa-
tion for Everyday Use: Design and Research Perspectives, pp. 211–223. Taylor &
Francis, London (1999)
5. Tversky, B., Lee, P.U.: Pictorial and verbal tools for conveying routes. In: Freksa,
C., Mark, D.M. (eds.) Spatial Information Theory - Cognitive and Computational
Foundations of Geographic Information Science, Berlin, International Conference
COSIT, pp. 51–64. Springer, Heidelberg (1999)
6. MacEachren, A.: How Maps Work: Representation, Visualization and Design. Guil-
ford Press, New York (1995)
7. Hirtle, S.C.: The use of maps, images and ”gestures” for navigation. In: Freksa,
C., Brauer, W., Habel, C., Wender, K.F. (eds.) Spatial Cognition II - Integrating
Abstract Theories, Empirical Studies, Formal Methods, and Practical Applications,
pp. 31–40. Springer, Berlin (2000)
8. Wahlster, W., Baus, J., Kray, C., Krüger, A.: REAL: Ein ressourcenadaptierendes
mobiles Navigationssystem. Informatik Forschung und Entwicklung 16, 233–241
(2001)
9. Schmidt-Belz, B.P.S., Nick, A., Zipf, A.: Personalized and location-based mobile
tourism services. In: Workshop on Mobile Tourism Support Systems, Pisa, Italy
(2002)
10. Kray, C., Laakso, K., Elting, C., Coors, V.: Presenting route instructions on mobile
devices. In: International Conference on Intelligent User Interfaces (IUI 2003), pp.
117–124. ACM Press, New York (2003)
11. Baus, J., Krüger, A., Wahlster, W.: A resource-adaptive mobile navigation system.
In: IUI 2002: Proceedings of the 7th international conference on Intelligent user
interfaces, pp. 15–22. ACM Press, New York (2002)
12. Reichenbacher, T.: The world in your pocket — towards a mobile cartography.
In: Proceedings of the 20th International Cartographic Conference, Beijing, China
(2001)
13. Zipf, A.: User-adaptive maps for location-based services (LBS) for tourism. In:
Woeber, K., Frew, A., Hitz, M. (eds.) Proceedings of the 9th International Conference for Information and Communication Technologies in Tourism, Innsbruck,
Austria, ENTER 2002. Springer, Heidelberg (2002)
14. Berendt, B., Barkowsky, T., Freksa, C., Kelter, S.: Spatial representation with
aspect maps. In: Freksa, C., Habel, C., Wender, K.F. (eds.) Spatial Cognition
1998. LNCS (LNAI), vol. 1404, pp. 157–175. Springer, Heidelberg (1998)
15. Klippel, A., Richter, K.-F., Barkowsky, T., Freksa, C.: The cognitive reality of
schematic maps. In: Meng, L., Zipf, A., Reichenbacher, T. (eds.) Map-based Mobile
Services - Theories, Methods and Implementations, pp. 57–74. Springer, Berlin
(2005)
16. Freksa, C.: Spatial aspects of task-specific wayfinding maps - a representation-
theoretic perspective. In: Gero, J.S., Tversky, B. (eds.) Visual and Spatial Reason-
ing in Design, pp. 15–32. University of Sidney, Key Centre of Design Computing
and Cognition (1999)
17. Barkowsky, T., Freksa, C.: Cognitive requirements on making and interpreting
maps. In: Hirtle, S.C., Frank, A.U. (eds.) COSIT 1997. LNCS, vol. 1329, pp. 347–
361. Springer, Heidelberg (1997)
18. Berendt, B., Rauh, R., Barkowsky, T.: Spatial thinking with geographic maps: An
empirical study. In: Czap, H., Ohly, P., Pribbenow, S. (eds.) Herausforderungen an
die Wissensorganisation:Visualisierung, multimediale Dokumente, Internetstruk-
turen, pp. 63–73. Ergon-Verlag, Würzburg (1998)
19. Dey, A.K.: Understanding and using context. Personal and Ubiquitous Comput-
ing 5(1), 4–7 (2001)
20. Sarjakoski, L.T., Nivala, A.M.: Adaptation to context - a way to improve the
usability of topographic mobile maps. In: Meng, L., Zipf, A., Reichenbacher, T.
(eds.) Map-based Mobile Services - Theories, Methods and Implementations, pp.
107–123. Springer, Berlin (2005)
21. Freksa, C., Klippel, A., Winter, S.: A cognitive perspective on spatial context.
In: Cohn, A.G., Freksa, C., Nebel, B. (eds.) Spatial Cognition: Specialization and
Integration. Number 05491 in Dagstuhl Seminar Proceedings, Dagstuhl, Germany,
Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss
Dagstuhl, Germany (2007)
22. Peters, D., Richter, K.F.: Taking off to the third dimension — schematization of vir-
tual environments. International Journal of Spatial Data Infrastructures Research
(accepted); Special Issue GI-DAYS 2007. Young Researchers Forum, Münster
23. Elias, B., Paelke, V., Kuhnt, S.: Concepts for the cartographic visualization of
landmarks. In: Gartner, G. (ed.) Location Based Services & Telecartography - Pro-
ceedings of the Symposium 2005. Geowissenschaftliche Mitteilungen, TU Vienna,
pp. 1149–1155 (2005)
24. Neis, P., Zipf, A.: Realizing focus maps with landmarks using OpenLS services.
In: Mok, E., Gartner, G. (eds.) Proceedings of the 4th International Symposium
on Location Based Services & TeleCartography, Department of Land Surveying &
Geo-Informatics. HongKong Polytechnic University (2007)
25. Klippel, A.: Wayfinding choremes. In: Kuhn, W., Worboys, M.F., Timpf, S. (eds.)
COSIT 2003. LNCS, vol. 2825, pp. 320–334. Springer, Heidelberg (2003)
26. Klippel, A., Richter, K.-F., Hansen, S.: Wayfinding choreme maps. In: Bres, S.,
Laurini, R. (eds.) VISUAL 2005. LNCS, vol. 3736, pp. 94–108. Springer, Heidelberg
(2006)
27. Klippel, A., Richter, K.F.: Chorematic focus maps. In: Gartner, G. (ed.) Location
Based Services & Telecartography. Geowissenschaftliche Mitteilungen. Technische
Universität Wien, Wien, pp. 39–44 (2004)
28. Schmid, F.: Personalized maps for mobile wayfinding assistance. In: 4th Interna-
tional Symposium on Location Based Services and Telecartography, Hong Kong
(2007)
29. Barkowsky, T., Latecki, L.J., Richter, K.-F.: Schematizing maps: Simplification of
geographic shape by discrete curve evolution. In: Freksa, C., Brauer, W., Habel, C.,
Wender, K.F. (eds.) Spatial Cognition II - Integrating Abstract Theories, Empirical
Studies, Formal Methods, and Practical Applications, pp. 41–53. Springer, Berlin
(2000)
30. Li, Z., Ho, A.: Design of multi-scale and dynamic maps for land vehicle navigation.
The Cartographic Journal 41(3), 265–270 (2004)
31. Kuhnmünch, G., Strube, G.: Wayfinding with schematic maps. Data taken from
an article in preparation (2008)
32. Ahles, J., Scherrer, S., Steiner, C.: Selbstlokalisation mit Karten und Orientierung
im Gelände. Unpublished report from a seminar held in 2007/08 by C. Hölscher
and G. Strube. University of Freiburg (2007)
33. Slocum, T., Blok, C., Jiang, B., Koussoulakou, A., Montello, D., Fuhrmann, S., Hedley, N.: Cognitive and usability issues in geovisualization. Cartography and Geographic Information Science 28(1), 61–75 (2001)
34. Montello, D.R., Hegarty, M., Richardson, A.E.: Spatial memory of real environments, virtual environments, and maps. In: Allen, G. (ed.) Human spatial memory:
Remembering where, pp. 251–285. Lawrence Erlbaum Associates, Mahwah (2004)
35. Nash, E.B., Edwards, G.W., Thompson, J.A., Barfield, W.: A review of pres-
ence and performance in virtual environments. International Journal of Human-
computer Interaction 12(1), 1–41 (2000)
36. Darken, R.P., Sibert, J.L.: A toolset for navigation in virtual environments. In:
UIST, pp. 158–165 (1993)
37. Darken, R.P., Sibert, J.L.: Wayfinding strategies and behaviours in large virtual
worlds. In: CHI, pp. 142–149 (1996)
38. Wiener, J.M., Mallot, H.A.: ’fine-to-coarse’ route planning and navigation in re-
gionalized environments. Spatial Cognition and Computation 3, 331–358 (2003)
39. Coors, V.: Resource-adaptive interactive 3d maps. In: SMARTGRAPH 2002: Pro-
ceedings of the 2nd international symposium on Smart graphics, pp. 140–144.
ACM, New York (2002)
Locating Oneself on a Map in Relation to Person Qualities and Map Characteristics
1 Introduction
Spatial cognition refers to the myriad of cognitive processes involved in acquiring,
storing, representing, and manipulating knowledge about space. The spaces in ques-
tion may range from small spaces, visible from a single viewpoint and amenable to
direct manipulation (e.g., a desk surface littered with objects), to environmental
spaces that may be experienced by navigating to multiple vantage points (e.g., a cam-
pus or city environment), to geographic or celestial spaces that are rendered visible by
amplifiers of human capacities (e.g., maps representing the entire surface of Earth at
once, photographs of the far side of the moon) [1]. Cognitive processes concerning
space may be supported by a variety of representations ranging from the interior and
mental (e.g., mental images of individual objects or landmarks, a survey-like cogni-
tive map) to the external and concrete (e.g., Global Positioning System technology, a
room blueprint, a road map). The focus of the research discussed here is on human
adults’ ability to use external spatial representations (maps) to represent navigable
environments. Specifically, we examine adults’ success in connecting locations in
outdoor (campus or park) environments to locations on a map.
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 171–187, 2008.
© Springer-Verlag Berlin Heidelberg 2008
172 L.S. Liben, L.J. Myers, and K.A. Kastens
The motivation for our focus on maps is both practical and theoretical. At the prac-
tical level, maps are pervasive tools across eras and cultures, and maps are used to
teach new generations about how to conceptualize and use the environments in which
they live and work [2,3,4,5]. They play a central role in a wide range of disciplines as
diverse as epidemiology, geology, geography, and ecology; they are used for common
life tasks such as navigating to new locations, interpreting daily news reports, and
making decisions about where to buy a house or locate a business [6,7]. Map use and
map education may provide important pathways for enhancing users’ spatial skills
more generally [5,8,9,10]. Research on map use may thus help to identify what map
qualities impede or enhance clarity or use, and may help to identify what qualities of
people must be taken into account when designing maps or educational interventions.
At the theoretical level, research on map understanding is valuable because maps
challenge users’ representational, logical, and – of particular relevance here – spatial
concepts. Studying how adults successfully use maps (or become confused by them)
may help to identify component spatial processes and strategies, in turn enhancing
understanding of basic spatial cognition.
In the current research, people were asked to find correspondences between loca-
tions in environmental space and locations on a map of that space. Figuring out where
one is “on a map” is an essential step for using a map to navigate from one’s current
location to another location. It is also an essential step for using a map to record in-
formation about spatial distributions of phenomena observed in the field, as when
geologists record locations of rock outcrops, ecologists record the nesting areas of a
particular species, or city planners record areas of urban blight.
There is a relatively large body of research that explores the way that people de-
velop and use mental representations of large environments [11,12,13]. There is also a
relatively large body of research that explores the way that people use maps to repre-
sent vista spaces, that is, spaces that extend beyond the tabletop, but that can still be
seen from a single vantage point or with only minor amounts of locomotion [14,15].
But there has been relatively little work that combines experience in large-scale, navi-
gable spaces with finding one’s location on ecologically valid maps of those spaces.
Our work falls at this intersection, and, as enumerated below, was designed to address
four major topics: adults’ success and strategies in identifying their current locations
on a map, whether these would differ with different map characteristics, whether suc-
cess would vary with participants’ spatial skills and gender, and, finally, whether pat-
terns of findings would be similar for field and computer mapping tasks.
First, we were interested in examining how well adults carry out the important step in
map use of locating themselves on a map when they are in a relatively unfamiliar en-
vironmental space and are given a map of that space without verbal information. This
is the condition one faces in real life when one is in a new environment with a map
labeled in a completely foreign language (as, for example, when an English-literate
monolingual is using a map labeled in Japanese or Arabic).
To collect relevant data, we asked college students (relatively new to campus) to
show their locations on a map similar to the one routinely provided to campus visi-
tors. Prior research [16] has shown that many adults head off in the wrong direction
after consulting posted “You Are Here” maps when the map is unaligned with the
referent space (i.e., when up on the map does not indicate straight ahead in the space).
Would adults likewise have difficulty identifying their own location on a map even if
they had the opportunity to manipulate it as they liked? Would they rotate the map as
they tried to get their bearings?
Second, we were interested in examining the effect of map variables on the user’s
success in identifying correct locations. Within psychology, research on map use has
tended to pay relatively little attention to the particular kind of map used. That is, psy-
chological research has generally examined map performance in relation to person
variables (e.g., age, sex, spatial skills) rather than in relation to cartographic variables
(e.g., scale, viewing angle, color schemes). Within cartography, research has tended
to examine the pragmatic effects of manipulating map variables (i.e., asking which of
several maps works best), paying relatively little attention to how perceptual and cog-
nitive theories inform or are informed by the observed effects.
One potentially fruitful way to tie these two traditions together is through the
concept of embodiment, the notion that our bodies and bodily activities ground some
aspects of meaning [17]. There has been considerable work on the importance of em-
bodied action for encoding spatial information from the environment. For example,
Hegarty and colleagues [18] reported that kinesthetic experiences associated with
moving through the environment contribute to learning spatial layouts. An embodi-
ment perspective also implies that place representations will be relatively more or less
difficult to interpret to the degree that they are more or less similar to embodied ex-
perience [19]. Consistent with this argument, prior research has shown that preschool
children are better able to identify locations on an oblique perspective map than on an
overhead map (plan view) of their classroom, and are better able to identify referents
on oblique than vertical aerial photographs [19,20,21]. In comparison to plan repre-
sentations, oblique representations are more consonant with perceptual experiences as
humans move through their ecological niche using the sensory and locomotor capaci-
ties of their species.
To test whether map characteristics have an effect on adult performance, we exam-
ined adults’ success in marking their locations on one of four different kinds of cam-
pus maps created by crossing two dimensions – viewing angle (varying whether the
map was plan vs. oblique) and map shape (varying whether the map was round vs.
square). We expected that the difference in viewing angle might show an advantage
for the oblique map (following the embodiment argument above). We expected that
the difference in shape might advantage the round map because unlike a rectilinear
map, it does not implicitly privilege any particular orientation (thus perhaps increas-
ing participants’ propensity to turn the map into alignment with the environment).
However, because the two map variables might be expected to interact (because an
oblique – but not a plan view map – specifies a particular viewing direction), we did
not design this work as a test of a priori predictions, but instead as a means of exam-
ining adults’ success and strategies in relation to map type.
A third goal of our research was to examine whether spatial skills would predict
performance on the campus mapping task, and if so, which spatial tasks would have
predictive value. Earlier investigators have addressed the relation between spatial
abilities and success in learning large-scale spatial layouts [18,22]. Here we extended
this approach to tasks that did not require integrating or remembering information
gathered across time and space, but instead required participants to link information
from the visible, directly perceived environment to a graphic representation of that
environment. To select the candidate spatial skills, we drew from the task- and meta-
analysis of Linn and Petersen [23] which identified three major kinds of spatial abili-
ties: mental rotation (skill in imagining figures or objects moving through two- or
three-dimensional space), spatial perception (skill in representing one’s own or an
object’s orientation despite conflicting visual cues or frames of reference), and spatial
visualization (skill in solving multi-step spatial tasks by a combination of verbal and
visual strategies). In addition, we designed our work to examine whether participant
sex would have any predictive value for performance on the mapping task, above and
beyond any that might be attributed to differences in measured spatial skills. This
question was of interest because of the continuing evidence of gender differences in
spatial cognition [24].
A final goal of our research was motivated by the practical challenges of studying
map-related spatial cognition in the field as in the campus mapping task just de-
scribed. There are surprisingly frequent changes in field sites even in environments
that might be expected to be highly stable. In our work, for example, even over short
time spans we have encountered the construction of new buildings, new roads, and
new signage, all of which influence the test environment, require a change in routes
between locations, and necessitate the preparation of new maps. Outdoor testing is
open to the exigencies of weather and daylight; the use of large field sites requires
energetic experimenters and participants. The layout of field sites cannot be manipulated to test theoretically interesting questions. It is difficult to identify local participants who do not yet have too much familiarity with the site, and it is equally difficult to identify and transport non-local participants to the site. These and similar
concerns led us to join others who have attempted to develop simulated testing envi-
ronments [19,25] to study environmental cognition.
The specific approach taken here was to derive research measures from the soft-
ware included in the Where Are We? [WAW?] map-skills curriculum developed by
Kastens [26]. This software links dynamic images of eye-level views of a park (video-
taped as someone walked through a real park) to a plan map of that park. The soft-
ware allows the user to control the walk through the park (and hence the sequence of
scenes shown on the video image) by clicking on arrows beneath the videotaped inset.
Arrows (straight, pointing left, pointing right) control whether the video inset shows
what would be seen if walking straight ahead, turning left, or turning right. As de-
scribed in more detail below, using WAW? exercises, we created mapping tasks in
which eye-level views of the terrain had to be linked to locations and orientations on
the map. Our goal was first, to explore whether the same kinds of spatial skills (if
any) would predict performance on the campus mapping and computer tasks, and
second, to examine whether performance on the campus and computer tasks was
highly related.
1.5 Summary
In summary, this research was designed to provide descriptive data on adults’ success
and their strategies in marking maps to indicate their locations in a relatively new
campus environment, to determine whether mapping performance or strategies would
vary across maps that differed with respect to viewing angle (plan vs. oblique) and
shape (square vs. round), to examine whether paper and pencil spatial tasks and par-
ticipant sex would predict success on the campus mapping task, to explore whether
similar person qualities would predict success on a computer mapping task, and to
determine whether performance on the field and computer mapping tasks would be
highly correlated.
2 Method
Students who were new to a large state university campus in the U.S. and were mem-
bers of the psychology department’s subject pool were recruited to participate in this
study. Sixty-nine students (50 women, 19 men; M [SD] age = 18.6 [1.4] years) par-
ticipated in session 1 for which they received course credit. Most participants (48)
took part in this first session within 6 weeks of their arrival on campus, and the
remainder did so within 10 weeks of arrival. Self-reported scores on the Scholastic
Aptitude Test (SAT) were provided by 44 participants: Ms (SDs) for verbal and quan-
titative scores, respectively, were 599 (75) and 623 (78). Participants’ race/ethnicity
reflected the subject pool which was almost entirely White.
Following completion of all session-1 testing, participants were invited to return
for session 2 for which they received either additional course credit or $10, as pre-
ferred. Of the initial group, 43 students (31 women, 12 men) returned.
Session 1 included the outdoor campus mapping activity and paper and pencil spa-
tial tasks; session 2 included the computer mapping tasks. All testing for session 1
was completed first to take advantage of better weather for outdoor testing, and to
minimize students’ familiarity with campus for the campus mapping task.
Participants were greeted in a small testing room in the psychology department where
they completed consent forms. They were then given a map of the room and asked to
place an arrow sticker on the map so that the point of the arrow would show exactly
where they were sitting in the room, and the direction of the arrow would show which
direction they were facing. They were told that the experimenter would be using a
stopwatch to keep track of how long the activities were taking, but to place the sticker
at a comfortable pace rather than attempting to rush. Participants followed these
directions indoors without difficulty. Following this introduction to the procedure, they
were told that they would be doing something similar outside as they toured campus.
176 L.S. Liben, L.J. Myers, and K.A. Kastens
Participants were then led along a fixed route to five locations on campus. At each,
a laminated campus map was casually handed to participants (maps were intentionally
unaligned with the space), and participants were asked to place an arrow sticker on
the map to show their location and direction. (Because there was some experimenter
error in orienting participants at some locations, the directional data were compro-
mised and thus only those data depending on participant location are described here.)
Each participant was randomly assigned to use one of four different campus maps
described earlier. Both the oblique perspective map (the official campus map) and the
plan map were created by the university cartographers except that all labels were re-
moved. All maps were identical in size and scale: square sides and circle diameters
were 205 mm, representing approximately 965 m, thus at a scale of approximately
1:4,700. An illustrative map is shown in Fig. 1.
At each location, the experimenter recorded whether the participant turned the map
from its initial orientation, the time taken to place the sticker on the map (beginning
from when the map was handed to the participant), and the map orientation (in rela-
tion to the participant’s body) at the moment the sticker was placed. Participants did
not have a map as they were led from location to location, and experimenters chatted
with participants as they walked to reduce the likelihood that participants would focus
on their routes. After all test locations had been visited, the participants returned to
the lab where they were given the paper and pencil spatial tasks (described later). Par-
ticipants were asked to provide their scores on the SAT if they could remember them
and were willing to report them.
Fig. 1. Round oblique map. See text for information on map size and scale.
Locating Oneself on a Map in Relation to Person Qualities and Map Characteristics 177
After the session was completed, each map with its sticker was scanned. Of the po-
tential 345 sticker placements (5 stickers for each of 69 participants), 3 stickers from
two participants’ maps became dislodged before the maps were scanned and thus full
data for the campus map task were available for 67 of the 69 participants. Sticker
placements were scored as correct if the tip of the arrow fell within a circle centered
on the correct location, with a radius of 6 mm (equivalent to approximately 28 m on
the ground).
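The stated scale and scoring radius follow from simple arithmetic; a quick check, using only the figures given in the text:

```python
# 205 mm on the map represents approximately 965 m on the ground.
map_mm = 205
ground_mm = 965 * 1000

scale = ground_mm / map_mm   # ground distance per unit of map distance
print(round(scale))          # -> 4707, i.e., a scale of roughly 1:4,700

# The 6 mm scoring radius therefore corresponds to:
radius_m = 6 * scale / 1000
print(round(radius_m))       # -> 28 m on the ground
```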
In session 2 we administered computer mapping tasks drawn from the WAW? curricu-
lum described earlier. One task was drawn from the activity called Are We There Yet?
In this activity, the participant is shown a starting position and facing direction on the
map, sees on a video inset what would be visible from that position, and is asked to
use the arrow keys to navigate to a target location. To ease the participant’s introduc-
tion to the software, the navigation task used here was the easiest one available in
WAW? The second activity was drawn from the WAW? activity called Lost! In this
activity, participants are dropped into the park in some unknown location (i.e., it is
not marked on the map), and are asked to discover where they are by traveling around
the park via arrow clicks that control which video images are seen. We gave partici-
pants two Lost! problems, the first at the easiest level of task difficulty and the second
at the most difficult. For all three tasks, we recorded whether or not the problem was
solved (i.e., whether the target location was found or whether the location was cor-
rectly identified), how many seconds and how many arrow clicks the participant used
within the maximum time allotted (8 minutes for each of the tasks).
During session 1, participants were given paper and pencil tests to measure the three
spatial skills identified by Linn and Petersen [23]. A paper folding test (PFT) was
used to assess spatial visualization [27]. This task shows 20 sequences of between two
and four drawings in which a sheet of paper is folded one or more times and then a
hole is punched through the layers. Respondents are asked to select which of five
drawings shows the pattern of holes that would appear if the paper were then com-
pletely unfolded. Scores are the number marked correctly minus one-fourth the num-
ber marked incorrectly within the allowed time (here 2 minutes). The test of spatial
perception was the water level task (WLT) in which students are given drawings of
six tipped, straight-sided bottles and asked to draw a line in each to show where the
water would be if the bottle were about half full [28]. Lines drawn within 5° of hori-
zontal were scored as correct. Finally, mental rotation (MR) was assessed by a modi-
fied version of the Spatial Relations subtest of the Primary Mental Abilities (PMA)
battery [29]. Respondents are shown 21 simple line figures as models. Each model is
followed by five similar figures, and respondents are asked to circle any that show the
model rotated but not flipped over (i.e., not a mirror image). Scores are the number
correctly circled (2 per row) minus those incorrectly circled (up to 3 per row) within
the allotted time (here 2 minutes).
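The guessing corrections and the water-level criterion described above reduce to simple formulas; a minimal sketch (the function names are ours, not from the test manuals):

```python
def pft_score(n_correct, n_incorrect):
    """Paper Folding Test: number marked correctly minus one-fourth
    of the number marked incorrectly."""
    return n_correct - 0.25 * n_incorrect

def mr_score(n_correct, n_incorrect):
    """PMA mental rotation: correctly circled items minus
    incorrectly circled items."""
    return n_correct - n_incorrect

def wlt_item_correct(line_angle_deg):
    """Water level task item: the drawn line must fall within
    5 degrees of horizontal."""
    return abs(line_angle_deg) <= 5

print(pft_score(12, 4))       # -> 11.0
print(mr_score(30, 5))        # -> 25
print(wlt_item_correct(4.5))  # -> True
```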
3 Results
The data are presented below in five sections. First, we offer descriptive data on the
performance on the campus mapping task. Second, we address the question of
whether performance or strategies on the campus mapping task differed as a function
of map type. Third, we address whether performance on the campus mapping task is
predicted by participant variables. Fourth, we address the same question for the com-
puter mapping task. Finally, we address the relation between performance on the
campus and computer mapping tasks.
College students’ performance on the campus mapping task covered the full range,
with some placing none, and others placing all five stickers correctly, M (SD) = 2.2
(1.4). An even more telling index of performance variability is evident in Fig. 2 which
shows the locations of erroneous responses for one target location. It is striking not
only that many responses are distant from the correct location, but also that many
responses fail to show the correct kind of location.
Fig. 2. Erroneous sticker placements (40 black circles) for one target location (star). Omitted
are 12 stickers placed correctly and 17 stickers falling within the area defined by adjacent
buildings (striped region). Note that some errors were particularly egregious, as in stickers
placed in open fields or parking lots.
Accuracy of Sticker Placements. As explained initially, this research was also de-
signed to examine whether task performance would vary with map qualities of shape
and viewing angle. To examine this question, the total number correct served as the
dependent variable in a two-way analysis of variance (ANOVA) in which between-
subjects factors were map shape and map angle. Neither main effect nor their interac-
tion was significant. Means (SDs) for round versus square, respectively, were 2.2
(1.3) versus 2.3 (1.5); for plan versus oblique, 2.1 (1.4) versus 2.4 (1.4).
Map Turning. A third dependent measure examined in relation to map type was use
of a map-turning strategy. For this analysis, the dependent measure was the number of
locations (0-5) at which participants turned the map rather than leaving it in the orien-
tation in which they received it from the experimenter. A few participants never
turned the map or turned it only once (n=4); on average, the map was turned on 3.9
(1.3) items. An ANOVA on the number of turns revealed neither main effects nor
interactions with respect to map shape or viewing angle. Means (SDs) for round ver-
sus square, respectively were 3.9 (1.2) versus 4.0 (1.4); for plan versus oblique, 4.1
(1.2) versus 3.8 (1.4).
Map Orientation. The final behavior examined with respect to map type was how the
participant held the map (with respect to the participant’s own body) while placing the
sticker. Based on the sides of the square map, we defined as canonical the position shown
in Fig. 2 or its 90°, 180°, or 270° rotation. A 2 (map shape) x 2 (map angle) ANOVA on
the number of canonical orientations (0-5) revealed a significant main effect of map shape,
F(1,65)=5.35, p=.024. More canonical orientations were used by participants with square
than with circular maps, Ms (SDs), respectively, 4.0 (1.0) versus 3.3 (1.4).
We first computed correlations between the number of stickers placed correctly on the campus mapping task and scores on each
of the three paper and pencil spatial tests. Correlations of sticker accuracy with mental
rotation (MR), spatial visualization (PFT), and spatial perception (WLT), respec-
tively, were r(67) = .048, p = .357; r(67) = .321, p = .004; and r(67) = .219, p = .038
(here and below, one-tailed tests were used given directional hypotheses). These cor-
relations reflect data from all participants in session 1, irrespective of whether they
were available for session 2. (An identical pattern of results holds if analyses are lim-
ited to the 43 participants who took part in both sessions.) As anticipated, perform-
ance on the three spatial measures was also correlated: MR with PFT, r(69) = .425, p
< .001; MR with WLT, r(68) = .410, p < .001, and PFT with WLT, r(68) = .253, p =
.019. (Again, identical patterns hold with the smaller sample as well.)
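One-tailed correlation tests like those reported above can be sketched as follows; the data here are random placeholders with a built-in positive association, not the study's data, and the one-tailed p is obtained by permutation rather than any particular test the authors used:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 67
accuracy = rng.integers(0, 6, size=n).astype(float)  # placeholder sticker scores (0-5)
pft = accuracy + rng.normal(scale=2.0, size=n)       # placeholder PFT scores

r_obs = np.corrcoef(accuracy, pft)[0, 1]

# One-tailed p via permutation: how often does a shuffled pairing
# produce a correlation at least as large as the observed one?
perm_r = np.array([
    np.corrcoef(rng.permutation(accuracy), pft)[0, 1] for _ in range(5000)
])
p_one_tailed = (perm_r >= r_obs).mean()
print(round(r_obs, 3), p_one_tailed)
```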
The number of correct sticker placements was then used as the criterion variable
for a regression analysis of the campus mapping task. A stepwise regression was per-
formed with the three spatial tests entered on the first step. We entered participant sex
on the second step to determine if there were any effects of sex above and beyond
those that could be attributed to possible spatial skill differences. Finally, on step
three we entered the strategy variable of the number of locations at which the partici-
pant turned the map.
At the first level of the model, all three predictors together accounted for 15% of the
variance, R2 = .15, F(3, 66) = 3.61, p = .018. Within this multiple regression, however,
only PFT predicted success (standardized β = .34, p = .010). At the second level of the
model, participant sex did not significantly increase the prediction, p-change = .56,
although PFT remained a significant predictor (standardized β = .34, p = .010) and the
overall model remained significant, R2 = .15, F(4, 66) = 2.76, p = .035. Finally, at the
third level of the model, the map-turning strategy significantly improved the prediction,
R2-change = .108, p-change = .004 (standardized β = .35, p = .004), and PFT remained a
significant predictor (standardized β = .27, p = .033). The final overall model was
R2 = .25, F(5, 66) = 6.59, p = .002.
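The blockwise regression just described can be sketched with ordinary least squares, tracking the change in R2 as each block enters; the data below are random placeholders, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 67  # participants with full campus-map data

# Placeholder predictors: three spatial tests, participant sex, map turning.
spatial = rng.normal(size=(n, 3))
sex = rng.integers(0, 2, size=(n, 1)).astype(float)
turns = rng.integers(0, 6, size=(n, 1)).astype(float)
y = spatial @ np.array([0.3, 0.1, 0.1]) + 0.3 * turns.ravel() + rng.normal(size=n)

def r_squared(X, y):
    """R2 from an ordinary least squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

r2_step1 = r_squared(spatial, y)                                 # step 1: spatial tests
r2_step2 = r_squared(np.column_stack([spatial, sex]), y)         # step 2: + sex
r2_step3 = r_squared(np.column_stack([spatial, sex, turns]), y)  # step 3: + map turning
print(r2_step2 - r2_step1, r2_step3 - r2_step2)                  # R2-change per step
```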
An analogous regression for the computer mapping task was significant at step 1,
p = .003, but again, participant sex at step 2 did not add significantly to the model
after spatial scores had been entered (p-change = .603). However, unlike the prior
regression, in this analysis it was MR (standardized β = .52, p = .003) rather than PFT
(standardized β = .12, p = .475) that predicted mapping performance on the computer
task.
An additional goal of this research was to explore the possibility that the computer
mapping tasks drawn from WAW? might be a viable substitute for measuring success
on mapping tasks in the real, life-size environment. To evaluate this possibility, we
computed correlations between scores on the two tasks. Irrespective of which depend-
ent measure is used for the WAW? tasks (number completed, time in seconds, or
number of arrow clicks), there was no significant relation between scores on the cam-
pus and computer tasks. The highest correlation was between the number of correctly
placed stickers on the campus mapping task and the number of correctly completed
WAW? tasks, and it was not even marginally significant with a one-tailed test, r(43) =
.121, p = .22. Furthermore, what little trend toward an association there was disappeared
entirely when scores on the spatial tasks were statistically controlled: partial r(39) =
.005, p = .487.
As an additional means of examining the distinctiveness or comparability of the two
mapping tasks, we compared the patterns of association between success on each
mapping task and success on the paper and pencil spatial tasks. As is evident from
the findings described for each of the two mapping tasks taken individually, the re-
gression analyses showed different patterns for the campus and computer mapping
tasks. Particularly striking was the finding that MR score predicted performance on
the computer mapping task, but not performance on the campus mapping task. To
provide data bearing on the question of whether the associations differ in the two
tasks, we compared the sizes of the correlations between MR score and performance
on campus versus computer tasks. These correlations differed significantly,
t(40)=1.73, p <.05. Neither of the other correlations (PFT or WLT) differed signifi-
cantly between the two mapping tasks.
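The text does not name the procedure used to compare these dependent correlations; one standard choice is Williams' t. A sketch assuming that test, with illustrative correlation values rather than the study's:

```python
from math import sqrt

def williams_t(r12, r13, r23, n):
    """Williams' t for comparing two dependent correlations, r12 vs. r13,
    that share variable 1; r23 is the correlation between variables 2 and 3.
    Returns (t, degrees of freedom = n - 3)."""
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    rbar = (r12 + r13) / 2
    t = (r12 - r13) * sqrt(
        ((n - 1) * (1 + r23))
        / (2 * ((n - 1) / (n - 3)) * det + rbar**2 * (1 - r23) ** 3)
    )
    return t, n - 3

# Illustrative values only (not the study's correlations); n = 43 gives
# df = 40, matching the degrees of freedom reported above.
t, df = williams_t(0.52, 0.05, 0.12, n=43)
print(round(t, 2), df)
```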
4 Discussion
We begin our discussion by commenting on what the empirical data suggest about
how well adults can mark a map to show their location in a real, relatively newly en-
countered campus environment, addressing the question of whether performance dif-
fers in relation to the two manipulated map characteristics (viewing angle and map
shape). In the course of doing so, we comment on the appearance and distribution of
the map-related behaviors observed during the campus mapping task. We then discuss
findings from the regression analyses concerning which individual difference vari-
ables predict performance on the campus mapping task and performance on the com-
puter mapping task. Finally, we discuss implications of data concerning the relation
between performance on the two mapping tasks.
4.1 Performance and Strategies on the Campus Mapping Task and Their
Relation to Map Characteristics
The data from the campus mapping task offer a compelling demonstration that many
adults are challenged by the request to show their location on a map. The fact that
some participants were right at every one of the locations establishes that the task was
a solvable one. The fact that some participants were wrong at every one of the loca-
tions establishes that the task was not a trivial one. Furthermore, egregious errors (see
Fig. 2) suggest that some adults’ map-interpretation skills are particularly poor. Al-
though it is perhaps not surprising to see errors like these among preschool and ele-
mentary school children [20,30], it is surprising to see them among adults. Based on
participants’ comments and affective demeanor during testing, we have every reason
to believe that all were engaged by the task, and all were trying their best.
In addition to providing information on absolute levels of performance, the campus
mapping task was of interest as an avenue for testing the possible impact of the map
characteristics of map shape and viewing angle. One reason that we thought that map
characteristics might lead to different behaviors and different levels of accuracy was
because the different map characteristics might be differentially conducive to partici-
pants’ aligning the map with the space, and research with both adults and children had
shown better performance with aligned than unaligned maps [16,31,32]. The current
data, however, provided no evidence that map shape affected accuracy on the location
tasks or the number of items on which participants turned the map.
This was true even if we limited the comparison to the plan maps which – unlike the
oblique maps – did not imply a particular vantage point.
We had also hypothesized that oblique maps – in comparison to plan maps – might
elicit better performance insofar as they were more consonant with an embodied view,
that is, one more similar to that encountered by humans as they navigate through the
environment [19] and given that past research had shown advantages to oblique-
perspective representations for children [20,21]. Again, however, there were no sig-
nificant differences in accuracy or strategies in relation to map angle, either as a main
effect or in interaction with map shape.
Although there were no differences in accuracy in relation to map type, partici-
pants were significantly slower on the square plan map than on any other map type. In
addition, square maps were held in canonical positions in relation to participants’ bod-
ies significantly more often, implying that these maps were less often aligned with the
environmental space. Perhaps the extra time taken for the square plan maps reflects
additional time needed for mental rotation with unaligned maps. That the oblique ver-
sion did not require additional time suggests that participants may (like children) find
it easier to work with the oblique map, despite the fact that in most orientations, its
vantage point differs from the one experienced in the actual environment. The data do
not yet permit definitive conclusions about process, but they do permit the conclusion
that additional research on the effects of map characteristics is worthwhile.
As expected, the regression analyses showed that spatial skills significantly predicted
performance on both the campus mapping task and the computer mapping task. Sex
added no additional prediction in either task. Interestingly, the specific spatial skills
that predicted performance differed on the two tasks. For the campus mapping task, it
was the score on the paper folding task that was the significant predictor. Mental rota-
tion scores added nothing further to the prediction. The reverse held in the computer
mapping task. For this task, it was the score on the mental rotation task that predicted
task success, and other spatial scores did not add significantly to the prediction.
In the taxonomy offered by Linn and Petersen [23], the paper folding task falls
within the skill category labeled spatial visualization which they describe as covering
tasks that involve multiple steps, using visual or verbal strategies, or both. It is possi-
ble to think of the campus mapping task as one for which varied approaches would
indeed be viable. For example, someone might focus on landmark buildings, someone
else might focus on the geometric qualities of the streets, someone else might try to
figure out the direction walked from some earlier identified spot, some might try to
align the map and the space, and so on. In other words, this outdoor task – much like
normal map-based navigation – gives the map-user considerable freedom in structur-
ing the task.
That mental rotation mattered for performance on the computer mapping task is
also easily understood because in this task – unlike the campus mapping task – par-
ticipants had less control over the visual array and the map. Although participants
controlled which video clip they saw (by selecting which of three arrows they
clicked at every choice point), they had no control over what was seen within the
resulting clip. Once a video clip had been selected by an arrow click, participants
saw whatever part of the park was recorded by the camera – at
the camera’s height, at the camera’s angle, at the camera’s azimuth, and at the cam-
era’s speed of rotation or translation. Furthermore, participants had no control over
the orientation of the map: the map of the videotaped park was always in a fixed posi-
tion, and thus, usually out of alignment with the depicted vista. It is thus not surpris-
ing that under these conditions, an ability to handle mental rotation was significantly
associated with performance.
An additional finding from the regression analysis on the campus mapping task
lends further support to the hypothesized importance of participants’ own actions for
success on the task. Specifically, as reported earlier, participants’ use of the map-
turning strategy added significant prediction to the score on the campus mapping task
even after spatial skills had been entered into the regression model. Aligning a map
with the referent space is an epistemic action, defined by Kirsh and Maglio as an
action in which an agent manipulates objects in the environment with the goal of ac-
quiring information [33]. As explicated by Kirsh and Maglio for the case of expert
Tetris players, epistemic actions serve the user by revealing otherwise inaccessible
information or by decreasing the cognitive load required to gain information. For ex-
ample, it is more time-efficient for Tetris players to rotate a polygon on the screen and
visually compare its shape with a candidate nesting place than to do the rotation and
comparison mentally. In our work, we have observed epistemic actions in a task in
which adults visited eight outcrops in a field site, and were asked to select which of
14 scale models best depicts the underlying geological structure [34]. As they strug-
gled to select the correct model, some participants rotated candidate models into
alignment with a map of the area, rotated candidate models into alignment with the
full-scale geological structure, placed two candidate models side by side to facilitate
comparison, and pushed rejected models out of the field of view. Like rotating a Tet-
ris shape or rotating a scale model of a geological structure, rotating a map into
alignment with the referent space decreases the cognitive load required to solve the
task at hand by substituting direct perception for mental rotation and mental compari-
son. Use of epistemic actions requires that the agent foresees, before the action is
taken, that the action will have epistemic value; such tactical foresight is separate
from the spatial skills measured by the paper and pencil tasks, in which the actions are
prescribed by the experimenter.
The regression findings just discussed provide one line of evidence that the computer
mapping task cannot be used as a substitute for the campus mapping task for studying
spatial cognition. That is, the finding that different spatial skills predict performance
on each of the two mapping tasks implies that the two tasks differ in important ways.
This conclusion is bolstered by two other findings, first, that there is a significant dif-
ference in the size of the correlation between MR and performance on the campus
mapping task versus the computer mapping task, and second, that the correlation be-
tween scores on the two mapping tasks is not significant. Taken together, these data
imply that it is important to continue to conduct mapping research – as well as map
skill education – in real, life-size environments.
5 Conclusions
The data from the present research bear upon adults’ success in using one of the most
common kinds of spatial representations of large environments – maps – as they ob-
serve the environment directly in the field or via another representational medium.
Our data show dramatic variability with respect to how well cognitively intact adults
(all of whom met the intellectual criteria needed for university admission) succeed in
indicating their locations on a map. Although some participants showed outstanding
performance, others made serious errors reminiscent of those made by young children
[20,32,35].
Our data also bear on questions about research in different kinds of spatial environ-
ments. The finding that different spatial skills predicted success on the campus versus
computer mapping tasks, coupled with the finding that participants’ scores on the two
mapping tasks were not significantly correlated, leads to the conclusion that it is unwise
to substitute one task for the other. From the pragmatic perspective of conducting be-
havioral research in environmental cognition, this conclusion is perhaps disheartening. It
would ease research significantly if the answer were otherwise. From the perspective of
theoretical work on spatial cognition, however, the finding is more intriguing than dis-
heartening. The current findings contribute evidence to the growing conclusion that the
skills entailed in solving spatial problems in object or vista spaces do not entirely over-
lap with skills entailed in solving spatial problems in environmental spaces. Past re-
searchers have shown the importance of testing in real environments even for indoor,
built spaces (corridors and rooms) that are highly defined, homogeneous, and rectilinear
[18]. Our findings add to the evidence for the importance of testing in larger, more
varied, less clearly defined outdoor environments as well [36]. Outdoor environments
provide potential clues (e.g., a nearby building, a distant skyscraper, a river, the position
of the sun). But they also present potential challenges including barriers (that may ob-
struct otherwise useful landmarks), an absence of clear boundaries to define the borders
of the space (in contrast to the walls of a room), and vistas that may appear
homogeneous to the untrained eye (e.g., desert vistas, dense forests, or acres of wheat fields as
far as the eye can see). A full understanding of human spatial cognition will thus re-
quire studying how people identify and use information that is available within a di-
verse range of environments.
Likewise, the findings from the research described here bear on the role of map
characteristics. Although our data do not yet permit firm conclusions about the way
that map qualities interact with environmental and person qualities, they do provide
strong support for the importance of systematically varying map qualities as we con-
tinue to explore the fascinating territory of spatial cognition.
References
1. Liben, L.S.: Environmental cognition through direct and representational experiences: A
life-span perspective. In: Gärling, T., Evans, G.W. (eds.) Environment, cognition, and ac-
tion, pp. 245–276. Oxford, New York (1991)
2. Downs, R.M., Liben, L.S.: Mediating the environment: Communicating, appropriating,
and developing graphic representations of place. In: Wozniak, R.H., Fischer, K. (eds.) De-
velopment in context: Acting and thinking in specific environments, pp. 155–181. Erl-
baum, Hillsdale (1993)
3. Harley, J.B., Woodward, D. (eds.): The history of cartography: Cartography in prehistoric,
ancient and Medieval Europe and the Mediterranean, vol. 1. University of Chicago Press,
Chicago (1987)
4. Stea, D., Blaut, J.M., Stephens, J.: Mapping as a cultural universal. In: Portugali, J. (ed.)
The construction of cognitive maps, pp. 345–360. Kluwer Academic Publishers, The
Netherlands (1996)
5. Uttal, D.H.: Seeing the big picture: Map use and the development of spatial cognition.
Dev. Sci. 3, 247–264 (2000)
6. MacEachren, A.M.: How maps work. Guilford, New York (1995)
7. Muehrcke, P., Muehrcke, J.O.: Map use: Reading, analysis, and interpretation, 4th edn. JP
Publications, Madison (1998)
8. Davies, C., Uttal, D.H.: Map use and the development of spatial cognition. In: Plumert,
J.M., Spencer, J.P. (eds.) The emerging spatial mind, pp. 219–247. Oxford, New York
(2007)
9. Liben, L.S.: Education for spatial thinking. In: Damon, W., Lerner, R. (series eds.) Ren-
ninger, K.A., Sigel, I.E. (vol. eds.) Handbook of child psychology: Child psychology in
practice, 6th edn., vol. 4, pp. 197–247. Wiley, Hoboken (2006)
10. National Research Council: Learning to think spatially: GIS as a support system in the K-
12 curriculum. National Academy Press, Washington (2006)
11. Evans, G.W.: Environmental cognition. Psy. Bull. 88, 259–287 (1980)
12. Gärling, T., Golledge, R.G.: Environmental perception and cognition. In: Zube, E.H.,
Moore, G.T. (eds.) Advances in environment, behavior and design, pp. 203–236. Plenum
Press, New York (1987)
13. Kitchin, R., Blades, M.: The cognition of geographic space. I.B. Tauris, London (2002)
14. Montello, D.R.: Scale and multiple psychologies of space. In: Campari, I., Frank, A.U.
(eds.) COSIT 1993. LNCS, vol. 716, pp. 312–321. Springer, Heidelberg (1993)
15. Montello, D.R., Golledge, R.G.: Scale and detail in the cognition of geographic informa-
tion. Report of the specialist meeting of Project Varenius, Santa Barbara, CA, May 14-16,
1998. University of California Press, Santa Barbara (1999)
16. Levine, M., Marchon, I., Hanley, G.: The placement and misplacement of You-Are-Here
maps. Env. and Beh. 16, 139–158 (1984)
17. Johnson, M.L.: The meaning of the body. In: Overton, W.F., Mueller, U., Newman, J.L.
(eds.) Body in mind, mind in body: Developmental perspectives on embodiment and con-
sciousness, pp. 191–224. Erlbaum, New York (2008)
18. Hegarty, M., Montello, D.R., Richardson, A.E., Ishikawa, T., Lovelace, K.: Spatial abili-
ties at different scales: Individual differences in aptitude-test performance and spatial-
layout learning. Intelligence 34, 151–176 (2006)
19. Liben, L.S.: The role of action in understanding and using environmental place representa-
tions. In: Rieser, J., Lockman, J., Nelson, C. (eds.) The Minnesota symposium on child de-
velopment, pp. 323–361. Erlbaum, Mahwah (2005)
20. Liben, L.S., Yekel, C.A.: Preschoolers’ understanding of plan and oblique maps: The role
of geometric and representational correspondence. Child Dev. 67, 2780–2796 (1996)
21. Plester, B., Richards, J., Blades, M., Spencer, C.: Young children’s ability to use aerial photographs as maps. J. Env. Psy. 22, 29–47 (2002)
22. Allen, G.L., Kirasic, K.C., Dobson, S.H., Long, R.G., Beck, S.: Predicting environmental
learning from spatial abilities: An indirect route. Intelligence 22, 327–355 (1996)
23. Linn, M.C., Petersen, A.C.: Emergence and characterization of sex differences in spatial
ability: A meta-analysis. Child Dev. 56, 1479–1498 (1985)
24. Halpern, D.F.: Sex differences in cognitive abilities, 3rd edn. Erlbaum, Mahwah (2000)
25. Lawton, C.A., Morrin, K.A.: Gender differences in pointing accuracy in computer-
simulated 3D mazes. Sex Roles 40, 73–92 (1999)
26. Kastens, K.A.: Where Are We? Tom Snyder Productions, Watertown, MA (2000)
27. Ekstrom, R.B., French, J.W., Harman, H.H.: Manual for kit of factor-referenced cognitive
tests. Educational Testing Service, Princeton (1976)
28. Liben, L.S., Golbeck, S.L.: Sex differences in performance on Piagetian spatial tasks: Dif-
ferences in competence or performance? Child Dev. 51, 594–597 (1980)
29. Thurstone, T.G.: Primary mental abilities for grades 9-12. Science Research Associates,
Chicago (1962)
30. Kastens, K.A., Liben, L.S.: Eliciting self-explanations improves children’s performance on
a field-based map skills task. Cog. and Instr. 25, 45–74 (2007)
31. Bluestein, N., Acredolo, L.: Developmental changes in map-reading skills. Child Dev. 50,
691–697 (1979)
32. Liben, L.S., Downs, R.M.: Understanding person-space-map relations: Cartographic and
developmental perspectives. Dev. Psy. 29, 739–752 (1993)
33. Kirsh, D., Maglio, P.: On distinguishing epistemic from pragmatic action. Cog. Sci. 18,
513–549 (1994)
34. Kastens, K.A., Liben, L.S., Agrawal, S.: Epistemic actions in science education. In: Freksa, C., Newcombe, N.S., Gärdenfors, P. (eds.) Spatial cognition VI. Springer, Heidelberg
(in press)
35. Liben, L.S., Kastens, K.A., Stevenson, L.M.: Real-world knowledge through real-world
maps: A developmental guide for navigating the educational terrain. Dev. Rev. 22, 267–
322 (2002)
36. Pick, H.L., Heinrichs, M.R., Montello, D.R., Smith, K., Sullivan, C.N., Thompson, W.B.:
Topographic map reading. In: Hancock, P.A., Flach, J., Caird, J.K., Vicente, K. (eds.) Lo-
cal applications of the ecological approach to human-machine systems, vol. 2, pp. 255–
285. Erlbaum, Hillsdale (1995)
Conflicting Cues from Vision and Touch Can Impair
Spatial Task Performance: Speculations on the Role of
Spatial Ability in Reconciling Frames of Reference
Madeleine Keehner
Abstract. In “hand assisted” minimally invasive surgery, the surgeon inserts one
hand into the operative site. Despite anecdotal claims that seeing their own hand
via the laparoscopic camera enhances spatial understanding, a previous study us-
ing a maze-drawing task in indirect viewing conditions found that seeing one’s
own hand sometimes helped and sometimes hurt performance (Keehner et al.,
2004). Here I present a new analysis exploring the mismatch between kinesthetic
cues (knowing where the hand is) and visual cues (seeing the hand in an orienta-
tion that is incongruent with this). Seeing one’s left hand as if from the right side
of egocentric space (palm view) impaired performance, and this depended on
spatial ability (r=-.54). Conversely, there was no relationship with spatial ability
when viewing the left hand from the left side of egocentric space (back view).
The view-specific nature of the confusion suggests a possible role for spatial abilities in reconciling spatial frames of reference.
1 Introduction
This paper presents a new analysis of data originally presented at the Human Factors
and Ergonomics Society annual conference [1]. The original motivation for the study
was to assess a specific anecdotal claim made by surgeons working under minimally
invasive conditions. In laparoscopic or “keyhole” surgery, a special technique is
sometimes employed in which one of the small incisions in the patient’s body is
slightly enlarged, and the surgeon’s non-preferred hand is inserted through this into
the operative site. Under these conditions, the surgeon's hand becomes visible on the
video monitor via the laparoscopic camera, and it can be guided and used like a surgi-
cal instrument. Surgeons anecdotally report that seeing their own hand on the video
monitor enhances their understanding of the spatial relations within the operative
space, in this otherwise spatially demanding domain.
This claim is intuitively plausible, and is consistent with prior literature on cross-
modal sensory integration in peripersonal space. However, previous studies in this
field have typically allowed participants to view their own hands directly, not via
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 188–201, 2008.
© Springer-Verlag Berlin Heidelberg 2008
video feedback, and in these studies the angle from which the hand is seen is usually
consistent with its actual orientation in space. By contrast, in minimally invasive
surgery the surgical camera, or laparoscope, is often placed in an orientation that is at
odds with the surgeon's own perspective, producing a view of the hand that is spa-
tially misaligned and incompatible with proprioceptive information.
In the 2004 paper, we showed that congruent and conflicting kinesthetic and visual
cues sometimes help and sometimes impair performance on a spatial task (maze
drawing in indirect viewing conditions). The present new analysis provides novel
insights into these effects. In this paper I show that the confusion caused by seeing
one’s own hand from an incongruent angle is viewpoint-specific. Moreover, the de-
gree of confusion and even whether any confusion occurs (relative to performance on
the same task without seeing the hand) depends strongly on individual differences in
spatial abilities. In this paper I discuss possible reasons for the viewpoint specific
nature of the confusion and speculate on how spatial abilities might function in recon-
ciling conflicting sensory cues relating to position and orientation in space.
Previous studies with humans and primates have shown that the senses of vision and
touch have a special relationship. Graziano and Gross have identified bimodal neu-
rons that respond only when the information received through vision and touch corre-
sponds [2]. These specialized visuo-tactile neurons fire when the hand reaches
towards an object or location in reachable space that can simultaneously be seen.
When the hand or its target location is unseen, these cells do not respond. This find-
ing suggests that highly dexterous higher primates, including humans, have developed
specialized bimodal connections between vision and touch, evolved for exploring the
world with seen hands.
The fundamental nature of the relationship between vision and touch is demonstrated
neatly by the crossmodal congruency effect. Driver and colleagues have shown that
visual cues, such as LEDs attached to the surface of the hands, can enhance speed of
responses to tactile stimuli [3]. This inter-modal facilitatory effect demonstrates the
rapid and automatic crosstalk between the two senses, such that spatial cues presented in
one modality can speed reactions to spatial cues presented in the other modality. Impor-
tantly, this effect follows the hand when it moves in space, such as when the hands are
crossed in front of the body, demonstrating that these cross-modal sensory cues are
coded in limb-centered or body-centered spatial coordinates.
It is well established that the somatosensory cortex of the brain represents the mo-
ment-by-moment positions and orientations of body parts as we move our limbs,
trunk, and head in space and in relation to each other [4]. The somatosensory cortex
receives proprioceptive feedback from muscles, joints, and tendons, and combines
these in a representation of the body's configuration and the relationships among dif-
ferent body parts or effectors. This “felt” position and orientation of body parts makes
up our internal representation or “body schema”.
Although these studies demonstrate that by adulthood the body schema is well-
developed in the sensorimotor cortex, a number of ingenious experiments using fake
hands and visual prisms have shown that what we see can influence our internal rep-
resentation of limb position. Sekiyama had participants wear a visual prism that re-
versed left and right. This produced a conflict between vision and touch, such that the
participant’s right hand looked like their left hand when viewed through the prism.
After adaptation the internal representation of the hand had changed in a way that
brought it into line with the visual information [7]. This finding demonstrates that
visual experience can dramatically affect the body schema representation. Indeed, it
has been argued that visual experience may be a key mechanism by which we acquire
our default representation of hand orientation in the body schema by the time we
reach adulthood, since the back of the hand is the most frequently seen orientation of
our own hands as we grasp and manipulate everyday objects [6].
The crossmodal congruency effect described above [3], in which cues from one
modality (vision) can speed attention in another modality (touch), has been shown to
occur even when the seen “hand” is not the participant’s own. The effect has been
demonstrated with fake hands, and occurs even though the participant’s own hand is
displaced somewhat from the location of the fake hand, such as being underneath the
table on which the fake hand is placed. Studies have shown that one of the most im-
portant factors for producing these illusions with false hands is temporal alignment. If
the participant feels a touch at precisely the same moment as they see a fake hand or
rubber glove being touched, they can experience an illusion whereby they are convinced that it is their own hand that they are seeing [10]. Thus, a perfect match in
terms of timing between what is seen and what is felt seems to be critical in aligning
information received through vision and touch. However, this effect is disrupted
when the discrepancy between the orientations of the fake hand and the participant’s
own hand becomes too great, such as when the fake hand is rotated ninety degrees
relative to the participant’s hand [8, 9].
Our apparently unitary representation of body position in space is generated when the
information from all of our sensory modalities is integrated in the brain. Human and
monkey studies have shown that this occurs in the posterior parietal cortex, specifi-
cally within the intraparietal sulcus (IPS). Areas within monkey IPS are critical for
integrating information acquired through vision and touch, and are active in control-
ling reaching and pointing movements in space. Homologous regions exist in human
IPS, and in both species this area appears to play a critical role in creating a represen-
tation of the space of our bodies and the space around our bodies, with particular
importance in tasks that involve movement of the hands guided by vision [11].
In one elegant study, monkeys were trained to retrieve food rewards from beneath
a glass plate that could be turned clear or opaque at the flick of a switch. After train-
ing, neurons in the ventral intraparietal sulcus, which had previously been responding
only to proprioceptive information, showed visual responses, indicating that they had
become bimodal through the process of associating visual and proprioceptive infor-
mation. The visual receptive fields persisted even when the view of the arm was
obscured, leading the authors to argue that these intraparietal bimodal neurons allow
the updating of body images even when the limb is unseen [12].
Sekiyama argues that of all the brain regions containing bimodal neurons, the IPS
is perhaps the most important for our internal representation of the body in space [13].
Graziano and colleagues have shown that neurons in parietal area 5 respond to the
sight of a fake arm, and furthermore that these neurons can distinguish between plau-
sible and implausible arm orientations and even between a left hand and a right hand
by sight alone (e.g., the neurons did not respond when a fake right arm was placed in
the same position and orientation as the monkey’s own left arm) [14]. Sekiyama
argues that bimodal neurons in this region (unlike other sensorimotor regions) inte-
grate visual and proprioceptive cues when the visual information matches the internal
body schema representation, and therefore the parietal cortex and specifically the IPS
contains the highest level of representation of the body in space [13].
Thus, the parietal lobe plays a critical role in integrating the many different forms
of sensory information that we receive into a high-level, overarching representation of
our own body in space. From many different sensory inputs (e.g., head-based, trunk-
based, arm-based, and retinocentric frames of reference), the parietal lobe generates a
global egocentric frame of reference and a unified internal sense of our position and
orientation in space [15, 16].
Despite the obvious stability of this representation over time, Sekiyama argues that
the body schema is somewhat adaptable [13]. Studies from a range of domains indi-
cate that these adaptations occur in the high-level representation of the IPS. It appears
that the bimodal IPS neurons can take account of changes to information from multi-
ple senses and as a result can alter the internal representation of the body.
Such modifications to the body schema are seen in prism adaptation studies, as dis-
cussed earlier, in which bimodal neurons of the IPS alter the way that they code the
relationship between vision and touch to recalibrate discrepant sensory information
caused by wearing prisms [7]. Similar modifications to the body schema are evident
in studies involving tool use in monkeys. Research has shown that changes to spatial
coding of the limbs result from extensive experience of using long tools or instru-
ments, such that the tips of the tools become coded in the same way as the tips of the
limbs [17]. As with prism adaptation, this recalibration of the spatial extent of the
limb is reflected in changes at the neural level [18]. Essentially these kinds of flexible
processes allow the system to alter the way that different spatial frames of reference
operate together, in order to maintain a coherent sense of space and position.
Perhaps it is this capacity to adapt to new information, both real and imagined, that
allows us to perform everyday spatial tasks. The computational processes involved in
encoding, maintaining, and manipulating spatial information include the kinds of
spatial transformation processes that psychologists study and that define what we call
spatial ability [19]. Standardized tests of spatial ability, as well as everyday operations
in the real world, include tasks such as imagining how an object would change if we
picked it up and turned it (mental rotation) or imagining how the world would look,
and the consequences for our intended actions, if we moved to a different location or
orientation in space (perspective shifting). What all of these processes have in com-
mon is the requirement to represent, manipulate, update, and reconcile different spa-
tial frames of reference. These flexible processes are among the key determinants of
spatial ability, and therefore individuals with better spatial abilities should be better
able to reconcile sensory cues that represent conflicting frames of reference. This is
the central hypothesis in the present analysis.
1.6 The Set-Up in Our Study and in Typical Hand Assisted Surgery
In typical minimally invasive surgery conditions, the surgeon has no direct view of
the operative site, but must instead depend on a 2-D image from the laparoscopic
camera presented on a monitor. This image lacks binocular depth cues, and further-
more it is quite common for the laparoscope to be inserted into the patient at an angle
that differs from the orientation of the surgeon. This means that the viewpoint from
which the camera captures the operative site is inconsistent with the surgeon's per-
spective, and presumably some kind of spatial transformation, such as mental rotation
or perspective shifting, must be performed in order to align the two. In extreme cases,
the laparoscope may be inserted through a port in the patient's body that produces a
view of the operative site that is up to 180° discrepant from the surgeon's perspective.
Ideally, the surgeon seeks to minimize the discrepancy between their view and that of
the camera, but this is by no means always possible and in any case the angle of the
laparoscope is often altered multiple times during a procedure in order to provide
unobstructed views of particular structures or to allow a particular instrument to be
inserted through a specific port.
In traditional minimally invasive surgery, the surgeon has no direct contact with
the operative site using his or her hands. However, in hand assisted methods, one
hand is inserted into the operative site through a slightly enlarged port in the patient's
body. This allows the surgeon to use one hand like a very dexterous instrument, and
it also makes the hand visible on the monitor via the laparoscopic camera. These are
the conditions that we replicated in our original study. The hand was either inserted
into the task space, and thus it appeared on the monitor, or it was not present in the
task space and was therefore not visible via the camera. It was not allowed to inter-
fere with the task at all, so that any effect of seeing the hand in view of the camera
was due to its presence alone, and not to any benefits that might result from using it to
help with the spatial task.
In the original paper, we found that both camera angle and spatial ability had main
effects on performance. We also found that having the hand in view was helpful,
relative to performing the task without the hand in view, for all participants when the
camera was inserted from the left side of the workspace. By contrast, we found
unexpectedly that when the camera was inserted from the right side of the workspace,
having the hand in view impaired performance for lower spatial participants only [1].
In what follows, I explore these effects further, and attempt to establish whether
there is some qualitative difference in the effects of seeing the hand in view of the
camera that depends on how the hand looks. I also examine whether and under which
circumstances these effects depend on the spatial abilities of the individual partici-
pant. Given the preceding discussion of the importance of spatial ability for reconcil-
ing different frames of reference, and the fundamental connection between what we
see and what we feel, I predict that spatial ability may be especially important for
reconciling incongruent visual and kinesthetic cues and for adapting to inconsisten-
cies between these two sources of information.
2 Method
2.1 Participants
Forty right-handed paid volunteers (18 males) were recruited from the UC Berkeley
undergraduate population, mean age 20.1 years (SD 2.3 years).
2.2 Apparatus
The apparatus was constructed to mimic laparoscopic conditions (see Figure 1). The
participant’s view of the workspace was provided by a laparoscope, with the image
presented on a monitor at head height. A shield prevented direct view of the work-
space. A permanent marker pen was attached to the end of a laparoscopic instrument,
whose movements were constrained by a fulcrum (analogous to the point of entry
through the abdominal wall). The instrument was offset -45º (+315º) in azimuth.
Holding the instrument handle with their right hand, participants used the monitor
image to guide the pen tip around a star-shaped maze mounted on a platform.
2.3 Design
The independent variables were camera angle and spatial ability. The dependent
variable was error difference (number of errors with hand in view minus number of
errors without hand in view).

Fig. 1. Experimental setup. The participant’s view of the maze and the instrument tip was obscured by the shield, and they completed the maze-drawing task using only the image from the laparoscope, which was positioned either at 90º (left side) or at 270º (right side). On half of the trials, the participant’s left hand was visible in the monitor image.

Reported here are two conditions (camera angles 90º
and 270º) that were common across three separate experiments. Amalgamating the
experiments resulted in a mixed design, in which some participants completed both camera angle conditions, while others completed either 90º or 270º but not
both. The methodologies of the experiments were identical in all essential design
features (instructions and practice trials, apparatus, procedure, counterbalancing of
conditions and trials, total number of conditions and trials).
2.4 Procedure
The laparoscopic camera was secured at one of two positions (offset in azimuth by
90º or 270º; see Figure 1). On half of the trials, participants were instructed to hold
the maze platform so that their left hand appeared on the monitor (the hand did not
interfere or help with the task). Participants completed one practice trial at each angle
(using a different maze), followed by four experimental trials, two with the hand in
view and two without (order ABBA/BAAB). Instructions were given to complete the
star mazes as quickly as possible but with as few errors as possible.
The order of conditions was counterbalanced using a Latin square design. Spatial
visualization ability was assessed using three paper-and-pencil tests: the Mental Rota-
tions Test [20], the Paper Folding Test [21] and the Card Rotations Test [21]. These
tests correlated positively (r = .58 to .63), so standardized scores (z-scores) were cal-
culated for the three tests and they were averaged to produce an aggregate measure of
spatial ability. A median split was performed on the aggregate measure to create two
groups, defined as high and low spatial ability.
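The aggregation just described (per-test z-scores, averaged, then median-split) can be sketched in a few lines. The raw scores below are hypothetical and the function names are illustrative; this is a sketch of the procedure as stated, not the study's own analysis code.

```python
from statistics import mean, median, stdev

def zscores(raw):
    """Standardize one test's raw scores (sample standard deviation)."""
    m, s = mean(raw), stdev(raw)
    return [(x - m) / s for x in raw]

def aggregate_spatial_ability(mrt, pft, crt):
    """Average the per-test z-scores into one aggregate measure,
    mirroring the aggregation described in the text."""
    z1, z2, z3 = zscores(mrt), zscores(pft), zscores(crt)
    return [mean(triple) for triple in zip(z1, z2, z3)]

def median_split(aggregate):
    """Label each participant 'high' or 'low' relative to the median."""
    med = median(aggregate)
    return ["high" if a > med else "low" for a in aggregate]

# Hypothetical raw scores for six participants (illustration only)
mrt = [12, 18, 7, 15, 20, 9]      # Mental Rotations Test
pft = [10, 14, 6, 12, 17, 8]      # Paper Folding Test
crt = [80, 110, 60, 95, 120, 70]  # Card Rotations Test

agg = aggregate_spatial_ability(mrt, pft, crt)
groups = median_split(agg)
```

Standardizing within each test before averaging prevents the test with the largest raw-score range (here the Card Rotations Test) from dominating the aggregate.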
Errors were scored blind, by a manual frequency count, after the task was completed. Using the ink trace, one error was recorded each time the pen crossed the outer border of the maze.
3 Results
Previously, we reported main effects on performance of camera angle, hand position,
spatial abilities, and the interactions among these variables. In the present analysis a
new variable was created to establish whether performance was affected positively or
negatively by having the hand in view. In this analysis, performance without the hand
in view was used as the baseline and the positive or negative effects of seeing the
hand in the monitor were assessed against this. This variable was generated by sub-
tracting the number of errors made without the hand in view from the number of er-
rors made with the hand in view, in each of the two conditions. Thus, a negative error
difference indicates that seeing one’s own hand helped performance, whereas a posi-
tive error difference indicates that seeing one’s own hand impaired performance. This
new variable makes it possible to isolate the effect of seeing the hand in the camera view, and to determine whether the effect is negative or positive, relative to not seeing the hand. In all analyses the variables met assumptions of normality.

Fig. 2. Difference in errors with hand in view versus without hand in view (y-axis: error difference, with hand minus without hand), under the two viewing orientations (camera at 90 degrees, back view of hand; camera at 270 degrees, palm view of hand), split by high and low spatial participants (median split of aggregate ability measure). Error bars represent +/- 1 SEM.
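The sign convention of this derived variable can be sketched as follows; the error counts are hypothetical, since the raw data are not reproduced here.

```python
def error_difference(errors_with_hand, errors_without_hand):
    """Per-participant error difference: errors with hand in view
    minus errors without. Negative values mean seeing the hand
    helped; positive values mean it hurt."""
    return [w - wo for w, wo in zip(errors_with_hand, errors_without_hand)]

# Hypothetical error counts for four participants
with_hand = [3, 10, 5, 8]
without_hand = [7, 4, 5, 12]

diff = error_difference(with_hand, without_hand)  # [-4, 6, 0, -4]
```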
Figure 2 represents this difference in errors with the hand in view versus without
the hand in view under the two viewing orientations, split by high and low spatial
participants (median split of aggregate spatial ability measure). This plot indicates
that qualitatively different patterns of errors occurred when the camera was positioned
to show the back view of the hand versus when it showed the palm view of the hand.
When the back view was visible (90º), seeing the hand improved performance for all
participants. An independent samples t-test showed that this effect did not differ for
higher and lower spatial participants, t(25) = .67, p = .51, n/s. By contrast, when the
palm view was visible (270º), seeing the hand impaired performance for lower spatial
participants (more errors) but it somewhat helped performance for higher spatial par-
ticipants (fewer errors), and this difference between higher and lower spatial partici-
pants was significant, t(21) = -2.93, p = .008.
Four separate one-sample t-tests with alpha adjusted for multiple comparisons
tested these effects against zero. This analysis showed that all of the error differences
except one were significantly different from zero (t = -3.77 to 3.98, p = .003 to .002,
in all significant cases). Thus, in the 90º condition, seeing the hand was significantly
beneficial to both low and high spatial participants. By contrast, in the 270º condition,
seeing the hand significantly impaired low spatial participants, whereas it did not
significantly affect high spatial participants, either positively or negatively.
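Each of these tests amounts to computing a one-sample t statistic for a cell's error differences against zero and comparing |t| against the critical value at the Bonferroni-adjusted alpha (.05 / 4 comparisons = .0125). A minimal sketch of the statistic, with hypothetical error differences; the function name is illustrative:

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(diffs, mu=0.0):
    """t statistic for a one-sample test of the error differences
    against mu (here zero); degrees of freedom = n - 1."""
    n = len(diffs)
    t = (mean(diffs) - mu) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical error differences for one cell of the design
diffs = [-6, -3, -5, -2, -7, -4, -3, -6, -5, -4]
t, df = one_sample_t(diffs)  # t = -9.0 with df = 9
# |t| would then be compared with the tabled critical value for the
# Bonferroni-adjusted alpha (.0125, two-tailed) at df degrees of freedom.
```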
Fig. 3. Relationship between spatial ability and error difference (hand versus no hand) in the two viewing conditions (left panel: 90º; right panel: 270º). Points below the dotted line indicate performance that was better with the hand in view than without, and points above the dotted line indicate performance that was poorer with the hand in view than without. The solid line represents the best-fit regression line.
These patterns were explored further using correlational analyses. Figure 3 shows
the relationships between spatial ability and error difference (hand minus no hand) in
the two viewing conditions. The dotted line indicates the level of errors at
which there was no effect, either positive or negative, of seeing the hand relative to
not seeing the hand. Points below this line indicate that seeing the hand helped performance, whereas points above this line indicate that seeing the hand hurt performance, relative to not seeing the hand. The solid line is the best-fit regression line.
Figure 3 indicates that there was no systematic relationship between spatial ability
and error difference in the 90º view condition (back view), r = -.007, p = .97, n/s. By
contrast, in the 270º condition, Figure 3 shows a clear linear relationship between
spatial ability and error difference, indicating that as spatial ability decreases, the
detrimental effect of seeing the hand increases. This correlation was significant, r = -.54, p = .008.
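The reported correlations are standard Pearson coefficients between the aggregate ability measure and the error difference. A self-contained sketch, with hypothetical values mimicking the negative relationship found in the 270º condition:

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between spatial ability and error difference."""
    mx, my = mean(x), mean(y)
    dx = [xi - mx for xi in x]
    dy = [yi - my for yi in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

# Hypothetical values: as spatial ability rises, the error difference
# (with hand minus without hand) falls, i.e. seeing the hand hurts less
ability = [-1.2, -0.5, 0.0, 0.4, 1.1]
err_diff = [9, 5, 2, 0, -3]
r = pearson_r(ability, err_diff)  # strongly negative
```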
4 Discussion
This new analysis of the data from these experiments reveals that qualitatively differ-
ent effects occurred depending on the view of the hand that was available to the
participant in each trial. In the 90° view condition, when the back of the hand was
visible, all participants benefited from seeing their own hand in the monitor. Further-
more, in this condition there was no significant correlation with individual differences
in spatial ability. By contrast, the effects in the 270º condition, in which the palm
view of the hand was visible, were quite different. In this condition, low spatial par-
ticipants were significantly impaired when they saw their own hand in the monitor,
while high spatial participants did not experience any significant benefit or detriment.
Moreover, there was a strong correlation between spatial ability and the effect of
seeing the hand. In this condition, individual differences in spatial ability strongly
predicted whether an individual became confused by the sight of their own hand.
If we compare overall performance, it is clear from Figure 3 that more benefit is gained from seeing the back view of the hand (90° condition): more of the data points fall below the dotted line, indicating that people are better off with the
sight of the hand than without it. By contrast, in the 270° condition around half of all
participants do worse when they see their own hand than when they do not (data
points above the dotted line), and these are primarily lower spatial individuals.
What is responsible for the enhancement of performance in the 90° condition and
the apparent confusion in the 270° condition, caused by seeing one's own hand? It
would appear that the view of the hand in the 90° condition is sufficiently well aligned with
how the hand feels (its internal representation in the body schema) that it does not
cause confusion. In terms of previous research with monkeys, this might be analo-
gous to the responses that occur in bimodal neurons only when the visual and
kinesthetic information are sufficiently compatible [2, 8]. Perhaps this harmonious
“visuotactile” representation of the hand in space is what helps the participant better
understand the spatial relations of the task when the hand appears in the camera image
compared to when it is not present.
By contrast, the view of the hand in the 270° condition does not allow the partici-
pant to compensate for the camera misalignment. In fact, for individuals with poor
spatial abilities it caused more confusion than when the hand was not visible at all.
This suggests that how the hand looks in this condition is fundamentally at odds with
how it feels. In other words, it is not possible to reconcile this view of the hand with
the internal representation of the hand’s position in the body schema. In this sense,
this condition seems analogous to previous studies where false hands were placed in
orientations that were too incongruent with the “felt” hand position to allow the illu-
sion of unity to prevail [8].
Figure 4 shows how the hand looks from these two camera orientations. There are
at least two possible reasons why the palm view of the hand should be so difficult to
reconcile with the internal representation, compared to the back view. One possibility is that because the default representation of the hand in the body schema is the back view [5], the 90° view can readily be integrated with it, whereas the 270° palm view conflicts with it too strongly. Another possibility is that the posture of the hand in the 270° view, especially the angle of the wrist, is biomechanically awkward for the left hand to adopt, and it is therefore difficult to perceive the seen hand as one’s own left hand. In fact, it is
almost impossible to move the left hand in such a way as to produce this view of it
under normal circumstances, whereas it is relatively easy to orient the left hand in
such a way as to produce a view similar to that in the 90° viewing condition. This
account is consistent with previous research on mental rotation of hands, which has
shown that motor imagery (imagined movements of body parts) is subject to the same
biomechanical constraints as real movements of the body in space [22]. These two
accounts are not mutually exclusive. Indeed, given that extended visual experience of
seeing the hands in particular orientations can influence the internal body schema
[6, 7] it seems plausible that they might, if anything, be mutually reinforcing.
Fig. 4. View of the hand from the 90º camera orientation (left) and the 270º camera orientation
(right)
Why is spatial ability so important in the 270° view condition? If we assume that
the confusion in this condition arises from difficulties with reconciling conflict be-
tween two incompatible frames of reference (visual and kinesthetic), this gives us an
interesting insight into what kinds of abilities psychometric spatial tests such as the
ones we used may be tapping. Perhaps one of the key components of cognitive spa-
tial abilities is the ability to represent, manipulate, update, and reconcile different
spatial frames of reference. It has been claimed that all spatial manipulation tasks
essentially involve manipulating relations among three spatial frames of reference:
egocentric, object-centered, and environmental [23]. For example, mental rotation
tests require the test taker to transform the orientation of an object around its intrinsic
axes and then update the transformed object-centered frame of reference in relation to
stable reference frames of the environment and the self (egocentric). Paper folding
tests require the test taker to manipulate internal parts of an object with respect to the
Conflicting Cues from Vision and Touch Can Impair Spatial Task Performance 199
object’s overall frame of reference. Tests of spatial orientation, which involve egocentric perspective shifts, require the test taker to transform and update their own egocentric reference frame with respect to stable environmental and object-centered frames
of reference. Thus, it is possible that individuals who performed poorly on the psy-
chometric spatial ability tests that we administered were generally poor at such proc-
esses, and therefore also had particular difficulty reconciling the conflict between the
visual and kinesthetic frames of reference in the 270° condition.
Although somewhat speculative, this interpretation is consistent with what we
know about brain regions involved in integrating multiple frames of reference. Spatial
information from many different sensory sources is integrated in posterior parietal
cortex into a coherent whole [15, 16]. It has also been shown that spatial transforma-
tion tasks such as mental rotation involve these same parietal regions [24-28], and
moreover, individual differences in parietal activation have been shown to correlate
with individual differences in spatial abilities [29]. Thus, it may be that an essential
function of this region is to encode, represent, manipulate, and reconcile different
spatial frames of reference.
While more research is needed to demonstrate that these effects are replicable
within a single experiment and with a larger sample, the present analysis does suggest some interesting avenues to pursue. Future studies could establish the parameters of congruent versus conflicting visual and kinesthetic cues. For example, is
there a degree of rotation of the image of the hand at which the information changes
from being primarily helpful to primarily harmful in these kinds of tasks (at least for
lower spatial individuals)? Another interesting future question is whether extended
visual experience of the hand in apparently incongruous orientations can overcome
confusion such as that observed in the 270° palm-view condition. Could this view of
the hand eventually become integrated with the body schema representation, such as
occurs in prism adaptation studies [7], and consequently help in spatial reasoning
tasks such as these, even for individuals with poorer spatial abilities?
If replicable, the implications of these findings for hand-assisted minimally inva-
sive surgery are clear. In previous studies we have found that laparoscopic surgeons
span the same wide range of spatial abilities as the general population [30], because the domain of medicine does not pre-select for these abilities. Therefore, it is
likely that surgeons using these methods will be subject to the same effects that were
evident here (at least in the beginning of their laparoscopic experience; we do not
know about possible effects of extended experience with these methods). Thus, while
in some conditions seeing the hand in the operative view may be helpful, as surgeons
claim, in other circumstances it may actually impair their understanding of the spatial
relations of the operative space. Knowing how to avoid these conditions with judi-
cious laparoscope placement might be an important applied outcome of this line of
research.
Finally, these findings shed light on the interface between vision and touch and the
multimodal nature of our apparently unitary internal representation of the space
around us. They also highlight the importance of individual differences. The data
suggest that spatial ability is a key variable, and should be included in theoretical
accounts of how, and how well, people generate, maintain, and manipulate their men-
tal representations of space.
200 M. Keehner
References
1. Keehner, M., Wong, D., Tendick, F.: Effects of viewing angle, spatial ability, and sight of
own hand on accuracy of movements performed under simulated laparoscopic conditions.
In: Proceedings of the Human Factors and Ergonomics Society’s 48th Annual Meeting, pp.
1695–1699 (2004)
2. Graziano, M.S.A., Gross, C.G.: A bimodal map of space: somatosensory receptive fields
in the macaque putamen with corresponding visual receptive fields. Experimental Brain
Research 97(1), 96–109 (1993)
3. Driver, J., Spence, C.: Attention and the crossmodal construction of space. Trends in Cog-
nitive Sciences 2, 254–262 (1998)
4. Penfield, W., Rasmussen, T.L.: The cerebral cortex of man. MacMillan, New York (1955)
5. Sekiyama, K.: Kinesthetic aspects of mental representations in the identification of left and
right hands. Perception and Psychophysics 32, 89–95 (1982)
6. Funk, M., Brugger, P., Wilkening, F.: Motor processes in children’s imagery: the case of
mental rotation of hands. Developmental Science 8(5), 402–408 (2005)
7. Sekiyama, K., et al.: Body image as a visuomotor transformation device revealed in adap-
tation to reversed vision. Nature 407, 374–377 (2000)
8. Graziano, M.S.A.: Where is my arm? Proceedings of the National Academy of Sci-
ences 96, 10418–10421 (1999)
9. Maravita, A., Spence, C., Driver, J.: Multisensory integration and the body schema: Close
to hand and within reach. Current Biology 13, R531–R539 (2003)
10. Pavani, F., Spence, C., Driver, J.: Visual capture of touch: Out-of-the-body experiences
with rubber gloves. Psychological Science 11(5), 353–359 (2000)
11. Grefkes, C., Fink, G.R.: The functional organization of the intraparietal sulcus in humans
and monkeys. Journal of Anatomy 207, 3–17 (2005)
12. Obayashi, S., Tanaka, M., Iriki, A.: Subjective image of invisible hand coded by monkey
intraparietal neurons. NeuroReport 11(16), 3499–3505 (2000)
13. Sekiyama, K.: Dynamic spatial cognition: Components, functions, and modifiability of
body schema. Japanese Psychological Research 48(3), 141–157 (2006)
14. Graziano, M.S.A., Cooke, D.F., Taylor, C.S.R.: Coding the location of the arm by sight.
Science 290, 1782–1786 (2000)
15. Cohen, Y.E., Andersen, R.A.: A common reference frame for movement plans in the pos-
terior parietal cortex. Nature Reviews Neuroscience 3, 553–562 (2002)
16. Colby, C.L.: Action-oriented spatial reference frames in cortex. Neuron 20, 15–24 (1998)
17. Maravita, A., et al.: Tool-use changes multimodal spatial interactions between vision and
touch in normal humans. Cognition 83, B25–B34 (2002)
18. Iriki, A., Tanaka, M., Iwamura, Y.: Coding of modified body schema during tool use by
macaque postcentral neurones. NeuroReport 7(14), 2325–2330 (1996)
19. Hegarty, M., Waller, D.: Individual differences in spatial abilities. In: Miyake, A., Shah, P.
(eds.) The Cambridge handbook of visuospatial thinking. Cambridge University Press,
Cambridge (2005)
20. Vandenberg, S.G., Kuse, A.R.: Mental rotations, a group test of three-dimensional spatial
visualization. Perceptual & Motor Skills 47, 599–604 (1978)
21. Ekstrom, R.B., et al.: Manual for kit of factor-referenced cognitive tests. Educational Test-
ing Service, Princeton (1976)
22. Parsons, L.M.: Imagined spatial transformations of one’s hands and feet. Cognitive Psy-
chology 19, 178–241 (1987)
23. Zacks, J.M., Michelon, P.: Transformations of visuospatial images. Behavioral and Cogni-
tive Neuroscience Reviews 4(2), 96–118 (2005)
24. Zacks, J.M., Vettel, J.M., Michelon, P.: Imagined viewer and object rotations dissociated
with event-related fMRI. Journal of Cognitive Neuroscience 15(7), 1002–1018 (2003)
25. Carpenter, P.A., et al.: Graded functional activation in the visuospatial system with amount
of task demand. Journal of Cognitive Neuroscience 11(1), 9–24 (1999)
26. Harris, I.M., et al.: Selective right parietal lobe activation during mental rotation.
Brain 123, 65–73 (2000)
27. Podzebenko, K., Egan, G.F., Watson, J.D.G.: Widespread dorsal stream activation during a
parametric mental rotation task, revealed with functional magnetic resonance imaging.
Neuroimage 15, 547–558 (2002)
28. Keehner, M., et al.: Modulation of neural activity by angle of rotation during imagined
spatial transformations. Neuroimage 33, 391–398 (2006)
29. Lamm, C., et al.: Differences in the ability to process a visuo-spatial task are reflected in
event-related slow cortical potentials of human subjects. Neuroscience Letters 269, 137–
140 (1999)
30. Keehner, M., et al.: Spatial ability, experience, and skill in laparoscopic surgery. American
Journal of Surgery 188(1), 71–75 (2004)
Epistemic Actions in Science Education
Abstract. Epistemic actions are actions in the physical environment taken with
the intent of gathering information or facilitating cognition. As students and
geologists explain how they integrated observations from artificial rock
outcrops to select the best model of a three-dimensional geological structure,
they occasionally take the following actions, which we interpret as epistemic:
remove rejected models from the field of view, juxtapose two candidate models,
juxtapose and align a candidate model with their sketch map, rotate a candidate
model into alignment with the full scale geological structure, and reorder their
field notes from a sentential order into a spatial configuration. Our study differs
from prior work on epistemic actions in that our participants manipulate spatial
representations (models, sketches, maps), rather than non-representational
objects. When epistemic actions are applied to representations, the actions can
exploit the dual nature of representations by manipulating the physical aspect to
enhance the representational aspect.
1 Introduction
Kirsh and Maglio [1] introduced the term "epistemic action" to designate actions
which humans (or other agents) take to alter their physical environment with the
intent of gathering information and facilitating cognition.¹ Epistemic actions may
uncover information that is hidden, or reduce the memory required in mental compu-
tation, or reduce the number of steps involved in mental computation, or reduce the
probability of error in mental computation. Epistemic actions change the informa-
tional state of the actor, as well as the physical state of the environment. Kirsh and
¹ Magnani [24] used a similar term, "epistemic acting," more broadly, to encompass all actions
that provide the actor with additional knowledge and information, including actions that do
not alter anything in the environment (e.g., "looking [from different viewpoints]," "checking,"
"evaluating," "feeling [a piece of cloth]"). Roth [25] (p. 142) used "epistemic action" to refer
to sensing of objects and "ergotic action" to refer to manipulating objects in a school labora-
tory setting. In this paper, we use the term "epistemic action" in the original sense of Kirsh
and Maglio.
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 202–215, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Maglio contrasted epistemic actions with "pragmatic actions," those taken to imple-
ment a plan, or implement a reaction, or in some other way move oneself closer to a
goal.
Kirsh and Maglio [1] explicated their ideas in terms of the computer game Tetris.
They showed that expert players make frequent moves that do not advance the goal of
nestling polygons together into space-conserving configurations, but do gain infor-
mation. For example, a player might slide a falling polygon over to contact the side of
the screen and then count columns outwards from the side to determine where to drop
the polygon down to fit into a target slot. For a skilled player this backtracking ma-
neuver is more time-efficient than waiting for the polygon to fall low enough for the
judgment to be made by direct visual inspection. At a different point in the game, a
player might rotate a polygon through all four of the available configurations before
selecting a configuration. Kirsh and Maglio showed that such physical rotation, followed by direct perceptual comparison of the polygon and the available target
slots, is more time-efficient than the corresponding mental rotation. As an individual
player's skill increases from novice to expert, the frequency of such "extraneous"
moves increases [2].
In this paper, we apply the concept of epistemic actions to science and science edu-
cation. Scientists and science students manipulate objects in the physical world in the
course of trying to solve cognitively demanding puzzles. We argue that epistemic
actions, in the sense of Kirsh and Maglio [1], are an underappreciated tool that scientists use, and that science students could be taught to use, to enhance the efficiency of
their cognitive effort. We begin by showing examples of participant actions that we
believe to be epistemic which emerged in our own study of spatial thinking in geo-
sciences. We then describe epistemic actions in other domains of science education,
and conclude by offering some generalizations and hypotheses about how epistemic
actions may work.
to indicate the location of the most steeply-dipping outcrops. They also make frequent
use of iconic gestures, while discussing or describing attributes of an observed out-
crop, a specific model, a group of models, or a hypothesized structure. For example, a
student uses a cupped hand to convey her interpretation that the structure is concave
upwards.
In addition to abundant deictic and iconic gestures, the videotapes also document
instances in which participants spontaneously move their hands in ways that do not
have apparent communicative value, manipulating the objects available to them in a
manner that we interpret as "epistemic actions."
2.2 Situation #2: Participant Moves Two Candidate Models Side by Side
As participants progress through their reasoning process, they may take two candidate
models out of the array and place them side by side (Fig. 2). We infer that this action
is intended to facilitate comparing and contrasting attributes of the two models. The
side-by-side comparison technique is employed when the two models differ subtly;
for example, in Fig. 2 the two models are both concave, both elongate, both steep-
sided, both closed, and differ only in that one is symmetrical along the long axis while
the other is asymmetrical. Based on eye movements of people who were asked to
recreate spatial patterns of colored blocks working from a visually-available model,
Ballard, Hayhoe, Pook and Rao [7] concluded that their participants adopted a "minimal memory strategy" when the model and under-construction area were close together. They kept in mind only one small element of the model (for example, the
color of the next block), and relied on repeated revisits back and forth between the
model and the under-construction block array. The revisits allowed them to acquire
information incrementally and avoid even modest demands on visual memory. Bal-
lard, et al.'s participants overwhelmingly favored this minimal memory strategy even
though it was more time-consuming than remembering multiple aspects of the model,
and even though they were instructed to complete the task as quickly as possible.
When Ballard, et al. increased the distance between model and copy, use of the mini-
mal memory strategy decreased.
Fig. 1. Participant places rejected models out of field of view. We infer that the purpose of this
action is to decrease the number of visually-available comparisons.
206 K.A. Kastens, L.S. Liben, and S. Agrawal
Fig. 2. After rejecting most models, this participant took the remaining two candidate models
out of the array and placed them side-by-side, to facilitate comparison of details
In some cases, participants place a candidate 3-D model side by side with their in-
scriptions (field notes) (Fig. 3). We infer that this juxtaposition facilitates the process of
comparing observation (in the notes) with interpretation (embodied in the candidate 3-D
model), presumably through enabling the minimal memory strategy as described above.
Participants' inscriptions took many forms [3], including a map of the field area with
outcrop locations marked. Among the participants who had a map, we noted an addi-
tional epistemic action: participants rotated the map and candidate model such that the
long axis of the model was oriented parallel to the long axis of the cluster of outcrop
positions marked on the map (Fig. 3). This alignment allowed a direct perceptual
Fig. 3. This participant has placed her inscriptions (notes) side by side with a candidate model
to facilitate comparison between her recorded observations and her candidate interpretation
Fig. 4. This participant, an expert, rotates several candidate models so that the long axis of the
model aligns with the long axis of the full-scale structure
comparison of inscriptions and model, without requiring the additional cognitive load of
mental rotation, as in the case of Kirsh and Maglio's [1] Tetris players.
2.4 Situation #4: Participant Rotates Model to Align with the Referent Space
In a few cases, a participant spontaneously rotated a model or models to align with the
full-scale structure formed by the outcrops in the perceptual space² (Fig. 4). As in
Situation #3, we hypothesize that the alignment achieved by physical rotation enabled
a direct comparison, eliminating the cognitive load of mental rotation. An interesting
aspect of Situation #4 is that the full-scale structure was not perceptually available to
compare with the model structure. Only 2 of the 8 outcrops were visible to the par-
ticipants as they made and defended their model selection. We hypothesize that
Fig. 5. While observing the eight outcrops, this participant recorded observations onto blank
sheets of paper “sententially,” that is, sequenced from top to bottom, left to right on the paper,
like text in a book. When confronted with the integrative task, she tore up her inscriptions into
small rectangles with one outcrop per rectangle, and reorganized them into a map-like spatial
arrangement. (Note: in order to show the reader both the spatial arrangement of the paper scraps
and the details of the sketch, this figure was constructed by scanning the student’s inscriptions
and superimposing the scanned sketches onto a video screen shot).
² After completing their explanation of their model selection, all participants were asked by the
experimenter to rotate their selected model into alignment with the full-scale structure. In this
paper, we are referring to individuals who spontaneously elected to align their model with the
structure before being asked to do so by the experimenter.
as they moved through the field area from outcrop to outcrop and then back to the
starting place, some participants acquired or constructed an embodied knowledge of
the outcrop locations and configuration, and that embodied knowledge is somehow
anchored to, or superimposed upon the landscape through which they moved.
2.5 Situation #5: Participant Rips Up Inscriptions, and Reorders Them in Space
In the no-map condition of our experiment [3], participants recorded their observa-
tions onto blank paper. Some participants situated their observations spatially to form
a sketch map of the field area, and others recorded their observations "sententially"
[8], in chronological order on the page from top to bottom, left to right, like text in a
book. One participant, a novice to field geology, recorded her observations senten-
tially, sketching each outcrop as she visited it. Then, when she was confronted with
the selection task, she spontaneously tore up her papers so that each outcrop sketch
was on a separate scrap of paper, and arranged the scraps spatially into a rough plan
view of the outcrop locations (Fig. 5).
4 Discussion
The participants in our study produced the actions described above spontaneously, as
they struggled to puzzle their way through a spatially-demanding task that most found
difficult. Some participants first asked whether it was OK to move or turn the models,
which suggests that they knew in advance that such actions would be beneficial. They
valued these actions sufficiently to risk having a potentially forbidden move rejected, and they anticipated that the experimenter might see these actions as valuable enough to outlaw.
All of the examples of epistemic actions we have provided thus far, and the original
Tetris examples of Kirsh and Maglio [1], have involved spatial thinking, that is, thinking that finds meaning in the shape, size, orientation, location, direction, or trajectory of
objects, processes, or phenomena, or the relative positions in space of multiple objects,
processes, or phenomena. Spatial examples of epistemic actions seem most obvious and
most powerful. But is this association between epistemic actions and spatial thinking
inevitable? Are all epistemic actions in service of spatial thinking?
No. It is possible to think of counter-examples of epistemic actions that seek non-
spatial information. An everyday example would be placing two paint chips side by
side to make it easier to determine which is darker or more reddish, seeking informa-
tion about color. The science equivalent would be placing a spatula full of dirt or
sediment next to the color chips in the Munsell color chart [11].
Kirsh [12] developed a classification scheme for how humans (or other intelligent
agents) can manage their spatial environment: (a) spatial arrangements that simplify
choice; (b) spatial arrangements that simplify perception; and (c) spatial dynamics
that simplify internal computation. Our Situation #1, in which participants remove
rejected 3-D models from view, is a spatial arrangement that simplifies choice. Situations #2 and #3, in which participants juxtapose two items to simplify comparison, are
spatial arrangements that simplify perception. Situations #3 and #4 from the outcrop
experiment, plus the case of rotating a map to align with the terrain, simplify internal
computation by eliminating the need for mental rotation.
Kirsh's scheme classified epistemic actions according to the change in cognitive or
informational state of the actor. Epistemic actions could also be classified by the na-
ture of the change to the environment: (a) relocate/remove/hide objects, (b) cluster
objects, (c) juxtapose objects, (d) order or array objects, (e) rotate/reorient objects.
Considering both classification schemes together yields a two-dimensional matrix for
categorizing epistemic actions (Table 1). Each cell in the matrix of Table 1 describes
benefits obtained by the specified change to the environment (row) and change to the
cognitive state of the actor (column).
Kirsh's [12] taxonomy of actions to manage space was based on observation of people
playing games and engaging in everyday activities such as cooking, assembling fur-
niture, and bagging groceries. In the case of science or science education, we suggest
that epistemic actions can enhance cognition in a manner not explored by Kirsh: epis-
temic actions can exploit or enhance the dual nature of representations.
A spatial representation, such as a map, graph, or 3-D scale model, has a dual na-
ture: it is, simultaneously, a concrete, physical object, and a symbol that represents
something other than itself [13-18]. We suggest three ways in which epistemic actions
can exploit or enhance the dual nature of representations:
1. The action can rearrange or reorder the physical aspect of the representation so that
the referential aspect of the representation is more salient and/or has more
dimensions.
2. The action can rearrange or reorder the physical aspect of the materials so that a
more useful representation replaces a less useful representation.
3. The action can create a dual-natured representation from what had previously been
mere non-representational objects.
Mechanism (1): Manipulate the Physical Representation to Enhance or Fore-
ground its Referential Meaning. In Situation #4 of the artificial outcrop experiment,
an expert rotates candidate 3-D scale models to align with the full-scale structure.
Before rotation, the correct model accurately represented the full-scale structure with
respect to the attributes of concave/convex, elongate/circular, steep-sided/gentle-
sided, symmetric/asymmetric, and closed/open. After rotation, the model accurately
represented the full-scale structure with respect to all of those attributes, and also with
respect to alignment of the long axis. In other words, manipulating the physical object
transformed the representation into a more complete or more perfect analogy to the
referent structure. The same is true of rotating a map to align with the represented
terrain [19].
In addition to creating a new correspondence (alignment) where none had existed
previously, rotating the correct model to align with the referent space makes the other
correspondences more salient, and easier to check or verify. On the other hand, if the
model chosen is an incorrect model (for example, open-ended rather than closed-
contoured), the discrepancy between model and full-scale structure becomes harder to
overlook when the long axes of the model and referent are brought into alignment.
Mechanism (2): Manipulate the Physical Representation to Create a More Useful
Representation. In Situation #5 of the artificial outcrop experiment, the participant
had initially arranged annotated sketches of each outcrop onto her paper such that the
down-paper dimension represented the temporal sequence in which the eight outcrops
had been visited and the observations had been made. Upon receiving the task direc-
tions and seeing the choice array, she apparently realized that this was not a useful
organizational strategy. She physically destroyed that organization schema. Then she
physically reorganized the fragments into a more task-relevant spatial arrangement, in
which positions of outcrop sketches represented positions of full-scale outcrops. This
participant apparently had the ability to think of her inscriptions as both (a) a concrete
object that could be torn into pieces and reordered, and (b) a set of symbolic marks
standing for individual outcrops.
Mechanism (3): Manipulate the Physical World to Carry Representational
Meaning. In several of the examples described above, the objects have no represen-
tational significance before the epistemic action. The epistemic action creates repre-
sentational significance where none had previously existed.
For example, in the case of the children's growing bean plants, as a consequence
of the epistemic action, the spatial dimension parallel to the window sill becomes a
representation of water per unit time. The vertical dimension, the height of each plant,
becomes a representation of growth rate as a function of watering rate. The entire
array of plants becomes a living bar graph.
In the case of the fossils arranged on the table, the spatial dimension along the line
of fossils acquires two representational aspects, which run in parallel: geologic time
and evolutionary distance.
In the case of the igneous rocks, the two piles of rocks, fine-grained and coarse-
grained, represent the fundamental division of igneous rocks into extrusive and intru-
sive products of cooling magma. Within each pile, the rocks could further be ordered
according to the percentage of light-colored minerals, an indicator of silica content.
Kirlik [20] presents a compelling non-science example, in which a skilled short-
order cook continuously manipulates the positions of steaks on a grill, such that the
near-far axis of the grill (from the cook's perspective) represents doneness requested
by the customer, and the distance from the left-hand edge of the grill represents time remaining until desired doneness. This skilled cook need only monitor the perceptually-available attribute of distance from the left edge of the grill, and need not try to perceive the hidden attribute of interior pinkness, nor try to remember the variable attribute of elapsed-duration-on-grill. A less skilled cook in the same diner created only
one axis of representation (the near-far requested-doneness axis), and the least skilled
cook had no representations at all, only steaks.
activities improve? Is there individual variation in the epistemic actions found useful
by different science students or scientists, as Schwan and Riempp [23] have found
during instruction on how to tie nautical knots? Do those scientists who have reputa-
tions for "good hands in the lab" make more epistemic actions than those who do not,
by analogy with the strategic management of one's surrounding space that Kirsh [12]
found to be an attribute of expertise in practical domains?
Acknowledgements. The authors thank the study participants for their thoughts and
actions, G. Michael Purdy for permission to use the grounds of Lamont-Doherty Earth
Observatory, T. Ishikawa, M. Turrin and L. Pistolesi for assistance with data acquisi-
tion, L. Pistolesi for preparing the illustrations, and the National Science Foundation
for support through grants REC04-11823 and REC04-11686. The opinions are those
of the authors and no endorsement by NSF is implied. This is Lamont-Doherty Earth
Observatory contribution number 7171.
References
1. Kirsh, D., Maglio, P.: On distinguishing epistemic from pragmatic action. Cog. Sci. 18,
513–549 (1994)
2. Maglio, P., Kirsh, D.: Epistemic action increases with skill. In: Proceedings of the 18th an-
nual meeting of the Cognitive Science Society (1996)
3. Kastens, K.A., Ishikawa, T., Liben, L.S.: Visualizing a 3-D geological structure from out-
crop observations: Strategies used by geoscience experts, students and novices [abstract].
Geological Society of America Abstracts with Program, 171–173 (2006)
4. Kastens, K.A., Agrawal, S., Liben, L.S.: Research in Science Education: The Role of Gestures in Geoscience Teaching and Learning. Journal of Geoscience Education (2008)
5. Broadbent, D.E.: Perception and Communication. Oxford University Press, Oxford (1958)
6. Desimone, R., Duncan, J.: Neural mechanisms of selective visual attention. Ann. Rev. of
Neurosci. 18, 193–222 (1995)
7. Ballard, D.H., Hayhoe, M.M., Pook, P.K., Rao, R.P.N.: Deictic codes for the embodiment
of cognition. Beh. & Brain Sci. 20, 723–767 (1997)
8. Larkin, J.H., Simon, H.A.: Why a diagram is (sometimes) worth ten thousand words. Cog.
Sci. 11, 65–99 (1987)
9. Shepard, R.N., Metzler, J.: Mental Rotation of Three-Dimensional Objects. Sci. 171, 701–
703 (1971)
10. Liben, L.S., Downs, R.M.: Understanding Person-Space-Map Relations: Cartographic and
Developmental Perspectives. Dev. Psych. 29, 739–752 (1993)
11. Goodwin, C.: Practices of Color Classification. Mind, Cult., Act. 7, 19–36 (2000)
12. Kirsh, D.: The intelligent use of space. Artif. Intel. 73, 31–68 (1995)
13. Goodman, N.: Languages of art: An approach to a theory of symbols. Hackett, Indianapo-
lis (1976)
14. Potter, M.C.: Mundane Symbolism: The relations among objects, names, and ideas. In:
Smith, N.R., Franklin, M.B. (eds.) Symbolic functioning in childhood, pp. 41–65. Law-
rence Erlbaum Associates, Hillsdale (1979)
15. DeLoache, J.S.: Dual representation and young children’s use of scale models. Child
Dev. 71, 329–338 (2000)
16. Liben, L.S.: Developing an Understanding of External Spatial Representations. In: Sigel,
I.E. (ed.) Development of mental representation: theories and applications, pp. 297–321.
Lawrence Erlbaum Associates, Hillsdale (1999)
17. Liben, L.S.: Education for Spatial Thinking. In: Renninger, K.A., Sigel, I.E. (eds.) Hand-
book of child psychology, 6th edn., vol. 4, pp. 197–247. Wiley, Hoboken (2006)
18. Uttal, D.H., Liu, L.L., DeLoache, J.S.: Concreteness and symbolic development. In: Balter,
L., Tamis-LeMonde, C.S. (eds.) Child Psychology: A Handbook of Contemporary Issues,
pp. 167–184. Psychology Press, New York (2006)
19. Liben, L.S., Myers, L.J., Kastens, K.A.: Locating oneself on a map in relation to person
qualities and map characteristics. In: Freksa, C., Newcombe, N.S., Gärdenfors, P. (eds.)
Spatial Cognition VI. LNCS, vol. 5248. Springer, Heidelberg (2008)
20. Kirlik, A.: The ecological expert: Acting to create information to guide action. In: The
Conference on Human Interaction with Complex Systems, Piscataway, NJ (1998)
21. Cowley, S.J., MacDorman, K.F.: What baboons, babies and Tetris players tell us about in-
teraction: A biosocial view of norm-based social learning. Cog. Sci. 18, 363–378 (2006)
22. National Research Council: National Science Education Standards. National Academy Press, Washington (1996)
23. Schwan, S., Riempp, R.: The cognitive benefits of interactive videos: learning to tie nauti-
cal knots. Learn. and Instr. 14, 293–305 (2004)
24. Magnani, L.: Model-based and manipulative abduction in science. Found. of Sci. 9, 219–
247 (2004)
25. Roth, W.M.: From epistemic (ergotic) actions to scientific discourse: The bridging func-
tion of gestures. Prag. and Cogn. 11, 141–170 (2003)
An Influence Model for Reference Object
Selection in Spatially Locative Phrases
1 Introduction
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 216–232, 2008.
c Springer-Verlag Berlin Heidelberg 2008
reference object may or may not be ambiguous and this leads to a variety of issues
which are discussed in Sect. 3.4. Even an ambiguous candidate reference object
must be treated differently from a referent in a referring expression because it
has a purpose in helping to locate the target.
that the ashtray is by the town-hall”. Talmy [1] lists attributes of located and
reference objects, and states that relative to the located object the reference is:
Thus the reference is likely to be somewhat bigger, if not vastly so, than the
target object. This scale issue is discussed in Sect. 3.3, permanence and perceiv-
ability in Sect. 3.2. These are not intended as absolute categorisations and the
model developed in this paper embodies the concept that the influences can be
traded against each other. For instance the phrase “the bicycle is leaning on the
bollard” uses as a reference an object smaller than the target (less appropriate)
but more permanently located (more appropriate).
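The trade-off described above can be made concrete as a weighted combination of influence scores. The following is a minimal sketch; the factor names, scores, and weights are illustrative assumptions, not values taken from the model itself.

```python
# Minimal sketch of trading reference-object influences against each other.
# Factor names, scores, and weights below are illustrative assumptions.

def reference_suitability(scores, weights):
    """Combine per-factor scores (each in 0..1) into one suitability value."""
    return sum(weights[name] * value for name, value in scores.items())

weights = {"relative_size": 0.4, "permanence": 0.6}  # assumed trade-off

# "The bicycle is leaning on the bollard": the bollard scores poorly on
# relative size (smaller than the target) but well on permanence; a parked
# van nearby would score the other way around.
bollard = {"relative_size": 0.2, "permanence": 0.9}
parked_van = {"relative_size": 0.8, "permanence": 0.1}

print(reference_suitability(bollard, weights))
print(reference_suitability(parked_van, weights))
```

Under these assumed weights the permanently located bollard outscores the larger but transient van, mirroring the trade-off in the bicycle example.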
Bennett and Agarwal [13] investigate the semantics of ‘place’ and derive a
logical categorisation of reference attributes. De Vega et al. [14] analyse Spanish
and German text corpora and (with a restricted range of prepositions) find that
reference objects are more likely to be solid and countable (i.e. not a substance
like ‘snow’). It should be noted that the corpora were taken from novels rather
than first hand descriptions of real scenes.
Recent experimental work by Carlson and Hill [12] indicates that the geo-
metric placement of a reference is a more important influence than a conceptual
link between target and reference, and that proximity and joint location on a
cardinal axis (e.g., target directly above or directly to the left of reference) are
preferred (see Sect. 3.3). The experiments were carried out using 2-dimensional
object representations on a 2-dimensional grid. Earlier work by Plumert et al.
[15] focusses on hierarchies of reference objects in compound locative phrases
but also finds that, in particular, the smallest reference in the hierarchy might be omitted if the relationship between it and the target does not allow sufficient extra information to be provided (see Sect. 3.4).
case of urban navigation. Based on interviews with subjects who had chosen particular landmarks in an experimental setting, they derive the following characteristics of good landmarks:
1. Permanence
2. Visibility
3. Usefulness of location
4. Uniqueness
5. Brevity of description
They also note that most landmarks do not exhibit all of the desired characteristics; indeed, the most frequently used landmarks, traffic lights, are ubiquitous rather than unique. This is discussed in Sect. 3.4.
The factors which contribute to “visual and cognitive salience” in urban way-
finding are investigated by Raubal and Winter [17] and Nothegger et al. [18],
who test automatically selected landmarks against those selected by humans.
The measure of saliency for visual features is complex. Nothegger et al. [18]
point out that using deviation from a local mean or median value (for example
in a feature such as building colour) to represent salience does not hold for
asymmetric quantities such as size, where bigger is usually better than smaller.
Cognitive salience, including cultural or historic significance, is in practice related to the issue of the listener's prior knowledge of the landmark and is discussed in Sect. 3.2.
Winter [19] adds advance visibility to the list of desirable characteristics for
landmarks, citing both way-finder comfort and reduced likelihood of reference
frame confusion as reasons.
In addition to singularity (sharp contrast with the environment), prominence (visibility), and cultural or historic significance, which are picked up in the lists already mentioned, Sorrows and Hirtle [20] also list accessibility and prototypicality as characteristics of landmarks. Accessibility (as in the junction of many
roads) may make a landmark more frequently used and may lead to the accretion
of other characteristics useful for way-finding, but it probably mostly denotes
usefulness of location, which is further discussed in Sect. 3.3. Prototypicality is
an important factor as without specific knowledge of a landmark or reference,
categorical knowledge is required. A church which looked like a supermarket
would be a problematic reference.
Tezuka and Tanaka [21] note that landmark use is relative to the task at hand,
mode of transport and time of day. A good landmark for pedestrian navigation is
not necessarily good for car drivers. Such differences can generally still be expressed in terms of visibility, but they highlight the need for speed, conditions, and viewpoint to be taken into account in assessing visibility. Cultural factors, preferences and, according to Klabunde and Porzel [22], social status may also affect landmark choice.
In [21] a reinforcement mechanism is proposed whereby landmark usage ef-
fectively improves the goodness of the landmark. The initial choice of a landmark
which subsequently becomes much used would presumably have been made be-
cause it displayed characteristics of a good landmark. However, an object’s prior
use as a landmark may cause continuation of use even if an otherwise more suit-
able landmark appears. A related case is noted in [20], “turn left where the red
barn used to be”, where the use of the landmark outlives the landmark itself.
The three primary influences on reference object suitability can be derived from
the necessary steps a listener must take on hearing a locative phrase, with the
addition of a cost function. Presented with a locative phrase and the task of
finding the target object the listener must do two things:
[Fig. 3 (fragment): target mobility, reference mobility, and temporal relevance (listener presence) as influences on reference suitability.]
have multiple influences. For size, bounding box volume, convex hull volume,
actual volume, maximum dimension and sum of dimensions are all possible can-
didate influences. The apparent size, the area projected toward the speaker, may
in some cases be more important than the actual size. Raubal and Winter [17]
note this in the case of building façades, for instance. These are omitted from
Fig. 3 for simplicity, although they will be included in model implementations.
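The alternative size measures listed above can be illustrated for an axis-aligned object extent. The dimensions and the facade-style approximation of apparent size are assumptions for illustration; convex-hull and actual volume would require full geometry.

```python
# Candidate "size" influences for a reference object, computed from an
# axis-aligned extent (dx, dy, dz). Only the simpler measures are sketched;
# convex-hull and actual volume would need a full geometric model.

def size_measures(dx, dy, dz):
    return {
        "bounding_box_volume": dx * dy * dz,
        "max_dimension": max(dx, dy, dz),
        "sum_of_dimensions": dx + dy + dz,
    }

def apparent_size(width, height):
    # area projected toward the speaker, e.g. a building facade
    return width * height

measures = size_measures(2.0, 3.0, 4.0)
print(measures)
print(apparent_size(3.0, 4.0))
```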
Persistence. Following Talmy [1] and the work by de Vega et al. [14], it is clear that the mobility of both the target object and the candidate reference object influences reference choice. Intuitively, the reference object is expected to be more stable
(see [25]) than the target. Also important, as pointed out by Burnett [16], is
when the listener will need to use the reference to find the target. If in Fig. 1 the target object is the post box and the listener will not be at the scene for some time, then the pink house, rather than the skip (which may be removed), will be a better reference even though the skip is nearer and plainly visible. This factor
is summarised as “Temporal relevance (listener presence)” in Fig. 3.
Scene Scale. As already noted, Miller and Johnson-Laird [11] point out that the scales of the reference and located objects are important in determining whether a reference is appropriate. It is proposed here, following Plumert et al. [15], that
this is due to the influence on the search space. Choosing a large reference may
make the reference more apparent but may leave the listener a difficult task find-
ing the target object as, along with any preposition, it defines too large a region
of interest (e.g., “the table is near Oxford”). Reference size must be treated care-
fully as, dependent on geometry, the search space may vary considerably. To say
a target object is “next to the train” defines a large search area but to say that
it is “in front of the train” defines a much smaller area. Computational models illustrating this can be seen in Gapp [26]. Geometry here is effectively shorthand for what might be termed “projected area in the direction of the target”. A further important influence on search space is the location of the listener relative
to a target object of a given size. As Plumert et al. [15] point out, if the target
object is a safety pin and the listener is more than a few yards away, there may be
no single suitable reference. This factor is included with reference size and geom-
etry and target object size as influences on “scene scale” (see Fig. 4), which in turn
influences search space. The real effect of some critical combination of a small tar-
get object and a distant listener will be to suppress the suitability of all reference
objects and force the decision to use a compound locative phrase containing more
than one reference. This is discussed in Sect. 4.5.
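The train example can be sketched geometrically. The banding approximations below are crude assumptions for illustration; they are not the computational models of Gapp [26], only a way of showing how preposition and reference geometry jointly fix the search-space size.

```python
# Crude sketch of how preposition and reference geometry shape the search
# space. The banding approximations are illustrative assumptions, not the
# computational models of Gapp [26].

def search_area(preposition, ref_length, ref_width, band=2.0):
    if preposition == "next to":
        # a band of width `band` around the reference's whole perimeter
        return 2 * (ref_length + ref_width) * band
    if preposition == "in front of":
        # a band across the leading face only
        return ref_width * band
    raise ValueError("unsupported preposition: " + preposition)

train = {"ref_length": 200.0, "ref_width": 3.0}
print(search_area("next to", **train))      # large search area
print(search_area("in front of", **train))  # much smaller
```

For a long, narrow reference like a train the two prepositions differ by more than two orders of magnitude, which is why geometry cannot be reduced to a single size number.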
An Influence Model for Reference Object Selection 225
[Fig. 4 (fragment): reference ambiguity, disambiguation by grouping, target size, and target obscurance as influences on reference suitability.]
in the case where the target was “on the book” the extra communication cost of
using the two references was considered worthwhile by the speaker. It is possible
that there is a perceived chance of confusion in that an object “on A which
is on B” is not necessarily seen as “on B” (i.e., “on” is not always accepted as
transitive, although this is not necessarily the same as Miller and Johnson-Laird’s
limited transitivity [11]). The reference/target topology influence is included in
the model at present pending further testing of its relevance.
The inclusion of reference ambiguity along with disambiguation by grouping
in Fig. 4 is discussed in Sect. 3.4.
Reference Innate Cost. The costs of simple references such as “hill”, “house” or “desk” are typically fairly comparable. However, references can be parts of objects (see [14]) such as “the town hall steps” or regions such as “the back
of the desk”. The distinction between this form of reference and a compound
reference is that there is still only a single preposition (in contrast to “in front
of the town hall by the steps”). It is clear that references of this nature incur
cost both for the speaker and the listener and in a computational model a cost
function will be required to prevent over-specifying of a reference (e.g., “The
back right hand corner of the desk”) when a less specific reference (“on the
desk”) would be sufficient. Sufficiency here is clearly related to the difficulty of
the search task. How these costs are quantified in the model, beyond a simple
count of syllables (which will be used in initial implementations), needs further
investigation.
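The syllable count proposed for initial implementations can be sketched with a simple vowel-group heuristic. This heuristic is an assumption; real syllabification is considerably more involved, but it suffices to rank an over-specified reference above a simpler one.

```python
import re

# First-pass innate-cost function for a reference expression: a syllable
# count estimated by counting vowel groups per word. The vowel-group
# heuristic is an assumption, adequate only for an initial implementation.

def syllable_count(phrase):
    return sum(max(1, len(re.findall(r"[aeiouy]+", word.lower())))
               for word in phrase.split())

print(syllable_count("on the desk"))
print(syllable_count("the back right hand corner of the desk"))
```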
Search Task Difficulty. It was earlier noted that communication cost would
become important if the time taken for the communication approached that re-
quired for the speaker to locate the target. As noted this is a factor in the results
reported by Plumert et al. [15]. The study concluded that a secondary reference
might be omitted because the target was “in plain view” although the topolog-
ical relationships involved were also a factor (see Sect. 3.3). Much of the search
task difficulty is already expressed in the model as search-space optimisation
and does not require re-inclusion as a factor influencing communication cost;
however, some factor is required in the model to represent the speed of visual
search of which the listener is capable. This should be more or less constant for human listeners; if it were not, the speaker would need to know whether the listener was much slower or quicker than normal, which is outside the scope of the model at this
point. As a constant it should be incorporated automatically into the weights of
the model as it is learned and so is not explicitly included in Fig. 5.
[Fig. 5 (fragment): reference ambiguity, disambiguation by specification, and reference innate cost as influences on reference suitability.]
the evaluation is required (i.e., ignoring references that are clearly unsuitable).
Evidence from research on visual search (see for example Horowitz and Wolfe
[29]), although not directly applicable to the reference choice task, may help
guide experiments in this area.
results from a limited model, containing some eight variables relating to target and reference geometry, trained with a 320-case scene corpus, suggest that results from the full model will be very worthwhile.
References
[1] Talmy, L.: Toward a Cognitive Semantics. MIT Press, Cambridge (2000)
[2] Dale, R., Reiter, E.: Computational interpretations of the Gricean maxims in the generation of referring expressions. Cognitive Science 19, 233–263 (1995)
[3] Duwe, I., Kessler, K., Strohner, H.: Resolving ambiguous descriptions through
visual information. In: Coventry, K.R., Olivier, P. (eds.) Spatial Language. Cog-
nitive and Computational Perspectives, pp. 43–67. Kluwer Academic Publishers,
Dordrecht (2002)
[4] van Deemter, K., van der Sluis, I., Gatt, A.: Building a semantically transparent
corpus for the generation of referring expressions (2006)
[5] Regier, T.: The human semantic potential: Spatial language and constrained con-
nectionism. MIT Press, Cambridge (1996)
[6] Lockwood, K., Forbus, K., Usher, J.: Spacecase: A model of spatial preposition
use. In: Proceedings of the 27th Annual Conference of the Cognitive Science So-
ciety (2005)
[7] Coventry, K.R., Cangelosi, A., Rajapakse, R., Bacon, A., Newstead, S., Joyce,
D., Richards, L.V.: Spatial prepositions and vague quantifiers: Implementing the
functional geometric framework. In: Proceedings of Spatial Cognition Conference
(2004)
[8] Roy, D.K.: Learning visually-grounded words and syntax for a scene description
task. Computer Speech and Language 16(3) (2002)
[9] Herzog, G., Wazinski, P.: Visual translator: Linking perceptions and natural lan-
guage descriptions. Artificial Intelligence Review 8, 175–187 (1994)
[10] Herskovits, A.: Schematization. In: Olivier, P., Gapp, K.-P. (eds.) Representation and Processing of Spatial Expressions, pp. 149–162. Lawrence Erlbaum Associates (1998)
[11] Miller, G.A., Johnson-Laird, P.N.: Language and perception. Harvard University
Press (1976)
[12] Carlson, L.A., Hill, P.L.: Processing the presence, placement, and properties of a
distractor in spatial language tasks. Memory and Cognition 36, 240–255 (2008)
[13] Bennett, B., Agarwal, P.: Semantic categories underlying the meaning of ‘place’.
In: Winter, S., Duckham, M., Kulik, L., Kuipers, B. (eds.) COSIT 2007. LNCS,
vol. 4736. Springer, Heidelberg (2007)
[14] de Vega, M., Rodrigo, M.J., Ato, M., Dehn, D.M., Barquero, B.: How nouns and
prepositions fit together: An exploration of the semantics of locative sentences.
Discourse Processes 34, 117–143 (2002)
[15] Plumert, J.M., Carswell, C., DeVet, K., Ihrig, D.: The content and organization
of communication about object locations. Journal of Memory and Language 34,
477–498 (1995)
[16] Burnett, G.E., Smith, D., May, A.J.: Supporting the navigation task: charac-
teristics of good landmarks. In: Proceedings of the Annual Conference of the
Ergonomics Society. Taylor & Francis, Abington (2001)
[17] Raubal, M., Winter, S.: Enriching wayfinding instructions with local landmarks.
In: Egenhofer, M.J., Mark, D.M. (eds.) GIScience 2002. LNCS, vol. 2478, pp.
243–259. Springer, Heidelberg (2002)
232 M. Barclay and A. Galton
[18] Nothegger, C., Winter, S., Raubal, M.: Computation of the salience of features.
Spatial Cognition and Computation 4, 113–136 (2004)
[19] Winter, S.: Route adaptive selection of salient features. In: Kuhn, W., Worboys,
M., Timpf, S. (eds.) COSIT 2003. LNCS, vol. 2825. Springer, Heidelberg (2003)
[20] Sorrows, M., Hirtle, S.: The nature of landmarks for real and electronic spaces.
In: Freksa, C., Mark, D. (eds.) Spatial Information Theory: Cognitive and Com-
putational Foundations of GIS. Springer, Heidelberg (1999)
[21] Tezuka, T., Tanaka, K.: Landmark extraction: A web mining approach. In: Cohn,
A.G., Mark, D.M. (eds.) COSIT 2005. LNCS, vol. 3693. Springer, Heidelberg
(2005)
[22] Klabunde, R., Porzel, R.: Tailoring spatial descriptions to the addressee: a
constraint-based approach. Linguistics 36(3), 551–577 (1998)
[23] Mainwaring, S.D., Tversky, B., Ohgishy, M., Schiano, D.J.: Descriptions of simple
spatial scenes in english and japanese. Spatial Cognition and Computation 3(1),
3–43 (2003)
[24] Grice, H.P.: Logic and conversation. In: Cole, P., Morgan, J. (eds.) Syntax and
Semantics: Speech Acts, vol. 3, pp. 43–58. Academic Press, New York (1975)
[25] Vandeloise, C.: Spatial Prepositions. University of Chicago Press (1991)
[26] Gapp, K.P.: An empirically validated model for computing spatial relations.
Künstliche Intelligenz, pp. 245–256 (1995)
[27] Regier, T., Carlson, L.: Grounding spatial language in perception: An empirical and computational investigation. Journal of Experimental Psychology: General 130(2), 273–298 (2001)
[28] Tenbrink, T.: Identifying objects on the basis of spatial contrast: An empirical study. In: Freksa, C., Knauff, M., Krieg-Brückner, B., Nebel, B., Barkowsky, T. (eds.) Spatial Cognition IV: Reasoning, Action, Interaction. International Conference Spatial Cognition 2004, pp. 124–146. Springer, Heidelberg (2005)
[29] Horowitz, T.S., Wolfe, J.M.: Search for multiple targets: Remember the targets,
forget the search. Perception and Psychophysics 63, 272–285 (2001)
[30] Barclay, M.J., Galton, A.P.: A scene corpus for training and testing spatial com-
munication systems (in press, 2008)
[31] Montello, D.R.: Scale and multiple psychologies of space. In: Frank, A.U., Cam-
pari, I. (eds.) COSIT 1993. LNCS, vol. 716, pp. 312–321. Springer, Heidelberg
(1993)
Tiered Models of Spatial Language
Interpretation
Robert J. Ross
1 Introduction
While particular semantics and schema based models of spatial language use have
been proposed in the literature [1,2], as well as layered spatial representation
and reasoning models [3,4], and a wealth of qualitative and quantitative models
of spatial reasoning (see [5] for a review), the processing of spatial language remains challenging, both because of the complexities of spatial reasoning and because of the inherent difficulties of language processing that stem from the remarkable efficiency of spoken communication (see [6] for a discussion). For the development
of sophisticated linguistically aware spatial applications, it is not only necessary
to develop spatial reasoning systems, but it is also necessary to identify the
properties of spatial language, particularly with respect to which elements of language are left under-specified for efficient communication and hence must be retrieved through other mechanisms.
In this paper we will consider this problem of moving from the surface lan-
guage form to embodied processing for verbal route instructions. Route interpre-
tations, like scene descriptions, involve the semantic conjunction, or complexing,
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 233–249, 2008.
c Springer-Verlag Berlin Heidelberg 2008
(1) go into the second room, it’s the one after John’s office, on the left
(SL1 / NonAffectingDirectedMotion
processInConfiguration (G1 / Going)
route (GR1 / GeneralizedRoute
pathPlacement (GL1 / GeneralizedLocation
relatum (C1 / Corridor)
hasSpatialModality (SM1 / PathIndicatingExternal))
destination (GL2 / GeneralizedLocation
relatum (K1 / Kitchen)
hasSpatialModality (SM2 / GeneralDirectional))))
goal, then we must define those actions in a meaningful way. Such definitions
require, amongst other factors, a suitable choice of action granularity, relevant
parametrization, as well as the traditional notions of applicability constraints and
effects. For the current model, we have chosen a granularity and enumeration of
action close to the conception of spatial action in human language as identified
in the Generalized Upper Model introduced earlier. We will refer to these action
types as action schemas, but it should be noted that the types of action schemas
and GUM configurations are not one to one; action schemas necessarily introduce
full spatial information including perspective, and are also, as will be seen below,
marginally finer grained than GUM configurations.
Excluding non-spatial features such as start time, performer of the action and
so forth, we can define the generalized form of a directed motion action schema
as follows:
Motion(direction, extent, pathConstraint)    (2)
where:
For each action there is also an implicit source which is the starting point of
any motion. The source of a motion, typically omitted from surface language, is
necessarily required to define an action. Trivially, the source of motion_i is equal to the final location of motion_{i-1}. Furthermore, certain pragmatic constraints hold on which parameters of a motion action schema may be set. For example, the specification of an extent without either a direction or a pathConstraint is not permitted, and explicit definitions of extent and of a path constraint must not contradict each other with respect to the world model.
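The schema form in (2) and its pragmatic constraint can be sketched as a data structure. The field names and types below are assumptions chosen for illustration, not the implementation described in the paper.

```python
from dataclasses import dataclass
from typing import Optional

# Sketch of the directed-motion action schema Motion(direction, extent,
# pathConstraint), enforcing the pragmatic constraint that an extent may not
# be specified without either a direction or a path constraint. Field names
# and types are assumptions for illustration.

@dataclass
class MotionSchema:
    direction: Optional[str] = None         # e.g. "left", "forward"
    extent: Optional[float] = None          # e.g. distance in metres
    path_constraint: Optional[str] = None   # e.g. "along the corridor"

    def __post_init__(self):
        if (self.extent is not None and self.direction is None
                and self.path_constraint is None):
            raise ValueError("extent requires a direction or path constraint")

MotionSchema(direction="forward", extent=5.0)        # permitted
MotionSchema(path_constraint="along the corridor")   # permitted
```

A schema with only an extent, by contrast, raises an error, matching the constraint stated above.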
While action schemas are similar in centralization and composition to config-
urations within the Generalised Upper Model, action schemas are more finely
centralized, typically decomposing a single GUM motion configuration into mul-
tiple action schemas, e.g., the configuration given for Sentence 2 earlier is given
by two distinct action schemas instances within the embodiment model, one cap-
turing the path placement constraint, while the other captures the destination
constraint. Multiple action schemas are then given a logical structuring with
ordering and conditional operators.
We must also define the effects of such schemas. The defining characteristic of a
movement is a profile of probable location of the mover following the initialization
of the action. While there are some logical symbolic ways to define such results,
our approach follows Mandel et al. [16] in that we give a probable location of
the mover as a function of the starting pose and the action schemas considered
as follows:
p(x_j, y_k, o_l) = f_schema(x_0, y_0, o_0)    (3)
where (x_0, y_0, o_0) denotes the starting pose of the agent (location on the Cartesian plane and orientation), p(x_j, y_k, o_l) denotes the probability of eventual occupation of an arbitrary pose, and the motion profile of each schema is determined empirically as a function of the supplied parameters.
While action schemas and the logical form of language share common features,
the mapping function between the two is non-trivial and is highly dependent
on forms of spatial and discourse context. To illustrate, Figure 2 schematically
depicts an office environment with a robot (an oval with a straight line indicating orientation). In such environments, where discourse referents and motion constraints are defined largely in terms of the agent's ego-centric perspective, the grounding process must also include the application of perspective and reference frame transformations to the directions provided in surface-form spatial descriptions.
For single-action instructions, if a unique parametrization of the action schema can be made during the grounding process, then the action may be committed to by the agent immediately. If instead no suitable parametrization is found, or if multiple solutions exist, then clarification through spoken dialogue is necessary to resolve the inherent ambiguity. For multiple action schemas, as typified by complete route instructions, we must adopt an incremental integration approach which composes a process structure from the supplied information:
1. Construct multiple ungrounded action schemas through decomposition and
augmentation of surface spatial language configurations.
2. For action schema 1, apply the grounding process as for single-schema grounding; store the (final position, probability) tuples for the action.
3. For action schema i + 1, take the most probable location tuples from action schema i and supply them as input parameters to the grounding of action schema i + 1.
4. If for action schema n one solution exists where the probability of location
(p) is greater than a threshold (t), the sequence of grounded action schemas
can be committed to by the agent.
This method, similar to the search algorithm applied by [16], essentially moves
a set of most probable locations through a physical search space seeking the
most probable final destination given the set of action specifications supplied.
However, since the search space in our case has been simplified to a conceptual graph structure which includes information on explicit junctions, etc., rather than a finer-grained Voronoi graph which treats all nodes equally, the search process is considerably simplified, so that even short route interpretations yield accurate results.
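The incremental grounding steps above can be sketched as a beam-like propagation of (location, probability) tuples through a toy conceptual graph. The graph, the `toy_ground` profile, and the threshold value are all illustrative assumptions standing in for the empirically determined motion profiles of equation (3).

```python
# Sketch of the incremental grounding procedure: the most probable
# (location, probability) tuples from schema i seed the grounding of schema
# i+1. The world graph, motion profile, and threshold are assumptions.

def ground_route(schemas, start, ground, threshold=0.5, beam=3):
    """ground(schema, pose, p) -> list of (pose, probability) end tuples."""
    candidates = [(start, 1.0)]
    for schema in schemas:
        nxt = []
        for pose, p in candidates:
            nxt.extend(ground(schema, pose, p))
        # keep only the most probable locations for the next step
        candidates = sorted(nxt, key=lambda t: -t[1])[:beam]
    best = max(candidates, key=lambda t: t[1], default=(None, 0.0))
    return best if best[1] > threshold else None  # None: clarify via dialogue

# toy world: each (node, schema) pair maps to weighted successor nodes
world = {("A", "go"): [("B", 0.8), ("C", 0.2)],
         ("B", "go"): [("D", 0.9)],
         ("C", "go"): [("D", 0.3)]}

def toy_ground(schema, pose, p):
    return [(q, p * w) for q, w in world.get((pose, schema), [])]

print(ground_route(["go", "go"], "A", toy_ground))
```

When no candidate clears the threshold the function returns `None`, the point at which the model above would instead compose a clarification question.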
Moreover, the current model offers a simple backtracking solution for the
case where for action schema n the number of solutions is greater than one, or
where no solution exists. In this case rather than rejecting the user’s request,
the interpretation algorithm may backtrack to the last action segment where
no unique solution exists and compose a clarification question relevant to that
point.
The models of spatial action and route interpretation described earlier have
been partially implemented within a modular framework of spatial language
production and analysis, and evaluated as a whole in a user study with the
developed system.
References
19. Eschenbach, C.: Geometric structures of frames of reference and natural language
semantics. Spatial Cognition and Computation 1(4), 329–348 (1999)
20. Bateman, J., Farrar, S.: Modelling models of robot navigation using formal spatial
ontology. In: Proceedings of Spatial Cognition 2004 (2004)
21. Larsson, S.: Issue-Based Dialogue Management. Ph.d. dissertation, Department of
Linguistics, Göteborg University, Göteborg (2002)
22. Steedman, M.J.: The syntactic process. MIT Press, Cambridge (2000)
23. Ross, R.J., Shi, H., Vierhuf, T., Krieg-Brückner, B., Bateman, J.: Towards Dialogue Based Shared Control of Navigating Robots. In: Proceedings of Spatial Cognition 2004, Germany. Springer, Heidelberg (2004)
24. Goschler, J., Andonova, E., Ross, R.J.: Perspective use and perspective shift in
spatial dialogue. In: Proceedings of Spatial Cognition 2008 (2008)
Perspective Use and Perspective Shift
in Spatial Dialogue
1 Introduction
Communication about the world, even in its simplest form, can easily turn into a
problem-solving task because form and function do not match unequivocally in
language systems. Multiple forms may correspond to one and the same function
or meaning, and multiple functions may be associated with one and the same
verbal expression. In addition, the same referential object or scene can trigger a
number of different perceptual and conceptual representations [1], or a certain
arrangement of objects can be perceived and conceptualized in multiple ways. For
example, in a study of goal-directed dialogue [2], different description schemes
were used by participants in reference to a maze and movement in it (path,
coordinate, line, and figural schemes). Similarly, in a study of how people describe
complex scenes with multiple objects, participants’ choices varied significantly
[3] depending on the nature of the spatial array. Thus, we wanted to investigate
how people deal with these issues in a dialogic spatial task. Specifically, we were interested in their perspective taking and in how they would resolve ambiguities and misunderstandings as they occurred.
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 250–265, 2008.
c Springer-Verlag Berlin Heidelberg 2008
Multiple perspectives, or ways of speaking about the world and the entities
that populate it, are reflected at different levels of language, e.g., in lexical and
syntactic alternatives, but also in variation at a conceptual level. In spatial refer-
ence, different conceptualizations can be seen in the choices of spatial perspective
and frames of reference. Perspective taking involves abstracting from the visual
scene or schematization [4] and it has been interpreted as occurring at the level of
microplanning of utterances [5,6] rather than macroplanning (deciding on what
information to express, e.g., which landmarks and their relations are to be men-
tioned). Therefore, while being related to lexical and grammatical encoding, it
carries conceptual choices beyond them.
In spatial perspective, there have been two views, as defined by Tversky [7]. On the narrow view, perspective is realized through the choice of reference system, variously classified as deictic, intrinsic, and extrinsic; as egocentric and allocentric in wayfinding; or as relative, intrinsic, and absolute in Levinson's framework [8].
On the other hand, the broadly viewed perspective choices refer to the use of
reference systems in extended spatial descriptions (e.g., of a room, apartment,
campus, town). Spatial perspective of this kind has also been categorized in al-
ternative ways. In a binary classification schema, embedded perspective refers
to a viewpoint within the environment and goes well together with verbs of
locomotion and terms with respect to landmarks’ spatial relations to an agent
while external perspective takes a viewpoint external to the environment and
is commonly associated with static verbs and cardinal directions [9]. In a tri-
partite framework of spatial perspective, the route perspective/tour is typical
of exploring an environment with a changing viewpoint, the gaze perspective is
associated with scanning a scene from a fixed viewpoint outside an environment
(e.g., describing a room from its entrance), and in the survey perspective a scene
or a map is scanned from a fixed viewpoint above the environment [7,6].
Variability in perspective is an important feature of spatial language. Previous
studies have considered several individual, environmental, and learning factors
as a source of this kind of variation in verbal descriptions. Mode of knowledge
acquisition has been shown to affect perspective choices in spatial memory, for
example, participants who studied maps gave more accurate responses later to
survey perspective tasks whereas participants who were navigating gave more
accurate responses to route perspective tasks [10]. In addition, in these exper-
iments, spatial goals (route vs. survey) were also shown to affect perspective
choices.
Taylor & Tversky [6] tested the influence of four environmental features on
spatial perspective choices and found that although overall most participants’ de-
scriptions followed a survey or a mixed perspective, preference for the use of route
perspective rather than mixed was enhanced in environments that contained a
single path (vs. multiple paths) and environments that contained landmarks of
a single size scale (vs. landmarks of varying size). The other two environmental
features that were manipulated (overall size and enclosure) did not produce any
clear pattern of preferences in their participants’ descriptions.
252 J. Goschler, E. Andonova, and R.J. Ross
When two interlocutors refer to one and the same spatial array, they select
a frame of reference or a perspective for the description. Thus, in dialogue,
perspective use and perspective switching are part of overall coordination. The
need to align perspectives may arise because interlocutors have different viewing
positions (vantage points) with respect to a scene, or because the terms referring
to objects’ spatial relations may be ambiguous or underspecified. In our study,
we kept the vantage point invariable, but there were two possible perspectives
on the scene, namely survey and route perspective: participants could look at
the map and refer to the main directions as left, right, up, and down in a survey
perspective; or they could take the perspective of the wheelchair avatar and refer
to the main directions with left, right, forward, backward in a route perspective.
The availability of these two different perspectives on spatial scenes leads
in many situations to ambiguous utterances in route or location descriptions,
e.g., the meaning of left and right may differ in route and survey perspective.
Whenever people have to deal with two-dimensional representations of three-
dimensional space, this problem is likely to occur. Thus, the data we collected
with participants who navigated a wheelchair avatar on a map on the computer
screen point to more general problems when people have to use maps of any
kind. For example, if one is told to go left in the position indicated by the
wheelchair avatar in Fig. 1, this could be interpreted as an instruction to turn
to the avatar’s intrinsic left and then continue moving (in the route perspective),
or to move in the direction the avatar is already facing, in which case left would
be employed as a term in the survey perspective on this map. The term left
receives the same interpretation in both cases only when the instruction-follower’s
orientation is aligned with the bodily axes of the speaker (the two perspectives
are then conflated).
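The route/survey ambiguity of left can be made concrete with a small sketch (not from the paper; the coordinate conventions and function names are assumptions): survey terms are fixed to the map/screen axes, while route terms rotate with the avatar's heading.

```python
# Sketch (illustrative only): interpreting direction terms under route vs.
# survey perspective for an avatar on a 2-D map.
# Screen coordinates: +x is screen-right, +y is screen-up.

# Survey perspective: terms are fixed to the map/screen axes.
SURVEY = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}

def route_direction(term, heading):
    """Route perspective: terms are relative to the avatar's heading.
    heading is a unit vector (hx, hy) giving the avatar's intrinsic front."""
    hx, hy = heading
    if term == "forward":
        return (hx, hy)
    if term == "backward":
        return (-hx, -hy)
    if term == "left":           # 90 degrees counter-clockwise from the heading
        return (-hy, hx)
    if term == "right":          # 90 degrees clockwise from the heading
        return (hy, -hx)
    raise ValueError(term)

# Avatar facing screen-up: both readings of "left" coincide (conflated).
assert route_direction("left", (0, 1)) == SURVEY["left"]
# Avatar facing screen-down: the two readings of "left" diverge.
assert route_direction("left", (0, -1)) == SURVEY["right"]
```

The second assertion shows why a bare left is ambiguous whenever the avatar's orientation is not aligned with the viewer's axes.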
How do interlocutors manage to align perspectives and communicate success-
fully then? Alignment or marking of perspective can be achieved explicitly by
giving a verbal signal of the choice or of the switch of perspective. Previous
research has indicated that this is rare. However, previous studies have mostly
focused on individuals giving a spatial description to an imaginary rather than a
real interlocutor. Dialogue, on the other hand, offers the addressee the possibility
to explicitly confirm or question a perspective choice or even initiate a switch
that may not have otherwise occurred.
In the corpus of data we collected, participants did so by saying “from my
point of view”, “(to the left) on the map”, “if you look at the picture” to express
that they were using the survey perspective. The route perspective was signalled
Table 1. Examples of linguistic markers of route and survey perspective in the corpus
by phrases such as “seen from the wheelchair”, “seen from the driver”, etc.
(Table 1). There are in fact some further linguistic markers for perspective that
can give the interaction partners clues about which perspective their dialogue
partner is taking. For example, while the terms left and right (Ger., “links”,
“rechts”) are perspectively ambiguous, up/above and down (Ger., “hoch”/“nach
oben”; “runter”/“nach unten”) are not.
Alignment of perspective, however, may also be achieved implicitly, without
any verbal reaction. In the case of real-world tasks such as navigation, for exam-
ple, tacit agreement (and alignment of perspective) may also occur at the level
of subsequent task-relevant non-verbal action (e.g., physical movement) by the
instruction-follower, which indicates that the previous speaker’s utterance was
treated as felicitous enough for the ensuing action to be initiated. Most of the
participants in our study did not refer explicitly to their perspective choices at
all, and still managed to take the same perspective and accomplish the task.
3 Method
In order to examine how people deal with the problem of perspective when more
than one is possible and appropriate to use, we elicited a small corpus of typed
interaction by giving participants a shared spatial task. To accomplish this task,
which consisted of the navigation of an iconic wheelchair on the schematized map
of an office building, participants had to interact with a partner. Participants’
utterances were then analyzed with respect to the use of the route and survey
spatial perspectives.
3.1 Participants
Participants were 22 sixteen- to seventeen-year-old students at a local high school.
All of them were native speakers of German. Dialogue partners communicated
in same-sex dyads (5 male, 6 female).
Perspective Use and Perspective Shift in Spatial Dialogue 255
3.2 Apparatus
3.3 Stimuli
Each dyad participant was given a view of a schematised indoor environment
on the computer screen. The same spatial environment was available to both
speakers to minimize explicit negotiation of the map. The same map was used
throughout all trials and with all dyads. The map, depicted in Figure 1, included
unnamed locations, 6 named locations, and the position of the wheelchair avatar
at any given time. One participant’s view also indicated a target location for the
shared task through the highlighting of one room on the screen.
3.4 Procedure
Two participants at a time were placed at the terminals and provided with sepa-
rate written instructions. Instructions required the participant with the joystick
to act as an instruction-follower in the interaction by imagining that they were
situated in the environment with their partner, who was in the wheelchair and was
instructing them towards a goal. The instruction-giver, on the other hand, was
asked to imagine being situated in the wheelchair and giving instructions towards
a goal.
Fig. 1. Interface window. The goal area is identified only on the instruction-giver’s
map.
The complete task consisted of 11 trials; within each trial, the instruction-giver
directed the instruction-follower towards the goal. Each trial began with the
wheelchair avatar located within a room but facing an exit onto a corridor. Par-
ticipants were then free to communicate via the chat interface. No time-out was
used for the trial, but instructions did request that participants attempt the task
as quickly as possible. Once participants had successfully navigated the wheelchair
avatar to the target room, the screen went blank. After two seconds, the map reap-
peared with a new starting position of the wheelchair in one of the rooms and a
new target room. The same 11 start and end point configurations were used across
all dyads in a different pseudo-randomized order for each dyad.
While the task structure is similar to the Map Task [16], and in particular its
text-mediated realization by Newlands et al. [17], in that both tasks involve the
description of routes between two interlocutors, there are important differences
between the tasks with respect to our research goals. The Map Task purposefully
introduces disparities between the maps used by interlocutors to solicit explicit
discussion of the spatial arrangements presented. While this results in interesting
dialogue structure, it also confounds the rationale for explicit perspective shifts,
which, as we will see, occur even with the isomorphic spatial representations
present in our task. Moreover, it has been our aim to analyse communication in
an interaction situation which is more directly related to our targeted application
domain of route following assistance systems [15].
The main research questions in this study were related to the choice of perspec-
tive made by the instruction-giver and instruction-follower in these dialogues,
how their choices changed over time, especially in terms of the general efficiency
of interaction (measured here in number of utterances spoken before the goal
was reached), how much coordination there was between interlocutors, and the
patterns underlying shifts from route to survey perspective and vice versa.
There are several features of the design that are likely to have influenced the
choice of perspective. One was the setup, with the map on the screen positioned
vertically in front of the participants. This should trigger the use of the survey
perspective, since it is the one aligned with their own bodily axes. It is
unambiguous and cognitively “cheaper” because no mental rotation is needed
to compensate for the orientation of the wheelchair. That is why the survey
perspective could have been expected to dominate. On the other hand, partic-
ipants may have been biased towards the use of the route perspective by the
task in which movement in a wheelchair with its clear intrinsic front and back
was involved. In addition, participants were explicitly encouraged to take the
perspective of the wheelchair in the task instructions.
The interaction of the eleven dyads, with 11 trials each, yielded a corpus of 121
dialogues and a total of 1301 utterances, the majority of which (1121) were
task-related. As the focus of this study was on perspective use, only the 552
utterances indicating a spatial perspective were included in the analyses (49.24%
of all task-related utterances). Other task-related utterances included incremental
route instructions by the instruction-giver such as go on or stop, go out of the
room or similar and clarification questions by the instruction-follower such as
what?, where to? here?.
In order to examine the preference for one of the two perspectives (route vs.
survey), we first classified all task-related utterances indicating a spatial per-
spective into the following categories: (a) utterances with route perspective, (b)
utterances with survey perspective, (c) utterances with mixed perspective, and
(d) utterances with conflated perspective where the description is valid in both
route and survey perspectives. Only a small percentage was either mixed (1.59%)
or conflated (7.58%) and they were excluded from subsequent analyses. Thus,
the data could be analysed in terms of a binary choice between route and sur-
vey perspective utterances yielding a mean percent use of route perspective as
a measure and 462 utterances to be included in the analysis. As a result, the
overall mean percent use of route perspective in this corpus was established as
Table 2. Number of utterances with spatial perspective and mean percent use of
utterances in route perspective produced by instruction-givers and instruction-followers
director and the matcher became more efficient not only from one trial to the next
but also from the beginning to the end of a trial (measured in number of words).
In our study, we addressed the question of changes in the efficiency of interaction
over time in a correlation analysis which revealed a negative correlation between
trial number, on the one hand, and the average number of utterances produced
on a trial by dyads, the number of instruction-givers’ utterances, and the num-
ber of instruction-followers’ utterances, on the other hand. These correlations
reached significance but were rather weak (Table 3) and need to be interpreted
in view of the differences across experimental designs and measures. The studies
using the referential communication task mentioned above examine efficiency in
reference to the same shape, object, or stimulus more generally, selected from
a limited and pre-specified set of options which were visually available to both
participants, whereas in our study, although the task remained the same (giving
route directions in a certain map), the routes themselves, i.e., the positioning
of their start and end points on the map, varied on each trial. Still, the overall
result is clear: efficiency, in terms of shorter dialogues, increases across the span
of the experimental session.
Perspective shift was coded systematically in the following way: every use of a
certain perspective by an interlocutor was checked against that speaker’s most
recent perspective-bearing utterance on the given trial. If the perspective in the
utterance differed from the perspective used in the previous utterance, it was
coded as a perspective shift, e.g., from route to survey or vice versa. Shifts across
speakers were not included in the analysis.
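The coding scheme just described can be sketched as follows (illustrative only; the data representation is an assumption, not the authors' tooling): a shift is counted only when a speaker departs from that same speaker's previous perspective-bearing utterance.

```python
# Sketch (not the authors' code): coding within-speaker perspective shifts
# on one trial. Each utterance is a (speaker, perspective) pair, with
# perspective either "route" or "survey". Cross-speaker changes are not
# counted, matching the coding scheme described above.

def count_shifts(utterances):
    last = {}          # speaker -> last perspective used by that speaker
    shifts = 0
    for speaker, perspective in utterances:
        if speaker in last and last[speaker] != perspective:
            shifts += 1
        last[speaker] = perspective
    return shifts

trial = [("giver", "route"), ("follower", "survey"),   # no shift: new speaker
         ("giver", "route"), ("giver", "survey"),      # one shift by the giver
         ("follower", "survey")]
assert count_shifts(trial) == 1
```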
In order to examine the distribution of perspective shifts across trials and
dyads, we calculated the mean percent perspective shift on each trial for each
dyad. Although the overall percentage of perspective shift was relatively low
(M=8.78%), there was considerable variation across dyads (SD=15.94%) ranging
from 0% to 67% switches. We found that although perspective shift did not
correlate with trial number, it did correlate positively with the three measures
of efficiency (number of utterances overall, utterances by instruction-givers, and
utterances by instruction-followers). Trials on which participants were ‘high-volume’
speakers also tended to contain more perspective shifts in their utterances.
That perspective switches occur at all is not surprising and has been described
by Tversky and colleagues [7], who argue that after a while, both perspectives are conceptually
represented and available in speakers’ minds. The question is, when and why do
switches occur?
The factors influencing perspective choices can be of spatial or communicative
nature. Thus, certain changes in the spatial situation could be responsible for
the occurrence of perspective shifts. In addition, both the interlocutor’s verbal
and non-verbal behaviour can lead to a perspective switch. If the instruction-
giver is faced with behaviour by the instruction-follower that does not follow the
plan as outlined and intended by the instruction-giver, for example, a turn in
the wrong direction, this could be a reason for the speaker to shift perspective in
order to achieve better mutual understanding. We found indeed occurrences of
misunderstandings and mistakes that might have triggered a perspective shift,
as is likely to have happened in this piece of conversation:
(1) a. Instructor: fahr den flur nach rechts ganz durch. Dann nach links in den
2. flur und in den letzten raum rechts.
drive through the corridor to the right. Then to the left into the 2nd
corridor and into the last room on the right-hand side
b. Instructor: falscher raumM
wrong room
c. Instructee: wo denn?
where?
d. Instructor: nach oben
up
e. Instructor: jez revhts
now right
In this example, the instruction-giver first describes the way in the route per-
spective but then the instruction-follower makes a navigation error which the
instruction-giver points out by uttering falscher raum (E., “wrong room”). When
the instruction-follower asks wo denn? (E., “where?”), the instruction-giver im-
mediately switches to the survey perspective by saying “nach oben” (E., “up”).
The following example shows how the use of a certain perspective by one of
the interlocutors could influence their partner to use the same perspective, i.e.,
to align with them:
(2) a. Instructor: so, jetzt wieder links, dann den gang nach rechts direkt
so, now left again, then the corridor to the right directly
b. Instructor: links
left
c. Instructee: ich bin nu oben links in der Ecke
I am in the upper left corner
d. Instructor: aso, dann auf der karte nach rechts
ok, then to the right on the map
e. Instructor: genau, jetzt nach unten
exactly, now down
Here again, the instruction-giver starts by using the route perspective, but
the instruction-follower interrupts with a description of her location in the sur-
vey perspective. The instruction-giver switches to survey, marking the use of this
perspective explicitly by saying auf der Karte (E., “on the map”). The interlocu-
tor, in this case the instruction-follower, can even explicitly ask for directions in
another perspective than the one used by the instruction-giver:
had given directions in the route perspective. After answering this particular
question in the survey perspective, the instruction-giver switches back to using
the route perspective.
Thus, the analysis of the corpus data shows that the verbal behavior of the
interlocutor exerts an influence on perspective choices and can lead to perspective
shifts. It remains to be studied in future research to what extent perspective
shifts can be caused by different kinds of verbal behavior, non-verbal behavior
(spatial action), and how exactly these diverse factors interact.
5 Conclusions
Previous findings have shown variability in spatial perspective and perspective
shift to be ubiquitous in monologic descriptions of spatial relationships and in
spatial instructions such as those found in route directions. This study explored
these issues in a dialogic online navigation task which enhances the ecological
validity of such interactions. The analyses reveal that within a route instruction
task of this kind, survey and route spatial perspectives are more or less equally
likely to occur, although there is a clear tendency for instruction-givers to show
an initial preference for route-perspective descriptions, which, however, gradually
gives way to the more economical and efficient use of survey-perspective instructions.
This reflects, in effect, a trend away from a rather incremental, local,
ego-based strategy towards a more holistic, global, and environment-oriented
strategy in producing directions.
Our results also point towards a great deal of coordination among speakers,
even though the instruction-followers’ verbal contributions were limited because
of the nature of the task and the requirement to navigate via a joystick. In
addition, the findings support communicative models that account for increased
efficiency as a result of joint effort across the lifespan of an interaction.
Our data confirm that perspective mixing and shifting occur regularly in spatial
language (perspective shifts on approximately 9% of all trials) and thus show that
this phenomenon is not restricted to monological spatial descriptions. The correlation
between perspective shifts and the number of utterances needed before the
spatial goal is reached reflects the tendency for speakers in more efficient dialogues to
stay within one perspective and minimize the number of switches in describing
and negotiating a route. As a whole, we identified several driving forces behind
perspective shifts in dialogues, including the relative difficulty of specific spatial
situations and changes across situations, navigation errors by the interlocutor,
and explicit and implicit verbal reactions by the interlocutor. Controlled experi-
mental paradigms in future research need to disentangle these diverse influences.
Acknowledgements
We gratefully acknowledge the support of the Deutsche Forschungsgemein-
schaft (DFG) through the Collaborative Research Center SFB/TR 8 Spatial
Cognition - Project I5-[DiaSpace]. We would also like to thank the students and
teachers of the Gauderkesee Gymnasium, Bremen for their participation in our
study.
References
1. Clark, E.: Conceptual perspective and lexical choice in acquisition. Cognition 64,
1–37 (1997)
2. Garrod, S.C., Anderson, A.: Saying what you mean in dialogue: a study in concep-
tual and semantic co-ordination. Cognition 27, 181–218 (1987)
3. Andonova, E., Tenbrink, T., Coventry, K.: Spatial description, function, and con-
text (submitted)
4. Tversky, B., Lee, P.U.: How space structures language. In: Freksa, C., Habel, C.,
Wender, K.F. (eds.) Spatial Cognition 1998. LNCS (LNAI), vol. 1404, pp. 157–176.
Springer, Heidelberg (1998)
5. Levelt, W.J.M.: Speaking: From intention to articulation. MIT Press, Cambridge
(1989)
6. Taylor, H., Tversky, B.: Perspective in spatial descriptions. Journal of Memory and
Language 35, 371–391 (1996)
7. Tversky, B., Lee, P., Mainwaring, S.: Why do speakers mix perspectives? Spatial
cognition and computation 1, 399–412 (1999)
8. Levinson, S.C.: Space in language and cognition: explorations in cognitive diversity.
Cambridge University Press, Cambridge (2003)
9. Kriz, S., Hegarty, M.: Spatial perspective in spoken descriptions of real world envi-
ronments at different scales. In: Proceedings of the XXVII Annual Meeting of the
Cognitive Science Society, Stresa, Italy (2005)
10. Taylor, H., Naylor, S., Chechile, N.: Goal-specific influences on the representation
of spatial perspective. Memory & Cognition 27, 309–319 (1999)
11. Levelt, W.: Perspective Taking and Ellipsis in Spatial Descriptions. In: Bloom, P.,
Peterson, M., Nadel, L., Garrett, M. (eds.) Language and Space, pp. 77–109. MIT
Press, Cambridge (1996)
12. Vorwerg, C.: Consistency in successive spatial utterances. In: Coventry, K., Ten-
brink, T., Bateman, J. (eds.) Spatial language and dialogue. Oxford University
Press, Oxford (in press)
13. Schober, M.F.: Spatial perspective taking in conversation. Cognition 47(1), 1–24
(1993)
14. Striegnitz, K., Tepper, P., Lovett, A., Cassell, J.: Knowledge representation for
generating locating gestures in route directions. In: Spatial Language in Dialogue.
Oxford University Press, Oxford (2008)
15. Ross, R.J.: Tiered models of spatial language interpretation. In: Proceedings of
Spatial Cognition 2008, Freiburg, Germany (2008)
16. Anderson, A.H., Bader, M., Bard, E.G., Boyle, E.H., Doherty, G.M., Garrod, S.C.,
Isard, S.D., Kowtko, J.C., McAllister, J.M., Miller, J., Sotillo, C.F., Thompson,
H.S., Weinert, R.: The HCRC Map Task corpus. Language and Speech 34(4), 351–
366 (1992)
17. Newlands, A., Anderson, A.H., Mullin, J.: Adapting communicative strategies to
computer-mediated communication: an analysis of task performance and dialogue
structure. Applied Cognitive Psychology 17(3), 325–348 (2003)
1 Introduction
We are aiming at a formal specification of connections between linguistic repre-
sentations and logical theories of space. Language covers various kinds of spatial
relationships between entities. It can express, for instance, orientations between
them (“the cat sat behind the sofa”), regions they occupy (“the plant is in the
corner”), shapes they commit to (“the terrace is surrounded by a wall”), or
distances between them (“ships sailed close to the coast”). Formal theories of
space also cover various types of relations, such as orientations [1], regions [2,3],
shapes [4], or even more complex structures, such as map hierarchies [5]. Com-
pared to natural language, spatial theories focus on one particular spatial aspect
and specify its underlying spatial logic in detail. Natural language, on the other
hand, comprises all of these aspects, and has thus to be linked to a number of
different spatial theories. This linking has to be specified for each aspect and each
spatial logic, identifying relevant information necessary for a linking or mapping
function. This process involves contextual as well as domain-specific knowledge.
Our overall aim is to provide a general framework for identifying links be-
tween language and space as a generic approach to spatial communication and
independent of concrete kinds of applications in which it is used. It should be ap-
plicable to any spatial context in connection with human-computer interaction,
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 266–282, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Natural Language Meets Spatial Calculi 267
3. The relation placement relates the SpatialLocating to the location of the loca-
tum. This location is represented by the GUM category GeneralizedLocation.
It refers to “to the right of the table” in the example. A GeneralizedLocation
specifies the spatial position of a locatum and consists of a spatial term, e.g.
a spatial preposition, and an entity that corresponds to the reference object.
Hence, the GeneralizedLocation defines two relations: spatialModality (spatial
relation) and relatum (reference object). In the example, the spatialModality
is expressed by “to the right of” and the relatum is expressed by “the table”.
The relatum, however, may remain implicit in natural language discourse
[12], such as in the example “the chair is to the right”, i.e. to the right of
an undefined relatum, be it the speaker, listener or another entity. In case
multiple relata are described together with the same spatial modality, they
fill the relation relatum as a collection.
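As an illustration (not part of GUM itself), the categories and relations described above might be modelled as plain data structures; the class and relation names mirror the text (SpatialLocating, GeneralizedLocation, spatialModality, relatum), while the concrete fields and types are assumptions.

```python
# Sketch (illustrative only): GUM-style spatial configuration as dataclasses.
from dataclasses import dataclass, field

@dataclass
class GeneralizedLocation:
    spatial_modality: str                        # e.g. "to the right of"
    relata: list = field(default_factory=list)   # reference object(s); empty if the
                                                 # relatum remains implicit

@dataclass
class SpatialLocating:
    locatum: str                                     # the entity being located
    placements: list = field(default_factory=list)   # one or more GeneralizedLocations

# "The plant is in the corner, by the window, next to the chair.":
# one SpatialLocating defining three placements, as described in the text.
config = SpatialLocating(
    locatum="plant",
    placements=[
        GeneralizedLocation("in", ["corner"]),
        GeneralizedLocation("by", ["window"]),
        GeneralizedLocation("next to", ["chair"]),
    ],
)
assert len(config.placements) == 3
```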
Binding the relatum and the spatialModality in the placement relation is rather
a design issue than a logical constraint. This encapsulation allows convenient
combinations of multiple locations expressed within one configuration: in the
example “The plant is in the corner, by the window, next to the chair.”, one
SpatialLocating defines three placements. This is even more important as soon as
placements are modified by expressing spatial perspectives, spatial accessibility,
extensions or enhancements of the spatial relation. The utterance “The plant is
to the front left of the chair, right here in the corner.” combines two relations
(front and left) with respect to one relatum (the chair), while a second relatum
(in the corner) is combined with possible access information (right here). More-
over, modifications that are encapsulated together with the placement are easier
to compare in case of re-use of spatial placements, e.g. throughout a dialogue
discourse. Moreover, the GeneralizedLocation retains its structure independently
of the configuration. It is equally specified in “he goes to the right of the chair ”
(dynamic spatial configuration) and “he stands to the right of the chair ” (static
spatial configuration), related by different relations (destination and placement).
Types of spatial relationships between locatum and reference objects are de-
scribed by the category SpatialModality. Linguistically, this category corresponds
to a preposition, an adverb, an adjective, or parts of the verb. It is subdivided
into several categories that are primarily grouped into (1) relations expressing
distance between entities, (2) functional dependencies between entities, and (3)
positions between entities relative to each other depending on particular prop-
erties of the entities (such as intrinsic front side, size, shape). There are, how-
ever, intersections between these three general groups. Subcategories that refer
particularly to spatial relationships based on orientations are subsumed under
ProjectionRelation, describing positions between entities relative to each other
depending on particular orientation-based properties of the entities.
[Figure: excerpt of the GUM SpatialModality hierarchy, showing categories
including Access, SpatialDistance, Disjointness, Parthood, HorizontalProjection,
and the projection relations FrontalProjection (FrontProjection, with
FrontProjectionInternal and FrontProjectionExternal; BackProjection, with
BackProjectionInternal and BackProjectionExternal) and LateralProjection
(LeftProjection, with LeftProjectionInternal and LeftProjectionExternal;
RightProjection, with RightProjectionInternal and RightProjectionExternal).]
information. The perspective (A) and the relatum (B) can even be identical: in
this case, the perspective point and the relatum coincide (i.e. A = B). The reference
frame will then automatically be intrinsic, and the orientation has to be determined
by the relatum’s intrinsic front.
Even if GUM’s spatial relationships, then, are linked almost directly with
DCC’s orientations, especially by means of the inherent distinction between
front/back and right/left projections, a missing perspective and relatum of an
utterance have to be inferred and mapped to a DCC representation. What ex-
actly these missing links are, and how an adequate mapping can be constructed
by taking other information into account, is described in the following.
Fig. 5. Room layout of a scene description task, introduced in [7]. Arrows indicate
intrinsic orientations of objects.
The next sentence “with the table just in front of it (the armchair)” also
refers to an intrinsic frame of reference, but with the armchair as origin, i.e. the
armchair refers to A in DCC (see also Fig. 6), which also coincides with B. In
this case, the locatum (table) is located at a position with one of the orientations
1–4 and 10–14. Hence, information about the armchair’s intrinsic front and the
frame of reference have to be taken into account.
In case of a relative frame of reference as in “to the right of that (the stove)
is the fridge”, the perspective point A is indicated by the speaker, the refer-
ence point B is indicated by the relatum (stove), and the locatum (fridge) is
indicated by a point that refers to one of the orientations 10–12 in DCC. Here,
the frame of reference, the possibility of the stove having an intrinsic front and
the perspective, i.e. the position of the speaker, are relevant for the mapping.
If the relatum has no intrinsic front, it follows that a relative frame of reference
applies. Otherwise, the choice of the underlying frame of reference is based on
user preferences (extracted from the dialogue history) and the likeliness of in-
trinsic vs. relative frame of reference (according to the contextual descriptions).
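The frame-of-reference choice just described can be sketched as a small decision procedure (an assumption-laden illustration; the function and parameter names are hypothetical, not part of the authors' system).

```python
# Sketch (hypothetical): choosing the frame of reference as described above.
# If the relatum lacks an intrinsic front, only a relative frame applies;
# otherwise, fall back on a preference from the dialogue history, and
# finally on a contextual prior over intrinsic vs. relative readings.

def choose_frame(relatum_has_intrinsic_front,
                 preferred_frame=None,
                 prior="intrinsic"):
    if not relatum_has_intrinsic_front:
        return "relative"
    if preferred_frame in ("intrinsic", "relative"):
        return preferred_frame      # user preference from the dialogue history
    return prior                    # likelihood from contextual descriptions

assert choose_frame(False) == "relative"
assert choose_frame(True, preferred_frame="relative") == "relative"
assert choose_frame(True) == "intrinsic"
```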
In cases where the relatum is missing—e.g. the relatum of “further to the
right” is omitted in Example 7—it is usually possible to determine its position
by considering the preceding utterances. Hence, the sequence of utterances may
give implicit information about missing entities in GUM’s representation, and
thus has to be considered throughout the construction of the mapping between
GUM and DCC. Similarly, in Example 9, the given perspective “here” can either
be interpreted as reference to the speaker or to the position that has just been
described in a previous sentence, though a relative frame of reference can be
assumed for explicit perspectives.
Given the corpus data, we conclude that the following parameters are involved
in mapping the linguistic semantics of an utterance to a spatial situation:
Fig. 7 displays the connection of an ontology with a spatial logic for regions
such as S4u, by means of a single link relation E, which we might read as ‘is the
spatial extension of’.
[Figure 7: an E-connection C^E(S1, S2), specified by internal descriptions, with
the link relation E connecting the components; contextual and external
descriptions over O and D extend it to C(S1, S2)(D, O).]
constraints will typically be different. Similar to conceptual spaces [30], they are
intended to reflect different representational dimensions or layers of a situation.
In this case, the external description does not affect the decidability of the for-
malism, as shown in [28]. Of course, this is not always the case: the computational
benefits of using E-connections as the basic building block in a layered represen-
tation can get lost in case the external descriptions are too expressive. While a
general characterisation of decidability preserving constraints is difficult to give,
this can be dealt with on a case-by-case basis. In particular, the benefits of a
modular design remain regardless of this issue.
Similarly to the above example, when combining GUM with DCC, assuming
Φ axiomatises a LeftProjection (“left of”) within a SpatialLocating configuration,
we need to enforce that elements participating in that configuration are mapped
to elements of DCC models restricted to the five ‘leftness’ relations of DCC (see
Section 3.2).
∀x, y, z: Φ(x, y, z) → ⋁_{i=2}^{6} L_i(E(x), E(y), E(z))
This would be a typical external description for C E (GU M, DCC). Note that
any internal description can be turned into an external one in case the exter-
nal language is properly more expressive. However, the converse may be the case
as well. For a (set of) formula(s) χ, denote by Mod(χ) the class of its models. An
[Fig. 9: perspectival E-connections as structured theories. Linguistic semantics (S1) and (qualitative) spatial reasoning (S2) are linked in CE(S1, S2); contexts extend it to CE(S1, S2)(D), (naïve) physics/world knowledge extends it to CE(S1, S2)(O), and both together yield the (possible) CE(S1, S2)(D, O).]
sorted first-order logic FOLms , and quantified modal logic QS5, are covered.
The DCC composition tables and GUM have already been formalised in Casl,
and it has also been used successfully to formally verify the composition tables
of qualitative spatial calculi [34].
As should be clear from the discussion so far, E-connections can essentially
be considered as many-sorted heterogeneous theories: component theories can
be formulated in different logical languages (which should be kept disjoint or
sorted), and link relations are interpreted as relations connecting the sorts of
the component logics.4
Fig. 9 shows perspectival E-connections as structured logical theories in the
system Hets. Here, dotted arrows denote the extra-logical or external sources of
input for the formal representation, i.e. for the description of relevant context and
world-knowledge; black arrows denote theory extensions, and dashed arrows a
pushout operation into a (typically heterogeneous) colimit theory of the diagram
(see [35,36,37] for technical details).
Acknowledgements
Our work was carried out in the DFG Transregional Collaborative Research
Center SFB/TR 8 Spatial Cognition, project I1-[OntoSpace]. Financial support
by the Deutsche Forschungsgemeinschaft is gratefully acknowledged. The authors
would like to thank John Bateman and Till Mossakowski for fruitful discussions.
4. The main difference between various E-connections now lies in the expressivity of the ‘link language’ L connecting the different logics. This can range from a sub-Boolean logic, to various DLs, or indeed to full first-order logic.
References
1. Moratz, R., Dylla, F., Frommberger, L.: A Relative Orientation Algebra with Ad-
justable Granularity. In: Proceedings of the Workshop on Agents in Real-Time and
Dynamic Environments (IJCAI 2005) (2005)
2. Casati, R., Varzi, A.C.: Parts and Places - The Structures of Spatial Representa-
tion. MIT Press, Cambridge (1999)
3. Cohn, A.G., Bennett, B., Gooday, J., Gotts, N.M.: Representing and Reasoning
with Qualitative Spatial Relations. In: Stock, O. (ed.) Spatial and Temporal Rea-
soning, pp. 97–132. Kluwer Academic Publishers, Dordrecht (1997)
4. Schlieder, C.: Qualitative Shape Representation. In: Geographic Objects with In-
determinate Boundaries, pp. 123–140. Taylor & Francis, London (1996)
5. Kuipers, B.: The Spatial Semantic Hierarchy. Artificial Intelligence 119, 191–233 (2000)
6. Kracht, M.: Language and Space, Book manuscript (2008)
7. Bateman, J., Tenbrink, T., Farrar, S.: The Role of Conceptual and Linguistic On-
tologies in Discourse. Discourse Processes 44(3), 175–213 (2007)
8. Freksa, C.: Using Orientation Information for Qualitative Spatial Reasoning. In:
Frank, A.U., Campari, I., Formentini, U. (eds.) Theories and methods of spatio-
temporal reasoning in geographic space, pp. 162–178. Springer, Berlin (1992)
9. Bateman, J.A., Henschel, R., Rinaldi, F.: Generalized Upper Model 2.0: Documen-
tation. Technical report, GMD/Institut für Integrierte Publikations- und Informa-
tionssysteme, Darmstadt, Germany (1995)
10. Horrocks, I., Kutz, O., Sattler, U.: The Even More Irresistible SROIQ. In: Knowl-
edge Representation and Reasoning (KR 2006), pp. 57–67 (2006)
11. Shi, H., Tenbrink, T.: Telling Rolland Where to Go: HRI Dialogues on Route Nav-
igation. In: WoSLaD Workshop on Spatial Language and Dialogue, Delmenhorst,
Germany, October 23-25 (2005)
12. Tenbrink, T.: Space, Time, and the Use of Language: An Investigation of Relation-
ships. Mouton de Gruyter, Berlin (2007)
13. Levinson, S.C.: Space in Language and Cognition: Explorations in Cognitive Di-
versity. Cambridge University Press, Cambridge (2003)
14. Herskovits, A.: Language and Spatial Cognition: An Interdisciplinary Study of
the Prepositions in English. Studies in Natural Language Processing. Cambridge
University Press, London (1986)
15. Coventry, K.R., Garrod, S.C.: Saying, Seeing and Acting. The Psychological Se-
mantics of Spatial Prepositions. Essays in Cognitive Psychology. Psychology Press,
Hove (2004)
16. Talmy, L.: How Language Structures Space. In: Pick, H., Acredolo, L. (eds.) Spatial
Orientation: Theory, Research, and Application, pp. 225–282. Plenum Press, New
York (1983)
17. Halliday, M.A.K., Matthiessen, C.M.I.M.: Construing Experience Through Mean-
ing: A Language-Based Approach to Cognition. Cassell, London (1999)
18. Vorwerg, C.: Raumrelationen in Wahrnehmung und Sprache: Kate-
gorisierungsprozesse bei der Benennung visueller Richtungsrelationen. Deutscher
Universitätsverlag, Wiesbaden (2001)
19. Winterboer, A., Tenbrink, T., Moratz, R.: Spatial Directionals for Robot Naviga-
tion. In: van der Zee, E., Vulchanova, M. (eds.) Motion Encoding in Language and
Space. Oxford University Press, Oxford (2008)
282 J. Hois and O. Kutz
20. Cohn, A.G., Hazarika, S.M.: Qualitative Spatial Representation and Reasoning:
An Overview. Fundamenta Informaticae 43, 2–32 (2001)
21. Renz, J., Nebel, B.: Qualitative Spatial Reasoning Using Constraint Calculi. In:
Aiello, M., Pratt-Hartmann, I., van Benthem, J. (eds.) Handbook of Spatial Logics,
pp. 161–215. Springer, Dordrecht (2007)
22. Renz, J., Mitra, D.: Qualitative direction calculi with arbitrary granularity. In:
Zhang, C., Guesgen, H.W., Yeap, W.K. (eds.) PRICAI 2004. LNCS (LNAI),
vol. 3157, pp. 65–74. Springer, Heidelberg (2004)
23. Dylla, F., Moratz, R.: Exploiting Qualitative Spatial Neighborhoods in the Situa-
tion Calculus. In: Freksa, C., Knauff, M., Krieg-Brückner, B., Nebel, B., Barkowsky,
T. (eds.) Spatial Cognition IV: Reasoning, Action, Interaction. International Con-
ference Spatial Cognition 2004. Springer, Heidelberg (2005)
24. Billen, R., Clementini, E.: Projective Relations in a 3D Environment. In: Sester,
M., Galton, A., Duckham, M., Kulik, L. (eds.) Geographic Information Science,
pp. 18–32. Springer, Heidelberg (2006)
25. Gabbay, D., Kurucz, A., Wolter, F., Zakharyaschev, M.: Many-Dimensional Modal
Logics: Theory and Applications. Studies in Logic and the Foundations of Mathe-
matics, vol. 148. Elsevier, Amsterdam (2003)
26. Lewis, D.K.: Counterpart Theory and Quantified Modal Logic. Journal of Philosophy 65, 113–126 (1968); reprinted in Loux, M.J. (ed.): The Possible and the Actual, Ithaca (1979), and in Lewis, D.K.: Philosophical Papers, vol. 1, Oxford (1983)
27. Kutz, O.: E -Connections and Logics of Distance. PhD thesis, The University of
Liverpool (2004)
28. Kutz, O., Lutz, C., Wolter, F., Zakharyaschev, M.: E -Connections of Abstract
Description Systems. Artificial Intelligence 156(1), 1–73 (2004)
29. Cuenca Grau, B., Parsia, B., Sirin, E.: Combining OWL Ontologies Using E -Con-
nections. Journal of Web Semantics 4(1), 40–59 (2006)
30. Gärdenfors, P.: Conceptual Spaces - The Geometry of Thought. Bradford Books.
MIT Press, Cambridge (2000)
31. Serafini, L., Bouquet, P.: Comparing Formal Theories of Context in AI. Artificial
Intelligence 155, 41–67 (2004)
32. Mossakowski, T., Maeder, C., Lüttich, K.: The Heterogeneous Tool Set. In: Grum-
berg, O., Huth, M. (eds.) TACAS 2007. LNCS, vol. 4424, pp. 519–522. Springer,
Heidelberg (2007)
33. CoFI (The Common Framework Initiative): Casl Reference Manual. Springer, Heidelberg (2004). Freely available at http://www.cofi.info
34. Wölfl, S., Mossakowski, T., Schröder, L.: Qualitative Constraint Calculi: Hetero-
geneous Verification of Composition Tables. In: Proceedings of the Twentieth In-
ternational Florida Artificial Intelligence Research Society Conference (FLAIRS
2007), pp. 665–670. AAAI Press, Menlo Park (2007)
35. Kutz, O., Mossakowski, T., Codescu, M.: Shapes of Alignments: Construction,
Composition, and Computation. In: International Workshop on Ontologies: Rea-
soning and Modularity (at ESWC) (2008)
36. Kutz, O., Mossakowski, T.: Conservativity in Structured Ontologies. In: 18th Eu-
ropean Conf. on Artificial Intelligence (ECAI 2008). IOS Press, Amsterdam (2008)
37. Codescu, M., Mossakowski, T.: Heterogeneous Colimits. In: Boulanger, F., Gaston,
C., Schobbens, P.Y. (eds.) MoVaH 2008 (2008)
Automatic Classification of Containment and Support
Spatial Relations in English and Dutch
Abstract. The need to communicate and reason about space is pervasive in hu-
man cognition. Consequently, most languages develop specialized terms for de-
scribing relationships between objects in space – spatial prepositions. However,
the specific set of prepositions and the delineations between them vary widely.
For example, in English containment relationships are categorized as in and
support relationships are classified as on. In Dutch, on the other hand, three dif-
ferent prepositions are used to distinguish between different types of support re-
lations: op, aan, and om. In this paper we show how progressive alignment can
be used to model the formation of spatial language categories along the con-
tainment-support continuum in both English and Dutch.
1 Introduction
Being able to reason and communicate about space is important in many human tasks
from hunting and gathering to engineering design. Virtually all languages have de-
veloped specialized terms to describe spatial relationships between objects in their
environments. In particular, we are interested in spatial prepositions. Spatial prepositions are typically a closed class of words and usually make up a relatively small part
of a language. For example, in English there are only around 100 spatial prepositions.
Understanding how people assign spatial prepositions to arrangements of objects in
the environment is an interesting problem for cognitive science.
Several different aspects of a scene have been shown to contribute to spatial prepo-
sition assignment: geometric arrangement of objects, typical functional roles of ob-
jects (e.g. [9]), whether those functional relationships are being fulfilled (e.g. [4]) and
even the qualitative physics of the situation (e.g. [5]). The particular elements that
contribute to prepositions and how they are used to divide the space of prepositions
has been found to vary widely between languages (e.g. [1, 2]).
This paper shows how progressive alignment can be used to model how spatial
prepositions are learned. Progressive alignment uses the structural alignment process
of structure-mapping theory to construct generalizations from an incremental stream
of examples.
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 283–294, 2008.
© Springer-Verlag Berlin Heidelberg 2008
284 K. Lockwood, A. Lovett, and K. Forbus
The specific phenomenon we model here is how people make distinctions
along the containment-support continuum in both English and Dutch, based on a psy-
chological experiment by Gentner and Bowerman [11]. To reduce tailorability in
encoding the stimuli, we use hand-drawn sketches which are processed by a sketch
understanding system. We show that our model can learn to distinguish these prepo-
sitions, using (as people do) semantic knowledge as well as geometric information,
and requiring orders of magnitude fewer examples than other models of learning spa-
tial prepositions.
The next section describes the Gentner and Bowerman study that provided the in-
spiration for our experiments. Section 3 reviews structure-mapping theory, progres-
sive alignment, and the analogical processing simulations we use in our model. It
also summarizes the relevant aspects of CogSketch, the sketch understanding system
we used to encode the stimuli, and the ResearchCyc knowledge base we use for
common sense knowledge. Section 4 describes the simulation study. We conclude
by discussing related work, broader issues, and future work.
Table 1. The containment and support prepositions in English and Dutch. Drawings in the original are taken from the Gentner and Bowerman paper.

English  Dutch  Relation
in       in     containment
on       op     support (surface contact, covering)
on       aan    support (attachment at fixed points)
on       om     support (encirclement)
Bowerman and Pederson found in a previous study [1] that some ways of dividing
up the containment-support continuum are very common crosslinguistically while
others are relatively rare. English follows a more linguistically common approach by
grouping all support relations together into the on category while the Dutch op-om-
aan distinction is extremely rare. Both use the very common containment category.
Following the Typological Prevalence Hypothesis, both English and Dutch children
should learn the common and shared category of in around the same time. It should
take Dutch children longer to learn the rare aan/op/om distinctions for support than it
takes the English children to learn the common on category.
2.1 Experiment
They tested children in five age groups (2, 3, 4, 5, and 6 years old) as well as adults
who were native speakers of English and Dutch. Each subject was shown a particular
arrangement of objects and asked to describe the relationship in their native language.
In the original experiment, 3-dimensional objects were used. So, for example, a subject would be shown a mirror on the wall of a doll house and asked “Where is the mirror?” The set of all stimuli is shown in Table 2 below.
The results of the study were consistent with the Typological Prevalence hypothe-
sis. Specifically, Dutch children are slower to acquire the op, aan, om system of sup-
port relations than English children are to learn the single on category. Both groups
of children learned the in category early and did not differ in their proficiency using
the term. Across all prepositions, English-speaking 3 to 4 year old children used the
correct preposition 77% of the time, while the Dutch children used the correct prepo-
sition 43% of the time. Within the Dutch children, the more typical op category was
learned sooner than the rarer aan and om categories. For a more detailed description
of the results, please see the original paper.
3 Simulation Background
Several existing systems were used in our simulation. Each is described briefly here.
statements that match the example are updated, and the statements of the example that
do not match the generalization are incorporated, but with a probability of 1/n, where
n is the number of examples in that generalization. If the example is not sufficiently
close to any generalization, it is then compared against the list of unassimilated ex-
amples in that context. If the similarity is over the assimilation threshold, the two ex-
amples are used to construct a new generalization, by the same process. An example
that is determined not to be sufficiently similar to either an existing generalization or
unassimilated example is maintained as a separate example.
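The assimilation procedure just described can be sketched as follows. The similarity function here is a plain fact-overlap ratio standing in for SME's structural evaluation score, and the threshold value is an assumption for illustration, not the value SEQL uses.

```python
# Minimal sketch of SEQL-style assimilation (not the actual SEQL/SME code).

THRESHOLD = 0.4  # assumed assimilation threshold

def similarity(facts_a, facts_b):
    """Fraction of shared facts (Jaccard overlap), standing in for SME."""
    return len(facts_a & facts_b) / max(len(facts_a | facts_b), 1)

class GeneralizationContext:
    def __init__(self):
        self.generalizations = []  # each: {"probs": {fact: prob}, "n": count}
        self.exemplars = []        # unassimilated examples (sets of facts)

    def add_example(self, facts):
        # 1. Try to assimilate into a sufficiently similar generalization:
        #    matching facts are reinforced, new facts enter with probability 1/n.
        for g in self.generalizations:
            if similarity(set(g["probs"]), facts) >= THRESHOLD:
                g["n"] += 1
                for fact in facts | set(g["probs"]):
                    count = g["probs"].get(fact, 0.0) * (g["n"] - 1)
                    g["probs"][fact] = (count + (fact in facts)) / g["n"]
                return
        # 2. Otherwise, try to pair up with an unassimilated exemplar.
        for ex in self.exemplars:
            if similarity(ex, facts) >= THRESHOLD:
                self.exemplars.remove(ex)
                probs = {f: ((f in ex) + (f in facts)) / 2 for f in ex | facts}
                self.generalizations.append({"probs": probs, "n": 2})
                return
        # 3. Not similar to anything: keep as a separate example.
        self.exemplars.append(facts)
```

Facts present in every assimilated example keep probability 1.0; facts seen only once decay toward 1/n, which is how the "possible facts" percentages in the generalizations shown later arise.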
3.3 CogSketch
3.4 ResearchCyc
Consider the sketch below showing the stimulus “freckles on face”. If you just look at the
topological relationship between the freckle glyphs and the face glyph, they clearly form
a contained glyph group with the face as the container and the freckles as the insider. As
work by Coventry and others has shown [6], geometric properties are not sufficient to
account for the way people label situations with spatial prepositions. A purely geomet-
ric account would declare freckles to be in the face, but we actually say freckles are
on/op faces. To model such findings, we must use real-world knowledge as part of our
simulation. For example, we know that freckles are physically part of a face. We use
knowledge from ResearchCyc2 as an approximation for such knowledge. Freckles,
for example, are a subclass of PhysiologicalFeatureOfSurface, providing
the semantic knowledge that, combined with geometric information, enables us to
1. Available online at http://spatiallearning.org/projects/cogsketch_index.html. The publicly available version of CogSketch comes bundled with the OpenCyc KB as opposed to the ResearchCyc KB, which was used for this work.
2. http://research.cyc.com/
Fig. 1. Sketch of the spatial arrangement “freckles on face”. If you examine just the geometric
information, the freckles are in the area delineated by the face.
model spatial preposition judgments. As the world’s largest and most complete gen-
eral knowledge base, ResearchCyc contains much of the functional information
needed about the figure and ground objects in our stimuli.
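A minimal illustration of why both information sources are needed is a rule that consults the figure's semantic type before falling back on topology. The concept and relation names mimic those used in the text; the rule itself is our own simplification, not the model's actual decision procedure.

```python
# Toy combination of geometric (RCC8) and semantic (Cyc-style) information.
# A purely geometric rule would call freckles "in" the face; the semantic
# check overrides this for surface features. Simplified for illustration.

def english_preposition(rcc8, figure_concepts):
    if "PhysiologicalFeatureOfSurface" in figure_concepts:
        return "on"  # freckles are said to be *on* a face despite containment
    if rcc8 in ("TPP", "NTPP"):  # (non-)tangential proper part: containment
        return "in"
    return "on"
```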
4 Experiment
4.1 Materials
All 32 original stimuli from the Gentner and Bowerman study were sketched using
CogSketch. Each sketch was stored as a case containing: (1) the automatically com-
puted qualitative spatial relationships and (2) information about the types of objects in
the sketch. In the original experiment subjects were cued as to which object should be
the figure (e.g. “where is the mirror”) and which should be the ground. To approxi-
mate this, each sketch contained two glyphs, one named figure and one named
ground, and these names were used by the model. Recall that names in CogSketch are
just strings that are used to refer to the objects. Each object was also conceptually la-
beled using concepts from the ResearchCyc KB. For instance, in the mirror on the
wall stimulus, the mirror was declared to be an instance of the concept Mirror and
the wall was labeled as an instance of WallInAConstruction.
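The stored case for a stimulus such as the mirror on the wall might look like this; the data layout is our own illustration, and CogSketch's internal representation differs.

```python
# A case bundles (1) automatically computed qualitative relations between the
# figure and ground glyphs and (2) their conceptual labels. Layout is ours.

def make_case(relations, concepts):
    """relations: e.g. [("rcc8-EC", "figure", "ground")];
    concepts: e.g. {"figure": "Mirror", "ground": "WallInAConstruction"}."""
    facts = {tuple(r) for r in relations}
    facts |= {(concept, role) for role, concept in concepts.items()}
    return facts
```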
When people learn to identify spatial language categories in their native languages,
they learn to focus on the relationships between objects, and to retain only the impor-
tant features of the objects themselves rather than focusing on the surface features of
the objects. As noted above, having conceptual labels and a knowledge base allows us
to simulate this type of knowledge. For each conceptual label, additional concepts
from its genls hierarchy were extracted from ResearchCyc. The genls hierarchy speci-
fies subclass/superclass relationships between all the concepts of the KB. So, for example, Animal and Dog would both be genls of Dachshund. Here we were particularly
interested in facts relating to whether objects were surfaces or containers – and this
was particularly important for ground glyphs. The original facts were removed (in our
example “Dachshund” would be deleted) to simulate abstraction away from specific
object types to more important semantic categories.
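The abstraction step can be pictured with a toy genls hierarchy: superclasses are collected transitively and the original, most specific label is dropped. The hierarchy entries below are made up; ResearchCyc's actual genls hierarchy is far larger.

```python
# Toy genls expansion: collect all superclasses reachable from a label,
# then exclude the label itself ("Dachshund" is deleted; "Dog" and
# "Animal" remain). GENLS is an illustrative stand-in for ResearchCyc.

GENLS = {
    "Dachshund": ["Dog"],
    "Dog": ["Animal"],
    "Mirror": ["ObjectWithSurface"],
}

def abstracted_concepts(label):
    seen, frontier = set(), [label]
    while frontier:
        for parent in GENLS.get(frontier.pop(), []):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen  # the specific label itself is deliberately excluded
```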
In the original study, the physical objects used as stimuli were manipulated to
make the important relationships more salient to subjects. We approximated this by
drawing our sketches so as to highlight the important relationships for the individual
spatial language categories. For example, the sketches for aan that required showing a
connection by fixed points were drawn from an angle that made the connectivity be-
tween the parts observable. Figure 2 below shows two aan sketches: knob aan door
and clothes aan line. They are drawn from perspectives that allow the system easy ac-
cess to the point-contact relationship.
Fig. 2. Two sketched stimuli showing objects drawn from different angles to make the point
connections salient
4.2 Method
The basic spatial category learning algorithm is this: For each word to be learned, a
generalization context is created. Each stimulus representing an example of that word
in use is added to the appropriate generalization contexts using SEQL. (Since we are
looking at both Dutch and English, each example will be added to two generalization
contexts, one for the appropriate word in each language.) Recall that SEQL can con-
struct more than one generalization, and can include unassimilated examples in its
representation of a category.
We model the act of assigning a spatial preposition to a new example E as follows.
We let the score of a generalization context be the maximum score obtained by using
SME to compare E to all of the generalizations and unassimilated examples in that
context. The word associated with the highest-scoring generalization context repre-
sents the model’s decision.
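The decision rule reads directly as an argmax. In the sketch below, contexts map each word to the fact-sets of its generalizations and unassimilated exemplars, and a simple overlap score stands in for SME's structural evaluation.

```python
# Label assignment as described: score each word's context by the best match
# between the probe and any stored generalization or exemplar, take the
# argmax. Overlap is an illustrative stand-in for the SME score.

def overlap(a, b):
    return len(a & b) / max(len(a | b), 1)

def label(contexts, probe):
    """contexts: dict word -> list of fact-sets; probe: set of facts."""
    return max(contexts,
               key=lambda w: max((overlap(c, probe) for c in contexts[w]),
                                 default=0.0))
```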
To test this model, we did a series of trials. Each trial consisted of selecting one
stimulus as the test probe, and using the rest to learn the words. The test probe was
then labeled as per the procedure above. The trial was correct if the model generated
the intended label for that stimulus. There were a total of 32 trials in English (8 for in and 24 for on) and 32 trials in Dutch (8 each for in, op, aan, and om), one for each stimulus sketch.
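The trial structure can be sketched independently of the learner; below, a trivial nearest-neighbour classifier stands in for the full SEQL pipeline, so only the leave-one-out bookkeeping is meant literally.

```python
# Leave-one-out trials as described: hold out each stimulus in turn, "train"
# on the rest, and score a trial as correct if the predicted word matches.
# Nearest-neighbour by fact overlap is a stand-in for SEQL + SME.

def leave_one_out(stimuli):
    """stimuli: list of (facts, word) pairs; returns the fraction correct."""
    correct = 0
    for i, (probe, gold) in enumerate(stimuli):
        rest = stimuli[:i] + stimuli[i + 1:]
        nearest = max(rest, key=lambda fw: len(fw[0] & probe) /
                                           max(len(fw[0] | probe), 1))
        correct += nearest[1] == gold
    return correct / len(stimuli)
```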
4.3 Results
The results of our experiment are shown below. The generalizations and numbers given are for running SEQL on all the sketches for a category. The table below summarizes the number of sketches that were classified correctly for each preposition; the number is out of 8 total sketches, except for English on, which has 24 total sketches. All results are statistically significant (p < 10⁻⁴), except for the English in (p < 0.2), which is close. For an in-depth discussion of the error patterns, see Section 4.4.

English: in 6/8 (75%), on 21/24 (87%)
Dutch: in 6/8 (75%), op 7/8 (87%), aan 6/8 (75%), om 8/8 (100%)

Recall that within each generalization context, SEQL was free to make as many generalizations as it liked. SEQL was also able to keep some cases as exemplars if they did not match any of the other cases in the context. The table below summarizes the number of generalizations and exemplars for each context.

                 English      Dutch
                 in    on     in    op    aan    om
Generalizations   2     6      2     2     3      3
Exemplars         2     0      2     2     0      2
Best Generalization IN
Size: 3
(candle in bottle, cookie in bowl, marble in water)
--DEFINITE FACTS:
(rcc8-TPP figure ground)
--POSSIBLE FACTS:
33%: (Basin ground)
33%: (Bowl-Generic ground)
Fig. 3. One of the generalizations for English in along with the sketches for the component
exemplars
Automatic Classification of Containment and Support Spatial Relations 291
At first the amount of variation within the contexts might seem surprising. How-
ever, since the stimuli were chosen to cover the full range of situations for each con-
text, it makes more sense. Consider the Dutch category op. The 8 sketches for this one
generalization included very different situations: clingy attachment (e.g. sticker op
cupboard), traditional full support (e.g. cookie op plate) and covering relationships
(e.g. top op jar).
Two of the English generalizations are shown in the figures below. For each gener-
alization the cases that were combined are listed followed by the facts and associated
probabilities.
Best Generalization ON
Size: 2
(top on tube, lid on jar)
--DEFINITE FACTS:
(Covering-Object figure)
(above figure ground)
--POSSIBLE FACTS:
50%: (definiteOverlapCase figure ground)
50%: (rcc8-PO figure ground)
50%: (rcc8-EC figure ground)
Fig. 4. Sample generalizations for English on along with the component sketches
4.4 Error Analysis
Closer examination of the specific errors made by SEQL is also illuminating. For ex-
ample, both the Dutch and English experiments failed on two in stimuli. It was the
same two stimuli for both languages: flower in book, and hole in towel. The first case,
flower in book, is hard to represent in a sketch. In the original study, actual objects
were used, making it easier to place the flower in the book. It is not surprising that this
case failed given that it was an exemplar in both in contexts and did not share much
structure with other stimuli in that context. Hole in towel fails for a different reason.
The ResearchCyc knowledge base does not have any concept of a hole. Moreover,
how holes should be considered in spatial relationships seems different than for
physical objects.
Many of our errors stem from the small size of our stimuli set. For contexts that
contained multiple variations, there were often only one or two samples of each. An
interesting future study will be to see how many stimuli are needed to minimize error
rates. (Even human adults are not 100% correct on these tasks.) Interestingly, om is
one of the prepositions that is harder for Dutch children to learn (it covers situations
of encirclement with support). However, it was the only Dutch preposition for which
our system scored 100%. This again is probably explainable by sample size. Since the
entire context contained only cases of encirclement with support, there was more in
common between all of the examples.
4.5 Discussion
Our results suggest that progressive alignment is a promising technique for modeling
the learning of spatial language categories. Using a very small set of training stimuli
(only 7 sketches in some cases) SEQL was able to correctly label the majority of the
test cases. An examination of the results and errors indicates that our model, consis-
tent with human data, uses both geometric and semantic knowledge in learning these
prepositions. SEQL is able to learn these terms reasonably well, even with far less
data than human children, but on the other hand, it is given very refined inputs to be-
gin with (i.e., sketches). As noted below, we plan to explore scaling up to larger
stimulus sets in future work.
5 Related Work
There has been considerable cognitive science research into spatial prepositions, in-
cluding a number of computational models. Most computational models (cf. [16, 18,
10]) are based only on geometric information, which means that they cannot model
findings of Coventry et al [6] and Feist & Gentner[9], who showed that semantic
knowledge of functional properties is also crucial. Prior computational models have
also focused only on inputs consisting of simple geometric shapes (squares, circles,
triangles, etc.). We believe our use of conceptually labeled sketches is an interesting
and practical intermediate point between simple geometric stimuli and full 3D vision.
We also differ from many other models of spatial language use in the number of
training trials required. Many current models use orders of magnitude more trials than
we do. We are not arguing that people learn spatial preposition categories after expo-
sure to only 7 examples. After all, children have a much harder task than the one we
have modeled here: they have many more distractions and a much richer environment
from which to extract spatial information. On the other hand, we suspect that requiring
10³–10⁴ exposures, as current connectionist models need, is psychologically implausible.
For example, one model requires an epoch of 2100 stimuli just to learn the distinction
above/below/over/under for one arrangement of objects (a container pouring a liquid
into a bowl/plate/dish) [7]. The actual number of trials that is both sufficient and cogni-
tively plausible remains an open question and an interesting problem for future work.
There are several lines of investigation suggested by these results. First, we would like to expand our experiments to include more relationships (e.g. under, over, etc.). Second, we would like to
expand to other languages. For example, Korean uniquely divides the containment re-
lationship into tight fit and loose fit relations. Third, we are in the process of building
a sketch library of more instances of spatial relations. With more sketches, we will
have additional evidence concerning the coverage of our model.
There is also clearly a tradeoff between using a cognitively plausible number of training examples and having enough training examples to get good generality, for example, enough examples to automatically extract the important object types and features (e.g. containers) and ignore the spurious ones (e.g. that something is edible). We are
planning future experiments to examine this issue by varying the number of training
trials used. It will also be interesting to see if we can use the same set of experiments
to model the development of spatial language categories in children by varying the
availability of different types of information.
Acknowledgments. This work was sponsored by a grant from the Intelligent Systems
Program of the Office of Naval Research and by The National Science Foundation
under grant no: SBE-0541957, The Spatial Intelligence and Learning Center. The
authors would like to thank Dedre Gentner and Melissa Bowerman for access to their
in-press paper and stimuli.
References
1. Bowerman, M., Pederson, E.: Crosslinguistic perspectives on topological spatial relation-
ships. In: The 87th Annual Meeting of the American Anthropological Association, San
Francisco, CA (paper presented, 1992)
2. Bowerman, M.: Learning How to Structure Space for Language: A Crosslinguistic Per-
spective. In: Bloom, P., Peterson, M.A., Nadel, L., Garrett, M.F. (eds.) Language and
Space, pp. 493–530. MIT Press, Cambridge (1996)
3. Cohn, A.: Calculi for Qualitative Spatial Reasoning. In: Pfalzgraf, J., Calmet, J., Campbell,
J.A. (eds.) AISMC 1996. LNCS, vol. 1138, pp. 124–143. Springer, Heidelberg (1996)
4. Coventry, K.R., Prat-Sala, M., Richards, L.V.: The Interplay Between Geometry and Func-
tion in the Comprehension of ‘over’, ‘under’, ‘above’, and ‘below’. Journal of Memory
and Language 44, 376–398 (2001)
5. Coventry, K.R., Mather, G.: The real story of ‘over’? In: Coventry, K.R., Oliver, P. (eds.)
Spatial Language: Cognitive and Computational Aspects, Kluwer Academic Publishers,
Dordrecht (2002)
6. Coventry, K.R., Garrod, S.C.: Saying, Seeing and Acting: The Psychological Semantics of
Spatial Prepositions. Essays in Cognitive Science Series. Lawrence Erlbaum Associates,
Mahwah (2004)
7. Coventry, K.R., Cangelosi, A., Rajapakse, R., Bacon, A., Newstead, S., Joyce, D., Rich-
ards, L.V.: Spatial prepositions and vague quantifiers: Implementing the functional geo-
metric framework. In: Proceedings of Spatial Cognition Conference. Springer, Germany
(2005)
8. Falkenhainer, B., Forbus, K., Gentner, D.: The Structure-Mapping Engine. In: Proceedings
of the Fifth National Conference on Artificial Intelligence, pp. 272–277. Morgan Kauf-
mann, San Francisco (1986)
9. Feist, M.I., Gentner, D.: On Plates, Bowls, and Dishes: Factors in the Use of English ‘in’
and ‘on’. In: Proceedings of the 20th Annual Conference of the Cognitive Science Society
(1998)
10. Gapp, K.P.: Angle, distance, shape and their relationship to project relations. In: Moore,
J.D., Lehman, J.F. (eds.) Proceedings of the Seventeenth Annual Conference of the Cogni-
tive Science Society, pp. 112–117. Lawrence Erlbaum Associates Inc., Mahwah (1995)
11. Gentner, D., Bowerman, M.: Why Some Spatial Semantic Categories are Harder to Learn
than Others: The Typological Prevalence Hypothesis (in press)
12. Gentner, D.: Structure-Mapping: A theoretical framework for analogy. Cognitive Sci-
ence 7, 155–170 (1983)
13. Gentner, D., Markman, A.B.: Structure mapping in analogy and similarity. American Psy-
chologist 52, 42–56 (1997)
14. Halstead, D., Forbus, K.: Transforming between Propositions and Features: Bridging the
Gap. In: Proceedings of AAAI, Pittsburgh, PA (2005)
15. Kuehne, S., Forbus, K., Gentner, D., Quinn, B.: SEQL: Category learning as progressive
abstraction using structure mapping. In: Proceedings of the 22nd Annual Meeting of the
Cognitive Science Society (2000)
16. Lockwood, K., Forbus, K., Halstead, D., Usher, J.: Automatic Categorization of Spatial
Prepositions. In: Proceedings of the 28th Annual Conference of the Cognitive Science So-
ciety (2006)
17. Markman, A.B., Gentner, D.: Commonalities and differences in similarity comparisons.
Memory & Cognition 24(2), 235–249 (1996)
18. Regier, T.: The human semantic potential: Spatial language and constrained connection-
ism. MIT Press, Cambridge (1996)
19. Regier, T., Carlson, L.A.: Grounding spatial language in perception: An empirical and
computational investigation. Journal of Experimental Psychology: General 130(2), 273–
298 (2001)
20. Skorstad, J., Gentner, D., Medin, D.: Abstraction Process During Concept Learning: A
Structural View. In: Proceedings of the 10th Annual Conference of the Cognitive Science
Society (1988)
Integral vs. Separable Attributes
in Spatial Similarity Assessments
1 Introduction
Similarity assessment implies a judgment about the semantic proximity of two or
more entities. In a rudimentary form, this process consists of a decomposition of the
entities under comparison into elements in which the entities are the same, and
elements in which they differ (James 1890). People perform such tasks based on their
intuitions and knowledge; however, their judgments are often subjective and follow
no strict mathematical model (Tversky 1977). Formalized similarity assessments are
critical ingredients of Naive Geography (Egenhofer and Mark 1995), which serves as
the basis for the design of intelligent GISs that will act and respond much like a
person would. The challenge for machines to perform similarly is the translation of a
qualitative similarity assessment into the quantitative realm of similarity scores,
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 295–310, 2008.
© Springer-Verlag Berlin Heidelberg 2008
296 K.A. Nedas and M.J. Egenhofer
typically within the range of 0 (worst match) to 1 (best match). This paper addresses
similarity within the context of spatial database systems.
Spatial similarity assessment is commonly based on the comparisons of spatial ob-
jects, which are typically characterized by geometric (Bruns and Egenhofer 1996) and
thematic (Rodríguez and Egenhofer 2004) attributes. Geometric attributes are associ-
ated with the objects’ shapes and sizes, while thematic attributes capture non-spatial
information. For example, the class of Rhodes (island), its name, and its population are
thematic attributes, while a shape description, such as the ratio of the major and minor
axes of its minimum bounding rectangle, provides values for its geometric attributes.
The same dichotomy of spatial and thematic characteristics applies to relations. For
example, Rhodes, which is disjoint from the Greek mainland and located 650km
southeast of Thessaloniki, has a smaller population than Athens. Spatial similarity
assessments consider the objects’ characteristics and relations.
The similarity of two spatial objects is typically computed with a distance (i.e.,
dissimilarity) measure that is defined upon the objects’ representations. To yield cog-
nitively plausible results, this estimate must match with people’s notions of object
similarity (Gärdenfors 2000). A critical aspect in this process is the role of an aggre-
gation function, which combines atomic judgments (i.e., comparisons of pairs of
attribute values) into an overall composite measure for pairs of objects. Separable
attributes are perceptually independent as they refer to properties that are obvious,
compelling, and clearly perceived as two different qualities of an entity (Torger-
son 1965). Conversely, integral attributes create a group when their values are con-
ceptually correlated, but lack an obvious separability (Ashby and Townsend 1986).
Conceptual correlation implies that the values of such attributes are perceived as a
single property, independent of their attributes’ internal representations (e.g., as a set
of concomitant attributes). While general-purpose information systems employ pri-
marily separable attributes, such as age, job title, salary, and gender in a personnel
database, a significant amount of integral attributes may be hidden in the representa-
tional formalisms that GISs employ to model the complex topological relations of
spatial objects (Egenhofer and Franzosa 1995; Clementini and di Felice 1998). The
set of possible integral attributes grows with metric refinements of topological rela-
tions (Egenhofer and Shariff 1998; Nedas et al. 2007).
Psychological research has converged to a consensus that aggregation functions
should differ depending on whether the atomic judgments are made on separable or
integral attributes (Attneave 1950; Nosofsky 1986; Shepard 1987; Nosofsky 1992;
Takane and Shibayama 1992; Hahn and Chater 1997; Gärdenfors 2000). Since the
recognition of the integral attributes and the form of the aggregation function affect
the rankings at the object level, spatial information systems should employ a psychologically
compliant model (i.e., a model that accounts for integral attributes) for similarity
assessments, using psychologically correct aggregation functions to determine
the similarity of a result to a query. Most current studies and prototypes, however,
do not account for integral attributes; they use psychologically deviant methods
that make no distinction between separable and integral attributes.
Would the incorporation of psychologically compliant provisions into a formalized
spatial similarity assessment yield different similarity results? To answer this ques-
tion, this paper sets up a similarity simulation that generates a broad spectrum of ex-
perimental results for spatial similarity queries. This simulation provides a rationale
2 Similarity Measures
Similarity-based information retrieval goes beyond the determination of an exact
match between queries and stored data. It provides the users with a range of possible
answers, which are the most similar to the initial requests and, therefore, the most
likely to satisfy their queries. The results of such spatial queries are ranked (Hjaltason
and Samet 1995) according to similarity scores, enabling exploratory access to data
by browsing, since users usually know only approximately what they are looking for.
Such similarity-based retrieval also relieves users from the burden of reformulating a
query repeatedly until they find useful information.
Findings from psychology about the way that people perceive the nature of similarity,
its properties, and its relationship to peripheral notions, such as difference and
dissimilarity, are largely ignored in computational similarity assessments. The focus
on the computational feasibility and efficiency, while dismissing cognitive elements,
renders the plausibility of such approaches to human perception questionable. The
similarity of one object to another is an inverse function of the distance between the
objects in a conceptual space, that is, the collection of one or more domains
(Gärdenfors 2000). Attribute weights that indicate each dimension’s salience within
the space offer a refined similarity assessment. The distance in a conceptual space
indicates dissimilarity, which should be compatible with people’s judgments of
overall dissimilarity; therefore, its correct calculation becomes important. Following
widely accepted psychological research (Attneave 1950; Torgerson 1965; Shepard
1987; Ashby and Lee 1991; Nosofsky 1992; Gärdenfors 2000), the perceived
interpoint distances between the objects’ point representations in that space should be
computed either by a Euclidean metric or a city-block metric (also known as the
Manhattan distance).
Which one to employ depends on whether one deals with integral or separable di-
mensions. Integral dimensions are strongly unanalyzable and typically perceived as a
single stimulus. For instance, the proximity of two linear objects may be described
with a number of measures that associate the boundaries and interiors of the objects
(Nedas et al. 2007), but the closeness relation may be perceived as one stimulus by
the users who inspect the lines. Hence, a set of integral dimensions constitutes in
essence one multi-dimensional attribute (Torgerson 1965). Separable dimensions, on the
other hand, are different and distinct properties (e.g., length and height) that are per-
ceptually independent (Ashby and Lee 1991). It has been suggested and experimen-
tally confirmed (Attneave 1950; Torgerson 1965; Shepard 1987) that, with respect to
human judgments for similarity, a Euclidean metric performs better with integral
dimensions, whereas a city-block metric matches more closely separable dimensions.
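These two metrics can be sketched directly (a minimal illustration, not from the paper; points are tuples of coordinate values in a conceptual space):

```python
def city_block(p, q):
    """City-block (Manhattan) distance: sums the per-dimension
    differences, as suited to separable dimensions."""
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    """Euclidean distance: combines the dimensions into a single
    magnitude, as suited to integral dimensions."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
```

For the points (0, 0) and (3, 4), for instance, the city-block distance is 7, while the Euclidean distance is 5.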
Perceptually separable dimensions are expected to have a higher frequency of
occurrence in databases; therefore, in the general case, the composite dissimilarity
indicator between two objects will be calculated as the weighted average of individual
dissimilarities along each of the dimensions. For a group of n integral attributes, however,
a Euclidean metric should be adopted to derive the dissimilarity of the objects
with respect to this integral group. Therefore, the combination of the n concomitant
attributes of an integral group should yield one dissimilarity component rather than n
individual components in the composite measure (Figure 2).
Fig. 2. Combining the dissimilarity values d4 and d5 of two integral attributes (Attribute 4 and
Attribute 5) into a single dissimilarity component, before summing it up with the dissimilarity
values d1 … d3 (of the separable attributes 1…3) to determine the overall dissimilarity D be-
tween a DB Object and a Query Object
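The scheme of Figure 2 can be sketched as follows (an illustrative fragment, not the authors' implementation; the attribute names are hypothetical, and the weights of the weighted average are omitted for brevity). Each integral group is first collapsed with a Euclidean metric into one component, which is then summed city-block style with the separable dissimilarities:

```python
import math

def composite_dissimilarity(dissims, separable, integral_groups):
    """dissims maps each attribute to its atomic dissimilarity d_i.
    Separable attributes contribute individually (city-block style);
    each integral group contributes one Euclidean component."""
    total = sum(dissims[a] for a in separable)
    for group in integral_groups:
        total += math.sqrt(sum(dissims[a] ** 2 for a in group))
    return total

# Figure 2's setting: d1..d3 separable, d4 and d5 in one integral group.
d = {"a1": 0.1, "a2": 0.2, "a3": 0.3, "a4": 0.3, "a5": 0.4}
D = composite_dissimilarity(d, ["a1", "a2", "a3"], [["a4", "a5"]])
# D = 0.1 + 0.2 + 0.3 + sqrt(0.3**2 + 0.4**2) ≈ 1.1
```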
Most approaches to compute the deviations between two ranking lists (Mosteller and
Rourke 1973; Gibbons 1996) rely on statistical tests, which consider the entire range
of the lists. An evaluation of ranking lists produced from database queries or web
search queries is different, however, as they focus only on the first few ranks, because
the relevance of retrieved items decreases rapidly for lower ranks. For the
experiments in this study, the relevant portion of the ranking list was defined as the
ten best hits. This decision was partially based on the experimental outcomes that
people retain no more than five to nine items in short term memory (Miller 1956).
The 7±2 rule refers to unidimensional stimuli; therefore, people are expected to be
able to retain this number of results in short term memory only for very simple
queries. This choice was also based on the typical strategy of current web-search
engines, which present ten items per page, starting from the most relevant. Therefore,
the set of the ten best results is not only easy to browse and inspect, but also
convenient in the sense that users can memorize it to a large degree and perform swift
comparative judgments about the relevance of each match to their query.
As the database size grows, the ranks of the ten best results are determined based on
finer differences of their similarity values. If one also considers that psychologically
compliant methods approximate better, but do not necessarily model human perception
exactly, then a measure of incompatibility that relies only on rank differences would be
overly strict. A more practical and objective indicator of the incompatibility between two
methods considers instead the overlap of common objects within the relevant portion of
the ranking lists. This measure, denoted by O, expresses the percentage of the common
items within the ten best results that the compared methods produce. The selection of
this measure is also further justified by the fact that each of the items in the relevant
portion is equally accessible to the users (i.e., ten results per page).
The actual rank differences are examined as a secondary, less crucial index of in-
compatibility. They are used as an additional criterion when the overlap measure
provides borderline evidence for that purpose. The rank differences are assessed using
a modified Spearman Rank Correlation (SRC) test. This test is an appropriate statistic
for ordinal data, provided that its resulting coefficient is used only to test a hypothesis
about order (Stevens 1951). The SRC coefficient R, with xi and yi as the rank orders of
item i in two compared samples that contain n items each (Equation 1), takes a value
between –1 and +1, where +1 indicates perfect agreement between two samples (i.e.,
the elements are ranked identically), while –1 signals complete disagreement (i.e., the
elements are ranked in inverse order). A value of 0 means that there is no association
between the two samples, whereas other values than 0, 1, and –1 would indicate in-
termediate levels of correlation.
R = 1 - \frac{6 \sum_{i=1}^{n} (x_i - y_i)^2}{n\,(n^2 - 1)}    (1)
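Equation 1 transcribes directly into code (ranks are assumed 1-based and n ≥ 2):

```python
def spearman_rank_correlation(x, y):
    """R = 1 - 6 * sum((x_i - y_i)^2) / (n * (n^2 - 1)),
    for two rankings x and y of the same n items."""
    n = len(x)
    d_squared = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Identical rankings give +1; fully inverted rankings give -1.
assert spearman_rank_correlation([1, 2, 3, 4], [1, 2, 3, 4]) == 1.0
assert spearman_rank_correlation([1, 2, 3, 4], [4, 3, 2, 1]) == -1.0
```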
The SRC coefficient and similar statistics are designed for evaluations of ranking lists
that contain exactly the same elements. Hence, it cannot be readily applied to tests
that require a correlation value for only a particular subsection of the ranking lists.
This observation is essential, because the items in the relevant portion of the lists will
only incidentally be the same for two different methods. To enable the comparison of
lists with different numbers of entries, a modified SRC coefficient is computed as
Fig. 3. Overlap percentage O and modified Spearman Rank Correlation coefficient R' for the
relevant portion of two ranking lists
follows: first, the different elements in the two lists are eliminated and R (Equation 1)
is computed for the common elements that remain. Then, the modified coefficient R'
is calculated by multiplying R with the overlap percentage O (Figure 3). This second
step is necessary in order to avoid misleading results. For example, when among the
top ten items only one common element exists, R = 1, but R' = 0.1.
Methods that produce very similar results are characterized by positive values of
the measures O and R', close to 1, whereas methods that produce very dissimilar re-
sults are characterized by an overlap value close to 0 and by a modified SRC coeffi-
cient value close to 0 or negative.
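Both measures can be sketched as follows (illustrative code, not the authors' implementation; the relevant portion defaults to the ten best hits):

```python
def overlap_and_modified_src(list_a, list_b, k=10):
    """Return (O, R'): O is the fraction of common items within the
    k best results; R' is the SRC over the common items, re-ranked
    1..c by their relative order in each list, multiplied by O."""
    top_a, top_b = list_a[:k], list_b[:k]
    common = set(top_a) & set(top_b)
    o = len(common) / k
    # Re-rank the common elements by their relative order in each list.
    rank_a = {item: r for r, item in enumerate(
        (i for i in top_a if i in common), start=1)}
    rank_b = {item: r for r, item in enumerate(
        (i for i in top_b if i in common), start=1)}
    c = len(common)
    if c < 2:
        r = 1.0  # a lone common item is trivially in agreement
    else:
        d2 = sum((rank_a[i] - rank_b[i]) ** 2 for i in common)
        r = 1 - 6 * d2 / (c * (c ** 2 - 1))
    return o, r * o

# The example from the text: one common element among the top ten
# yields R = 1 but R' = 0.1.
a = ["q"] + ["a%d" % i for i in range(9)]
b = ["q"] + ["b%d" % i for i in range(9)]
assert overlap_and_modified_src(a, b) == (0.1, 0.1)
```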
The dissimilarities of the ranks for an object query with different methods are
captured through the incompatibility measures O and R', which are each functions of
five variables n, m, p, g, and d.
• Variable n is the number of objects in the database, determining the database size.
The experiments were conducted for the set N = {1,000, 5,000, 25,000, 100,000},
so that each database size increases approximately one order of magnitude over its
predecessor. A dataset of 1,000 objects was adopted as a characteristic case of a
small database, a dataset of 100,000 objects as a characteristic case of a large data-
base, with datasets of 5,000 and 25,000 objects as representatives of medium-small
and medium-large databases, respectively.
• Variable m is the number of attributes that participate in the similarity assessment
of a database object to a query object. The set examined is
M = {2, 5, 10, 20, 30, 40, …, 100} and accounts for both the simplest and the most
complex modeled objects. The case of queries on a single attribute is omitted, because
it is irrelevant for this investigation. A single integral attribute is undefined, because
it essentially degenerates to one separable attribute.
• Variable p is the percentage of integral attributes out of the total number of attrib-
utes m. The actual number of integral attributes is, therefore, p⋅m. In this manner, p
also indirectly determines the number of separable attributes. The percentages
taken are p = {0%, 10%, 20%, …, 100%}. The two extreme values represent the
cases where all attributes are separable (0%) and integral (100%).
• Variable g is the number of integral groups in which the integral attributes are
distributed. Its values are constrained by the specific instantiations of m and p. For
example, for objects with ten attributes (m = 10), four of which are integral
(p = 40%), there could be one group of four attributes or two groups of two attrib-
utes. For the experiment, g has a range from 1 to 50. The smallest value occurs in
various settings, starting with m = 2 and p = 100%. The largest value occurs only if
m = 100 and p = 100%.
• Variable d is the group distribution policy, specifying how a number of integral
attributes p⋅m is distributed into g integral groups. For some configurations there
could be numerous such possibilities. For instance, eight integral attributes that are
distributed into two groups can yield several different allocations, such as 6-2, 5-3,
and 4-4. Preliminary experimentation indicated that the similarity results could be
affected by the distribution policy, especially for larger percentages of integral
attributes. This parameter is treated as a binary variable taking the values “optimal”
and “worst.” An optimal distribution policy tries to distribute the integral attributes
evenly, such that each integral group contains approximately the same number of
attributes (Figure 4a), whereas a worst-case distribution policy creates dispropor-
tionately-sized groups by assigning as many attributes as possible to one large in-
tegral group, while populating the remaining groups with the smallest number of
attributes (Figure 4b). The binary treatment of the group distribution policy allows
inferences about the behavior of this variable between its two extreme settings,
while keeping the number of produced diagrams within realistic limits.
Fig. 4. Splitting integral attributes into groups using (a) an optimal and (b) a worst distribution
policy
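The two policies can be sketched as follows (a hypothetical helper, not from the paper; it assumes, consistent with the variable definitions above, that an integral group contains at least two attributes):

```python
def optimal_distribution(num_integral, g):
    """Spread the integral attributes as evenly as possible over g groups."""
    base, extra = divmod(num_integral, g)
    return [base + 1] * extra + [base] * (g - extra)

def worst_distribution(num_integral, g):
    """One group as large as possible; the remaining g-1 groups minimal
    (two attributes each, the smallest meaningful integral group)."""
    return [num_integral - 2 * (g - 1)] + [2] * (g - 1)

# Eight integral attributes in two groups: 4-4 vs. 6-2, as in the text.
assert optimal_distribution(8, 2) == [4, 4]
assert worst_distribution(8, 2) == [6, 2]
```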
Fig. 5. A 4-dimensional diagram depicting the measures (a) O and (b) R' (color figure available
at http://www.spatial.maine.edu/~max/similarity/4D-0.pdf)
of the deviant method. The interactions behind these deviations explain the outcome
illustrated in the diagrams.
The main conclusion is that the measures O and R' become progressively worse as
the percentage of integral attributes increases and the number of groups in which
these integral attributes are distributed decreases. When either or both trends occur,
the aggregated terms with the compliant method reduce to a number much less than
m. For example, for one separable attribute, nine integral attributes, and three groups,
the deviant method aggregates ten terms and the compliant four terms. Moreover, the
effect of the one remaining separable attribute with the compliant method is dispro-
portionate on the final score compared to that of the other attributes. As the number of
groups increases, the measures have a greater concordance, because the impact of
such isolated attributes on the final score diminishes.
This observation also explains the deviation from the deterioration pattern observed
at the highest layer of the optimal distribution policy diagrams, where such separable
attributes disappear. The even distribution of integral attributes into groups makes the
compliant method behave similarly to the deviant at this layer. For example, consider
a query with ten attributes, all of which are integral and must be distributed in five
groups. The deviant approach will aggregate all ten attributes as separable. The com-
pliant will first separate the ten attributes in groups of two, aggregate each group, and
combine the resulting five terms to derive the object’s similarity. For a single group,
the compliant method becomes identical to the Euclidean distance function. The trend
of deterioration, however, is not interrupted at the highest layer of the diagrams for
the worst distribution policy because the group sizes with this policy differ drastically.
In this case, the smaller integral groups continue to have a disproportionate influence
on the final similarity score.
The more uniform the distribution into groups is, the less significant the effects on
the measures O and R' become. The wavy patterns at the higher layers of the optimal-
distribution diagrams also support this conclusion. Such effects are due to the alternat-
ing exact and approximate division of integral attributes into groups. For example, for
nine integral attributes and three groups the division is exact with three attributes in
each group, while for ten or eleven integral attributes, the groups differ in size. In the
diagrams of the worst distribution policy where group sizes remain consistently im-
balanced, the small stripes of temporary improvements disappear. Excluding the wavy
patterns and the case of all attributes being integral, the measures appear to be invari-
ant to the group distribution policy elsewhere.
The results worsen slightly with an increase in the number of attributes; however,
the influence of this variable is much more subtle compared to the others. When the
attribute number is very small, the methods are often identical, because the attributes
are insufficient to form integral groups (e.g., for two attributes and a percentage of
integral attributes of up to 50%). This observation explains the very high
values of O and R' detected at the rightmost edge of the diagrams.
The compared methods also yield progressively different outcomes as the database
size increases. This result was anticipated, because two functions are expected to
demonstrate approximately the same degree of correlation regardless of the sample
size with which they are tested. Hence, if the entire ranking lists were considered (i.e.,
if the lists contained all database objects), and assuming all other variables equal, the
two compared methods would exhibit on average the same correlation, regardless of
Fig. 6. Overview of the results acquired from the experiment (color figure available at
http://www.spatial.maine.edu/~max/similarity/4D-1.pdf)
the database size. Increasing the number of objects in the database, while keeping the
size of the relevant portion constant leaves more potential for variations within the
ten best results and explains why the overlaps and correlations decline for larger
databases.
Both O and R' take a value of 1 at the lowest layer where all attributes are separa-
ble and the compared methods coincide. For all other database scenarios, the modified
Spearman Rank Correlation coefficient R' has a lower value than the overlap O. This
result is not surprising considering that R' is a stricter measure than O. The diagrams
suggest that the correct recognition of integral attributes and groups is immaterial for
smaller datasets as long as the percentage of integral attributes remains below 40%.
For the largest database considered this limit drops to around 20%. At these percent-
ages, O and R' have values of 0.5 and 0.2, respectively. Such values constitute border-
line measurements, because they imply that only half of the retrieved objects in the
relevant portion are the same and that these common objects are ranked very differ-
ently. The need for different treatments of separable vs. integral attributes is also
corroborated by the actual sizes of real-world geographic databases, which are often
much larger than the largest dataset in this experiment. Only for objects with very
small numbers of attributes (no more than two or three) is the recognition of integral
attributes negligible.
5 Conclusions
Computational similarity assessments among spatial objects typically compare the
values of corresponding attributes and relations employing distance functions to
capture dissimilarities. Psychological findings have suggested that different types of
aggregation functions—for the conversions from the attributes’ similarity values to
the objects’ similarity values—should be used depending on whether the attributes are
separable (which reflects perceptual independence) or whether they are integral
(which reflects a dependency among the attributes). Current computational similarity
methods have ignored the potential impact of such differences, however, treating all
attributes and their values homogeneously.
An experimental comparison between a psychologically compliant approach
(which recognizes groups of integral attributes) and a psychologically deviant ap-
proach (which fails to detect such groups) showed that the rankings produced with
each method are incompatible. The results do not depend per se on the correlation of
the attribute dimensions. Rather, it is the choice of the aggregation function yielding the
object similarities that depends on whether the attributes are perceptually distinguishable;
the perceptual plausibility of the obtained results will therefore be
affected if one ignores the perceptual "correlation" of the attribute dimensions. The
simulations showed that even for a modest amount of integral attributes within the
total set of attributes considered, the dissimilarities are pronounced, particularly in
the presence of a single integral group or a small number of them. This trend worsens
for large-scale databases. Both scenarios correspond closely to spatial representations
and geographic databases. The structure of the current formalisms used to represent
detailed topological, directional, and metric relations is often based on criteria other
than a one-to-one correspondence between the representational primitives employed
and human perception. Such formalisms are likely to contain one or few integral
groups within their representation. Furthermore, geographic databases are typically
large, on the order of 10^5 or 10^6 objects. This result is, therefore, significant, because
it suggests that existing similarity models may need to be revised so that new
similarity algorithms consider the possible presence of perceptually correlated
attributes.
Future work should consider the impact of these findings beyond highly-structured
spatial databases to embrace the less rigid geospatial semantic web (Egenhofer 2002),
which is driven by ontologies. Similarity relations fit well into an ontological frame-
work, because it is expected that people who commit to the same ontology perceive
identically not only the concepts that are important in their domain of interest, but
also the similarity relations that hold among these concepts. This alignment of indi-
vidual similarity views towards a common similarity view is emphasized by the fact
that ontologies already have an inherent notion of qualitative similarity relations among
the concepts that they model. This notion is reflected in their structure (i.e., in the way
they specify classes and subclasses) and in the properties and roles that are attributed
to each concept. Formalizing similarity within ontologies would be a step forward in
the employment of ontologies not only as means for semantic integration, but also as
tools for semantic management, and would help their transition from symbolic to
conceptual constructs.
Acknowledgments
This work was partially supported by the National Geospatial-Intelligence Agency
under grant numbers NMA401-02-1-2009 and NMA201-01-1-2003.
References
Ashby, F., Lee, W.: Predicting Similarity and Categorization from Identification. Journal of
Experimental Psychology: General 120(2), 150–172 (1991)
Ashby, F., Townsend, J.: Varieties of Perceptual Independence. Psychological Review 93(2),
154–179 (1986)
Attneave, F.: Dimensions of Similarity. American Journal of Psychology 63(4), 516–556
(1950)
Basri, R., Costa, L., Geiger, D., Jacobs, D.: Determining the Similarity of Deformable Shapes.
Vision Research 38, 2365–2385 (1998)
Bruns, T., Egenhofer, M.: Similarity of Spatial Scenes. In: Kraak, M.-J., Molenaar, M. (eds.)
Seventh International Symposium on Spatial Data Handling (SDH 1996), Delft, The Nether-
lands, pp. 173–184. Taylor & Francis, London (1996)
Clementini, E., di Felice, P.: Topological Invariants for Lines. IEEE Transactions on Knowl-
edge and Data Engineering 10(1), 38–54 (1998)
Dey, D., Sarkar, S., De, P.: A Distance-Based Approach to Entity Reconciliation in Heteroge-
neous Databases. IEEE Transactions on Knowledge and Data Engineering 14(3), 567–582
(2002)
Egenhofer, M.: Query Processing in Spatial-Query-by-Sketch. Journal of Visual Languages and
Computing 8(4), 403–424 (1997)
Egenhofer, M.: Towards the Semantic Geospatial Web. In: Voisardand, A., Chen, S.-C. (eds.)
10th ACM International Symposium on Advances in Geographic Information Systems,
McLean, VA, pp. 1–4 (2002)
Egenhofer, M., Franzosa, R.: On the Equivalence of Topological Relations. International Jour-
nal of Geographical Information Systems 9(2), 133–152 (1995)
Egenhofer, M., Mark, D.: Naive Geography. In: Kuhn, W., Frank, A.U. (eds.) COSIT 1995.
LNCS, vol. 988, pp. 1–15. Springer, Heidelberg (1995)
Egenhofer, M., Shariff, R.: Metric Details for Natural-Language Spatial Relations. ACM
Transactions on Information Systems 16(4), 295–321 (1998)
Gärdenfors, P.: Conceptual Spaces: The Geometry of Thought. MIT Press, Cambridge (2000)
Gibbons, J.: Nonparametric Methods for Quantitative Analysis. American Sciences Press,
Syracuse (1996)
Goyal, R., Egenhofer, M.: Similarity of Cardinal Directions. In: Jensen, C., Schneider, M.,
Seeger, B., Tsotras, V. (eds.) Proceedings of the Seventh International Symposium on Spa-
tial and Temporal Databases, Los Angeles, CA. LNCS, vol. 2121, pp. 36–55. Springer, Hei-
delberg (2001)
Gudivada, V., Raghavan, V.: Design and Evaluation of Algorithms for Image Retrieval by
Spatial Similarity. ACM Transactions on Information Systems 13(1), 115–144 (1995)
Hahn, U., Chater, N.: Concepts and Similarity. In: Lamberts, K., Shanks, D. (eds.) Knowledge,
Concepts, and Categories, pp. 43–92. MIT Press, Cambridge (1997)
Hjaltason, G., Samet, H.: Ranking in Spatial Databases. In: Egenhofer, M.J., Herring, J.R.
(eds.) SSD 1995. LNCS, vol. 951, pp. 83–95. Springer, Heidelberg (1995)
James, W.: The Principles of Psychology. Holt, New York (1890)
Li, B., Fonseca, F.: TDD: A Comprehensive Model for Qualitative Similarity Assessment.
Spatial Cognition and Computation 6(1), 31–62 (2006)
Miller, G.: The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for
Processing Information. The Psychological Review 63(1), 81–97 (1956)
Mosteller, F., Rourke, R.: Sturdy Statistics: Nonparametric & Order Statistics. Addison-
Wesley, Menlo Park (1973)
Nabil, M., Ngu, A., Shepherd, J.: Picture Similarity Retrieval using the 2D Projection Interval
Representation. IEEE Transactions on Knowledge and Data Engineering 8(4), 533–539
(1996)
Nedas, K.: Semantic Similarity of Spatial Scenes. Ph.D. Dissertation, Department of Spatial
Information Science and Engineering, University of Maine (2006)
Nedas, K., Egenhofer, M., Wilmsen, D.: Metric Details for Topological Line-Line Relations.
International Journal of Geographical Information Science 21(1), 21–24 (2007)
Nosofsky, R.: Attention, Similarity, and the Identification-Categorization Relationship. Journal
of Experimental Psychology: General 115(1), 39–57 (1986)
Nosofsky, R.: Similarity Scaling and Cognitive Process Models. Annual Review of Psychol-
ogy 43(1), 25–53 (1992)
Rodríguez, A., Egenhofer, M.: Determining Semantic Similarity among Entity Classes from
Different Ontologies. IEEE Transactions on Knowledge and Data Engineering 15(2), 442–
456 (2003)
Rodríguez, A., Egenhofer, M.: Comparing Geospatial Entity Classes: An Asymmetric and
Context-Dependent Similarity Measure. International Journal of Geographical Information
Science 18(3), 229–256 (2004)
Shepard, R.: Toward a Universal Law of Generalization for Psychological Science. Sci-
ence 237(4820), 1317–1323 (1987)
Stevens, S.: Mathematics, Measurement, and Psychophysics. In: Stevens, S. (ed.) Handbook of
Experimental Psychology, pp. 1–49. John Wiley & Sons, Inc., New York (1951)
Takane, Y., Shibayama, T.: Structures in Stimulus Identification Data. In: Ashby, F. (ed.) Prob-
abilistic Multidimensional Models of Perception and Cognition, pp. 335–362. Earlbaum,
Hillsdale (1992)
Torgerson, W.: Multidimensional Scaling of Similarity. Psychometrika 30(4), 379–393 (1965)
Tversky, A.: Features of Similarity. Psychological Review 84(4), 327–352 (1977)
Wentz, E.: Developing and Testing of a Trivariate Shape Measure for Geographic Analysis.
Geographical Analysis 32(2), 95–112 (2000)
Spatial Abstraction: Aspectualization,
Coarsening, and Conceptual Classification
1 Introduction
Abstraction is one of the key capabilities of human cognition. It enables us to
conceptualize the surrounding world, build categories, and derive reactions from
them to cope with a given situation. Complex and overly detailed circumstances
can be reduced to much simpler concepts, and only then does it become feasible
to deliberate about which conclusions to draw and which actions to take.
Certainly, we want to see such abstraction capabilities in intelligent artificial
agents too. This requires us to implement abstraction principles in the knowledge
representation used by the artificial agent. First of all, abstraction is a process
transforming a knowledge representation. But how can this process be charac-
terized? We can distinguish three different facets of abstraction: it is possible
to regard only a subset of the available information, the level of detail of every
piece of information can be reduced, or the available information can be used
to construct new, more abstract entities. Intuitively, these types of
abstraction are different and lead to different results as well. Various terms have
been coined for abstraction principles, distributed over several scientific fields
like cognitive science, artificial intelligence, architecture, linguistics, geography,
and many more. Among others we find the terms granularity [1,2], generaliza-
tion [3], schematization [4,5], idealization [5], selection [5,6], amalgamation [6],
or aspectualization [7]. Unfortunately, some of these terms define overlapping
concepts, different ones sometimes have the same meaning, or a single term is
used for different concepts.
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 311–327, 2008.
Springer-Verlag Berlin Heidelberg 2008
312 L. Frommberger and D. Wolter
Also, these terms are often not distinguished in an
exact manner or only defined by giving examples.
In this work we take a formal view from a computer scientist’s perspective.
We study abstraction as part of knowledge representation. Our primary concern
is representation of spatial knowledge, yet we aim at maintaining a perspective
as general as possible, allowing adaptation to other domains. Spatial information
is rich and can be conceptualized in a multitude of ways, making its analysis
challenging as well as relevant to applications. Handling of spatial knowledge is
essential to all agents acting in the real world.
One contribution of this article is a formal definition of abstraction processes:
aspectualization, coarsening, and conceptual classification. We characterize their
properties and investigate the consequences that arise when using abstraction
in agent control processes. Applying the formal framework to a real application
in robot navigation exemplifies its utility. Appropriate use of abstraction
allows knowledge learned in a simplified computer simulation to be transferred
to a control task with a real autonomous robot. Aspectualizable knowledge
representations, which we introduce and promote in this paper, play a key role.
The robot application shows how abstraction principles empower intelligent
agents to transfer decision processes and thereby cope with unfamiliar
situations. Put differently, aspectualizable knowledge representations
enable knowledge transfer.
This paper is organized as follows: In Section 2 we give our definition of the
spatial abstraction paradigms and discuss the role of abstraction in knowledge
representation and its utility in agent control tasks. Section 3 covers the case
study of learning navigational behavior in simulation and transferring it to a
real robot. The paper ends with a discussion of formal approaches to spatial
abstraction and their utility (Section 4) and a conclusion.
The term abstraction is etymologically derived from the Latin words “abs” and
“trahere”, so the literal meaning is “drawing away”. However, if we talk about
abstraction in the context of information processing and cognitive science, ab-
straction covers more than just taking away something, because it is not intended
merely to reduce the amount of data. Rather, abstraction is employed to put the
focus on the relevant information. Additionally, the result is supposed to gen-
eralize and to be useful for a specific task at hand. We define abstraction as
follows:
We first concentrate on information reduction. Let us say that all potential values
of a knowledge representation are elements of a set S which can be regarded as
2.1 Aspectualization
Aspects are semantic concepts. They are pieces of information that represent
certain properties. For example, if we record the trajectory of a moving robot,
we have a spatio-temporal data set denoting at what time the robot visited which
place. Time and place are two different aspects of this data set. Aspectualization
singles out such aspects.
Aspects may span over several features si . However, to be able to single out an
aspect from a feature vector by aspectualization, it must be guaranteed that no
feature refers to more than one aspect. We call this property aspectualizability:
(Figure: aspectualization, e.g., by focusing on shape and disregarding object color.)
2.2 Coarsening
When the set of values a feature can take is reduced, we speak of a coarsening:
Definition 4. Coarsening is the process or result of reducing the details of
information of an observation by lowering the granularity of the input space.
Formally, it is defined as a function κ : Dⁿ → Dⁿ (n ∈ N),
Proof. Choose ϕ(s) = (s + κc(s), κc(s)), ϕ⁻¹(t1, t2) = t1 + (−t2), and
κa(t1, t2) = t2, and define (S′, ⊕) with S′ = Image(ϕ) and
t ⊕ u = ϕ(ϕ⁻¹(t) + ϕ⁻¹(u)) for each t, u ∈ S′. Checking that (S′, ⊕) is a
group and ϕ a homomorphism is straightforward.
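A coarsening in the sense of Definition 4 can be sketched as lifting a per-feature granularity reduction over a whole feature vector. The distance classes below are our own example, not taken from the paper:

```python
def make_coarsening(binner):
    """Lift a per-feature granularity reduction to a map on whole
    feature vectors (a sketch of kappa in Definition 4)."""
    def kappa(s):
        return tuple(binner(x) for x in s)
    return kappa

# Illustrative example: coarsen metric distances (meters) to three classes.
def to_class(d):
    if d < 0.5:
        return "near"
    if d < 2.0:
        return "mid"
    return "far"

kappa = make_coarsening(to_class)
print(kappa((0.3, 1.1, 7.5)))  # ('near', 'mid', 'far')
```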
Note that Theorem 1 does not introduce additional redundancy into the
representation. If we allowed the introduction of redundancy, we could
bijectively create new representations by concatenating s and an arbitrary
abstraction κ(s), with the effect that any abstraction, including conceptual
classification, could always be achieved by an aspectualization from this
representation. Therefore, we do not consider this kind of redundancy here.
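The excluded construction can be made explicit in a few lines (purely illustrative; the coarsening used here is just integer rounding):

```python
def with_redundancy(s, kappa):
    """Concatenate a state with an abstraction of itself: s -> (s, kappa(s)).

    On this redundant representation, any abstraction kappa becomes a mere
    projection onto the second component (an aspectualization), which is
    why such constructions are excluded above.
    """
    return (s, kappa(s))

coarse = lambda v: tuple(round(x) for x in v)
redundant = with_redundancy((0.3, 1.7, 4.2), coarse)
print(redundant[1])  # the abstraction, now available by projection: (0, 2, 4)
```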
Not every representation allows for coarsening, as the following example shows:
the viewpoint of accessibility we now argue for preferring aspectualizable
representations, as relevant aspects are clearly separated and easy to access,
and aspectualization itself is a computationally simple process. Accessibility
eases knowledge extraction: Section 3.3 will show an example of an algorithm
that makes use of the aspectualizability of a representation. Once again, aspectu-
alizability can be achieved by abstraction. In particular, conceptual classification
is a powerful means. So abstraction helps to create representations that allow
for distinguishing different aspects by using aspectualization.
Fig. 2. Left: a screenshot of the robot navigation scenario in the simulator, where the
strategy is learned. Right: a Pioneer 2 in an office building, where the strategy shall
be applied. The real office environment offers structural elements not present in the
simulator: open space, uneven walls, tables, and other obstacles.
knowledge of a learning task and transferring it to another one has recently been
labeled transfer learning, and several approaches have been proposed to tackle
this problem (e.g., [12,13,14]). We will describe how such transfer capabilities can
be achieved by spatial state space abstraction and we will point out how abstrac-
tion mechanisms allow for knowledge transfer in a more general sense: Learned
navigation knowledge is not only transferable to a similar task with another
goal location, but abstraction allows us to operate on the same abstract entities
in quite different tasks. We will show that the spatial state space abstraction
approach even allows for bridging the gap between results gained in a simple
simulator and real robotics just by the use of spatial abstraction.
In our simulation scenario, the robot is able to perceive walls around it as line
segments within a certain maximum range. This perception is disturbed by noise
such that every line segment is detected as several smaller ones. The agent can
also identify the walls. In our simulator, this is modeled in a way that every wall
has a unique color and the agent perceives the color of the wall. The robot is
capable of performing three actions: moving forward and turning a few degrees
either to the left or to the right. Turning includes a small forward movement;
and some noise is added to all actions. There is no built-in collision avoidance
or any other navigational intelligence provided.
For learning we use the reinforcement learning paradigm of Q-learning [15].
The result is a Q-function that assigns an expected overall reward to any state-
action pair (s, a) and a policy π(s) = argmaxa Q(s, a) that delivers the action
with the highest expected reward for every state s.
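As an illustration of the Q-learning scheme just described, here is a minimal tabular sketch. State and action names are invented, and the simulator itself is not modeled:

```python
from collections import defaultdict

def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_b Q(s',b) - Q(s,a))."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def policy(Q, s, actions):
    """Greedy policy pi(s) = argmax_a Q(s, a)."""
    return max(actions, key=lambda a: Q[(s, a)])

Q = defaultdict(float)
actions = ("forward", "left", "right")
q_learning_step(Q, "corridor", "forward", 1.0, "corridor", actions)
print(policy(Q, "corridor", actions))  # forward
```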
Fig. 3. Neighboring regions around the robot in relation to its moving direction. Note
that the regions R1 , . . . , R5 in the immediate surroundings (b) overlap R10 , . . . , R16
(a). The size of the grid defining the immediate surroundings is given a priori. It is
a property of the agent and depends on its size and system dynamics (for example,
the robot's maximal speed). In this work, only the boundaries drawn in bold in (a) are
regarded for building the representation.
This learning task is a complex one, because the underlying state space is
large and continuous, and reinforcement learning processes are known to suffer
from performance problems under these conditions. Thrun and Schwartz stated
that, to adapt RL to complex tasks, it is necessary to discover the
structure of the world and to abstract from its details [16]. In any case, a sensible
reduction of the state space will be beneficial for any RL application.
To achieve that structural abstraction, we make use of the observation that
navigation in space can be divided into two different aspects: Goal-directed be-
havior towards a task-specific target location, and generally sensible behavior
that is task-independent and the same in any environment [17]. According to
[12], we refer to the former as problem space and to the latter as agent space. It is
especially agent space that encodes structural information about the world that
persists across learning tasks, and this knowledge is therefore worth transferring
to different scenarios.
The structure of office environments as depicted in Fig. 2 is usually
characterized by walls, which can be abstracted as line segments in the plane.
Moreover, it is the relative position of line segments with respect to the
robot's moving direction that defines structural paths in the world and leads to
sensible action sequences for a moving agent. Thus, for encoding agent space, we use the
qualitative representation RLPR (Relative Line Position Representation) [17].
Inspired by the “direction relation matrix” [18], the space around the agent is
partitioned into bounded and unbounded regions Ri (see Fig. 3). Two functions
τ : N → {0, 1} and τ′ : N → {0, 1} are defined: τ(i) denotes whether there is a
line segment detected within a region Ri, and τ′(i) denotes whether a line spans
from a neighboring region Ri+1 into Ri. τ′ is used for bounded sectors in the
immediate vicinity of the agent (R1 to R5 in Fig. 3(b)). Objects that appear there
have to be avoided in any case. The position of detected line segments in R10 to
R16 (Fig. 3(a)) is helpful information to be used for general orientation and
mid-term planning, so τ is used for R10 to R16. This abstraction from line segments
in the simulator to a vector of RLPR values is a conceptual classification.
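A simplified sketch of how the two indicator functions could be computed, assuming the geometric test of which regions each segment touches has already been performed (the data layout and segment names are our own, not from the paper):

```python
def rlpr_tau(segments_in_region, regions):
    """tau(i): 1 iff some line segment is detected in region R_i."""
    return {i: int(bool(segments_in_region.get(i))) for i in regions}

def rlpr_tau_prime(segments_in_region, regions):
    """tau'(i): 1 iff some segment spans from neighboring R_{i+1} into R_i,
    i.e., the same segment id shows up in both regions."""
    return {
        i: int(bool(segments_in_region.get(i, set())
                    & segments_in_region.get(i + 1, set())))
        for i in regions
    }

# Hypothetical observation: wall "w1" lies in R1 and extends into R2,
# wall "w2" lies only in R4 (region geometry is abstracted away).
obs = {1: {"w1"}, 2: {"w1"}, 4: {"w2"}}
print(rlpr_tau(obs, range(1, 6)))        # {1: 1, 2: 1, 3: 0, 4: 1, 5: 0}
print(rlpr_tau_prime(obs, range(1, 6)))  # {1: 1, 2: 0, 3: 0, 4: 0, 5: 0}
```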
(Fig. 4: environmental data from the simulation → conceptual classification →
landmark-enriched RLPR representation → aspectualization (Ψ1, Ψ2) → problem
space and agent space.)
ψ(s) = (ψl(s), ψr(s)) = ((c1, . . . , c7), (τ′(R1), . . . , τ′(R5), τ(R10), . . . , τ(R16)))
We call the newly emerging state space O = Image(ψ) the observation space. It
is a comparatively small and discrete state space, fulfilling the three goals of
abstraction we defined in Section 2.5. The RLPR-based approach has been shown
to outperform metrical representations that rely on distances or absolute
coordinates with regard to learning speed and robustness [17]. For an example of
deriving RLPR values refer to Fig. 5.
So conceptual classification is employed twice, for both problem and agent
space, to create a compact state space representation. ψ(s) is aspectualizable
regarding the two aspects of navigation (see Fig. 4). Let us now investigate how
to take advantage of that to transfer general navigation knowledge to a new task.
during learning. So the policy must provide sensible actions to take in the absence
of known landmarks. This is the behavior that refers to the aspect of general
navigation behavior or agent space. It has to be singled out from π.
By design, ψ(s) is aspectualizable with regard to agent space, and the desired
information is easily accessible. An aspectualization κ(o) = κ(ψl(s), ψr(s)) = ψr(s)
provides structural world information for any observation. That is, structurally
identical situations share an identical RLPR representation.
A new Q-function Qπ′ for a general, aspectualized policy π′ for arbitrary states
with the same aspect ψr(s) can be constructed by Q-value averaging over states
with identical ψr(s), which are easily accessible because of the aspectualizability
of O. Given a learned policy π with a value function Qπ(o, a) (o ∈ O, a ∈ A), we
construct a new policy π′ with Qπ′(o′, a) (o′ ∈ O′, a ∈ A) in a new observation
space O′ = Image(ψr), with the following function [20]:
Qπ′(o′, a) = ( Σc (maxb∈A(|Qπ((c, o′), b)|))⁻¹ Qπ((c, o′), a) ) / |{((c, o′), a) | Qπ((c, o′), a) ≠ 0}|
This is a weighted sum over all possible landmark observations (in reality, of
course, only the visited states have to be considered, because Q(o, a) = 0 for
the others, so the computational effort is very low). It is averaged over all state-
action pairs where the information is available, that is, the Q-value is not zero.
A weighting factor scales all values according to the maximum reward over all
actions.
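The averaging scheme can be sketched as follows, with the Q-function stored as a plain dict keyed by ((c, o′), a). The key layout, names, and data are our own illustration:

```python
from collections import defaultdict

def aspectualized_q(Q, actions):
    """Average Q((c, o'), a) over the landmark part c, scaling each
    contribution by 1 / max_b |Q((c, o'), b)| and dividing by the number
    of nonzero entries (a sketch of the weighted averaging above)."""
    Q_new = defaultdict(float)
    counts = defaultdict(int)
    for ((c, o_prime), a), q in Q.items():
        if q == 0:
            continue  # unvisited states contribute nothing
        scale = max(abs(Q.get(((c, o_prime), b), 0.0)) for b in actions)
        Q_new[(o_prime, a)] += q / scale
        counts[(o_prime, a)] += 1
    return {k: v / counts[k] for k, v in Q_new.items()}

# Two landmark contexts ("red", "blue") sharing one structural observation.
Q = {(("red", "obs1"), "fwd"): 2.0, (("blue", "obs1"), "fwd"): 4.0}
print(aspectualized_q(Q, ("fwd",)))  # {('obs1', 'fwd'): 1.0}
```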
This procedure has been applied to a policy learned in the simulated environ-
ment depicted in Fig. 2 for 40,000 learning episodes. For the exact experimental
conditions of learning the policy in simulation, refer to [20]. The resulting policy
π′ with Qπ′ can then be used to control a real robot, as shown in the following section.
τ′(R1) = 1, τ′(R2) = 0, τ′(R3) = 0, τ′(R4) = 1, τ′(R5) = 0;
τ(R10) = 1, τ(R11) = 0, τ(R12) = 0, τ(R13) = 0, τ(R14) = 1, τ(R16) = 1
Fig. 5. Screenshot: Abstraction to RLPR in the robot controller. Depicted are the
qualitative regions (see Fig. 3) and the interpreted sensor data acquired from the
robot position shown in Fig. 2, right. The overall representation for this
configuration is ψr(s) = {1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1}.
on the line detection of the laser range finder data and the corresponding RLPR
representation. Fig. 6 gives an overview of the development of representations
in both the simulator and the robot application.
In the simulation, three action primitives (straight on, turn left, turn right) have
been used that always move the robot some fixed distance. Rather than implementing
this step-wise motion on the real robot, we mapped the actions to commands
controlling the wheel speeds in order to obtain continuous motion. Additionally,
movement is smoothed by averaging the most recent wheel speed commands to
avoid strong acceleration/deceleration, which the robot drive cannot handle well.
We applied the averaging to the last 8 actions, which (given the 0.25-second
interval of wheel commands) yields a time of 2 seconds before reaching the wheel
speed associated with the action primitive. In accordance with the robot's size
and motion dynamics, the inner regions of the RLPR grid (Fig. 3(b)) have been set
to 60 cm in front and 30 cm to both the left and the right of the robot.
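The smoothing scheme can be sketched as a moving average over the most recent wheel commands. The text uses the last 8 commands at 0.25 s intervals; the speed values and class name below are illustrative:

```python
from collections import deque

class WheelSmoother:
    """Average the last n wheel-speed commands to soften accelerations.

    With n = 8 commands at 0.25 s intervals, a new target speed is
    reached after roughly 2 seconds; n = 4 below keeps the example short.
    """
    def __init__(self, n=8):
        self.history = deque(maxlen=n)

    def command(self, speed):
        """Record a commanded speed and return the smoothed wheel speed."""
        self.history.append(speed)
        return sum(self.history) / len(self.history)

sm = WheelSmoother(n=4)
for _ in range(4):
    sm.command(0.0)                    # robot initially at rest
ramp = [sm.command(1.0) for _ in range(4)]
print(ramp)  # [0.25, 0.5, 0.75, 1.0]
```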
We analyzed the behavior of the Pioneer 2 robot with the learned policy in our
office environment. In contrast to the simple simulation environment, the office
environment presents uneven walls, open spaces of several meters, plants, and
furniture such as a sofa and bistro tables. The robot shows reasonable navigation
behavior, following corridors in a straight line and turning smoothly around
curves. It also showed the ability to cope with structural elements not present in
the simulated environment, such as open space or tiny obstacles. In other words, general navigation
skills learned in simulation have been transferred to the real-world environment.
The robot only got stuck when reaching areas with a huge amount of clutter
Fig. 6. Evolution of spatial representations in both simulation and real robot applica-
tion. Abstraction techniques enable both scenarios to operate on the RLPR represen-
tation to achieve a reasonable action selection.
Fig. 7. Pioneer 2 entering an open space, using the aspectualized policy learned in
the simulator. It shows a reasonable navigation behavior in a real office environment,
driving smoothly forward and safely around obstacles.
(such as hanging leaves of plants) and in dead ends where the available motion
primitives do not allow for collision-free movement anymore. Because the original
task was goal-oriented (searching for a specific place), the robot also showed a strong
tendency to move forward and thus actively explore the environment instead
of just avoiding obstacles. This generally sensible navigation behavior could now,
for example, be used as a basis for learning new goal-oriented tasks on the
robotics platform. Fig. 7 gives an impression of the robot experiment.
4 Discussion
Performing abstraction is a fundamental ability of intelligent agents, and different
facets of abstraction have therefore been addressed in previous work spanning
various scientific fields and a rich diversity of tasks. First, we comment on
a critical remark by Klippel et al.: In their thorough study on schematization,
they state that “there is no consistent approach to model schematization” [4].
We believe that by our formal definitions of abstraction principles the manifold
terms used to describe abstraction can very well be classified and related.
The insight that abstraction can be divided into different categories has
been mentioned before. Stell and Worboys present a distinction of what they
call “selection” and “amalgamation” and formalize these concepts for graph
structures [6]. Our definition of aspectualization and coarsening corresponds to
selection and amalgamation, which Stell and Worboys describe as being “con-
ceptually distinct” types of generalization. In this regard, we pointed out that
this conceptual distinctness applies only to the process of abstraction and
not to its result, as we could show that the effect of different abstraction paradigms
critically depends on the choice of the initial state space representation.
Bertel et al. also differentiate between several facets of abstraction
(“aspectualization versus specificity”, “aspectualization versus concreteness”, and
“aspectualization versus integration”), but without giving exact definitions [7].
“Aspectualization versus specificity” corresponds to our definition of aspectu-
alization, and “aspectualization versus concreteness” to coarsening. However,
our definition of aspectualization is tighter than the one given by Bertel et al.:
According to them, aspectualization is “the reduction of problem complexity
through the reduction of the number of feature dimensions”. In our definition,
it is also required that all the other components remain unchanged.
The notion of schematization, which Leonard Talmy describes as “a process
that involves the systematic selection of certain aspects of a referent scene to
represent the whole disregarding the remaining aspects” [21] is tightly connected
to our definition of aspectualization. If we assume the referent scene to be as-
pectualizable according to Def. 3, then the process mentioned by Talmy is as-
pectualization as defined here.
Annette Herskovits defines the term schematization in the context of linguis-
tics as consisting of three different processes, namely abstraction, idealization,
and selection [5]. According to our definition, abstraction and selection would
both be aspectualizations, while idealization corresponds to coarsening.
The action-centered view on abstraction we introduced in Section 2.5 is also
shared by the definition of categorizability given by Porta and Celaya [22]. The
authors call an environment categorizable, if “a reduced fraction of the available
inputs and actuators have to be considered at a time”. In other words: In a
5 Conclusion
In this article we classify abstraction by three distinct principles: aspectualiza-
tion, coarsening, and conceptual classification. We give a formal definition of
these principles for classifying and clarifying the manifold concept names for
abstraction found in the literature. This enables us to show that knowledge rep-
resentation is of critical importance and thus must be addressed in any discus-
sion of abstraction. Identical information may be represented differently, and, by
choosing a specific representation, different types of abstraction processes may
be applicable and lead to an identical result. Also, as abstraction is triggered by
the need to perform a certain task, abstraction can never be regarded as purely
data-driven; rather, it requires a solid a priori concept of the problem to solve
and, consequently, of the actions to take.
We introduce the notion of aspectualizability in knowledge representations. As-
pectualizable knowledge representations are key to enabling knowledge transfer.
By designing an aspectualizable representation, it is possible to transfer naviga-
tion knowledge learned in a simplified simulation to a real-world robot setting.
Acknowledgments. This work was supported by the DFG Transregional Collab-
orative Research Center SFB/TR 8 “Spatial Cognition” (project R3-[Q-Shape]).
Funding by the German Research Foundation (DFG) is gratefully acknowledged.
The authors would like to thank Jan Oliver Wallgrün, Frank Dylla, and Jae Hae
Lee for inspiring discussions. We also thank the anonymous reviewers for pointing
us to further literature from different research communities.
References
1. Hobbs, J.R.: Granularity. In: Proceedings of the Ninth International Joint Confer-
ence on Artificial Intelligence (IJCAI), pp. 432–435 (1985)
2. Bittner, T., Smith, B.: A taxonomy of granular partitions. In: Montello, D. (ed.)
Spatial Information Theory: Cognitive and Computational Foundations of Geo-
graphic Information Science (COSIT), pp. 28–43. Springer, Berlin (2001)
3. Mackaness, W.A., Chaudhry, O.: Generalization and symbolization. In: Shekhar,
S., Xiong, H. (eds.) Encyclopedia of GIS (2008)
4. Klippel, A., Richter, K.F., Barkowsky, T., Freksa, C.: The cognitive reality of
schematic maps. In: Meng, L., Zipf, A., Reichenbacher, T. (eds.) Map-based Mobile
Services – Theories, Methods and Implementations, pp. 57–74. Springer, Berlin
(2005)
5. Herskovits, A.: Schematization. In: Olivier, P., Gapp, K.P. (eds.) Representation
and Processing of Spatial Expressions, pp. 149–162. Lawrence Erlbaum Associates,
Mahwah (1998)
6. Stell, J.G., Worboys, M.F.: Generalizing graphs using amalgamation and selection.
In: Güting, R.H., Papadias, D., Lochovsky, F. (eds.) SSD 1999. LNCS, vol. 1651,
pp. 19–32. Springer, Heidelberg (1999)
7. Bertel, S., Vrachliotis, G., Freksa, C.: Aspect-oriented building design: Toward
computer-aided approaches to solving spatial constraint problems in architecture.
In: Allen, G.L. (ed.) Applied Spatial Cognition: From Research to Cognitive Tech-
nology, pp. 75–102. Lawrence Erlbaum Associates, Mahwah (2007)
8. Moravec, H.P., Elfes, A.E.: High resolution maps from wide angle sonar. In:
Proceedings of the IEEE International Conference on Robotics and Automation
(ICRA), St. Louis, MO (1985)
9. Gutmann, J.S., Weigel, T., Nebel, B.: A fast, accurate and robust method for self-
localization in polygonal environments using laser range finders. Advanced Robot-
ics 14(8), 651–667 (2001)
10. Roberts, F.S.: Tolerance geometry. Notre Dame Journal of Formal Logic 14(1),
68–76 (1973)
11. Sutton, R.S., Barto, A.G.: Reinforcement learning: an introduction. In: Adaptive
Computation and Machine Learning. MIT Press, Cambridge (1998)
12. Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforce-
ment learning. In: Proceedings of the Twentieth International Joint Conference on
Artificial Intelligence (IJCAI) (2007)
13. Taylor, M.E., Stone, P.: Cross-domain transfer for reinforcement learning. In: Pro-
ceedings of the Twenty-Fourth International Conference on Machine Learning
(ICML 2007), Corvallis, Oregon (2007)
14. Torrey, L., Shavlik, J., Walker, T., Maclin, R.: Skill acquisition via transfer learning
and advice taking. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML
2006. LNCS (LNAI), vol. 4212, pp. 425–436. Springer, Heidelberg (2006)
15. Watkins, C., Dayan, P.: Q-learning. Machine Learning 8, 279–292 (1992)
16. Thrun, S., Schwartz, A.: Finding structure in reinforcement learning. In: Tesauro,
G., Touretzky, D., Leen, T. (eds.) Advances in Neural Information Processing Sys-
tems: Proceedings of the 1994 Conference, vol. 7. MIT Press, Cambridge (1995)
17. Frommberger, L.: A generalizing spatial representation for robot navigation with
reinforcement learning. In: Proceedings of the Twentieth International Florida Ar-
tificial Intelligence Research Society Conference (FLAIRS 2007), Key West, FL,
USA, pp. 586–591. AAAI Press, Menlo Park (2007)
18. Goyal, R.K., Egenhofer, M.J.: Consistent queries over cardinal directions across
different levels of detail. In: Tjoa, A.M., Wagner, R., Al-Zobaidie, A. (eds.) Pro-
ceedings of the 11th International Workshop on Database and Expert System Ap-
plications, Greenwich, UK, pp. 867–880 (2000)
19. Schlieder, C.: Representing visible locations for qualitative navigation. In: Car-
rete, N.P., Singh, M.G. (eds.) Qualitative Reasoning and Decision Technologies,
Barcelona, Spain, pp. 523–532 (1993)
20. Frommberger, L.: Generalization and transfer learning in noise-affected robot nav-
igation tasks. In: Neves, J., Santos, M.F., Machado, J.M. (eds.) EPIA 2007. LNCS
(LNAI), vol. 4874, pp. 508–519. Springer, Heidelberg (2007)
21. Talmy, L.: How language structures space. In: Pick Jr., H.L., Acredolo, L.P. (eds.)
Spatial Orientation: Theory, Research, and Application. Plenum, New York (1983)
22. Porta, J.M., Celaya, E.: Reinforcement learning for agents with many sensors and
actuators acting in categorizable environments. Journal of Artificial Intelligence
Research 23, 79–122 (2005)
Representing Concepts in Time*
Martin Raubal
Abstract. People make use of concepts in all aspects of their lives. Concepts
are mental entities, which structure our experiences and support reasoning in
the world. They are usually regarded as static, although there is ample evidence
that they change over time with respect to structure, content, and relation to
real-world objects and processes. Recent research considers concepts as dy-
namical systems, emphasizing this potential for change. In order to analyze the
alteration of concepts in time, a formal representation of this process is neces-
sary. This paper proposes an algebraic model for representing dynamic concep-
tual structures, which integrates two theories from geography and cognitive
science, i.e., time geography and conceptual spaces. Such a representation allows
for investigating the development of a conceptual structure along space-time
paths and serves as a foundation for querying the structure of concepts at a
specific point in time or over a time interval. The geospatial concept of ‘landmark’ is
used to demonstrate the formal specifications.
1 Introduction
Humans employ concepts to structure their world, and to perform reasoning and cate-
gorization tasks. Many concepts are not static but change over time with respect to
their structure, substance, and relations to the real world. In addition, different people
use the same or similar concepts to refer to different objects and processes in the real
world, which can lead to communication problems. In this paper, we propose a novel
model to represent conceptual change over time. The model is based on a spatio-
temporal metaphor, representing conceptual change as movement along space-time
paths in a semantic space. It thereby integrates conceptual spaces [1] as one form of
conceptual representation within a time-geographic framework [2].
Formal representations of dynamic concepts are relevant from both a theoretical
and practical perspective. On the one hand, they allow us to theorize about how peo-
ple’s internal processes operate on conceptual structures and result in their alterations
over time. On the other hand, they are the basis for addressing some of the current
pressing research questions, such as those in Geographic Information Science (GIScience) and
* This paper is dedicated to Andrew Frank, for his 60th birthday. He has been a great
teacher and mentor to me.
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 328–343, 2008.
© Springer-Verlag Berlin Heidelberg 2008
2 Related Work
This section starts with an explanation of the notion of concepts and their importance
for categorization. We then introduce conceptual spaces and time geography as the
underlying frameworks for representing concepts in time.
2.1 Concepts
There are several conflicting views on concepts, categories, and their relation to each
other across and even within different communities. From a classical perspective,
concepts have been defined as structured mental representations (of classes or indi-
viduals), which encode a set of necessary and sufficient conditions for their applica-
tion [6]. They deal with what is being represented and how such information is used
during categorization [7]. Barsalou et al. [8] view concepts as mental representations
of categories and point out that concepts are context dependent and situated. For ex-
ample, the concept of a chair is applied locally and does not cover all chairs
universally. From a memory perspective, “concepts are the underlying knowledge in
long-term memory from which temporary conceptualizations in working memory are
constructed.” [8, footnote 7] It is important to note the difference between concepts
and categories: a concept is a mental entity, whereas a category refers to a set of enti-
ties that are grouped together [9].
Concepts are viewed as dynamical systems that evolve and change over time [8].
New sensory input leads to the adaptation of previous concepts, such as during the
interactive process of spatial knowledge acquisition [10]. Neisser’s [11] perceptual
cycle is also based on the argument that perception and cognition involve dynamic
¹ See, for example, http://dynamo.cs.manchester.ac.uk/
330 M. Raubal
cognitive structures (schemata in his case rather than explicit concepts). These are
subject to change as more information becomes available.
Here, we use concepts within the paradigm of cognitive semantics, which asserts
that meanings are mental entities—mappings from expressions to conceptual struc-
tures, which refer to the real world [12-14]. The main argument is therefore that a
symbolic representation cannot refer to objects directly, but only through
concepts in the mind. This difference between objects, concepts, and symbols
is often expressed through the semiotic triangle [15].
[Figure: space-time paths in geographical space over time, connecting locations such as home, a doctor's office, and a mall.]
Three classes of constraints limit a person’s activities in space and time. Capability
constraints limit an individual’s activities based on her abilities and the available re-
sources. For example, a fundamental requirement for many people is to sleep between
six and eight hours at home. Coupling constraints require a person to occupy a certain
location for a fixed duration to conduct an activity. If two people want to meet at a café,
then they have to be there at the same time. In time-geographic terms, their paths cluster
into a space-time bundle. Certain domains in life are controlled through authority con-
straints, which are fiat restrictions on activities in space and time. A person can only
shop at a mall when the mall is open, such as between 10am and 9pm.
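The three constraint classes can be read as simple predicates on an activity. The sketch below is purely illustrative: the function name, the hour-of-day time encoding, and the concrete limits are assumptions chosen to match the examples above, not part of time geography itself.

```python
def feasible(start, end, wake_up=7, open_from=10, open_until=21,
             partner_slots=None):
    """Illustrative check of one activity against the three constraint classes."""
    # Capability constraint: the person is at home asleep before wake_up.
    if start < wake_up:
        return False
    # Authority constraint: the mall is only open between 10am and 9pm.
    if not (open_from <= start and end <= open_until):
        return False
    # Coupling constraint: a meeting requires the partner's path to bundle
    # with ours, i.e., the partner must be available for the whole duration.
    if partner_slots is not None:
        if not any(s <= start and end <= e for s, e in partner_slots):
            return False
    return True

print(feasible(11, 12, partner_slots=[(10, 13)]))  # True: meeting is possible
print(feasible(9, 10))                             # False: mall not yet open
```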
All space-time paths must lie within space-time prisms (STP). These are geometri-
cal constructs of two intersecting cones [22]. Their boundaries limit the possible
locations a path can take based on people’s abilities to trade time for space. Figure 2
depicts a space-time prism for a scenario where origin and destination have the same
location. The time budget is defined by Δt = t2−t1 in which a person can move away
from the origin, limited only by the maximum travel velocity. The interior of the
prism defines a potential path space (PPS), which represents all locations in space
and time that can be reached by the individual during Δt. The projection of the PPS
onto geographical space results in the potential path area (PPA) [23].

[Figure 2: a space-time prism between t1 and t2, with the potential path space (PPS) and its projection onto geographical space, the potential path area (PPA).]
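The PPS membership test follows directly from these definitions: a location lies inside the prism if the travel time from the origin to it, plus the travel time from it to the destination, fits within the time budget Δt at maximum velocity. A minimal sketch (the function name and the flat-plane Euclidean distance are assumptions of this illustration):

```python
import math

def in_pps(origin, dest, location, t_budget, v_max):
    """A location lies in the potential path space iff it can be visited on
    the way from origin to destination within the time budget at speed v_max."""
    travel = math.dist(origin, location) + math.dist(location, dest)
    return travel / v_max <= t_budget

# Figure 2 scenario: origin and destination coincide, so the PPA is a disk
# of radius v_max * Δt / 2 around the origin (out and back).
home = (0.0, 0.0)
print(in_pps(home, home, (2.0, 0.0), t_budget=1.0, v_max=5.0))  # True
print(in_pps(home, home, (3.0, 0.0), t_budget=1.0, v_max=5.0))  # False
```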
Different definitions of what a representation is have been given in the literature. In this
paper, we commit to the following: “A world, X, is a representation of another world, Y,
if at least some of the relations for objects of X are preserved by relations for corre-
sponding objects of Y.” [24, p.267] In order to avoid confusion about what is being
represented how and where regarding conceptual change over time, we distinguish
between two representations—the mental world and the mental model—according to
[24]. The mental world is a representation of the real world and concerned with the
inner workings and processes within the brain and nervous system (i.e., inside the head).
Here, we formally specify a possible mental model as a representation of the mental
world2. The goal is to be able to use this model to explain the processes that lead to the
change of concepts in time. In this sense, we are aiming for informational equivalence
[24], see also [25] and [26] for examples from the geospatial domain.
2 A mental model is therefore a representation of a representation of the real world—see Palmer [24] for a formal demonstration of this idea.
The proposed mental model for representing conceptual change in time is based on a
spatio-temporal metaphor. The power of spatial metaphors for modeling and compre-
hending various non-spatial domains has been widely demonstrated [27-30]. From a
cognitive perspective, the reason for such potential is that space plays a fundamental
role in people’s everyday lives, including reasoning, language, and action [31].
Our representation of conceptual change in a mental model is based on the meta-
phorical projection of entities, their relations, and processes from a spatio-temporal
source domain to a semantic target domain. As with all metaphors, this is a partial
mapping, because source and target are not identical [30]. Concepts are represented as
n-dimensional regions in conceptual spaces, which can move through a semantic
space in time. The goal of this metaphor is to impose structure on the target domain
and therefore support the explanation of its processes.
[Figure 3 axes and labels: time over semantic space; two semantic space-time paths (SST-path1, SST-path2), a semantic space-time station (SSTS), a semantic space-time envelope (SSTE), a semantic potential path space (SPPS), semantic coupling, and a semantic distance dsem between conceptual spaces CS1 and CS2.]
Fig. 3. Representation of moving conceptual spaces in a semantic space over time. For clarity
reasons, the concept regions are only visualized once (during semantic coupling).
Our proposed mental model allows for representing conceptual change over time from
two perspectives, namely (a) change of the geometrical structure of concepts as n-
dimensional regions within one conceptual space and (b) changes between different
conceptual spaces. Case (a) presumes that no change of quality dimensions has oc-
curred in the conceptual space, therefore allowing only for movement of the concept
region within this particular space—caused by a change in dimension values. One can
then measure the semantic distance between a concept c at time ti and the same con-
cept at time ti+1. Three strategies for calculating semantic similarity between concep-
tual regions, including overlapping concepts, have been demonstrated in [20] and can
be applied here. These methods differ in that for each vector of c(ti) one or several
corresponding vectors of c(ti+1) are identified.
Case (b) applies to mappings between conceptual spaces, leading to a change in
quality dimensions. These mappings can either be projections, which reduce the com-
plexity of the space by reducing its number of dimensions, or transformations, which
involve a major change of quality dimensions, such as the addition of new dimen-
sions. As shown in [36], projections (Equation 1) and transformations (Equation 2)
can be expressed as partial mappings with C, D denoting conceptual spaces and m, n
the number of quality dimensions. For projections, the semantics of the mapped qual-
ity dimensions must not change or can be mapped by rules.
(Rproj: Cm → Dn) where n < m and Cm ∩ Dn = Dn (1)
(Rtrafo: Cm → Dn) where (n ≤ m and Cm ∩ Dn ≠ Dn) or (n > m) (2)
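Equations (1) and (2) can be read as a decision procedure on the sets of quality dimensions: a mapping is a projection if it only drops dimensions, and a transformation otherwise. A sketch under the assumption that quality dimensions can be compared as plain sets (ignoring the rule-based mapping of dimension semantics mentioned above); function name and the "identity" case are my own additions:

```python
def classify_mapping(c_dims, d_dims):
    """Classify a mapping between conceptual spaces per Equations (1) and (2)."""
    c, d = set(c_dims), set(d_dims)
    m, n = len(c), len(d)
    if n < m and d <= c:                      # Eq. (1): C^m ∩ D^n = D^n
        return "projection"
    if (n <= m and not d <= c) or n > m:      # Eq. (2)
        return "transformation"
    return "identity"                         # same dimensions: case (a) above

# Quality dimensions of the 'landmark' example discussed later in the paper:
print(classify_mapping({"fa", "sd", "vi"}, {"fa", "sd", "ci", "co"}))  # transformation
print(classify_mapping({"fa", "sd", "ci", "co"}, {"fa", "sd", "co"}))  # projection
```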
A conceptual space is formally specified3 as a data type, together with its attributes.
Every conceptual space has an identifier Id, a Position in the semantic space at a
3 The complete Hugs code including the test data for this paper is available at http://www.geog.ucsb.edu/~raubal/Downloads/CS.hs. Hugs interpreters can be downloaded freely from http://www.haskell.org.
constructSemanticSpaceTimePath i cs
  = NewSemanticSpaceTimePath id css
    where
      id  = i
      css = filter ((i ==) . getConceptualSpaceId) cs
Semantic space-time stations are specified as special types of
SemanticSpaceTimePaths—similar to the representation of space-time stations
in [43]—i.e., consisting of conceptual space instances with equal positions (but
potential temporal gaps). The derivation of a SemanticSpaceTimeStation is
based on the sorting
function sortConceptualSpaces, which orders conceptual spaces according to
their positions.
class SemanticSpaceTimePaths sstPath where
  constructSemanticSpaceTimeStation :: sstPath -> [ConceptualSpace]

instance SemanticSpaceTimePaths SemanticSpaceTimePath where
  constructSemanticSpaceTimeStation (NewSemanticSpaceTimePath id cs)
    = sortConceptualSpaces cs
The data type SemanticSpaceTimeEnvelope is defined by a Center (of
type Position) and a Boundary for each time step. The projection of SSTE to
semantic space results in a region (equivalent to the PPA from time geography),
whose boundary delimits a semantic similarity area. Note that contrary to semantic
space-time stations, semantic potential path spaces—which result from integration
over a sequence of SSTE slices—cannot have gaps. One can now determine
algorithmically whether a conceptual space falls inside the boundary or not
(which identifies conceptual change).
data SemanticSpaceTimeEnvelope
  = NewSemanticSpaceTimeEnvelope Center Time Boundary
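The containment test can be sketched as follows, under the assumption that the Boundary of an envelope slice can be modeled as a radius around its Center; the function name and the Euclidean metric are illustrative choices, not part of the paper's Hugs specification.

```python
import math

def inside_sste(cs_position, center, boundary_radius):
    """Hypothetical containment test: a conceptual space falls inside a slice
    of the semantic space-time envelope if its distance to the Center does
    not exceed the Boundary (modeled here as a radius)."""
    return math.dist(cs_position, center) <= boundary_radius

print(inside_sste((4.0, 2.0), (3.0, 1.0), 2.0))  # True: inside the envelope
print(inside_sste((6.0, 3.0), (3.0, 1.0), 2.0))  # False: conceptual change
```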
Semantic coupling constraints are represented through the semanticMeet func-
tion. It determines whether two instances of conceptual spaces interact at a given time
step. This definition leaves room for integrating semantic uncertainty by specifying a
threshold for the semantic distance (epsilon), within which the conceptual spaces
are still considered to be interacting (see also [44]). Contextual constraints are fiat
boundaries in the semantic space and can therefore be represented by the Boundary
type.
class ConceptualSpaces cs where
  semanticMeet :: cs -> cs -> Bool

instance ConceptualSpaces ConceptualSpace where
  semanticMeet cs1 cs2
    = (getConceptualSpaceTime cs1 == getConceptualSpaceTime cs2)
      && (semanticDistance cs1 cs2 <= epsilon)
[Figure 4: the SST-path of the conceptual space across time steps t1 to t4 in semantic space; the four instances list the quality dimensions fa, sd, vi, ci, and co, and the semantic distances dcs1-cs2, dcs1-cs3, and dcs2-cs3 are marked.]
Four time steps are considered, which results in four instances of the conceptual
space.4 In this scenario, the person’s ‘landmark’ concept comprises three quality
dimensions at time t1 (cs1). Through experience and over the years, the person has
acquired a sense of the cultural importance of buildings (cs2)—a building may be
famous for its architectural style, therefore being a landmark—adding this new
dimension and also the significance of color. Next, owing to a variation in the
person’s interests, cultural importance vanishes again (cs3). Over time, due to
physiological changes resulting in color blindness, the person’s concept structure
changes back to the original one, eliminating color and again including visibility.
Figure 4 visualizes these conceptual changes over time.
cs1 = NewConceptualSpace 1 (3,1) 1 [fa,sd,vi]
cs2 = NewConceptualSpace 1 (6,3) 2 [fa,sd,ci,co]
cs3 = NewConceptualSpace 1 (4,2) 3 [fa,sd,co]
cs4 = NewConceptualSpace 1 (3,1) 4 [fa,sd,vi]
The formal specifications can now be used to query the temporal conceptual
representation in order to find conceptual changes, determine when they happened,
and establish what semantics a particular conceptual structure represents at a
specific time. We
can infer that the semantic change from cs1 at time 1 to cs2 at time 2 (transforma-
tion with two new dimensions) is larger than the change from cs1 at time 1 to cs3 at
time 3 (transformation with one new dimension) by calculating the respective seman-
tic distances (dcs1-cs2 and dcs1-cs3 in Figure 4). The change resulting from the move
between time 2 and 3 (dcs2-cs3) is due to a projection, involving a reduction to three
dimensions. Similarity is thereby a decaying function of semantic distance, which
depends on the semantic space. The interpretation of semantic distance is domain-
dependent and may be determined through human participants tests [49].
semanticDistance cs1 cs2
3.605551
semanticDistance cs1 cs3
1.414214
semanticDistance cs2 cs3
2.236068
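The three values above are the Euclidean distances between the positions of the conceptual space instances in the test data ((3,1), (6,3), and (4,2)); a quick cross-check, sketched here in Python rather than Hugs:

```python
import math

# Positions of the conceptual space instances in the semantic space (test data).
positions = {"cs1": (3.0, 1.0), "cs2": (6.0, 3.0), "cs3": (4.0, 2.0)}

def semantic_distance(a, b):
    """Euclidean distance between conceptual-space positions."""
    return math.dist(positions[a], positions[b])

print(round(semantic_distance("cs1", "cs2"), 6))  # 3.605551
print(round(semantic_distance("cs1", "cs3"), 6))  # 1.414214
print(round(semantic_distance("cs2", "cs3"), 6))  # 2.236068
```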
We can further construct the semantic space-time path for the conceptual space un-
der investigation from the set of all available conceptual space instances (allCs).
The result (only the very beginning is presented below for reasons of space) is a list
of the four conceptual space instances with Id=1 in a temporal sequence. This SST-
path is visualized in Figure 4.
constructSemanticSpaceTimePath 1 allCs
[NewSemanticSpaceTimePath 1 [NewConceptualSpace 1 …]
Applying the constructSemanticSpaceTimeStation function to the
SST-path derives all conceptual space instances with equal positions but potentially
with temporal gaps, such as cs1 and cs4.
4 The quantitative values for the positions of conceptual spaces in the semantic space are for demonstration purposes. Their determination, such as through similarity ratings from human participants tests, is left for future work.
constructSemanticSpaceTimeStation (constructSemanticSpaceTimePath 1 allCs)
[NewConceptualSpace 1 (3.0,1.0) 1 [Dimension "area" (100.0,1200.0) "sqm",
  Dimension "shape" (0.0,100.0) "%", Dimension "color" (0.0,255.0) "RGB"],
 NewConceptualSpace 1 (3.0,1.0) 4 [Dimension "area" (100.0,1200.0) "sqm",
  Dimension "shape" (0.0,100.0) "%", Dimension "color" (0.0,255.0) "RGB"]]
Acknowledgments
The comments from Carsten Keßler and three anonymous reviewers provided useful
suggestions to improve the content of the paper.
Bibliography
1. Gärdenfors, P.: Conceptual Spaces - The Geometry of Thought. MIT Press, Cambridge
(2000)
2. Hägerstrand, T.: What about people in regional science? Papers of the Regional Science
Association 24, 7–21 (1970)
3. Brodaric, B., Gahegan, M.: Distinguishing Instances and Evidence of Geographical Con-
cepts for Geospatial Database Design. In: Egenhofer, M., Mark, D. (eds.) Geographic In-
formation Science - Second International Conference, GIScience 2002, Boulder, CO,
USA, September 2002, pp. 22–37. Springer, Berlin (2002)
4. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American, pp. 34–43 (2001)
5. Gruber, T.: A Translation Approach to Portable Ontology Specifications. Knowledge Ac-
quisition 5(2), 199–220 (1993)
6. Laurence, S., Margolis, E.: Concepts and Cognitive Science. In: Margolis, E., Laurence, S.
(eds.) Concepts - Core Readings, pp. 3–81. MIT Press, Cambridge (1999)
7. Smith, E.: Concepts and induction. In: Posner, M. (ed.) Foundations of cognitive science,
pp. 501–526. MIT Press, Cambridge (1989)
8. Barsalou, L., Yeh, W., Luka, B., Olseth, K., Mix, K., Wu, L.: Concepts and meaning. In:
Beals, K., et al. (eds.) Parasession on conceptual representations, pp. 23–61. University of
Chicago, Chicago Linguistics Society (1993)
9. Goldstone, R., Kersten, A.: Concepts and Categorization. In: Healy, A., Proctor, R. (eds.)
Comprehensive handbook of psychology, pp. 599–621 (2003)
10. Piaget, J., Inhelder, B.: The Child’s Conception of Space. Norton, New York (1967)
11. Neisser, U.: Cognition and Reality - Principles and Implications of Cognitive Psychology.
Freeman, New York (1976)
12. Lakoff, G.: Cognitive Semantics. In: Eco, U., Santambrogio, M., Violi, P. (eds.) Meaning and Mental Representations, pp. 119–154. Indiana University Press, Bloomington (1988)
13. Green, R.: Internally-Structured Conceptual Models in Cognitive Semantics. In: Green, R.,
Bean, C., Myaeng, S. (eds.) The Semantics of Relationships - An Interdisciplinary Per-
spective, pp. 73–89. Kluwer, Dordrecht (2002)
14. Kuhn, W., Raubal, M., Gärdenfors, P.: Cognitive Semantics and Spatio-Temporal Ontolo-
gies. Spatial Cognition and Computation 7(1), 3–12 (2007)
15. Ogden, C., Richards, I.: The Meaning of Meaning: A Study of the Influence of Language
Upon Thought and of the Science of Symbolism. Routledge & Kegan Paul, London (1923)
16. Barsalou, L.: Situated simulation in the human conceptual system. Language and Cognitive Processes 18(5-6), 513–562 (2003)
17. Sowa, J.: Categorization in Cognitive Computer Science. In: Cohen, H., Lefebvre, C.
(eds.) Handbook of Categorization in Cognitive Science, pp. 141–163. Elsevier, Amster-
dam (2006)
18. Gärdenfors, P.: Representing actions and functional properties in conceptual spaces. In:
Ziemke, T., Zlatev, J., Frank, R. (eds.) Body, Language and Mind, pp. 167–195. Mouton
de Gruyter, Berlin (2007)
19. Raubal, M.: Formalizing Conceptual Spaces. In: Varzi, A., Vieu, L. (eds.) Formal Ontology in Information Systems - Proceedings of the Third International Conference (FOIS 2004), pp. 153–164. IOS Press, Amsterdam (2004)
20. Schwering, A., Raubal, M.: Measuring Semantic Similarity between Geospatial Concep-
tual Regions. In: Rodriguez, A., et al. (eds.) GeoSpatial Semantics - First International
Conference, GeoS 2005, Mexico City, Mexico, November 2005, pp. 90–106. Springer,
Berlin (2005)
21. Devore, J., Peck, R.: Statistics - The Exploration and Analysis of Data, 4th edn. Duxbury,
Pacific Grove (2001)
22. Lenntorp, B.: Paths in Space-Time Environments: A Time-Geographic Study of the
Movement Possibilities of Individuals. Lund Studies in Geography, Series B (44) (1976)
23. Miller, H.: Modeling accessibility using space-time prism concepts within geographical in-
formation systems. International Journal of Geographical Information Systems 5(3), 287–
301 (1991)
24. Palmer, S.: Fundamental aspects of cognitive representation. In: Rosch, E., Lloyd, B. (eds.)
Cognition and categorization, pp. 259–303. Lawrence Erlbaum, Hillsdale (1978)
25. Frank, A.: Spatial Communication with Maps: Defining the Correctness of Maps Using a
Multi-Agent Simulation. In: Freksa, C., et al. (eds.) Spatial Cognition II - Integrating Ab-
stract Theories, Empirical Studies, Formal Methods, and Practical Applications, pp. 80–99.
Springer, Berlin (2000)
26. Frank, A.: Pragmatic Information Content: How to Measure the Information in a Route
Description. In: Duckham, M., Goodchild, M., Worboys, M. (eds.) Foundations of Geo-
graphic Information Science, pp. 47–68. Taylor & Francis, London (2003)
27. Lakoff, G., Johnson, M.: Metaphors We Live By. University of Chicago Press, Chicago
(1980)
28. Kuipers, B.: The ’Map in the Head’ Metaphor. Environment and Behaviour 14(2), 202–
220 (1982)
29. Kuhn, W.: Metaphors Create Theories for Users. In: Frank, A.U., Campari, I. (eds.) Spatial
Information Theory: Theoretical Basis for GIS, pp. 366–376. Springer, Berlin (1993)
30. Kuhn, W., Blumenthal, B.: Spatialization: Spatial Metaphors for User Interfaces. Geoinfo-
Series, vol. 8. Department of Geoinformation, Technical University Vienna, Vienna (1996)
31. Lakoff, G.: Women, Fire, and Dangerous Things: What Categories Reveal About the
Mind. The University of Chicago Press, Chicago (1987)
32. Skupin, A.: Where do you want to go today [in attribute space]? In: Miller, H. (ed.) Socie-
ties and Cities in the Age of Instant Access, pp. 133–149. Springer, Dordrecht (2007)
33. Skupin, A., Fabrikant, S.: Spatialization Methods: A Cartographic Research Agenda for
Non-Geographic Information Visualization. Cartography and Geographic Information Sci-
ence 30(2), 95–119 (2003)
34. Burrough, P., Frank, A., Masser, I., Salgé, F.: Geographic Objects with Indeterminate
Boundaries. GISDATA Series. Taylor & Francis, London (1996)
35. Frank, A.: Ontology for spatio-temporal Databases. In: Koubarakis, M., et al. (eds.) Spatio-
temporal Databases: The Chorochronos Approach, pp. 9–77. Springer, Berlin (2003)
36. Raubal, M.: Mappings For Cognitive Semantic Interoperability. In: Toppen, F., Painho, M.
(eds.) AGILE 2005 - 8th Conference on Geographic Information Science, pp. 291–296. In-
stituto Geografico Portugues (IGP), Lisboa (2005)
37. Winter, S., Nittel, S.: Formal information modelling for standardisation in the spatial domain. International Journal of Geographical Information Science 17(8), 721–742 (2003)
38. Raubal, M., Kuhn, W.: Ontology-Based Task Simulation. Spatial Cognition and Computa-
tion 4(1), 15–37 (2004)
39. Krieg-Brückner, B., Shi, H.: Orientation Calculi and Route Graphs: Towards Semantic
Representations for Route Descriptions. In: Raubal, M., et al. (eds.) Geographic Informa-
tion Science, 4th International Conference GIScience 2006, Muenster, Germany, pp. 234–
250. Springer, Berlin (2006)
40. Guttag, J., Horowitz, E., Musser, D.: The Design of Data Type Specifications. In: Yeh, R.
(ed.) Current Trends in Programming Methodology, pp. 60–79. Prentice-Hall, Englewood
Cliffs (1978)
41. Frank, A., Kuhn, W.: Specifying Open GIS with Functional Languages. In: Egenhofer, M.,
Herring, J. (eds.) Advances in Spatial Databases (SSD 1995), pp. 184–195. Springer, Port-
land (1995)
42. Hudak, P.: The Haskell School of Expression: Learning Functional Programming through
Multimedia. Cambridge University Press, New York (2000)
43. Miller, H.: A Measurement Theory for Time Geography. Geographical Analysis 37(1),
17–45 (2005)
44. Ahlqvist, O.: A Parameterized Representation of Uncertain Conceptual Spaces. Transac-
tions in GIS 8(4), 493–514 (2004)
45. Nothegger, C., Winter, S., Raubal, M.: Selection of Salient Features for Route Directions.
Spatial Cognition and Computation 4(2), 113–136 (2004)
46. Smith, B., Mark, D.: Geographical categories: an ontological investigation. International
Journal of Geographical Information Science 15(7), 591–612 (2001)
47. Brodaric, B., Gahegan, M.: Experiments to Examine the Situated Nature of Geoscientific
Concepts. Spatial Cognition and Computation 7(1), 61–95 (2007)
48. Mark, D., Turk, A., Stea, D.: Progress on Yindjibarndi Ethnophysiography. In: Winter, S.,
et al. (eds.) Spatial Information Theory, 8th International Conference COSIT 2007, Mel-
bourne, Australia, pp. 1–19. Springer, Berlin (2007)
49. Hahn, U., Chater, N.: Understanding Similarity: A Joint Project for Psychology, Case-
Based Reasoning, and Law. Artificial Intelligence Review 12, 393–427 (1998)
50. Kosslyn, S.: Image and brain - The resolution of the imagery debate. MIT Press, Cam-
bridge (1994)
51. Tversky, B.: Cognitive Maps, Cognitive Collages, and Spatial Mental Models. In: Frank, A., Campari, I. (eds.) Spatial Information Theory: Theoretical Basis for GIS, pp. 14–24. Springer, Berlin (1993)
The Network of Reference Frames Theory: A Synthesis
of Graphs and Cognitive Maps
Tobias Meilinger
Abstract. The network of reference frames theory explains the orientation behav-
ior of human and non-human animals in directly experienced environmental
spaces, such as buildings or towns. This includes self-localization, route and sur-
vey navigation. It is a synthesis of graph representations and cognitive maps, and
solves the problems associated with explaining orientation behavior based either
on graphs, maps or both of them in parallel. Additionally, the theory points out the
unique role of vista spaces and asymmetries in spatial memory. New predictions
are derived from the theory, one of which has been tested recently.
1 Introduction
Orientation in space is fundamental for all humans and the majority of other animals.
Accomplishing goals frequently requires moving through environmental spaces such
as forests, houses, or cities [26]. How do navigators accomplish this? How do they
represent the environment they have traveled through? Which processes operate on these represen-
tations in order to reach distant destinations or to self-localize when lost? Various
theories have been proposed to answer these questions. Regarding the underlying
representation, these theories can be roughly classified into two groups, which are
called here graph representations and cognitive maps. In the following paper, I will
explain graph representations and cognitive maps. I will also highlight how graph
representations and cognitive maps fail to properly explain orientation behavior. As
a solution, I will introduce the network of reference frames theory and discuss it with
respect to other theories and further empirical results.
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 344–360, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Graph representations and cognitive maps are especially suited to represent route and
survey knowledge, respectively. The other side of the coin is, however, that they also
have their specific limitations. These will now be described in detail.
Graph representations (1) do not represent survey knowledge, (2) often ignore met-
ric relations given in perception, and (3) often assume actions are sufficient to explain
route knowledge. The main limitation of graph representations is that there is no sur-
vey knowledge expressed at all. Using a graph representation, navigators know how
to reach a location and have the ability to choose between different routes. Graph
representations, however, do not give navigators any cue as to where their goal is
1 Often the term cognitive map is used for the sum of all spatial representations. Contrary to that, cognitive map is understood here as a specific spatial representation, namely storing spatial information within one reference frame. A reference frame here is not understood as the general notion of representing something relative to one's own body (= egocentric) vs. relative to other objects (= allocentric), but a reference frame is considered as one single coordinate system (cf. [15]). Nevertheless, a reference frame can be egocentric or allocentric.
located in terms of direction or distance. This problem originates from the fact that
graph representations normally do not represent metric knowledge at all. This is
despite the fact that navigators, human and non-human alike, are provided with at
least rough distance estimates, especially by their visual system and by
proprioceptive cues during locomotion. Some graph models ignore this already
available information and instead assume that a navigator stores raw or only barely
processed sensory data ([4], [20]). As a final point, actions themselves ([19], [20],
[45]) cannot be sufficient to explain route knowledge. Rats can swim a route learned
by walking [18]. Cats can walk a route learned while being passively carried along
it [10]. We can cycle a path learned
by walking. Even for route knowledge the edge of a graph representing how to get
from one node to the next has to be more abstract than a specific action. However, not
only graph representations are limited.
Cognitive maps (1) have problems in explaining self-localization and route knowl-
edge. (2) There is a surprising lack of evidence that non-human animals have
cognitive maps at all. (3) Human survey navigation is not always consistent with a
cognitive map, and (4), cognitive maps are necessarily limited in size. Self-localizing
based exclusively on a cognitive map can only take into account the geometric
relations that are displayed there (e.g., the form of a place). The visual appearance of
landmarks is almost impossible to represent within a cognitive map itself. This infor-
mation has to be represented separately and somehow linked to a location within the
cognitive map. This is probably one reason why simultaneously constructing a map
while staying localized within this map (simultaneous localization and mapping,
SLAM) is considered a complicated problem in robotics [42]. Similarly, planning a
route based on a cognitive map alone is also not
trivial, as possible routes have to be identified first [16]. Another issue is that cogni-
tive maps seem to be limited to human navigation. If animals had cognitive maps,
they would easily be able to take novel shortcuts (i.e., directly approach a goal via a
novel path without using updating or landmarks visible from both locations). How-
ever, the few observations arguing for novel shortcuts in insects and mammals have
been criticized because they do not exclude alternative explanations and could not be
replicated in better controlled experiments [1]. For example, in the famous experiment
by Tolman, Ritchie, and Kalish [43], the rats' shortcutting behavior can be explained by
assuming they directly approached the only available light source within the room.
Although the discussion whether non-human animals are able to make novel shortcuts
has yet to be settled, such shortcutting behavior should be fairly common if
orientation were based on a cognitive map. This is clearly not the case. Similarly, a human
shortcutting experiment within an “impossible” virtual environment casts doubt upon
a cognitive map as the basis for such survey navigation [34]. In this experiment,
unnoticeable portals within the virtual environment teleported participants to another
location within the environment. They could, therefore, not construct a consistent
two-dimensional map of this environment. Still, participants were able to shortcut
quite accurately. The last shortcoming of cognitive maps is that we have to use many
of them anyway. We surely do not have one and the same cognitive map (reference
frame) to represent the house we grew up in, New York, and the Eiffel Tower. At
some point, we have to use multiple cognitive maps and (probably) represent
relations between them.
Graph representations and cognitive maps have specific advantages and limitations.
Graphs are good for representing route knowledge. However, they do not explain survey
In this chapter, I will describe the network of reference frames theory in terms of the
representations and the processes acting on them, and how these are used for different
tasks, such as navigation, survey knowledge, etc.
2.1 Representation
The network of reference frames theory describes the memory representation acquired
by human and non-human animals when locomoting through environmental spaces
such as the country side, buildings, or cities. It also describes how this representation is
used for self-localization, route and survey navigation. The theory is a fusion between
graph representations and cognitive maps (cf. Fig. 2). It assumes that the environment
is encoded in multiple interconnected reference frames. Each reference frame can be
described as a coordinate system with a specific orientation. These reference frames
form a network or graph. A node within this network is a reference frame referring to a
single vista space. Vista spaces surround the navigator and can be perceived from
one point of view, for example, a room, a street, or even a valley [26].2

Fig. 2. A visualization of the network of reference frames theory. Reference frames correspond to single vista spaces. They are connected via perspective shifts which specify the translation and rotation necessary to get from one reference frame to the next one.

This means
that the basic unit in the network is always the reference frame of a vista space.
Within this vista space reference frame, the location of objects and the surrounding
geometry are specified. The edges in the network define the so called perspective shift
necessary to move from one reference frame to the next. Such a perspective shift
consists of both a translation and a rotation component, for example, moving forward
150 meters and then turning right 90°. Perspective shifts all point to another reference
frame;3 they may differ in precision and the association strength with which they
connect the two reference frames. The more familiar a navigator is with an environ-
ment, the more precise the perspective shifts will become and the more strongly the
perspective shift will connect two reference frames.
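Chaining perspective shifts amounts to dead reckoning: each shift moves the navigator along the current heading and then rotates the heading. The sketch below uses the example shift from the text; the pose encoding (x, y, heading in degrees, counterclockwise positive) and the function name are my own illustrative choices, not the theory's notation.

```python
import math

def apply_shift(pose, forward, turn_deg):
    """Apply one perspective shift (translate `forward` along the current
    heading, then rotate by `turn_deg`) to a pose (x, y, heading in degrees)."""
    x, y, h = pose
    x += forward * math.cos(math.radians(h))
    y += forward * math.sin(math.radians(h))
    return (x, y, (h + turn_deg) % 360)

# "Moving forward 150 meters and then turning right 90°", applied twice:
pose = (0.0, 0.0, 90.0)             # start facing 'north' (y direction)
pose = apply_shift(pose, 150, -90)  # first reference frame -> second
pose = apply_shift(pose, 150, -90)  # second reference frame -> third
print([round(v, 1) for v in pose])  # [150.0, 150.0, 270.0]
```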
The network of vista space reference frames connected via perspective shifts is
stored in long-term memory. Several processes shape or operate on this memory.
These processes are encoding, reorientation by recognition, route navigation, and
survey navigation. In the following they will be described in detail (for a summary see
Table 1).
2.2 Encoding
Table 1. Summary of the representation and processes assumed in the network of reference
frames theory
Representation
Network (graph) consisting of nodes connected by edges (see Fig. 2)
Node: a reference frame with an orientation specifying locations and orientations within a
vista space; within this reference frame, objects and the geometric layout are encoded
Edge: perspective shift, i.e., translation and rotation necessary to move to the next reference
frame; perspective shifts point to the next reference frame and differ in precision and
association strength.
Processes
Encoding: first time experience or the geometry of a vista space define the orientation of a
new reference frame; the visual scene itself, updating, or global landmarks can provide the
perspective shift to the next vista space reference frame; familiarity increases the accuracy
of the perspective shifts and the association strength of these connections.
Self-localization by recognition: recognizing a vista space by the geometry or landmarks it
contains provides location and orientation within this vista space and the current
node/reference frame within the network
Route navigation by activation spread: an activation spread mechanism provides a route from
the current location to the goal; during wayfinding, reference frames on the route are pre-
activated and, therefore, recognized more easily; recently visited reference frames are
deactivated
Survey navigation by imagination: imagining connected vista spaces not visible step-by-step
within the current reference frame; allows retrieving direction and straight line distance to
distant locations; this can be used for shortcutting or pointing.
350 T. Meilinger
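The representation summarized in Table 1 maps naturally onto a labeled directed graph. A minimal sketch, with field names and values that are my own illustrative assumptions rather than the paper's notation:

```python
from dataclasses import dataclass, field

@dataclass
class ReferenceFrame:
    """Node: a vista space reference frame with an intrinsic orientation."""
    name: str
    objects: dict = field(default_factory=dict)  # object -> (x, y) in this frame

@dataclass
class PerspectiveShiftEdge:
    """Edge: translation and rotation to the next reference frame."""
    target: str           # name of the reference frame pointed to
    translation: tuple    # (dx, dy) in the current frame, e.g., meters
    rotation_deg: float   # turn to align with the next frame's orientation
    precision: float      # grows with familiarity (see 2.2)
    association: float    # connection strength; biases route selection

# Directed adjacency: a shift is one-way until a backward shift is encoded.
network = {
    "plaza": [PerspectiveShiftEdge("main_street", (0.0, 150.0), -90.0, 0.4, 1.0)],
    "main_street": [],
}
```

The directedness of the adjacency lists matters: as described below, walking a route in reverse adds new backward edges rather than inverting existing ones.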
Repeated Visits. Re-visiting an environmental space can add new perspective shifts
to the network and will increase the precision and association strength of existing
perspective shifts (for the latter see 2.4). Walking a new route to a familiar goal will
form a new chain of reference frames and perspective shifts connecting the start and
goal. That way, formerly unconnected areas, such as city districts, can be connected.
When walking a known route in the reverse direction, the theory assumes that new
perspective shifts are encoded in the backward direction. Then two reference frames A and
B are connected by two perspective shifts, one pointing from A to B and the other
pointing from B to A. In principle, inverting one perspective shift would be sufficient
to obtain the opposite perspective shift. However, such an inversion process is
assumed to be error-prone and costly; therefore, it is usually not applied.
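Mathematically, the inversion the theory mentions is just the inverse of the rigid transform. A sketch in my own formulation (same toy conventions as above: forward = +y, right turns negative):

```python
import math

def invert_shift(dx, dy, turn_deg):
    """Inverse of a perspective shift (translate by (dx, dy), then turn).
    The theory assumes navigators rarely compute this (it is error-prone
    and costly) and instead store a separate backward shift."""
    t = math.radians(-turn_deg)
    inv_dx = -(dx * math.cos(t) - dy * math.sin(t))
    inv_dy = -(dx * math.sin(t) + dy * math.cos(t))
    return inv_dx, inv_dy, -turn_deg

# Inverting "150 m forward, then 90° right" yields the shift back: from the
# new frame, the old origin lies 150 m to the right, with a 90° left turn.
back = invert_shift(0.0, 150.0, -90.0)
```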
When an existing perspective shift is repeatedly navigated along its orientation, no
new perspective shift is encoded, but the existing perspective shift becomes more
precise. This increase in precision corresponds to a shift from route knowledge to
more precise survey knowledge. The precision of survey knowledge depends directly
on the precision of the perspective shift (for a similar model for updating
see [6]). For many people, perspective shifts will be imprecise after the first visit and,
therefore, insufficient for survey tasks (e.g., for pointing to distant destinations). However,
they still accurately represent route knowledge (i.e., they indicate which reference frame is
connected with which other reference frame). When the perspective shifts become
more precise after repeated visits, survey knowledge will also become more precise
(cf., [25]; see 2.5). This corresponds with the original claim that route knowledge
usually develops earlier than survey knowledge (e.g., [36]). However, survey knowledge
does not have to develop at all (e.g., [24]) or can in principle also be observed
after just a few learning trials (e.g., [27]). Correspondingly, the perspective shifts may
be precise enough for pointing or other survey knowledge tasks after little experience
or they may remain imprecise even after extended experience. Here, large differences
between individuals due to sense of direction can be expected (cf., [9],
[35]). Updating global orientation while navigating an environmental space will result
in more precise perspective shifts and, therefore, improve survey knowledge. It follows
that people with a good sense of direction will also acquire precise survey
knowledge more quickly. Similarly, environments which ease such updating will lead to
more precise perspective shifts and improve survey knowledge accordingly. Such
facilitation can be provided, for example, by uniform slant, distant landmarks, or a grid
city, all of which have been shown to enhance orientation performance (e.g., [25], [32]).
2.3 Self-Localization by Recognition
When someone gets lost within a familiar environmental space, the principal mode of
reorientation will be by recognizing a single vista space within this environment (for
self-localization by the structure of environmental spaces see [21], [38]). A vista space
can be recognized by its geometry or by salient landmarks located within it (cf. [3]).
First, recognizing a vista space provides navigators with their location and orientation
within this vista space. Second, it provides navigators with their location within the
network (i.e., the node or vista space reference frame in which they are located).
Their position in terms of direction and distance with respect to currently hidden
locations in the environmental space, however, has to be inferred from memory. This
will be explained in the section on survey navigation by imagination further below.
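As a toy sketch of this recognition step (my own simplification; the theory does not specify a matching algorithm, and geometry matching is omitted), the current vista space could be identified by landmark overlap with the stored reference frames:

```python
def self_localize(frames, visible_landmarks):
    """Recognize the current vista space as the stored reference frame whose
    landmark set best overlaps the currently visible scene."""
    return max(frames, key=lambda name: len(frames[name] & visible_landmarks))

# Hypothetical stored frames (landmark sets only; geometry omitted):
frames = {"plaza": {"fountain", "cafe"}, "station": {"clock", "tracks"}}
assert self_localize(frames, {"clock", "platform_sign"}) == "station"
```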
2.4 Route Navigation by Activation Spread
Route navigation means selecting and traveling a route from the current location to a
goal. The network of reference frames theory assumes an activation spread mechanism
to explain route selection, as proposed by Chown et al. [4] as well as
Trullier et al. [45]. Within the network, activation from the current reference frame
(current node) spreads along the perspective shifts (edges) connecting the various
reference frames (nodes). When the activation reaches the goal node, the route transferring
the activation is selected (i.e., a chain of reference frames connected by perspective
shifts). Here, the association strength of perspective shifts is important: it is
higher for the most frequently navigated perspective shifts, and activation spreads
faster along edges with higher association strength. If several possible routes are
encoded within the network, the route that spreads the activation fastest is selected
for navigation. This route is not necessarily the shortest route or the route with the
fewest nodes. As activation propagates more easily via highly associated edges, familiar
routes will be selected with higher probability.
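One way to make this selection mechanism concrete (an assumed reading, not the paper's specification) is a best-first spread in which activation crosses an edge in time inversely proportional to its association strength; the network and association values below are invented for illustration:

```python
import heapq

def select_route(network, start, goal):
    """Best-first activation spread: activation crosses an edge in time
    1/association, so strongly associated (familiar) edges transmit it faster."""
    frontier = [(0.0, start, [start])]
    reached = set()
    while frontier:
        time, node, route = heapq.heappop(frontier)
        if node == goal:
            return route  # the first chain to deliver activation is selected
        if node in reached:
            continue
        reached.add(node)
        for next_node, association in network.get(node, []):
            heapq.heappush(
                frontier, (time + 1.0 / association, next_node, route + [next_node])
            )
    return None

# A direct but rarely used edge (association 0.2) loses against a familiar,
# strongly associated detour, although the detour has more nodes:
net = {"A": [("B", 4.0), ("D", 0.2)], "B": [("C", 4.0)], "C": [("D", 4.0)]}
```

This reproduces the point made above: the selected route need not be the shortest or the one with the fewest nodes, only the most strongly associated.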
During navigation, the perspective shift provides navigators with information about
where to move next (i.e., how to perform the perspective shift). If the perspective shift is
rather imprecise, navigators will only have an indicated direction in which to move.
Moving in this direction, they will eventually recognize another vista space
reference frame. Updating the last reference frame visited prevents navigators
from getting lost. Pre-activating reference frames to come and de-activating
already visited reference frames facilitates recognition. When successfully navigating
a known route, its perspective shifts become more accurate and their association
strengths increase, making it more probable that the route will be selected again.
The described process is probably sufficient to explain most non-human route
navigation. It is also plausible that such a process is inherited in humans and applied,
for example, when navigating familiar environments without paying much attention.
However, humans can certainly override this process and select routes by other
means.
2.5 Survey Navigation by Imagination
Survey knowledge tasks such as pointing or shortcutting require that relevant locations
are represented within one frame of reference (e.g., the current location and the goal
destination). The network of reference frames theory assumes that this integration
within one frame of reference occurs online within working memory. This is only
done when necessary and only for the respective area. For example, when pointing to
a specific destination, only the area from the current location to the destination is
represented. In this framework, the integration within one frame of reference happens
during the retrieval of information and not during encoding or elaboration, as with a
cognitive map. The common reference frame is available only temporarily in working
memory and is not constantly represented in long term memory. The integration itself
is done by imagining distant locations as if the visibility barriers of the current vista
space were transparent. The current vista space can be the one physically surrounding
the navigator or another vista space that is imagined. From the current vista space’s
reference frame, a perspective shift provides the direction and orientation of the con-
nected reference frame. With this information, the navigator imagines the next vista
space within the current frame of reference, (i.e., this location is imagined in terms of
direction and distance from the current vista space). This way, the second vista space
is included in the current reference frame. Now, a third vista space can be included
using the perspective shift connecting the second and the third vista space reference
frames. That way, every location known in the surrounding environmental space can
be imagined. Now, the navigator can point to this distant location, determine the
straight line distance, and try to find a shortcut.
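The imagination process described above amounts to chaining perspective shifts into one common frame. A sketch under my own toy encoding (forward = +y, right turns negative; the function name is invented):

```python
import math

def point_to(shifts):
    """Integrate a chain of perspective shifts, each (dx, dy, turn_deg) given
    in the frame it starts from, into the egocentric bearing (degrees clockwise
    from straight ahead) and straight-line distance of the final frame's origin."""
    x = y = 0.0
    heading = 0.0  # radians; 0 = facing +y of the starting frame
    for dx, dy, turn_deg in shifts:
        # Rotate the local translation into the starting frame's coordinates.
        x += dx * math.cos(heading) - dy * math.sin(heading)
        y += dx * math.sin(heading) + dy * math.cos(heading)
        heading += math.radians(turn_deg)
    return math.degrees(math.atan2(x, y)), math.hypot(x, y)

# 150 m ahead, 90° right turn, then 100 m ahead: the target lies ahead-right.
bearing, distance = point_to([(0.0, 150.0, -90.0), (0.0, 100.0, 0.0)])
```

Each chained shift corresponds to one imagined vista space; the running (x, y) is the temporary common reference frame held in working memory.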
The network of reference frames theory is a fusion between graph representations and
cognitive maps. Multiple reference frames or cognitive maps are connected with each
other within a graph structure. As in graph representations, the basic structure is a
network or graph. However, in contrast to most existing graph models ([4], [19], [20],
[45], [48]), metric information is included within this graph. This is done for the
nodes, which consist of reference frames, as well as for the edges (i.e., the perspective
shifts, which represent translations and turns). Such a representation avoids the
problems associated with the mentioned graph representations (see 1.2): (1) Most
importantly, it can explain survey knowledge, as metric relations are represented
contrary to other graph models. (2) Representing metric relations also exploits information
provided by perception. Depth vision and other processes allow us to perceive the
spatial structure of a scene. This information is stored rather than discarded, as in other
graph models. (3) Perspective shifts represent abstract relations that can be used to
guide walking, cycling, driving, etc. No problem of generalizing from one represented
action to another action occurs as in other graph representations.
The network of reference frames theory also avoids problems of the cognitive
map (see 1.2): (1) It can explain self-localization and route navigation in a
straightforward manner, which is difficult for cognitive maps. (2) An environmental space is
not encoded within one reference frame as with a cognitive map. The representation,
therefore, does not have to be globally consistent. So, contrary to cognitive maps,
shortcutting is also possible when navigating “impossible” virtual environments [34].
(3) The lack of clear evidence for survey navigation in non-human mammals and
insects can be easily explained. According to the network of reference frames theory,
these animals are not capable of imagining anything or they do not do so for survey
purposes. However, survey navigation relies on the same representation as self-
localization and route navigation. Only the additional process of imagining operates
on this representation. This process might have even evolved for completely different
purposes than navigation. Contrary to that, cognitive map theory has to assume that an
additional representation, (i.e., a cognitive map), evolved only in humans specifically
for orientation. These are much stronger assumptions. (4) Imagining distant destinations
within working memory involves a lot of computation. Survey tasks are, therefore,
effortful and error-prone, which most people can probably confirm. In contrast,
this everyday observation is hard to reconcile with a cognitive map: deriving the direction
to distant locations from a cognitive map is rather straightforward and should not be
more effortful than, for example, route navigation.4
The network of reference frames theory also has advantages over assuming
both a graph and a cognitive map in parallel (see 1.2):5 here, survey navigation is
again explained by the cognitive-map part, which does not avoid the last three problems
mentioned in the previous paragraph.6 In addition, the network of reference frames theory
makes fewer assumptions. On a rough scale, it only assumes one representation, the
4 As an alternative to simply reading out survey relations from a cognitive map, mental travel
has been proposed [2]. Mental travel can be considered more effortful and is, therefore, much
more plausible. For the network of reference frames theory, continuous mental travel within
the area of an encoded vista space is conceivable; between non-adjacent vista spaces, this
should be rather difficult.
5 Some theories assuming both a network representation and a global cognitive map are skeptical
regarding the necessity of and evidence for such a cognitive map ([16], [31]).
6 In his theory, Poucet [31] assumes a network layer with pairwise metric relations between
places. This representation can be used to compute shortcuts and avoids the problems mentioned
for cognitive maps. However, Poucet also proposes a global integration within a cognitive
map, leading again to those problems. In addition, it is unclear which of the two metric
representations determines survey navigation.
3.2 Vista Space Reference Frames as the Basic Unit in the Representation of
Environmental Spaces
7 In Yeap’s theory [50], all vista spaces are directly adjacent to each other and are connected via
exits. Survey relations computed from that representation are, therefore, correct when the
forms of the individual vista spaces are correct. In the network of reference frames theory, the
precision of survey relations depends on the precision of the perspective shifts. In addition,
Yeap assumes a hierarchical structuring on top of the basic vista space level. Touretzky and
Redish [44] say nothing about environmental spaces. They also assume that multiple,
simultaneously active reference frames represent one vista space.
(e.g., [3]). Shortcutting is difficult because it encompasses more than just one vista
space. In contrast, selecting the direct path to a necessarily visible location within a
vista space is trivial. Visibility is also correlated with behavior. More vista spaces
(i.e., corridors on a route) lead to larger errors in Euclidean distance estimation [41].
Learning a virtual environmental space is easier with a full view down a corridor than
when visual access is restricted to a short distance, which results in more vista spaces
that need to be encoded [38]. Place cells in the human and rodent hippocampus seem to
represent a location in a vista space ([5], [30]). Place cells fire every time a navigator
crosses a specific area, independent of head orientation. This area is defined relative to the
surrounding boundaries of a vista space and is adjusted when the overall
size or shape of the vista space changes [29]. One and the same place cell can be active in
different vista spaces and can, therefore, not encode one specific location in an
environmental space [37]. In conclusion, a set of place cells is a possible neuronal
representation of locations within one frame of reference. This frame is likely limited
to a vista space.
In addition to these arguments from the literature, we recently tested a prediction of
the network of reference frames theory concerning the importance of vista space
reference frames [23]. This prediction was, first, that a vista space is the largest
unit encoded within one single reference frame and, second, that the orientation of
such a vista space reference frame matters (i.e., that navigators perform better
when they are aligned with that orientation). Participants learned a simple immersive
virtual environmental space consisting of seven corridors by walking it in one direction.
In the testing phase, they were teleported to different locations in the environment and
were asked to self-localize and then point towards previously learned targets. As predicted
by the network of reference frames theory, participants performed better when
oriented in the direction in which they originally learned each corridor (i.e., when
they were aligned with an encoded vista space reference frame). If the whole environment
were encoded within one single frame of reference, this result would not be
predicted: one global reference frame should result in no alignment difference at all (cf.,
[12]), or participants should perform better when aligned with the orientation of this
single global reference frame, as predicted by reference axis theory ([28], [33]). No
evidence for this was observed. Participants seem to encode multiple local reference
frames, one for each vista space, in the orientation in which they experienced that vista space
(which coincided with its geometry).
The reference frames in the network of reference frames theory correspond to vista
spaces and they are connected via perspective shifts. Are these relations egocentric or
allocentric? Egocentric and allocentric reference frames have been discussed intensively
over the last few years (e.g., [28], [46]). In an egocentric reference frame, locations
and orientations within an environment are represented relative to the location
and orientation of a navigator’s body in space [15]. This is best described by a polar
coordinate system. An allocentric reference frame is specified by a space external to the
navigator. Here, object-to-object relations are represented, in contrast to the object-to-body
relations of an egocentric reference frame. An allocentric reference frame is
best described by a Cartesian coordinate system.
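The contrast can be made concrete with a toy example (function names, coordinates, and the clockwise-from-facing-direction convention are my own): the same locations expressed egocentrically as body-relative polar coordinates, and allocentrically as navigator-independent object-to-object offsets.

```python
import math

def egocentric_polar(obj_xy, body_xy, body_heading_deg):
    """Egocentric: bearing (degrees clockwise from the body's facing
    direction) and distance of an object relative to the navigator."""
    dx, dy = obj_xy[0] - body_xy[0], obj_xy[1] - body_xy[1]
    bearing = math.degrees(math.atan2(dx, dy)) - body_heading_deg
    return bearing % 360.0, math.hypot(dx, dy)

def allocentric_offset(obj_a, obj_b):
    """Allocentric: an object-to-object vector, independent of the navigator."""
    return obj_b[0] - obj_a[0], obj_b[1] - obj_a[1]
```

The egocentric description changes whenever the body moves or turns; the allocentric offset does not.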
3.4 The Relation between Vista Space Reference Frames: Network vs.
Hierarchy
Hierarchical theories of spatial memory have been very prominent (e.g., [4], [11], [40],
[50]). In such views, smaller scale spaces are stored at progressively lower levels of
the hierarchy. Contrary to these approaches, the network of reference frames theory
assumes that environmental spaces are organized not hierarchically but in a network.
No higher hierarchical layer is assumed above a vista space; all vista spaces are
equally important in that sense. This does not exclude vista spaces themselves from
being organized hierarchically.
Hierarchical graph models or hierarchical cognitive maps still face most of the
problems discussed in 3.1. However, one argument for hierarchical structuring is
based on clustering effects: judgments within a spatial region differ from judgments
between or outside spatial regions. For instance, within a region, distances are
estimated faster and judged to be shorter, or locations are remembered as lying
closer to the center of such a region than they actually were.
Many of these clustering effects have been examined for regions within a vista space
or a whole country usually learned via maps (e.g., [40]). They are, therefore, not rele-
vant here. However, clustering effects are also found in directly experienced envi-
ronmental spaces. Experiments show that distance judgments [11] and route decisions
between equal length alternatives [49] are influenced by regions within the environ-
mental space. These effects cannot be explained by the network of reference frames theory
alone. A second categorical memory has to be assumed which represents a specific
region (cf., [13]). Judgments must be based at least partially on these categories and
not on the network of reference frames only. These categories might consist of verbal
labels such as “downtown” [22]. As a prediction, no clustering effects for directly
learned environmental spaces should be observed when such a category system is
inhibited (e.g., by verbal shadowing).
The perspective shifts assumed by the network of reference frames theory are not
symmetric. They always point from one vista space to another and are not inverted
easily. Tasks accessing a perspective shift in its encoded direction should be easier
and more precise than tasks that require accessing the perspective shift in the opposite
direction - at least as long as there is no additional perspective shift encoded in the
opposite direction. This asymmetry can explain the route direction effect in spatial
priming and different route choices for wayfinding there and back.
After learning a route presented on a computer screen in only one direction, recog-
nizing pictures of landmarks is faster when primed with a picture of an object encoun-
tered before the landmark than when primed with an object encountered after the
landmark (e.g., [14]). According to the network of reference frames theory, the
directionality of perspective shifts speeds up activation spread in the direction in which
the route was learned; priming is therefore faster in that direction.
Asymmetries are also found in path choices. In a familiar environment, navigators
often choose different routes on the way out and back (e.g., [39]). According to the
network of reference frames theory, different perspective shifts usually connect vista
spaces on a route out and back. Due to different connections, different routes can be
selected when planning a route out compared to planning the route back.
The network of reference frames theory explains asymmetries on the level of route
knowledge. However, it also predicts an asymmetry in survey knowledge: learning a
route mainly in one direction should result in improved survey performance (i.e.,
faster and more precise pointing) in this direction compared to the opposite direction.
This has yet to be examined.
4 Conclusions
The network of reference frames theory is a synthesis of graph representations and
cognitive maps. It resolves problems that arise in explaining the orientation behavior
of human and non-human animals based on either graphs, maps, or both in
parallel. In addition, the theory explains the unique role of vista spaces as well as
asymmetries in spatial memory. New predictions from the theory concern, first, the
role of orientation within environmental spaces, which has been tested recently;
second, the lack of clustering effects in environmental spaces based on the assumed
memory alone; and third, an asymmetry in survey knowledge tasks. Further experiments
have to show whether the network of reference frames theory will prove of
value in these and other cases.
References
1. Bennett, A.T.D.: Do animals have cognitive maps? Journal of Experimental Biology 199,
219–224 (1996)
2. Byrne, P., Becker, S., Burgess, N.: Remembering the past and imagining the future: a neu-
ral model of spatial memory and imagery. Psychological Review 114, 340–375 (2007)
3. Cheng, K., Newcombe, N.S.: Is there a geometric module for spatial orientation? Squaring
theory and evidence. Psychonomic Bulletin & Review 12, 1–23 (2005)
4. Chown, E., Kaplan, S., Kortenkamp, D.: Prototypes, location, and associative networks
(PLAN): Towards a unified theory of cognitive mapping. Cognitive Science 19, 1–51
(1995)
5. Ekstrom, A., Kahana, M., Caplan, J., Fields, T., Isham, E., Newman, E., Fried, I.: Cellular
networks underlying human spatial navigation. Nature 425, 184–187 (2003)
6. Fujita, N., Klatzky, R.L., Loomis, J.M., Golledge, R.G.: The encoding-error model of
pathway completion without vision. Geographical Analysis 25, 295–314 (1993)
7. Gallistel, C.R.: The organization of learning. MIT Press, Cambridge (1990)
8. Hamilton, D.A., Driscoll, I., Sutherland, R.J.: Human place learning in a virtual Morris
water task: some important constraints on the flexibility of place navigation. Behavioural
Brain Research 129, 159–170 (2002)
9. Hegarty, M., Waller, D.: Individual differences in spatial abilities. In: Shah, P., Miyake, A.
(eds.) The Cambridge Handbook of Visuospatial Thinking, pp. 121–169. Cambridge Uni-
versity Press, Cambridge (2005)
10. Hein, A., Held, R.: A neural model for labile sensorimotor coordination. In: Bernard, E.E.,
Kare, M.R. (eds.) Biological prototypes and synthetic systems, vol. 1, pp. 71–74. Plenum,
New York (1962)
11. Hirtle, S.C., Jonides, J.: Evidence of hierarchies in cognitive maps. Memory & Cogni-
tion 13, 208–217 (1985)
12. Holmes, M.C., Sholl, M.J.: Allocentric coding of object-to-object relations in overlearned
and novel environments. Journal of Experimental Psychology: Learning, Memory and
Cognition 31, 1069–1078 (2005)
13. Huttenlocher, J., Hedges, L.V., Duncan, S.: Categories and particulars: prototype effects in
estimating spatial location. Psychological Review 98, 352–376 (1991)
14. Janzen, G.: Memory for object location and route direction in virtual large-scale space. The
Quarterly Journal of Experimental Psychology 59, 493–508 (2006)
15. Klatzky, R.L.: Allocentric and egocentric spatial representations: Definitions, distinctions,
and interconnections. In: Freksa, C., Habel, C., Wender, K.F. (eds.) Spatial cognition - An
interdisciplinary approach to representation and processing of spatial knowledge, pp. 1–17.
Springer, Berlin (1998)
16. Kuipers, B.: The spatial semantic hierarchy. Artificial Intelligence 119, 191–233 (2000)
17. Loomis, J.M., Klatzky, R.L., Golledge, R.G., Philbeck, J.W.: Human navigation by path
integration. In: Golledge, R.G. (ed.) Wayfinding behavior, pp. 125–151. John Hopkins
Press, Baltimore (1999)
18. MacFarlane, D.A.: The role of kinesthesis in maze learning. University of California
Publications in Psychology 4, 277–305 (1930) (cited after Spada, H. (ed.): Lehrbuch
Allgemeine Psychologie. Huber, Bern (1992))
19. McNaughton, B.L., Leonard, B., Chen, L.: Cortical-hippocampal interactions and cogni-
tive mapping: A hypothesis based on reintegration of parietal and inferotemporal pathways
for visual processing. Psychobiology 17, 230–235 (1989)
20. Mallot, H.: Spatial cognition: Behavioral competences, neural mechanisms, and evolution-
ary scaling. Kognitionswissenschaft 8, 40–48 (1999)
21. Meilinger, T., Hölscher, C., Büchner, S.J., Brösamle, M.: How Much Information Do You
Need? Schematic Maps in Wayfinding and Self Localisation. In: Barkowsky, T., Knauff,
M., Ligozat, G., Montello, D.R. (eds.) Spatial Cognition V, pp. 381–400. Springer, Berlin
(2007)
22. Meilinger, T., Knauff, M., Bülthoff, H.H.: Working memory in wayfinding - a dual task
experiment in a virtual city. Cognitive Science 32, 755–770 (2008)
23. Meilinger, T., Riecke, B.E., Bülthoff, H.H.: Orientation Specificity in Long-Term-Memory
for Environmental Spaces (submitted)
24. Moeser, S.D.: Cognitive mapping in a complex building. Environment and Behavior 20,
21–49 (1988)
25. Montello, D.R.: Spatial orientation and the angularity of urban routes: A field study. Envi-
ronment and Behavior 23, 47–69 (1991)
26. Montello, D.R.: Scale and multiple psychologies of space. In: Frank, A.U., Campari, I.
(eds.) Spatial information theory: A theoretical basis for GIS, pp. 312–321. Springer, Ber-
lin (1993)
27. Montello, D.R., Pick, H.L.: Integrating knowledge of vertically aligned large-scale spaces.
Environment and Behavior 25, 457–484 (1993)
28. Mou, W., Xiao, C., McNamara, T.P.: Reference directions and reference objects in spatial
memory of a briefly viewed layout. Cognition 108, 136–154 (2008)
29. O’Keefe, J., Burgess, N.: Geometric determinants of the place fields of hippocampal neu-
rons. Nature 381, 425–428 (1996)
30. O’Keefe, J., Nadel, L.: The hippocampus as a cognitive map. Clarendon Press, Oxford
(1978)
31. Poucet, B.: Spatial cognitive maps in animals: New hypotheses on their structure and neu-
ral mechanisms. Psychological Review 100, 163–182 (1993)
32. Restat, J., Steck, S.D., Mochnatzki, H.F., Mallot, H.A.: Geographical slant facilitates navi-
gation and orientation in virtual environments. Perception 33, 667–687 (2004)
33. Rump, B., McNamara, T.P.: Updating Models of Spatial Memory. In: Barkowsky, T.,
Knauff, M., Ligozat, G., Montello, D.R. (eds.) Spatial Cognition V, pp. 249–269. Springer,
Berlin (2007)
34. Schnapp, B., Warren, W.: Wormholes in virtual reality: What spatial knowledge is learned
for navigation? In: Proceedings of the 7th Annual Meeting of the Vision Science Society
2007, Sarasota, Florida, USA (2007)
35. Sholl, J.M., Kenny, R.J., DellaPorta, K.A.: Allocentric-heading recall and its relation to
self-reported sense-of-direction. Journal of Experimental Psychology: Learning, Memory,
and Cognition 32, 516–533 (2006)
36. Siegel, A.W., White, S.H.: The development of spatial representations of large-scale envi-
ronments. In: Reese, H. (ed.) Advances in Child Development and Behavior, vol. 10, pp.
10–55. Academic Press, New York (1975)
37. Skaggs, W.E., McNaughton, B.L.: Spatial Firing Properties of Hippocampal CA1 Popula-
tions in an Environment Containing Two Visually Identical Regions. Journal of Neurosci-
ence 18, 8455–8466 (1998)
38. Stankiewicz, B.J., Legge, G.E., Mansfield, J.S., Schlicht, E.J.: Lost in Virtual Space: Stud-
ies in Human and Ideal Spatial Navigation. Journal of Experimental Psychology: Human
Perception and Performance 37, 688–704 (2006)
39. Stern, E., Leiser, D.: Levels of spatial knowledge and urban travel modeling. Geographical
Analysis 20, 140–155 (1988)
40. Stevens, A., Coupe, P.: Distortions in judged spatial relations. Cognitive Psychology 10,
422–437 (1978)
41. Thorndyke, P.W., Hayes-Roth, B.: Differences in spatial knowledge acquired from maps
and navigation. Cognitive Psychology 14, 560–589 (1982)
42. Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics. MIT Press, Cambridge (2005)
43. Tolman, E.C., Ritchie, B.F., Kalish, D.: Studies in spatial learning. I. Orientation and the
short-cut. Journal of Experimental Psychology 36, 13–24 (1946)
44. Touretzky, D.S., Redish, A.D.: Theory of rodent navigation based on interacting represen-
tations of space. Hippocampus 6, 247–270 (1996)
45. Trullier, O., Wiener, S.I., Berthoz, A., Meyer, J.-A.: Biologically based artificial naviga-
tion systems: Review and prospects. Progress in Neurobiology 51, 483–544 (1997)
46. Wang, R.F., Spelke, E.S.: Human spatial representation: insights from animals. Trends in
Cognitive Sciences 6, 376–382 (2002)
47. Wang, R.F., Brockmole, J.R.: Simultaneous spatial updating in nested environments. Psy-
chonomic Bulletin & Review 10, 981–986 (2003)
48. Werner, S., Krieg-Brückner, B., Herrmann, T.: Modelling Navigational Knowledge by
Route Graphs. In: Habel, C., Brauer, W., Freksa, C., Wender, K.F. (eds.) Spatial Cognition
2000. LNCS (LNAI), vol. 1849, pp. 295–316. Springer, Heidelberg (2000)
49. Wiener, J., Mallot, H.: Fine-to-coarse route planning and navigation in regionalized envi-
ronments. Spatial Cognition and Computation 3, 331–358 (2003)
50. Yeap, W.K.: Toward a computational theory of cognitive maps. Artificial Intelligence 34,
297–360 (1988)
Spatially Constrained Grammars
for Mobile Intention Recognition
Peter Kiefer
1 Introduction
The problem of inferring an agent’s intentions from her behavior is called the
intention recognition problem. The closely related problem of plan recognition has
been discussed in the AI literature for many years [1]. Approaches for plan recognition
differ in the way the domain and possible plans are represented. While
early work tended to be quite general, like Kautz’s event hierarchies [2], current
research is typically concerned with specialized use cases (e.g. [3]) and efficient
inference (e.g. [4]).
A class of intention recognition problems with specific need for efficient infer-
ence is mobile intention recognition. We observe a mobile user’s trajectory and
try to ‘guess’ what intentions she has in mind. These mobile problems are differ-
ent, not only because of the restricted computational and cognitive resources [5].
Mobile intention recognition problems also differ to ‘traditional’ use cases be-
cause mobile behavior happens in space. This has a number of implications. One
is that we have knowledge about the spatial context, about spatial objects, their
relations, and spatial constraints. A glance at current research on the inverse
problem, spatio-temporal planning, gives us an idea of what these constraints can
look like: Seifert et al. discuss an interactive assistance system that supports users
in spatio-temporal planning tasks [6]. In their example they describe the constraints
that need to be considered when planning a trip: the temporal order of activities,
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 361–377, 2008.
c Springer-Verlag Berlin Heidelberg 2008
362 P. Kiefer
the time needed for traveling from A to B, and spatial constraints about what
actions can be performed at which location. What is important about Seifert's approach
is that the chosen hierarchical spatial structure offers a cognitively appealing
way of interaction between user and planning system, while at the same time
helping to prune the search space.
In this paper, we will see that complex constraints between intentions and
space not only give us a rich toolbox to formalize typical behavioral patterns
in mobile intention recognition, but can also speed up inference. We choose for-
mal grammars to represent intentions so that the intention recognition problem
becomes a parsing problem. Grammars are, in general, cognitively easy to under-
stand and make the connection between expressiveness and complexity explicit.
The main contribution of this paper is the combination of spatial constraints
with Tree Adjoining Grammars (TAG), a formalism from natural language pro-
cessing (NLP) that falls in complexity between context-free and context-sensitive
grammars (CFG, CSG). The idea to apply grammar formalisms from NLP to
plan/intention recognition is also followed by Geib and Steedman [7], and in our
own previous work [8]. In contrast to these approaches, our spatially constrained
grammars allow the formalization of complex, non-local constraints between in-
tentions and space (and not only between intentions).
The rest of this paper is structured as follows: in section 2 we explain which
steps are necessary to state a mobile intention recognition problem as a pars-
ing problem. In this context we review Spatially Grounded Intentional Systems
(SGIS) [9]. In section 3, we explain which important use cases cannot be han-
dled with SGIS, and proceed from Spatially Constrained Context-Free Grammars
(SCCFG) to Spatially Constrained Tree-Adjoining Grammars (SCTAG). Using
real motion track data from the location-based game CityPoker we discuss which
general spatio-temporal behavior patterns are handled best with which formal-
ism. The paper closes with a discussion of related work (section 4) and an outlook
on questions that remain open (section 5).
Fig. 1. Segmented motion track with classified behavior sequence from a CityPoker
game. (The player enters from the right.)
by a restaurant does not necessarily have the intention to eat there. Schlieder
calls this spatio-temporal design problem the room crossing problem [9].
This leads us to the second implication of spatio-temporality: the gap be-
tween sensor input (e.g. position data from a GPS device) and high-level in-
tentions (e.g. ‘find a restaurant’ ) is extremely large. It is not possible to design
an intelligent intention recognition algorithm that works directly on pairs of
(latitude/longitude). To bridge this gap, we use a multi-level architecture with
the level of behaviors as intermediate level between position and intention. We
process a stream of (lat/lon)-pairs as follows:
1. Preprocessing. The quality of the raw GPS data is improved. This includes
removing points with zero satellites, and those with an impossible speed.
2. Segmentation. The motion track is segmented at the border of regions, and
when the spatio-temporal properties (e.g. speed, direction) of the last n
points have changed significantly [12].
3. Feature Extraction. Each segment is analyzed and annotated with certain
features, like speed and curvature [13].
4. Classification. Using these features, each motion segment is classified to one
behavior. We can use any mapping function from feature vector to behaviors,
for instance realized as a decision tree.
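The four steps above can be sketched as a small pipeline. The following is an illustrative sketch only; all names (GpsPoint, region_of, the thresholds, and the decision stump in step 4) are hypothetical and not taken from the paper:

```python
# Illustrative sketch of the four-step position-to-behavior pipeline.
# All names (GpsPoint, region_of, ...) and thresholds are hypothetical.
from dataclasses import dataclass

@dataclass
class GpsPoint:
    lat: float
    lon: float
    speed: float       # derived from consecutive fixes, in m/s
    satellites: int

def preprocess(points, max_speed=15.0):
    """Step 1: drop fixes with zero satellites or an impossible speed."""
    return [p for p in points if p.satellites > 0 and p.speed <= max_speed]

def segment(points, region_of, speed_jump=5.0):
    """Step 2: cut the track at region borders and at sharp speed changes."""
    segments, current = [], [points[0]]
    for prev, p in zip(points, points[1:]):
        if region_of(p) != region_of(prev) or abs(p.speed - prev.speed) > speed_jump:
            segments.append(current)
            current = []
        current.append(p)
    segments.append(current)
    return segments

def features(seg):
    """Step 3: annotate a segment with features such as mean speed."""
    return {"mean_speed": sum(p.speed for p in seg) / len(seg)}

def classify(feat):
    """Step 4: map the feature vector to one behavior (here: a decision stump)."""
    return "b0" if feat["mean_speed"] < 0.5 else "br"

def behaviors(points, region_of):
    return [classify(features(s)) for s in segment(preprocess(points), region_of)]
```

In practice step 4 would be a learned classifier (e.g. a full decision tree) over a richer feature vector, but the shape of the data flow is the same.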
which are hidden in a city. The gaming area is structured by five rectangular
cache regions. In each cache region there are three potential cache coordinates
(one is drawn as a circle in Fig. 1). Cards are only hidden in one of the three
potential caches. Players can find out about the correct cache by answering a
multiple choice question. Once they have arrived at the cache, they perform
a detail search in the environment, under bushes, trees, or benches, until they
finally find the cards. They may then trade one card for one from their hand,
and continue in the game. For a complete description of the game, refer to [9].
The reason why this game is especially suited as an exemplary use case is that
CityPoker is played by bike at high speed. The user's cognitive resources are
bound by the traffic, and she cannot properly interact with
the device (a J2ME enabled smartphone, localized by GPS).
Similar situations occur in other use cases, like car navigation or maintenance
work. Depending on the intention recognized we want to select an appropriate
information service automatically. For instance, if we recognize the intention
FindWay we will probably select a map service. It is up to the application
designer to decide whether to present the service with information push, or just
to ease the access to this service (‘hotbutton’). We will not discuss the step of
mapping intentions to information services any further in this paper.
When parsing formal grammars we easily find ourselves in a situation where the
same input sequence may have two or more possible parse trees, i.e. more than
one possible interpretation. This is especially true when parsing an incomplete
behavior sequence incrementally. One way to deal with ambiguity are probabilis-
tic grammars [17] where we have to determine a probability for each rule in the
grammar. A spatial way of ambiguity reduction is proposed by Schlieder in [9]:
SGIS are context-free production systems, like that in Fig. 2, with the extension
that each rule is annotated with a number of regions in which it is applicable. We
call this the spatial grounding of rules. For instance, a HandleCache intention is
grounded in all regions of type cache. We modify all rules accordingly. An SGIS
rule for the original rule (12) would look as follows:
This reduces the number of possible rules applicable at each position in the
behavior sequence, thus avoiding many ambiguities. Figure 3 shows two possible
interpretations for the behavior sequence from Fig. 1: without spatial knowledge
we could not decide which of the two interpretations is correct. For parsing in
SGIS we replace the pure behavior stream (beh1 , beh2 , beh3 , ...) by a stream of
behavior/region pairs: ((beh1 , reg1 ), (beh2 , reg2 ), (beh3 , reg3 ), ...). Each behavior
is annotated with the region in which it occurs. Also the non-terminals in the
parse tree are annotated with a region (Intention, region), with the meaning that
all child-intentions or child-behaviors of this intention must occur in that region.
SGIS rules are shorthand for rules of the following form (where Symbol can
be an intention or a behavior):
That means, we cannot write rules for arbitrary combinations of regions. In addi-
tion, we require that another rule can only be inserted at an intention Symboli if
the region of the other rule is a (transitive) child in the partonomy, i.e. in the above
rule we can only insert productions with a region regy part of regx (which in-
cludes the same region: regy .equals(regx )). SGIS have been designed for
partonomially structured space. The nesting of rules follows closely the nesting of regions
Fig. 3. Parsing ambiguity if we had no spatial knowledge (see track from Fig. 1).
Through spatial disambiguation in SGIS we can decide that the bottom parse tree is
correct.
The definition of a new spatial context-free grammar that handles these ex-
amples is quite straightforward.
Definition 1. A Spatially Constrained Context-Free Grammar is defined as
SCCF G = (CF G, R, SR, GC, N LC), where
– CFG is a context-free grammar (I, B, P, S), defined over intentions I, and
behaviors B, with production rules P and start symbol S (the top-level inten-
tion).
– R is a set of regions
– SR is a set of spatial relations, where each relation r ⊆ R × R
– GC ⊆ P × R is a set of grounding constraints (as in SGIS [9])
– NLC is a set of spatial non-local constraints. Each constraint has a type
from the spatial relations SR and is defined for two right-hand symbols of
one production rule from P.
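Definition 1 can be rendered directly as a data structure. The sketch below is illustrative only (the class names, the encoding of NLC as index pairs into a rule's right-hand side, and the example regions are assumptions, not from the paper); the example rule and its 'identical' constraint follow rule (12) as discussed in the text:

```python
# Illustrative encoding of Definition 1 (SCCFG); all names are hypothetical.
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass(frozen=True)
class Production:
    lhs: str                 # an intention from I
    rhs: Tuple[str, ...]     # intentions from I and/or behaviors from B

@dataclass
class SCCFG:
    intentions: Set[str]                          # I (non-terminals)
    behaviors: Set[str]                           # B (terminals)
    productions: List[Production]                 # P
    start: str                                    # S, the top-level intention
    regions: Set[str]                             # R
    relations: Dict[str, Set[Tuple[str, str]]]    # SR: name -> r ⊆ R × R
    grounding: Set[Tuple[Production, str]] = field(default_factory=set)  # GC ⊆ P × R
    # NLC: (production, rhs index i, rhs index j, relation name)
    nonlocal_constraints: Set[Tuple[Production, int, int, str]] = field(default_factory=set)

# Rule (12) with an 'identical' constraint between its two right-hand symbols.
p12 = Production("HandleCache", ("SearchCards", "DiscussStrategy"))
g = SCCFG(
    intentions={"HandleCache", "SearchCards", "DiscussStrategy"},
    behaviors=set(),
    productions=[p12],
    start="HandleCache",
    regions={"cache1", "cache2"},
    relations={"identical": {("cache1", "cache1"), ("cache2", "cache2")}},
)
g.nonlocal_constraints.add((p12, 0, 1, "identical"))
```

A parser would check, for each applied rule, that the regions bound to the constrained right-hand symbols stand in the named relation.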
We introduce the grounding constraints to make SCCFG a real extension of
SGIS. However, we will not always need them, as in the CityPoker example.
The reason is that CityPoker-regions are typed according to their level in the
partonomy (cache part of cache region part of gameboard). With a SCCFG we
can rewrite the rules from Fig. 2 without spatial grounding in a specific region,
but with part of and identical relations, for instance for rules (5) and (12):
identical part of
HandleRegion → SelectCache GotoCache HandleCache
identical
HandleCache → SearchCards DiscussStrategy
SCCFGs obviously have a higher expressiveness than SGIS. We can express more
spatial relations than part of, and create a nesting of relations by applying the
production rules. In contrast to SGIS, the nesting of constraints is not neces-
sarily accompanied by a corresponding nesting of regions in the partonomy. The
example above for rule (5) shows that we could also infer new relations from
those we know (HandleCache must be part of SelectCache).
In principle, we could define an SCCFG for a non-partonomial spatial struc-
ture although this might make the model cognitively more demanding.
Fig. 4. Substitution (left) and adjoining (right) on a TAG (taken from [20, Fig. 2.2])
they support certain kinds of dependencies, including crossed and nested depen-
dencies. They are polynomially parsable and thus especially attractive for mobile
intention recognition.
Tree-Adjoining Grammars (TAG), first introduced in [22], are a MCSG with
an especially comprehensible way of modeling dependencies. The fundamental
difference to CFGs is that TAGs operate on trees, and not on strings. A good
introduction to TAG is given by Joshi and Schabes in [20]. They define TAG as
follows.
Definition 2. A Tree-Adjoining Grammar is defined as TAG = (NT, Σ, IT,
AT, S), where
– NT are non-terminals
– Σ are terminals.
– IT is a finite set of initial trees. In an initial tree, interior nodes are labeled
by non-terminals. The nodes on the frontier (leaf nodes) are labeled by either
terminals or non-terminals. A frontier node labeled with a non-terminal
must be marked for substitution. We mark substitution nodes with a ↓.
– AT is a finite set of auxiliary trees. In an auxiliary tree, interior nodes are
also labeled by non-terminals. Exactly one node at the frontier is the foot
node, marked with an asterisk ∗. The foot node must have the same label as
the root node. All other frontier nodes are either terminals or substitution
nodes, as in the initial trees.
– S is a distinguished non-terminal (starting symbol).
The two operations defined on TAGs are substitution and adjoining (see Fig. 4).
Adjoining is sometimes also called adjunction. Both operations work directly on
trees. Substitution is quite straightforward: we can place any initial tree (or any
tree that has been derived from an initial tree) headed with a symbol X into
a substitution node labeled with X↓. It is the adjoining operation that makes
TAGs unique: we can adjoin an auxiliary tree labeled with X into an interior
node of another tree with the same label. This operation works as follows: (1) we
remove the part of the tree which is headed by the interior node, (2) replace it
by the auxiliary tree, and (3) attach the partial tree which was removed in step 1
at the foot node. The language defined by a TAG is a set of trees. By traversing
a tree we can certainly also interpret it as a string, just like traversing a parse
tree of a CFG. If, just for a moment, we try to interpret the two operations as
operations on strings, we see that substitution just replaces a non-terminal by
a number of symbols. This is exactly like applying a production rule in a CFG.
Adjoining manipulates a string in a more intricate way: a part of the old string
(the terminals of the grey tree in Fig. 4) becomes surrounded by new strings to
the left and to the right (by the left- and right-hand sides of the X∗ in the auxiliary
trees).
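The three-step adjoining operation can be sketched on a minimal tree encoding. This is only an illustration of the operation (nested lists `[label, child, ...]` with `"X*"` marking the foot node are my own encoding, not the paper's):

```python
# Minimal sketch of TAG adjoining on nested-list trees [label, child, ...].
# A foot node is a leaf whose label ends in '*', e.g. ["X*"]. Illustrative
# only; real TAG implementations are considerably richer.
import copy

def adjoin(tree, path, aux):
    """Adjoin auxiliary tree `aux` at the interior node reached via `path`
    (a list of child indices); the node's label must match aux's root label."""
    tree = copy.deepcopy(tree)
    node = tree
    for i in path[:-1]:
        node = node[i]
    target = node[path[-1]] if path else tree
    assert target[0] == aux[0], "node label must match the auxiliary tree's root"
    # (1) detach the subtree headed by the interior node, (2) put a copy of
    # the auxiliary tree in its place, (3) re-attach the subtree at the foot.
    new_sub = copy.deepcopy(aux)
    _attach_at_foot(new_sub, target)
    if path:
        node[path[-1]] = new_sub
        return tree
    return new_sub

def _attach_at_foot(aux, subtree):
    """Find the foot node (label ending in '*') and replace it by `subtree`."""
    for i in range(1, len(aux)):
        child = aux[i]
        if isinstance(child, list):
            if child[0].endswith("*"):
                aux[i] = subtree
                return True
            if _attach_at_foot(child, subtree):
                return True
    return False
```

For example, adjoining the auxiliary tree `["X", ["b"], ["X*"], ["c"]]` at the `X` node of `["S", ["X", ["a"]]]` surrounds the old yield `a` with `b` on the left and `c` on the right, exactly the string-level effect described above.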
Joshi and Schabes later add to their definition of TAG the following Adjoining
Constraints: Selective Adjunction, Null Adjunction, and Obligatory Adjunction.
Every non-terminal in any tree may be constrained by one of these. Selective
Adjunction restrains the auxiliary tree that may be adjoined at that node to a
Fig. 5. Initial tree (α) and auxiliary trees (β and γ) in a SCTAG for CityPoker
set of auxiliary trees. Obligatory Adjunction does the same, but at the same time
forces us to adjoin at that node. Null Adjunction disallows any adjunction
at that node. These local constraints are important to write sensible grammars,
but will not be further discussed here due to our focus on non-local constraints.
A discussion of the formal properties of TAGs, the differences to other gram-
mars, a corresponding automaton, as well as parsing algorithms can be found in
a number of publications, e.g. [20,23,24]. For our use case it should be clear that
(1) we can easily rewrite any CFG as a TAG, (2) TAGs are more expressive than
CFGs, and (3) writing a TAG is not necessarily more complicated than writing
a CFG. Instead of writing a number of production rules, we just write a number
of trees.
Definition 3. A Spatially Constrained Tree-Adjoining Grammar is defined as
SCTAG = (TAG, R, SR, GC, NLC), where
– TAG = (I, B, IT, AT, S), defined over intentions I, and behaviors B.
– R is a set of regions
– SR is a set of spatial relations, where each relation r ⊆ R × R
– GC ⊆ (IT ∪ AT ) × R is a set of grounding constraints
– NLC is a set of spatial non-local constraints. Each constraint has a type from
the spatial relations SR and is defined for two nodes in one tree from IT∪AT.
This definition applies the idea of spatial constraints to TAGs. The non-local
constraints are now defined between nodes in initial/auxiliary trees. The idea of
specifying non-local dependencies in TAG is not new. In earlier work on TAGs,
Joshi describes this concept as ‘TAGs with links’ [23, Section 6.2].
During the operations of substitution and adjoining the non-local constraints
remain in the tree, and become stretched if necessary. Adjoining may also lead to
[Fig. 6: derived trees for (γ adj α) over Play, Continue, GotoRegion, HandleRegion, and RevisitRegion, with crossed non-local 'identical' constraints]
cross-dependencies like those needed for modeling the crossed return to region pattern.
Figure 5 lists part of a SCTAG that handles the re-visiting of cache regions in
CityPoker. Non-local spatial constraints are displayed as dotted lines. A complete
grammar for this use case would convert all context-free rules from Fig. 2 to trees
and add them to the grammar. This step is trivial. Figure 6 demonstrates how
cross-dependencies evolve through two adjoining operations.
quite high. Joshi presents a TAG parser that adopts the idea of Earley and
improves the average case complexity [20].
We build the parsers for SCCFG and SCTAG on these Earley-like parsers.
Earley parsers work on a chart in which the elementary constructs of the gram-
mar are kept, production rules for CFGs, trees for TAGs. A dot in each of these
chart entries marks the position up to which this construct has been recognized.
In Joshi's parser the 'Earley dot' traverses trees and not strings. Earley parsers
work in three steps: scan, predict, and complete. Predict checks for possible
derivations and adds them to a chart. Scan reads the next symbol from the
stream and matches it with the chart entries. Complete passes the recognition
of rules up the tree until finally we have recognized the starting symbol. The
TAG parser has a fourth operation, called 'adjoin', to handle adjunction.
Our point is that adding spatial constraints to such a parser will not make it
slower but faster. The reason is that spatial constraints give us more predictive
information. ‘Any algorithm should have enough information to know which
tokens are to be expected after a given left context’ [20, p.36]. Knowing the
spatial context of left-hand terminals we can throw away those hypotheses that
are not consistent with the spatial constraints. We add this step after each scan
operation.
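The pruning step added after each scan can be sketched as a filter over chart hypotheses. Everything below (the `Hypothesis` shape, the encoding of bindings and constraints) is a hypothetical illustration of the idea, not the paper's parser:

```python
# Illustrative sketch of spatial pruning after an Earley 'scan' step.
# A hypothesis is a partially recognized rule plus region bindings for the
# right-hand symbols matched so far. All names here are hypothetical.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Hypothesis:
    rule: str                               # e.g. "HandleCache -> SearchCards DiscussStrategy"
    bindings: Tuple[Tuple[int, str], ...]   # (rhs index, region) pairs seen so far

def consistent(hyp, constraints, relations):
    """Check every non-local constraint (i, j, relname) of the hypothesis's
    rule whose two symbols are already bound against relname ⊆ R × R."""
    bound = dict(hyp.bindings)
    lhs = hyp.rule.split(" ->")[0]
    for i, j, rel in constraints.get(lhs, []):
        if i in bound and j in bound and (bound[i], bound[j]) not in relations[rel]:
            return False
    return True

def prune(chart, constraints, relations):
    """The extra step after 'scan': drop spatially inconsistent hypotheses."""
    return [h for h in chart if consistent(h, constraints, relations)]
```

Because inconsistent hypotheses are discarded as soon as their second constrained symbol is bound, the predict and complete steps operate on a smaller chart, which is why the spatial constraints speed parsing up rather than slowing it down.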
4 Related Work
We started this paper by saying that approaches for intention recognition dif-
fer in the way the domain and possible intentions are represented. A number
of formalisms have been proposed for modeling the mental state of an agent,
ranging from finite state machines [26] to complex cognitive modeling architec-
tures, like the ACT-R architecture [27]. With our formal grammars, which are
between these two extremes, we try to keep the balance between expressiveness
and computational complexity.
Using formal grammars to describe structural regularities is common, not
only in NLP, but also in areas like computer vision [28], and action recognition
[29]. Pynadath’s state dependent grammars constrain the applicability of a rule
dependent on a general state variable [17]. The generality of this state variable
leads to an explosion in symbol space when a parsing algorithm is applied,
so an inference mechanism is chosen instead which translates the grammar into a
Dynamic Bayes Network (DBN).
Choosing a grammatical approach means using grammars not only for syntax
description, but implicitly assigning a certain semantics (in terms of intentions
and plans). Linguistics is also concerned with semantics, both on the sentence
level and on the level of discourse. Webber et al. [30], as one example of the
literature on discourse semantics, argue that multiple, possibly overlapping, se-
mantic relations are common in discourse semantics. By using (lexicalized) TAG
they describe these relations without the need for building multiple trees.
Acknowledgements
I would like to thank Klaus Stein for the discussions on the algorithmic possibil-
ities of SCTAG parsing. Christoph Schlieder’s motivating and constant support
of my PhD research made this work possible.
References
1. Schmidt, C., Sridharan, N., Goodson, J.: The plan recognition problem: An in-
tersection of psychology and artificial intelligence. Artificial Intelligence 11(1-2),
45–83 (1978)
2. Kautz, H.A.: A Formal Theory of Plan Recognition. PhD thesis, University of
Rochester, Rochester, NY (1987)
3. Jarvis, P.A., Lunt, T.F., Myers, K.L.: Identifying terrorist activity with ai plan-
recognition technology. AI Magazine 26(3), 73–81 (2005)
4. Bui, H.H.: Efficient approximate inference for online probabilistic plan recogni-
tion. Technical Report 1/2002, School of Computing Science, Curtin University of
Technology, Perth, WA, Australia (2002)
5. Baus, J., Krueger, A., Wahlster, W.: A resource-adaptive mobile navigation system.
In: Proc. 7th International Conference on Intelligent User Interfaces, San Francisco,
USA, pp. 15–22. ACM Press, New York (2002)
6. Seifert, I., Barkowsky, T., Freksa, C.: Region-Based Representation for Assistance
with Spatio-Temporal Planning in Unfamiliar Environments. In: Location Based
Services and TeleCartography, pp. 179–192. Springer, Heidelberg (2007)
7. Geib, C.W., Steedman, M.: On natural language processing and plan recognition.
In: Proceedings of the 20th International Joint Conference on Artificial Intelligence
(IJCAI), pp. 1612–1617 (2007)
28. Chanda, G., Dellaert, F.: Grammatical methods in computer vision: An overview.
Technical Report GIT-GVU-04-29, College of Computing, Georgia Institute of
Technology, Atlanta, GA, USA (November 2004),
ftp://ftp.cc.gatech.edu/pub/gvu/tr/2004/04-29.pdf
29. Bobick, A., Ivanov, Y.: Action recognition using probabilistic parsing. In: Proc. of
the Conference on Computer Vision and Pattern Recognition, pp. 196–202 (1998)
30. Webber, B., Knott, A., Stone, M., Joshi, A.: Discourse relations: A structural
and presuppositional account using lexicalised tag. In: Proc. of the 37th. Annual
Meeting of the American Association for Computational Linguistics (ACL1999),
pp. 41–48 (1999)
31. Charniak, E., Goldman, R.P.: A bayesian model of plan recognition. Artificial
Intelligence 64(1), 53–79 (1993)
32. Liao, L., Patterson, D.J., Fox, D., Kautz, H.: Learning and inferring transportation
routines. Artificial Intelligence 171(5-6), 311–331 (2007)
33. Bui, H.H.: A general model for online probabilistic plan recognition. In: Proceedings
of the International Joint Conference on Artificial Intelligence (IJCAI) (2003)
34. Brandherm, B., Schwartz, T.: Geo referenced dynamic Bayesian networks for user
positioning on mobile systems. In: Strang, T., Linnhoff-Popien, C. (eds.) LoCA
2005. LNCS, vol. 3479, pp. 223–234. Springer, Heidelberg (2005)
35. Ashbrook, D., Starner, T.: Using gps to learn significant locations and predict
movement across multiple users. Personal and Ubiquitous Computing 7(5), 275–
286 (2003)
36. Gottfried, B., Witte, J.: Representing spatial activities by spatially contextu-
alised motion patterns. In: RoboCup 2007, International Symposium, pp. 329–336.
Springer, Heidelberg (2007)
37. Samaan, N., Karmouch, A.: A mobility prediction architecture based on contextual
knowledge and spatial conceptual maps. IEEE Transactions on Mobile Comput-
ing 4(6), 537–551 (2005)
38. Musto, A., Stein, K., Eisenkolb, A., Röfer, T., Brauer, W., Schill, K.: From motion
observation to qualitative motion representation. In: Habel, C., Brauer, W., Freksa,
C., Wender, K.F. (eds.) Spatial Cognition 2000. LNCS (LNAI), vol. 1849, pp. 115–
126. Springer, Heidelberg (2000)
39. Laube, P., van Kreveld, M., Imfeld, S.: Finding REMO – detecting relative motion
patterns in geospatial lifelines. In: Developments in Spatial Data Handling, Pro-
ceedings of the 11th International Symposium on Spatial Data Handling, pp. 201–
215 (2004)
40. Steedman, M.: Plans, affordances, and combinatory grammar. Linguistics and Phi-
losophy 25(5-6), 725–753 (2002)
Modeling Cross-Cultural Performance on the Visual
Oddity Task
1 Introduction
A central problem in studying spatial cognition is representation. Understanding and
modeling the visual representations people construct for the world around them is a
difficult challenge for cognitive science. Dehaene and colleagues [7] made important
progress on this problem by designing a study which directly tests what features peo-
ple represent when they look at geometric figures in a visual scene. Their study util-
ized the Oddity Task methodology: participants were shown an array of six images
and asked to pick the image that did not belong (e.g., see Fig. 1). By varying the diag-
nostic spatial feature, i.e., the feature that distinguished one image from the other five,
they were able to test which features their participants were capable of representing
and comparing.
Dehaene and colleagues ran their study on multiple age groups within two popula-
tions: Americans and the Mundurukú, an indigenous group in South America. They
found that while the Americans performed better overall, the Mundurukú appeared to
be capable of encoding the same spatial features. The Mundurukú performed above
chance on nearly all of the 45 problems, and their pattern of errors correlated highly
with the American pattern of errors. Dehaene concluded from the results that many
spatial features are universal in human representation. However, several questions
remain: (1) What makes one problem harder than another? (2) Why is it that, despite
the high correlation between population groups, some problems seem especially hard
for Americans, while other problems seem especially hard for the Mundurukú? (3) To
what extent can questions 1) and 2) be answered in terms of the process of encoding
representations, versus the process of operating over those representations to solve
problems?
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 378–393, 2008.
© Springer-Verlag Berlin Heidelberg 2008
[Fig. 1: panels A–F]
This paper presents a cognitive model designed to explore these questions. Our
model is based upon two core claims about spatial cognition: (1) When people encode
a visual scene, they focus on the qualitative attributes and relations of the objects in
the scene [11]. This provides them with a more abstract, more robust representation
than one filled with quantitative details about the scene. (2) People compare low-level
visual representations using the same mapping process used to perform abstract
analogies. Our model of comparison is based on Gentner’s [14] structure-mapping
theory of analogy.
Our model uses four components to simulate the oddity task from end-to-end. We
use a modified version of CogSketch1 [13], a sketch understanding system, to auto-
matically construct qualitative representations of sketches and other two-dimensional
stimuli. We use the Structure-Mapping Engine (SME) [8], a computational model of
structure-mapping theory, to model comparison and similarity judgments. We use
two additional components based on structure-mapping theory: MAGI [9], which
models symmetry detection, and SEQL [18], which models analogical generalization.
Using this approach, we have modeled human performance on geometric analogy
problems [25] (problems of the form “A is to B as C is to …?”); a subset of the Ra-
ven’s Progressive Matrices [20], a visually-based intelligence test; and basic visual
comparison tasks [19,21]. However, the Dehaene task offers a unique opportunity in
that it was designed to isolate specific spatial features and check for their presence or
absence in one’s representation.
This paper presents our cognitive model of performance on the Oddity Task and
uses it to study factors that contribute to difficulty on the task. In comparing the
model with human results, we focus on two population groups: American children
aged 8-13, and the full set of Mundurukú of all ages. We consider these groups because
their overall performance on the 45 problems in the Dehaene study was comparable:
75% for the Americans and 67% for the Mundurukú. We provide evidence for what
might distinguish these groups from each other via ablation studies using the model.
1 Publicly available at http://www.spatialintelligence.org/projects/cogsketch_index.html
380 A. Lovett, K. Lockwood, and K. Forbus
We begin by briefly reviewing the components of our model. We then show how
these component models are combined in our overall model of the Oddity Task. We
analyze the results produced by running the model on the 45 problems from the origi-
nal study, and use ablation studies to explore possible explanations for performance
differences between the two groups. We close with a discussion of related and future
work.
2.2 MAGI
MAGI [9] is a model of symmetry detection based upon SME. Essentially, it identi-
fies symmetry in a representation by comparing the representation to itself, while
avoiding perfect self-matches. MAGI is important in modeling spatial cognition be-
cause it is often necessary to identify axes of symmetry in a visual scene, or in a spe-
cific object.
2.3 SEQL
SEQL [18] is a model of analogical generalization. SEQL is based upon the idea that
individuals learn generalizations for categories through a process of progressive
alignment [16], in which instances of a category are compared and the commonalities
are abstracted out as a direct result of the comparison. Given a set of cases, SEQL can
build one or more generalizations from them by comparing them via SME and elimi-
nating the structure that fails to align between cases, leaving only the structure that is
common across all the cases in the generalization. Because the generalization is in the
same form as individual case representations, new cases can be compared to the gen-
eralization to measure their similarity to a category.
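A drastically simplified version of this idea treats each case as a set of relational facts and a generalization as what survives intersection across all cases. This toy sketch is mine, not SEQL's: real SEQL aligns structure via SME rather than intersecting literal facts, but the shape of the computation is similar:

```python
# Toy sketch of analogical generalization as the facts common to all cases.
# Illustrative only: SEQL aligns structure via SME, it does not intersect
# literal facts like this.

def generalize(cases):
    """Keep only the facts that appear in every case."""
    common = set(cases[0])
    for case in cases[1:]:
        common &= set(case)
    return common

def similarity(generalization, case):
    """Score a new case by the fraction of generalized facts it contains."""
    if not generalization:
        return 0.0
    return len(generalization & set(case)) / len(generalization)

# Three "images", each a set of relational facts:
cases = [
    {("connected", "e1", "e2"), ("perpendicular", "e1", "e2")},
    {("connected", "e1", "e2"), ("perpendicular", "e1", "e2"), ("same-length", "e1", "e2")},
    {("connected", "e1", "e2"), ("perpendicular", "e1", "e2")},
]
g = generalize(cases)   # same-length is abstracted out: not common to all
```

New cases can then be scored against `g` with `similarity`, mirroring how a generalization, being in the same form as a case, can be compared to new cases.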
3.1 CogSketch
CogSketch [13] is a sketch understanding system based upon the nuSketch [12] archi-
tecture. Users sketch a series of glyphs, or objects in a sketch. CogSketch then
computes a number of qualitative spatial relations between the glyphs, building up a
structural representation of the sketch that corresponds to the shape representation.
CogSketch can also decompose a glyph into its component edges and construct a
representation of the qualitative relations between the glyph’s edges. This corresponds
to the edge representation.
Many of the spatial relations in the shape representation (e.g., relative position,
containment) are computed based on the relative position and topology of the glyphs.
However, some shape relations can only be computed by first decomposing a glyph
into its edges and constructing the glyph's edge representation. By comparing two
glyphs' edge representations using SME, CogSketch can identify the corresponding
edges in the two glyphs' shapes. These correspondences can be used to determine
whether the two glyphs are the same shape, and whether one glyph's shape is a trans-
formation of the other (e.g., a rotation or a reflection). Furthermore, a glyph’s edge
representation can be compared to itself via MAGI to identify axes of symmetry.
In order to model the Oddity Task, we examined the Dehaene [7] stimuli and identi-
fied a set of qualitative attributes and relations that appeared to be important for solv-
ing the problems. All attributes and relations had to be among those that could be
computed automatically by CogSketch.
Table 1 summarizes qualitative attributes and relations for the edge representa-
tions. Many relations are based on corners between edges. The other relations can
only hold for edges that are not connected by a corner along the shape.
Table 2 summarizes attributes and relations for shapes. Empty/filled is a simplifica-
tion of shape color; it refers to whether the shape has any fill color. Frame-of-
Reference relations are used when a smaller shape is located inside a larger, symmetric
shape (e.g., a circle). The inner shape's location is described in terms of which quadrant
of the larger shape it is located in; additionally, the inner shape may lie exactly along
the larger shape’s axes of symmetry. Shape-proximity-group refers to shapes grouped
together based on the Gestalt law of proximity [26]. Currently, grouping by proximity
is only implemented for circles.
Line/Line and Line/Point relations apply only to special shape types. Line/Line re-
lations are for shapes that are simple, straight lines (thus these relations are a subset of
the edge relations). Line/Point relations are for when a small circle lies near a line.
The centered-on relation applies when the circle lies at the center of the line. This
relation is essentially a special case of the frame-of-reference relation for a dot lying
at the center of a circle.
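As a rough illustration of such a frame-of-reference relation, the quadrant of an inner shape relative to a symmetric outer shape could be computed like this (purely illustrative; the function name, coordinate convention, and tolerance are assumptions, and CogSketch's actual computation is not given in the paper):

```python
# Toy frame-of-reference relation: which quadrant of a symmetric outer
# shape (e.g. a circle) does an inner point fall in, or does it lie on an
# axis of symmetry? Illustrative only; not CogSketch's implementation.

def frame_of_reference(center, point, tol=1e-9):
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    if abs(dx) < tol and abs(dy) < tol:
        return "at-center"          # e.g. a dot at the center of a circle
    if abs(dx) < tol:
        return "on-y-axis"          # exactly on the vertical symmetry axis
    if abs(dy) < tol:
        return "on-x-axis"          # exactly on the horizontal symmetry axis
    if dx > 0:
        return "upper-right" if dy > 0 else "lower-right"
    return "upper-left" if dy > 0 else "lower-left"
```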
Axes of symmetry, same-shape, rotation-between, and reflection-between are all
computed by comparing shapes’ edge representations, as described above. Reflections
are classified as X-Axis-Reflections, Y-Axis-Reflections, and Other-Reflections.
Our model of the oddity task is based on the following theoretical claims:
1) People encode qualitative, structural representations of visual scenes and use
these representations to perform visual tasks.
2) For a given problem, people will focus on a particular representational level
(either the shape level or the edge level) in solving that problem.
3) Qualitative spatial representations are compared via structure-mapping, as
implemented in SME.
4) People will identify the common features across a set of images via analogi-
cal generalization, as implemented in SEQL.
Note that these claims are general enough to apply to many spatial tasks. However,
they are not detailed enough to fully specify how any task would be completed. Thus,
384 A. Lovett, K. Lockwood, and K. Forbus
our model attempts to pick out the image that does not belong by performing a series
of Generalize/Compare trials. In each trial, the system constructs a generalization
from a subset of the images in the array (either the top three or the bottom three). This
generalization represents what is common across all of these images. For example,
consider the right-angled triangle problem (Fig. 1, Problem A). The generalization
built from the three top images will describe three connected edges, with two of the
edges being perpendicular. In the rightmost top image, the two perpendicular edges
form an edges-same-length-corner, but this relation will have been abstracted out
because it is not common to all three images.
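The abstraction step can be approximated as keeping only the facts shared by every image. This is a deliberate simplification: SEQL actually generalizes via structure mapping, aligning entities across descriptions, whereas plain set intersection ignores correspondence between entities.

```python
def generalize(fact_sets):
    # Simplified stand-in for SEQL's generalization: keep only the
    # qualitative facts present in every image's description.
    common = set(fact_sets[0])
    for facts in fact_sets[1:]:
        common &= set(facts)
    return common
```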
The generalization is then compared to each of the other images in the array, using
SME. The model examines the similarity scores for the three images, looking for a
particular pattern of results: two of the images should be quite similar to the generali-
zation, while the third image, lacking a key feature, should be less similar. In this
case, the lower middle triangle will be less similar to the generalization because it
lacks a right angle.
Similarity is based on SME’s structural evaluation score, but it must be normalized.
There are two different ways to normalize it: Similarity scores can be normalized based
only on the size of the generalization (gen-normalized). This score measures how much
of the generalization is present in the image being compared. This measure is ideal for
noticing whether an image lacks some feature of the generalization.
Alternatively, similarity scores can be normalized based on both the size of the
generalization and the size of the image’s representation (fully-normalized). This
score measures both how much of the generalization is present in the image and how
much of the image is present in the generalization. While more complex than gen-
normalized scores, fully-normalized scores are necessary for noticing an odd image
out that possesses an extra qualitative feature that the other images lack. For exam-
ple, it allows the model to pick out the image with parallel lines from the other five
images without parallel lines (Fig. 1, Problem C).
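Assuming size is measured in expressions, the two normalizations might be sketched as follows; the exact formulas are not given in the text, so the denominators here are illustrative:

```python
def gen_normalized(raw_score, gen_size):
    # How much of the generalization is present in the image.
    return raw_score / gen_size

def fully_normalized(raw_score, gen_size, image_size):
    # Symmetric measure: also penalizes extra structure in the image
    # that the generalization lacks (assumed form of the denominator).
    return raw_score / ((gen_size + image_size) / 2.0)
```

An image with extra qualitative features has a larger `image_size`, so its fully-normalized score drops even when the whole generalization is present.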
In each Generalize/Compare trial, the model must make three choices. The first is
which subset of the images to generalize over (either the top three images or the
bottom three). The second is whether to use gen-normalized or fully-normalized
similarity scores. The third is whether to use edge representations or shape representa-
tions—recall that we are predicting that edge representations and shape representa-
tions will never be combined in a single comparison.
These choices are made via the following simple control mechanism: (1) To en-
sure that the results are not dependent on the order of the images in the array, trial
runs are attempted in pairs, one based on generalizing from the top three images and
one based on generalizing from the bottom three images. (2) Because the gen-
normalized similarity score is simpler, it is always attempted first. (3) The model
Modeling Cross-Cultural Performance on the Visual Oddity Task 385
chooses whether to use edge or shape representations based on the makeup of the first
image. If the image contains multiple shapes, or if the image contains an elliptical
shape consisting of only a single edge (e.g., a circle), then a shape representation is
used. Otherwise, an edge representation is used. Note, however, that an edge repre-
sentation will be quickly abandoned if it is impossible to find a good generalization
across images, as indicated by different images having different numbers of edges.
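This heuristic can be stated compactly; the function name and inputs are our own abstraction, not CogSketch's interface:

```python
def choose_representation(num_shapes, single_edge_ellipse):
    # Heuristic from the text: multiple shapes, or a one-edge elliptical
    # shape such as a circle, triggers the shape level; otherwise start
    # at the edge level (which may later be abandoned if edge counts
    # differ across images).
    if num_shapes > 1 or single_edge_ellipse:
        return "shape"
    return "edge"
```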
After the initial pair of trials is run, the model looks for a sufficient candidate. Re-
call that each Generalize/Compare run produces three similarity scores for the three
images that have been compared to the generalization. A sufficient candidate is cho-
sen when the lowest-scoring image has a similarity score noticeably lower than the
other two (< 95% of the second lowest-scoring image), meaning the image is noticea-
bly less similar to the generalization.
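A minimal sketch of this test, abstracting images to a dictionary of normalized similarity scores:

```python
def sufficient_candidate(scores):
    # scores: mapping from image id to normalized similarity with the
    # generalization. The odd image out must score noticeably lower
    # than the next-lowest image (< 95% of its score).
    ranked = sorted(scores, key=scores.get)
    lowest, second = ranked[0], ranked[1]
    if scores[lowest] < 0.95 * scores[second]:
        return lowest
    return None
```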
In cases where a sufficient candidate is not found, the model will attempt addi-
tional trials. (1) If the model was previously run using edge representations, it will try
using shape representations. (2) The model will try using a fully-normalized similar-
ity score, to see if the odd image out possesses an extra feature. At this point, if no
sufficient candidate has been identified, the model gives up (this is the equivalent of a
person guessing randomly, but we do not allow the model to make such guesses).
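The overall fallback sequence can be sketched as nested loops over the three choices. Here `run_trial` is a hypothetical stand-in for one full Generalize/Compare run, and the exact interleaving of fallbacks is our simplification of the control mechanism described above:

```python
def solve_oddity(problem, run_trial):
    # run_trial(problem, subset, rep, norm) -> odd image or None.
    # Ordering follows the text: gen-normalized before fully-normalized,
    # edge representations before shape, and top/bottom pairs of trials.
    for norm in ("gen-normalized", "fully-normalized"):
        for rep in ("edge", "shape"):
            for subset in ("top", "bottom"):
                result = run_trial(problem, subset, rep, norm)
                if result is not None:
                    return result
    return None  # the model gives up rather than guessing randomly
```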
5 Simulation
We evaluated our model by running it on the 45 problems from the Dehaene [7]
study. The original stimuli, in the form of PowerPoint slides, were copied and pasted
into CogSketch, which automatically converted each PowerPoint shape into a glyph.
Four of the 45 problems were touched up in PowerPoint to ease the transition—lines
or polygons that had been drawn as separate parts and then grouped together were
redrawn as a single shape. Five additional problems were modified after being pasted
into CogSketch. In all five cases, we removed simple edges which had been added to
the images of the problem to help illustrate an angle or reflection to which partici-
pants were meant to attend. Because the model was unable to understand the informa-
tion these lines were meant to convey, they would have served only as distracters.
Aside from the changes to these nine problems, no changes were made to the stimuli
which had been run on human participants.
In analyzing the results, we consider first the model’s overall accuracy, including
the correlation between its performance and that of both the American participants
and the Mundurukú participants. We then use the model to identify four factors that
could contribute to problem difficulty. We examine the correlation between these
factors and human performance on the subset of problems that are correctly solved by
the model.
Our model correctly solves 39/45 problems. Note that chance performance would be
7.5/45. Furthermore, there is a strong correlation between the model’s performance and
the performance of the human participants. Table 3 shows the Pearson correlation coef-
ficient between the model and each of the human populations. As the table shows, the
model correlates better with the American participants. However, there is also a high
correlation with the Mundurukú participants. The coefficient of determination, which is
computed by squaring the correlation coefficient, indicates the percentage of the vari-
ance in one variable which is accounted for by another. In this case, the coefficient of
determination between the model and the Mundurukú participants is (.493² ≈ .243),
meaning the model accounts for about ¼ of the variance in the performance of the
Mundurukú participants.
Table 3. Correlations between the model and the American and Mundurukú participants
             Americans   Mundurukú
Americans        *          .758
Mundurukú      .758          *
Fig. 2 plots the performance of the two populations and the model. As the figure
shows, the six problems on which the model fails are among the hardest for both
populations. The one clear exception is problem 21 (see Fig. 3). Although the model
fails on this problem, the Mundurukú performed quite well on it (86% accuracy).
Fig. 2. Performance of Americans, Mundurukú, and our model on the Oddity Task
Discussion. Fig. 3 shows the six problems which our model fails to solve. As the
percentages show, these problems were for the most part quite difficult for both the
Americans and the Mundurukú, with performance on some problems little or no
higher than chance (17%).
Fig. 3. The six problems the model fails to solve. Above each problem, the average accuracies
for the Americans and the Mundurukú are listed, respectively, followed by the number of the
correct answer.
Overall, these six problems can be roughly broken down into three categories
based on what is required to solve them. First, problem 22 requires encoding whether
the dot lies along the axes of the asymmetric quadrilateral. Our model simply does not
encode this relation—nor, it appears, do Americans, as they actually fall below
chance on this problem. Interestingly, the Mundurukú are well above chance; at this
time, it is difficult to say why they are better at solving this problem.
We analyzed problem difficulty on the 39 problems that the model correctly solves.
We used the model to identify four factors that could contribute to difficulty. For this
paper, we focus on factors related to encoding the stimuli. The factors are:
(1) Shape Comparison: Some problems (e.g., Fig. 1, Problem D) require construct-
ing edge representations of two shapes and comparing them in order to identify a
relation between the shapes (e.g., a rotation or a reflection). This may be difficult
because it involves switching between the edge and shape representations, and be-
cause it requires conducting an additional comparison with SME before one begins
comparing the six images.
(2) Shape Symmetry: Some problems (e.g., Fig. 1, Problem E) require comparing a
shape’s edge representation to itself, via MAGI, in order to identify an axis of sym-
metry. This could be difficult for similar reasons.
(3) Shape Decomposition: Several problems (e.g., Fig. 1, Problem A) require de-
composing shapes into edges in order to represent each image at the edge representa-
tion level. It is possible that this will be difficult for individuals because there may be
a temptation to consider closed shapes only at the shape representation level.
(4) Shape Grouping: A couple of problems (e.g., Fig. 1, Problem F) require grouping
shapes together based on the Gestalt rule of proximity. Normally, one would assume
this was easy, but preliminary analysis indicated it might be difficult for the Mundu-
rukú participants.
We used the model to produce a measure for each difficulty factor on each problem
via ablation; for example, we ran the model with the ability to conduct shape com-
parisons turned off in order to identify the problems on which shape comparisons
were required. We then attempted to find a difficulty function, based on the four fac-
tors, which correlated highly with each of the human populations. This was done by
performing an exhaustive search over all possible linear weights for the four factors in
the range of 0 to 15.
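The exhaustive search can be sketched as follows, under our own assumptions that difficulty is a weighted sum of 0/1 ablation indicators and that a candidate function is scored by its Pearson correlation with per-problem human error rates:

```python
import itertools
import statistics

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length sequences.
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def best_weights(factor_matrix, error_rates, max_w=15):
    # factor_matrix: one row per problem, four 0/1 ablation indicators.
    # Exhaustively tries all integer weight vectors in [0, max_w]^4.
    best_w, best_r = None, -2.0
    for w in itertools.product(range(max_w + 1), repeat=4):
        difficulty = [sum(wi * fi for wi, fi in zip(w, row))
                      for row in factor_matrix]
        if len(set(difficulty)) < 2:
            continue  # correlation undefined for a constant function
        r = pearson(difficulty, error_rates)
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r
```

With four factors and weights 0–15 this is only 16⁴ = 65,536 candidate weight vectors, so brute force is practical.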
Results. The optimal difficulty function for the American participants is shown in
Table 4 (the weight for each factor is normalized based on the size of the largest
weight). In addition to the weight of each factor, the table shows the individual
contribution of each factor to the correlation between the function and human per-
formance. This was computed by removing a factor from the difficulty function and
considering the drop in the function’s correlation with the human population.
As Table 4 shows, the difficulty function had an overall correlation of .667 with
the American participants. This means that the function explains (.667² ≈ 44%) of the
variance in human performance on the 39 problems. Most of the contribution to this
correlation comes from shape comparison and shape symmetry. It appears that the
American participants had a great deal of difficulty with problems that required de-
composing shapes into edges and comparing the edge representations to identify rela-
tions between shapes, or symmetry within a single shape. Shape decomposition also
contributed to the correlation, suggesting that the participants had some difficulty
with the problems requiring focusing on the edge representations of closed shapes.
Table 4. Relative contribution of factors to our difficulty function for American performance
The optimal difficulty function for the Mundurukú participants is shown in Table 5.
This difficulty function had a correlation of .637 with the human data, indicating it ac-
counts for (.637² ≈ 41%) of the variance in the Mundurukú performance. By far, the
most important factor was shape comparison. The other contributing factor was shape
grouping, suggesting that the Mundurukú participants might have some difficulty with
problems requiring grouping elements together based on proximity. This is surprising,
as Gestalt grouping is generally thought to be a basic, low-level operation. Note that the
Mundurukú participants had no trouble with problems requiring estimating relative
distances, as indicated by their high performance on problem 21 (Fig. 3).
Table 5. Relative contribution of factors to our difficulty function for Mundurukú performance
Table 6 shows the correlation between each difficulty function and each population
group. As expected, each difficulty function correlates far better with the population
group for which it was built. The fact that there is still a relatively high correlation
between the American function and the Mundurukú performance, and between
the Mundurukú function and the American performance, most likely results from the
fact that both groups have a great deal of trouble with problems requiring shape
comparison.
Discussion. One of our original goals was to use the model to identify differences
between the two populations. Our two difficulty functions appear to have accom-
plished this. The difficulty function for American participants suggests that they tend
to encode images holistically. They tend to have trouble when a problem requires
breaking a shape down into its edge representation. This may be because the aca-
demic training in basic shapes encourages Americans to look at shapes as a whole,
rather than explicitly considering the individual edges that make up a shape. The
Mundurukú participants, in contrast, appear to encode stimuli more analytically. They
are better able to consider shapes in terms of their component edges; most noticeably,
they are better at using a shape’s edges to identify axes of symmetry. However, they
had difficulty seeing groups of shapes holistically in this task.
6 Related Work
Several AI systems have been constructed to explore visual analogy. Croft and Tha-
gard’s DIVA [5] uses a 3D scene graph representation from computer graphics as a
model of mental imagery. That is, the system “watches” animation in the computer
graphics system in order to perceive its mental imagery. Analogy is carried out via a
connectionist network over the hierarchical structure of the scene graph. DIVA’s
initial inputs, unlike ours, are generated by hand. Its background knowledge is also
hand-generated specifically for that simulation, unlike our use of the same knowl-
edge base across many simulation systems and experiments. DIVA has only been
tested on a handful of examples, and to the best of our knowledge, has not been used
to model specific psychological findings. Davies and Goel’s Galatea [6] uses a small
vocabulary of primitive visual elements (line, circle, box) plus a set of visual trans-
formations over them (e.g., move, decompose) to describe base and target descriptions,
and uses a copy/substitution algorithm to model analogy, carrying sequences of
transformations from one description to the other. All of Galatea's inputs are hand-
generated, as is its background knowledge, and it has only been tested on a few examples.
7 Discussion
We have described a model of the Oddity task, using CogSketch to automatically
encode stimuli in terms of qualitative spatial representations, MAGI to detect symme-
try, and SME and SEQL to carry out the task itself. We showed that this combination
of modules can achieve behavior comparable to the participants in Dehaene et al.'s
study of American and Mundurukú performance on the same stimuli. Furthermore,
we were able to provide some evidence about possible causes for performance differ-
ences between the groups, through statistical analysis of ablation experiments on the
model.
We find these results quite exciting on their own, but they are also part of a larger
pattern. That is, similar combinations of qualitative representations and analogical
processing have already been used to model a variety of visual processing tasks
[19,20,25]. This study lends further evidence for our larger hypotheses, that (1) quali-
tative attributes and relations are central to human visual encoding and (2) people
compare low-level visual representations using the same mapping process they use for
abstract analogies. The study also lends support to the proposal that (3) comparison
operations are performed using either a shape representational focus or an edge repre-
sentational focus.
We plan to pursue two lines of investigation in future work. First, this paper fo-
cused on difficulties related to encoding. Our model suggests difficulties involving
comparisons may also be implicated. For example, a problem might be harder because
the six images in the array are less similar, making alignment and generalization pro-
duction more difficult. We plan to explore how well aspects of the comparison process
can explain the variance. Of particular interest is whether their contributions are uni-
versal, or whether there will be cultural differences. Second, we plan on using these
analyses to construct more detailed models of specific groups performing this task (i.e.,
children and adults, as well as both cultures). Comparing these models to each other,
and to models of similar spatial tasks, could help identify general processing con-
straints on such tasks. This may shed light on how universal human spatial representa-
tions and reasoning are, both across cultures and across tasks.
Acknowledgements
This work was supported by NSF SLC Grant SBE-0541957, the Spatial Intelligence
and Learning Center (SILC). We thank Elizabeth Spelke for providing the original
oddity task stimuli.
References
1. Abravanel, E.: The Figure Simplicity of Parallel Lines. Child Development 48(2), 708–710
(1977)
2. Appelle, S.: Perception and Discrimination as a Function of Stimulus Orientation: The
Oblique Effect in Man and Animal. Psychological Bulletin 78, 266–278 (1972)
3. Bhatt, R., Hayden, A., Reed, A., Bertin, E., Joseph, J.: Infants’ Perception of Information
along Object Boundaries: Concavities versus Convexities. Experimental Child Psychol-
ogy 94, 91–113 (2006)
4. Biederman, I.: Recognition-by-Components: A Theory of Human Image Understanding.
Psychological Review 94, 115–147 (1987)
5. Croft, D., Thagard, P.: Dynamic Imagery: A Computational Model of Motion and Visual
Analogy. In: Magnani, L., Nersessian, N. (eds.) Model-based Reasoning: Science, Tech-
nology, Values, pp. 259–274. Kluwer/Plenum (2002)
6. Davies, J., Goel, A.K.: Visual Analogy in Problem Solving. In: Proceedings of the Interna-
tional Joint Conference on Artificial Intelligence, pp. 377–382 (2001)
7. Dehaene, S., Izard, V., Pica, P., Spelke, E.: Core Knowledge of Geometry in an Amazo-
nian Indigene Group. Science 311, 381–384 (2006)
8. Falkenhainer, B., Forbus, K., Gentner, D.: The Structure-Mapping Engine. In: Proceedings
of the Fifth National Conference on Artificial Intelligence (1986)
9. Ferguson, R.W.: MAGI: Analogy-Based Encoding Using Regularity and Symmetry. In:
Proceedings of the 16th Annual Conference of the Cognitive Science Society, pp. 283–288
(1994)
10. Forbus, K., Oblinger, D.: Making SME Greedy and Pragmatic. In: Proceedings of the
Cognitive Science Society (1990)
11. Forbus, K., Ferguson, R., Usher, J.: Towards a Computational Model of Sketching. In:
Proceedings of the 2001 Conference on Intelligent User Interfaces (IUI-2001) (2001)
12. Forbus, K., Lockwood, K., Klenk, M., Tomai, E., Usher, J.: Open-Domain Sketch Under-
standing: The nuSketch Approach. In: AAAI Fall Symposium on Making Pen-based Inter-
action Intelligent and Natural (2004)
13. Forbus, K., Usher, J., Lovett, A., Wetzel, J.: CogSketch: Open-Domain Sketch Under-
standing for Cognitive Science Research and for Education. In: Proceedings of the Euro-
graphics Workshop on Sketch-Based Interfaces and Modeling (2008)
14. Gentner, D.: Structure-Mapping: A Theoretical Framework for Analogy. Cognitive Sci-
ence 7(2), 155–170 (1983)
15. Gentner, D., Markman, A.B.: Structure Mapping in Analogy and Similarity. American
Psychologist 52, 42–56 (1997)
16. Gentner, D., Loewenstein, J.: Relational Language and Relational Thought. In: Amsel, E.,
Byrnes, J.P. (eds.) Language, Literacy, and Cognitive Development: The Development and
Consequences of Symbolic Communication. Lawrence Erlbaum Associates, Mahwah
(2002)
17. Huttenlocher, J., Hedges, L.V., Duncan, S.: Categories and Particulars: Prototype Effects
in Estimating Location. Psychological Review 98(3), 352–376 (1991)
18. Kuehne, S., Forbus, K., Gentner, D., Quinn, B.: SEQL: Category Learning as Progressive
Abstraction Using Structure Mapping. In: Proceedings of the 22nd Annual Meeting of the
Cognitive Science Society (2000)
19. Lovett, A., Gentner, D., Forbus, K.: Simulating Time-Course Phenomena in Perceptual
Similarity via Incremental Encoding. In: Proceedings of the 28th Annual Meeting of the
Cognitive Science Society (2006)
20. Lovett, A., Forbus, K., Usher, J.: Analogy with Qualitative Spatial Representations Can
Simulate Solving Raven’s Progressive Matrices. In: Proceedings of the 29th Annual Con-
ference of the Cognitive Science Society (2007)
21. Lovett, A., Sagi, E., Gentner, D.: Analogy as a Mechanism for Comparison. In: Proceed-
ings of Analogies: Integrating Multiple Cognitive Abilities (2007)
22. Markman, A.B., Gentner, D.: Commonalities and Differences in Similarity Comparisons.
Memory & Cognition 24(2), 235–249 (1996)
23. Mitchell, M.: Analogy-making as Perception: A Computer Model. MIT Press, Cambridge
(1993)
24. Palmer, S.E.: Hierarchical Structure in Perceptual Representation. Cognitive Psychol-
ogy 9(4), 441–474 (1977)
25. Tomai, E., Lovett, A., Forbus, K., Usher, J.: A Structure Mapping Model for Solving
Geometric Analogy Problems. In: Proceedings of the 27th Annual Conference of the Cog-
nitive Science Society (2005)
26. Wertheimer, M.: Gestalt Theory. In: Ellis, W.D. (ed.) A Sourcebook of Gestalt Psychol-
ogy, pp. 1–11. The Humanities Press, New York (1924/1950)
Modelling Scenes Using the Activity within
Them
School of Computing,
University of Leeds,
Leeds LS2 9JT, United Kingdom
{hannah,rf,dch,agc}@comp.leeds.ac.uk
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 394–408, 2008.
crossed the road”, for example) we discuss regions (roads) which might be visu-
ally determined by clear kerb stones and line markings. However, these regions
could also be functionally determined: it is easy to imagine some dirt path which
has no clear visible boundaries, but which is still a road by virtue of the cars
driven along it regularly (much to the peril of chickens). In this sense, roads and
paths can be identified as much by typical patterns of motion as by physical
structures. There are certain things we can find out from motion patterns which
would be very difficult to discover through the analysis of static scene structures.
For example, whilst it is possible to imagine a hypothetical scene analysis system
that could identify roads and roundabouts from static images, determining what
side of the road people drive on or which way around the roundabout people
travel would require analysis of motion.
Within the field of Computer Vision there is a body of work concerning the
modelling of scene structure through tracking visible agents, and this work iden-
tifies such emergent, functional paths. In scenes with limited behavioural reper-
toires (Fernyhough et al. [7] call these “strongly stylised domains”) and in which
the behaviour of interest is detectable from trajectories alone, such systems work
well. In scenes where finer grained ideas of motion are of interest (such as around
chairs and benches, which we might be interested in detecting as the loci of sit-
ting and standing activities) trajectory based systems have difficulties. In areas
where behaviour is not as constrained (such as on a train platform, where paths
have little meaning) the trajectory based systems also have difficulties. Strong
occlusion is also a problem for trajectory based systems, and much work con-
siders the problem of maintaining tracks through occlusion. In this paper we
sidestep this difficult problem by using what we call “tracklets”, which are short
indicative bursts of motion, and by working at the level of image features rather
than tracked unitary objects.
The current paper makes two contributions: we apply feature based tracking
(as used in the activity modelling community, e.g. in [13]) to the problem of
modelling scene geography, and we do this within a qualitative framework to ex-
tract descriptions that can be used within Qualitative Spatial Reasoning (QSR)
systems. This allows us to label regions of unconstrained scenes, some of which
are difficult for computer vision systems to handle.
2 Related Work
Whilst there is a large literature on modelling spatial regions using a priori ideas
about space and motion, or previously crafted maps, the current paper falls in
the category of scene modelling from automated analysis of video. Work in scene
modelling has thus far concentrated on the analysis of gross patterns of motion,
such as the trajectories of tracked people (or other moving objects) or on optical
flow patterns.
Systems which work at the level of the entire trajectory are able to con-
struct models of the way in which agents move through the scene. Johnson and
396 H.M. Dee et al.
Fig. 1. A frame of video showing two sets of tracklets: most recent (just completed)
in blue; previous in green. These give a robust indication of motion in the image plane
without committing to any object segmentation or scale.
between start and end points using the tracklet as a robust means of getting
to this. In order to do this, we look at the gross motion within each tracklet,
thresholding on the angle from the vertical θ and the distance travelled d between first
and last points. This descriptor is one of up, up-right, right, down-right, ..., or still.
Tracklets are classified as still if their total movement d is below a threshold α:
in the current implementation α = 2 pixels, which we find allows for considerable
camera shake whilst still detecting most significant motion. This calculation is
set out in Table 1.
This directional quantization is similar to the system described in [8], although
they work with optical flow rather than tracked features and match their motion
descriptors to hand-crafted templates.
Up-right     UR    π/8 < θ ≤ 3π/8
Right        R     3π/8 < θ ≤ 5π/8
Down-right   DR    5π/8 < θ ≤ 7π/8
...          ...   ...
Up-left      UL    −π/8 ≥ θ > −3π/8
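The descriptor computation can be sketched as follows. The coordinate convention (y increasing upwards) and the rounding-based bin boundaries are our assumptions, so behaviour exactly on a bin boundary may differ from the ≤/> conventions of Table 1; α = 2 pixels is from the text:

```python
import math

def direction_descriptor(dx, dy, alpha=2.0):
    # (dx, dy): displacement between a tracklet's first and last points.
    # Tracklets moving less than alpha pixels are "still"; otherwise the
    # angle from the vertical is quantized into one of eight bins.
    if math.hypot(dx, dy) < alpha:
        return "still"
    theta = math.atan2(dx, dy)  # clockwise angle from "up", in (-pi, pi]
    names = ["U", "UR", "R", "DR", "D", "DL", "L", "UL"]
    return names[round(theta / (math.pi / 4)) % 8]
```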
Fig. 2. A screenshot from the chair dataset with grid overlayed, showing histograms
calculated from different scene cells. Cell A near the top of the door does not see much
movement, and the movement that is observed is R and L corresponding to the opening
and closing motion of the door. Cell B is at a medium height on the wall behind the
chairs and sees motion both to the left and the right due to people moving backwards
and forwards behind the row of chairs. C, in the door region, has a major peak in its
histogram corresponding to motion to the left, due to people opening the door and
going out through it, and a less pronounced peak at R, presumably corresponding to
the door closing again.
drink tea or coffee; a 30 minute video from the UK Home Office “i-LIDS” (Im-
agery library for intelligent detection systems [9]) dataset of an underground
station, including platform, train track region and a bench where passengers oc-
casionally sit and wait for trains; and a 14 minute video of a busy roundabout
intersection, taken from the top of a 20 metre pole using an experimental 2-camera
setup (containing considerable camera shake as a result). We have not
attempted to correct any issues with these datasets by pre-processing. These will
be called the chair, i-LIDS and roundabout scenes. Figure 3 shows the gener-
ated histogram information presented as a bitmap for each scene and for each
direction.
Fig. 3. Histogrammed direction data (one row per bin) showing evident patterns of
motion within each input scene
We use a smoothness term which penalises adjacent labels which are different
and does not penalise adjacent labels which are the same (thus encouraging
uniform regions). We have a smaller penalty for labels which are “one out”,
which has the effect of lowering the penalty for adjacent regions with adjacent
directions (right and up-right, for example). This can be thought of as decreasing
the penalty term for labels which are conceptual neighbours as well as physical
neighbours. Equation 2 provides details of the smoothness term for two adjacent
squares i and j; k is a constant set in these experiments to be 0.5.
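With direction labels taken as bin indices 0–7 on the compass circle, the described penalty can be sketched as follows; the exact form of Equation 2 is not reproduced in this text, so this is an illustrative reconstruction:

```python
def pairwise_penalty(label_i, label_j, k=0.5):
    # Pairwise smoothness term as described: 0 for identical labels,
    # k (= 0.5 in the experiments) for conceptual neighbours (adjacent
    # direction bins, e.g. R and UR), and 1 otherwise.
    if label_i == label_j:
        return 0.0
    diff = abs(label_i - label_j)
    return k if min(diff, 8 - diff) == 1 else 1.0
```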
Fig. 4. Considering each direction independently. Final row shows the result of using
an MRF to combine these to form an overall segmentation, rather than using a set
threshold on each direction alone.
Fig. 5. Learned motion patterns used for scene partitioning, with clusters learned for
each scene. Colour coding in this figure is chosen within each scene: darker regions in
one scene are not necessarily related to darker regions in another.
Fig. 6. Learned motion patterns used for scene partitioning, with clusters learned
across all scenes. Despite different values of K, the bench in the i-LIDS scene has
been identified as similar in motion pattern to the chairs in the chair scene. In this
Figure, the colour coding changes between values of K but is consistent across scenes.
For example, in the K=10 column the dark grey region which makes up the majority
of the column corresponds to a vector representing very little motion.
algorithm, and then we use these clusters as the basis for our segmentation. As
before, we use a Markov random field to smooth the segmentation. We use a
smoothness term which does not consider conceptual neighbours, as it is more
difficult to determine an ordering on the 8-dimensional input vectors (the dimensions
being: up, up-left, left, down-left, down, down-right, right, and up-right). Thus the
Fig. 7. Illustrations of the learned cluster centres. The size of the arrow is proportional
to the frequency with which that direction was observed. These illustrations are clusters
learned across all scenes when K=10.
smoothness term has a penalty for neighbouring squares which differ in cate-
gory, and no penalty for neighbouring squares which are the same. The distance
measure used is Euclidean distance between histograms. Figure 5 shows the par-
titioning of each scene given by the use of K-means clustering, and the same
partitioning after application of an MRF.
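The clustering-plus-smoothing pipeline described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the graph-cut optimisation used in the paper is replaced here by simple iterated conditional modes (ICM), the deterministic initialisation is my own, and all function names are hypothetical.

```python
import numpy as np

def kmeans(X, K, iters=20):
    """Plain K-means on the rows of X (e.g. 8-bin direction histograms),
    with a deterministic initialisation for reproducibility."""
    X = np.asarray(X, dtype=float)
    centres = X[np.linspace(0, len(X) - 1, K).astype(int)].copy()
    for _ in range(iters):
        # assign each vector to its nearest centre (Euclidean distance)
        labels = np.argmin(((X[:, None, :] - centres[None]) ** 2).sum(-1), axis=1)
        for k in range(K):
            if np.any(labels == k):
                centres[k] = X[labels == k].mean(axis=0)
    return labels, centres

def icm_smooth(labels2d, grid, centres, penalty=0.5, sweeps=5):
    """Potts-style MRF smoothing by iterated conditional modes: the data cost
    is squared distance to each cluster centre; each 4-neighbour carrying a
    different label adds `penalty`. (A simple stand-in for the paper's
    graph-cut optimisation.)"""
    H, W = labels2d.shape
    lab = labels2d.copy()
    for _ in range(sweeps):
        for y in range(H):
            for x in range(W):
                costs = ((grid[y, x] - centres) ** 2).sum(-1)
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        costs = costs + penalty * (np.arange(len(centres)) != lab[ny, nx])
                lab[y, x] = int(np.argmin(costs))
    return lab
```

On a toy grid whose left half moves in one direction and right half in the opposite direction, K-means with K=2 recovers the two regions and ICM leaves the clean segmentation unchanged.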
The images in Figure 5 illustrate segmentations obtained by training on each
scene individually. The motivation for this is that we might expect the motion
patterns of vehicles at a roundabout to be different to those of people in an under-
ground station, or in a university common room. However we might also expect
there to be a certain amount of similarity in motion between the scenes. Applying
K-means to all three datasets at once provides us with motion descriptors which
are not individually tailored to each scene but which capture similarities between
motion in each, and the results of this are shown in Figure 6. Figure 6 includes di-
agrams drawn with different values of K (the number of clusters). In each of these,
similar patterns appear.
Figure 7 shows cluster centres learned across all scenes when K=10, corre-
sponding to the second column in Figure 6. This figure shows quite clearly that
the observed patterns do not correspond to single dominant directions, but often
to pairs of opposites.
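One hypothetical way to quantify this "pairs of opposites" observation for an 8-bin direction histogram is to measure how much of its mass is matched by mass in the opposite bin. The score below is my illustration, not a measure used in the paper.

```python
def opposite_pair_score(hist):
    """Fraction of the mass of an 8-bin direction histogram that is matched
    by mass in the opposite bin (bin i+4 mod 8). 1.0 means perfectly
    bidirectional motion, 0.0 strictly one-directional. (Hypothetical
    measure for illustration, not from the paper.)"""
    total = sum(hist)
    if total == 0:
        return 0.0
    matched = sum(min(hist[i], hist[(i + 4) % 8]) for i in range(8))
    return matched / total
```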
6 Evaluation
Informally, various scene elements can be identified – in the i-LIDS scene, the
track region is clear, in the chair scene, the chairs are clear, and in the roundabout
there is an obvious structure in the right place.
More formal evaluation is difficult as the generation of ground truth for motion
segmentation is not a trivial matter. We are concerned not with the way in which
the scene is superficially structured, but the way in which people interact with
the scene as they move around. For example, whilst the roundabout dataset is
indeed a roundabout, the majority of traffic goes straight across and turning
traffic is fairly uncommon. In the i-LIDS dataset, the platform has a number of
associated motion patterns, which differ from region to region (in some areas,
Fig. 8. Rough “ground-truth” segmentation, with various automatic segmentations:
dominant direction, motion patterns learned per scene, and motion patterns learned
across all scenes
hardly anybody waits, but in others there are often people milling around).
Despite these acknowledged difficulties we believe that comparison with a hand-
marked-up ground truth is the best way to evaluate this work and have generated
a simple region-based segmentation against which to compare our output. This
is shown in Figure 8, alongside various outputs.
From Figure 8 we can see that many of the identified ground truth image
regions have parallels in the segmentations. The MRF based upon dominant
direction alone is the least like the ground truth segmentations; whilst it is
possible to find similarities it would be generous to say that these segmentations
were clear.
With the segmentations learned for each scene individually the scene structure
is more evident. The chair scene in particular has clearly highlighted the chairs
as regions of heightened motion (although not the door). Within the i-LIDS
dataset there is an unexpected distinction between regions of the train platform;
the middle area where most people chose to wait is associated with a different
cluster centre to the far and near ground, and there appears to be some form of
emergent “path” heading to and from the bench. The edge of the platform and
the train region have both emerged from the observed data. In the roundabout
406 H.M. Dee et al.
scene the near and far sections stand out very well, as does the left hand feeder
branch to the roundabout.
Finally considering the segmentations created by learning over all scenes si-
multaneously (the final line of images in Figure 8) we can begin to detect sim-
ilarities between the regions defined in each scene. Whilst we cannot claim to
have constructed something that can detect chairs and benches it is however fair
to say that the clusters associated with the chairs in the chair scene (marked
as pale grey in the ground truth) also seem to be associated with the bench in
the i-LIDS scene (marked as black in the ground truth). The roundabout scene
is not segmented as clearly in the combined segmentations as in the individual
segmentations, presumably as this scene contains strongly directional motion
(each section effectively being a one-way street).
7 Conclusions

This paper has presented a novel approach for the unsupervised learning of spa-
tial regions from motion patterns. Our aim is to create segmentations of input
video which correspond to semantically meaningful regions in an unsupervised
fashion, and then to use these semantically meaningful regions within a quali-
tative spatial reasoning (QSR) framework. We have made considerable progress
towards this aim, and have generated segmentations which correspond in part to
ground truth segmentations of three experimental scenes. Our method is robust
to camera shake and background changes in a way that the existing path based
systems are not (due to their reliance on some form of background model).
Further investigation is required to determine which varieties of input are
most useful to this type of system: the directional histograms used here could be
augmented by information about speed, for example, and we are investigating
ways to further exploit the tracklet representation. We have carried out informal
investigations in the variation of histogram bin size (resulting in the 16 by 16
bins reported here) but a more thorough study could be useful, and the opti-
mal size will almost certainly be scene dependent. The use of overlapping bins
or pyramidical representations is also something we wish to pursue. Perhaps
more interestingly, further investigation is needed into the detection of common
patterns across different scenes, perhaps within a supervised or semi-supervised
machine learning framework. The similarity between segmentation of the bench
in the i-LIDS dataset and the chairs in the chair dataset is a promising sign, and
it would be an interesting experiment to collect video of many scenes containing
chairs or benches and see if we can learn their associated motion patterns from
observation.
The scenes under consideration in this paper contain various types of mo-
tion constrained in various ways, and perhaps because of this the two broad
approaches outlined in this paper (dominant direction vs. K-means clustering)
perform differently in each scene. The dominant direction thresholding results in
clear images of the roundabout scene, which is an example of what Fernyhough
called a strongly stylised domain. As such we should expect strong directions to
Acknowledgements
References
1. Fernyhough, J.H., Cohn, A.G., Hogg, D.C.: Generation of semantic regions from
image sequences. In: Proc. European Conference on Computer Vision (ECCV),
Cambridge, UK, pp. 475–484 (1996)
2. Laptev, I.: On space-time interest points. International Journal of Computer Vision 64(2/3),
107–123 (2005)
3. Johnson, N., Hogg, D.C.: Learning the distribution of object trajectories for event
recognition. Image and Vision Computing 14(8), 609–615 (1996)
4. Stauffer, C., Grimson, E.: Learning patterns of activity using real-time tracking.
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 22(8),
747–757 (2000)
5. Makris, D., Ellis, T.: Learning semantic scene models from observing activity in
visual surveillance. IEEE Transactions on Systems, Man and Cybernetics 35(3),
397–408 (2005)
6. McKenna, S.J., Charif, H.N.: Summarising contextual activity and detecting un-
usual inactivity in a supportive home environment. Pattern Analysis and Applica-
tions 7(4), 386–401 (2004)
7. KaewTraKulPong, P., Bowden, R.: Probabilistic learning of salient patterns across
spatially separated, uncalibrated views. In: Intelligent Distributed Surveillance Sys-
tems, pp. 36–40 (2004)
8. Xiang, T., Gong, S.: Beyond tracking: Modelling activity and understanding be-
haviour. International Journal of Computer Vision 67(1), 21–51 (2006)
9. Bicego, M., Cristiani, M., Murino, V.: Unsupervised scene analysis: a hidden
Markov model approach. Computer Vision and Image Understanding (CVIU) 102,
22–41 (2006)
10. Efros, A.A., Berg, A.C., Mori, G., Malik, J.: Recognizing action at a distance. In:
Proc. International Conference on Computer Vision (ICCV), Nice, France (2003)
11. Laptev, I., Pérez, P.: Retrieving actions in movies. In: Proc. International Confer-
ence on Computer Vision (ICCV) (2007)
12. Dalal, N., Triggs, B., Schmid, C.: Human detection using oriented histograms of
flow and appearance. In: Proc. European Conference on Computer Vision (ECCV),
pp. 428–441 (2006)
13. Gryn, J.M., Wildes, R.P., Tsotsos, J.: Detecting motion patterns via direction
maps with application to surveillance. In: Workshop on Applications of Computer
Vision, pp. 202–209 (2005)
14. Colombo, A., Leung, V., Orwell, J., Velastin, S.A.: Markov models of periodically
varying backgrounds for change detection. In: Visual Information Engineering, Lon-
don, UK (2007)
15. Shi, J., Tomasi, C.: Good features to track. In: Proc. Computer Vision and Pattern
Recognition (CVPR), pp. 593–600 (1994)
16. Lucas, B.D., Kanade, T.: An iterative image registration technique with an appli-
cation to stereo vision. In: International Joint Conference on Artificial Intelligence,
pp. 674–679 (1981)
17. Tomasi, C., Kanade, T.: Detection and tracking of point features. Technical Re-
port CMU-CS-91-132, Carnegie Mellon (1991)
18. Home Office Scientific Development Branch, UK: i-LIDS: Imagery library for intelli-
gent detection systems, http://scienceandresearch.homeoffice.gov.uk/hosdb/
cctv-imaging-technology/video-based-detection-systems/i-lids/
19. Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization
via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence
(PAMI) 23(11), 1222–1239 (2001)
20. Kolmogorov, V., Zabih, R.: What energy functions can be minimized via
graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence
(PAMI) 26(2), 147–159 (2004)
21. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow al-
gorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis
and Machine Intelligence (PAMI) 26(9), 1124–1137 (2004)
Pareto-Optimality of Cognitively Preferred
Polygonal Hulls for Dot Patterns
Antony Galton
1 Introduction
When presented with a two-dimensional pattern of dots such as the one shown
in Figure 1, and asked to draw a polygonal outline which best captures the
shape formed by the pattern, people readily respond by drawing outlines such
as those shown in Figure 2. Interestingly, on first encountering this task, people
often tend to imagine that there is a unique solution, ‘the’ outline of the dots;
but they will very quickly be persuaded that there is typically no unique best
answer. Only the convex hull has any claim to uniqueness, but in very many
cases (such as the example shown), the convex hull is a bad solution to the task,
since it does not capture the shape that we humans perceive the dots as forming.
This is illustrated in Figure 3, where two distinct point-sets, having the shape
of the letters ‘C’ and ‘S’, have the same convex hull.
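For reference, the convex hull mentioned here can be computed with Andrew's monotone chain algorithm. The sketch below (illustrative code, names mine) also demonstrates the point of Figure 3: two different point sets, each with interior points arranged differently, can have exactly the same convex hull.

```python
def convex_hull(points):
    """Andrew's monotone chain: returns the vertices of the convex hull of a
    set of 2-D points in counter-clockwise order (collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # twice the signed area of triangle (o, a, b)
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def build(seq):
        hull = []
        for p in seq:
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull

    lower = build(pts)
    upper = build(reversed(pts))
    return lower[:-1] + upper[:-1]
```

Two sets sharing the same four extremal corners but differing in their interior points yield identical hulls, so the hull alone cannot distinguish their perceived shapes.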
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 409–425, 2008.
© Springer-Verlag Berlin Heidelberg 2008
410 A. Galton
2 Previous Work
As mentioned above, there is already a considerable body of work, much of it
in the pattern analysis, computer vision, and geographical information science
communities, on defining the shape of dot patterns. A typical paper in this area
will propose an algorithm for generating a shape from a pattern of dots, explore
its mathematical and/or computational characteristics (e.g., computational com-
plexity), and examine its behaviour when applied to various dot patterns. The
evaluation of this behaviour is typically very informal, often amounting to little
more than observing that the shape produced by the algorithm is a ‘good ap-
proximation’ to the perceived shape of the dots. While lip-service is generally
paid to the fact that there is no objective definition of such a ‘perceived shape’,
little is said about how to verify this, or indeed, about exactly what it means.
The much-cited work of Edelsbrunner et al. [1] introduces the notion of α-
shape: whereas the convex hull of a point-set S is the intersection of all closed
half-planes containing all the points of S, their ‘α-hull’ is the intersection of all
closed discs of radius 1/α containing all points of S (for α < 0 the closed disc of
radius 1/α is interpreted as the complement of an open disc of radius −1/α, and
for α = 0 it is a half-plane). The α-shape is a piecewise linear curve derived in
a straightforward manner from the α-hull. For certain (typically small negative)
values of α, the α-shape can come close to capturing the cognitively salient
generalises the ‘gift-wrap’ algorithm for constructing convex hulls; a line segment
of length r is swung about an extremal point of the set until it encounters another
point in the set; the two points are joined, and the procedure repeated from the
second point, until a closed shape is produced. Additional components of the
footprint will be obtained if points in the set lie outside the first component.
Similar results can be obtained by joining all pairs of points separated by at
most r and then selecting the peripheral joins, resulting in the ‘close pairs’
method. In the third algorithm, a region is produced by successively removing
the longest exterior edges from the Delaunay triangulation of the points, subject
to the condition that the region remains connected and its boundary forms a
Jordan curve. In this work, more attention was paid to the question of evaluation
criteria, and nine questions were listed that could be used to help classify different
types of solution to the general problem of associating a region with a set of
points. But like the work previously reviewed, this paper shied away from any
detailed examination of the concept of ‘perceived shape’ other than noting that
any such examination must ‘go beyond computational geometry to engage with
more human-oriented disciplines such as cognitive science’.
Moreira and Santos [6] proposed a ‘concave hull’ algorithm which is an al-
ternative generalisation of the gift-wrap algorithm, in which at any stage only
the k nearest neighbours of the latest point added to the outline are considered
as candidates for the next addition. They state the problem as that of find-
ing ‘the polygon that best describes the region occupied by the given points’,
and acknowledge that the word ‘best’ here is ambiguous, what counts as a best
solution being application dependent; but evaluation of the algorithm is largely
confined to its computational characteristics and not the adequacy of the results,
for which they do little more than refer to the criteria listed in [5]. Outputs from
this algorithm (for Pattern 5 in Appendix A) are shown in Figure 4.
In work currently in press, Duckham et al. [10] present more detailed eval-
uation for the Delaunay-based method first presented in [5], leading to a con-
clusion that ‘normalized parameter values of between 0.05–0.2 typically produce
optimal or near-optimal shape characterization across a wide range of point
distributions’, but it is acknowledged that what ‘optimal’ means here is both
underspecified and somehow connected with ‘a shape’s “visual salience” to a
human’. The actual evaluation presented in [10] takes the approach of starting
with a well-defined shape, generating a dot pattern from it, and then testing the
algorithm’s efficacy at reconstructing the original shape.
The purpose of the present paper is to take some first steps towards estab-
lishing some principles for evaluating any proposed solution to the problem of
determining an outline for a set of points. Whereas previous work has mostly
been concerned with proposing particular algorithms for generating outlines,
here I propose that, independently of any particular algorithm, we consider a
full range of possible outlines, and try to determine what features, describable
in objective (e.g., geometrical) terms, influence cognitive judgments as to the
suitability of an outline as a depiction of ‘the’ shape defined by the set of points.
[figure panel labels: k = 7, k = 8, k = 10]
For the case n = 12 and k = 7, this comes to 86,276; but the 12-point dot
pattern shown in Figure 6, with seven vertices in its convex hull, actually has
only 5674 polygonal hulls, approximately 6.6% of the upper bound. Even so, the
number of polygonal hulls does grow rapidly as the number of dots increases,
and for large values of n it becomes impracticable to compute all of them (with
n = 16 we are already talking days rather than hours or minutes in the worst
case). In reality, however, only a tiny fraction of the polygonal hulls are worth
considering as good candidates for the ‘perceived shape’ of the dot pattern.
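The enumeration of polygonal hulls can be sketched by brute force, using the rules stated later in the paper (Section 4): a simple closed polygon whose vertices are a subset of the dots, which neither intersects nor touches itself, and with every unused dot strictly in its interior. This is an illustrative sketch with hypothetical function names; its cost grows exponentially with the number of dots, matching the paper's remark that exhaustive computation becomes impracticable for large n, and it is demonstrated below only on tiny patterns, not on the 12-point pattern of Figure 6.

```python
from itertools import combinations, permutations

def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); 0 means collinear."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def on_segment(p, a, b):
    """True if point p lies on the closed segment ab."""
    return (cross(a, b, p) == 0
            and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

def segments_touch(a, b, c, d):
    """True if closed segments ab and cd share at least one point."""
    d1, d2 = cross(c, d, a), cross(c, d, b)
    d3, d4 = cross(a, b, c), cross(a, b, d)
    if 0 not in (d1, d2, d3, d4) and (d1 > 0) != (d2 > 0) and (d3 > 0) != (d4 > 0):
        return True  # proper crossing
    return (on_segment(a, c, d) or on_segment(b, c, d)
            or on_segment(c, a, b) or on_segment(d, a, b))

def is_simple(poly):
    """No two edges intersect or touch, except adjacent edges at their
    shared vertex."""
    n = len(poly)
    for i in range(n):
        a, b = poly[i], poly[(i + 1) % n]
        for j in range(i + 1, n):
            c, d = poly[j], poly[(j + 1) % n]
            if j == i + 1 or (i == 0 and j == n - 1):
                # adjacent edges: forbid collinear overlap beyond the shared vertex
                u, v, w = (a, b, d) if j == i + 1 else (c, d, b)
                if cross(u, v, w) == 0 and (on_segment(w, u, v) or on_segment(u, v, w)):
                    return False
            elif segments_touch(a, b, c, d):
                return False
    return True

def strictly_inside(p, poly):
    """True if p is in the interior of the polygon (boundary excluded)."""
    n = len(poly)
    if any(on_segment(p, poly[i], poly[(i + 1) % n]) for i in range(n)):
        return False
    inside = False
    for i in range(n):
        a, b = poly[i], poly[(i + 1) % n]
        if (a[1] > p[1]) != (b[1] > p[1]):  # edge crosses the horizontal through p
            c = cross(a, b, p)
            if (b[1] > a[1] and c > 0) or (b[1] < a[1] and c < 0):
                inside = not inside
    return inside

def count_polygonal_hulls(dots):
    """Count simple polygons on a subset of the dots with every unused dot
    strictly inside. Exponential in len(dots): small patterns only."""
    dots = list(dots)
    count = 0
    for r in range(3, len(dots) + 1):
        for subset in combinations(dots, r):
            rest = [p for p in dots if p not in subset]
            for perm in permutations(subset[1:]):
                if perm[0] > perm[-1]:  # count each cycle once (drop reversals)
                    continue
                poly = (subset[0],) + perm
                if is_simple(poly) and all(strictly_inside(p, poly) for p in rest):
                    count += 1
    return count
```

For a square of four dots the only polygonal hull is the square itself; adding a centre dot admits four further "notched" variants, since any polygon using a diagonal would pass through the centre dot.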
Figure 7 illustrates three of the 5674 polygonal hulls for the dot pattern in
Figure 6. The leftmost one is the convex hull. This is easily defined, has well-
known mathematical and computational properties, and might be considered as a
useful representation of the dot pattern for some purposes; but as already noted,
it does not usually capture the perceived shape of the pattern. The rightmost
one provides a very jagged outline which does not correspond to anything that
[Fig. 6: the 12-point dot pattern referred to in the text]
we readily perceive when observing the dots on their own. The middle hull, on
the other hand, does seem to capture pretty well a shape that we can readily
perceive in the dots. It is certainly not unique in doing so, however, and in the
pilot study reported below, only 2 out of 13 subjects drew this as their preferred
hull for this pattern of dots.
What factors make a polygonal hull acceptable as a representation of the ‘per-
ceived shape’ of a dot pattern? The problem with the convex hull is that it will
often include large areas devoid of dots; these are the perceived concavities in the
shape, and the convex hull completely fails to account for them. Of all possible
hulls, the convex hull simultaneously maximises the area while minimising the
perimeter. It is the maximality of the area which causes the problem, since this
correlates with the inclusion of the empty spaces represented by the concavities
in the perceived outline. At the other extreme, the jagged figure on the right does
very well at reducing the area, but at the cost of a greatly extended perimeter.
The middle figure seems to strike a better balance, with both area and perimeter
taking intermediate values, as shown in Table 1.
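Area and perimeter values such as those in Table 1 can be computed directly from a hull's vertex list; the sketch below uses the standard shoelace formula (illustrative code, not the author's).

```python
from math import hypot

def hull_area(poly):
    """Absolute area of a simple polygon via the shoelace formula."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0

def hull_perimeter(poly):
    """Total edge length of the closed polygon."""
    n = len(poly)
    return sum(hypot(poly[(i + 1) % n][0] - poly[i][0],
                     poly[(i + 1) % n][1] - poly[i][1]) for i in range(n))
```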
A cognitively acceptable outline should (a) not contain too much empty space,
and (b) not be too long and sinuous. This suggests that to produce the
Pareto-Optimality of Cognitively Preferred Polygonal Hulls 417
Table 1. Area and perimeter measurements for the hulls in Figure 7 (units of mea-
surement arbitrary)
Area Perimeter
Hull 1 42761.0 783.5
Hull 2 27163.0 962.5
Hull 3 21032.0 1599.3
optimal outline we should seek to simultaneously minimise both the area and
the perimeter. These are, of course, conflicting objectives, since the minimum
perimeter (that of the convex hull) corresponds to the maximum area. In the
language of multi-objective optimisation theory [11], we seek non-dominated so-
lutions. A polygonal hull with area A1 and perimeter P1 is said to dominate
one with area A2 and perimeter P2 (with respect to our chosen objectives of
minimising both area and perimeter) so long as A1 ≤ A2 and P1 ≤ P2, with at
least one of the two inequalities strict.
The hulls which are not dominated by any other hulls form what is known as
the Pareto set. When plotted in area-perimeter space (‘objective space’) they
lie along the Pareto front. This shows up in the graphs as the ‘south-western’
frontier of the set of points corresponding to all the hulls for a given dot pattern.
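The dominance relation and the Pareto set can be transcribed directly from this definition (illustrative code, names mine; the Table 1 values appear below only as a usage example).

```python
def dominates(h1, h2):
    """True if hull h1 = (area, perimeter) dominates h2: no worse on both
    objectives and strictly better on at least one (both minimised)."""
    (a1, p1), (a2, p2) = h1, h2
    return a1 <= a2 and p1 <= p2 and (a1 < a2 or p1 < p2)

def pareto_front(hulls):
    """Hulls not dominated by any other hull (the Pareto set)."""
    return [h for h in hulls if not any(dominates(g, h) for g in hulls)]

def domination_count(h, hulls):
    """Number of hulls dominating h (0 for a Pareto-optimal hull)."""
    return sum(dominates(g, h) for g in hulls)
```

With the three Table 1 hulls, areas decrease as perimeters increase, so none dominates another and all three lie on the Pareto front.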
Area-perimeter plots for all eight dot patterns used in the pilot study described
below can be found in Appendix B. In these figures, area is plotted along the
horizontal axis, perimeter along the vertical; the convex hull, with maximal area
and minimal perimeter, corresponds to the point at the extreme lower right.
In light of the above considerations, we propose the following
Hypothesis: The points in area-perimeter space corresponding to polyg-
onal hulls which best capture a perceived shape of a dot pattern lie on or
close to the Pareto front.
The next section describes a pilot study which was carried out as a first step in
the investigation of this hypothesis.
4 Pilot Study
A small pilot study was carried out to gain an initial estimation of the plausibility
of the hypothesis. Eight dot patterns were presented to 13 adult subjects, who
were asked to draw a polygonal outline which best captures the shape formed
by each pattern of dots. An example dot pattern with two possible polygons was
shown (these are our Figures 1 and 2), and more precise rules were given as follows:
1. The outline must be a simple closed polygon whose vertices are members of
the dot pattern; that is, it must consist of a series of straight edges joining
up some or all of the dots, forming a closed circuit.
2. You do not have to include all the given dots as vertices of your outline; but
any dots that are not used must be in the interior of the polygon formed,
not outside it.
3. The outline must not intersect or touch itself; so outlines such as the two
below are not allowed: [here the two non-examples of Figure 5 were given].
The eight dot patterns used in the pilot study are shown in Appendix A.
The results of the pilot study are tabulated in Table 2. The rows of the table
correspond to the eight dot patterns. For each dot pattern the following data
are given:
– The number of dots in the pattern.
– The total number of polygonal hulls for the pattern.
– The number of Pareto-optimal polygonal hulls for the pattern.
– The maximum number of dominators for any individual polygonal hull.
– The number of distinct hulls generated by the subjects: the relevance of
this figure is that it shows that the subjects provided a variety of different
responses — for none of the dot patterns were there just one or two ‘obvious’
outlines to draw.
– The number of subjects who responded with a Pareto-optimal hull.
– The mean relative domination of the responses — this quantity is explained
below.
Our hypothesis was that hulls corresponding to some ‘perceived shape’ of the
dot pattern should lie on or close to the Pareto front in the area-perimeter plot.
Totalling the figures in the penultimate column of the table, we see that 57
out of the total 104 responses were Pareto-optimal. The figures in the fourth
column give the number of Pareto-optimal hulls available for that dot pattern,
an indication of the size of the ‘target’ if our hypothesis is correct. The fifth
column in the table shows the maximum number of hulls by which any given
hull for that dot pattern is dominated: it will be seen that this always falls short
of the total number of hulls, but not usually by much.
5 Next Steps
The pilot study reported here is limited in both scale and scope. There are
many possibilities for further work to examine a range of additional factors with
larger-scale experiments. Here we list a number of such possibilities.
1. Choice of dot patterns. The dot patterns used in the pilot study were chosen
on the basis of an informal idea that they were in some way ‘interesting’.
expect the preferred hulls to lie towards the left of this series, near the knee,
but the experimental results do not really bear this out. Further investigation
is needed to determine what factors influence the location of the optimal
hulls along the front. Factors that might be considered include sinuosity (a
measure of which is the number of times the outline changes from convex
to concave or vice versa as it is traversed), or the number of vertices in the
hull. Both of these are to some extent correlated with perimeter, although
the correlation is far from exact. One might also wish to investigate other
factors such as symmetry, which undoubtedly affect visual salience.
5. Evaluation of algorithms. Having established an appropriate set of criteria
for evaluating polygonal hulls, one can then begin experimenting with dif-
ferent algorithms. Many of the published algorithms for producing outlines
of dot-patterns yield polygonal hulls in the sense defined in this paper, and
an obvious first step would be to investigate to what extent these algorithms
tend to produce outlines that are optimal according to the criteria that have
been established. In particular, most of the existing algorithms involve a pa-
rameter — typically a real-valued length parameter, but in the case of the
k-nearest neighbour algorithm of [6], it is a positive integer. It would there-
fore be interesting to investigate how the objective evaluation criteria vary
as the parameter is varied: one could, for example, trace the path followed by
an algorithm’s output in area-perimeter space as the parameter runs through
the full range of its possible values, and hence find which parameter settings
optimise the quality of the output. For the hulls shown in Figure 4, for ex-
ample, the numbers of dominators in area-perimeter space are 0, 5, 4, 0, 5,
and 0 respectively, suggesting that this algorithm, like our human subjects,
is very good at finding hulls on or near the Pareto front.
6. Algorithm design. Going beyond this, one might also ask whether it is pos-
sible to design an algorithm with those criteria in mind, that is, to tailor an
algorithm to produce hulls which are optimal with respect to the criteria.
With larger point sets, one can only expect to identify the Pareto-optimal
hulls to some degree of approximation, suggesting that a fruitful approach
here might be to use some form of evolutionary algorithm.
7. Extension to three dimensions. Many of the ideas discussed here could prob-
ably be generalised to apply to three-dimensional dot patterns. A hull must
now be a volume of space bounded by a polyhedral surface rather than an
area bounded by a polygonal outline: a ‘polyhedral hull’. Some, but not
all, of the algorithms that have been used for generating outlines of two-
dimensional dot patterns readily generalise to three dimensions; little work
has been done on this, though the Power Crust algorithm of [12,13] is not
unrelated. There would be obvious practical difficulties in asking experimen-
tal subjects to construct polyhedra in space rather than drawing outlines on
a piece of paper, but no doubt some suitable experiments could be devised.
For the time being, however, the two-dimensional case already offers ample
scope for further investigation.
Acknowledgments
The author wishes to thank Jonathan Fieldsend and Richard Everson for useful
comments on an earlier draft of this paper, including advice on multi-objective
optimisation.
References
1. Edelsbrunner, H., Kirkpatrick, D.G., Seidel, R.: On the shape of a set of points in
the plane. IEEE Transactions on Information Theory IT-29(4), 551–559 (1983)
2. Garai, G., Chaudhuri, B.B.: A split and merge procedure for polygonal border
detection of dot pattern. Image and Vision Computing 17, 75–82 (1999)
3. Melkemi, M., Djebali, M.: Computing the shape of a planar points set. Pattern
Recognition 33, 1423–1436 (2000)
4. Chaudhuri, A.R., Chaudhuri, B.B., Parui, S.K.: A novel approach to computation
of the shape of a dot pattern and extraction of its perceptual border. Computer
Vision and Image Understanding 68(3), 257–275 (1997)
5. Galton, A.P., Duckham, M.: What is the region occupied by a set of points? In:
Raubal, M., Miller, H.J., Frank, A.U., Goodchild, M.F. (eds.) Geographic Infor-
mation Science: Proceedings of the 4th International Conference, GIScience 2006,
pp. 81–98. Springer, Heidelberg (2006)
6. Moreira, A., Santos, M.: Concave hull: a k-nearest neighbours approach for the
computation of the region occupied by a set of points. In: Proceedings of the 2nd In-
ternational Conference on Computer Graphics Theory and Applications (GRAPP
2007), Barcelona, Spain, March 8-11 (2007)
7. Alani, H., Jones, C.B., Tudhope, D.: Voronoi-based region approximation for geo-
graphical information retrieval with gazetteers. International Journal of Geograph-
ical Information Science 15(4), 287–306 (2001)
8. Arampatzis, A., van Kreveld, M., Reinbacher, I., Jones, C.B., Vaid, S., Clough, P.,
Joho, H., Sanderson, M.: Web-based delineation of imprecise regions. Computers,
Environment and Urban Systems 30, 436–459 (2006)
9. Galton, A.P.: Dynamic collectives and their collective dynamics. In: Mark, D.M.,
Cohn, A.G. (eds.) Spatial Information Theory. Springer, Heidelberg (2005)
10. Duckham, M., Kulik, L., Worboys, M., Galton, A.: Efficient generation of sim-
ple polygons for characterizing the shape of a set of points in the plane. Pattern
Recognition (2008) (in press)
11. Deb, K.: Multi-objective Optimization Using Evolutionary Algorithms. John Wi-
ley, Chichester (2001)
12. Amenta, N., Choi, S., Kolluri, R.: The power crust. In: Sixth ACM Symposium on
Solid Modeling and Applications, pp. 249–260 (2001)
13. Amenta, N., Choi, S., Kolluri, R.: The power crust, unions of balls, and the medial
axis transform. Computational Geometry: Theory and Applications 19(2-3), 127–
153 (2001)
Appendix A

[Patterns 1–8: the eight dot patterns used in the pilot study; the dot figures are
not reproducible in this text-only rendering.]
Appendix B

[Area–perimeter plots for Patterns 1–8: each point represents one polygonal hull,
with area (A) on the horizontal axis and perimeter (P) on the vertical; the convex
hull lies at the extreme lower right of each plot. The plots are not reproducible in
this text-only rendering.]
Qualitative Reasoning about Convex Relations
1 Introduction
Since the work of [1] on temporal intervals, constraint calculi have been used
to model a variety of aspects of space and time in a way that is both quali-
tative (and thus closer to natural language than quantitative representations)
and computationally efficient (by appropriately restricting the vocabulary of
rich mathematical theories about space and time). For example, the well-known
region connection calculus by [2] allows for reasoning about regions in space. Ap-
plications include geographic information systems, human-machine interaction,
and robot navigation.
Efficient qualitative spatial reasoning mainly relies on the algebraic closure
algorithm. It is based on an algebra of (often binary) relations: using relational
composition and converse, it refines (basic) constraint networks in polynomial
time. If algebraic closure detects an inconsistency, the original network is surely
inconsistent.
C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 426–440, 2008.
c Springer-Verlag Berlin Heidelberg 2008
2 Qualitative Calculi
Qualitative calculi are employed for representing knowledge about a domain
using a finite set of labels, so-called base relations. Base relations partition the
domain into discrete parts. One example is distinguishing points on the time
line by binary relations such as “before” or “after”. A qualitative representation
only captures membership of domain objects in these parts. For example, it can
be represented that time point A occurs before B, but not how much earlier nor
at which absolute time. Thus, a qualitative representation abstracts from detail, which is particularly helpful when dealing with infinite domains like time and space that possess an internal structure such as Rn.
In order to ensure that any constellation of domain objects is captured by
exactly one qualitative relation, a special property is commonly required:
Definition 1. Let B = {B1 , . . . , Bk } be a set of n-ary relations over a domain
D. These relations are said to be jointly exhaustive and pairwise disjoint (JEPD),
if they satisfy the properties
1. ∀i, j ∈ {1, . . . , k} with i ≠ j : Bi ∩ Bj = ∅
2. Dn = ⋃i∈{1,...,k} Bi
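As an illustration of Definition 1 (our own example, not one from the paper), the point relations "before", "equal", and "after" over D = R are JEPD: every ordered pair of reals satisfies exactly one of them. A small Python check:

```python
# Illustrative sketch: the point relations {<, =, >} over the reals are
# jointly exhaustive and pairwise disjoint (JEPD) -- every pair of
# domain elements satisfies exactly one base relation.
import random

base = {
    "before": lambda x, y: x < y,
    "equal":  lambda x, y: x == y,
    "after":  lambda x, y: x > y,
}

random.seed(0)
pairs = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]
pairs += [(1.0, 1.0), (-2.0, -2.0)]  # make sure "equal" is exercised

for x, y in pairs:
    holding = [name for name, rel in base.items() if rel(x, y)]
    assert len(holding) == 1, (x, y, holding)

print("exactly one base relation holds for every sampled pair")
```

A general relation in the sense of Definition 2 is then simply a subset of these labels; for instance {before, equal} represents "less than or equal".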
For representing uncertain knowledge within a qualitative calculus, e.g., to rep-
resent that objects x1 , x2 , . . . , xn are either related by relation Bi or by relation
Bj , general relations are introduced.
Definition 2. Let B = {B1 , . . . , Bk } be a set of n-ary relations over a domain
D. The set of general relations RB (or simply R) is the powerset P(B). The
semantics of a relation R ∈ RB is defined as follows:
R(x1 , . . . , xn ) :⇔ ∃Bi ∈ R, Bi (x1 , . . . , xn )
In a set of base relations that is JEPD, the empty relation ∅ ∈ RB is called
the impossible relation. Reasoning with qualitative information takes place on
the symbolical level of relations R, so we need special operators that allow us
to manipulate qualitative knowledge. These operators constitute the algebraic
structure of a qualitative calculus.
428 D. Lücke, T. Mossakowski, and D. Wolter
R⌣ := {(x2, x1) | (x1, x2) ∈ R}
Additional permutation operations can be defined, but a small basis that can generate any permutation suffices, given that the permutation operations are strong (see the discussion further below) [3]. A restriction to few operations particularly eases the definition of higher-arity calculi.
Note that for n = 2 one obtains the classical composition operation for binary calculi (cp. [4]), which is usually written as an infix operator. Nevertheless, different kinds of binary composition have been used for ternary calculi, too.
An important question is whether an operation is well-defined in the calculus, i.e., whether the set of general relations RB is closed under the operation. Indeed, for some calculi the set of relations is not closed; there even exist calculi for which no closed set of finite size can exist, e.g., under the composition operation in Freksa's double cross calculus [5].
φ⋆ : Bm → RB
φ⋆(B1, . . . , Bm) := {R ∈ B | R ∩ φ(B1, . . . , Bm) ≠ ∅}
is a convex subset of Rn .
One key problem is to decide whether a given CSP has a solution or not. This can be a very hard problem: the infinity of the domain underlying qualitative CSPs rules out an exhaustive search for an agreeable valuation of the variables. This is why decision procedures that operate purely on the symbolic, discrete level of relations (rather than on the level of the underlying domain) receive particular interest.
Definition 10. A constraint network is called consistent if a valuation of all variables exists such that all constraints are fulfilled. A constraint network is called n-consistent (n ∈ N) if every solution for n − 1 variables can be extended to an n-variable solution involving any further variable. A constraint network is called strongly n-consistent if it is m-consistent for all m ≤ n. A CSP in n variables is globally consistent if it is strongly n-consistent.
A fundamental technique for deciding consistency in a classical CSP is to enforce k-consistency by restricting the domains of variables in the CSP to mutually agreeable values. Backtracking search can then identify a consistent variable assignment. If the domain of some variable gets restricted down to the empty set while enforcing k-consistency, the CSP is not consistent. This procedure, except for the backtracking search (which is not applicable in infinite domains), is also applied to qualitative CSPs [4]. For a JEPD calculus with n-ary relations, any qualitative CSP is strongly n-consistent unless it contains a constraint with the empty relation. So the first step in checking consistency is to test (n + 1)-consistency. In the case of a calculus with binary relations this means analyzing 3-consistency, also called path-consistency. This is the aim of the algebraic closure algorithm, which exploits the fact that composition lists all 3-consistent scenarios.
Definition 11. A CSP over binary relations is called algebraically closed if for all variables X1, X2, X3 and all relations R1, R2, R3 the constraint relations X1 R1 X2, X2 R2 X3, and X1 R3 X3 imply
R3 ⊆ R1 ∘ R2
To enforce algebraic closure, the operation R3 := R3 ∩ (R1 ∘ R2) (as well as a similar operation for converses) is applied for all variables until a fixpoint is reached.
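To make the refinement loop concrete, the following is a minimal sketch of the algebraic closure algorithm for a binary calculus. The example calculus is the standard point algebra over the reals with base relations {<, =, >} and its usual composition table; it is our choice for illustration, not a calculus discussed here:

```python
# A minimal sketch of the algebraic closure (path consistency) loop for a
# binary calculus, using the standard point algebra {<, =, >} over the
# reals as the example calculus.
from itertools import product

B = {"<", "=", ">"}
COMP = {  # standard composition table of the point algebra
    ("<", "<"): {"<"}, ("<", "="): {"<"}, ("<", ">"): set(B),
    ("=", "<"): {"<"}, ("=", "="): {"="}, ("=", ">"): {">"},
    (">", "<"): set(B), (">", "="): {">"}, (">", ">"): {">"},
}

def compose(R1, R2):
    """Composition of general relations: union over base compositions."""
    out = set()
    for b1, b2 in product(R1, R2):
        out |= COMP[(b1, b2)]
    return out

def converse(R):
    return {{"<": ">", "=": "=", ">": "<"}[b] for b in R}

def algebraic_closure(n, C):
    """Apply R_ik := R_ik ∩ (R_ij ∘ R_jk) until a fixpoint is reached.
    C maps ordered pairs of variable indices to sets of base relations.
    Returns False as soon as the empty (impossible) relation appears."""
    changed = True
    while changed:
        changed = False
        for i, j, k in product(range(n), repeat=3):
            if len({i, j, k}) < 3:
                continue
            refined = C[(i, k)] & compose(C[(i, j)], C[(j, k)])
            if refined != C[(i, k)]:
                C[(i, k)] = refined
                C[(k, i)] = converse(refined)
                if not refined:
                    return False
                changed = True
    return True

# The cyclic network x < y, y < z, z < x is inconsistent,
# and the refinement loop detects it.
C = {(i, j): set(B) for i in range(3) for j in range(3) if i != j}
C[(0, 1)] = {"<"}; C[(1, 0)] = {">"}
C[(1, 2)] = {"<"}; C[(2, 1)] = {">"}
C[(2, 0)] = {"<"}; C[(0, 2)] = {">"}
print(algebraic_closure(3, C))  # prints False
```

On the cyclic network the constraint between x and z is refined to the empty relation, so the inconsistency is detected symbolically, without ever enumerating real-valued valuations.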
Enforcing algebraic closure preserves consistency, i.e., if the empty relation is
obtained during refinement, then the qualitative CSP is inconsistent. However, algebraic closure does not necessarily decide consistency: a CSP may be algebraically closed but inconsistent, even if composition is strong [7].
Algebraic closure has also been adapted to ternary calculi using binary composition [8]. Binary composition of ternary relations involves 4 variables, but it may not be able to represent all 4-consistent scenarios: scenarios with 4 variables are specified by 4 ternary relations, whereas a binary composition R1 ∘ R2 = R3 only involves 3 ternary relations. Therefore, using n-ary composition in reasoning with n-ary relations is more natural (cp. [3]).
[Fig. 1: two points A and B with the nine LR base relations f, b, l, r, i, s, e, dou, tri arranged around the oriented line from A to B; graphics not recoverable from text extraction.]
Fig. 1. The nine base relations of the LR-calculus; tri designates the case of A = B = C, whereas dou stands for A = B ≠ C
For many calculi, algebraic closure decides consistency of scenarios, i.e., of networks whose constraints are restricted to base relations [1]. Renz pioneered research on identifying larger sets of relations for which algebraic closure decides consistency, thereby obtaining a practical decision procedure [12]. If, however, algebraic closure is too weak to decide consistency of scenarios, no approaches are known for dealing with qualitative CSPs on the algebraic level. Unfortunately, this is the case for the LR-calculus.
Proposition 13. All scenarios containing only the relations l and r are algebraically closed wrt. the LR-calculus with binary composition.
The set {l, r} is closed under all permutations. A look at the binary composition table of LR reveals that all compositions containing only l and r on their left-hand side always have the set {l, r} included in their right-hand side.
Of course, not all LR-scenarios over the relations l and r are consistent, as we will show.
[Fig. 2: local coordinate system with origin α: the base vector from α to β, its right-pointing orthogonal vector α′, and the vector σ from α to γ; graphics not recoverable from text extraction.]
for the relations of the LR-calculus. In R2 the sign of the scalar product ⟨X, Y⟩ determines the relative direction of X and Y. Given three points α, β, and γ that are connected by an LR-relation, we can construct a local coordinate system with origin α. It has one base vector going from α to β; we call this vector α. The vector orthogonal to this one and facing to the right is called α′, as shown in Fig. 2. The vector from α to γ is called σ. With this we get that (α β r γ) is true iff ⟨α′, σ⟩ > 0, and (α β l γ) is true iff ⟨α′, σ⟩ < 0; of course the points α, β, and γ are pairwise distinct in these cases. The vectors α′ and σ are given by
α′ = (yβ − yα, xα − xβ)ᵀ, σ = (xγ − xα, yγ − yα)ᵀ.
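The sign test above translates directly into code. The following is our own illustrative sketch of the construction (the function name lr_relation and variable names are ours), classifying γ as left or right of the oriented line from α to β:

```python
# Classify gamma relative to the oriented line from alpha to beta using
# the sign of the scalar product <alpha', sigma>, following the local
# coordinate construction described in the text.

def lr_relation(alpha, beta, gamma):
    (xa, ya), (xb, yb), (xc, yc) = alpha, beta, gamma
    a_prime = (yb - ya, xa - xb)   # right-pointing orthogonal of alpha -> beta
    sigma = (xc - xa, yc - ya)     # vector from alpha to gamma
    dot = a_prime[0] * sigma[0] + a_prime[1] * sigma[1]
    if dot > 0:
        return "r"
    if dot < 0:
        return "l"
    return None  # gamma lies on the oriented line: another base relation holds

print(lr_relation((0, 0), (1, 0), (0, -1)))  # prints r (below the eastward line)
print(lr_relation((0, 0), (1, 0), (0, 1)))   # prints l (above it)
```

The collinear case (dot product zero) is deliberately left open here, since it splits into the remaining base relations f, i, s, b, e, dou, and tri of Fig. 1.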
In fact, more inequalities are derivable, but already these are not jointly satisfiable, and we conclude:
Theorem 14. Classical algebraic closure does not enforce scenario consistency
for the LR-calculus.
Proof. We consider the algebraically closed LR scenario SCEN and the inequa-
tions (1) to (6) that we derived when projecting it into R2 , the intended domain
of LR. From inequations (1), (6), (4), (5) and (3) we obtain
(xE · yC)/xC < yE < (yA · xE)/xA
and again using inequations (6), (4) and (5) we get
yC · xA < xC · yA
As discussed earlier, ternary composition is more natural for ternary calculi than binary composition. Therefore we examined the ternary composition table of the LR-calculus and conclude:
Theorem 15. Algebraic closure wrt. ternary composition does not enforce sce-
nario consistency for the LR-calculus.
Proof. Let us have a closer look at the ternary composition operation wrt. the relations contained in SCEN, namely the relations l and r. Recall that the set {l, r} of LR-relations is closed under all permutation operations. So we only need to consider the fragment of the composition table with triples over l, r:
(r, r, r) = {r}, (r, r, l) = {b, r, l},
(r, l, r) = {f, r, l}, (r, l, l) = {i, r, l},
(l, r, r) = {i, r, l}, (l, r, l) = {f, r, l},
(l, l, r) = {b, r, l}, (l, l, l) = {l}.
We see that any composition that contains both r and l in the triple on the left-hand side yields a superset of {l, r} on the right-hand side, so such composable triples cannot yield an empty set while applying algebraic closure. We therefore have to investigate how the compositions (l, l, l) and (r, r, r) are used when enforcing algebraic closure.
Enumerating all composable triples (X1 X2 r1 X4 ), (X1 X4 r2 X3 ), (X4 X2 r3 X3 )
and their respective refinement relation (X1 X2 rf X3 ) yields a list with 18 entries
shown in Appendix A. All of those entries list l as refinement relation whenever
composing (l, l, l) and analogously for r. Thus, no refinement is possible, and
the given scenario is algebraically closed wrt. ternary composition.
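The closure argument can also be checked mechanically. The sketch below encodes the eight compositions listed above and verifies that every mixed l/r triple yields a superset of {l, r}, so intersecting with a constraint from {l, r} can never produce the empty relation:

```python
# The eight ternary compositions over {l, r} as listed in the text.
TERNARY = {
    ("r", "r", "r"): {"r"},           ("r", "r", "l"): {"b", "r", "l"},
    ("r", "l", "r"): {"f", "r", "l"}, ("r", "l", "l"): {"i", "r", "l"},
    ("l", "r", "r"): {"i", "r", "l"}, ("l", "r", "l"): {"f", "r", "l"},
    ("l", "l", "r"): {"b", "r", "l"}, ("l", "l", "l"): {"l"},
}

for triple, result in TERNARY.items():
    if len(set(triple)) > 1:          # the triple mixes l and r
        assert {"l", "r"} <= result   # result is a superset of {l, r}

# Pure triples refine only to themselves.
assert TERNARY[("l", "l", "l")] == {"l"}
assert TERNARY[("r", "r", "r")] == {"r"}

print("no mixed composition can empty an {l, r}-constraint")
```

This confirms the case split in the proof: only the pure compositions (l, l, l) and (r, r, r) need to be inspected against the enumeration in Appendix A.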
We believe that advancing to even higher-arity composition will not provide us with a sound algebraic closure algorithm. It turns out, however, that moving to a certain level of k-consistency does make a difference.
Hence a valuation for k + 1 variables exists. The second step of this proof is
trivial, since global consistency implies k-consistency for all k ∈ N.
In [7, Prop. 1] it was shown that whether composition is weak or strong is
independent of the property of algebraic closure to decide consistency. However,
in some cases, these two properties are related:
Theorem 19. In a binary calculus over the real line that
1. has only 2-consistent relations
2. and has strong binary composition
algebraic closure decides consistency of CSPs over convex base relations.
Proof (sketch). By Thm. 18 we know that strong 3-consistency decides global consistency. Since composition is strong, algebraic closure decides 3-consistency, and since we have 2-consistency, it decides strong 3-consistency too. Thus algebraically closed scenarios are either inconsistent (containing the empty relation) or globally consistent. Put differently, global consistency and consistency coincide.
Corollary 20. For CSPs over convex {LR, DCC}-relations strong 7-consistency
decides global consistency.
Proof. Follows directly from Thm. 18 for both calculi.
Corollary 21. Global consistency of scenarios in convex {LR, DCC}-relations
is polynomially decidable.
Proof. Compute the set of strongly 7-consistent scenarios in constant time (e.g., using quantifier elimination). The given scenario is strongly 7-consistent iff all
7-point subscenarios are contained in the set of strongly 7-consistent scenarios.
By Thm. 18 this decides global consistency.
Unfortunately consistency and global consistency are not equivalent in the LR-
calculus.
Proposition 22. For the LR-calculus not every consistent scenario is globally
consistent.
[Figure: a consistent LR-scenario over the points A, B, C, D, E witnessing Proposition 22; graph rendering not recoverable from text extraction.]
Acknowledgements
This work was supported by the DFG Transregional Collaborative Research Cen-
ter SFB/TR 8 “Spatial Cognition”, projects I4-[Spin] and R3-[Q-Shape]. Funding
by the German Research Foundation (DFG) is gratefully acknowledged.
References
18. Moratz, R., Dylla, F., Frommberger, L.: A relative orientation algebra with ad-
justable granularity. In: Proceedings of the Workshop on Agents in Real-Time,
and Dynamic Environments (IJCAI 2005) (2005)
19. Freksa, C.: Using orientation information for qualitative spatial reasoning. In: Pro-
ceedings of the International Conference GIS - From Space to Territory: Theories
and Methods of Spatio-Temporal Reasoning on Theories and Methods of Spatio-
Temporal Reasoning in Geographic Space, London, UK, pp. 162–178. Springer,
Heidelberg (1992)
20. Renegar, J.: On the computational complexity and geometry of the first-order theory of the reals. Parts I–III. Journal of Symbolic Computation 13(3), 255–300, 301–328, 329–352 (1992)
Appendix A

(A C l E) (A E l D) (E C l D) (A C l D)
↓ ↓ ↓ ↓
( l, l, l) ∩ {l} = {l}
(A C l B) (A B l E) (B C l E) (A C l E)
↓ ↓ ↓ ↓
( l, l, l) ∩ {l} = {l}
(E A l B) (E B l C) (B A l C) (E A l C)
↓ ↓ ↓ ↓
( l, l, l) ∩ {l} = {l}
(C D l B) (C B l A) (B D l A) (C D l A)
↓ ↓ ↓ ↓
( l, l, l) ∩ {l} = {l}
(C D l E) (C E l A) (E D l A) (C D l A)
↓ ↓ ↓ ↓
( l, l, l) ∩ {l} = {l}
(C E l B) (C B l A) (B D l A) (C E l A)
↓ ↓ ↓ ↓
( l, l, l) ∩ {l} = {l}
(E C r B) (E B r A) (B C r A) (E C r A)
↓ ↓ ↓ ↓
( r, r, r) ∩ {r} = {r}
(D A l B) (D B l C) (B A l C) (D A l C)
↓ ↓ ↓ ↓
( l, l, l) ∩ {l} = {l}
(D A l E) (D E l C) (E A l C) (D A l C)
↓ ↓ ↓ ↓
( l, l, l) ∩ {l} = {l}
(A D r B) (A B r C) (B D r C) (A D r C)
↓ ↓ ↓ ↓
( r, r, r) ∩ {r} = {r}
(A D r E) (A E r C) (E D r C) (A D r C)
↓ ↓ ↓ ↓
( r, r, r) ∩ {r} = {r}
(A E r B) (A B r C) (B E r C) (A E r C)
↓ ↓ ↓ ↓
( r, r, r) ∩ {r} = {r}
(C A r B) (C B r E) (B A r E) (C A r E)
↓ ↓ ↓ ↓
( r, r, r) ∩ {r} = {r}
(C A r E) (C E r D) (E A r D) (C A r D)
↓ ↓ ↓ ↓
( r, r, r) ∩ {r} = {r}
(C A r B) (C B r D) (B A r D) (C A r D)
↓ ↓ ↓ ↓
( r, r, r) ∩ {r} = {r}
(D C r B) (D B r A) (B C r A) (D C r A)
↓ ↓ ↓ ↓
( r, r, r) ∩ {r} = {r}
(D C r E) (D E r A) (E C r A) (D C r A)
↓ ↓ ↓ ↓
( r, r, r) ∩ {r} = {r}