

Table of Contents

Keynote lectures ............................................................................................................................................................................................. S6


The perception of others' goal-directed actions ............................................................................................................................. S6
Body ownership, self-location, and embodied cognition ............................................................................................................ S6
Life as we know it ................................................................................................................................................................................... S6
Elements of extreme expertise ............................................................................................................................................................. S7
Dynamic Field Theory: from the sensory-motor domain to embodied higher cognition ................................................. S7
How t(w)o perform actions together .................................................................................................................................................. S7
Symposia ........................................................................................................................................................................................................... S8
DRIVER COGNITION ............................................................................................................................................................................. S8
The CSB model: A cognitive approach for explaining speed behavior ................................................................................. S8
Validation of the Driving by Visual Angle car following model ............................................................................................ S8
The effects of event frequency and event predictability on drivers' attention allocation ................................................ S9
Integrated modeling for safe transportation (IMoST 2): driver modeling & simulation .................................................. S9
Simulating the influence of event expectancy on drivers' attention distribution ................................................................ S9
PROCESSING LANGUAGE IN CONTEXT: INSIGHTS FROM EMPIRICAL APPROACHES ............................... S10
Investigations into the incrementality of semantic interpretation: the processing of quantificational restriction ... S10
When the polar bear fails to find a referent: how are unmet presuppositions processed? .............................................. S10
Deep or surface anaphoric pronouns?: Empirical approaches ................................................................................................. S10
Comparing presuppositions and scalar implicatures ................................................................................................................... S11
The time course of referential resolution ....................................................................................................................................... S11
COGNITION OF HUMAN ACTIONS: FROM INDIVIDUAL ACTIONS TO INTERACTIONS .............................. S11
Signaling games in sensorimotor interactions............................................................................................................................... S11
Perceptual cognitive processes underlying the recognition of individual and interactive actions ............................... S11
Neural theory for the visual processing of goal-directed actions ........................................................................................... S11
From individual to joint action: representational commonalities and differences ............................................................ S12
Neural mechanisms of observing and interacting with others ................................................................................................. S12
CORTICAL SYSTEMS OF OBJECT GRASPING AND MANIPULATION ..................................................................... S12
Influences of action characteristics and hand used on the neural correlates of planning and executing object
manipulations ........................................................................................................................................................................................... S13
Attention is needed for action control: evidence from grasping studies .............................................................................. S13
Effects of object recognition on grasping ...................................................................................................................................... S13
The representation of grasping movements in the human brain ............................................................................................. S13
Avoiding obstacles without a ventral visual stream ................................................................................................................... S14
Action and semantic object knowledge are processed in separate but interacting streams: evidence from fMRI
and dynamic causal modelling........................................................................................................................................................... S14
EYE TRACKING, LINKING HYPOTHESES AND MEASURES IN LANGUAGE PROCESSING ........................ S14
Conditional analyses of eye movements......................................................................................................................................... S14
Rapid small changes in pupil size index processing difficulty: the index of cognitive activity in reading, visual world,
and dual task paradigms ...................................................................................................................................................................... S15
Measures in sentence processing: eye tracking and pupillometry.......................................................................................... S15
Improving linking hypotheses in visually situated language processing: combining eye movements and event-related
brain potentials ........................................................................................................................................................................................ S16
Oculomotor measurements of abstract and concrete cognitive processes ........................................................................... S16
MANUAL ACTION ................................................................................................................................................................................ S16
The Bremen-Hand-Study@Jacobs: effects of age and expertise on manual dexterity .................................................... S17


Planning anticipatory actions: on the interplay between normative and mechanistic models ...................................... S17
Identifying linguistic and neural levels of interaction between gesture and speech during comprehension using EEG
and fMRI ................................................................................................................................................................................................... S17
Neural correlates of gesture-syntax interaction ............................................................................................................................ S18
Interregional connectivity minimizes surprise responses during action perception .......................................................... S18
The development of cognitive and motor planning skills in young children ..................................................................... S18
PREDICTIVE PROCESSING: PHILOSOPHICAL AND NEUROSCIENTIFIC PERSPECTIVES ............................. S18
Bayesian cognitive science, unification, and explanation ......................................................................................................... S19
The explanatory heft of Bayesian models of cognition ............................................................................................................. S19
Predictive processing and active inference .................................................................................................................................... S19
Learning sensory predictions for perception and action ............................................................................................................ S19
Layer resolution fMRI to investigate cortical feedback and predictive coding in the visual cortex .......................... S19
HOW LANGUAGE AND NUMERICAL REPRESENTATIONS CONSTITUTE MATHEMATICAL
COGNITION ........................................................................................................................................................................................... S20
Influences of number word inversion on multi-digit number processing: a translingual eye-tracking study .......... S20
On the influence of linguistic and numerical complexity in word problems ..................................................................... S21
Linguistic influences on numerical understanding: the case of Welsh................................................................................. S21
Reading space into numbers: an update ......................................................................................................................................... S21
How language and numerical representations constitute mathematical cognition: an introductory review ............. S21
Language influences number processing: the case of bilingual Luxembourg .................................................................... S21
Language differences in basic numerical tasks ............................................................................................................................ S22
Cognitive components of the mathematical processing network in primary school children: linguistic and language
independent contributions .................................................................................................................................................................... S22
It does exist! A SNARC effect amongst native Hebrew speakers is masked by the MARC effect........................... S22
MODELING OF COGNITIVE ASPECTS OF MOBILE INTERACTION .......................................................................... S22
Creating cognitive user models on the basis of abstract user interface models ................................................................ S22
Expectations during smartphone application use ......................................................................................................................... S22
Evaluating the usability of a smartphone application with ACT-R ....................................................................................... S23
Simulating interaction effects of incongruous mental models................................................................................................. S24
"Special offer! Wanna buy a trout?" Modeling user interruption and resumption strategies with ACT-R ......... S24
Tutorials .......................................................................................................................................................................................................... S25
Introduction to probabilistic modeling and rational analysis ................................................................................................... S25
Modeling vision ...................................................................................................................................................................................... S25
Visualization of eye tracking data .................................................................................................................................................... S25
Introduction to cognitive modelling with ACT-R ....................................................................................................................... S25
Dynamic Field Theory: from sensorimotor behaviors to grounded spatial language ...................................................... S25
Poster presentations ..................................................................................................................................................................................... S27
The effect of language on spatial asymmetry in image perception ....................................................................................... S27
Towards formally founded ACT-R simulation and analysis.................................................................................................... S27
Identifying inter-individual planning strategies ............................................................................................................................ S28
Simulating events. The empirical side of the event-state distinction .................................................................................... S29
On the use of computational analogy-engines in modeling examples from teaching and education ......................... S30
Brain network states affect the processing and perception of tactile near-threshold stimuli ........................................ S31
A model for dynamic minimal mentalizing in dialogue ........................................................................................................... S32
Actions revealing cooperation: predicting cooperativeness in social dilemmas from the observation of everyday
actions ........................................................................................................................................................................................................ S33
The use of creative analogies in a complex problem situation ............................................................................................... S34
Yes, that's right? Processing yes and no and attention to the right vs. left........................................................ S35
Perception of background color in head mounted displays: applying the source monitoring paradigm ................... S36
Continuous goal dynamics: insights from mouse-tracking and computational modeling .............................................. S37
Looming auditory warnings initiate earlier event-related potentials in a manual steering task ................................... S38
The creative process across cultures ................................................................................................................................................ S38


How do human interlocutors talk to virtual assistants? A speech act analysis of dialogues of cognitively impaired people
and elderly people with a virtual assistant..................................................................................................................................... S40
Effects of aging on shifts of attention in perihand space ......................................................................................................... S41
The fate of previously focused working memory content: decay or/and inhibition? ...................................................... S41
How global visual landmarks influence the recognition of a city ......................................................................................... S42
Explicit place-labeling supports spatial knowledge in survey, but not in route navigation .......................................... S44
How important is having emotions for understanding others' emotions accurately? ...................................................... S45
Prosody conveys speakers' intentions: acoustic cues for speech act perception ............................................................... S46
On the perception and processing of social actions.................................................................................................................... S46
Stage-level and individual-level interpretation of multiple adnominal adjectives as an epiphenomenon: theoretical
and empirical evidence ......................................................................................................................................................................... S47
What happened to the crying bird? Differential roles of embedding depth and topicalization modulating syntactic
complexity in sentence processing ................................................................................................................................................... S48
fMRI-evidence for a top-down grouping mechanism establishing object correspondence in the Ternus display . S48
Event-related potentials in the recognition of scene sequences .............................................................................................. S49
Sensorimotor interactions as signaling games .............................................................................................................................. S50
Subjective time perception of verbal action and the sense of agency .................................................................................. S51
Memory disclosed by motion: predicting visual working memory performance from movement patterns ............. S52
Role and processing of translation in biological motion perception ..................................................................................... S53
How to remember Tübingen? Reference frames in route and survey knowledge of one's city of residency ......... S53
The effects of observing other people's gaze: faster intuitive judgments of semantic coherence .............................. S54
Towards a predictive processing account of mental agency .................................................................................................... S55
The N400 ERP component reflects implicit prediction error in the semantic system: further support from a connectionist
model of word meaning ....................................................................................................................................................................... S56
Similar and differing processes underlying carry and borrowing effects in addition and subtraction: evidence from eye tracking ...................................................................................................................................................................................... S57
Simultaneous acquisition of words and syntax: contrasting implicit and explicit learning ........................................... S58
Towards a model for anticipating human gestures in human-robot interactions in shared space ............................... S59
Preserved expert object recognition in a case of unilateral visual agnosia ......................................................................... S60
Visual salience in human landmark selection ............................................................................................................................... S60
Left to right or back to front? The spatial flexibility of time ................................................................................................. S61
Smart goals, slow habits? Individual differences in processing speed and working memory capacity moderate
the balance between habitual and goal-directed choice behavior .......................................................................................... S62
Tracing the time course of n - 2 repetition costs ...................................................................................................................... S62
Language cues in the formation of hierarchical representation of space............................................................................. S63
Processing of co-articulated place information in lexical access ........................................................................................... S64
Disentangling the role of inhibition and emotional coding on spatial stimulus devaluation ........................................ S65
The role of working memory in prospective and retrospective motor planning ............................................................... S66
Temporal preparation increases response conflict by advancing direct response activation ......................................... S67
The flexibility of finger-based magnitude representations ....................................................................................................... S68
Object names correspond to convex entities ................................................................................................................................. S69
The role of direct haptic feedback in a compensatory tracking task .................................................................................... S71
Comprehending negated action(s): embodiment perspective ................................................................................................... S71
Effects of action signaling on interpersonal coordination ........................................................................................................ S72
Physiological changes through sensory augmentation in path integration: an fMRI study ........................................... S73
Do you believe in Mozart? The influence of beliefs about composition on representing joint action outcomes
in music ..................................................................................................................................................................................................... S73
Processing sentences describing auditory events: only pianists show evidence for an automatic space-pitch
association ................................................................................................................................................................................................ S74
A free energy approach to template matching in visual attention: a connectionist model ............................................ S75
ORAL PRESENTATIONS ....................................................................................................................................................................... S77
Analyzing psychological theories with F-ACT-R: an example F-ACT-R application .................................................... S79


F-ACT-R: defining the ACT-R architectural space .................................................................................................................... S81


Defining distance in language production: extraposition of relative clauses in German ............................................... S81
How is information distributed across speech and gesture? A cognitive modeling approach ...................................... S84
Towards formally well-founded heuristics in cognitive AI systems ..................................................................................... S87
Action planning is based on musical syntax in expert pianists. ERP evidence................................................................. S89
Motor learning in dance using different modalities: visual vs. verbal models .................................................................. S90
A frontotemporoparietal network common to initiating and responding to joint attention bids .................................. S93
Action recognition and the semantic meaning of actions: how does the brain categorize different social actions? S95
Understanding before language ......................................................................................................................................................... S95
An embodied kinematic model for perspective taking .............................................................................................................. S97
The under-additive effect of multiple constraint violations ................................................................................................... S100
Strong spatial cognition...................................................................................................................................................................... S103
Inferring 3D shape from texture: a biologically inspired model architecture .................................................................. S105
An activation-based model of execution delays of specific task steps ............................................................................... S107
How action effects influence dual-task performance ............................................................................................................... S110
Introduction of an ACT-R based modeling approach to mental rotation .......................................................................... S112
Processing linguistic rhythm in natural stories: an fMRI study............................................................................................ S114
Numbers affect the processing of verbs denoting movements in vertical space ............................................................. S115
Is joint action necessarily based on shared intentions? ........................................................................................................... S117
A general model of the multi-level architecture of mental phenomena. Integrating the functional paradigm
and the mechanistic model of explanation................................................................................................................................... S119
A view-based account of spatial working and long-term memories: Model and predictions ..................................... S120
Systematicity and Compositionality in Computer Vision ....................................................................................................... S123
Control and flexibility of interactive alignment: Möbius syndrome as a case study..................................................... S125
Efficient analysis of gaze-behavior in 3D environments ........................................................................................................ S127
The role of the posterior parietal cortex in relational reasoning .......................................................................................... S129
How to build an inexpensive cognitive robot: Mind-R ........................................................................................................... S131
Crossed hands stay on the time-line .............................................................................................................................................. S134
Is the novelty-P3 suitable for indexing mental workload in steering tasks? .................................................................... S135
Modeling perspective-taking by forecasting 3D biological motion sequences ................................................................ S137
Matching quantifiers or building models? Syllogistic reasoning with generalized quantifiers .................................. S139
What if you could build your own landmark? The influence of color, shape, and position on landmark salience S142
Does language shape cognition? ..................................................................................................................................................... S144
Ten years of adaptive rewiring networks in cortical connectivity modeling. Progress and perspectives ............... S146
Bayesian mental models of conditionals ...................................................................................................................................... S148
Visualizer verbalizer questionnaire: evaluation and revision of the German translation ............................................. S151
AUTHOR INDEX ..................................................................................................................................................................................... S155

Disclosure: This issue was not sponsored by external commercial interests.


Cogn Process (2014) 15 (Suppl 1):S1–S158


DOI 10.1007/s10339-014-0632-2

ABSTRACTS

Special Issue: Proceedings of KogWis 2014


12th Biannual Conference of the German Cognitive Science Society
(Gesellschaft für Kognitionswissenschaft)
Edited by Anna Belardinelli and Martin V. Butz

Keynote lectures
The perception of others' goal-directed actions
Harold Bekkering
Donders Institute for Brain, Cognition and Behavior, Radboud
University Nijmegen, The Netherlands
It is widely assumed that perception of the world is based on internal models of that world and that these models are shaped via prior experiences that modulate the likelihood of a certain action given a certain context. In this talk, I will outline some experimental and theoretical ideas about how humans perceive the goal-directed actions of others on the basis of object and movement knowledge. I will also discuss a potential role for language in improving our world model, including a better perception of other agents' goal-directed actions.

Body ownership, self-location, and embodied cognition


H. Henrik Ehrsson
Department of Neuroscience, Karolinska Institutet, Stockholm,
Sweden
Ask any child if his hands belong to him and the answer will be "Of course!" However, how does the brain actually identify its own body? In this talk, Dr. Ehrsson will describe how cognitive neuroscientists have begun to address this fundamental question. One key idea is that parts of the body are distinguished from the external world by the patterns of correlated information they produce from different sensory modalities (vision, touch, and muscle sense). It is hypothesized that these correlations are detected by neuronal populations in premotor and posterior parietal areas that integrate multisensory information from the space near the body. Dr. Ehrsson and his team have recently used a combination of functional magnetic resonance imaging (fMRI) and human behavioral experiments to present experimental results that support these predictions. To change the feeling of body ownership, perceptual illusions were used so that healthy individuals experienced a rubber hand as their own, their real hand as disowned, or a mannequin as their body.
Dr. Ehrsson will also describe recent experiments that investigate how we come to experience our body as being located at a specific place in the world, and how this sense of self-location depends on body ownership. To this end, an out-of-body illusion was used to perceptually "teleport"


participants' bodily self to different locations during high-resolution fMRI acquisition. It was found that activity patterns in the hippocampus, retrosplenial, posterior cingulate, and posterior parietal cortices reflected the sense of self-location, and that the functional interplay between self-location and body ownership was mediated via the posterior cingulate cortex, suggesting a key role of this structure in generating the coherent experience of the bodily self in space.
In the final part of his talk, Dr. Ehrsson will discuss recent studies that have investigated how the central construct of the bodily self influences other higher cognitive functions, such as the visual perception of the world and the ability to remember personal events (embodied cognition). These experiments suggest that the representation of one's own body affects visual perception of object size by rescaling the visual representation of external space, and that efficient hippocampus-based episodic-memory encoding requires a first-person perspective on the spatial relationship between the body and the world. Taken together, the studies reviewed in this lecture advance our understanding of how we come to experience ownership of a body located at a single place, and unravel novel basic links between central body representation, visual perception of the world, and episodic memory.

Life as we know it
Karl Friston
Wellcome Trust Centre for Neuroimaging, Institute of Neurology,
University College London, UK
How much about our interaction with, and experience of, our world can be deduced from basic principles? This talk reviews recent attempts to understand the self-organized behavior of embodied agents, like ourselves, as satisfying basic imperatives for sustained exchanges with our world. In brief, one simple driving force appears to explain nearly every aspect of our behavior and experience: the minimization of surprise or prediction error. In the context of perception, this corresponds to (Bayes-optimal) predictive coding that suppresses exteroceptive prediction errors. In the context of action, simple reflexes can be seen as suppressing proprioceptive prediction errors. We will look at some of the phenomena that emerge from this formulation, such as hierarchical message passing in the brain and the perceptual inference that ensues. I hope to illustrate these points using simple simulations of how life-like behavior emerges almost inevitably from coupled dynamical systems, and how this behavior can be understood in terms of perception, action and action observation.


Elements of extreme expertise


Wayne D. Gray
Rensselaer Polytechnic Institute, Troy, NY, USA
We are studying the acquisition and deployment of extreme expertise during the real-time interaction of a single human with complex, dynamic decision environments. Our dilemma is that people who have the specific skills we wish to generalize to (such as helicopter piloting, laparoscopic surgery, and air traffic control) are very rare in the college population and too expensive to bring into our lab. Our solution has been to study expert and novice video game players. Our approach takes the position that Cognitive Science has been overly fixated on isolating small components of individual cognition. That approach runs the danger of overfitting theories to paradigms. Our way out of this dilemma is to bring together (a) powerful computational models, (b) machine learning techniques, and (c) microanalysis techniques that integrate analyses of cognitive, perceptual, and action data collected from extreme performers to develop, test, and extend cognitive theory.
Since our January 2013 start, we have built our experimental paradigm, collected naturalistic and laboratory data, published journal and conference papers, won Rensselaer undergraduate research prizes, developed single-piece optimizers (SPOs, i.e., machine learning systems), compared machine performers to human performers, and begun analyzing eye and behavioral data from two 6 h human studies. Our tasks have been the games of Tetris and Space Fortress. Future plans include (a) using our SPOs to tutor piece-by-piece placement, (b) developing integrated cognitive models that account for cognition, action, and perception, and (c) continued exploration of the differences between good players and extreme experts in Tetris and Space Fortress.
Games such as Tetris and Space Fortress are often dismissed as merely requiring reflex behavior. However, with an estimated total number of board configurations of 2^199 (approx. 8 followed by 59 zeroes), Tetris cannot be merely reflex behavior. Our preliminary analyses show complex goal hierarchies, dynamic two-piece plans that are updated after every episode, sophisticated use of subgoaling, and the gradual adaptation of strategies and plans as the speed of play increases. These are very sophisticated human strategies, beyond our current capability to model, and are a challenging topic for the study of the Elements of Extreme Expertise.

Dynamic Field Theory: from the sensory-motor domain to embodied higher cognition

Gregor Schöner
Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany

The embodiment stance emphasizes that cognitive processes are closely linked to the sensory and motor surfaces. This implies that cognitive processes share with sensory-motor processes fundamental properties including graded state variables, continuous time dependence, stability, and continuous metric contents. According to the embodiment hypothesis these properties are pervasive throughout cognition. This poses the challenge to understand how seemingly categorical states emerge, on which cognitive processes seem to operate at discrete event times. I will review Dynamic Field Theory, a theoretical framework that is firmly grounded in the neurophysiology of population activation in the higher nervous system. Dynamic Field Theory has its origins in the sensory-motor domain, where it has been used to understand movement preparation, sensory-motor decisions, and motor memory. In the meantime, however, the framework has been extended to understand elements of visual cognition such as scene representations, object recognition, change detection, and binding. Sequences of cognitive or motor operations can be understood in this framework, which begins to reach into language by providing simple forms of grounding of spatial and action concepts. Discrete events emerge from instabilities in the underlying neural dynamics. Categories emerge from inhomogeneities in the underlying neural populations that are amplified into macroscopic states by dynamic instabilities. I will illustrate how the framework makes contact with psychophysical and neural data, but can also be used to create artificial cognitive systems that act and think based on their own sensory and motor systems.
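For orientation, the neural dynamics referred to here is standardly written as an Amari-type field equation; the form below is the textbook formulation associated with Dynamic Field Theory, added for illustration rather than quoted from the abstract:

% Amari-type neural field dynamics (textbook form):
% u(x,t): activation over a continuous feature dimension x; h < 0: resting level;
% S(x,t): external input; w: local-excitation / lateral-inhibition kernel;
% \sigma: sigmoidal output nonlinearity.
\tau \,\dot{u}(x,t) \;=\; -\,u(x,t) \;+\; h \;+\; S(x,t) \;+\; \int w(x - x')\,\sigma\bigl(u(x',t)\bigr)\,dx'

Peaks of activation that form and destabilize under this dynamics are the discrete events and emergent categories the abstract refers to.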

How t(w)o perform actions together


Natalie Sebanz
SOMBY LAB, Department of Cognitive Science, Central European
University, Budapest, Hungary
Humans are remarkably skilled at coordinating their actions with one
another. Examples range from shaking hands or lifting a box together
to dancing a tango or playing a piano duet. What are the cognitive and
neural mechanisms that enable people to engage in joint actions? How
does the ability to perform actions together develop? And why is it so
difficult to have robots engage in smooth interactions with humans
and with each other? In this talk, I will review recent studies addressing two key ingredients of joint action: how individuals include others in their action planning, and how they achieve the fine-grained temporal coordination that is essential for many different types of joint action. This research shows that people have a strong tendency to form representations of others' tasks, which affects their perception and attention, their action planning, and their encoding of
information in memory. To achieve temporal coordination of their
actions, people reduce the variability of their movements, predict the
actions of their partners using their own motor system, and modulate
their own actions to highlight critical information to their partner. I
will discuss how social relations between individuals and groups and
the cooperative or competitive character of social interactions modulate these processes of action planning and coordination. The next
challenge for joint action research will be to understand how joint
action enables learning. This will allow us to understand what it takes
for people to become experts in particular joint actions, and how
experts teach individual skills through performing joint actions with
novices.




Symposia
DRIVER COGNITION
Convenor: Martin Baumann
Ulm University, Germany
From a psychological point of view, driving is a highly complex task, despite the fact that millions of people perform it safely and efficiently every day. It involves many mental processes and structures, such as perception, attention, memory, knowledge, manual control, decision making, and action selection. These processes and structures need to work in close integration to master the challenges of driving a vehicle in a highly dynamic task environment: our daily traffic. On the other hand, despite all advances in traffic safety in recent years, about 31,000 people were still killed on European roads in 2010. A high percentage of these fatalities are due to human error, which reflects a breakdown of the interplay between the aforementioned cognitive processes. Therefore, understanding the cognitive processes that underlie driver behavior is not just a highly interesting academic endeavor to learn how the human mind masters highly dynamic tasks but is also vital for further improvement of traffic safety.
The papers presented in this symposium address different aspects of driver cognition, demonstrating the variety of processes relevant in the study of driver cognition. They all have in common that their empirical work is based on models of the underlying mental processes, ranging from conceptual models to quantitative and computational models. Two papers present recent results on models addressing drivers' longitudinal control behavior. Whereas Kathner and Kuhl present results on the validation of a specific car following model that is based on those input variables that are actually available to the human driver, Brandenburg and Thüring present empirical results on the validation of a general model of speed behavior based on the interplay of bottom-up and top-down processes. Weber presents the results of a joint research project that aimed at developing an integrated driver model within a computational cognitive architecture, called CASCaS, allowing simulations of driver behavior in a real-time driving simulation environment. The papers by Wortelen and by Kaul and Baumann both investigate factors influencing the distribution of attention while driving. Wortelen implemented a computational model of attention distribution within the cognitive architecture CASCaS to model the effects of expectations about event frequencies and of information value on attention distribution. Kaul and Baumann investigated the effects of event predictability in comparison to event frequency on attention distribution to explain related findings on rear-end accidents.

The CSB model: A cognitive approach for explaining speed behavior

Stefan Brandenburg, Manfred Thüring
Cognitive Psychology and Cognitive Ergonomics, TU Berlin, Germany

Based on Daniel Kahneman's (2012) distinction between highly automated, fast processes (system 1) and conscious, slower cognition (system 2), the Components of Speed Behavior (CSB) model explains the driver's longitudinal control of a vehicle by the interplay of bottom-up and top-down processes.




System 1 is active in common and uncritical situations. The regulation of speed is determined by sensory data from the environment that are processed automatically without demanding many resources. The resulting visual, auditory, haptic, and kinesthetic sensations are integrated into a subjective speed impression. An unconscious and automated control process continuously matches this impression against the driver's skills and resources. When his capabilities are exceeded, the driver decelerates the vehicle; when they are underchallenged, he accelerates it. In case both components are balanced, he keeps the speed constant. The driver's behavior determines the objective speed of the vehicle, which in turn impacts his sensations and thus his subjective speed impression. Hence, in the dynamic situation of driving, system 1 is considered a closed-loop process that requires but little attention and controls the speed of the car in an automated way. This process is monitored by system 2, which is responsible for tactical and strategic actions. It takes over control when a critical situation demands specific maneuvers under attention or when decisions for wayfinding and navigation are required.
The assumptions of the CSB model with respect to system 1 were tested in four experiments using a simple driving simulator. Their results support the basic characteristics of the model. In the most complex study, features of the environment were varied together with the driver's mental workload. As predicted by the model, these variables influenced the subjective impression of speed as well as the objective speed. Besides such supporting evidence, additional influences were detected, which served to state some components more precisely and to expand the model. The final CSB version is published in Brandenburg (2014).
References
Brandenburg S (2014) Geschwindigkeitswahl im Straßenverkehr: Theoretische Erklärung und empirische Untersuchungen. SVH-Verlag, Saarbrücken
Kahneman D (2012) Schnelles Denken, langsames Denken. Siedler Verlag, München

Validation of the Driving by Visual Angle car following model

David Kathner, Diana Kuhl
Deutsches Zentrum für Luft- und Raumfahrt, Braunschweig, Germany
Development and validation of Advanced Driver Assistance Systems require both an understanding of driver behavior and the means to quickly test systems under development at various stages. Quantitative models of human driver behavior offer both capabilities. Conventional models attempt to reproduce or predict behavior based on arbitrary input, whereas psychological models seek to emulate human behavior based on assumed cognitive functions. One common driving task is car following. As this is a straightforward control problem, a plethora of control models exist (Brackstone, McDonald 1999). But typical car following models use input variables that are not directly accessible to human drivers, such as the speed of or distance to a lead vehicle. One example of such a model is the classic Helly car following model (Helly 1959). Andersen and Sauer (2007) argued that to a human driver the only available input parameter is the visual angle of a lead vehicle. They substituted both velocities and distances in Helly's model with the visual angle, changing the properties of the controller considerably. They showed their Driving by Visual Angle (DVA) model to be superior to other car following models but did not compare it directly to the Helly model. In a simulator pre-study, we



recreated Andersen and Sauer's experimental setting to gather information on the DVA parameter properties and compared them to the original findings. To test the model's usability in real-world settings, we conducted an extensive data collection in real traffic. On a 70 km course through urban and rural settings as well as on a motorway, 10 subjects were instructed to follow a lead vehicle driven by a confederate. We will present findings on the model's quality, properties of the model's parameters such as their stability, and compare them to similar models of car following.
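To make the contrast between the two controller types concrete, here is a schematic sketch: the Helly controller follows the commonly cited form of Helly (1959), and the DVA controller follows the substitution idea attributed above to Andersen and Sauer (2007). Function names, gains, and the assumed vehicle width are illustrative placeholders, not the fitted parameters of either model.

# Schematic car-following controllers. Helly (1959) uses relative speed and
# distance to the lead vehicle; the DVA model replaces both with quantities
# available to the driver: the visual angle of the lead car and its rate of
# change. Parameter values are placeholders, not fitted values.
import math

CAR_WIDTH = 1.8  # meters (assumed lead-vehicle width)

def helly_accel(dv, dx, desired_gap, c1=0.5, c2=0.1):
    # dv: speed difference (lead minus follower), dx: headway distance
    return c1 * dv + c2 * (dx - desired_gap)

def visual_angle(dx):
    return 2.0 * math.atan(CAR_WIDTH / (2.0 * dx))  # radians

def dva_accel(theta, theta_dot, desired_theta, k1=8.0, k2=4.0):
    # theta: visual angle of the lead car; theta_dot: its rate of change.
    # A growing angle means closing in, so both terms act to brake.
    return -k1 * theta_dot - k2 * (theta - desired_theta)

# Example: 30 m headway, closing at 2 m/s
dx, dv = 30.0, -2.0
theta = visual_angle(dx)
theta_dot = (visual_angle(dx + dv * 0.1) - theta) / 0.1   # finite difference
print(helly_accel(dv, dx, desired_gap=35.0))              # negative: brake
print(dva_accel(theta, theta_dot, visual_angle(35.0)))    # negative: brake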

References
Andersen GJ, Sauer CW (2007) Optical information for car following: the driving by visual angle (DVA) model. Human Factors 49:878–896
Brackstone M, McDonald M (1999) Car-following: a historical review. Transp Res Part F 2(4):181–196
Helly W (1959) Simulation of bottlenecks in single lane traffic flow. In: International symposium on the theory of traffic flow, New York, NY, USA

The effects of event frequency and event predictability on drivers' attention allocation

Robert Kaul¹, Martin Baumann²
¹Deutsches Zentrum für Luft- und Raumfahrt e.V., Institut für Verkehrssystemtechnik, Braunschweig, Germany; ²Department Human Factors, Ulm University, Germany
Safe driving requires the appropriate allocation of visual attention to the relevant objects and events of a traffic situation. According to the SEEV model (e.g., Horrey, Wickens, Consalus 2006), the allocation of visual attention to a visual information source is influenced by four parameters: (i) the salience of the visual information, (ii) the effort of allocating attention to this source, (iii) the expectancy, i.e., the expectation that new relevant information will occur at a given location, and (iv) the value, or importance, of the piece of information perceived at an information source. Whereas the first two reflect more or less bottom-up processes of attention allocation, the latter two reflect top-down processes. According to the SEEV model, the expectancy itself is mainly determined by the frequency of events at that information source or location. But it seems plausible to assume that these top-down processes, represented in the expectancy parameter of the model, are also influenced by the predictability of events at a certain information source. That is, many predictable events in a channel cause less attention allocation than a single but unexpected event.
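For orientation, the additive form in which the SEEV model is usually written is shown below; the notation is the generic one from the SEEV literature, not taken from this abstract:

% Generic additive SEEV formulation for the probability of attending
% an area of interest i. S: salience, EF: effort, EX: expectancy,
% V: value; s, ef, ex, v are fitted weighting coefficients.
P(A_i) \;=\; s\,S_i \;-\; \mathit{ef}\,EF_i \;+\; \mathit{ex}\,EX_i \;+\; v\,V_i

The proposal above amounts to letting the expectancy term EX depend on event predictability as well as event frequency.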
In a driving simulator experiment, conducted within the EU project ISi-PADAS, we compared the effects of event frequency and event predictability on the allocation of visual attention. 20 participants took part in this experiment. They had to drive in an urban area behind a lead car that either changed its speed frequently or not at all on a straight section before a crossing and that braked either predictably (stop sign) or unpredictably (priority sign) at the crossing, while simultaneously performing a visual secondary task with either high-frequency or low-frequency stimulus presentation. Drivers' gaze behavior was recorded while driving. The results show that drivers' allocation of visual attention is mainly determined by the predictability of the lead car's behavior, demonstrating the importance of the driver's ability to predict events as a major determinant of driving behavior.
References
Horrey WJ, Wickens CD, Consalus KP (2006) Modeling drivers' visual attention allocation while interacting with in-vehicle technologies. J Exp Psychol Appl 12(2):67–78

Integrated modeling for safe transportation (IMoST 2): driver modeling & simulation

Lars Weber
OFFIS, Institute for Information Technology, Oldenburg, Germany

IMoST 2 (Integrated Modeling for Safe Transportation 2, funded by MWK Niedersachsen, VW Vorab) is an interdisciplinary research project between the three partners C.v.O. University of Oldenburg, OFFIS, and DLR Brunswick (2010–2013). The project addresses the question of completing the scope of model-based design to also incorporate human behavior. The application area is the design of advanced driver assistance systems (ADAS) in the automotive domain. Compared to the predecessor project IMoST 1, which addressed a single driving maneuver only (entering the autobahn), IMoST 2 increased the scope of the scenario and deals with typical driving maneuvers on the autobahn, including lane changes as well as free-flow and car-following.
The presentation will give an overview of the final state of the driver modeling and simulation activities conducted in the project. During the 3 years of the project, a driver model was implemented based on the cognitive architecture CASCaS. The architecture incorporates several psychological theories about human cognition and offers a flexible, component-based approach to integrating various human modeling techniques. The presentation will provide a brief overview of the various sub-models, such as multimodal perception, situation representation, decision making, and action selection/execution, and of how this architecture can be used to model and simulate human-machine interaction in the domain of driver modeling. Additionally, some of the empirical study results that were used to parameterize the model will be presented.

Simulating the influence of event expectancy on drivers' attention distribution
Bertram Wortelen
OFFIS, Institute for Information Technology, Oldenburg, Germany
The distribution of attention is a critical aspect of driving. The increased use of assistance and automation systems as well as new


infotainment systems changes the distribution of attention. This work presents the Adaptive Information Expectancy (AIE) model, a new model of attention distribution based on Wickens' SEEV model. It can be integrated into cognitive architectures that are used to simulate task models. The AIE model enables a very detailed simulation of the distribution of attention in close interaction with the simulation of a task model. Unlike the SEEV model, simulations using the AIE model make it possible to derive several measures of human attention distribution besides the percentage gaze distribution, such as gaze frequencies and gaze transition probabilities. Due to the tight integration with the simulation of task models, it is also possible to simulate the resulting consequences for the operator's behavior (e.g., the steering behavior of drivers). The AIE model considers two factors that have a great impact on drivers' attention: the expectancy of events and the value of information. The main focus is on the expectancy of events. The AIE model provides a new method to automatically determine event expectancy from the simulation of a task model.
It is shown how the AIE model is integrated in the cognitive architecture CASCaS. A driving simulator study was performed to analyze the AIE model in a realistic driving environment. The simulation scenario is driven by human drivers as well as by a driver model developed with CASCaS using the AIE model. This scenario investigates the effects of both factors on drivers' attention distribution: event expectancy and information value. Comparing the behavior of the human drivers to model behavior shows a good model fit for the percentage distribution of attention as well as for gaze frequencies and gaze transition probabilities.



PROCESSING LANGUAGE IN CONTEXT: INSIGHTS FROM EMPIRICAL APPROACHES
Convenors: Christian Brauner, Gerhard Jäger, Bettina Rolke
Project B2, SFB 833, University of Tübingen, Germany

Discourse understanding does not only mean integrating semantic knowledge along syntactic rules. It rather needs a Theory of Mind, entails the inclusion of context information, and presupposes that pragmatic principles are met. Moreover, data from brain imaging studies suggest that language is embodied within the motor and sensory processing systems of the brain. Thus, it seems clear that the faculty of language does not constitute a single, encapsulated processing module. Instead, it requires the interoperation of several different processing modules serving to aid an unambiguous discourse understanding.
Important processing prerequisites for successful discourse understanding are the ability to make reference to previously established knowledge and to integrate new information into a given context. There are several linguistic tools which help to signal the requirement for suitable referents in a given discourse and which provide additional aspects of meaning. One example are presuppositions. These carry context assumptions aside from the literal meaning of the words. For example, the sentence "The cat ran away" asserts that some cat ran away, whereas it presupposes that there exists a cat and that the cat that is mentioned is unique in the discourse (a formal rendering of this example is sketched below). This symposium will have its main focus on the cognitive processing of such semantic and pragmatic phenomena.
The interconnection of the faculty of language with different cognitive processing modules confronts us with questions that seem to escape a uniform analysis by one single academic discipline. Hence, research into cognitive language processing, and pragmatics in particular, is a fruitful interdisciplinary interface between linguistics and cognitive psychology. While linguists have mainly focused on theoretical aspects of pragmatics, cognitive psychologists have aimed to identify the cognitive processing functions involved. The symposium will provide an interdisciplinary platform for linguists and cognitive psychologists to discuss questions pertaining to the cognitive processing of language. Our speakers will present their research obtained by means of different empirical approaches.
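The cat example can be spelled out in a standard Russellian-style rendering; the following formalization is added for illustration and is not part of the original text:

% "The cat ran away": assertion vs. presuppositions of the definite article.
\text{asserted:} \quad \exists x\,[\mathrm{cat}(x) \wedge \mathrm{ran\text{-}away}(x)]
\text{presupposed (existence):} \quad \exists x\,\mathrm{cat}(x)
\text{presupposed (uniqueness):} \quad \forall x\,\forall y\,[(\mathrm{cat}(x) \wedge \mathrm{cat}(y)) \rightarrow x = y]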

Investigations into the incrementality of semantic interpretation: the processing of quantificational restriction

Petra Augurzky, Oliver Bott, Wolfgang Sternefeld, Rolf Ulrich
SFB 833, University of Tübingen, Germany
Language comprehenders have the remarkable ability to restrict incoming language, seemingly effortlessly, in a way that optimally fits the referential domain of the discourse. We present a study which investigates the incremental nature of this update process, in particular whether the semantic processor immediately takes the context of the utterance into account to incrementally compute and, if necessary, reanalyze the semantic value of yet-partial sentences.

When the polar bear fails to find a referent: how are unmet presuppositions processed?

Christian Brauner, Bettina Rolke
SFB 833, University of Tübingen, Germany


123

Discourse understanding fails when presuppositions, i.e., essential context information, are not given. We investigated the time-course of
presupposition processing by presenting presupposition triggers such
as the definite article or the iterative "again" in a context which
contrasted with the presupposed content of the trigger or was compatible with it. By means of reading-time studies and event-related
brain potentials we show that readers notice semantic inconsistencies
at the earliest time point during reading. Our results additionally
suggest that different presupposition processing strategies were
employed depending on the type of required reference process.

Deep or surface anaphoric pronouns?: Empirical approaches
Pritty Patel-Grosz, Patrick Grosz
SFB 833, University of Tübingen, Germany
Anaphoric expressions, such as pronouns (he/she/it/they), which generally retrieve earlier information (e.g. a discourse referent), are typically
taken to be central in establishing textual coherence. From a cognitive/
processing perspective, the following question has been posed by Hankamer, Sag (1976) and Sag, Hankamer (1984): do all anaphoric expressions
involve the same cognitive mechanisms, or is there a difference between
deep anaphora (which retrieves information directly from the context)
vs. surface anaphora (which operates on structural information/syntactic
principles)? This question has largely been investigated for phenomena
such as "do it" anaphora vs. VP ellipsis, but it also bears relevance for
pronouns proper: in the course of categorizing pronouns into weak vs.
strong classes (cf. Cardinaletti, Starke 1999), Wiltschko (1998) argues that
personal pronouns are deep anaphoric (by lacking an elided NP), whereas
demonstrative pronouns are surface anaphoric (and contain an elided NP).
We present new empirical evidence and argue that a distinction between
deep anaphoric vs. surface anaphoric pronouns must be rejected, at least in
the case of personal vs. demonstrative pronouns, and that the observed

differences between these classes can be deduced at the level of pragmatics, employing economy principles in the spirit of Cardinaletti, Starke
(1999) and Schlenker (2005).
References
Cardinaletti A, Starke M (1999) The typology of structural deficiency: a case study of three classes of pronouns. In: van Riemsdijk H (ed) Clitics in the languages of Europe. Mouton, Berlin, pp 145–233
Hankamer J, Sag I (1976) Deep and surface anaphora. Linguist Inq 7:391–426
Sag I, Hankamer J (1984) Toward a theory of anaphoric processing. Linguist Philos 7:325–345
Schlenker P (2005) Minimize restrictors! (Notes on definite descriptions, condition C and epithets). In: Proceedings of Sinn und Bedeutung 2004, pp 385–416
Wiltschko M (1998) On the syntax and semantics of (relative) pronouns and determiners. J Comp Germanic Linguist 2:143–181

Comparing presuppositions and scalar implicatures


Jacopo Romoli
University of Ulster, UK
In a series of experiments sentences were used containing a presupposition that was either compatible or incompatible with a context
sentence. This was compared to a sentence in a context containing
either a compatible or incompatible scalar implicature. The talk will
draw some conclusions on the cognitive cost of presuppositions in
relation to the putative cost of scalar implicatures.

The time course of referential resolution


Petra Schumacher
University of Mainz, Germany
Referential expressions are essential ingredients for speaker-hearer
interactions. During reference resolution incoming information
must be linked with prior context and also serves information
progression. Speakers use different referential forms and other
means of information packaging (e.g., linear order, prosody) to
convey additional meaning aspects. Using event-related brain
potentials, we can investigate the time course of reference resolution and examine how comprehenders exploit multiple cues
during the construction of a mental representation. In this talk, I
present data that indicate that reference resolution is guided by two
core mechanisms associated with i) referential accessibility and
expectation (N400) and ii) accommodation and mental model
updating (Late Positivity).

COGNITION OF HUMAN ACTIONS: FROM INDIVIDUAL ACTIONS TO INTERACTIONS
Convenor: Stephan de la Rosa
Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Previous research has focused on the perceptual cognitive processes
involved in the execution and observation of individual actions such
as a person walking. Only more recently has research started to investigate the perceptual-cognitive processes involved in the interaction of
two or more people. This symposium provides an interdisciplinary

view regarding the relationship between individual actions and
interactions. It will provide new insights from several research fields
including decision making, neuroscience, philosophy of neuroscience,
computational neuroscience, and psychology. The aim of the symposium is to give a state-of-the-art overview of commonalities and
differences of the perceptual cognitive processes underlying individual actions and social interactions.

Signaling games in sensorimotor interactions


Daniel Braun
Max Planck Institute for Biological Cybernetics, Tübingen, Germany
In our everyday lives, humans not only signal their intentions through
verbal communication, but also through body movements, for instance
when doing sports to inform teammates about one's own intended
actions or to feint members of an opposing team. Here, we study such
sensorimotor signaling in order to investigate how communication
emerges and on which variables it depends. In our setup, there are
two players with different aims that have partial control in a joint
motor task and where one of the two players possesses private information the other player would like to know about. The question then is
under what conditions this private information is shared through a
signaling process. We manipulated the critical variables given by the
costs of signaling and the uncertainty of the ignorant player. We found
that the dependency of both players' strategies on these variables can
be modeled successfully by a game-theoretic analysis.
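
As a rough illustration of the kind of game-theoretic analysis referred to here, consider a minimal two-state sender-receiver game; the payoffs, signaling cost, and the receiver's prior below are hypothetical placeholders, not the authors' actual model.

```python
# The sender privately knows the state ("A" or "B"); sending a signal costs
# `cost`; the sender earns `benefit` when the receiver matches the state.

def sender_payoff(state, receiver_action, signaled, benefit=1.0, cost=0.2):
    match = benefit if receiver_action == state else 0.0
    return match - (cost if signaled else 0.0)

def receiver_action(prior_A, signal):
    # A truthful signal reveals the state; otherwise follow the prior belief.
    if signal is not None:
        return signal
    return "A" if prior_A > 0.5 else "B"

# With an uncertain receiver whose prior favors the wrong state, signaling
# pays as long as its cost stays below the coordination benefit.
for cost in (0.2, 1.5):
    with_signal = sender_payoff("A", receiver_action(0.4, "A"), True, cost=cost)
    without = sender_payoff("A", receiver_action(0.4, None), False)
    print(f"cost={cost}: signal -> {with_signal:.1f}, silence -> {without:.1f}")
```

Sweeping the cost and the prior in this way reproduces, in toy form, the dependency of signaling on exactly the two variables manipulated in the experiment.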

Perceptual cognitive processes underlying the recognition of individual and interactive actions
Stephan de la Rosa
Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Humans are social beings whose physical interactions with other
people require rapid recognition of the other person's actions, for
example when shaking hands. Previous research has investigated the
perceptual cognitive processes involved in action recognition using
open loop experiments. In these experiments participants passively
view actions during recognition. These studies identified several
important bottom-up mechanisms in action recognition. However, in
daily life participants often recognize action for or during action
production. In order to fully understand action recognition under more
realistic conditions, we examined visual action perception in classical
open-loop (participants observe actions), semi-closed (participants
interact with an avatar which carries out prerecorded actions), and
closed loop experiments (two participants interact naturally with each
other using feedback loops). Our results demonstrate the importance
of considering high level factors that are under top-down control in
action recognition.

Neural theory for the visual processing of goal-directed actions
Martin A. Giese
Section for Computational Sensomotorics, Dept. for Cognitive
Neurology, HIH and CIN, University Clinic Tübingen, Germany
The visual recognition of biological movements and actions is an
important visual function that involves computational processes that
link neural representations for action perception and execution.
This fact has made this topic highly attractive for researchers in
cognitive neuroscience, and a broad spectrum of partially highly speculative theories have been proposed about the computational processes
that might underlie action vision in primate cortex. In spite of this very
active discussion about hypothetical computational and conceptual theories, our detailed knowledge about the underlying neural processes is
quite limited, and a broad spectrum of critical experiments that narrow
down the relevant computational key steps remain yet to be done.
I will present a physiologically-inspired neural theory for the processing of goal-directed actions, which provides a unifying account for
existing neurophysiological results on the visual recognition of hand
actions in monkey cortex. At the same time, the model accounts for
several new experimental results, where a part of these experiments were
motivated by testing aspects of the proposed neural theory. Importantly,
the present model accounts for many basic properties of cortical action-selective neurons by simple physiologically plausible mechanisms that
are known from visual shape and motion processing, without necessitating a central computational role of motor representations.
The same model also provides an account for experiments on the
visual perception of causality, suggesting that simple forms of causality perception might be a side-effect of computational processes
that mainly subserve the recognition of goal-directed actions. Extensions of the model might provide a basis for the investigation of the
neurodynamic phenomena in the visual processing of action stimuli.
Acknowledgments
Research supported by the EC FP7 projects AMARSi, Koroibot,
ABC, and Human Brain Project, and by the BMBF and the DFG.

From individual to joint action: representational commonalities and differences
Hong Yu Wong
CIN, University of Tübingen, Germany
To what extent do the structures underpinning individual action differ
from those underpinning joint action? What are the representational
commonalities and differences between individual and joint action?
Can an individual account of planning intentions be extended to cover
the case of joint action (as suggested by Bratman)? What is the
phenomenology of acting together? Is an adequacy condition on a
theory of action that it must account for the action of an arbitrary
number of agents (as suggested by Butterfill)? This talk will approach
these questions from the point of view of the philosophy of action. We
will draw on recent empirical studies on joint action to reflect on
prominent philosophical accounts of joint action, using this as an
opportunity to reflect on the significance of a philosophy of action for
the science of action (and vice versa).

Neural mechanisms of observing and interacting with others
Kai Vogeley
University Hospital Cologne, Germany
Over the last decade, cognitive neuroscience has started to systematically study the neural mechanisms of social cognition or
social information processing. Essentially, two different neural
systems have been established in this research domain that appear to
constitute two different routes of processing underlying our social
cognitive capacities in everyday social encounters, namely the so-called mirror neuron system (MNS) and the social neural network (SNN, also theory of mind network or mentalizing network).
The functional roles of both systems appear to be complementary.
The MNS serves comparatively early stages of social information
processing that are more related to spatial or bodily signals
expressed in the behaviour of others and supports the detection of
potential social salience, including observation of other persons'
actions. Complementary to the functional role of the MNS, the SNN
serves comparatively late stages of social information processing
that are more related to the evaluation of emotional and psychological states of others that have to be inferred as inner mental
experience from the behaviour of this person. Empirical studies on
the neural mechanisms of ongoing social interactions with others
show that essentially SNN components are recruited during the
experience of social encounters together with the reward system of
the brain.

CORTICAL SYSTEMS OF OBJECT GRASPING AND MANIPULATION
Convenor: Marc Himmelbach
Division of Neuropsychology, Hertie-Institute for Clinical Brain
Research, Centre for Integrative Neuroscience, University
of Tübingen, Germany
Reaching for objects, grasping them, and finally using or manipulating these objects are typical human capabilities. Although several
non-human species are able to do these things, the anatomical
adaptation of our hands for an extraordinarily precise and flexible use
in the interaction with an infinite number of different target objects
makes humans unique among the vertebrate species. The unique
anatomy of our hands is matched by a cortical sensorimotor control
system connecting multiple areas in the frontal and parietal lobes of
the human cortex, which underwent a considerable enlargement
across the primate species. Although our hands by themselves, their
flexible and precise use, and the capacities of our cortical hand motor
systems already distinguish us from all other species, the use of
objects as tools to act on further objects and thereby mediate and
transform our actions, makes us truly human. Although various non-human species use tools in some situations, the versatility of human
tool use is totally unrivalled. Neuropsychological and neuroimaging
research showed that dedicated cortical tool use systems overlap
partially with the arm/hand sensorimotor systems but include additional frontal, parietal, and temporal cortical structures. While most of
the structures that seem to be relevant for tool use beyond the arm-hand sensorimotor system have been identified, we are still missing a
satisfactory description of their individual functional contributions.
Across the whole range from simple grasping to the use of objects as
tools on other objects, investigations of interdependencies and interactions between these cortical system components are still at the
beginning. The speakers of this symposium together cover the range
from simple grasping to tool use and will present their current
behavioral, neuropsychological, and neuroimaging findings that further specify the functional description of the human object grasping
and manipulation systems.

Influences of action characteristics and hand used on the neural correlates of planning and executing object manipulations
Joachim Hermsdörfer1, Marie-Luise Brandi1,2, Christian Sorg2, Georg Goldenberg3, Afra Wohlschläger2
1 Department of Sport and Movement Science, Technical University Munich, Germany; 2 Department of Neurology, Technical University Munich, Germany; 3 Department of Neuropsychology, Bogenhausen Hospital, Germany
Studies using functional magnetic resonance imaging (fMRI) techniques
have revealed a wide-spread neural network active during the naming or
imagination of tool action as well as during pantomimes of tool use.
Actual tool use has, however, only rarely been investigated due to methodological problems. We have constructed a tool carousel to enable the
controlled and quick presentation and use of a variety of everyday tools
and corresponding recipients, while restricting body movements to lower
arm and hand. In our paradigm we compared the use of tools as well as the
goal-directed manipulation of neutral objects with simple transportation.
We tested both hands in 17 right-handed healthy subjects. An action
network including parietal, temporal as well as frontal areas was found.
Irrespective of the exact characteristics of the action, planning was
strongly lateralized to the left brain and involved similar areas, which
remained active during actual task execution. Handling a tool versus a
neutral bar and using an object versus simple transportation strengthen
the lateralization of the action network towards the left brain. The results
support the assumption that a dorso-dorsal stream is involved in the online manipulation of objects according to orientation and structure
independent of object knowledge. Regions of a ventral-dorsal pathway
process and code the specific knowledge of how a common tool is used.
Temporal-ventral areas identify objects and may code semantic tool
information. Use of the left hand leads to a larger recruitment of action
areas, possibly to compensate for the lack of routine and automatism
when using the non-dominant hand.


Effects of object recognition on grasping


Marc Himmelbach
Division of Neuropsychology, Hertie-Institute for Clinical Brain
Research, Centre for Integrative Neuroscience, University
of Tübingen, Germany
Grasping a manipulable object requires action programming and
object recognition, two processes that were supposed to be anatomically segregated in a dorsal and a ventral visual subsystem.
Our studies investigated interactions between these proposed subsystems studying the influence of familiar everyday objects on
grasp programming and its cortical representation in humans. Our
behavioral studies revealed an effect of learned identity-size
associations on reach-to-grasp movements under binocular viewing
conditions, counteracting veridical binocular depth and size
information. This effect of object recognition on grasp programming was further supported by differences in the scaling of the
maximum grip aperture between grasping featureless cuboids and
grasping recognizable everyday objects in healthy humans. A
subsequent fMRI experiment showed that during grasping everyday objects relative to grasping featureless cuboids BOLD signal
levels were not only increased at the lateral occipital cortex but
also at the anterior intraparietal sulcus, suggesting that object-identity information is represented in the dorsal subsystem. Measuring reach-to-grasp kinematics in two patients with lateral
occipito-temporal brain damage we observed significant behavioral
deficits in comparison to a large healthy control group, suggesting
a causal link between visual processing in the ventral system and
grasp programming. In conclusion, our work shows that the recognition of a particular object not only affects grasp planning, i.e.
the selection of a broad motor plan, but also the parameterization
of reach-to-grasp movements.
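
For readers unfamiliar with the kinematic measure used above, maximum grip aperture can be computed directly from motion-capture data. The following minimal sketch assumes simulated thumb and index-finger marker trajectories; it is illustrative, not the study's recording setup.

```python
import numpy as np

def max_grip_aperture(thumb_xyz, index_xyz):
    """Peak 3-D distance between thumb and index markers during the reach."""
    aperture = np.linalg.norm(thumb_xyz - index_xyz, axis=1)
    return aperture.max(), aperture.argmax()

# Simulated reach: thumb travels 300 mm; the finger opens and closes again.
t = np.linspace(0, 1, 200)[:, None]
thumb = np.hstack([t * 300, np.zeros_like(t), np.zeros_like(t)])   # mm
index = thumb + np.hstack([np.zeros_like(t),
                           80 * np.sin(np.pi * t), np.zeros_like(t)])
mga, when = max_grip_aperture(thumb, index)
print(f"MGA = {mga:.1f} mm at sample {when}")   # peaks mid-reach, ~80 mm
```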

Attention is needed for action control: evidence from grasping studies
Constanze Hesse
School of Psychology, University of Aberdeen, UK
It is well known that during movement preparation, attention is
allocated to locations which are relevant for movement planning.
However, until now, very little research has examined the influence of
distributed attention on movement kinematics. In our experiments, we
investigated whether the execution of a concurrent perceptual task
that requires attentional resources interferes with movement planning
(primarily mediated by the ventral stream) and/or movement control
(primarily mediated by the dorsal stream) in grasping. Participants
had to grasp objects of varying sizes whilst simultaneously performing a perceptual identification task. Movement kinematics and
perceptual identification performance in the dual-task conditions were
compared to the baseline performance in both tasks (i.e. performance
levels in the absence of a secondary task). Furthermore, movement
kinematics were measured continuously such that interference effects
could also be detected at early stages of the movement. Our results
indicate that both movement planning (as indicated by prolonged
reaction times) as well as movement control (as indicated by a
delayed adjustment of the grip to the object's size) are altered when
attention has to be shared between a grasping task and a perceptual
task. These findings suggest that the dorsal and the ventral stream
share common attentional processing resources and that even simple
motor actions such as grasping are not completely automated.

The representation of grasping movements in the human brain
Angelika Lingnau
Center for Mind/Brain Sciences, Department of Psychology
and Cognitive Science, University of Trento, Italy
Daily life activities require skillful object manipulations. Whereas
we begin to understand the neural substrates of hand prehension in
monkeys at the level of single cell spiking activity, we still have a
limited understanding of the representation of grasping movements
in the human brain. With recent advances in human neuroimaging,
such as functional magnetic resonance imaging (fMRI) repetition
suppression (fMRI-RS) and multi-variate pattern (MVP) analysis,
it has become possible to characterize some of the properties
represented in different parts of the human prehension system. In
this talk, I will present several studies using fMRI-RS and MVP
analysis that investigated the representation of reach direction,
wrist orientation, grip type and effector (left/right hand) of simple
non-visually guided reach-to-grasp movements. We observed a
preference for reach direction along the dorsomedial pathway, and
overlapping representations for reach direction and grip type along
the dorsolateral pathway, in line with a growing literature that
casts doubt on a clear-cut distinction between separate pathways
for the reach and grasp component. Moreover, we were able to
distinguish between premotor areas sensitive to grip type, wrist
orientation and effector, and parietal areas that are sensitive to grip
type across wrist orientation and grip type. Our results support the
view of a hierarchical representation of movements within the
prehension system.


Avoiding obstacles without a ventral visual stream


Thomas Schenk
Department of Neurology, University of Erlangen-Nuremberg,
Germany
When reaching for a target it is important to avoid knocking over
objects that stand in the way. We do this without thinking about it.
Experiments in a hemiblind patient demonstrate that obstacles that are
not perceived can be avoided. To account for such dissociations the
two visual-streams model suggests that perception is handled in the
ventral visual stream while visually-guided action depends on visual
input from the dorsal stream. The model also assumes that the dorsal
stream cannot store visual information. Consequently it is predicted
that patients with dorsal stream damage will fail in the obstacle-avoidance task, but succeed when a short delay is introduced between
obstacle presentation and response onset. This has been confirmed in
patients with optic ataxia. In contrast ventral stream damage should
allow normal obstacle avoidance but destroy the patient's ability to
avoid obstacles in a delayed condition. We tested these predictions in
DF. As expected we found that she can avoid obstacles in the standard
condition. More surprisingly she is equally good in the delayed
condition and a subtle change in the standard condition is sufficient to
impair her obstacle-avoidance skills. The implications of these findings for the control of reaching will be discussed.

Action and semantic object knowledge are processed in separate but interacting streams: evidence from fMRI and dynamic causal modelling
Peter H. Weiss-Blankenhorn
Department of Neurology, University Hospital Cologne, Germany & Cognitive Neuroscience, Institute of Neuroscience & Medicine (INM-3), Research Centre Jülich, Germany
While manipulation knowledge is differentially impaired in patients
suffering from apraxia, function knowledge about objects is selectively impaired in patients with semantic dementia. These clinical
observations fuelled the debate whether manipulation and function
knowledge about objects rely on differential neural substrates, as the
processing of function knowledge may be based on either the action
or the semantic system.
By using new experimental tasks and effective connectivity analysis, fMRI studies can contribute to this debate. Behavioral data
revealed that functional object knowledge (= motor-related semantic
knowledge) and (non-motor) semantic object knowledge are processed similarly, while processing manipulation-related action
knowledge took longer. For the manipulation task compared to the
two (motor and non-motor) semantic tasks, a general linear model
analysis revealed activations in the bilateral extra-striate body area
and the left intra-parietal sulcus. The reverse contrast led to activations in the fusiform gyrus and inferior parietal lobe bilaterally as well
as in the medial prefrontal cortex. Effective connectivity analysis
demonstrated that action and semantic knowledge about objects are
processed along two separate, but interacting processing streams with
the inferior parietal lobe mediating the exchange of information
between these streams.


EYE TRACKING, LINKING HYPOTHESES AND MEASURES IN LANGUAGE PROCESSING
Convenors: Pia Knoeferle, Michele Burigo
Bielefeld University, Germany
The present symposium focuses on a core topic of eye tracking in
language processing, viz. linking hypotheses (the attributive relationship between eye movements and cognitive processes). One
central topic will be eye-tracking measures and their associated
linking hypotheses in both language comprehension and production. The symposium will discuss both new and established gaze
measures and their linking assumptions, as well as ambiguity in
our linking assumptions and how we could begin to address this
issue.

Conditional analyses of eye movements


Michele Burigo, Pia Knoeferle
Bielefeld University, Germany
In spoken language comprehension fixations guided by the verbal
input have been interpreted as reflecting a referential link between
words and corresponding objects (Tanenhaus, Spivey-Knowlton,
Eberhard, Sedivy 1995). However, they cannot reveal other aspects of
how comprehenders interrogate a scene (e.g., attention shifts from one
object to another). Inspections, on the other hand, are, by definition, a
good reflection of attentional shifts much like saccades (see Altmann,
Kamide 2004 for related discussion). One domain where attentional
shifts and their direction are informative is spatial language (e.g., "the plant is above the clock"). Some models predict that objects are
inspected as they are mentioned while others predict that attention
must shift from a later-mentioned object (the clock) to the earlier
mentioned located object (the plant). To assess these model predictions, we examined in particular the directionality of attention shifts
via conditional analyses. We ask where people look next, as they hear
the spatial preposition and after they have made one inspection to the
clock. Will they continue to inspect the clock or do they shift attention
back to the plant? Three eye tracking experiments were used to
investigate the directionality of attention shifts during spatial language processing. The results from these conditional analyses
revealed, for the first time, the overt attentional shifts from the reference object (the clock) to the located object (the plant) in sentences such as "The plant is above the clock". In addition, conditional analyses of inspections may provide a useful approach for further
refining the linking hypotheses between eye movements and cognitive
processes (Fig. 1).
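
The logic of such a conditional analysis can be made concrete with a small sketch: restrict the data to inspections of the reference object and tabulate where the very next inspection goes. Object labels and trial data below are illustrative, not the experimental records.

```python
from collections import Counter

def next_inspection_given(trials, previous="clock"):
    """P(next inspected object | an inspection of `previous` just occurred)."""
    nexts = Counter()
    for inspections in trials:                 # one ordered list per trial
        for cur, nxt in zip(inspections, inspections[1:]):
            if cur == previous:
                nexts[nxt] += 1
    total = sum(nexts.values())
    return {obj: n / total for obj, n in nexts.items()}

trials = [["plant", "clock", "plant"],
          ["clock", "plant", "clock"],
          ["plant", "clock", "distractor"]]
print(next_inspection_given(trials))   # e.g. {'plant': 0.67, 'distractor': 0.33}
```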
References
Altmann GTM (1999) Thematic role assignment in context. J Mem Lang 41:124–145
Regier T, Carlson L (2001) Grounding spatial language in perception: an empirical and computational investigation. J Exp Psychol Gen 130:273–298
Tanenhaus MK, Spivey-Knowlton MJ, Eberhard KM, Sedivy JC (1995) Integration of visual and linguistic information in spoken language comprehension. Science 268:1632–1634

Fig. 1 "The plant is above the clock". The 5 × 6 grid was used to define the objects' locations and was invisible to participants

Rapid small changes in pupil size index processing difficulty: the index of cognitive activity in reading, visual world, and dual task paradigms
Vera Demberg
Saarland University, Saarbrücken, Germany
The size of the pupil has long been known to reflect arousal (Hess,
Polt 1960) and cognitive load in a variety of different tasks such as
arithmetic problems (Hess, Polt 1964), digit recall (Kahneman, Beatty
1966), attention (Beatty 1982) as well as language complexity (Schluroff 1982; Just, Carpenter 1993; Hyönä et al. 1995; Zellin et al. 2011; Frank, Thompson 2012), grammatical violations (Gutiérrez,
Shapiro 2010) and context integration effects (Engelhardt et al. 2010).
All of these studies have looked at the macro-level effect of the
overall dilation of the pupil as response to a stimulus. Recently, a
micro-level measure of pupil dilation has been proposed, called the
Index of Cognitive Activity or ICA (Marshall 2000, 2002, 2007),
which does not relate processing load to the overall changes in size of
the pupil, but instead counts the frequency of rapid small dilations,
which are usually discarded as pupillary hippus (Beatty, Lucero-Wagoner 2000).
Some aspects which make the ICA particularly interesting as a
measure of cognitive load are that the ICA a) is less sensitive to
changes in ambient light and fixation position, b) is more dynamic,
which makes it easier to separate the effect of stimuli in close
sequence and c) is faster than overall pupil size, i.e., it can usually be
measured in the time window of 300–1,200 ms after stimulus.
If it reliably reflects (linguistic) processing load, the ICA could
hence constitute a useful new method to assess processing load using
an eye-tracker, in auditory experiments, visual world experiments, as
well as in naturalistic environments which are not well suited for the
use of EEG, e.g. while driving a car, and could therefore usefully
complement the range of experimental paradigms currently used.
In this talk I will report experimental results on the index of
cognitive activity (ICA) in a range of reading experiments, auditory
language plus driving experiments as well as a visual world experiment, which all indicate that the ICA is a useful index of linguistic
processing difficulty.
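
The ICA itself is a proprietary wavelet-based measure (Marshall 2000), so the following is only a crude illustrative proxy for the underlying idea of counting rapid small dilations rather than tracking overall pupil size; the sampling rate and threshold are arbitrary assumptions.

```python
import numpy as np

def rapid_dilation_count(pupil, fs=500.0, percentile=97.5):
    """Count episodes where the pupil dilates unusually fast (a rough proxy,
    not the patented ICA computation)."""
    velocity = np.gradient(pupil) * fs              # mm/s, sample-wise slope
    threshold = np.percentile(velocity, percentile)
    rapid = velocity > threshold
    # count onsets of rapid-dilation episodes, not individual samples
    onsets = np.flatnonzero(rapid[1:] & ~rapid[:-1]) + 1
    return len(onsets)

rng = np.random.default_rng(0)
pupil = 3.0 + 0.02 * rng.standard_normal(5000)      # 10 s of simulated signal
print(rapid_dilation_count(pupil))
```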
References
Beatty J (1982) Task-evoked pupillary responses, processing load, and the structure of processing resources. Psychol Bull 91(2):276–292

Beatty J, Lucero-Wagoner B (2000) The pupillary system. Cambridge
University Press, Cambridge
Engelhardt PE, Ferreira F, Patsenko EG (2010) Pupillometry reveals processing load during spoken language comprehension. Quart J Exp Psychol 63:639–645
Frank S, Thompson R (2012) Early effects of word surprisal on pupil size during reading. In: Miyake N, Peebles D, Cooper RP (eds) Proceedings of the 34th annual conference of the Cognitive Science Society, pp 1554–1559
Gutiérrez RS, Shapiro LP (2010) Measuring the time-course of sentence processing with pupillometry. In: CUNY conference on
human sentence processing
Hess E, Polt J (1960) Pupil size as related to interest value of visual
stimuli. Science
Hess E, Polt J (1964) Pupil size in relation to mental activity during
simple problem-solving. Science
Hyönä J, Tommola J, Alaja A (1995) Pupil dilation as a measure of processing load in simultaneous interpretation and other language tasks. Quart J Exp Psychol 48(3):598–612
Just MA, Carpenter PA (1993) The intensity dimension of thought:
pupillometric indices of sentence processing. Can J Exp Psychol
47(2)
Kahneman D, Beatty J (1966) Pupil diameter and load on memory.
Science
Marshall S (2000) US patent no. 6,090,051
Marshall S (2002) The index of cognitive activity: Measuring cognitive work-load. In: Proceedings of 7th conference on human
factors and power plants, IEEE, pp 57
Marshall S (2007) Identifying cognitive state from eye metrics. Aviat Space Environ Med 78(Suppl 1):B165–B175
Schluroff M (1982) Pupil responses to grammatical complexity of sentences. Brain Lang 17(1):133–145
Zellin M, Pannekamp A, Toepel U, van der Meer E (2011) In the eye of
the listener: pupil dilation elucidates discourse processing. Int J
Psychophysiol

Measures in sentence processing: eye tracking and pupillometry
Paul E. Engelhardt1, Leigh B. Fernandez2
1 University of East Anglia, UK; 2 University of Potsdam, Germany
In this talk, we will present data from two studies that measured pupil
diameter as participants heard temporarily ambiguous sentences. In the
first study, we examined visual context. Tanenhaus et al. (1995) found
that in the context of a relevant visual world containing an apple on
a towel, an empty towel, and a box, listeners will often incorrectly
parse an instruction, such as "put the apple on the towel in the box". The
misinterpretation is that the apple must be moved on to the empty
towel, and thus, the primary dependent measure is rate of saccadic eye
movements launched to the empty towel. Eye movements to the empty
towel do not occur when the visual world contains more than one
apple (Ferreira et al. 1995). In the first study, we examined the role that
visual context plays on the processing effort associated with garden-path sentence processing (see example A). Pupil diameter was measured from the key (disambiguating) word in the sentence (e.g. "played"). Our main hypothesis was that relevant visual context (e.g. a
picture of a woman dressing herself) would be associated with reduced
processing effort (i.e. no increase in pupil size). In contrast, when the
visual context supported the garden-path misinterpretation (e.g. a
picture of a woman dressing a baby) pupil diameter would reliably
increase. Results were consistent with both predictions. (The prosodic boundary between clauses was also manipulated.)
A. While the woman dressed (#) the baby that was cute and cuddly
played on the floor.
B. The superintendent learned [which schools/students] the proposal [that expanded/to expand] upon the curriculum would motivate
____ during the following semester. (The critical items were taken from Phillips (2006) and were simplified for auditory presentation.)
In the second study, we examined a special type of filler gap
dependency, called parasitic gap constructions. Filler gap dependencies occur when a constituent within a sentence has undergone
movement (e.g. "What_i did the boy buy t_i?"). In this sentence, "what" has moved from its canonical position as the object of "buy", and thus, the
parser must be able to keep track of moved constituents and correctly
associate them with the correct verbs (or gap sites). Difficulty arises
when (1) there are multiple verbs in the sentence, and (2) when those
verbs are optionally transitive (i.e. have the option to take a direct
object or not). Parasitic gaps are a special type of construction
because a filler is associated with two gaps. An example is "What did the attempt to fix _ ultimately damage _?". Even more interestingly,
from a linguistic perspective, is that the first gap occurs in an illegal position. Phillips (2006) used a self-paced word-by-word reading
paradigm to test sentences containing parasitic gap like constructions
(see example B). He found slowdowns only in situations in which
parasitic gap dependency was allowed (i.e. with "to expand"). Furthermore, reading times were influenced by plausibility (i.e. it is
possible to expand schools but not students). In our second study, we
used similar materials to investigate parasitic gaps using changes in
pupil diameter over time as an index of processing load. Our data
indicates that the parser actively forms dependencies as soon as
possible, regardless of semantic fit.
In summary, this talk will compare and contrast findings from eye
tracking and reading times with pupil diameter. Both of our studies
showed similarities to the original works, but at the same time, also
showed novel dissociations. The relationship between discrete measures, such as saccadic eye movements, and continuous measures,
such as pupil diameter and mouse tracking, will be discussed.
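
As a minimal sketch of how word-locked pupillometry of this kind can be computed, the following baseline-corrects pupil diameter at the disambiguating word; the sampling rate, window lengths, and simulated data are assumptions, not the authors' analysis pipeline.

```python
import numpy as np

def word_locked_dilation(pupil, word_onset, fs=250, base=0.2, window=2.0):
    """Pupil change relative to a pre-word baseline, from word onset onward."""
    i0 = int(word_onset * fs)
    baseline = pupil[i0 - int(base * fs):i0].mean()
    return pupil[i0:i0 + int(window * fs)] - baseline

rng = np.random.default_rng(1)
trial = 4.0 + 0.05 * rng.standard_normal(1500)    # one 6 s trial at 250 Hz
curve = word_locked_dilation(trial, word_onset=2.5)
print(curve.mean())   # > 0 would indicate dilation, i.e. added load
```

Averaging such baseline-corrected curves per condition yields the continuous load measure that the talk contrasts with discrete saccade rates.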
References
Ferreira F, Henderson JM, Singer M (1995) Reading and language processing: similarities and differences. In: Henderson JM, Singer M, Ferreira F (eds) Reading and language processing. Erlbaum, Hillsdale, pp 338–341
Phillips C (2006) The real-time status of island phenomena. Language 82:795–823
Tanenhaus MK, Spivey-Knowlton MJ, Eberhard KM, Sedivy JC (1995) Integration of visual and linguistic information in spoken language comprehension. Science 268(5217):1632–1634

Improving linking hypotheses in visually situated language processing: combining eye movements and event-related brain potentials
Pia Knoeferle
Bielefeld University, Germany
Listeners' eye movements to objects in response to auditory verbal input, as well as their event-related brain potentials (ERPs), have revealed that non-linguistic cues contribute rapidly towards real-time language comprehension. While the findings from these two measures have contributed important insights into context effects during real-time language comprehension, there is also considerable ambiguity in the linking between comprehension processes and each of these two measures (eye
movements and ERPs). This naturally limits the conclusions we can draw
from this research with regard to language comprehension (theory). In
further refining our theories of sentence comprehension, better linking
hypotheses would thus be an essential step.
The present contribution argues that combining eye-tracking and
event-related brain potentials would improve the interpretation of
these two individual measures, the associated linking hypotheses, and
correspondingly insights into situated language comprehension processes (Knoeferle, in press).
Acknowledgments
This research was funded by the Cognitive Interaction Technology
Excellence Center (DFG).
References
Knoeferle P (in press) Language comprehension in rich non-linguistic contexts: combining eye tracking and event-related brain potentials. In: Cognitive neuroscience of natural language use. Cambridge University Press, Cambridge

Oculomotor measurements of abstract and concrete cognitive processes
Andriy Myachykov
Northumbria University, Newcastle-upon-Tyne, UK
Analysis of oculomotor behavior has long been used as a window into
the cognitive processes underlying human behavior. Eye tracking
allows recording of highly accurate categorical and chronometric
data, which provides experimental evidence about various aspects of
human cognition including, but not limited to, retrieval and activation
of information in memory and allocation and distribution of visual
attention. As a very diverse and accurate experimental tool, eye
tracking has been used for the analysis of low-level perceptual processes as well as for the investigation of higher cognitive processes
including mental arithmetic, language, and communication. One
example of the latter is research using the "visual world" paradigm,
which uses eye movements of language users (listeners and speakers)
in order to understand cognitive processes underlying human linguistic communication.
In the first part of my talk, I will offer a broad overview of eye
tracking methodology with a specific focus on measurements and
their evidential value in different cognitive domains. The second part
will discuss results of a number of recent eye-tracking studies on
sentence production and comprehension as well as number
processing.

MANUAL ACTION
Convenor: Dirk Koester
Bielefeld University, Germany
The hand is one of our most important tools for interacting with the
environment, both physically and socially. Manual actions and the
associated processes of motor control, both sensorimotor and cognitive, have received much attention. This research strand has a focus
on the complexity of movement details (e.g. kinematics, dynamics or
degrees of freedom). At the same time, in a seemingly different
research field, manual actions have been scrutinized for their communicative goals or functions, so-called co-speech gestures. Here, a
focus is on what kind of information is supported by such actions;
whether meaning is conveyed but also synchronization, akin to
kinematics, is currently under investigation. A tight functional interrelation between manual action control and language has long been
proposed (e.g. Steklis, Harnad 1976). Not only do hand movements have to be controlled; the environmental context (i.e., the situation) also has to be taken into account in order to fully
understand manual actions. Furthermore, technical advances permit
also the deeper investigation of the neural basis, in addition to the
cognitive basis, of (manual) action control. Regarding other cognitive
domains, recent evidence points towards a tight functional interaction
of grasping with other cognitive domains such as working memory or
attention (Spiegel et al. 2013; Logan, Fischman 2011). What's more,
manual actions may be functional for abstract cognitive processing,
e.g., numerical reasoning (as suggested by the phenomenon of finger
counting).
In this symposium we will bring together latest research that
explores the manifold functions and purposes of manual actions such
as exploring and manipulating objects, the development of such
control processes for grasping and the changes associated with aging.
Different models of action control will be presented and evaluated.
Also, evidence for the role of manual gestures in interacting and
communicating with other people will be presented. That is, not only
the (physical) effects of manual actions in the environment will be
discussed but also the interpretation of gestures, i.e., communicative
goals will be debated. The symposium will shed light on new concepts of and approaches to understanding the control of manual
actions and their functions in a social and interactive world.
References
Logan SW, Fischman MG (2011) The relationship between end-state
comfort effects and memory performance in serial and free
recall. Acta Psychol 137:292–299
Spiegel MA, Koester D, Schack T (2013) The functional role of
working memory in the (re-)planning and execution of grasping
movements. J Exp Psychol Hum Percept Perform 39:1326–1339
Steklis HD, Harnad SR (1976) From hand to mouth: Some critical
stages in the evolution of language. Ann N Y Acad Sci 280(1):445–455

The Bremen-Hand-Study@Jacobs: effects of age and expertise on manual dexterity
Ben Godde, Claudia Voelcker-Rehage
Jacobs Center on Lifelong Learning and Institutional Development,
Jacobs University, Bremen, Germany
A decline in manual dexterity is common in older adults and has been
demonstrated to account for much of the observed impairment in
everyday tasks, like pouring milk into a cup, preparing meals, or
retrieving coins from a purse. Aiming at the understanding of the
underlying mechanisms, the investigation of the regulation and
coordination of isometric fingertip forces has been given a lot of
attention during the last decades. Tactile sensitivity is also increasingly impaired with older age, and deficits in tactile sensitivity and perception, and therefore in sensorimotor feedback loops, play an important role in age-related decline in manual dexterity. Within the
Bremen-Hand-Study@Jacobs our main focus was on the question of
how age and expertise influence manual dexterity during middle
adulthood. In particular, we were interested in the capacity of older
employees to enhance their fine motor performance through practice.
To reach this goal, we investigated basic mechanisms responsible for
age-related changes in precision grip control and tactile performance
as well as learning capacities (plasticity) in different age and expertise
groups on a behavioral and neurophysiological (EEG) level.

Our results confirmed a decline in basic components of manual
dexterity, finger force control and tactile perception, with increasing
age, already during middle adulthood. Also, age-related changes
in underlying neurophysiological correlates could be observed in
middle-aged adults. Performing manual tasks on a comparable level
to younger adults required more frontal (i.e. cognitive) brain resources in older workers indicating compensatory plasticity. Furthermore,
in both the motor and tactile domain expertise seemed to counteract
age-related decline and to postpone age effects for about 10 years.
Although older adults generally performed at a lower baseline performance level, they were able to improve motor and tactile
functioning by short term practice or stimulation interventions. Particularly in the tactile domain such an intervention was well suited to
attenuate age-related decline. Overall, our data suggest that the aging
process of manual dexterity seems to start slowly but continuously
goes on during the working lifespan and can be compensated by
continuous use (expertise) or targeted interventions.

Planning anticipatory actions: on the interplay between normative and mechanistic models
Oliver Herbort
Department of Psychology, University of Würzburg, Germany
Actions frequently foreshadow subsequent actions. For example, the
hand orientation used to grasp an object depends on the intended
object manipulation. Here, I examine whether such anticipatory grasp
selections can be described purely in terms of their function or
whether the planning process also has to be taken into account. To test
functional accounts, three posture-based cost functions were used to
predict grasp selection. As an example for a model of the planning
process, I evaluated the recently proposed weighted integration of
multiple biases model. This model posits that grasp selection is
heuristically based on the direction of the intended object rotation as
well as other factors. The models were evaluated using two empirical
datasets. The datasets were from two experiments, in which participants had to grasp and rotate a dial by various angles. The models
were fitted to the empirical data of individual participants using
maximum likelihood estimates of the models' free parameters. The
model including the planning process provided a closer fit to the data
of both experiments than the functional accounts. Thus, human
actions can only be understood as the superimposition of their function and computational artifacts imposed by the limitations of the
central nervous system.
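
The fitting procedure described here can be illustrated with a toy stand-in: a one-parameter grasp-selection heuristic fitted per participant by maximum likelihood under Gaussian response noise. The predictor function and data are hypothetical and deliberately far simpler than the weighted integration of multiple biases model.

```python
import numpy as np
from scipy.optimize import minimize

def predict(rotation, w, default=0.0):
    # Hypothetical heuristic: compromise between a default posture (0 deg)
    # and a bias proportional to the intended object rotation.
    return w * (-0.5 * rotation) + (1 - w) * default

def neg_log_likelihood(params, rotations, grasps):
    w, sigma = params
    resid = grasps - predict(rotations, w)
    # Gaussian log-likelihood, dropping the constant term
    return 0.5 * np.sum((resid / sigma) ** 2) + len(resid) * np.log(sigma)

rotations = np.array([-180., -90., 0., 90., 180.])   # intended dial rotations
grasps = np.array([80., 40., 2., -38., -85.])        # one participant's grasps
fit = minimize(neg_log_likelihood, x0=[0.5, 10.0],
               args=(rotations, grasps), bounds=[(0, 1), (1e-3, None)])
print(fit.x)   # fitted weight and noise SD for this participant
```

Comparing the maximized likelihoods (or information criteria) of competing models fitted this way is what allows the planning-process model to be pitted against the purely functional cost accounts.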

Identifying linguistic and neural levels of interaction between gesture and speech during comprehension using EEG and fMRI
Henning Holle
Department of Psychology, University of Hull, UK
Conversational gestures are hand movements that co-occur with
speech but do not appear to be consciously produced by the speaker.
The role that these gestures play in communication is disputed, with
some arguing that gesture adds only little information over and above
what is already transmitted by speech alone. My own work has provided strong evidence for the alternative view, namely that gestures
add substantial information to the comprehension process. One level
at which this interaction between gesture and speech takes place
seems to be semantics, as indicated by the N400 of the Event Related
Potential. I will also present findings from a more recent study that
has provided evidence for a syntactic interaction between gesture and
speech (as indexed by the P600 component). Finally, fMRI studies
suggest that areas associated with the detection of semantic mismatches (left inferior frontal gyrus) and audiovisual integration (left
posterior temporal lobe) are crucial components of the brain network
for co-speech gesture comprehension.

Neural correlates of gesture-syntax interaction


Leon Kroczek, Henning Holle, Thomas Gunter


Max-Planck-Institute for Human Cognitive and Brain Sciences,
Leipzig, Germany
In a communicative situation, gestures are an important source of
information which also impact speech processing. Gesture can for
instance help when speech perception is troubled by noise (Obermeier
et al. 2012) or when speech is ambiguous (Holle et al. 2007).
Recently, we have shown that not only meaning, but also structural
information (syntax) used during language comprehension is influenced by gestures (Holle et al. 2012). Beat gestures, which highlight
particular words in a sentence, seem to be able to disambiguate
sentences that are temporarily ambiguous with respect to their syntactic structure. Here we explored the underlying neural substrates of
the gesture-syntax interaction with fMRI using similar ambiguous
sentence material as Holle et al. (2012). Participants were presented
with two types of sentence structures which were either easy (Subject-Object-Verb) or more difficult (Object-Subject-Verb) in their syntactic complexity. A beat gesture was shown either at the first or the
second noun phrase (NP). Activations related to syntactic complexity
were primarily lateralized to the left (IFG, pre-SMA, pre-central
gyrus, and MTG) and bilateral for the Insula. A ROI-based analysis
showed interactions of syntax and gesture in the left MTG, left pre-SMA, and in the bilateral Insula activations. The pattern of the
interaction suggests that a beat on NP1 facilitates the easy SOV
structure and inhibits the more difficult OSV structure and vice versa
for a beat on NP2. Because the IFG was unaffected by beat gestures it
seems to play an independent/isolated role in syntax processing.

Interregional connectivity minimizes surprise responses during action perception
Sasha Ondobaka, Marco Wittmann, Floris P de Lange,
Harold Bekkering
Donders Institute for Brain, Cognition and Behavior, Radboud
University Nijmegen, Netherlands
The perception of other individuals' goal-directed actions requires the
ability to process the observed bodily movements and the surrounding
environmental context at the same time. Both action and contextual
processing have been studied extensively (Iacoboni et al. 2005;
Shmuelof and Zohary 2005; Bar et al. 2008), yet, the neural mechanisms that integrate action and contextual surprise remain elusive.
The predictive account describes action perception in terms of a
hierarchical inference mechanism which generates prior predictions
to minimize surprise associated with incoming action and contextual
sensory input (Friston et al. 2011; Koster-Hale and Saxe 2013). Here,
we used functional neuroimaging to establish which brain circuits
represent action and contextual surprise and to examine the neural
mechanisms that are responsible for minimizing surprise-related
responses (Friston 2005). Participants judged whether an action was
surprising or non-surprising dependent on the context in which the
action took place. They first viewed a surprising or non-surprising
context, followed by a grasping action. The results showed greater activation for the surprising than for the non-surprising context in the parietal and temporal multi-modal association cortices (ACs) that are known to process context. Fronto-insular cortex (FIC) was more active for surprising actions compared to non-surprising actions. When the non-surprising action was perceived, functional connectivity between brain areas that represent action surprise and contextual surprise was enhanced. The findings suggest that the strength of the interregional neural coupling minimizes surprising sensations necessary for perception of others' goal-directed actions and provide support for a hierarchical predictive model of brain function.
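
A minimal sketch of the simplest reading of "functional connectivity" used in such analyses, namely condition-wise correlation between two regions' time series, follows; the ROI names and data are simulated placeholders, not the study's actual connectivity analysis.

```python
import numpy as np

def connectivity(roi_a, roi_b):
    """Pearson correlation between two regions' fMRI time series."""
    return np.corrcoef(roi_a, roi_b)[0, 1]

rng = np.random.default_rng(3)
shared = rng.standard_normal(200)                  # common fluctuation
fic = shared + 0.5 * rng.standard_normal(200)      # fronto-insular cortex
assoc = shared + 0.5 * rng.standard_normal(200)    # association cortex
print("FIC-AC coupling:", connectivity(fic, assoc))
```

Stronger coupling in one condition than another, computed this way per condition, is the kind of difference the abstract reports for non-surprising actions.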

The development of cognitive and motor planning skills in young children
Kathrin Wunsch1, Roland Pfister2, Anne Henning3,4, Gisa Aschersleben4, Matthias Weigelt1
1 Department of Sport and Health, University of Paderborn, Germany; 2 Department of Psychology, University of Würzburg, Germany; 3 Developmental Psychology, University of Health Sciences Gera, Germany; 4 Department of Psychology, Saarland University, Germany
The end-state comfort (ESC) effect signifies the tendency to avoid
uncomfortable postures at the end of goal-directed movements and
can be reliably observed during object manipulation in adults, but
only little is known about its development in children. Therefore, the
present study investigated the development of anticipatory planning
skills in children and its interdependencies with the development of
executive functions. Two hundred and seventeen participants in 9 age
groups (3-, 4-, 5-, 6-, 7-, 8-, 9-, 10-year-olds, and adults) were tested
in three different end-state comfort tasks and three tasks to assess
executive functioning (Tower of Hanoi, Mosaic, and the D2 attention
endurance task). Regression analysis revealed a robust developmental
trend for each individual end-state comfort task across all age groups
(all p < .01). Somewhat surprisingly, there was no indication of
generalization across these tasks, as correlations between the three
motor tasks failed to reach significance for all age groups (p > .05).
Furthermore, we did not observe any systematic correlation between
performance in the end-state comfort tasks and the level of executive
functioning. Accordingly, anticipatory planning develops with age,
but the impact of executive functions on this development seems to be
rather limited. Moreover, motor planning does not seem to be a
holistic construct, as the performance in the three different tasks was
not correlated. Further research is needed to investigate the interdependencies of sensory-motor skill development with other cognitive
abilities.
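
The two analyses described above, per-task regression on age and within-group correlations between tasks, can be sketched as follows on simulated data; the group sizes and effect sizes are invented for illustration only.

```python
import numpy as np
from scipy.stats import pearsonr, linregress

rng = np.random.default_rng(2)
age = np.repeat([3, 4, 5, 6, 7, 8, 9, 10, 25], 24)          # 9 age groups
# proportion of ESC-consistent grasps per participant, for 3 simulated tasks
esc_tasks = [np.clip(0.08 * age + 0.2 * rng.standard_normal(age.size), 0, 1)
             for _ in range(3)]

# (1) developmental trend: regress each task's ESC rate on age
for k, task in enumerate(esc_tasks):
    print(f"task {k}: slope over age =", linregress(age, task).slope)

# (2) generalization: correlate two ESC tasks within one age group
group = age == 5
r, p = pearsonr(esc_tasks[0][group], esc_tasks[1][group])
print("task0-task1 correlation in 5-year-olds:", r, p)
```

Positive slopes with near-zero within-group correlations would reproduce the paper's pattern: planning improves with age, but not as a single holistic construct.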

PREDICTIVE PROCESSING: PHILOSOPHICAL AND NEUROSCIENTIFIC PERSPECTIVES
Convenor: Alex Morgan
CIN, University of Tübingen, Germany
The idea that the brain makes fallible inferences and predictions in
order to get by in a world of uncertainty is of considerable vintage,
but it is now beginning to achieve maturity due to the development of
a range of rigorous theoretical tools rooted in Bayesian statistics that
are increasingly being used to explain various aspects of the brain's
structure and function. The emerging Bayesian brain approach in
neuroscience introduces novel ways of conceptualizing perception,
cognition, and action. It also arguably involves novel forms of

neuroscientific explanation, such as an emphasis on statistical optimality. The science is moving rapidly, but philosophers are
attempting to keep up, in order to understand how these recent
developments might shed light on their traditional concerns about the
nature of mind and agency, as well as concerns about the norms of
psychological explanation. The purpose of this symposium is to bring
together leading neuroscientists and philosophers to discuss how the
Bayesian brain approach might reshape our understanding of the
mind-brain, as well as our understanding of mind-brain science.

Bayesian cognitive science, unification, and explanation

Matteo Colombo
Tilburg Center for Logic and Philosophy of Science, Tilburg
University, Netherlands

It is often claimed that the greatest value of the Bayesian framework in cognitive science consists in its unifying power. Several Bayesian
cognitive scientists assume that unification is obviously linked to
explanatory power. But this link is not obvious, as unification in
science is a heterogeneous notion, which may have little to do with
explanation. While a crucial feature of most adequate explanations in
cognitive science is that they reveal aspects of the causal mechanism
that produces the phenomenon to be explained, the kind of unification
afforded by the Bayesian framework to cognitive science does not
necessarily reveal aspects of a mechanism. Bayesian unification,
nonetheless, can place fruitful constraints on causal-mechanical
explanation.

Learning sensory predictions for perception and action

Axel Lindner
Hertie Institute for Clinical Brain Research, University of Tübingen, Germany

Perception and action are informed not only by incoming sensory information but also by predictions about upcoming sensory events.
Such sensory predictions allow us, for instance, to perceptually distinguish self-produced from externally produced sensations: by comparing action-based predictions with the actual sensory input, the sensory component that is produced by one's own actions can be isolated (attenuated etc.).
Likewise, action-based sensory predictions allow the motor system to
react more rapidly to predictable events and, thus, to be less dependent
on delayed sensory feedback. I will demonstrate that the cerebellum, a
structure intimately linked to plasticity within the motor domain,
accounts for learning action-based sensory predictions on a short time
scale. I will further show that this plasticity is not solely related to the
motor domain; it also influences the way we perceptually interpret the
sensory consequences of our behavior. Specifically, I will present
experiments in which we use virtual reality techniques to alter the
visual direction subjects associate with their pointing movements.
While we were able to change the predicted visual consequences of
pointing in healthy individuals, such recalibration of a sensory prediction was dramatically compromised in patients with lesions in the cerebellum. Extending these results on sensory predictions for self-produced events, I will show that the cerebellum also underlies the learning of sensory predictions about external sensory events, independent of self-action. In contrast to healthy controls, cerebellar
patients were significantly impaired in learning to correctly predict the
re-occurrence of a moving visual target that temporarily disappeared
behind an occluder. In summary, our research suggests that the cerebellum plays a domain-general role in fine-tuning predictive models
irrespective of whether sensory predictions are action-based (efference
copies) or sensory-based, and irrespective of whether sensory predictions support action, perception, or both.

The explanatory heft of Bayesian models of cognition


Frances Egan, Robert Matthews
Department of Philosophy, Rutgers University, USA
Bayesian models have had a dramatic impact on recent theorizing
about cognitive processes, especially about those brain-environment
processes directly implicated in perception and action. In this talk we
critically examine the explanatory character of these models, especially in light of so-called new mechanist claims to the effect that these models are not genuinely explanatory, or at least are little more than
explanation sketches. We illustrate our points with examples drawn
from both classical dynamics and cognitive ethology. We conclude
with a discussion of the import of these models for the presumption,
common among neuropsychologists, that commonsense folk psychological concepts such as belief and desire have an important role
to play in cognitive neuroscience.

Predictive processing and active inference
Karl Friston
Institute of Neurology, University College London, UK
How much about our interaction with, and experience of, our world
can be deduced from basic principles? This talk reviews recent
attempts to understand the self-organized behavior of embodied
agents, like ourselves, as satisfying basic imperatives for sustained
exchanges with the environment. In brief, one simple driving force
appears to explain many aspects of action and perception. This
driving force is the minimization of surprise or prediction error that, in the context of perception, corresponds to Bayes-optimal predictive coding. We will look at some of the phenomena that emerge from this principle, such as hierarchical message passing in the brain and the perceptual inference that ensues. I hope to illustrate the ensuing brain-like dynamics using models of bird songs that are based on autonomous dynamics. This provides a nice example of how dynamics can be exploited by the brain to represent and predict the sensorium that is, in many instances, generated by ourselves. I will conclude with an illustration of the tight relationship between the pragmatics of communication and active inference about the behavior of self and others.
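As a deliberately minimal illustration of this driving force (a toy sketch, not Friston's generative models), perception can be cast as gradient descent on squared prediction error for a one-dimensional linear generative model; all numbers and the model are invented:

```python
# Toy sketch: perception as minimization of prediction error.
# Generative model (assumed): observations y are produced as y = 2 * x.
def perceive(y, x0=0.0, lr=0.1, steps=50):
    """Infer the latent cause x of observation y by reducing (y - 2x)**2."""
    x = x0
    for _ in range(steps):
        error = y - 2 * x    # prediction error
        x += lr * 2 * error  # step downhill on the squared error
    return x

print(perceive(4.0))  # approaches x = 2, the error-minimizing estimate
```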

Layer resolution fMRI to investigate cortical feedback and predictive coding in the visual cortex

Lars Muckli
Institute of Neuroscience and Psychology, University of Glasgow, UK
David Mumford (1991) proposed a role for reciprocal topographic
cortical pathways in which higher areas send abstract predictions of
the world to lower cortical areas. At lower cortical areas, top-down
predictions are then compared to the incoming sensory stimulation.
Several questions arise within this framework: (1) do descending
predictions remain abstract, or do they translate into concrete level
predictions, the language of lower visual areas? (2) how is incoming
sensory information compared to top-down predictions? Are input
signals subtracted from the prediction (as proposed in the predictive
coding framework) or are they multiplied (as proposed by other models, e.g. biased competition or adaptive resonance theory)? Contributing to the debate about abstract vs. concrete level information, we
aim to investigate the information content of feedback projections with
functional MRI. We have exploited a strategy in which feedforward
information is occluded in parts of visual cortex: i.e. along the non-stimulated apparent motion path, behind a white square that we used to
occlude natural visual scenes, or by blindfolding our subjects (Muckli,
Petro 2013). By presenting visual illusions, contextual scene information or by playing sounds we were able to capture feedback signals
within the occluded areas of the visual cortex. Multivariate pattern analysis (MVPA) of the feedback signals reveals that they are more abstract than the feedforward signal. Furthermore, using high-resolution fMRI we found that
feedback is sent to the outer cortical layers of V1. We also show that
feedback to V1 can originate from auditory information processing
(Vetter, Smith, Muckli 2014). We are currently developing strategies
to reveal the precision and potential functions of cortical feedback.
Our results link into the emerging paradigm shift that portrays the
brain as a prediction machine (Clark 2013).
References
Clark A (2013) Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behav Brain Sci 36(3):181–204
Muckli L, Petro L (2013) Network interactions: non-geniculate input to V1. Curr Opin Neurobiol 23(2):195–201
Mumford D (1991) On the computational architecture of the neocortex: the role of the thalamo-cortical loop. Biol Cybern 65(2):135–145
Vetter P, Smith F, Muckli L (2014) Decoding sound and imagery content in early visual cortex. Curr Biol 24(11):1256–1262

HOW LANGUAGE AND NUMERICAL REPRESENTATIONS CONSTITUTE MATHEMATICAL COGNITION
Convenor: Hans-Christoph Nuerk
University of Tübingen, Germany
Mathematical or numerical cognition has often been studied with little
consideration of language and linguistic processes. The most basic
representation, the number magnitude representation, has been viewed as amodal and non-verbal. Only in recent years has the influence of linguistic processes again received more interest in cognitive research. Now we have evidence that even the most basic tasks, like
magnitude comparison and parity judgment, and even the most basic
representations, such as spatial representation of number magnitude,
are influenced by language and linguistic processes.
The symposium brings together international researchers from
different fields (Linguistics, Psychology and Cognitive Neuroscience) with at least three different foci within the general symposium
topic: (i) How is spatial representation of number influenced by
reading and writing direction? (ii) How do number word structures
of different languages influence mathematical and numerical performance? (iii) How are linguistic abilities of children and linguistic
complexity of mathematical tasks related to mathematical
performance?
After an overview given by the organizer, the symposium starts
with a presentation by Fischer and Shaki, who have shaped the
research about reading and writing influences on the relation between
space and number in recent years. They give an update about explicit
and implicit linguistic influences on spatial-numerical cognition.
Tzelgov and Zohar-Shai may partially challenge this view, because
they show that related linguistic effects, namely, the Linguistic
Markedness Effect, may mask number-space relations in Hebrew, producing seemingly null effects. Concluding the first part, Soltanlou, Huber, and Nuerk examine how different basic numerical
effects including the SNARC (spatial-numerical association of
response-codes) effect are influenced by linguistic and other cultural
properties.
The next three talks are concerned with the question of how number
word structure influences numerical and mathematical processing in
children and adults. In the last years, it has been shown repeatedly that
opaque number word structures specifically interfere with mathematical performance. Schiltz, Van Rinsveld, and Ugen make use of the fact that all children in Luxembourg are taught bilingually
(French, German). They are therefore able to examine the influence of
different number word structures in within-participant designs. They
observe linguistic influences on mathematical cognition, which,
however, are mediated by a child's proficiency in a given language.
Dowker, Lloyd, and Roberts compare performance in English and
Welsh language. Welsh number word structure is fully transparent
(13 = Ten-three; 22 = two-tens-two) for all two-digit number words,
which is commonly only found in Pacific Rim Countries. Since in
Wales some children are taught in Welsh, and some in the less
transparent English, the impact of the transparency of a number word
system can be studied within one culture thereby avoiding confounds
of language and culture common in cross-cultural studies. The results
suggest that children benefit in specific numerical tasks, but not in
arithmetic performance in general. Finally, Bahnmueller, Goebel,
Moeller, and Nuerk used eye-tracking methodology in a translingual study to examine which processes underlie linguistic influences on numerical cognition. They show that, at least for a sub-group, language guides attentional processing of multi-digit
Arabic numbers in a way consistent with the number word structure.
In the final part of the symposium, Szucs examined the differential
contribution of language-related and language-independent skills on
mathematical performance. He observed that on one hand, phonological decoding skills predict mathematical performance in
standardized tests, but that on the other hand, children with pure
dyscalculia do not show deficits in verbal and language functions.
Finally, Daroczy, Wolska, Nuerk, and Meurers used an interdisciplinary (Linguistics, Psychology) approach to study mathematical
word problems. They systematically varied linguistic and numerical
complexity within one study and examined how both factors contribute to mathematical performance in this task.
The symposium concludes with a general discussion about how language and numerical representations constitute mathematical cognition.

Influences of number word inversion on multi-digit number processing: a translingual eye-tracking study
Julia Bahnmueller1,2, Silke Goebel3, Korbinian Moeller1,
Hans-Christoph Nuerk1,2
1 IWM-KMRC Tübingen, Germany; 2 University of Tübingen, Germany; 3 University of York, UK
Differences in number word systems become most obvious for multidigit numbers. Therefore, the investigation of multi-digit numbers is
crucial to identify linguistic influences on number processing. One of
the most common specificities of a number word system is the
inversion of number words with respect to the digits of a number (e.g., the German number word for 27 is siebenundzwanzig, literally "seven and twenty"). While linguistic influences of the number word system have
been reliably observed over the last years, the specific cognitive
contributions underlying these processes are still unknown.
Therefore, the present study aimed at investigating the underlying cognitive processes and language specificities of three-digit number processing. More specifically, it was intended to clarify to what degree three-digit number processing is influenced by parallel and/or
sequential processing of the involved digits and modulated by language. English- and German-speaking participants were asked to
complete a three-digit number comparison task while their response
latencies as well as their eye movements were recorded. Results
showed that in both language groups there were indicators of both
parallel and sequential processing with clear-cut language-based
differences being observed. Reasons for the observed language-specific differences contributing to a more comprehensive understanding
of mathematical cognition are discussed.

On the influence of linguistic and numerical complexity in word problems
Gabriella Daroczy1, Magdalena Wolska1, Hans-Christoph Nuerk1,2,
Detmar Meurers1
1 University of Tübingen, Germany; 2 IWM-KMRC Tübingen, Germany
Word problems, in which a mathematical problem is given as a text that must be read before arithmetic calculation can begin, are among the most difficult mathematical problems for children and adults. Different linguistic factors, e.g., text complexity (nominalization vs. verbal phrases), numerical factors (carry vs. non-carry, addition vs. subtraction), and the relation between the linguistic text and the mathematical problem (order consistency) can all contribute to the difficulty of a word problem. Our interdisciplinary group systematically varied linguistic and numerical factors in a within-participant design. The results showed that both linguistic and
numerical complexity as well as their interrelation contributed to
mathematical performance.

Linguistic influences on numerical understanding: the case of Welsh
Ann Dowker1, Delyth Lloyd2, Manon Roberts3
1 Dept of Experimental Psychology, University of Oxford, England; 2 University of Melbourne, Australia; 3 Worcester College, Oxford, England
It is sometimes suggested that a reason why children in Pacific Rim
countries excel in mathematics is that their counting systems are
highly transparent: e.g. 13 is represented by the equivalent of "ten-three", 22 by the equivalent of "two-tens-two", etc. This may make
both counting and the representation of place value easier to acquire
than in many other languages. However, there are so many cultural
and educational differences between, for example, the USA and
China that it is hard to isolate the influence of any particular factor. In
Wales, both a regular counting system (Welsh) and an irregular
counting system (English) are used within a single region. Approximately 20 % of children in Wales receive their education in the
Welsh medium, while following the same curriculum as those
attending English medium schools. This provides an exceptional
opportunity for studying the effects of the regularity of the counting
system, in the absence of major confounding factors. Studies so far
suggest that Welsh-speaking children do not outperform their English-speaking counterparts in all aspects of arithmetic, but that they do
show superiority in some specific aspects: notably in reading and
comparing 2-digit numbers, and in the precision of their non-verbal
number line estimation.


Reading space into numbers: an update


Martin H. Fischer1, Samuel Shaki2
1 University of Potsdam, Germany; 2 Ariel University, Israel

Number-space associations, and the SNARC effect in particular, have been extensively investigated in the past two decades. Still, their origin and directionality remain unclear. We will address the following questions: (a) Does the SNARC effect reflect recent spatial experience or long-standing directional habits? (b) Does the SNARC effect spill over from reading habits for numbers or from reading habits for words? (c) What is the contribution of other directionality cues (e.g., vertical grounding such as "more is up"; cultural metaphors)?
Finally, we will consider the impact of empirical findings from an
Implicit Association Test.
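For readers unfamiliar with how SNARC effects are typically quantified, a minimal sketch of the common regression approach follows (illustrative data only; this is not the authors' analysis code):

```python
# Per digit, compute dRT = mean RT(right hand) - mean RT(left hand) and
# regress dRT on magnitude; a negative slope indicates a SNARC effect
# (small numbers answered faster on the left, large numbers on the right).
import numpy as np

def snarc_slope(digits, rt_left, rt_right):
    drt = np.asarray(rt_right) - np.asarray(rt_left)
    slope, intercept = np.polyfit(digits, drt, 1)
    return slope  # ms per unit of magnitude

digits = np.array([1, 2, 3, 4, 6, 7, 8, 9])
rt_left = np.array([520, 525, 530, 532, 540, 545, 548, 552])   # invented
rt_right = np.array([548, 545, 541, 538, 531, 528, 524, 520])  # invented
print(snarc_slope(digits, rt_left, rt_right))  # negative, SNARC-consistent
```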

How language and numerical representations constitute mathematical cognition: an introductory review
Hans-Christoph Nuerk
University of Tübingen and IWM-KMRC Tübingen, Germany
Mathematical or numerical cognition has often been studied largely
independently of language and linguistic processes. Only in recent years has the influence of such linguistic processes received more interest. Now it is known that even the most basic tasks, like magnitude comparison and parity judgment, and even basic representations,
such as spatial representation of number magnitude, are influenced by
language and linguistic processes.
Within the general topic of language contributions to mathematical cognition and performance we can distinguish at least three
different foci: (i) How is the spatial representation of number influenced by reading and writing direction? (ii) How do number word structures
of different languages influence mathematical and numerical performance? (iii) How are linguistic abilities of children and linguistic
complexity of mathematical tasks related to mathematical
performance?
A short overview of the state of research on the above topics is given, and the open questions addressed in this symposium are introduced.

Language influences number processing: the case of bilingual Luxembourg
Christine Schiltz, Amandine Van Rinsveld, Sonja Ugen
Cognitive Science and Assessment Institute, University
of Luxembourg, Luxembourg
In a series of studies we investigated how language affects basic
number processing tasks in a German–French bilingual setting. The
Luxembourg school system indeed progressively educates pupils to
become German–French bilingual adults, thanks to extensive language courses in both German and French, as well as a progressive
transition of teaching language from German (dominant in primary
school) to French (dominant in secondary school). Studying numerical cognition in children and adults successfully going through the
Luxembourg school system thus provides an excellent opportunity to
investigate how progressively developing bilingualism impacts
numerical representations and computations. Studying this question in
Luxembourg's German–French bilingual setting is all the more interesting, since the decades and units of two-digit number words
follow opposite structures in German (i.e. unit-decade) and French
(decade-unit). In a series of experiments pupils from grades 7, 8, 10,
11, and adults made magnitude comparisons and additions that were
presented in different formats: Arabic digits and number words. Both
tasks were performed in separate German and French testing sessions
and we recorded correct response rates and response times. The
results obtained during magnitude comparison show that orally presented comparisons are performed differently by the same
participants according to task language (i.e. different compatibility
effects in German vs. French). For additions it appears that the level
of language proficiency is crucial for the computation of complex
additions, even in adults. In contrast, adults tend to retrieve simple
additions equally well in both languages. Taken together, these results
support the view of a strong language influence on numerical representations and computations.

Language differences in basic numerical tasks


Mojtaba Soltanlou, Stefan Huber, Hans-Christoph Nuerk
University of Tübingen and IWM-KMRC Tübingen, Germany
Connections between knowledge of language and knowledge of
number have been suggested on theoretical and empirical grounds.
Chomsky (1986) noted that both the sentences of a language and the
numbers in a counting sequence have the property of discrete infinity,
and he suggested that the same recursive device underlies both
(Bloom 1994; Hurford 1987). Numerical researchers have therefore begun to examine the influences of linguistic properties.
In this internet study, we tested adults from various countries on basic numerical tasks consisting of symbolic and non-symbolic magnitude comparison and parity judgment, and recorded responses to obtain error rates and reaction times. The results suggest not only that distinct languages influence these kinds of tasks differentially, but also that other cultural and individual factors play an important role in numerical cognition.

Cognitive components of the mathematical processing network in primary school children: linguistic and language-independent contributions
Denes Szucs
University of Cambridge, UK
We have tested the cognitive components of mathematical skill in
more than one hundred 9-year-old primary school children. We aimed
to separate the contributions of language related and language independent skills. We used 18 cognitive tests and 9 custom experiments.
We identified phonological decoding efficiency and verbal intelligence as important contributors to mathematical performance
(measured by standardized tests). In addition, spatial ability, visual
short term and working memory were also strong predictors of
arithmetic performance. Further, children with pure developmental
dyscalculia only showed impaired visuo-spatial processing but no
impairment in verbal and language function. The results can shed
light on the differing role of language and visual function in arithmetic and on co-morbidity of language and arithmetic disorders.


It does exist! A SNARC effect amongst native Hebrew speakers is masked by the MARC effect
Joseph Tzelgov, Bar Zohar-Shai
Ben-Gurion University of the Negev, Israel
The SNARC effect has been found mainly with participants who
speak Germanic languages. The effect in these studies implies that the mental number line spreads from left to right. Therefore, it was suggested that the effect derives from the experience of writing from left to right. Commonly, studies of spatial-numerical associations in Hebrew speakers report a null SNARC effect with the standard design in which participants are asked to perform the parity task twice, each time with a different parity-to-hand mapping. It has been argued that this is due to the different reading directions of words and numbers: Hebrew is written from right to left, while numbers are written by Hebrew writers from left to right, as in Germanic languages. In this paper, we show that a SNARC effect in native Hebrew speakers does exist when the design minimizes the MARC effect. Furthermore, even though Hebrew is written from right to left, the mental number line, as estimated by the SNARC effect, spreads from left to right as in Germanic languages. These findings challenge the assumption that direction of
reading is the main source of the direction of spatial-numerical
association.

MODELING OF COGNITIVE ASPECTS OF MOBILE INTERACTION
Convenors: Nele Russwinkel, Sabine Prezenski, Stefan Lindner
TU Berlin, Germany
Interacting with mobile devices is gaining more and more importance
in our daily life. Using those devices provides huge comfort, but
nevertheless entails specific challenges. In contrast to the classical
home computer setting, mobile device usage is more prone to disruptions, more influenced by time pressure and more likely to be
affected by earlier interaction experiences. An important issue in this
context is designing interfaces that best fit users' cognitive abilities. These abilities vary considerably between different groups of users. How can developers and designers adapt an interface to meet users' skills and preferences? For these purposes, cognitive modeling provides an appealing opportunity to gain insights into users' skills and cognitive processes. It offers a theoretical
framework as well as a computational platform for testing theories
and deriving predictions.
The scope of this symposium lies in introducing selected
approaches to user modeling and showing their application to the
domain of mobile interaction. In this context we are particularly
interested in criteria like learnability and efficiency from a cognitive
as well as a technical point of view. Moreover, research concerning
individual differences, interruption and expectancy is presented.
Overall, we aim to show that the mobile interaction scenario offers an
interesting research area for testing modeling approaches in real-life applications, but also to discuss cognitive processes that are relevant within those tasks. We will consider these different cognitive aspects of mobile interaction and the role of modeling in improving cognitively appropriate applications.


Creating cognitive user models on the basis of abstract user interface models
Marc Halbrügge
TU Berlin, Germany
The recent explosion of mobile appliances creates new challenges not
only for application developers and content creators, but also for
usability professionals. Conducting a classical usability study of a
mobile user interface (UI) on an exhaustive number of devices is
more or less impossible. One approach to tackle the engineering side
of the problem is model-based user interface development, where an
abstract UI model is adapted to the target device at runtime (Calvary
et al. 2003). When this method is applied, the application flow is
modeled first and user controls are abstractly identified by their roles
therein (e.g. command, choice, output). The elements of the final UI
as presented to the users (e.g. buttons, switches, labels) are all representations of those, enriched by physical properties like position,
size, and textual content.
While knowing the sizes and positions of the UI elements already
allows predictions of completion times for previously specified tasks,
e.g. by creating simple cognitive models using CogTool (John et al.
2004), the additional information encoded into the abstract UI model
allows one to go much further. It contains machine-readable knowledge
about the application logic and the UI elements that are to be visited
to attain a specified goal, which creates a significant opportunity for
machine translation into more precise cognitive models (Quade et al.
2014). In this talk, I will show how completion time predictions can
be improved based on abstract UI model information. Data from two
empirical studies with a kitchen assistance application is presented to
illustrate the method and quantify the gain in prediction accuracy.
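To make the kind of completion-time prediction discussed above concrete, here is a hedged sketch using classic keystroke-level-model (KLM) operator durations; CogTool's underlying models are considerably more detailed, and the task sequence below is hypothetical:

```python
# Classic KLM operator durations (seconds), as in Card, Moran & Newell.
OPERATORS = {
    "K": 0.28,  # key or button press (average typist)
    "P": 1.10,  # point at a target
    "H": 0.40,  # home hands between devices
    "M": 1.35,  # mental preparation
}

def klm_time(sequence):
    """Sum operator durations for a task encoded as e.g. 'M P K P K'."""
    return sum(OPERATORS[op] for op in sequence.split())

# Hypothetical task: think, point at a menu, tap, point at an item, tap.
print(f"{klm_time('M P K P K'):.2f} s")  # 4.11 s
```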
References
Calvary G, Coutaz J, Thevenin D, Limbourg Q, Bouillon L, Vanderdonckt J (2003) A unifying reference framework for multi-target user interfaces. Interact Comput 15(3):289–308
John BE, Prevas K, Salvucci DD, Koedinger K (2004) Predictive human performance modeling made easy. In: CHI '04: Proceedings of the SIGCHI conference on human factors in computing systems. ACM Press, pp 455–462
Quade M, Halbrügge M, Engelbrecht KP, Albayrak S, Möller S (2014) Predicting task execution times by deriving enhanced cognitive models from user interface development models. In: Proceedings of EICS 2014, Rome, Italy (in press)

Expectations during smartphone application use


Stefan Lindner
TU Berlin, Germany
Expectations serve a multitude of purposes and play a large role in the
adoption and use of new technological devices. I will briefly discuss a
classification of expectations, implementation ideas in ACT-R and
their role during smartphone app use.
In a general sense, expectations coordinate our goals and desires
with the current and the future state of the environment. They are
necessary for any kind of intentions, help in action preparation
(Umbach et al. 2012), and play a prominent role in action-perception
feedback loops (Friston, Kiebel 2009).
Experience-based expectations are expectations that result from the
individual learning history. Both the utility and activation mechanisms of ACT-R can be interpreted as reflecting experience-based
expectations about our environment. One possible way to model the
formation of experience-based expectations from past experiences
using the partial matching and blending algorithms of ACT-R is described in Kurup et al. (2012). Other implementations are possible
(Lindner, Russwinkel 2013). Universal expectations are expectations
that result from the universally inherited pre-structuring of the environment. In ACT-R universal expectations are in part already
reflected in the modeler's decisions regarding the content of the
model environment, memory items and production elements.
Both types of expectations play a dynamic role during the adaptation
and use of a technical device. When using a new smartphone app, users will first rely on general expectations derived from past use of other smart
phone apps or computer programs. Universal expectations, especially
in the form of assumed form-function contingencies, play an important role in this phase as well. With time, however, users will
increasingly rely on expectations that are in line with specific
knowledge acquired during use.
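For concreteness, one of the ACT-R activation mechanisms mentioned above can be sketched as follows; the equation is ACT-R's standard base-level learning rule, while the retrieval history in the example is invented:

```python
# Base-level activation: B_i = ln(sum_j t_j ** -d), where the t_j are the
# times since each past use of chunk i and d is the decay (default 0.5).
# Frequently and recently used chunks (expected items) are more active.
import math

def base_level_activation(times_since_use, d=0.5):
    return math.log(sum(t ** -d for t in times_since_use))

# A chunk used 2, 10, and 60 seconds ago (hypothetical history):
print(base_level_activation([2.0, 10.0, 60.0]))
```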
References
Friston K, Kiebel S (2009) Predictive coding under the free-energy principle. Philos Trans R Soc B Biol Sci 364:1211–1221
Kurup U, Lebiere C, Stentz A, Hebert M (2012) Using expectations to drive cognitive behavior. In: Proceedings of the 26th AAAI conference on artificial intelligence
Lindner S, Russwinkel N (2013) Modeling of expectations and surprise in ACT-R. In: Proceedings of the 12th international conference on cognitive modeling, pp 161–166. Available online: http://iccm-conference.org/2013-proceedings/papers/0027/index.html
Umbach VJ, Schwager S, Frensch PA, Gaschler R (2012) Does explicit expectation really affect preparation? Front Psychol 3:378. doi:10.3389/fpsyg.2012.00378

Evaluating the usability of a smartphone application with ACT-R
Sabine Prezenski
TU Berlin, Germany
The potentials of using ACT-R (Anderson 2007) based cognitive
models for evaluating different aspects of usability are demonstrated
using a shopping list application for the Android platform.
Smartphone applications are part of our everyday life. A successful application should meet the standard of usability as defined in
EN ISO 9241-110 (2008) and EN ISO 9241-11 (1999). In general, usability testing is laborious and requires vast resources. In this work, we demonstrate how cognitive models can answer important questions concerning efficiency, learnability and experience in a less demanding and rather effective way. Further, we outline how cognitive models provide explanations about the underlying cognitive mechanisms which affect usability.
Two different versions of a shopping list application (Russwinkel
and Prezenski 2014) are evaluated. The versions have a similar
appearance but differ in menu-depth. User tests were conducted and
an ACT-R model, able to interact with the application, was designed.
The task of the user, and likewise of the model, consists in selecting
products for a shopping list. In order to discover potential learning
effects, repetition of the task was required.
User data show that for both versions time on task decreases as
user experience increases. The version with more menu-depth is less
efficient for novice users. The influence of menu-depth decreases as
user experience increases. Learning transfer between different versions is also found. Time on task for different conditions is approximately
the same for real users and the model. Furthermore, our model is able
to explain the effects displayed in the data. The learning effect is
explained through the building of application-specific knowledge
chunks in the model's declarative memory. These application-specific
knowledge chunks further explain why expertise is more important
than menu-depth.


References
Anderson JR (2007) How can the human mind occur in the physical universe? Oxford University Press, New York
EN ISO 9241-110 (2008) Ergonomics of human-system interaction. Part 110: Dialogue principles (ISO 9241-110:2006). International Organization for Standardization, Geneva
EN ISO 9241-11 (1999) Ergonomic requirements for office work with visual display terminals (VDTs). Part 11: Guidance on usability. International Organization for Standardization, Geneva
Russwinkel N, Prezenski S (2014) ACT-R meets usability. In: Proceedings of the sixth international conference on advanced cognitive technologies and applications. COGNITIVE

Simulating interaction effects of incongruous mental models


Matthias Schulz
TU Berlin, Germany


Traditional usability evaluations involving older adults are difficult to conduct (Dickinson et al. 2007) and the results may also be misleading, as often only the cognitively and physically fittest seniors
participate (Hawthorn 2000). In addition to this, older adults often
lack experience in using modern devices (Hanson 2011). Furthermore, it is reasonable to assume that older adults often have problems
operating new devices if they inappropriately transfer prior experience with other devices (Arning, Ziefle 2007). Such an inappropriate transfer would result in an increase of wrong or
redundant interaction steps, which in turn may lead to unintended
actions being recognized by the system (Bradley et al. 2011).
To simulate the effects of incongruous mental models or the
inappropriate transfer of prior experience with other devices, an existing tool for automatic usability evaluation, the MeMo workbench, was extended. The goal of the enhancement was to simulate the interaction of users with a smartphone, including mistakes and slips. According to Reason (1990, p 12 ff.), mistakes, lapses, and slips are the primary error types which can be used to classify errors in human-computer interaction. To simulate mistakes (errors which result from incongruous mental models or inappropriately transferred prior experience), a new processing module was added. This processing module uses four generalized linear models (GLMs) to compute
what kind of interaction the user model intends to apply to the
touchscreen. To simulate slips we added a new execution module
which computes the probability that the user model's interaction is not
executed as intended (e.g. missing a button when trying to hit it).
Our results show that it is possible to simulate interaction errors
(slips and mistakes) and describe interaction parameters for younger
and older adults operating a touchscreen by using the improved
MeMo workbench.
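A hedged sketch of the slip-simulation idea follows; the logistic form, predictors and coefficients are invented for illustration, since the abstract does not spell out the GLMs used in the MeMo workbench:

```python
# Hypothetical execution module: decide probabilistically whether an
# intended touch action misses its target (a "slip").
import math, random

def p_miss(target_size_mm, older_adult, b0=0.5, b_size=-0.4, b_age=0.8):
    """Logistic model of the probability of missing a button."""
    logit = b0 + b_size * target_size_mm + b_age * (1 if older_adult else 0)
    return 1.0 / (1.0 + math.exp(-logit))

def execute_touch(target_size_mm, older_adult):
    """True if the simulated user hits the intended button."""
    return random.random() >= p_miss(target_size_mm, older_adult)

print(p_miss(8.0, older_adult=True))    # smaller targets, older users: more slips
print(p_miss(12.0, older_adult=False))  # larger targets, younger users: fewer
```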

References
Arning K, Ziefle M (2007) Understanding age differences in PDA acceptance and performance. Comput Human Behav 23(6):2904–2927
Bradley M, Langdon P, Clarkson P (2011) Older user errors in handheld touchscreen devices: to what extent is prediction possible? In: Stephanidis C (ed) Universal access in human-computer interaction. Users diversity. Lecture notes in computer science, vol 6766. Springer, Berlin, pp 131–139
Dickinson A, Arnott JL, Prior S (2007) Methods for human-computer interaction research with older people. Behav Inf Technol 26(4):343–352
Hanson VL (2011) Technology skill and age: what will be the same 20 years from now? Univ Access Inf Soc 10:443–452
Hawthorn D (2000) Possible implications of aging for interface designers. Interact Comput 12(5):507–528
Reason JT (1990) Human error. Cambridge University Press, Cambridge

"Special offer! Wanna buy a trout?" Modeling user interruption and resumption strategies with ACT-R

Maria Wirzberger
TU Berlin, Germany

Interruption is a frequently appearing phenomenon that users have to deal with in interactions with technical systems. Especially when
using mobile applications on smartphones, they are confronted with
a variety of distractors, induced by the system itself (e.g., product
advertisement, system crash) or resulting from the mobile context
(e.g., motion, road traffic). Such interruptions might be critical
especially in periods of already enhanced demands on working
memory, resulting in increased experienced workload. Based on a
time course model of interruption and resumption to a main task,
developed by Altmann and colleagues (e.g., Altmann, Trafton 2004),
this research explores an interruption scenario due to product
advertisement while using a simple shopping app. Product advertisement is an omnipresent and at the same time cognitively
demanding kind of interruption, as it forces a decision for or against
the offered product.
We developed an ACT-R model, able to perform an interrupted
product selection task under alternating workload conditions,
resuming by either cognitively or visually tying in with the product
selection. In brief, the task consists of searching and selecting a set of
predefined products in several runs, and meanwhile being interrupted
by product advertisement at certain times. Different levels of workload are induced by shopping for one vs. three people. Model
validation is performed experimentally with a sample of human
participants, assessing workload by collecting pupil dilation data.
Our main focus of analysis consists in how execution and
resumption performance differ under workload, and what strategies users apply to react to interruptions. In detail, we
expect an impaired task performance and extended resumption times
with increasing workload. Moreover, strategies while resuming to the
product selection might differ in terms of varying workload levels.
Important results concerning the assumed effects will be addressed
within this talk.



References
Altmann EM, Trafton JG (2004) Task interruption: resumption lag and the role of cues. In: Proceedings of the 26th annual conference of the Cognitive Science Society, Chicago, Illinois


Tutorials
Introduction to probabilistic modeling and rational
analysis
Organizer: Frank Jäkel
University of Osnabrück, Germany
The first part of the course is a basic introduction to probability theory
from a Bayesian perspective, covering conditional probability, independence, Bayes rule, coherence, calibration, expectation, and
decision-making. We will also discuss how Bayesian inference differs
from frequentist inference. In the second part of the course we will
discuss why Bayesian Decision Theory provides a good starting point
for probabilistic models of perception and cognition. The focus here
will be on rational analysis and ideal observer models that provide an
analysis of the task, the environment, the background assumptions
and the limitations of the cognitive system under study. We will go
through several examples from signal detection to categorization to
illustrate the approach.
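As a taste of the first part of the course, here is a worked Bayes-rule example in a signal-detection setting (all numbers invented):

```python
# Posterior probability of a signal S given a detector response R:
# P(S | R) = P(R | S) P(S) / P(R).
p_signal = 0.2               # prior P(S)
p_resp_given_signal = 0.9    # hit rate P(R | S)
p_resp_given_noise = 0.3     # false-alarm rate P(R | not-S)

p_resp = p_signal * p_resp_given_signal + (1 - p_signal) * p_resp_given_noise
posterior = p_signal * p_resp_given_signal / p_resp
print(posterior)  # 0.18 / 0.42 ~= 0.43
```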

Modeling vision
Organizer: Heiko Neumann
University of Ulm, Germany
Models of neural mechanisms underlying perception can provide
links between experimental data from different modalities such as,
e.g., psychophysics, neurophysiology, and brain imaging. Here we
focus on visual perception.
The tutorial is structured into three parts. In the first part the role of
models in vision science is motivated. Models can be used to formulate hypotheses and knowledge about the visual system that can be
subsequently tested in experiments which, in turn, may lead to model
improvements. Modeling vision can be described at various levels of
abstraction and using different approaches (first principles approaches, phenomenological models, dynamical systems). In the second
part specific models of early and mid-level vision are reviewed,
addressing topics such as, e.g., contrast and motion detection, perceptual grouping, motion integration, figure-ground segregation,
surface perception, and optical flow. The third part focuses on higher-level form and motion processing and building learning-based representations. In particular, object recognition, biological/articulated
motion perception, and attention selection are considered.

Visualization of eye tracking data


Organizer: Michael Raschke
Contributors: Tanja Blascheck, Michael Burch, Kuno Kurzhals,
Hermann Pflüger
University of Stuttgart, Germany
Apart from measuring completion times and recording accuracy rates
of correctly given answers during performance of visual tasks, eye
tracking experiments provide an additional technique to analyze how
the attention of an observer is changing on a presented stimulus.
Besides using statistical algorithms to compare eye tracking metrics,
visualization techniques allow us to visually analyze different aspects
of the recorded data. However, in most cases only standard state-of-the-art visualization techniques are used, such as scan-path or heat-map visualizations.
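To make the baseline concrete, a minimal sketch of one such standard technique, a heat map computed as a 2-D histogram of fixation locations, is given below (fixation coordinates invented for illustration):

```python
# Bin fixation positions over the stimulus area; smoothing (e.g. with
# scipy.ndimage.gaussian_filter) would normally follow before display.
import numpy as np

fixations = np.array([[200, 150], [210, 160], [400, 300], [405, 310]])
heat, xedges, yedges = np.histogram2d(
    fixations[:, 0], fixations[:, 1],
    bins=(64, 48), range=[[0, 1280], [0, 960]])
print(heat.sum())  # 4.0: all fixations fall within the stimulus area
```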

In this tutorial we will present an overview on further existing
visualization techniques for eye tracking data and demonstrate their
application in different user experiments and use cases. The tutorial
will present three topics of eye tracking visualization:
1.) Visualization for supporting the general analysis process of a
user experiment.
2.) Visualization for static and dynamic stimuli.
3.) Visualization for understanding cognitive and perceptual processes and refining parameters for cognition and perception
simulations.
This tutorial is designed for researchers who are interested in eye
tracking in general or in applying eye tracking techniques in user
experiments. Additionally, the tutorial could be of interest for psychologists and cognitive scientists who would like to evaluate and
refine cognition and perception simulations. It is suitable for PhD
students as well as for experienced researchers. The tutorial requires a
minimal level of pre-requisites. Fundamental concepts of eye tracking
and visualization will be explained during the tutorial.

Introduction to cognitive modelling with ACT-R


Organizers: Nele Russwinkel, Sabine Prezenski, Fabian Joeres,
Stefan Lindner, Marc Halbrügge
Contributors: Fabian Joeres, Maria Wirzberger; Technische Universität Berlin, Germany
ACT-R is the implementation of a theory of human cognition. It has a
very active and diverse community that uses the architecture in laboratory tasks as well as in applied research. ACT-R is oriented toward the organization of the brain and is called a hybrid architecture because it
holds symbolic and subsymbolic components. The aim of working on
cognitive models with a cognitive architecture is to understand how
humans produce intelligent behavior.
In this tutorial the cognitive architecture ACT-R is introduced
(Anderson 2007). In the beginning we will give a short introduction of
the background, structure and scope of ACT-R. We will then continue with hands-on examples of how cognitive models are written in ACT-R.
At the end of the tutorial we will give a short overview of
recent work and its benefit for applied cognitive science.
References
Anderson JR (2007) How can the human mind occur in the physical
universe? Oxford University Press, New York

Dynamic Field Theory: from sensorimotor behaviors to grounded spatial language
Organizers: Yulia Sandamirskaya, Sebastian Schneegans
Ruhr University Bochum, Germany
Dynamic Field Theory (DFT) is a conceptual and mathematical
framework in which cognitive processes are grounded in sensorimotor behavior through the continuous (in time and in space) dynamics of Dynamic Neural Fields (DNFs). DFT originates in Dynamical Systems thinking, which postulates that the moment-to-moment behavior
of an embodied agent is generated by attractor dynamics, driven by
sensory inputs and interactions between dynamic variables. Dynamic
Neural Fields add representational power to the Dynamical Systems
framework through DNFs, which formalize the dynamics of neuronal
populations in terms of activation functions defined over behaviorally
relevant parameter spaces. DFT has been successfully used to account for development of visual and spatial working memory, executive
control, scene representation, spatial language, and word learning, as
well as to guide behavior of autonomous cognitive robots. In the
tutorial, we will cover the basic concepts of Dynamic Field Theory in
several short lectures. The topics will be: the attractors and instabilities that model elementary cognitive functions; the couplings
between DNFs and multidimensional DNFs; coordinate transformations and coupling DNFs to sensory and motor systems; autonomy

123

Cogn Process (2014) 15 (Suppl 1):S1S158


within DFT. We will show on an exemplary architecture for generation of flexible spatial language behaviors how the DNF
architectures may be linked to sensors and motors and generate realworld behavior autonomously. The same architecture may be used to
account for behavioral findings on spatial language. The tutorial will
include a hands-on session to familiarize participants with a MATLAB software framework COSIVINA, which allows to build complex
DNF architectures with little programming overhead.
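For reference, the field dynamics underlying DNFs are commonly written in the Amari form (notation varies across the DFT literature; this is a sketch, not a quote from the tutorial):

\tau \dot{u}(x,t) = -u(x,t) + h + s(x,t) + \int w(x - x') \, g(u(x',t)) \, dx'

where u(x,t) is the activation of the field over a behaviorally relevant dimension x, h < 0 is the resting level, s(x,t) is external (e.g. sensory) input, w is an interaction kernel with local excitation and lateral inhibition, and g is a sigmoid output nonlinearity.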


Poster presentations
The effect of language on spatial asymmetry in image
perception
Zaeinab Afsari, José Ossandón, Peter König
Osnabrück University, Germany
Image viewing studies recently revealed that healthy participants demonstrate a leftward spatial bias while performing a free-viewing task. This leftward gaze bias has been attributed to the lateralization of the cortical attention system, but it might alternatively be explained, or at least influenced, by reading direction. Four eye-tracking experiments were conducted using different bilingual subjects and text-paragraph primes with different reading directions. Participants first
read a text and subsequently freely viewed nine images while the eye
movements were recorded. Experiment 1 investigates the effect of
reading direction among bilingual participants with right-to-left
(RTL) and left-to-right (LTR) text primes. Those participants were
native Arabic/Urdu speakers. In concordance with previous studies,
after reading LTR prime, a leftward shift in the first second of image
exploration was observed. In contrast, after reading RTL text primes,
participants displayed a rightward spatial bias. This result demonstrates that reading direction of text primes influences later
exploration of complex stimuli. In experiment 2, we investigated
whether this effect was due to a systematic influence of native vs.
secondary language, independently of the direction of reading. For
this purpose, we measured German/English bilinguals with German/
English LTR reading-direction text stimuli. Here, participants showed a leftward spatial bias after reading LTR texts in either case. This
demonstrates that for the present purpose, the difference between
primary and secondary language is not important. In Experiment 3,
we investigated the relative influence of scanning direction and actual
reading direction. LTR bilingual participants were presented with
normal (LTR) and mirrored left-to-right (mLTR) texts. Upon reading
the primes, reading direction differed markedly between the mirrored and non-mirrored conditions. However, we did not observe significant differences in the leftward bias; the bias was even slightly stronger
after reading mLTR. This experiment demonstrates that the actual
scanning direction did not influence the asymmetry on later complex
image stimuli. In Experiment 4, we studied the effect of reading
direction among bilingual participants with LTR as primary language
and RTL as secondary language. These participants were native
Germans and Arabic Germans who learned Arabic mainly later in life.
They showed a leftward bias after reading both LTR and RTL text
primes. In conclusion, although reading direction seems to be the main factor modulating the perceptual bias, there could be
another explanation. The innate laterality systems in our brain (left
lateralized linguistic and right lateralized attention) play a role in
increasing/decreasing the bias.

Towards formally founded ACT-R simulation and analysis
Rebecca Albrecht1, Michael Giewein2, Bernd Westphal2
1 Center for Cognitive Science, University of Freiburg, Germany; 2 Software Engineering, University of Freiburg, Germany
Abstract
The semantics of the ACT-R cognitive architecture is today defined
by the ACT-R interpreter. As a result, re-implementations of ACT-R which, e.g., intend to provide a more concise syntax cannot be proven
correct. We present a re-implementation of ACT-R which is based on
a formal abstract semantics of ACT-R.
Keywords
ACT-R Implementation, Formal Semantics
Introduction
ACT-R (Anderson 1983, 2007) is a widely used cognitive architecture. It provides an agent programming language to create a
cognitive model and an interpreter to execute the model. A model
consists of a set of chunk types, a set of production rules, and the
definition of an initial cognitive state. An execution of a model is a
sequence of time-stamped cognitive states where one cognitive state
is obtained by the execution of a production rule on its predecessor
in the sequence.
Over the past thirty years the ACT-R interpreter has been extended
and changed immensely based on findings in psychological research.
Unfortunately, the relation between concepts of the ACT-R theory
and the implementation of the ACT-R interpreter are not always clear.
So today, strictly speaking, only the Lisp source code of the ACT-R interpreter defines the exact semantics of an ACT-R model, and it is often felt that modelers merely write computer code that mimics the human data (Stewart, West 2007). Due to this situation, it is
unnecessarily hard to compare different ACT-R models for similar
tasks and ACT-R modelling is often perceived to be rather inefficient
and error prone (Morgan, Haynes, Ritter, Cohen 2005) in the literature. To overcome these problems, we propose a formal abstract
syntax and semantics for the ACT-R cognitive architecture (Albrecht
2013; Albrecht, Westphal 2014b). The semantics of an ACT-R model
is the transition system which describes all possible computations of
an ACT-R model.
In this work, we report on a proof-of-concept implementation of
the formal semantics given in Albrecht (2013) which demonstrates a
formally founded approach to ACT-R model execution and provides a
basis for new orthogonal analyses of (partial) ACT-R models, e.g., for
the feasibility of certain sequences of rule executions (Albrecht,
Westphal 2014a).
Related Work
Closest to our work is the deconstruction and reconstruction of ACT-R by Stewart and West (2007). Their work aims to ease the evaluation
of variations in the structure of computational models of cognition.
To this end, they analyzed the Lisp implementation of the ACT-R 6 interpreter and re-engineered it, striving to clarify fundamental concepts of ACT-R. To describe these fundamental concepts they use the Python programming language and obtain another working ACT-R interpreter called Python ACT-R. To validate Python ACT-R, they
statistically compare predictions of both implementations on a set of
ACT-R models.
In our opinion, firstly, there should be an abstract, formal definition
of ACT-R syntax and semantics to describe fundamental concepts.
And only secondly, another interpreter should be implemented based
on this formal foundation which may, as Python ACT-R does, also offer a more convenient concrete syntax for ACT-R models. This two-step approach in particular makes it possible not only to test but to formally verify that each interpreter implements the formal semantics.
The ACT-UP (Reitter, Lebiere 2010) toolbox for rapid prototyping
of complex models lacks a formal basis as well. ACT-UP offers higher-level means to access fundamental concepts of the ACT-R theory for more efficient modelling, but the aim is not to clarify
these fundamental concepts. Re-implementations of ACT-R in the
Java programming language (jACT-R, 2010; ACT-R: The Java Simulation and Development Environment 2013) have the main purpose
to make ACT-R accessible for other applications written in Java.
They do not contribute to a more detailed understanding of basic
concepts of the ACT-R theory.

Implementation
We implemented the formal ACT-R semantics provided by (Albrecht
2013; Albrecht, Westphal 2014b) in the Lisp dialect Clojure, which
targets the Java Virtual Machine (JVM). Since Clojure is a Lisp dialect, it is possible to establish a close relation between the formalization and
the implementation. By targeting the JVM, our approach subsumes
the work of Büttner (2010) without the need for TCP/IP-based inter-process communication.
In the formal semantics the signature for the abstract syntax is
described using relation symbols, function symbols, and variables.
Chunk types are given as functions and production rules as tuples
over the signature. An ACT-R architecture is defined as a set of
interpretation functions for symbols used in the signature. The components can be directly translated into a Clojure implementation.
The current implementation supports ACT-R tutorial examples for
base-level learning and spreading activation using our own declarative module (Giewein 2014). The results of the ACT-R 6 interpreter are
reproduced up to small rounding errors.
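To give a flavor of the formalized view (a Python sketch for exposition only; the authors' implementation is in Clojure and far more complete), an ACT-R model can be read as a transition system whose executions are sequences of time-stamped cognitive states:

```python
# A cognitive state maps buffer names to contents; a production rule is a
# (condition, action) pair; execution applies matching rules in sequence.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]

@dataclass
class Production:
    name: str
    matches: Callable[[State], bool]
    apply: Callable[[State], State]

def run(state: State, rules: List[Production], t: float = 0.0,
        dt: float = 0.05, max_steps: int = 10) -> List[Tuple[float, State]]:
    """Return the time-stamped state sequence produced by rule execution."""
    trace = [(t, state)]
    for _ in range(max_steps):
        rule = next((r for r in rules if r.matches(state)), None)
        if rule is None:
            break
        state, t = rule.apply(state), t + dt
        trace.append((t, state))
    return trace

# One-rule toy model: move the "goal" buffer from "start" to "done".
toy = Production("finish",
                 matches=lambda s: s.get("goal") == "start",
                 apply=lambda s: {**s, "goal": "done"})
print(run({"goal": "start"}, [toy]))
```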
Conclusion
Our implementation of an ACT-R interpreter based on a formal
semantics of ACT-R demonstrates the feasibility of the two-step
approach to separate the clarification of fundamental concepts and
a re-implementation. In future work, we plan to extend our
implementation to support further models. Technically, our choice
of Clojure allows to more conveniently interface Java code and
cognitive models. Conceptually, we plan to use our implementation
as a basis for more convenient modelling languages and as an
intermediate format for new, exhaustive analyses of cognitive models based on model-checking techniques and constraint
solvers.
References
ACT-R: The Java Simulation and Development Environment (2013) Retrieved from http://cog.cs.drexel.edu/act-r/about.html, 16 May 2014
Albrecht R (2013) Towards a formal description of the ACT-R unified theory of cognition. Unpublished master's thesis, Albert-Ludwigs-Universität Freiburg
Albrecht R, Westphal B (2014a) Analyzing psychological theories with F-ACT-R. In: Proceedings of KogWis 2014, to appear
Albrecht R, Westphal B (2014b) F-ACT-R: defining the architectural space. In: Proceedings of KogWis 2014, to appear
Anderson JR (1983) The architecture of cognition, vol 5. Psychology Press
Anderson JR (2007) How can the human mind occur in the physical universe? Oxford University Press, Oxford
Büttner P (2010) Hello Java! Linking ACT-R 6 with a Java simulation. In: Proceedings of the 10th international conference on cognitive modeling, pp 289–290
Giewein M (2014) Formalisierung und Implementierung des deklarativen Moduls der kognitiven Architektur ACT-R. Bachelor's thesis, Albert-Ludwigs-Universität Freiburg
jACT-R (2010) Retrieved from http://jactr.org, 16 May 2014
Morgan GP, Haynes SR, Ritter FE, Cohen MA (2005) Increasing efficiency of the development of user models. In: SIEDS, pp 82–89
Reitter D, Lebiere C (2010) Accountable modeling in ACT-UP, a scalable, rapid-prototyping ACT-R implementation. In: Proceedings of the 10th international conference on cognitive modeling (ICCM), pp 199–204
Stewart TC, West RL (2007) Deconstructing and reconstructing ACT-R: exploring the architectural space. Cogn Syst Res 8(3):227–236


Identifying inter-individual planning strategies


Rebecca Albrecht, Marco Ragni, Felix Steffenhagen
Center for Cognitive Science, University of Freiburg, Germany
Abstract
Finding solutions to planning problems can be very complex as they
may consist of hundreds of problem states to be searched by an agent.
In order to analyze human planning strategies, cognitive models can be used. Usually the quality of a cognitive model is evaluated w.r.t. quantitative criteria such as overall planning time. In complex planning tasks, however, this may not be enough, as different strategies may require the same amount of time. We present an
integration of different AI methods from knowledge representation
and planning to qualitatively evaluate a cognitive model with respect
to inter-individual factors.
Keywords
Qualitative Analysis, Model Evaluation, Strategies, Graph-based
Representations, Planning
Introduction
In cognitive modeling, a computer model based on psychological
assumptions is used to describe human behavior in a certain task. In
order to evaluate the quality of a cognitive model, average results from behavioral experiments, e.g. response times, are compared to average results predicted by the model. However, this method does not accommodate modeling qualitative and inter-individual
differences.
We present a method for analyzing qualitative differences in user
strategies w.r.t. psychological factors which are different in individuals, e.g. working memory capacity. A qualitative representation of a
user strategy is given by a path, i.e. a sequence of states, in the
problem space of a task. Individual factors are represented by
numerical parameters controlling user strategies. In order to evaluate a cognitive model, strategies used by participants in a behavioral experiment are compared to strategies predicted by the cognitive model under different parameter values. The cognitive model is evaluated by identifying, for each participant, a set of parameter values such that the execution of the model best predicts the participant's strategies.
Method Sketch
Firstly, we represent the strategies of participants and of the cognitive model
under different parameter settings in so-called strategy graphs. Formally, a strategy graph for a problem instance p is a directed, labelled
multigraph which includes a set of vertices Vp representing all
states traversed by any participant or the cognitive model, a set of
edges Ep representing the application of legal actions in the task, a
set of initial states Sp ⊆ Vp, and a set of goal states Gp ⊆ Vp. Note that
the strategy graph may include multiple edges (for different agents)
between two states. An example of a partial strategy graph with a
planning depth of three in a task from the Rush Hour planning
domain (Flake, Baum 2002) is shown in Fig. 1.
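As a concrete illustration, the following minimal Python sketch shows one possible encoding of such a strategy graph; the class and attribute names are our own assumptions for illustration, not the authors' implementation.

from collections import defaultdict

class StrategyGraph:
    """Directed, labelled multigraph over problem states.

    Vertices are problem states; every edge stores the legal action applied
    and the agent (participant or model run) that applied it, so the same
    pair of states may be connected by multiple edges.
    """

    def __init__(self, initial_states, goal_states):
        self.initial_states = set(initial_states)                # Sp
        self.goal_states = set(goal_states)                      # Gp
        self.vertices = self.initial_states | self.goal_states   # Vp
        self.edges = defaultdict(list)  # (src, dst) -> [(action, agent), ...]

    def add_move(self, src, dst, action, agent):
        """Record that `agent` applied `action` to move from `src` to `dst`."""
        self.vertices.update((src, dst))
        self.edges[(src, dst)].append((action, agent))

    def strategy_of(self, agent):
        """All moves taken by one agent, i.e. its strategy in the graph."""
        return [(s, d, a) for (s, d), labels in self.edges.items()
                for (a, ag) in labels if ag == agent]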
Secondly, the parameter values for which the cognitive model best
replicates human participants' strategies are identified based on
similarity measures calculated for each pair of parameter values and
human participants. The similarity of two strategies is restricted to
values between 0 and 1 and is calculated from the strategies given in
the strategy graph. In the evaluation, each participant is assigned the
set of parameter values for which the cognitive model's strategy is
maximally similar to the participant's strategy. The parameter values
assigned to a participant are identified as the planning profile of that
participant. In this step, several similarity measures are possible, e.g.
the Smith-Waterman algorithm (Smith, Waterman 1981).
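For illustration, here is a hedged Python sketch of how such a normalized similarity and the resulting planning profile could be computed; the scoring parameters and the normalization by the shorter sequence are our assumptions, as the abstract does not specify them.

def smith_waterman_similarity(seq_a, seq_b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Smith-Waterman local alignment of two state sequences, scaled to [0, 1]."""
    n, m = len(seq_a), len(seq_b)
    if min(n, m) == 0:
        return 0.0
    # H[i][j]: best local alignment score ending at positions i, j
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + sub,  # match/mismatch
                          H[i - 1][j] + gap,      # gap in seq_b
                          H[i][j - 1] + gap)      # gap in seq_a
            best = max(best, H[i][j])
    # With match = 1 the maximal score equals the length of the shorter sequence
    return best / (match * min(n, m))

def planning_profile(participant_seq, model_runs):
    """Parameter values whose model run best matches the participant's strategy.

    model_runs: dict mapping parameter tuples to model state sequences.
    """
    return max(model_runs,
               key=lambda params: smith_waterman_similarity(
                   participant_seq, model_runs[params]))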


Fig. 1 Example of a partial strategy graph for a planning depth of
three in the Rush Hour problem domain. States are possible Rush
Hour board configurations. Dashed edges indicate moves on optimal
solution paths. Solid edges represent moves of participants in a
behavioral experiment or moves of cognitive agents. The circle
around the state in the center of the figure indicates a so-called
decision point where several moves can be considered optimal. The
dashed game objects in problem states on the bottom of the figure
indicate the game object which was moved

With respect to the presented method, the quality of the cognitive
model is given by the mean similarity between the strategies used by participants and the strategies used by the cognitive model under the best-replicating
parameter settings.
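Read this way, the quality measure can be stated compactly (our formalization; P is the set of participants, Θ the set of tested parameter settings, and sim the normalized similarity from above):

Q(\text{model}) = \frac{1}{|P|} \sum_{p \in P} \max_{\theta \in \Theta} \mathrm{sim}\bigl(s_p,\; s_{\text{model}}(\theta)\bigr)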
Preliminary Evaluation Results
We evaluated the proposed method preliminarily in the Rush Hour
planning domain (Steffenhagen, Albrecht, Ragni 2014). Human data
was collected in a psychological experiment with 20 participants
solving 22 different Rush Hour tasks. The cognitive model was programmed to use means-ends analysis (Faltings, Pu 1992; McDermott
1996) with different parameters to control local planning behavior with
respect to assumed individual factors. The similarity was calculated
with the Smith-Waterman algorithm for calculating local sequence
alignments (Smith, Waterman 1981). For each of the 20 participants a
set of parameter values controlling the cognitive model was identified
(1) held constant over all tasks and (2) separately for each task. The
evaluation reveals that this cognitive model can predict 44 % of human
strategies in case (1) and 76 % of human strategies in case (2).
Conclusion
We present a method to qualitatively evaluate cognitive models by
analyzing user strategies, i.e. sequences of states traversed in the
solution of a task. The state space of a planning problem, e.g. the
Rush Hour problem space, might be very complex. As a result, user
strategies and, therefore, underlying cognitive processes cannot be
analyzed by hand. With the presented method, human strategies are
analyzed automatically by identifying cognitive models which traverse the same problem states as human participants.
In cognitive architectures, numerical parameters are often used to
control the concrete behavior of a cognitive model, e.g. the decay rate in
ACT-R. Often, these parameters also influence the planning strategies of
a model. Although parameter values might differ between individuals,
they are usually held constant over all executions of the model. With
respect to the outlined similarity measure it is possible to analyze
which parameter values induce strategies similar to an individual.
Acknowledgment
This work has been supported by a grant to Marco Ragni within the
project R8-[CSpace] within the SFB/TR 8 Spatial Cognition.
References
Faltings B, Pu P (1992) Applying means-ends analysis to spatial planning. In: Proceedings of the 1991 IEEE/RSJ international workshop on intelligent robots and systems, pp 80–85
Flake GW, Baum EB (2002) Rush Hour is PSPACE-complete, or why you should generously tip parking lot attendants. Theor Comput Sci 270:895–911
McDermott D (1996) A heuristic estimator for means-ends analysis in planning. In: Proceedings of the 3rd international conference on AI planning systems, pp 142–149
Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147:195–197
Steffenhagen F, Albrecht R, Ragni M (2014) Automatic identification of human strategies by cognitive agents. In: Proceedings of the 37th German conference on artificial intelligence, to appear

Simulating events. The empirical side of the event-state distinction
Simone Alex-Ruf
University of Tübingen, Germany
Since Vendler (1957) an overwhelming amount of theoretical work on
the categorization of situations concerning their lexical aspect has
emerged within linguistics. Telicity, change of state, and punctuality
vs. durativity are the main features used to distinguish between events
and states. Thus, the VPs (verbal phrases) in (1) describe atelic stative
situations, the VPs in (2) telic events:
(1) to love somebody, to be small
(2) to run a mile, to reach the top
Although there are so many theories about what constitutes an
event or a state, the empirical studies concerning this question can be
counted on one hand. This is quite surprising, since the notion of
lexical aspect is a central issue within verb semantics. Even more
surprising is the fact that these few studies provide results pointing in
completely opposite directions:
The studies in Stockall et al. (2010) and Coll-Florit and Gennari
(2011) report shorter RTs (reaction times) to events than to states and,
therefore, suggest that the processing of events is easier. In contrast,
Gennari and Poeppel (2003) found shorter RTs after reading states
than after reading events. They explain this result by the higher level
of complexity in the semantics of verbs describing events, which
requires longer processing times.
A closer look at these studies, however, reveals that in nearly all of
them different verbs or VPs were compared: Gennari and Poeppel
(2003), for example, used eventive VPs like to interrupt my father and
stative VPs like to resemble my mother. One could argue that these
two VPs not only differ in their lexical aspect, but perhaps also in
their emotional valence and in the way the referent described by the
direct object is affected by the whole situation, and that these features
therefore occurred as confounding variables, influencing the results in
an undesirable way.
To avoid this problem, in the present study German ambiguous
verbs were used: Depending on the context, verbs like füllen (to fill),
schmücken (to decorate) and bedecken (to cover) lead to an eventive
or a stative reading. With these verbs sentence pairs were created,
consisting of an eventive (3) and a stative sentence (4) (= target
items). The two sentences of one pair differed only in their grammatical subject, but contained the same verb and direct object:
Target items:
(3) Der Konditor/füllt/die Form/[…].
The confectioner/fills/the pan/[…].
(4) Der Teig/füllt/die Form/[…].
The dough/fills/the pan/[…].
In a self-paced reading study participants had to read these sentences phrase-by-phrase and in 50 % of all trials answer a
comprehension question concerning the content of the sentence.
Note that in the event sentences all referents described by the
grammatical subjects were animate, whereas in the state sentences all
subjects were inanimate. Many empirical studies investigating animacy
suggest that animate objects are remembered better than inanimate
objects (see, for example, Bonin et al. 2014). Therefore, shorter RTs on
the subject position of event sentences than of state sentences were
expected, resulting in a main effect of animacy. Since this effect could
influence the potential event-state effect measured on the verb position
as a spillover effect, control items containing the same subjects, but
different, non-ambiguous verbs like stehen (to stand) were added:
Control items:
(5) Der Konditor/steht/hinter der Theke/[…].
The confectioner/stands/behind the counter/[…].
(6) Der Teig/steht/hinter der Theke/[…].
The dough/stands/behind the counter/[…].
The results confirmed this assumption: Mean RT measured on the
subject position was significantly shorter for the animate than for the
inanimate referents, F(1, 56) = 9.65, p = .003 (587 vs. 602 ms).
Within the control items, this animacy effect influenced the RTs on
the verb position: After animate subjects RTs of the (non-ambiguous)
verb were shorter than after inanimate subjects (502 vs. 515 ms),
revealing the expected spillover effect.
However, within the target items mean RT measured on the
position of the (ambiguous) verb showed the opposite pattern: After
animate subjects it was significantly longer than after inanimate
subjects, F(1, 56) = 4.12, p = .047 (534 vs. 520 ms). Here no
spillover effect emerged, but a main effect which can be attributed to
the different lexical aspect of the two situation types.
If indeed processing times are longer for events than for states,
how could this effect be explained? The simulation account, proposed, for example, by Glenberg and Kaschak (2002) and Zwaan
(2004), provides an elegant solution. A strong simulation view of
comprehension suggests that the mental representation of a described
situation comes about in exactly the same way as when this situation is perceived in real time. This means that language is made
meaningful by cognitively simulating the actions implied by sentences (Glenberg and Kaschak 2002:595).
Imagine what is simulated during the processing of a state like the
dough fills the pan: The simulation contains a pan and some dough in
this pan, but nothing more. In contrast, the simulation of an event like
the confectioner fills the pan not only requires additional participants
like the confectioner and perhaps a spatula, but also action (of the
confectioner), movement (of the confectioner, the dough, and the
spatula), situation change (from an empty to a full pan) and a relevant
time course. The simulation of a state can be envisioned as a picture;
to imagine the simulation of an event, a film is needed. In short,
the simulation evoked by an event is more complex than that of a
state.
Under the assumption that a simulation constitutes at least a part of
the mental representation of a situation, it seems comprehensible that
the complexity of such a simulation has an influence on its processing
and that the higher degree of complexity in the simulation of events
leads to longer RTs.
References
Bonin P, Gelin M, Bugaiska A (2014) Animates are better remembered than inanimates: further evidence from word and picture stimuli. Mem Cognit 42:370–382. doi:10.3758/s13421-013-0368-8
Coll-Florit M, Gennari SP (2011) Time in language: event duration in language comprehension. Cogn Psych 62:41–79
Gennari SP, Poeppel D (2003) Processing correlates of lexical semantic complexity. Cognition 89:B27–B41
Glenberg AM, Kaschak MP (2002) Grounding language in action. Psychon Bull Rev 9:558–565
Stockall L, Husband EM, Beretta A (2010) The online composition of events. Queen Mary's Occasional Papers Advancing Linguistics 19
Vendler Z (1957) Verbs and times. Philosoph Rev 66:143–160
Zwaan RA (2004) The immersed experiencer: toward an embodied theory of language comprehension. In: Ross BH (ed) The psychology of learning and motivation, vol 44. Academic Press, New York, pp 35–62

On the use of computational analogy-engines in modeling examples from teaching and education
Tarek R. Besold
Institute of Cognitive Science, University of Osnabrück, Germany
Abstract
The importance of analogy for human cognition and learning has
widely been recognized, and analogy-based methods are also being
explicitly integrated into the canon of approved education and
teaching techniques. Still, the actual level of knowledge about analogy as an instructional means and device is, as of today, rather low. In this
summary report on preliminary results from an ongoing project, I
propose the use of computational analogy-engines as methodological
tools in this domain of research, additionally motivating this attempt
at connecting AI and the learning sciences by two worked application
case studies.
Keywords
Computational Analogy-making, Artificial Intelligence, Education,
Cognitive Modeling, Computational Modeling
Introduction: Analogy in Education and Cognitive Modeling
Analogical reasoning (i.e., the ability of perceiving and operating on
dissimilar domains as similar with respect to certain aspects based on
shared commonalities in relational structure or appearance) is considered essential for learning abstract concepts (Gentner et al. 2001)
and in general for children's process of learning about the world
(Goswami 2001).
Concerning an educational context, analogies facilitate learners'
construction processes of new ideas and conceptions on the grounds
of already available concepts (Duit 1991), and can be used for
facilitating the understanding of concepts and procedures in abstract
and formal domains such as mathematics, physics or science (Guerra-Ramos 2011). Still, analogy is not a cure-all, as unsuccessful analogies may
produce misunderstandings and can result in harmful misconceptions
(Clement 1993).
Analogy has also been actively investigated in artificial intelligence (AI), bringing forth numerous computational frameworks and
systems for automated analogy-making and analogical reasoning.
And indeed, computational analogy frameworks also have found
entrance into the context of education and teaching: For instance in
Thagard et al. (1989), the authors present a theory and implementation of analogical mapping that applies to explanations of unfamiliar
phenomena, such as those used by chemistry teachers, and Forbus et al.
(1997) show how an information-level model of analogical inferences
can be incorporated in a case-based coach that is being added to an
intelligent learning environment. Siegler (1989) conjectures how the
Structure-Mapping Engine (Falkenhainer et al. 1989) could be used to
gain insights about developmental aspects of analogy use.
Analogy Engines in the Classroom: Worked Examples
Building on the outcome of these and similar research efforts, in
Besold (2013) I first advocated expanding research that applies analogy-engines to problems from teaching and education into a proper
program in its own right, opening up a new application domain for
computational analogy-making.

In order to provide factual grounding and initial worked examples
for the possible applications of computational analogy-engines, Besold (2013) and Besold et al. (2013) feature two case studies. In both
cases, the Heuristic-Driven Theory Pro jection (HDTP) analogymaking framework (Schmidt et al. 2014) was applied to modeling
real-world examples taken from a classroom context. Besold (2013)
provides an HDTP model of the string circuit analogy for gaining a
basic understanding of electric current (Guerra-Ramos 2011) used in
science classes for 8 to 9 year old children. Besold et al. (2013) give a
detailed and fairly complex formal model of the analogy-based Calculation Circular Staircase (Schwank et al. 2005), applied in teaching
basic arithmetics and the conception of the naturals as ordinal numbers to children attending their initial mathematics classes in primary
school. The Calculation Circular Staircase (i.e., a teaching tool
shaped like a circular staircase with the steps being made up by
incrementally increasing stacks of balls, grouped in expanding circles
of ten stacks per circle corresponding to the decimal ordering over the
naturals) offers children a means of developing an understanding of
the interpretation of numbers as results of transformation operations
by enabling a mental functional motor skill-based way of accessing
the foundational construction principles of the number space and the
corresponding basic arithmetic operations. The HDTP model gives a
precise account of how the structure of the staircase and the declarative counting procedure memorized by children in school interact in
bringing forth the targeted conception of the natural number space.
Summarizing, in both case studies the respective formal model proves
highly useful in uncovering the underlying structure of the
method or teaching tool, together with the consecutive steps of reasoning happening on the level of computational theory.
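To give a flavor of the computation involved: at HDTP's core lies anti-unification, which computes a least general generalization of source and target descriptions. The following deliberately simplified first-order Python sketch only conveys the idea; HDTP itself uses a restricted higher-order variant, and the term encoding is our assumption.

def anti_unify(s, t, subst=None, counter=None):
    """Least general generalization of two terms.

    Terms are atoms or tuples (functor, arg1, ..., argN).
    """
    if subst is None:
        subst, counter = {}, [0]
    if s == t:
        return s
    # Same functor and arity: generalize argument-wise
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(anti_unify(a, b, subst, counter)
                               for a, b in zip(s[1:], t[1:]))
    # Otherwise introduce (or reuse) a variable for the disagreement pair
    if (s, t) not in subst:
        subst[(s, t)] = f"X{counter[0]}"
        counter[0] += 1
    return subst[(s, t)]

# Example: generalizing flow(water, pipe) and flow(current, wire)
# yields flow(X0, X1), the shared relational structure of both domains.
print(anti_unify(("flow", "water", "pipe"), ("flow", "current", "wire")))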
Conclusion
By providing a detailed formal description of the involved domains
and their relation in terms of their joint generalization and the corresponding possibility for knowledge transfer our models try to
explicate the structural relations and governing laws underlying the
respective teaching tools. We also point out how the identified constructive and transformation-based conceptualizations can then
provide support and a deeper-rooted model for the children's initially
very flat and sparse conceptions of the corresponding domains.
In general, modeling educational analogies sheds new light on a
particular analogy, in terms of which information is transferred, what
the limitations of the analogy are, or whether it makes unhelpful
mappings; and what potential extensions might be needed. On this
basis, we hope to acquire a deeper understanding of the basic principles and mechanisms underlying analogy-based learning in fairly
high-level and abstract domains.
References
Besold TR (2013) Analogy engines in classroom teaching: modeling the string circuit analogy. In: Proceedings of the AAAI Spring 2013 symposium on creativity and (early) cognitive development
Besold TR, Pease A, Schmidt M (2013) Analogy and arithmetics: an HDTP-based model of the calculation circular staircase. In: Proceedings of the 35th annual meeting of the cognitive science society, Cognitive Science Society, Austin, TX
Clement J (1993) Using bridging analogies and anchoring intuitions to deal with students' preconceptions in physics. J Res Sci Teach 30:1241–1257
Duit R (1991) The role of analogies and metaphors in learning science. Sci Educ 75(6):649–672
Falkenhainer B, Forbus K, Gentner D (1989) The structure-mapping engine: algorithm and examples. Artif Intell 41:1–63
Forbus K, Gentner D, Everett J, Wu M (1997) Towards a computational model of evaluating and using analogical inferences. In: Proceedings of the 19th annual conference of the cognitive science society, pp 229–234
Gentner D, Holyoak K, Kokinov B (eds) (2001) The analogical mind: perspectives from cognitive science. MIT Press
Goswami U (2001) Analogical reasoning in children. In: Gentner D, Holyoak K, Kokinov B (eds) The analogical mind: perspectives from cognitive science. MIT Press, pp 437–470
Guerra-Ramos M (2011) Analogies as tools for meaning making in elementary science education: how do they work in classroom settings? Eurasia J Math Sci Technol Educ 7(1):29–39
Schmidt M, Krumnack U, Gust H, Kühnberger KU (2014) Heuristic-driven theory projection: an overview. In: Prade H, Richard G (eds) Computational approaches to analogical reasoning: current trends. Springer, Berlin, pp 163–194
Schwank I, Aring A, Blocksdorf K (2005) Betreten erwünscht – die Rechenwendeltreppe. In: Beiträge zum Mathematikunterricht. Franzbecker, Hildesheim
Siegler R (1989) Mechanisms of cognitive development. Annu Rev Psychol 40:353–379
Thagard P, Cohen D, Holyoak K (1989) Chemical analogies: two kinds of explanation. In: Proceedings of the 11th international joint conference on artificial intelligence, pp 819–824

Brain network states affect the processing and perception of tactile near-threshold stimuli
Christoph Braun1,2,3,4, Anja Wühle1,5, Gianpaolo Demarchi3,
Gianpiero Monittola3, Tzvetan Popov6, Julia Frey3, Nathan Weisz3
1 MEG-Center, University of Tübingen, Germany; 2 CIN, Werner Reichardt Centre for Integrative Neuroscience, University of Tübingen, Germany; 3 CIMeC, Center for Mind/Brain Sciences, University of Trento, Italy; 4 Department of Psychology and Cognitive Science, University of Trento, Italy; 5 CEA, DSV/I2BM, NeuroSpin Center, F-91191 Gif-sur-Yvette, France; INSERM, U992, Cognitive Neuroimaging Unit, F-91191 Gif-sur-Yvette, France; Univ Paris-Sud, Cognitive Neuroimaging Unit, F-91191 Gif-sur-Yvette, France; 6 Radboud University Nijmegen, Donders Institute for Brain, Cognition, and Behavior, 6500 HE Nijmegen, The Netherlands
Introduction
Driving a biological or technical system to its limits reveals more
detailed information about its functional principles than testing it at
its standard range of operation. We applied this idea to get a better
understanding of how tactile information is processed along the
somatosensory pathway. To get insight into what makes a stimulus
become conscious, i.e. reportable, we studied the cortical processing
of near-threshold touch stimuli that are either perceived (hits) or not
(misses). Following the concept of win2con proposed by Weisz et al.
(2014) we tested the hypothesis that the state of functional connectivity within the sensory network determines up to which level in the
sensory processing hierarchy misses are processed as compared to
hits. The level of sensory processing was inferred by studying
somatosensory evoked responses elicited by the near-threshold
stimuli. Since the amplitudes of near-threshold somatosensory stimuli
are low, a paired-pulse paradigm was used in which the inhibitory effects
of the near-threshold stimuli on a subsequently applied supra-threshold stimulus were assessed. Results show that the state of a widespread cortical network prior to the application of the tactile stimulus
is crucial for a tactile stimulus to elicit activation of SII and to be
finally perceived.
Subjects and Methods
Twelve healthy subjects participated in the study. Using a piezoelectric stimulator (Quaerosys, Schotten, Germany) we applied tactile
stimuli to the tip of the index finger of the left hand. Intensities of the
near-threshold stimuli were adjusted to each subject's personal sensory
threshold using a staircase procedure. The near-threshold stimulus
was followed by a supra-threshold stimulus to probe the cortical
activation of the first stimulus. Control conditions in which the first
stimulus was omitted and in which the first stimulus was delivered at
supra-threshold intensities were also added. Subjects reported in all
trials how many stimuli they had perceived.
Pre-stimulus network states and post-stimulus cortical processing
of the sensory input were studied by means of magnetoencephalography. To assess the cortical network prior to stimulation, source
activity was estimated for the nodes of an equally spaced grid and all-to-all imaginary coherence was calculated. Alterations in power and
graph-theoretical network parameters were estimated. Since secondary somatosensory cortex (SII) appears to play a crucial role in the
processing of consciously perceived tactile stimuli, we used it as a
seed region for identifying the related brain network. In order to
assess post-stimulus processing and its dependency on the pre-stimulus network state, evoked responses were recorded. Since evoked
responses to near-threshold stimulation are rather weak, the activation
induced by the near-threshold stimulus was probed by subsequently
applying a supra-threshold test stimulus. To determine the source
activity, a spatio-temporal dipole model with one source for primary
somatosensory cortex (SI) contralateral to the stimulation site and two
dipoles for ipsi- and contralateral SII was used. The model was
applied both to the direct evoked responses of the near-threshold
stimuli and to the activation evoked by the probe stimulus. Since the
duration of activation differs across the different sensory brain areas,
varying ISIs of 30, 60, and 150 ms between the near-threshold and the
test stimulus in the paired-pulse approach allowed for probing the
sensory processing of the near-threshold stimulus at different
levels (Wühle et al. 2010).
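As an illustration of the connectivity measure, here is a hedged numpy sketch of imaginary coherence between two estimated source time courses (cf. the all-to-all computation mentioned above); the windowing and trial-averaging details are our assumptions, not the authors' pipeline.

import numpy as np

def imaginary_coherence(x, y, fs, nperseg=256):
    """Imaginary part of coherency between signals x and y, per frequency.

    x, y: arrays of shape (n_trials, n_samples), n_samples >= nperseg;
    fs: sampling rate in Hz. Cross- and auto-spectra are averaged over
    trials, similar in spirit to a Welch estimate.
    """
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    Sxy = np.zeros(freqs.size, dtype=complex)
    Sxx = np.zeros(freqs.size)
    Syy = np.zeros(freqs.size)
    win = np.hanning(nperseg)
    for xt, yt in zip(x, y):
        X = np.fft.rfft(win * xt[:nperseg])
        Y = np.fft.rfft(win * yt[:nperseg])
        Sxy += X * np.conj(Y)
        Sxx += np.abs(X) ** 2
        Syy += np.abs(Y) ** 2
    coherency = Sxy / np.sqrt(Sxx * Syy)
    # The imaginary part is insensitive to zero-lag (volume-conduction) coupling
    return freqs, np.imag(coherency)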
Results
Network analysis for the prestimulus period yielded increased alpha
power in trials in which the near-threshold stimulus was not detected. On
a global level, brain networks appeared to be more strongly clustered for
misses than for hits. In contrast, on a local level, clustering coefficients
were stronger for hits than for misses, in particular for contralateral SII. A
detailed analysis of the connectedness of SII revealed that, except for
connections to the precuneus, SII was more strongly connected to other
brain areas, such as ipsilateral inferior frontal/anterior temporal cortex and
middle frontal gyrus, for hits than for misses. The results suggest that the state
of the pre-stimulus somatosensory network, involving particularly middle
frontal gyrus, cingulate cortex and fronto-temporal regions, determines
whether near-threshold tactile stimuli elicit activation of SII and are
subsequently perceived and reported.
Studying poststimulus activation, no significant difference
between hits and misses was found on the level of SI, neither for the
direct evoked response of the near-threshold stimulus nor for its
effects on the subsequent probe stimulus. In contrast, on the level of
SII a significant difference between hits and misses could be shown in
response to the near-threshold stimuli. Moreover, the SII response to
the probe stimulus was inhibited by the preceding near-threshold stimulus exclusively at an ISI of 150 ms, but not at shorter ISIs
(Wühle et al. 2011).
Discussion
The study reported here emphasizes the importance of the prestimulus
state of brain networks for the subsequent activation of brain regions
involved in higher level stimulus processing and for the conscious
perception of sensory input. In tactile stimulus processing, secondary
somatosensory cortex appears to be the critical region that is
embedded in a wide brain network and that is relevant for the gating
of sensory input to higher level analysis. This finding corresponds
with the established view that the processing of sensory information in SII is
strongly modulated by top-down control. Network analyses indicated
that the sensory network involving SII, middle frontal gyrus, cingulate cortex and fronto-temporal brain regions has to be distinguished
from the global brain network. For stimuli to be perceived consciously, it seems that the sensory network has to reveal increased
coupling in a local (clustering) as well as a long-range (efficiency)
sense.
Combining a sensory task at the limit of sensory performance with
elaborate techniques for brain network analysis and the study of
brain activation, the current study provided insight into the interaction
between brain network states, brain activation and conscious stimulus
perception.
References
Weisz N, Wühle A, Monittola G, Demarchi G, Frey J, Popov T, Braun C (2014) Prestimulus oscillatory power and connectivity patterns predispose conscious somatosensory perception. Proc Natl Acad Sci USA 111(4):E417–E425
Wühle A, Mertiens L, Rüter J, Ostwald D, Braun C (2010) Cortical processing of near-threshold tactile stimuli: an MEG study. Psychophysiology 47(3):523–534
Wühle A, Preissl H, Braun C (2011) Cortical processing of near-threshold tactile stimuli in a paired-stimulus paradigm – an MEG study. Eur J Neurosci 34(4):641–651

A model for dynamic minimal mentalizing in dialogue


Hendrik Buschmeier, Stefan Kopp
Social Cognitive Systems Group, CITEC and Faculty of Technology,
Bielefeld University, Germany
Spontaneous dialogue is a highly interactive endeavor in which
interlocutors constantly influence each other's actions. As
addressees they provide feedback of perception, understanding,
acceptance, and attitude (Allwood et al. 1992). As speakers they
adapt their speech to the perceived needs of the addressee, propose
new terms and names, make creative references, draw upon
established and known to be shared knowledge, etc. This makes
dialogue a joint activity (Clark 1996) whose outcome is not
determined up front but shaped by the interlocutors while the
interaction unfolds over time.
One of the tasks interlocutors need to carry out while being
engaged in a dialogue is keeping track of the dialogue information
state. This is usually considered to be a rich representation of the
dialogue context, most importantly including which information is
grounded and which is still pending to be grounded (and potentially
much more information; see, e.g., Ginzburg 2012). Whether such a
detailed representation of the information state is necessary for participating in
dialogue (and whether it is a cognitively plausible assumption)
is a topic of ongoing debate.
On the one hand, Brennan and Clark (Brennan and Clark 1996;
Clark 1996) state that speakers maintain a detailed model of
common ground and design their utterances to the exact needs of
their communication partners, even to the extent that approximate
versions of mutual knowledge may be necessary to explain certain
dialogue phenomena (Clark and Marshall 1981). On the other hand,
Pickering and Garrod (2004) argue that, for reasons of efficiency, dialogue cannot involve heavy inference on common
ground, but is an automatic process that relies on priming and
activation of linguistic representations and uses interactive repair
upon miscommunication. A position that falls in between this
dichotomy is Galati and Brennan's (2010) lightweight one-bit
partner model (e.g., has the addressee heard this before or not?) that
can be used instead of full common ground when producing an
utterance.

Cogn Process (2014) 15 (Suppl 1):S1S158


We propose that interlocutors in dialogue engage in dynamic
minimal mentalizing, a process that goes beyond the single properties
in the focus of Galati and Brennan's (2010) one-bit model, but is comparable in computational efficiency.
We assume that speakers maintain a probabilistic, multidimensional
(consisting of a fixed number of state variables), and dynamic
attributed listener state (Buschmeier and Kopp 2012). We model
this as a dynamic Bayesian network representation (see Fig. 2) that is
continuously updated by the addressee's communicative feedback
(i.e., short verbal-vocal expressions such as uh-huh, yeah, huh?;
head gestures; facial expressions), seen as evidence of understanding
in response to ongoing utterances.
The proposed model is multidimensional because it represents the
listener's mental state of listening in terms of the various communicative functions that can be expressed in feedback (Allwood et al.
1992): is the listener in contact?; is he or she willing and able to
perceive and understand what is said?; and does he or she accept the
message and agree to it? Instead of making a decision conditioned
on the question whether the interlocutor has heard something before,
this model makes it possible to use the still computationally feasible but
richer knowledge of whether he or she has likely perceived, understood, etc. a previously made utterance.
Further, the model is fully probabilistic since the attributed mental
states are modelled in a Bayesian network. Each dimension is represented as a random variable and the probabilities over the states of
each variable (e.g., low, medium, high understanding) are interpreted
in terms of the speaker's degree of belief in the addressee being in a
specific state. This is a graded form of common ground (Brown-Schmidt 2012) and presupposition (e.g., this knowledge is most likely
in the common ground; see variables GR and GR′ in Fig. 2), which
can be accommodated by, e.g., interactively leaving information out
or adding redundant information, or by making information pragmatically implicit or explicit.
Finally, since the model is based on a dynamic Bayesian network,
the interpretation of incoming feedback signals from the addressee is
influenced by the current belief state, and changes of the attributed
listener state are tracked over time. Representing these dynamics
provides speakers with a broader basis for production choices as well
as enabling strategic placement of feedback elicitation cues based on
informational needs. It also allows for a prediction of the addressee's
likely future mental state, thus enabling anticipatory adaptation of
upcoming utterances.
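To illustrate the kind of computation involved, here is a minimal Python sketch of one predict-and-update step for a single attributed-state dimension (understanding); all numerical values are invented for illustration and are not the model's parameters.

import numpy as np

STATES = ["low", "medium", "high"]

# P(U_t | U_{t-1}): beliefs drift between time slices (rows sum to 1)
TRANSITION = np.array([[0.80, 0.15, 0.05],
                       [0.10, 0.80, 0.10],
                       [0.05, 0.15, 0.80]])

# P(feedback | U_t) for two illustrative feedback signals
LIKELIHOOD = {"uh-huh": np.array([0.1, 0.3, 0.6]),    # positive feedback
              "huh?":   np.array([0.7, 0.25, 0.05])}  # signal of non-understanding

def update_belief(belief, feedback):
    """Predict with the transition model, then condition on the feedback."""
    predicted = belief @ TRANSITION
    posterior = predicted * LIKELIHOOD[feedback]
    return posterior / posterior.sum()

belief = np.array([1 / 3, 1 / 3, 1 / 3])   # uninformed prior
belief = update_belief(belief, "uh-huh")   # degree of belief shifts towards "high"
print(dict(zip(STATES, belief.round(3))))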
In current work, the model of dynamic minimal mentalizing is
being applied and evaluated in a virtual conversational agent that is
able to interpret its user's communicative feedback and adapt its
own language accordingly (Buschmeier and Kopp 2011, 2014).
Fig. 2 The dynamic Bayesian network model for dynamic minimal mentalizing. The network consists of the mental state variables for contact (C), perception (P), understanding (U), acceptance (AC), agreement (AG), and groundedness (GR) attributed to the listener

Acknowledgments
This research is supported by the Deutsche Forschungsgemeinschaft
(DFG) through the Center of Excellence EXC 277 Cognitive Interaction Technology.
References
Allwood J, Nivre J, Ahlsén E (1992) On the semantics and pragmatics of linguistic feedback. J Semant 9:1–26. doi:10.1093/jos/9.1.1
Brennan SE, Clark HH (1996) Conceptual pacts and lexical choice in conversation. J Exp Psychol Learn Mem Cogn 22:1482–1493. doi:10.1037/0278-7393.22.6.1482
Brown-Schmidt S (2012) Beyond common and privileged: gradient representations of common ground in real-time language use. Lang Cogn Process 27:62–89. doi:10.1080/01690965.2010.543363
Buschmeier H, Kopp S (2011) Towards conversational agents that attend to and adapt to communicative user feedback. In: Proceedings of the 11th international conference on intelligent virtual agents, Reykjavík, Iceland, pp 169–182. doi:10.1007/978-3-642-23974-8_19
Buschmeier H, Kopp S (2012) Using a Bayesian model of the listener to unveil the dialogue information state. In: SemDial 2012: proceedings of the 16th workshop on the semantics and pragmatics of dialogue, Paris, France, pp 12–20
Buschmeier H, Kopp S (2014) When to elicit feedback in dialogue: towards a model based on the information needs of speakers. In: Proceedings of the 14th international conference on intelligent virtual agents, Boston, MA, USA, pp 71–80
Clark HH (1996) Using language. Cambridge University Press, Cambridge. doi:10.1017/CBO9780511620539
Clark HH, Marshall CR (1981) Definite reference and mutual knowledge. In: Joshi AK, Webber BL, Sag IA (eds) Elements of discourse understanding. Cambridge University Press, Cambridge, pp 10–63
Galati A, Brennan SE (2010) Attenuating information in spoken communication: for the speaker, or for the addressee? J Mem Lang 62:35–51. doi:10.1016/j.jml.2009.09.002
Ginzburg J (2012) The interactive stance. Oxford University Press, Oxford
Pickering MJ, Garrod S (2004) Toward a mechanistic psychology of dialogue. Behav Brain Sci 27:169–226. doi:10.1017/S0140525X04000056

Actions revealing cooperation: predicting cooperativeness in social dilemmas from the observation of everyday actions
Dong-Seon Chang, Heinrich H. Bülthoff, Stephan de la Rosa
Max Planck Institute for Biological Cybernetics, Dept. of Human
Perception, Cognition and Action, Tübingen, Germany


Introduction
Human actions contain an extensive array of socially relevant information. Previous studies have shown that even brief exposure to
visually-observed human actions can lead to accurate predictions of
goals or intentions accompanying human actions. For example, motion
kinematics can enable predicting the success of a basketball shot, or
whether a hand movement is carried out with cooperative or competitive intentions. It has also been reported that gestures accompanying a
conversation can serve as a rich source of information for judging
the trustworthiness of another person. Based on
these previous findings we wondered whether humans could actually
predict the cooperativeness of another individual by identifying visible
social cues. Would it be possible to predict the cooperativeness of a
person by just observing everyday actions such as walking or running?
We hypothesized that even brief excerpts of human actions depicted and
presented as biological motion cues (i.e. point-light-figures) would
provide sufficient information to predict cooperativeness. Using
motion-capture techniques and a game-theoretical interaction setup we
explored whether prediction of cooperation was possible merely by
observing biological motion cues of everyday actions, and which
actions enabled these predictions.
Methods
We recorded six different human actions (walking, running, greeting,
table tennis playing, choreographed dancing (Macarena) and spontaneous dancing) in normal participants using an inertia-based motion
capture system. We used motion capture technology (MVN Motion
Capture Suit from XSense, Netherlands) to record all actions. A total
number of 12 participants (6 male, 6 female) participated in motion
recording. All actions were then post-processed to short movies (ca.
5 s) showing point light stimuli. These actions were then evaluated by
24 other participants in terms of personality traits such as cooperativeness and trustworthiness, on a Likert scale ranging from 1 to 7. The
original participants who provided the recorded actions then returned a
few months later to be tested for their actual cooperativeness performance. They were given standard social dilemmas used in game
theory such as the give some game, stag hunt game, and public goods
game. In those interaction games, they were asked to exchange or give
tokens to another player, and depending on their choices they were
able to win or lose an additional amount of money. The choice of
behavior for each participant was then recorded and coded for cooperativeness. This cooperativeness performance was then compared
with the perceived cooperativeness based on the different ratings of
their actions performed and evaluated by other participants.
Results and Discussion
Preliminary results showed a significant correlation between cooperativeness ratings and actual cooperativeness performance. The
actions showing a consistent correlation were Walking, Running and
Choreographed Dancing (Macarena). No significant correlation was
observed for actions such as Greeting, Table tennis playing or
Spontaneous Dancing. A similar tendency was consistently observed
across all actions, although no significant correlations were found for
all social dilemmas. The ratings of different actors and actions were
highly consistent across different raters and high inter-rater-reliability
was achieved. It seems possible that natural and constrained actions
carry more social cues enabling prediction of cooperation than actions
showing more variance across different participants. Further studies
with higher numbers of actors and raters are planned to confirm
whether accurate prediction of cooperation is really possible.

The use of creative analogies in a complex problem situation
Melanie Damaskinos1, Alexander Lutsevich1, Dietrich Dörner1,
Ute Schmid1, C. Dominik Güss1,2
1 Otto-Friedrich-Universität Bamberg, Germany; 2 University of North Florida, USA
Keywords
Analogy, creativity, dynamic decision making, complex problem
solving, strategies
Analogical reasoning is one key element of creative thinking and
one of the key domain-general human cognitive mechanisms (Keane 1988). A person takes the structure and elements of one
domain and tries to apply them to the new problematic domain.
Experimental research has shown the transfer of knowledge from one
domain to another (e.g., Wiese et al. 2008).
Analogical reasoning has been studied often in classrooms and
related to mathematical problems, but the use of analogies in complex
and uncertain domains has been studied rarely in the laboratory. Yet,
the study of creative analogy use in complex problem solving would be

123

Cogn Process (2014) 15 (Suppl 1):S1S158


highly relevant considering the demands of most real-life problems.
The goal of the current study is to examine how helpful or hindering
certain analogies can be for solving a complex and dynamic problem
such as improving the living conditions of a fictional tribe in the MORO
simulation (Dörner 1996). We expect an analogy story that highlights a
dynamic system (blood sugar) to prime participants and facilitate problem
solving more than an analogy story that highlights linear processing
(visual perception) or no analogy story at all (control). The facilitating
analogy story should make participants more sensitive to the interconnectedness of the system variables in the complex problem and
therefore lead to more reflection time at the beginning of the
simulation, more in-depth information collection, and fewer actions.
Method
Participants were 29 psychology students from Otto-Friedrich University Bamberg, Germany. (More data will be collected.)
We used three different analogy stories (facilitating systems analogy
story: blood sugar; distracting linear analogy story: visual perception; control: no story). Participants received either the blood-sugar
story, the visual perception story, or no story prior to working with the
MORO simulation. The stories were 1.5 pages long, including two
figures each. The blood-sugar story described the changes in blood
sugar dependent on food intake. It also showed the long-term consequences of high sugar consumption. It showed that the body is a
dynamic system. The visual perception story described the linear
process of perception from the stimulus to its processing in the cortex. The
blood-sugar story was expected to prime systemic thinking, considering side and
long-term effects of actions and the balance of the system. The visual-perception story was expected to prime linear, one-dimensional
thinking. The control group did not receive any story and was not
primed.
MORO is a computer simulation of a tribe of semi-nomads in the
Sahel zone (Lutsevich, Dörner 2013). MORO is especially suited to studying
complex problem solving due to the huge number of variables
involved and the demand to come up with novel solutions and to
coordinate the decisions. Participants take the role of developmental
aid assistants and try to help improve the living conditions of the
MORO tribe. Participants sit in front of a computer screen and can
select information and make decisions using the mouse. A file documenting all of the participants' decisions is automatically saved to
the hard drive. For the current study we focused only on the first
12 min of played time, because we postulated that especially the
initial time would be influenced mostly by the analogy story presented. Later on, the demands of the problem situation become more
influential.
A demographic questionnaire was administered to control for
potential confounding variables, assessing age, sex, major, and student status.
Results
We are still continuing data collection, but preliminary results
refer to three dependent variables: A (number of actions), IS
(number of information searches), and RT (reflection time periods
greater than 15 s in which no action and no information search took
place). These variables were assessed for the first 12 min, in intervals of
4 min each, from the participants' MORO log files and then combined.
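As an illustration of how these measures could be derived from such a log, consider the following hedged Python sketch; the event format (timestamped entries with a type field) is our assumption, since the abstract does not describe the file structure.

def score_interval(events, start, end, reflection_threshold=15.0):
    """Count actions (A), information searches (IS), and reflection periods
    (RT) longer than `reflection_threshold` seconds within [start, end);
    all times in seconds."""
    window = [e for e in events if start <= e["time"] < end]
    actions = sum(1 for e in window if e["type"] == "action")
    searches = sum(1 for e in window if e["type"] == "info_search")
    # Reflection periods: gaps between consecutive events (including the
    # interval borders) in which neither an action nor a search occurred
    times = [start] + sorted(e["time"] for e in window) + [end]
    reflections = sum(1 for t0, t1 in zip(times, times[1:])
                      if t1 - t0 > reflection_threshold)
    return actions, searches, reflections

# First 12 min in three 4-min intervals, then combined:
# totals = [sum(v) for v in zip(*(score_interval(log, s, s + 240)
#                                 for s in (0, 240, 480)))]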
Results did not confirm our expectations. Initial data analysis
showed that, in the first 12 min, the group primed with the systems
blood-sugar analogy story did not show more reflection
periods or more information searches than the two other groups.
For the first 12 min, participants in the systems analogy story condition
followed a more balanced strategy compared to the two other
groups. The control group followed blind actionism, engaging in
most actions and information searches, but fewest reflection times.
The group primed with the linear analogy spent most time reflecting,
but made the fewest actions and information searches. The means for
actions, information searches, and reflection times of the systems
analogy group were between the means of the linear prime group and

Cogn Process (2014) 15 (Suppl 1):S1S158

S35

control group (see Fig. 1). Mean differences among the three groups
were not significant for Actions, F(2, 26) = 1.81, p = .18, but were
significant for Information Searches, F(2, 26) = 5.63, p = .009, and for Reflection
Times, F(2, 26) = 4.81, p = .02.
An alternative explanation for the strategic differences among the
three groups could be individual difference variables. We assessed the
need to reduce cognitive uncertainty, final high school leaving
examination grade, age, and gender. None of the four variables correlated significantly with either actions, or information searches, or
reflection time periods (see Table 1). Yet the three decision-making
measures correlated significantly with each other. Obviously, the
more time was spent on reflection, the fewer actions and the fewer
information searches took place; conversely, the more actions and the more
information searches occurred, the fewer reflection periods took place.
Conclusion
Creativity has been rarely studied in relation to complex microworlds.
Thus, a process-analysis of creative analogical reasoning in a complex, uncertain, and dynamic microworld is a novel research topic and
other researchers expressed the need to experimentally assess creativity in a complex and novel problem situation and to focus on idea
evaluation and implementation (Funke 2000).
Further data analysis will also include the correlation of strategy
and performance in MORO. Preliminary results of the current study
showed that the presented analogy stories primed decision making and
problem solving but not in the expected direction. Participants primed
with the systems story followed a balanced approach where number of
actions, information searches, and reflection times were similarly
frequent. Participants primed with the linear story spent the most time
reflecting, perhaps because they were
primed that a decision leads to a linear consequence. Participants who
did not receive any story showed most actions and fewest reflection
times. It is possible that receiving no story provided no helpful cues and led to
the most uncertainty and to actionism (see Dörner 1996). These findings
could have implications for training programs and education which
focus on teaching children, students, and experts to be sensitive to the
characteristics of complex, uncertain, and dynamic problems.

Fig. 1 Means of actions, information searches, and reflection time
periods of 15 s or longer for the first 12 min of participants working
on the MORO simulation (** marks significant group differences for
information searches and reflection times)
Table 1 Correlations of individual difference variables and behavioral complex-problem solving measures

                       Cognitive     Final high     Age     Gender   Actions    Information
                       uncertainty   school grade                               searches
Actions                -.05          -.11           -.20    -.14
Information searches   -.24           .17           -.18     .09      .27
Reflection times        .24           .04            .30     .07     -.70***    -.53**

*** p < .001; ** p < .005; * p < .05

Acknowledgments
This research was supported through a Marie-Curie IIF Fellowship to
the last author.
References
Dörner D (1996) The logic of failure. Metropolitan Books, New York
Funke J (2000) Psychologie der Kreativität [Psychology of creativity]. In: Holm-Hadulla RM (ed) Kreativität. Springer, Heidelberg, pp 283–300
Keane MT (1988) Analogical problem solving. Ellis Horwood, Chichester
Lutsevich A, Dörner D (2013) MORO 2 (completely revised new version). Program documentation. Otto-Friedrich-Universität Bamberg
Wiese E, Konerding U, Schmid U (2008) Mapping and inference in analogical problem solving – as much as needed or as much as possible? In: Love BC, McRae K, Sloutsky VM (eds) Proceedings of the 30th annual conference of the cognitive science society. Lawrence Erlbaum, Mahwah, pp 927–932

Yes, that's right? Processing yes and no and attention to the right vs. left
Irmgard de la Vega, Carolin Dudschig, Barbara Kaup
University of Tübingen, Germany
Recent studies suggest that positive valence is associated with the
dominant hand's side of the body and negative valence with the non-dominant hand's side of the body (Casasanto 2009). This association
is also reflected in response times, with right- and left-handers
responding faster with their dominant hand to positive stimuli (e.g.,
love), and with their non-dominant hand to negative stimuli (e.g.,
hate; de la Vega et al. 2012; see also de la Vega et al. 2013). Interestingly, a similar finding emerges for yes- and no-responses: right-handed participants respond faster with their dominant hand to yes,
and with their non-dominant hand to no (de la Vega et al. in prep).
The present study tested whether the association between yes/no
and (non-)dominant hand is reflected in a visual attention shift.
Spatial attention has been shown to be influenced by various categories. For example, the association between numbers and horizontal
space (SNARC effect; Dehaene et al. 1993) is also reflected in visual
attention: in a target detection task, participants responded faster to a
target presented on the left after a low digit, and to a target on the
right after a high digit (Fischer et al. 2003; see also Dudschig et al.
2012).
We adapted the target detection task from Fischer et al. (2003) to
investigate visuospatial attention shifts after yes or no. In line with the
results obtained by Fischer et al. (2003), we expected faster detections
of a target located on the right after yes, and of a target on the left
after no. Twenty-two volunteers (1 male; mean age = 23.0 years, SD = 5.3)
participated in the study. The word yes (in German: Ja) or no (in
German: Nein) appeared centrally on the computer screen for 300 ms,
followed by a target on the right or on the left. Participants' task was
to press a key as soon as they had detected the target in the left or
right box. Responses under 100 ms were excluded from analysis
(1.1 %). The remaining RTs were submitted to a 2 (word: yes vs. no)
x 2 (target location: left vs. right) ANOVA. Visuospatial attention
was influenced by the words yes or no, as indicated by an interaction
between word and target location. However, contrary to our
hypothesis, an interference effect emerged (see Fig. 1): target detection
occurred faster on the left after yes, and faster on the right after no,
F(1,21) = 6.80, p = .016.
Fig. 1 Mean response times in the target detection task. Error bars represent confidence intervals (95 %) for within-subject designs and were computed as recommended by Masson and Loftus (2003)
One explanation for this unexpected pattern might be inhibition of
return (see Posner, Cohen 1984). Upon perceiving the word yes or no,
attention might move immediately to the right or to the left, but after
it is withdrawn, participants might be slower to detect a stimulus
displayed in this location. Using variable delays between word and
target presentation should clarify this issue. Another possibility is that
the observed pattern does not result from an association between yes/
no and right/left stemming from handedness, but rather corresponds to
the order in which the words yes and no are usually encountered.
When used together in a phrase, yes is usually used before no (e.g.,
"What's your answer, yes or no?"); as a result, in left-to-right
writing cultures, yes might become associated with the left side, and no
with the right side. We are planning to investigate this possibility, as
well as the question under which conditions an association between
yes and the left side vs. yes and the right hand becomes activated, in
future studies.
References
Casasanto D (2009) Embodiment of abstract concepts: good and bad in right- and left-handers. J Exp Psychol Gen 138:351–367
Dehaene S, Bossini S, Giraux P (1993) The mental representation of parity and number magnitude. J Exp Psychol Gen 122:371–396
de la Vega I, De Filippis M, Lachmair M, Dudschig C, Kaup B (2012) Emotional valence and physical space: limits of interaction. J Exp Psychol Hum Percept Perform 38:375–385
de la Vega I, Dudschig C, De Filippis M, Lachmair M, Kaup B (2013) Keep your hands crossed: the valence-by-left/right interaction is related to hand, not side, in an incongruent hand-response key assignment. Acta Psychol 142:273–277
de la Vega I, Dudschig C, Kaup B (in prep) Faster responses to yes with the dominant hand and to no with the non-dominant hand: a compatibility effect
Dudschig C, Lachmair M, de la Vega I, De Filippis M, Kaup B (2012) From top to bottom: spatial shifts of attention caused by linguistic stimuli. Cogn Process 13:S151–S154
Fischer MH, Castel AD, Dodd MD, Pratt J (2003) Perceiving numbers causes spatial shifts of attention. Nat Neurosci 6:555–556
Posner MI, Cohen Y (1984) Components of visual orienting. In: Bouma H, Bouwhuis D (eds) Attention and performance, vol X. Erlbaum, pp 531–556

Perception of background color in head mounted displays: applying the source monitoring paradigm
Nele M. Fischer, Robert R. Brauer, Michael Unger
University of Applied Sciences Leipzig, Germany
Monocular look-around Head Mounted Displays (HMDs), for instance
the Smart Glasses Vuzix M100, are wearable devices that enrich visual
perception with additional information by placing a small monitor
(e.g., LCD) in front of one eye. While having access to various kinds of
information, users can engage in other tasks, such as reading assembly
instructions on the HMD while performing a manual assembly task. To
reduce the distraction from the main task, the information should be
presented in a way that is perceived as comfortable and requires as little effort as
possible. It is likely that display polarity has an impact on information
perception since positive polarity (i.e. black font on white background)
is widely recognized for better text readability. However, in specific
viewing conditions the bright background illumination of a positive
polarity was found to reduce word recognition and induce discomfort
compared to negative polarity (white font on black background) (Tai,
Yan, Larson, Sheedy 2013). Since perception with HMDs might differ to
some extent from stationary displays (e.g., Naceri, Chellali, Dionnet,
Toma 2010) and color has an impact on information perception (e.g.,
Dzulkifli, Mustafar 2013), we investigated the impact of polarity on
perception in a monocular look-around HMD. If one type of polarity
(positive or negative) is less distracting from the presented content, we
would expect enhanced recognition due to a deeper processing of the
material (Craik, Lockhart 1972). Meanwhile, the memory of the
polarity itself should decrease when it is less distracting (source
monitoring: Johnson, Hashtroudi, Lindsay 1993). Furthermore, subjective preference ratings should match the less distracting polarity
(Tai et al. 2013).
To test this, we conducted a recognition test within the source
monitoring paradigm (Johnson et al. 1993) and asked participants for
their polarity preference. In our experimental setting, 32 single-item
words were presented in sequence with either positive or negative
polarity on the LCD screen of the monocular look-around HMD
Vuzix M100. Directly afterwards participants rated their preferred
polarity. Following a short distraction task, the recognition and source
memory test was conducted. All previously presented (old) words
were mixed with the same amount of new distracter words. For each
item, participants decided whether the item had been previously presented
or was new and, if they judged it old, they had to determine the item's polarity
(positive or negative).
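For illustration, here is a hedged Python sketch of how such a recognition and source memory test could be scored; the data layout and measure names are our assumptions.

def score_source_monitoring(trials):
    """trials: list of dicts with keys 'is_old', 'polarity'
    ('positive'/'negative', or None for new items), 'resp_old',
    and 'resp_polarity'."""
    stats = {}
    for pol in ("positive", "negative"):
        old = [t for t in trials if t["is_old"] and t["polarity"] == pol]
        hits = [t for t in old if t["resp_old"]]
        recognition = len(hits) / len(old) if old else float("nan")
        # Source identification, conditional on correct recognition
        source = (sum(t["resp_polarity"] == pol for t in hits) / len(hits)
                  if hits else float("nan"))
        stats[pol] = {"recognition": recognition, "source_id": source}
    new = [t for t in trials if not t["is_old"]]
    stats["false_alarms"] = (sum(t["resp_old"] for t in new) / len(new)
                             if new else float("nan"))
    return stats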
The results of our study on polarity for the monocular look-around
display Vuzix M100 indicated that negative polarity increased word
recognition and was preferred by participants. Contrary to our assumptions,
the recognition of negative polarity (source monitoring) increased as
well, which might be an effect of the higher recognition rate for items
having negative polarity. These results do not only support a design
decision, they also link the subjective preference ratings of
participants with data from memory research. Thus, preference ratings

Cogn Process (2014) 15 (Suppl 1):S1S158


appear to be a good indicator for issues of user perception. Based on
these results, we recommend the use of negative polarity to display short
text information, e.g. assembly instructions, in monocular look-around
HMDs with a near-to-eye LCD display (e.g., approximately 4 cm distance to the eye in the Vuzix M100), since it appears to be less distracting
and more comfortable than positive polarity. Due to the small sample
size, further examination is needed on this topic.
References
Craik FIM, Lockhart RS (1972) Levels of processing: a framework for memory research. J Verbal Learn Verbal Behav 11:671–684
Dzulkifli MA, Mustafar MF (2013) The influence of colour on memory performance: a review. Malaysian J Med Sci 20:3–9
Johnson MK, Hashtroudi S, Lindsay DS (1993) Source monitoring. Psychol Bull 114:3–28
Naceri A, Chellali R, Dionnet F, Toma S (2010) Depth perception within virtual environments: comparison between two display technologies. Int J Adv Intell Syst 3:51–64
Tai YC, Yan SN, Larson K, Sheedy J (2013) Interaction of ambient lighting and LCD display polarity on text processing and viewing comfort. J Vis 13(9), article 1157

Continuous goal dynamics: insights from mouse-tracking and computational modeling
Simon Frisch, Maja Dshemuchadse, Thomas Goschke,
Stefan Scherbaum
Technische Universität Dresden, Germany
Goal-directedness is a core feature of human behavior. Therefore, it is
mandatory to understand how goals are represented in the cognitive
system and how these representations shape our actions. Here, we will
focus on the time-dependence of goal-representations (Scherbaum,
Dshemuchadse, Ruge, Goschke 2012). This feature of goal-representations is highlighted by numerous task-switching studies which
demonstrate that setting a new goal is associated with behavioral costs
(Monsell 2003; Vandierendonck, Liefooghe, Verbruggen 2010).
Moreover, participants have difficulties ignoring previously relevant
goals (perseveration, PS) or attending to previously irrelevant goals
(learned irrelevance, LI; cf. Dreisbach, Goschke 2004). Thus, goals
are not switched on or off instantaneously but take time to build
up and decay. This is also assumed by connectionist models of task
switching (e.g. Gilbert, Shallice 2002), where goal units need time to
shift between different activation patterns.
While both empirical evidence and theory underline the dynamic
nature of goals, models and empirical findings have mostly been
linked by comparing modelled and behavioral outcomes (e.g.
response times). However, these discrete values provide only loose
constraints for theorizing about the processes underlying these measures. Here, we aim towards a deeper understanding of continuous
goal dynamics by comparing the continuous performance of a
dynamic neural field (DNF) model with a continuous measure of
goal-switching performance, namely mouse movements. Originally,
the two phenomena of PS and LI were studied by Dreisbach and
Goschke (2004) in a set switching task: participants categorized a
number presented in a cued color (target) while ignoring a number in
another color (distracter). After several repetitions, the cue indicated
to attend to a new color. Two kinds of switches occurred: In the PS-condition, the target was presented in a new color while distracters
were presented in the previous target color (e.g. red). In the LI-condition, the target was presented in the previous distracter color while
distracters were presented in a new color (e.g. green). While the
results indicated typical switch patterns in response times for both
conditions, the processes underlying the observed switch costs
remained unclear. For example, Dreisbach and Goschke (2004) could
only speculate whether the LI-effect was driven by difficulties to
activate a goal that had been ignored beforehand or by a novelty boost
that draws attention towards the distracting color.
Addressing these open questions, we created a DNF-model of the
task. Instead of including additional mechanisms to incorporate processes like attentional capture or goal-specific inhibition, we built the
most parsimonious model that relies exclusively on continuously
changing levels of goal activation. In this respect, DNFs are suited
exceptionally well to model dynamic goal-directed behavior as they
embrace cognition as a deeply continuous phenomenon that is tightly
coupled to our sensorimotor systems (Sandamirskaya, Zibner, Schneegans, Schöner 2013). Our model consists of three layers, similar
to previous models of goal-driven behavior and goal-switching (cf.
Gilbert, Shallice 2002; Scherbaum et al. 2012). A goal layer represents
the cued target color by forming a peak of activation at a specific site.
When activation reaches a threshold, it feeds into an associations-layer
representing colors and magnitudes of the current stimuli. The
emerging pattern of activation is then projected into a response layer,
resulting in a tendency to move to the left or right. Notably, as is
typical for DNF-models, all layers are continuous in representational
space. This allowed us to study the model's behavior continuously
over time instead of obtaining discrete threshold responses. Crucially,
the inert activation dynamics inherent to DNFs provide a simple
mechanism for the time-consuming processes of goal-setting and shifting observed in behavioral data.
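To make the layer dynamics concrete, the following minimal one-dimensional dynamic neural field sketch (Amari-type dynamics in Python) shows how a localized input drives an activation peak that builds up and relaxes with the field's time constant; all parameter values are illustrative assumptions of ours, not those of the reported three-layer model:

import numpy as np

n = 101                      # number of field sites (e.g., a color dimension)
tau, h = 10.0, -2.0          # time constant and negative resting level

def sigmoid(u, beta=4.0):
    return 1.0 / (1.0 + np.exp(-beta * u))

# Interaction kernel: local excitation with surround/global inhibition
x = np.arange(n)
d = np.minimum(np.abs(x[:, None] - x[None, :]), n - np.abs(x[:, None] - x[None, :]))
w = 2.0 * np.exp(-d**2 / (2 * 4.0**2)) - 0.5

u = np.full(n, h)            # field activation starts at resting level
stimulus = np.zeros(n)
stimulus[30] = 3.0           # localized input, e.g., the cued target color

for step in range(300):      # Euler integration of tau * du/dt = -u + h + S + conv
    du = -u + h + stimulus + (w @ sigmoid(u)) / n
    u += du / tau

print("peak site:", int(u.argmax()), "peak activation:", round(float(u.max()), 2))
# A peak forms at the stimulated site and, once above an output threshold,
# could feed the next layer; removing the stimulus lets activation relax
# back at the rate set by tau -- the inert dynamics discussed above.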
A simulation study of the original paradigm indicated similar costs
in response times for PS- and LI-switches as observed by Dreisbach
and Goschke (2004). However, continuous response trajectories
provided differential patterns for PS- and LI-trials: PS-switches
yielded response trajectories that were deflected towards the previously relevant information, while LI-switches yielded a tendency to
keep the response neutral for a longer time before deciding for one
alternative. We validated these predictions in a set-switching experiment that was similar to the one conducted by Dreisbach and
Goschke (2004). However, instead of responding with left or right key
presses, participants moved a computer mouse into the upper left or
right corner of the screen. As expected, goal switches induced switch
costs in response times. More intriguingly, mouse movements replicated the model's dynamic predictions: PS-switches yielded
movements strongly deflected to the alternative response, whereas LI-switches yielded indifferent movements for a longer time than in
repetition trials.
In summary, our DNF-model and mouse-tracking data suggest that
continuously changing levels of goal activation constitute the core
mechanism underlying goal-setting and shifting. Therefore, we
advocate the combination of continuous modelling with continuous
behavioral measures, as this approach offers new and deeper insights
into the dynamics of goals and goal-directed action.
References
Dreisbach G, Goschke T (2004) How positive affect modulates cognitive control: reduced perseveration at the cost of increased distractibility. J Exp Psychol Learn Mem Cogn 30(2):343–353. doi:10.1037/0278-7393.30.2.343
Gilbert SJ, Shallice T (2002) Task switching: a PDP model. Cogn Psychol 44(3):297–337. doi:10.1006/cogp.2001.0770
Monsell S (2003) Task switching. Trends Cogn Sci 7(3):134–140. doi:10.1016/S1364-6613(03)00028-7
Sandamirskaya Y, Zibner SKU, Schneegans S, Schöner G (2013) Using dynamic field theory to extend the embodiment stance toward higher cognition. New Ideas Psychol 31(3):322–339. doi:10.1016/j.newideapsych.2013.01.002
Scherbaum S, Dshemuchadse M, Ruge H, Goschke T (2012) Dynamic goal states: adjusting cognitive control without conflict monitoring. NeuroImage 63(1):126–136. doi:10.1016/j.neuroimage.2012.06.021
Vandierendonck A, Liefooghe B, Verbruggen F (2010) Task switching: interplay of reconfiguration and interference control. Psychol Bull 136(4):601–626. doi:10.1037/a0019791

Looming auditory warnings initiate earlier event-related potentials in a manual steering task
Christiane Glatz, Heinrich H. Bülthoff, Lewis L. Chuang
Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Automated collision avoidance systems promise to reduce accidents
and relieve the driver from the demands of constant vigilance. Such
systems direct the operator's attention to potentially critical regions of
the environment without compromising steering performance. This
raises the question: What is an effective warning cue?
Sounds with rising intensities are claimed to be especially salient.
By evoking the percept of an approaching object, they engage a
neural network that supports auditory space perception and attention
(Bach et al. 2008). Indeed, we are aroused by and faster to respond to
looming auditory tones, which increase heart rate and skin conductance activity (Bach et al. 2009).
Looming sounds can differ in terms of their rising intensity profiles. While a looming sound can be approximated by one whose amplitude
increases linearly with time, an approaching object that emits a
constant tone is better described by an amplitude that increases
exponentially with time. In a driving simulator study, warning cues
that had a veridical looming profile induced earlier braking responses
than ramped profiles with linearly increasing loudness (Gray 2011).
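The distinction between the two rising profiles can be sketched in a few lines of Python. The snippet below uses the cue specification reported further down in this abstract (2,000 Hz carrier, 1,500 ms duration, 60 to approximately 75 dB, a 65 dB constant comparison tone); the sampling rate and the mapping from dB to relative amplitude are illustrative assumptions:

import numpy as np

fs, dur, f0 = 44100, 1.5, 2000.0
t = np.linspace(0.0, dur, int(fs * dur), endpoint=False)

# dB SPL -> relative amplitude (arbitrary reference)
a_start, a_end = 10 ** (60 / 20.0), 10 ** (75 / 20.0)

ramped_env = a_start + (a_end - a_start) * (t / dur)    # linear amplitude growth
looming_env = a_start * (a_end / a_start) ** (t / dur)  # exponential amplitude growth
constant_env = np.full_like(t, 10 ** (65 / 20.0))       # fixed-intensity comparison

def tone(envelope):
    wave = envelope * np.sin(2 * np.pi * f0 * t)
    return wave / np.abs(wave).max()                    # normalize for playback

cues = {"constant": tone(constant_env),
        "ramped": tone(ramped_env),
        "looming": tone(looming_env)}

Both rising envelopes start and end at the same levels; they differ only in how the intensity is distributed over time, which is exactly the contrast between the ramped and the veridical looming cue.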
In the current work, we investigated how looming sounds might
serve, during a primary steering task, to alert participants to the
appearance of visual targets. Nine volunteers performed a primary
steering task whilst occasionally discriminating visual targets. Their
primary task was to minimize the vertical distance between an
erratically moving cursor and the horizontal mid-line, by steering a
joystick towards the latter. Occasionally, diagonally oriented Gabor
patches (10° tilt; 1° diameter; 3.1 cycles/deg; 70 ms duration) would
appear on either the left or right of the cursor. Participants were
instructed to respond with a button-press whenever a pre-defined
target appeared. Seventy percent of the time, these visual stimuli were
preceded by a 1,500 ms warning tone, 1,000 ms before they appeared.
Overall, warning cues resulted in significantly faster and more sensitive detections of the visual target stimuli (F1,8 = 7.72, p < 0.05;
F1,8 = 9.63, p < 0.05).
Each trial would present one of three possible warning cues. Thus,
a warning cue (2,000 Hz) could either have a constant intensity of
65 dB, be a ramped tone with linearly increasing intensity from 60 dB to
approximately 75 dB, or be a comparable looming tone with an exponentially increasing intensity profile. The different warning cues did
not vary in their influence on response times to the visual targets
and recognition sensitivity (F2,16 = 3.32, p = 0.06; F2,16 = 0.10,
p = 0.90). However, this might be due to our small sample size. It is
noteworthy that the different warning tones did not adversely affect
steering performance (F2,16 = 1.65, p < 0.22). Nonetheless, electroencephalographic potentials to the offset of the warning cues were
significantly earlier for the looming tone, compared to both the
constant and ramped tones. More specifically, the positive component
of the event-related potential occurred significantly earlier for the looming
tone, by about 200 ms relative to the constant and ramped tones, and
was sustained for a longer duration (see Fig. 1).
The current findings highlight the behavioral benefits of auditory
warning cues. More importantly, we find that a veridical looming tone
induces earlier event-related potentials than one with a linearly
increasing intensity. Future work will investigate how this benefit
might diminish with increasing time between the warning tone and
the cued event.

Fig. 1 The topographical plot shows the 500 ms after sound offset, with scalp maps plotted every 50 ms, for the constant (row 1), the ramped (row 2), and the looming tone (row 3). The looming cues evoked a strong positive deflection about 200 ms earlier than the other sounds. The black bar at the bottom of the figure indicates where the significance level of 0.01 was exceeded using a parametric test on the combined Fz, FCz, Cz, and Pz activity
References
Bach DR, Schächinger H, Neuhoff JG, Esposito F, Di Salle F, Lehmann C, Herdener M, Scheffler K, Seifritz E (2008) Rising sound intensity: an intrinsic warning cue activating the amygdala. Cerebral Cortex 18(1):145–150
Bach DR, Neuhoff JG, Perrig W, Seifritz E (2009) Looming sounds as warning signals: the function of motion cues. Int J Psychophysiol 74(1):28–33
Gray R (2011) Looming auditory collision warnings for driving. Human Factors 53(1):63–74

The creative process across cultures

Noemi Göltenboth1, C. Dominik Güss1,2, Ma. Teresa Tuason2
1 Otto-Friedrich-Universität Bamberg, Germany; 2 University of North Florida, USA
Keywords
Creativity, Culture, Artists, Cross-cultural comparison
Creativity is the driving force of innovation in societies across the
world, in many domains such as science, business, or art. Creativity
means coming up with new and useful ideas (e.g., Funke 2008). Past
research has focused on the individual, the creative process and its
product, and the role of the social environment when evaluating
creative products. According to previous research, individual difference variables such as intelligence and extraversion can partially
predict creativity (e.g., Batey and Furnham 2006). Researchers have
also shown the importance of the social environment when labeling
products as creative or not (e.g., Csikszentmihalyi 1988). Although
creativity could be influenced by and differ among cultures, the
influence of culture on creativity has rarely been studied.
Creativity and Culture
Culture can be defined as the knowledge base used to cope with the
world and each other, shared by a group of people and transmitted
from generation to generation (e.g., Guss et al. 2010). This knowledge
encompasses, for example, declarative world knowledge, values and
behaviors (e.g., norms, rituals, problem-solving strategies). Following
this definition, different cultures could value different aspects of
creativity (e.g., Lubart 1990).

The current study is based on two recommendations of creativity
researchers. First, it is important to study creativity across cultures as
Westwood and Low (2003, p 253) summarized: "Clearly personality
and cognitive factors impact creativity and account for individual
differences, but when it comes to differences across cultures the
picture is far from clear." Second, researchers recommend ethnographic or socio-historical analyses and case studies of creativity in
different countries to study emic conceptions and to study the interaction of societal, family and other factors in creativity (e.g.,
Simonton 1975). The current study addresses these recommendations
by investigating creativity across cultures focusing on experts from
Cuba, Germany, and Russia.
Method
Going beyond traditional student samples, we conducted semi-structured interviews with experts, i.e., 10 Cuban, 6 Russian, and 9
German artists. Informed consent was obtained. All of the artists have
received awards and fellowships for their creative work (i.e., compositions, books, poems, paintings). The interviews focused on a)
their personal history, b) the creative process, and c) the role of
culture during the creative process. These interviews lasted between
30 min and 1 h 43 min. They were transcribed verbatim and domains
and themes were derived from these interviews using consensual
qualitative research methodology (Hill et al. 2005). This means that at
least 3 raters independently read and coded each transcribed interview. Then the raters met and discussed the codings until they
obtained consensus.
Results
Several categories were mentioned by more than three quarters of all
25 participants. These categories refer to the following domains: 1)
How I became an artist, 2) What being an artist means to me, 3)
Creating as a cognitive process, 4) Creating as a motivational process,
and 5) The role of culture in creating.
Table 1 shows that German artists generally talk about financial
problems and the problem of selling their work, a topic rarely
mentioned by Cuban and Russian artists. Russian and German
artists generally recognize persistence and hard work in creativity,
and how a daily routine is helpful. A daily routine is rarely
mentioned by Cuban artists. All artists, regardless of culture,
recognize the universality of creativity, but acknowledge culture
specific expressions.
Discussion
The current study is innovative as it investigates cultural differences
among famous artists from Cuba, Russia, and Germany including
different groups of artists. The semi-structured interviews reveal a
wealth of different domains and categories related to creativity, and
highlight the need for a holistic, action-oriented, and system-oriented
approach when studying creativity. The findings also broaden a narrow cognitive view on creativity, highlighting the role of
motivational and socio-cultural factors during the creative process
(for the role of societal context in creativity see also Nouri et al.
2014).
Whereas most artists experience similar creative processes, we
also found themes highlighting the influence of the artists' cultural
background. Results are beneficial for further developing a comprehensive theory of the creative process taking cultural differences into
consideration and perhaps integrating them in computational creativity models (e.g., Colton and Wiggins 2012).
Acknowledgments
This research was supported through a Marie-Curie IIF Fellowship to
the second author and a Fellowship of the Studienstiftung des
deutschen Volkes to the first author. We would like to thank the artists
for participating and allowing us a glimpse into their world, so that we
may learn from their experiences.
References
Batey M, Furnham A (2006) Creativity, intelligence, and personality: a critical review of the scattered literature. Genet Soc Gen Psychol Monogr 132:355–429
Colton S, Wiggins GA (2012) Computational creativity: the final frontier? In: Proceedings of the 20th European conference on artificial intelligence (ECAI). Montpellier, France, pp 21–26
Csikszentmihalyi M (1988) Society, culture and person: a systems view of creativity. In: Sternberg RJ (ed) The nature of creativity: contemporary psychological perspectives. Cambridge University Press, New York, pp 325–339
Funke J (2008) Zur Psychologie der Kreativität. In: Dresler M, Baudson TG (eds) Kreativität. Beiträge aus den Natur- und Geisteswissenschaften [Creativity: contributions from natural sciences and humanities]. Hirzel, Stuttgart, pp 31–36
Güss CD, Tuason MT, Gerhard C (2010) Cross-national comparisons of complex problem-solving strategies in two microworlds. Cogn Sci 34:489–520
Hill CE, Knox S, Thompson BJ, Williams EN, Hess SA, Ladany N (2005) Consensual qualitative research. J Couns Psychol 52:196–205. doi:10.1037/0022-0167.52.2.196
Lubart TI (1990) Creativity and cross-cultural variation. Int J Psychol 25:39–59
Nouri R, Erez M, Lee C, Liang J, Bannister BD, Chiu W (2014) Social context: key to understanding culture's effects on creativity. J Organ Behav. doi:10.1002/job.1923

Simonton DK (1975) Sociocultural context of individual creativity: a trans-historical time-series analysis. J Pers Soc Psychol 32:1119–1133
Westwood R, Low DR (2003) The multicultural muse: culture, creativity and innovation. Int J Cross Cult Manag 3:235–259

Table 1 Some cultural differences in category frequencies

Category (Cuba / Russia / Germany):
Being an artist means being financially uncertain: Variant / Typical / General
Being an artist means to deal with the necessary evil of marketing and selling the work: Rare / Rare / Typical
Being creative is natural to human beings: Variant / Typical / Variant
Creativity is persistence and hard work: Variant / General / General
It helps me to have a daily regular routine: Rare / Variant / Typical
Creativity is universal, but culture provides specific expressions (forms and circumstances) for creativity: Typical / Variant / Typical

Frequency labels (with per-country counts, Cuba / Russia / Germany): General ~ >90 % (9–10 / 6 / 8–9); Typical ~ 50–89 % (5–8 / 4–5 / 5–7); Variant ~ 11–49 % (3–4 / 2–3 / 3–4); Rare ~ <10 % (1–2 / 1 / 1–2)

How do human interlocutors talk to virtual assistants? A speech act analysis of dialogues of cognitively impaired people and elderly people with a virtual assistant

Irina Grishkova1, Ramin Yaghoubzadeh2, Stefan Kopp2, Constanze Vorwerg1
1 University of Bern, Switzerland; 2 Bielefeld University, Germany
An artificial daily calendar assistant was developed to provide valuable support for people with special needs (Yaghoubzadeh et al.
2013). Users may interact differently when they communicate with an
artificial system. They normally tend to adapt their linguistic behavior
(Branigan et al. 2010), but different users may have different interaction styles (Wolters et al. 2009). In this study, we investigated how
people with cognitive impairments and elderly people talk to their
virtual assistant, focusing on pragmatic aspects: the speech acts performed and the linguistic means used to perform them.
A starting point of our analysis is the observation that the patterns
in which linguistic actions occur, and which provide socially shaped
potentials for achieving goals (Ehlich, Rehbein 1979), are not necessarily linear, but often manifest characteristic recursivity, decision
points, supportive accessory patterns, and omissions of pattern elements (Grießhaber 2001). In addition, the linguistic means used to
perform linguistic action units may vary considerably. We addressed
two questions: (1) What communication patterns between a human
and an artificial assistant occur in each of three groups of users
(elderly people, people with cognitive impairments, control group)
when making a request to enter an appointment? (2) What linguistic
forms are typically used by the three user groups for making those
requests? To answer these questions, we carried out a pragmatic
analysis of conversations between participants of these three groups
and the artificial assistant based on Searle's speech act theory (Searle
1969, 1976) and techniques of functional-pragmatic discourse
analysis (Grießhaber 2001).
Three user groups participated in the study: cognitively impaired
people (A), where all participants had light to medium mental retardation (approximately F70–F71 in the DSM classification [American
Psychiatric Association 2000]); elderly people (B); and a control
group (C) (Yaghoubzadeh et al. 2013). The participants were handed
cards with appointments and asked to plan the appointments for the
following week by speaking to the virtual assistant as if it were a human
being. The assistant was presented on a TV screen as being able
to understand the user and speak to him, using a Wizard-of-Oz
technique.
All interactions between the participants and the assistant were
recorded and transcribed. We split all dialogues into dialogue phases and
annotated the speech acts performed by both the human interlocutor and
the artificial assistant within a conversation. Each dialogue
phase was then split into minimal communication units, i.e., speech acts (Searle
1969), using a pattern-oriented description (Hindelang 1994). For each
speech act, we provided its definition in terms of illocutionary force and
rules for performance (Searle 1969), as well as the complete list of
linguistic forms used in the conversations.
We modeled the structures of the pertinent dialogue phases
(greeting, making an appointment, farewell) for each of the three
groups, as sequence patterns in the form of network structures (with
speech acts as nodes and possible reactions as linking arrows). The
smallest units in these structures were the speech acts determined by
the definitions provided. Based on this, sequences of speech acts
were analyzed. We also investigated the range and frequency of
reactions found in the dialogues to a particular speech act. The
relative frequencies of speech act sequences were determined for
greeting and farewell phases as well as for particular speech acts, such
as expressives and assertives, for each of the user groups. The
politeness of discourse was determined by the number of expressive
speech acts, and the complexity of speech in terms of the number of
assertive speech acts (used to specify a request or explain an
appointment) following a directive speech act.
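The sequence patterns described here can be represented as a weighted transition graph. The following minimal Python sketch, with invented speech act labels rather than the study's annotation scheme, illustrates how the relative frequency of each reaction to a given speech act can be derived from annotated dialogues:

from collections import Counter
from itertools import pairwise  # Python 3.10+

# Hypothetical annotated dialogues: each is a sequence of speech acts
dialogues = [
    ["greeting", "greeting", "directive_request", "assertive_specify",
     "confirmation", "expressive_thanks", "farewell"],
    ["greeting", "greeting", "directive_request", "confirmation", "farewell"],
]

# Nodes are speech acts; arrows are observed reactions, weighted by count
transitions = Counter(pair for d in dialogues for pair in pairwise(d))

outgoing = Counter()
for (src, _dst), count in transitions.items():
    outgoing[src] += count

# Relative frequency of each reaction to a given speech act
for (src, dst), count in sorted(transitions.items()):
    print(f"{src} -> {dst}: {count / outgoing[src]:.2f}")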
Results show that the elderly interlocutors exhibit a more complicated dialogue structure when communicating with an artificial
assistant. They use more assertive utterances, such as explanations,
repetitions, and specifications. Furthermore, we found that some of
the elderly speakers use a wider range of expressive speech acts, compared to the cognitively impaired people, demonstrating more politeness
towards the artificial assistant.
The analysis of linguistic means yielded a number of different
forms used when requesting the virtual assistant to enter an appointment in
the virtual calendar. The linguistic forms used in the dialogues were
classified as I-form, we-form, form of third person, or neutral form. The
most frequently used forms were the I-form and the neutral form. Participants
from group A use the neutral form twice as much as the I-form. In contrast,
participants from group C use the I-form twice as much as the neutral form. Participants
from group B also use the I-form most frequently, but in contrast to A or C, they
also use the we-form and the form of third person.
Altogether, the results show that there are no fundamental differences in dialogue patterns between the groups; however, there is a
larger heterogeneity in group A, and especially in group B, as
compared to group C. Group B also seems to display a
larger diversity in linguistic means.
References
American Psychiatric Association (2000) Diagnostic and statistical manual of mental disorders DSM-IV-TR, 4th edn. American Psychiatric Publ., Arlington, VA
Branigan HP, Pickering MJ, Pearson J, McLean JF (2010) Linguistic alignment between people and computers. J Pragmat 42:2355–2368
Ehlich K, Rehbein J (1979) Sprachliche Handlungsmuster. In: Soeffner HG (Hrsg.) Interpretative Verfahren in den Sozial- und Textwissenschaften. Metzler, Stuttgart, pp 243–274
Grießhaber W (2001) Verfahren und Tendenzen der funktional-pragmatischen Diskursanalyse. In: Iványi Z, Kertész A (Hrsg.) Gesprächsforschung. Tendenzen und Perspektiven. Peter Lang, Frankfurt am Main, pp 75–95
Hindelang G (1994) Sprechakttheoretische Dialoganalyse. In: Fritz G, Hundsnurscher F (Hrsg.) Handbuch der Dialoganalyse. Niemeyer, Tübingen, pp 95–112
Searle J (1969) Sprechakte. Ein sprachphilosophischer Essay. Übersetzt von R. und R. Wiggershaus. Suhrkamp Taschenbuch Wissenschaft, Frankfurt am Main
Searle J (1976) A classification of illocutionary acts. Lang Soc 5(1):1–23
Wolters M, Georgila K, Moore JD, MacPherson SE (2009) Being old doesn't mean acting old: how older users interact with spoken dialog systems. ACM Trans Accessible Comput 2(1):2
Yaghoubzadeh R, Kramer M, Pitsch K, Kopp S (2013) Virtual agents as daily assistants for elderly or cognitively impaired people. In: Intelligent virtual agents. Springer, Berlin, pp 79–91


Effects of aging on shifts of attention in perihand space


Marc Grosjean1, Nathalie Le Bigot2
1 Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany; 2 University of Bretagne Occidentale & CNRS (Lab-STICC UMR 6285), Brest, France
It is well established that visual processing is altered for stimuli that
appear near the hands, that is, in perihand space (for a recent review,
see Brockmole et al. 2013). For example, placing one's hands near a
display has been shown to increase visual sensitivity (Dufour and
Touzalin 2008), enhance attentional engagement, such as the ability
to detect changes in dynamic displays (Tseng and Bridgeman 2011),
but also to slow down attentional disengagement, as evidenced by
longer search times when trying to find a target stimulus in a cluttered
display (Abrams et al. 2008). A number of studies suggest that these
hand-proximity effects, as they are known, are modulated by the
functionality of the hands and that visual processing is altered at
locations where action is more likely to occur (e.g., Le Bigot and
Grosjean 2012; Reed et al. 2010).
Although it is well documented that cognitive processing generally becomes slower and less accurate over the lifespan (e.g.,
Verhaeghen and Salthouse 1997), hand-proximity effects have rarely
been studied with regard to aging. Of particular relevance for the
present study, sensorimotor abilities are also known to deteriorate
with age, especially for hand movements (Ranganathan et al. 2001).
These age-related changes presumably reduce the overall functionality of the hands, which in turn could influence how visual
processing changes in perihand space. To test this notion, we sought
to examine whether visual processing, in general, and shifts of
attention, in particular, are affected by hand proximity in the same
way for younger and older individuals. In a covert-orienting task
(Posner 1980), younger (mean age < 25 years) and older (mean
age > 65 years) right-handed adults were asked to discriminate
between a target (letter) and distractor stimulus that could appear at a
peripheral left or right location. The stimulus was preceded by an
uninformative peripheral cue (stimulus-onset asynchrony = 100 ms)
that was presented either at the upcoming stimulus location (valid
trial) or at the opposite location (invalid trial). Participants performed
the task under four hand-position configurations: left hand only, right hand only,
both hands, or no hands (control condition) near the display.
As expected, older adults were overall slower to respond than
younger adults, and both age groups showed a reliable cueing
effect: Responses were faster on valid than on invalid trials.
Interestingly, younger adults also revealed an interaction between
cue validity and hand position, which reflected that the cueing
effects were larger when their dominant hand was near the display.
The latter finding is in line with those of Lloyd et al. (2010), who
also observed that involuntary shifts of attention are affected by
hand proximity (for younger adults) and that this effect seems to be
limited to the right (dominant) hand. More generally, these findings
suggest that hand proximity affects visual processing in different
ways for younger and older adults. This may reflect how the
functionality of the hands and people's representation of peripersonal space change when cognitive and motor skills become
slower and less accurate over the lifespan. Consistent with this
notion, it has been shown that older individuals tend to have a
more compressed representation of peripersonal space (Ghafouri
and Lestienne 2000) than younger adults and tend to spatially
allocate their attention more around the trunk of their body than
around their hands (Bloesch et al. 2013).
Both age groups also showed evidence of a right hemi-field
advantage (i.e., faster responses to stimuli presented to the right than
to the left of fixation), which is most likely due to a left-hemisphere
(right-hemifield) advantage in processing linguistic stimuli (Geffen
et al. 1971). However, the latter effect was modulated by hand position
for older adults only. In particular, the advantage was larger when
their dominant hand was near the display. These results further suggest that visual processing is differentially affected by hand proximity
in younger and older adults. In contrast to younger adults, who
showed an effect of hand proximity on the involuntary shifting of
attention, hand position seems to affect only the attentional prioritization of space in older adults (Reed et al. 2006).
References
Abrams RA, Davoli CC, Du F et al (2008) Altered vision near the hands. Cognition 107:1035–1047
Bloesch EK, Davoli CC, Abrams RA (2013) Age-related changes in attentional reference frames for peripersonal space. Psychol Sci 24:557–561
Brockmole JR, Davoli CC, Abrams RA, Witt JK (2013) The world within reach: effects of hand posture and tool-use on visual cognition. Curr Dir Psychol Sci 22:38–44
Dufour A, Touzalin P (2008) Improved visual sensitivity in the perihand space. Exp Brain Res 190:91–98
Geffen G, Bradshaw JL, Wallace G (1971) Interhemispheric effects on reaction time to verbal and nonverbal visual stimuli. J Exp Psychol 87:415–422
Ghafouri M, Lestienne FG (2000) Altered representation of peripersonal space in the elderly human subject: a sensorimotor approach. Neurosci Lett 289:193–196
Le Bigot N, Grosjean M (2012) Effects of handedness on visual sensitivity in perihand space. PLoS ONE 7(8):e43150
Lloyd DM, Azañón E, Poliakoff E (2010) Right hand presence modulates shifts of exogenous visuospatial attention in near perihand space. Brain Cogn 73:102–109
Posner MI (1980) Orienting of attention. Q J Exp Psychol 32:3–25
Ranganathan VK, Siemionow V, Sahgal V, Yue GH (2001) Effects of aging on hand function. J Am Geriatr Soc 49:1478–1484
Reed CL, Betz R, Garza JP, Roberts RJ Jr (2010) Grab it! Biased attention in functional hand and tool space. Atten Percept Psychophys 72:236–245
Reed CL, Grubb JD, Steele C (2006) Hands up: attentional prioritization of space near the hand. J Exp Psychol Hum Percept Perform 32:166–177
Tseng P, Bridgeman B (2011) Improved change detection with nearby hands. Exp Brain Res 209:257–269
Verhaeghen P, Salthouse TA (1997) Meta-analyses of age–cognition relations in adulthood: estimates of linear and nonlinear age effects and structural models. Psychol Bull 122:231–249

The fate of previously focused working memory content: decay and/or inhibition?

Johannes Großer, Markus Janczyk
Department of Psychology III, University of Würzburg, Germany
Working memory is thought to allow short-term storage of information in a state in which this information can be manipulated by
ongoing cognitive processes. Evidence from various paradigms suggests that at any time only one item held in working memory is
selected for possible manipulation. Oberauer (2002) has thus suggested a 1-item focus of attention within his model of working
memory. Conceivably, this focus of attention needs to shift between
several items during task performance and the following question is
unresolved: What happens to a formerly selected, but now de-selected, item?
Several studies have addressed this question, with opposing
results. Bao, Li, Chen, and Zhang (2006) investigated verbal
working memory with an updating task where participants count the
number of occurrences of (three) different sequentially presented
geometric objects (e.g., Garavan 1998; see also Janczyk, Grabowski
2011). In particular, they employed the logic typically used to show
n - 2 repetition costs in task-switching experiments and found
slower updating in ABA than in CBA sequences, i.e., evidence for
an active inhibition of de-selected items (but see Janczyk, Wienrich,
Kunde 2008, for no signs of inhibition with a different paradigm).
Rerko and Oberauer (2013) investigated visual working memory
with the retro-cue paradigm. Participants first learned an array of
briefly presented colored items. Long after encoding, one, two, or
three retro-cues (arrows) were presented one after another, with
always the last one pointing to the particular location that is subsequently tested with a change detection task. (The retro-cue effect
refers to the finding of improved performance after valid compared
with neutral cues.) In the critical condition, Rerko and Oberauer
presented three retro-cues to employ the n - 2 repetition logic and
found evidence for passive decay of de-selected items. These
diverging results obviously come with many differences between
experiments: verbal vs. visual working memory, three working items
vs. six working items, two different groups of participants, and so
on. Here we present ongoing work aiming at identifying the critical
factor(s).
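The n − 2 repetition logic common to both paradigms can be made explicit with a small sketch; a trial counts as ABA when the currently selected item matches the item selected two steps back, and as CBA when the last three selections all differ. The selection sequence and response times below are invented for illustration only:

from statistics import mean

selections = ["A", "B", "A", "C", "B", "A", "B", "C", "B"]
rts = [612, 595, 655, 601, 590, 608, 597, 588, 644]  # ms, hypothetical

costs = {"ABA": [], "CBA": []}
for i in range(2, len(selections)):
    window = selections[i - 2:i + 1]
    if window[0] == window[2] != window[1]:   # n-2 repetition (ABA)
        costs["ABA"].append(rts[i])
    elif len(set(window)) == 3:               # full alternation (CBA)
        costs["CBA"].append(rts[i])

# Positive cost = slower ABA trials, i.e., evidence for inhibition
n2_cost = mean(costs["ABA"]) - mean(costs["CBA"])
print(f"ABA {mean(costs['ABA']):.0f} ms, CBA {mean(costs['CBA']):.0f} ms, "
      f"n-2 repetition cost {n2_cost:.0f} ms")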
As a first step, we attempted to replicate the results of Bao et al.
(2006) and Rerko and Oberauer (2013) within one sample of participants. A group of n = 24 students took part in two experiments
(we excluded participants with less than 65 % correct trials: 10 in
Exp. 1 and 3 in Exp. 2). In Experiment 1, participants performed a
three-object updating task, and we compared performance in ABA
and CBA trials. ABA trials yielded longer RTs (see Fig. 1, left
panel), thus pointing to inhibitory mechanisms just as Bao et al.
(2006) reported. In Experiment 2, participants performed a retro-cue task with 1, 2, or 3 retro-cues presented one after another. Most
importantly, in the 3 retro-cue condition the cues either pointed to
three different locations (CBA) or the first and the third cue pointed
to the same location (ABA). We did not observe a difference in
accuracy in this case, but RTs were longer in CBA than in ABA
trials (see Fig. 1, right panel), thus pointing to passive decay but not
to inhibitory mechanisms.
After all, with one single sample of participants we were able to
largely replicate the diverging results from two tasks that were
designed to answer the same research question. Given this, it appears
worthwhile to us to continue this work and to isolate critical factors.
This work is currently in progress.

Fig. 1 Response times (RT) in milliseconds (ms) of Experiments 1 and 2 as a function of trial sequence (CBA [control] vs. ABA [inhibition])

References
Bao M, Li ZH, Chen XC, Zhang DR (2006) Backward inhibition in a task of switching attention within verbal working memory. Brain Res Bull 69:214–221
Garavan H (1998) Serial attention within working memory. Mem Cogn 26:263–276
Janczyk M, Grabowski J (2011) The focus of attention in working memory: evidence from a word updating task. Memory 19:211–225
Janczyk M, Wienrich C, Kunde W (2008) On the costs of refocusing items in working memory: a matter of inhibition or decay? Memory 16:374–385
Oberauer K (2002) Access to information in working memory: exploring the focus of attention. J Exp Psychol Learn 28:411–421
Rerko L, Oberauer K (2013) Focused, unfocused, and defocused information in working memory. J Exp Psychol Learn 39:1075–1096

How global visual landmarks influence the recognition of a city

Kai Hamburger, Cate Marie Trillmich, Franziska Baier, Christian Wolf, Florian Röser
University of Giessen, Giessen, Germany
Abstract
What happens if characteristic landmarks are taken out of a city scene
or interchanged? Are we still able to recognize the city scene
itself, or are we fooled by the missing or misleading information?
What information is then represented in our mind and how? Findings
are discussed with respect to attentional capture and decision making.
Keywords
Spatial cognition, Visual landmarks, Recognition, Attention, Decision
making
Introduction
Famous cities are hard to recognize if the characteristic global landmark is taken out of the city scene. In this context we define a global
landmark as a (famous) building that may be used for orientation
purposes from multiple viewpoints (however, other objects such as
trees, mountains, rivers, etc. may also represent landmarks). Here, we
focus on visual information processing and show that a global landmark in the form of a famous building does not by itself necessarily lead to
successful recognition of major city scenes. Thus, we assume that the
landmark (object) alone is very helpful for spatial representations and
spatial orientation, but the context/surrounding (city scene) is often
required for a full and correct mental representation. Hence, the isolated
objects sometimes lead to inappropriate mental representations and
may also lead us totally astray, especially when they are interchanged.
Evans et al. (1984) stated that landmarks and the pathway grid
configuration facilitate geographic knowledge and that especially
visual landmarks improve comprehension of place locations. However, the
authors also noted that manipulations of the grid configuration and
landmark placement in a simulated environment setting cause changes in environmental knowledge.
According to Clerici and Mironowicz (2009) it is important to
distinguish between landmarks acting as markers, which could
therefore be replaced by direction signs and indicators, and landmarks
acting as marks and brands of a specific city, which can be considered
as a key factor for the quality of urban life (e.g., Big Ben in London or
Golden Gate Bridge in San Francisco). So, what is the relevant
visual information characterizing a city scene?
Methods
The experiment to examine the influence of a famous landmark on city
recognition was conducted on a standard PC presenting the different
combinations of (isolated/interchanged) landmarks and their corresponding cities. Each city scene/landmark only occurred once (between-subject factor). Participants were assigned to the different combinations
randomly. An example is given in Fig. 1, while Table 1 presents the
questions raised together with all further experimental details and results.

Fig. 1 Original (left): city scenes of Berlin with the TV Tower (Alex) and Paris with the Eiffel Tower; modified (center): without the TV Tower and the Eiffel Tower; modified (right): Berlin with the Eiffel Tower of Paris and vice versa
Results
To summarize the results: 1. In general, many city scenes (46 %)
could be identified correctly if landmark and surrounding were a
match (original city scene); 2. Participants had severe difficulties
recognizing some of the given cities when the characteristic landmark
was missing (e.g., Berlin without the TV Tower, Paris without the
Eiffel Tower, Sydney without the Opera House); 3. Some cities could still be
recognized very well without the characteristic landmark (London,
Venice); and 4. Most participants were totally fooled when other
(deceptive) landmarks were shown instead of the original ones.
Discussion
We demonstrate that a city scene without a characteristic global landmark may be recognized correctly in some cases and wrongly in others;
while an object presented in a new context may lead to incorrect or
inappropriate information retrieval from memory (semantic network).

Presented in a different context, the most prominent landmark is more
important (e.g., dominates the decision/judgment) than its immediate
surroundings (including other potential landmarks and landscapes, e.g.,
mountains). But, sometimes the city scene seems to contain more
important information than just one characteristic landmark and it can
still be recognized successfully without it (e.g., London, Venice).
In our experiment, the object pops out from the city scene and captures
our attention (bottom-up). This attentional capture might prevent
information from the visual scene/surrounding city from being considered for
recognition. The recognition process is therefore only based on information about the deceptive landmark (top-down). In this case, the
attentional capture might be caused by the high contextual salience of the
landmark (Caduff, Timpf 2008) as it is clearly distinguishable from the
rest of the scenery. This phenomenon could as well be explained within a
semantic network with two contradicting associations: One is based on
the deceptive landmark while the other is based on the surroundings. The
attentional capture on the deceptive landmark inhibits any information of
the further city scene to be considered for recognition.
Another possible interpretation could come from the research field of
decision making: according to dual-process theories (type 1 versus type 2
processing), decisions (here: what city is represented?) could be made
consciously and unconsciously (e.g., Markic 2009). One key aspect of the
unconscious, automatic process is associative learning (Evans 2003),
which might explain that a single landmark stores all of the relevant
information for the context (object = city = explicit knowledge). This
experiment shows some important connections between perception and
recognition of spatial information on one side and theories of attention
and decision making on the other. This could serve as a valuable basis for
future research on visuo-spatial information processing.

Table 1 Research questions and results for the 31 observers

Cities and landmarks (example: Paris with Eiffel Tower; Fig. 1 bottom, left)
1. Do you know this city? (affirmations [%]): 64 % (2,386 ms)
2. What is the name of the city? (correct labeling [%]): 46 % (1,887 ms)
3. How confident are you with your answer? (scale from 1 = very confident to 7 = very insecure): 2.10 (2,024 ms)

Cities without landmarks (example: Paris without Eiffel Tower; Fig. 1 bottom, middle)
1. Do you know this city?: 35 % (2,801 ms)
2. What is the name of the city?: 19 % (2,037 ms)
3. How confident are you with your answer?: 2.83 (2,744 ms)

Cities with deceptive landmarks (example: Paris with the TV Tower of Berlin; Fig. 1 bottom, right)
1. Do you know this city?: 50 % (3,268 ms)
2. What is the name of the city?: 8 % (1,982 ms)
3. How confident are you with your answer?: 3.05 (2,864 ms)
Correct labeling of the city in which the landmark is really located [%]: 31 %

Participants answered three questions in the three conditions. N = 31 (students of the University of Giessen), 18 female, 13 male, mean age 25 years (SD = 4.4)
References
Caduff D, Timpf S (2008) On the assessment of landmark salience for human wayfinding. Cogn Process 9(4):249–267
Clerici A, Mironowicz I (2009) Are landmarks essential to the city – its development? In: Schrenk M, Popovich VV, Engelke D, Elisei P (eds) REAL CORP 2009: Cities 3.0 – smart, sustainable, integrative: strategies, concepts and technologies for planning the urban future. Eigenverlag des Vereins CORP – Competence Center of Urban and Regional Planning, pp 23–32
Evans J St B T (2003) In two minds: dual-process accounts of reasoning. Trends Cogn Sci 7(10):454–458
Evans GW, Skorpanich MA, Gärling T, Bryant KJ, Bresolin B (1984) The effects of pathway configuration, landmarks and stress on environmental cognition. J Environ Psychol 4:323–335
Markic O (2009) Rationality and emotions in decision making. Interdiscip Descrip Complex Syst 7(2):54–64

Explicit place-labeling supports spatial knowledge in survey, but not in route navigation

Gregor Hardiess, Marc Halfmann, Hanspeter Mallot
Cognitive Neuroscience, University of Tübingen, Germany
Knowledge about navigational space develops with landmark
and route knowledge as the precursors of survey (map-like) knowledge (Siegel, White 1975), a scheme that is widely accepted as the
dominant framework. Route knowledge is typically based on an
egocentric reference frame and learning a route is simply forming
place-action associations between locations (places) and the actions to
take in the sequence of the route. On the other hand, in survey
knowledge, places need to be represented independently of viewing
direction and position. Furthermore, survey representations include
configural knowledge about the relations (topologic, action-based, or
graph-like) between the places in the environment. In wayfinding, it
seems that navigators can draw upon different memory representations and formats of spatial knowledge depending on the task at hand
and the time available for learning.
The hierarchy of spatial representation comprises different levels
of granularity. At the finest level, the recognition of landmarks (i.e.,
salient and permanent patterns or objects, available in the environment) has to be considered. Grouping spatially related landmarks
together leads to the concept of a place, the fundamental unit of routes
and maps. Building a route involves the connection of places with the
corresponding spatial behavior. At this intermediate level, several
routes can exist in parallel, even with spatial overlap, but without
interacting with each other (Mallot, Basten 2009). Route combination
occurs first at the level of survey representations. Here, the embedding of places as well as routes in a so-called cognitive map as a
configural representation of the environment enables the creation of
novel routes and shortcuts to find the goal (Tolman 1948). On top of
the hierarchy, the coarsest level of granularity is provided by the
formation of regions (Wiener, Mallot 2003), where spatially related
parts of the map cluster together. Depending on task demands and the
time available for spatial learning, the coding of space can be supported at each of the levels of granularity or in combination.
The interaction of language and space has been studied with respect to a wide
variety of aspects, including the acquisition of spatial knowledge from
verbal descriptions, verbal direction giving, influences of spatial reference frames which are employed in specific languages on the
judgment of similarity of spatial configurations, or retrospective reports
of spatial thinking. Little is known, however, about possible functions
of language-based or language-supported representations in actual
navigational, or wayfinding behavior. In a dual task study, Meilinger
et al. (2008) showed that verbal distractor tasks are more detrimental to
route navigation than distractor tasks involving visual imagery or spatial hearing. In an ongoing study, Meilinger et al. (2009) investigate
advantages of different types of verbal place codes, i.e. names
describing local landmarks vs. arbitrary names. In this study, descriptive naming leads to better navigational results than arbitrary naming.
In the present study, the role of language-supported representations of
space was assessed in two wayfinding experiments (using virtual
reality) with labeling of places, using a route and a survey knowledge
task, respectively. In the association phase of both tasks, subjects were
requested to label the places either with semantically meaningful
names (word condition) or icons (icon condition) to build up a link
between sensorimotor and language representation. In a control
condition no labeling was required. In the route task, subjects simply
learned to repeat a route (containing 10 places) from a given starting
point to a goal location in a stereotyped way (route phase). In the
survey task, subjects first had to learn a set of four intersecting routes
(containing 45 places) and then were asked to infer four novel routes
by recombining sections of learned routes (survey phase). Wayfinding performance was assessed by the distance subjects travelled
to find the goal, expressed as PAO (percentage above optimum; see the
formula after this paragraph). Overall, we
found no differences between word-based and icon-based labeling.
Labeling supported wayfinding not in the route task (no effect of label
condition on distance), but in the survey knowledge task. There,
subjects performed the survey phase in the word as well as in the
icon condition with reduced walking compared to the control condition. Furthermore, this supporting effect was more pronounced in
subjects with good wayfinding scores. We conclude that the associated place-labels supported the formation of abstract place
concepts and furthered the inference of novel routes from known route
segments, which is useful in the more complex (higher hierarchy
and representational level) survey task, but not in the simple route task,
where just stereotyped stimulus–response associations without
planning are needed.
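The PAO measure referred to above can be written out as follows; this is the standard excess-distance formulation, which the abstract itself does not spell out:

PAO = 100 × (d_travelled − d_optimal) / d_optimal [%]

where d_travelled is the distance actually walked and d_optimal is the length of the shortest path to the goal, so that PAO = 0 % corresponds to a perfectly efficient route and larger values indicate more excess walking.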
References
Mallot HA, Basten K (2009) Embodied spatial cognition: biological and artificial systems. Image Vision Comput 27(11):1658–1670
Meilinger T, Knauff M, Bülthoff HH (2008) Working memory in wayfinding – a dual task experiment in a virtual city. Cogn Sci 32(4):755–770
Meilinger T, Schulte-Pelkum J, Frankenstein J, Laharnar N, Hardiess G, Mallot HA, Bülthoff HH (2009) Place naming – examining the influence of language on wayfinding. In: Taatgen N, van Rijn H (eds) Proceedings of the thirty-first annual conference of the cognitive science society. Cognitive Science Society
Siegel AW, White SH (1975) The development of spatial representations of large-scale environments. Adv Child Dev Behav 10:9–55
Tolman EC (1948) Cognitive maps in rats and men. Psychol Rev 55:189–208
Wiener JM, Mallot HA (2003) Fine-to-coarse route planning and navigation in regionalized environments. Spatial Cogn Comput 3(4):331–358


How important is having emotions for understanding others' emotions accurately?

Larissa Heege, Albert Newen
Ruhr-University Bochum, Germany
Mirror neuron theory for understanding others' emotions
According to the research group that discovered mirror neurons in
Parma, emotions can be understood through cognitive elaborations of
visual emotional expressions and without a major involvement of
mirror neuron mechanisms. They assume, though, that this provides
only a "pale and detached" account of others' emotions (Rizzolatti
et al. 2004):
"It is likely that the direct viscero-motor mechanism scaffolds the
cognitive description, and when the former mechanism is not present
or malfunctioning, the latter provides only a pale, detached account of
the emotions of others." (Rizzolatti et al. 2004).
Mirror neurons in reference to emotions are neurons that fire when
we have an emotion as well as when we observe somebody else
having the same emotion. It is assumed that mirror neuron mechanisms evoke in the observer an understanding of others' emotions,
which is based on resonances of the observer's own emotions. In this way, an
automatic first-person understanding of others' emotions originates
(Rizzolatti and Sinigaglia 2008; Rizzolatti et al. 2004):
"Side by side with the sensory description of the observed social
stimuli, internal representations of the state associated with these […]
emotions are evoked in the observer, as if they […] were experiencing a similar emotion." (Rizzolatti et al. 2004).
Thus somebody who is not able to have a specific emotion would
also not be able to have a first-person "as if" understanding of this
emotion in others. Resonances of his own emotions could not be produced; the mirror neuron mechanism would not be present or could
not work appropriately. If this person instead used primarily cognitive
elaborations to understand this emotion in others, his emotion
understanding should be pale and detached, according to mirror
neuron theory.
Psychopaths and having the emotion of fear
Primary (low-anxious) psychopaths demonstrated in the PANAS
(positive and negative affect scales) a significant negative correlation with having the emotion of fear (−.297) (Del Gaizo and
Falkenbach 2008). Furthermore, an experiment showed that psychopaths, in contrast to non-psychopaths, do not get anxious when they
breathe in the stress sweat of other people (Dutton 2013). Psychopaths
also have a reduced amygdala activity (Gordon et al. 2004) and a
reduced startle response (Herpertz et al. 2001).
Psychopaths and understanding fear in others
In a study 24 photographs showing different facial expressions
(happy, sad, fearful, angry, disgusted and neutral) were presented to
psychopathic inmates and non-psychopaths. The psychopathic
inmates demonstrated a greater skill in recognizing fearful faces than
the non-psychopaths (Book et al. 2007):
"[A] general tendency for psychopathy [is] to be positively associated with increased accuracy in judging emotional intensity for
facial expressions in general and, more specifically, for fearful faces."
(Book et al. 2007).
Psychopaths also identify bodily expressions, which are based on
fear/anxiety, significantly better than non-psychopaths: Ted Bundy, a
psychopathic serial killer, stated that he could identify a "good victim"
by her gait. Relating to this statement, in a study twelve videos of
people walking through a corridor were shown to psychopaths and
non-psychopaths; six of the walking people had been victims in their
past. The psychopaths and non-psychopaths had to decide how likely
the persons in the videos were to get mugged. The study found a
robust, positive correlation between primary (low-anxious) psychopathic traits and accuracy in naming the persons who had been
victims in the past as the ones most likely to get mugged. Secondary (high-anxious) psychopaths did not demonstrate such a skill
(Wheeler et al. 2009).
In a similar study, five students had to walk through a lecture hall in front of other students with either many or few psychopathic traits. One of the walking students carried a hidden handkerchief, and the observing students had to guess who hid it. Seventy percent of the students with many psychopathic traits named the right student; of the students with few psychopathic traits, just thirty percent named the student with the handkerchief (Dutton 2013).
In another study, people with many psychopathic traits showed decreased amygdala activity during emotion-recognition tasks. People with primary psychopathic traits also showed increased activity in the visual and the dorsolateral prefrontal cortex. Thus, when solving emotion-recognition tasks, primary psychopaths rely much more on brain areas associated with cognition and perception (Gordon et al. 2004).
Conclusions
Primary psychopaths use primarily cognitive elaborations to understand others' emotions and (almost) do not have the emotion of fear. Thus, according to mirror neuron theorists, psychopaths should have a pale, detached account of fear in others (see the end of the first paragraph).
Psychopaths are surely not able to have a first-person "as if" understanding of others' fear: they cannot feel fear with others. In this sense it is possible to say that psychopaths have a pale, detached account of others' emotions.
However, it cannot be said that the outcome of their understanding of others' fear is pale and detached. In fact, they often recognize others' fear more accurately than people who are able to feel fear. We can also conclude that, at least for psychopaths, having an emotion is not important for understanding this emotion in others accurately.
References
Book AS, Quinsey VL, Langford D (2007) Psychopathy and the perception of affect and vulnerability. Crim Justice Behav 34(4):531-544. doi:10.1177/0093854806293554
Del Gaizo AL, Falkenbach DM (2008) Primary and secondary psychopathic traits and their relationship to perception and experience of emotion. Pers Indiv Differ 45:206-212. doi:10.1016/j.paid.2008.03.019
Dutton K (2013) The wisdom of psychopaths. Arrow, London
Gordon HL, Baird AA, End A (2004) Functional differences among those high and low on a trait measure of psychopathy. Biol Psychiatry 56(7):516-521. doi:10.1016/j.biopsych.2004.06.030
Herpertz SC, Werth U et al. (2001) Emotion in criminal offenders with psychopathy and borderline personality disorder. Arch Gen Psychiatry 58:737-745. doi:10.1001/archpsyc.58.8.737
Rizzolatti G, Gallese V, Keysers C (2004) A unifying view of the basis of social cognition. Trends Cogn Sci 8(9):396-403. doi:10.1016/j.tics.2004.07.002
Rizzolatti G, Sinigaglia C (2008) Mirrors in the brain. Oxford University Press, Oxford
Wheeler S, Book AS, Costello K (2009) Psychopathic traits and perceptions of victim vulnerability. Crim Justice Behav 36:635-648. doi:10.1177/0093854809333958

Prosody conveys speakers' intentions: acoustic cues for speech act perception
Nele Hellbernd, Daniela Sammler
Max Planck Institute for Human Cognitive and Brain Sciences,
Leipzig, Germany
Recent years have seen a major change in views on language and language use. During the last decades, language use has been more and more recognized as an intentional action (Grice 1957). In the form of speech acts (Austin 1962; Searle 1969), language expresses the speaker's attitudes and communicative intents to shape the listener's reaction. Notably, the speaker's intention is often not directly coded in the lexical meaning of a sentence, but rather conveyed implicitly, for example via nonverbal cues such as facial expressions, body posture, and speech prosody. The theoretical work of intonational phonologists seeking to define the meaning of specific vocal intonation profiles (Bolinger 1986; Kohler 1991) demonstrates the role of prosody in conveying the speaker's conversational goal. However, to date little is known about the neurocognitive architecture underlying the comprehension of communicative intents in general (Holtgraves 2005; Egorova, Shtyrov, Pulvermüller 2013), and about the distinctive role of prosody in particular.
The present study therefore aimed to investigate this interpersonal role of prosody in conveying the speaker's intents, and its underlying acoustic properties. Taking speech act theory as a framework for intention in language (Austin 1962; Searle 1969), we created a novel set of short (non-)word utterances intoned to express different speech acts. Adopting an approach from emotional prosody research (Banse, Scherer 1996; Sauter, Eisner, Calder, Scott 2010), this stimulus set was employed in a combination of behavioral ratings and acoustic analyses to test the following hypotheses: if prosody codes for the communicative intention of the speaker, we expect (1) above-chance behavioral recognition of different intentions that are expressed merely via prosody, (2) acoustic markers in the prosody that identify these intentions, and (3) independence of acoustics and behavior from the overt lexical meaning of the utterance.
The German words "Bier" ("beer") and "Bar" ("bar") and the non-words "Diem" and "Dahm" were recorded from four speakers (two female) expressing six different speech acts in their prosody: criticism, wish (expressives), warning, suggestion (directives), doubt, and naming (assertives). Acoustic features for pitch, duration, intensity, and spectral properties were extracted with PRAAT. These measures were subjected to discriminant analyses (separately for words and non-words) in order to test whether the acoustic features have enough discriminant power to assign the stimuli to their corresponding speech act category. Furthermore, 20 participants were tested for behavioral recognition of the speech act categories with a six-alternative forced-choice task. Finally, a new group of 40 participants performed subjective ratings of the different speech acts (e.g., "How much does the stimulus sound like criticism?") to obtain more detailed information on the perception of the different intentions and to allow, as a quantitative variable, further analyses in combination with the acoustic measures.
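To make the classification step concrete, the following is a minimal sketch of a cross-validated discriminant analysis over precomputed acoustic features; the feature matrix, labels, and scikit-learn estimator are illustrative stand-ins, not the authors' actual pipeline.

```python
# Sketch: linear discriminant analysis over acoustic features (illustrative).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_stimuli, n_features = 192, 12                # placeholder dimensions
X = rng.normal(size=(n_stimuli, n_features))   # pitch, duration, intensity, spectral measures
y = rng.integers(0, 6, size=n_stimuli)         # six speech act categories

# Cross-validated classification accuracy; chance level is 1/6 (about 17 %).
acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
print(f"mean classification accuracy: {acc:.2f}")
```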
The discriminant analyses of the acoustic features yielded predictions well above chance for each speech act category, with an overall classification accuracy of about 90 % for both words and non-words (chance level: 17 %). Likewise, participants were well able to classify the stimuli into the correct category, with a slightly lower accuracy for non-words (73 %) than for words (81 %). Multiple regression analyses of participants' ratings of the different speech acts on the acoustic measures further identified distinct patterns of physical features that were able to predict the behavioral perception.
These findings indicate that prosodic cues convey sufficient detail to classify short (non-)word utterances according to their underlying intention, at acoustic as well as perceptual levels. Lexical meaning seems to be supportive but not necessary for the comprehension of different intentions, given that participants showed high performance for the non-words but scored higher for the words. In total, our results show that prosodic cues are powerful indicators of the speaker's intentions in interpersonal communication. The present carefully constructed stimulus set will serve as a useful tool to study the neural correlates of intentional prosody in the future.
References
Austin JL (1962) How to do things with words. Oxford University Press, Oxford
Banse R, Scherer KR (1996) Acoustic profiles in vocal emotion expression. J Pers Soc Psychol 70(3):614-636
Bolinger D (1986) Intonation and its parts: melody in spoken English. Stanford University Press, Stanford
Egorova N, Shtyrov Y, Pulvermüller F (2013) Early parallel processing of pragmatic and semantic information in speech acts: neurophysiological evidence. Front Hum Neurosci 7
Grice HP (1957) Meaning. Philos Rev 66(3):377-388
Holtgraves T (2005) The production and perception of implicit performatives. J Pragmat 37(12):2024-2034
Kohler KJ (ed) (1991) Studies in German intonation (No. 25). Institut für Phonetik und digitale Sprachverarbeitung, Universität Kiel
Sauter D, Eisner F, Calder A, Scott S (2010) Perceptual cues in nonverbal vocal expressions of emotion. Q J Exp Psychol 63(11):2251-2272
Searle JR (1969) Speech acts: an essay in the philosophy of language. Cambridge University Press, Cambridge

On the perception and processing of social actions
Matthias R. Hohmann, Stephan de La Rosa, Heinrich H. Bülthoff
Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Action recognition research has mainly focused on investigating the perceptual processes involved in the recognition of isolated actions from biological motion patterns. Surprisingly little is known about the cognitive representation underlying action recognition. A fundamental question is whether actions are represented independently or interdependently. Here we examined whether the cognitive representations of static (action image) and dynamic (action movie) actions depend on each other, and whether cognitive representations for static and dynamic actions overlap.
Adaptation paradigms are an elegant way to examine the presence of a relationship between different cognitive representations. In an adaptation experiment, participants view a stimulus, the adaptor, for a prolonged amount of time and afterwards report their perception of a second, ambiguous test stimulus. Typically, the perception of the second stimulus is biased away from the adaptor stimulus. The presence of such an antagonistic perceptual bias (adaptation effect) is often taken as evidence for the interdependency of the cognitive representations of test and adaptor stimuli.
We manipulated the dynamic content (dynamic vs. static) of the test and adaptor stimuli independently. The ambiguous test stimulus was created by a weighted linear morph between the spatial positions of the two adapting actions (handshake, high five). Thirty participants categorized the ambiguous dynamic or static action stimuli after being adapted to dynamic or static actions. Afterwards, we calculated the perceptual bias for each participant by fitting a psychometric function to the data. We found an action-adaptation after-effect in some but not all experimental conditions. Specifically, the effect was only present if the presentation of the adaptor and the test stimulus was congruent, i.e. if both were presented in either a dynamic or a static manner (p < 0.001). This action-adaptation after-effect indicates a dependency between cognitive representations when adaptor and test stimuli have the same dynamic content (i.e. both static or both dynamic). Future studies are needed to relate these results to other findings in the field of action recognition and to incorporate a neurophysiological perspective.
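The perceptual-bias estimate mentioned above can be illustrated with a small sketch: a logistic psychometric function fitted to categorization proportions, whose midpoint (point of subjective equality) quantifies the after-effect. The function form, data, and use of scipy are assumptions for illustration, not the authors' analysis code.

```python
# Sketch: fitting a logistic psychometric function to categorization data.
import numpy as np
from scipy.optimize import curve_fit

def psychometric(x, mu, sigma):
    """Probability of a 'high five' response at morph level x."""
    return 1.0 / (1.0 + np.exp(-(x - mu) / sigma))

morph_levels = np.linspace(0.0, 1.0, 7)   # handshake (0) .. high five (1)
p_highfive = np.array([0.05, 0.10, 0.20, 0.55, 0.80, 0.90, 0.97])  # example data

(mu, sigma), _ = curve_fit(psychometric, morph_levels, p_highfive, p0=[0.5, 0.1])
# Comparing the fitted midpoint mu across adaptation conditions quantifies
# how far perception is biased away from the adaptor.
print(f"point of subjective equality: {mu:.3f}")
```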

Stage-level and individual-level interpretation of multiple adnominal adjectives as an epiphenomenon: theoretical and empirical evidence
Sven Kotowski, Holden Härtl
Institut für Anglistik und Amerikanistik, Universität Kassel, Germany
As observed by various authors (among others Bolinger 1967; Cinque 2010; Larson 1998), certain adjectives in several languages are semantically ambiguous in different adnominal positions. These ambiguities concern semantic oppositions such as intersective vs. non-intersective, restrictive vs. non-restrictive, or individual-level vs. stage-level. Thus, the time-honored examples in (1a/b) are argued to have two distinct interpretations:
(1) a. the visible stars
b. the stars visible
In (1a), visible can have either an occasion/stage-level (SL) or a characterizing/individual-level (IL) reading. The postnominal adjective in (1b), however, is unambiguous and allows only the SL-reading (cf. Kratzer 1995 for test environments). Furthermore, when the same adjective occurs twice prenominally (2), the two interpretations are linked to rigidly ordered positions (cf. Cinque 2010; Larson 1998):
(2) the visible[SL] visible[IL] stars
In this paper, we argue that the order of multiple prenominal adjectives in German (and possibly cross-linguistically) cannot be derived from an inherent dichotomy between SL- and IL-predicates, but requires a more general analysis of adnominal adjective order. SL and IL are not intrinsically ordered along the lines of (2), i.e. SL > IL. Rather, they are found in this very order due to different adjectival functions in a layered structure around the NP's head. Crucially, in such adjective doublets the second adjective always receives a generic reading, i.e. the [A2 N] in such [A1 [A2 N]] expressions functions as a complex common name that denotes a subkind of the kind denoted by the head noun (Gunkel, Zifonun 2009): in (1)/(2) above, if star denotes the kind STAR, (1a) is ambiguous between a subkind and a qualifying reading, while in (2) the cluster visible2 stars is interpreted as a subkind VISIBLE STAR and thus disambiguated. Accordingly, we assume that doublets increase in general acceptability if the A2 Ns fulfil establishedness conditions and pass tests with kind-selecting predicates (like INVENT etc.; see e.g. Krifka et al. 1995). For example, a taxonomic subkind reading is triggered by the indefinite NP in (3a), while no such downward-projecting taxonomic inference occurs for non-established complex expressions (3b):
(3) a. John invented a potato peeler. → a kind of POTATO PEELER
b. John invented a cauliflower steamer. ↛ the kind CAULIFLOWER STEAMER
As regards general restrictions on adnominal adjective order, we assert a lack of descriptive adequacy for purely formal/syntactic (in particular cartographic) as well as for purely semantic and/or communicative-functional NP models. Instead, we argue that prenominal adjective sequences include at least three distinct semantic-syntactic layers: a classifying layer (CLAS; e.g. relational adjectives like musical), an absolute-qualifying layer (QA; e.g. basic color terms), and a relative-qualifying layer (QR; e.g. dimensional adjectives). The former two share certain semantic and morphosyntactic features (both [-gradable]), yet are set apart with respect to possible occurrence in predicative position. The relation between the latter two shows the reverse characteristics (both [+predicative], yet they differ in gradability). Adjective order at the right prenominal edge of Germanic NPs tends to follow the sequence QR > QA > CLAS > N. At the same time, classifying adjectives (either inherent classifiers such as relational adjectives, or other adjectives functioning as classifiers in established names) typically function as modifiers of complex names: just as in, e.g., NN-compounds, where the modifying N is non-referential, CLAS-adjectives do not locate the NP-referent spatio-temporally but classify it as a member of a certain kind. Therefore, the IL-interpretation of A2 in, e.g., (2) is an epiphenomenon of more global constraints on modifier order: in doublets, these adjectives are interpreted as CLAS, a layer for which SL-interpretations are not available.
To test our hypothesis, we conducted two questionnaire studies on German adjective order. Study 1 was a 100-split task designed to test subjects' order preferences when confronted with polysemous adjectives in combination with another adjective (i.e. either a time-stable reading, e.g. wild in the sense of 'not domesticated', or a temporary reading, wild in the sense of 'furious', in combination with a time-stable adjective, e.g. big). Introductory context paragraphs promoted both adjectives' readings within an item. Subjects had to distribute 100 points over two possible follow-up sentences, with alternating A1-A2 orders, according to which sentence was more natural given the context. Crucially, the time-stable AN-syntagms did not denote established kinds, i.e. the task tried to elicit order preferences based on a potential IL-SL distinction only. While control items following clear-cut order regularities described in the literature irrespective of temporality, e.g. small French car, scored significantly better than either of the test categories, the differences between the IL- and SL-categories were clearly insignificant.
In a follow-up study currently being conducted, subjects are presented with introductory sentences containing AAN-clusters that are not further specified as regards interpretation. Again, alternating adjectival senses are utilized. In each test sentence one A is a deverbal adjective ending in -bar (the rough German equivalent of English -ible/-able; e.g. ausziehbar 'extendable'), which displays a systematic ambiguity between an occasion and a habitual reading (Motsch 2004). Combined with respective nouns, these adjectives in one reading encode established kinds (e.g. ausziehbarer Tisch 'pull-out table'; CLAS modification), while the respective second adjective encodes a time-stable property that does not exhibit a kind reading in an AN-syntagm (e.g. blauer ausziehbarer Tisch 'blue pull-out table'). Subjects are then asked to rate follow-up sentences according to their naturalness as discourse continuations; these continuations systematically violate the occasion reading, and we hypothesize that they will score higher for [A(non-kind) A(kind) N] than for [A(occasion/potential kind) A(non-kind) N] expressions. Should this hypothesis be confirmed, we take the results, together with the findings from study 1, as support for the above reasoning that observed adjective interpretations as in (2) do not derive primarily from a grammatical distinction between IL- and SL-predicates, but need to be understood as an epiphenomenon of more general constraints on adjective order and kind reference.
References
Bolinger D (1967) Adjectives in English: attribution and predication. Lingua 18:1-34
Cinque G (2010) The syntax of adjectives: a comparative study. MIT Press, Cambridge, MA
Fernald T (2000) Predicates and temporal arguments. Oxford University Press, Oxford
Kratzer A (1995) Stage-level and individual-level predicates. In: Carlson GN, Pelletier FJ (eds) The generic book. The University of Chicago Press, Chicago, pp 125-175
Krifka M, Pelletier FJ, Carlson GN, ter Meulen A, Chierchia G, Link G (1995) Genericity: an introduction. In: Carlson GN, Pelletier FJ (eds) The generic book. The University of Chicago Press, Chicago, pp 1-124
Larson R (1998) Events and modification in nominals. In: Strolovitch D, Lawson A (eds) Proceedings from Semantics and Linguistic Theory (SALT) VIII. Cornell University Press, Ithaca, pp 145-168
Motsch W (2004) Deutsche Wortbildung in Grundzügen. Walter de Gruyter, Berlin

What happened to the crying bird? Differential roles of embedding depth and topicalization modulating syntactic complexity in sentence processing
Carina Krause1, Bernhard Sehm1, Anja Fengler1, Angela D. Friederici1, Hellmuth Obrig1,2
1 Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany; 2 University Hospital Leipzig, Clinic for Cognitive Neurology, Leipzig, Germany
"The rat the cat the dog bit chased escaped." Previous studies provide evidence that the processing of such hierarchical syntactic structures involves a network with the inferior frontal gyrus (IFG) and temporo-parietal (TP) regions (Friederici 2009; Fengler et al., in press) as two key players. While most studies locate the processing of syntactically complex sentences in Broca's area (BA44/45), some studies also report the involvement of BA47 and BA6 (Friederici 2011) and of temporo-parietal areas (Shetreet et al. 2009). Why is there so much variation in localizing the syntactic complexity effect? The interpretation of multiply embedded sentence structures represents a particular challenge to language processing, requiring syntactic hierarchy building and verbal working memory. Thereby, syntactic operations may differentially tax general verbal working memory capacities, which preferentially rely on TP regions (Meyer et al. 2012), and more syntax-specific working memory domains, which preferentially rely on IFG structures (Makuuchi et al. 2009). To disentangle the specific contribution of each subsystem, we developed stimulus material that contrasts syntactic complexity and these working memory aspects. The goal of our project is to use this material in facilitation (tDCS) and impairment (lesion) studies to allow ascribing causal roles to the above brain areas in these aspects of syntax processing.
Methods
Twenty healthy participants (mean age: 24 years) performed an auditory sentence-picture matching task. Both reaction times and error rates were recorded.
Paradigm
In a number of pilot studies (10-15 participants each), task complexity was varied (number of choice options, distractors, presentation order). Our stimulus set is based on material used in previous studies (Antonenko et al. 2013; Fengler et al., in press) and consists of 132 German transitive sentences. It has a 2x3 factorial design tapping argument order (A: subject-first vs. B: object-first) and depth of syntactic embedding (0: no, 1: single, 2: double embedding):

A0: Der Vogel ist braun, er wscht den Frosch, und er weint.
B0: Der Vogel ist braun, ihn wscht der Frosch, und er weint.
A1: Der Vogel, der braun ist, und der den Frosch wscht, weint
B1: Der Vogel, der braun ist, und den der Frosch wscht, weint
A2: Der Vogel, der den Frosch, der braun ist, wscht, weint.
B2: Der Vogel, den der Frosch, der braun ist, wscht, weint.
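As an illustration of how reaction times from this 2x3 within-subject design can be evaluated, here is a minimal sketch using a repeated-measures ANOVA from statsmodels; the data are simulated placeholders and the column layout is an assumption, not the authors' analysis script.

```python
# Sketch: 2 (argument order) x 3 (embedding depth) repeated-measures ANOVA
# on reaction times, from a long-format table with one mean RT per
# participant and condition (all values simulated for illustration).
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(1)
rows = []
for subj in range(20):
    for order in ("subject-first", "object-first"):
        for depth in (0, 1, 2):
            rt = 1200 + 150 * depth + (80 if order == "object-first" else 0)
            rows.append({"subject": subj, "order": order,
                         "depth": depth, "rt": rt + rng.normal(0, 50)})

table = pd.DataFrame(rows)
res = AnovaRM(table, depvar="rt", subject="subject",
              within=["order", "depth"]).fit()
print(res.anova_table)   # F values for both main effects and the interaction
```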

Results and Conclusion
In healthy subjects, only successive presentation of the auditorily presented sentences and the ensuing pictures (three distractors) yields robust behavioral differences. As a function of both (i) level of embedding and (ii) topicalization, we find highly significant effects in terms of increasing reaction times (embedding: F(2,32) = 46.610, p < .001; topicalization: F(1,16) = 25.003, p < .001) as well as decreased accuracy (embedding depth: F(2,32) = 20.826, p < .001; topicalization: F(1,16) = 10.559, p = .005). Interestingly, the factors do not interact, suggesting partially independent factorial influences on syntactic processing. Currently the paradigm is used in a study with facilitatory transcranial direct current stimulation (tDCS) of each key area (IFG vs. TP region). Additionally, patients with circumscribed acquired brain lesions are tested on different versions of the paradigm, adapted to the requirements of language-compromised patients.
References
Antonenko D, Brauer J, Meinzer M, Fengler A, Kerti L, Friederici A, Flöel A (2013) Functional and structural syntax networks in aging. Neuroimage 83:513-523
Friederici A (2009) Pathways to language: fiber tracts in the human brain. Trends Cogn Sci 13(4):175-181
Friederici A (2011) The brain basis of language processing: from structure to function. Physiol Rev 91(4):1357-1392
Makuuchi M, Bahlmann J, Anwander A, Friederici A (2009) Segregating the core computational faculty of human language from working memory. PNAS 106(20):8362-8367
Meyer L, Obleser J, Anwander A, Friederici A (2012) Linking ordering in Broca's area to storage in left temporo-parietal regions: the case of sentence processing. Neuroimage 62(3):1987-1998
Shetreet E, Friedmann N, Hadar U (2009) An fMRI study of syntactic layers: sentential and lexical aspects of embedding. Neuroimage 48(4):707-716

fMRI evidence for a top-down grouping mechanism establishing object correspondence in the Ternus display
Katrin Kutscheidt1, Elisabeth Hein2, Manuel Jan Roth1, Axel Lindner1
1 Department of Cognitive Neurology, Hertie Institute for Clinical Brain Research, Tübingen, Germany; 2 Department of Psychology, University of Tübingen, Germany
Our visual system is constantly confronted with ambiguous sensory input. However, this input is rarely perceived as ambiguous. It is, for instance, possible to keep track of multiple moving objects in parallel even if occlusions or eye blinks prevent the unique assignment of an object's identity based on sensory input alone. Hence, neural mechanisms (bottom-up or top-down) must disambiguate conflicting sensory information. The aim of this study was to shed light on the neural mechanisms establishing object correspondence across space and time despite such ambiguity.
To this end, we performed a functional magnetic resonance imaging (fMRI) study using a variant of the Ternus display (Ternus 1926). The Ternus display is an ambiguous apparent-motion stimulus in which two sets of three equidistant disks are presented in the following way: while two disks are always presented at the same positions, a third disk alternates between a position to the left and a position to the right of these two central disks. This display either leads to the percept of group motion (GM), in which the observer has the impression that all three disks move coherently as one group, or, alternatively, to the percept of element motion (EM), in which the outermost disk is seen as jumping back and forth over stationary central disks. How the Ternus display is perceptually interpreted thereby depends on both low-level features (e.g. the inter-frame interval [IFI]; Petersik, Pantle 1979) and higher-level factors (e.g. context information; He, Ooi 1999).
Our Ternus display consisted of three white disks presented on a grey background. The disks were shown for 200 ms in alternating frames. Each stimulus block lasted five minutes, during which participants (n = 10) had to fixate a central fixation cross and manually indicate their current motion percept, GM or EM, using a button box. Due to the ambiguous nature of the stimulus, participants' perceptual interpretation constantly changed over the course of the experiment. The average percept duration across individuals was about 11 s for GM and about 8 s for EM. To guarantee comparable percept durations also within participants, we individually estimated, in a pre-experiment, the IFI at which EM and GM were perceived equally often. The IFI in the MRI experiment was then adjusted accordingly. The experiment comprised six blocks, each preceded by a 30 s baseline period without stimulus presentation.
Functional (TR = 2 s) and anatomical MRI images were acquired on a 3 T Siemens TRIO scanner and processed using SPM8. In participant-specific first-level analyses, we specified general linear models including three regressors: (i) onset of a GM percept; (ii) onset of an EM percept; (iii) stimulus presentation. All regressors were convolved with the canonical haemodynamic response function. The initial fixation period was not explicitly modelled and served as a baseline. In each participant, we individually identified task-related regions of interest (ROIs) by contrasting stimulus presentation (iii) vs. baseline. Only those areas were considered ROIs that also surfaced in a second-level group analysis of the same contrast. Task-related bilateral ROIs were the lingual gyrus (LG), V3a, V5, and the intraparietal sulcus (IPS). For each individual and each ROI, we then extracted the time course of fMRI activity in order to perform time-resolved group analyses of activity differences between EM and GM percepts. Analyses of the simultaneously recorded eye data helped to exclude influences of eye blinks, saccades, eye position, and eye velocity on the motion percepts, as no difference between conditions was revealed.
In all ROIs, a perceptual switch was accompanied by a significant peak in fMRI activity around the time of the indicated switch (p < .05). While the amplitude of these peaks did not differ between perceived GM and EM across all ROIs (p > .05, n.s.), we observed significant differences in the temporal onset of the switch-related fMRI response between GM and EM (p < .01). Specifically, there was a particularly early rise in switch-related fMRI activity in IPS for GM, which occurred about three seconds before the participant finally switched from EM to GM. In the case of EM, on the other hand, this switch-related increase in fMRI activity in IPS seemed to occur only after the perceptual switch. Area V5 exhibited comparable results but showed less of a temporal difference between GM and EM (p < .05). In contrast, in areas LG and V3a the rise in fMRI activity was time-locked to the perceptual switch per se, being indistinguishable between GM and EM (p > .05, n.s.).
Our results revealed significant peaks of fMRI activity correlated with a switch between two perceptual interpretations (GM or EM) of a physically identical stimulus in LG, V3a, V5, and IPS, brain regions that are also involved in visual motion processing (e.g. Sunaert, Van Hecke, Marchal, Orban 1999). Importantly, the time course of switch-related activity in IPS additionally suggests a potential top-down influence on other areas (cf. Sterzer, Russ, Preibisch, Kleinschmidt 2009), here mediating the perception of GM. The specific role of IPS could thereby relate to the spatial binding of individual objects into a group (cf. Zaretskaya, Anstis, Bartels 2013). This idea is consistent with the theory of Kramer and Yantis (1997), suggesting that object correspondence in the Ternus display could be determined by top-down spatial binding of the disks within particular frames.
Acknowledgments
This work was supported by a grant from the BMBF (FKZ 01GQ1002
to A.L.).
References
He ZJ, Ooi TL (1999) Perceptual organization of apparent motion in the Ternus display. Perception 28:877-892
Kramer P, Yantis S (1997) Perceptual grouping in space and time: evidence from the Ternus display. Percept Psychophys 59:87-99
Petersik JT, Pantle A (1979) Factors controlling the competing sensations produced by a bistable stroboscopic motion display. Vision Res 19(2):143-154
Sterzer P, Kleinschmidt A, Rees G (2009) The neural bases of multistable perception. Trends Cogn Sci 13:310-318
Sunaert S, Van Hecke P, Marchal G, Orban GA (1999) Motion-responsive regions of the human brain. Exp Brain Res 127:355-370
Ternus J (1926) Experimentelle Untersuchungen über phänomenale Identität [Experimental investigations of phenomenal identity]. Psychologische Forschung 7:81-136
Zaretskaya N, Anstis S, Bartels A (2013) Parietal cortex mediates conscious perception of illusory gestalt. J Neurosci 33:523-531

Event-related potentials in the recognition of scene sequences
Stephan Lancier, Julian Hofmeister, Hanspeter Mallot
Cognitive Neuroscience Unit, Department of Biology, University of Tübingen, Germany
Many studies have investigated event-related potentials (ERPs) associated with the recognition of objects and words. Friedmann (1990) showed in an old/new task that correctly recognized new pictures of objects evoked a larger frontal-central N300 amplitude than familiar pictures of objects. This indicates that participants are able to discriminate between old and new pictures 300 ms after stimulus onset. Rugg et al. (1998) found different neural correlates for the recognition of implicitly and explicitly learned words. In the so-called mid-frontal old/new effect, recognized implicitly learned words were characterized by a lower N400 amplitude in contrast to recognized new words. The explicitly learned words could be dissociated from implicitly learned words by their larger P600 amplitude, which was called the left-parietal old/new effect. Rugg et al. concluded that recognition memory can be divided into two distinct processes: a familiarity process for implicit learning and a recollection process for explicit learning. These neural correlates were also shown for the recognition of pictures of objects (Duarte et al. 2004). In fast recognition tasks, pictures of scenes are identified as fast as pictures of isolated objects. Schyns and Oliva (1994) suggest that a coarse-to-fine process extracts a coarse description for scene recognition before finer information is processed. In this case the workload for recognizing a scene would not differ substantially from the workload required in object recognition. In the present study, we investigate the recognition of target scenes from scene sequences and compare the elicited neural correlates to those of former studies. We hypothesize that the recognition of scene identity and of scene position in the sequence evokes dissociable neural correlates.
At the current stage of this study, five students of the University of Tübingen have participated. Each of them completed two sessions on different days. The experiment consisted of 100 trials. Each trial was divided into a learning phase and a test phase (see Fig. 1). During the learning phase, eight hallways, each with two doors, were shown. In each hallway the participants had to choose one door through which they wanted to pass. This decision had no impact on the further presentation but was included to focus attention on the subsequent scene. After this decision, two pictures of indoor scenes were presented, each for 600 ms. The first was the target scene, which the participants had to detect in the test phase; this picture was marked with a green frame. The second picture showed the distractor scene and was marked with a red frame. The test phase followed immediately after all eight hallways had been presented. During the test phase, all hallways were tested. The number of the current hallway was presented as a cue, followed by a test scene. In a yes/no task, participants were asked to press the corresponding mouse button if the presented scene was the target scene they had encountered in the corresponding hallway during the learning phase. Fifty percent of the presented test scenes were sequence-matching target scenes known from the learning phase (correct identity and position); the other 50 percent were homogeneously distributed over distractor scenes of the same hallway (false identity, correct position), new scenes not presented in the learning phase (false identity and position), and target scenes that did not match the corresponding hallway (correct identity, false position). In addition to the psychophysical measurements, ERPs were recorded via EEG, triggered with the test scene presentation.

Fig. 1 Schematic illustration of the learning phase and the test phase. After the learning phase, the lettering "test phase" was presented on the display for three seconds. ERPs were triggered with the onset of the test scene.
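For concreteness, the ERP computation implied here can be sketched as epoching the continuous EEG around each test-scene trigger and averaging the baseline-corrected epochs per condition; the sampling rate, channel count, and window below are illustrative assumptions, not the recording parameters of this study.

```python
# Sketch: averaging EEG epochs time-locked to test-scene onsets into an ERP.
import numpy as np

fs = 500                                       # sampling rate in Hz (assumed)
eeg = np.random.randn(64, 600 * fs)            # channels x samples (placeholder)
triggers = np.array([5000, 120000, 250000])    # test-scene onsets in samples
pre, post = int(0.2 * fs), int(0.8 * fs)       # -200 ms .. +800 ms window

epochs = np.stack([eeg[:, t - pre:t + post] for t in triggers])
baseline = epochs[:, :, :pre].mean(axis=2, keepdims=True)
erp = (epochs - baseline).mean(axis=0)         # channels x time, one condition
```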
Behaviorally, the hit rate (correct recognition of scene identity and position) was about 80 %. The overall correct rejection rate (either identity or position incorrect) was about 85 %. Correct target scenes appearing at incorrect positions were rejected at a rate of about 60 %. Target scenes appearing at a false position were more likely to be rejected as the distance between their presentation in the learning and test sequences increased. ERPs depended on the combination of decision and task condition. The hit condition differed from all other task/response combinations in a relatively weak N300. Especially at the frontal sites, the non-hit combinations lacked a P300 wave, except for the false alarms of non-sequence-matching target scenes, where the ERP approached the level of the P300 of the hit condition abruptly after the peak of the N300. Note that in both of these conditions scene identity was correctly judged whereas position was ignored. They also differed from the other task/response combinations in a relatively weak N400. The parietal P600 wave of the hits differed only from the correct rejections of distractor scenes, the novel scenes, and the missed target scenes. Between 650 and 800 ms, the parietal electrodes recorded a positive voltage shift for the correct rejections of the non-sequence-matching scenes and a negative voltage shift for the false alarms of the non-sequence-matching scenes. No such potentials were found for the other task/response combinations.
The mid-frontal old/new effect of Rugg et al. (1998) seems to be comparable to the N400 effect in our preliminary data. In addition, our results also showed a parietal old/new effect, but without a left lateralization. The results of our experiment cannot be assigned conclusively to one of the postulated memory processes. Furthermore, in the tasks involving non-sequence-matching scenes, the time course of the ERP was reversed after 650 ms. We assume that this effect is a neural correlate of sequence recognition processing.


References
Duarte A, Ranganath C, Winward L, Hayward D, Knight RT (2004) Dissociable neural correlates for familiarity and recollection during the encoding and retrieval of pictures. Cogn Brain Res 18:255-272
Friedmann D (1990) Cognitive event-related potential components during continuous recognition memory for pictures. Psychophysiology 27(2):136-148
Rugg MD, Mark RE, Walla P, Schloerscheidt AM, Birch CS, Allan K (1998) Dissociation of the neural correlates of implicit and explicit memory. Nature 392:595-598
Schyns PG, Oliva A (1994) From blobs to boundary edges: evidence for time- and spatial-scale-dependent scene recognition. Psychol Sci 5(4):195-200

Sensorimotor interactions as signaling games


Felix Leibfried1,2,3, Jordi Grau-Moya1,2,3, Daniel A. Braun1,2
1 Max Planck Institute for Biological Cybernetics, Tübingen, Germany; 2 Max Planck Institute for Intelligent Systems, Tübingen, Germany; 3 Graduate Training Centre of Neuroscience, Tübingen, Germany
In our everyday lives, humans signal their intentions not only through verbal communication but also through body movements (Sebanz et al. 2006; Obhi and Sebanz 2011; Pezzulo et al. 2013), for instance when doing sports, to inform teammates about one's own intended actions or to feint members of an opposing team. We study such sensorimotor signaling in order to investigate how communication emerges and on what variables it depends. In our setup, two players with different aims have partial control in a joint motor task, and one of the two players possesses private information the other player would like to know about. The question then is under what conditions this private information is shared through a signaling process. We manipulated the critical variables given by the costs of signaling and the uncertainty of the ignorant player. We found that the dependency of both players' strategies on these variables can be modeled successfully by a game-theoretic analysis. Signaling games are typically investigated within the context of non-cooperative game theory, where each player tries to maximize their own benefit given the other player's strategy (Cho and Kreps 1987). This allows defining equilibrium strategies in which no player can improve their performance by changing their strategy unilaterally. These equilibria are called Bayesian Nash equilibria, a generalization of the Nash equilibrium concept to the presence of private information (Harsanyi 1968). In general, signaling games allow both for pooling equilibria, where no information is shared, and for separating equilibria with reliable signaling.
In our study we translated the job market signaling game into a sensorimotor task. In the job market signaling game (Spence 1973), there is an applicant (the sender) who has private information about his true working skill, called the type. The future employer (the receiver) cannot know the working skill directly, but only through a signal (for example, educational certificates) that is the more costly to acquire, the less working skill the applicant has. The sender can choose a costly signal that may or may not transmit information about the type to the receiver. The receiver uses this signal to make a decision, trying to match the payment (the action) to the presumed type (working skill) that she infers from the signal. The sender's decision about the signal trades off the expected benefits from the receiver's action against the signaling costs.
To translate this game into a sensorimotor task, we designed a dyadic reaching task that implemented a signaling game with continuous signal, type, and action spaces. Two players sat next to each other in front of a bimanual manipulandum, such that they could not see each other's faces. In this task, each player controlled one dimension of a two-dimensional cursor position. No communication other than the joint cursor position was allowed. The sender's dimension encoded the signal that could be used to convey information about a target position (the type) that the receiver wanted to hit but did not know. The receiver's dimension encoded her action, which determined the sender's payoff. The sender's aim was to maximize a point score that was displayed as a two-dimensional color map. The point score increased with the reach distance of the receiver, so there was an incentive to make the receiver believe that the target was far away. However, the point score also decreased with the magnitude of the signal, so there was an incentive to signal as little as possible due to the implied signaling costs. The receiver's payoff was determined by the difference between her action and the true target position, which was revealed after each trial. Each player was instructed about the setup, their aim, and the possibility of signaling. The question was whether the players' behavior converged to Bayesian Nash equilibria under different conditions in which we manipulated the signaling cost and the variability of the target position. By fitting the participants' signaling variance, we could quantitatively predict the influence of signaling costs and target variability on the amount of signaling. In line with our game-theoretic predictions, we found that increasing signaling costs and decreasing target variability led in most dyads to less signaling. We conclude that the theory of signaling games provides an appropriate framework to study sensorimotor interactions in the presence of private information.

References
Cho I, Kreps D (1987) Signaling games and stable equilibria. Q J Econ 102(2):179-222
Harsanyi J (1968) Games with incomplete information played by Bayesian players, I-III. Part II: Bayesian equilibrium points. Manag Sci 14(5):320-334
Obhi SS, Sebanz N (2011) Moving together: toward understanding the mechanisms of joint action. Exp Brain Res 211(3-4):329-336
Pezzulo G, Donnarumma F, Dindo H (2013) Human sensorimotor communication: a theory of signaling in online social interactions. PLoS ONE 8(11):e79876
Sebanz N, Bekkering H, Knoblich G (2006) Joint action: bodies and minds moving together. Trends Cogn Sci 10(2):70-76
Spence M (1973) Job market signaling. Q J Econ 87(3):355-374

Subjective time perception of verbal action and the sense of agency
Hannah Limerick1, David Coyle1, James Moore2
1 University of Bristol, UK; 2 Goldsmiths, University of London, UK
The Sense of Agency (SoA) is the experience of initiating actions to influence the external environment. Traditionally, SoA has been investigated using experimental paradigms in which a limb movement is required to initiate an action. However, less is known about the SoA for verbal commands, which are a prevalent mode of controlling our external environment; examples are interacting with other agents in our environment or controlling technology via voice interfaces. Here we investigate SoA during verbal control of the external environment using the intentional binding paradigm.
Intentional binding is a phenomenon whereby the perceived action-outcome interval for voluntary actions is shorter than for equivalent passive movements (Haggard, Clark, Kalogeras 2002). In this experimental paradigm, participants report the perceived time of voluntary action initiation and of the consequent effects using the so-called Libet clock. Haggard, Clark, and Kalogeras (2002) found that when participants caused an action, the perceived time of initiation and the perceived time of the outcome were drawn closer together, i.e. the perceived interval between voluntary actions and outcomes was smaller than the actual interval. In the case of involuntary actions, the perceived interval was found to be longer than the actual interval. Importantly, intentional binding is thought to offer a reliable implicit measure of SoA (Moore, Obhi 2012).
In this study we developed a novel adaptation of the intentional binding paradigm in which participants performed both verbal commands (saying the word "go") and limb movements (key-presses) that were followed by an outcome (an auditory tone) after a fixed 500 ms interval. Participants sat at a desk in front of a 24" monitor, which displayed the Libet clock. The experiment used a within-subjects design with one independent variable: input modality (speech input or keyboard input). A keyboard and a microphone were used to register the actions. The trials were separated into individual blocks: operant blocks required the participant to act (either via button press or verbal command) to cause a beep. During the operant trials, participants reported the time of the critical event (either the action or the outcome). Baseline trials had either an action from the participant (with no outcome) or the beep occurring in isolation. During baseline conditions, the participant was required to report the time of the critical event (action or outcome).
We investigated (1) the subjective time of action perception for verbal commands, and (2) the SoA for verbal commands.
Firstly, we found that the average perceived time of action corresponded to the beginning of the utterance. This offers an intriguing insight concerning the cognitive processes underlying action perception for speech. One possible explanation for the action being perceived as occurring at the beginning of the utterance is that once people receive sensory information about their verbal command, the perception of action arises. Theoretically, this explanation is in line with the cue integration theory of agency. Cue integration holds that both internal motor cues and external situational information contribute to the SoA (Wegner, Sparrow 2004; Moore et al. 2009; Moore, Fletcher 2012). It has been suggested that the influence of these cues upon our SoA depends on their reliability (Moore, Fletcher 2012). According to the cue integration concept, multiple agency cues are weighted by their relative reliability and then optimally integrated to reduce the variability of the estimated origins of an action. For speech, it may be the case that hearing one's own voice is a highly reliable agency cue and enough to label the action as initiated. Of course, further investigation is required; a larger sample size and other measurements of action perception (such as EEG) will be vital in determining the perception of action for verbal commands. These insights will be valuable, particularly for designers of speech recognition software and voice-based interfaces.
To address question (2) above, we tested whether binding occurred within each input modality. We conducted a 2x2 repeated-measures analysis of variance comparing event type (action/outcome) and context (operant/baseline). The key-press condition resulted in a significant interaction between context and event. Follow-up t-tests comparing operant actions and baseline actions showed a significant difference, t(13) = -5.103, p < .001. This shows that operant actions
were perceived later than the baseline. A t-test comparing the perceived times of the operant tone condition and the baseline tone condition showed a significant difference, t(13) = 2.374, p < .05; therefore operant tones were perceived earlier than the baseline. The same analysis was repeated for the speech condition, which resulted in a trend towards significance for the interaction between context and event (F(1,13) = 3.112, p = .101). Because this was a preliminary investigation, we performed exploratory follow-up paired t-tests comparing operant actions and baseline actions and found a significant difference, t(12) = -2.257, p < .05, indicating that operant actions were perceived later than the baseline and thus that action binding was occurring. A t-test comparing operant outcomes and baseline outcomes gave a non-significant difference, t(13) = .532, p = .604. Therefore the operant outcome condition was not perceived significantly earlier than the baseline, and the outcome-binding component of intentional binding was not present.
Although intentional binding was present for limb movements (consistent with the existing literature), it was absent for verbal commands. There are several possible explanations for this. One possibility is that intentional binding simply does not occur for verbal commands. It is also possible that intentional binding is present at different scales across different sensorimotor modalities. Another explanation, in line with the cue integration approach to SoA (described above), is that there are differences in the amount of sensory cues provided to the participant to confirm that the action has occurred. Key-presses involve proprioceptive, visual, haptic, and auditory cues, which are all integrated to inform the SoA for an action. For verbal commands, there are fewer sensory cues: proprioceptive and auditory. Fewer agency cues involved in verbal commands may result in no intentional binding effect. Therefore, further investigation should determine whether different factors within the experimental setup have an impact on intentional binding for verbal commands. Alterations such as longer or shorter timescales, different forms of outcome (e.g. non-auditory), or additional agency cues may alter intentional binding. There may also be experimental factors that led to no intentional binding being present for the verbal condition. Typically a speech recognizer needs to process the entire utterance and perform recognition before registering it as an action. However, as we discussed above, the user typically considered their utterance as an action roughly at the beginning of the utterance, thus giving a variable delay between action and outcome. Intentional binding studies have found that the binding phenomenon breaks down beyond 650 ms (Haggard, Clark, Kalogeras 2002). This may also explain the lack of tone binding found here. Interestingly, further exploratory analyses of the speech data suggest that the action component of intentional binding was present but the outcome component was absent (hence the apparent lack of overall binding). This suggests that an element of binding is occurring here.
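For concreteness, binding scores in this paradigm are typically derived from Libet-clock judgment errors (reported minus actual event time), comparing operant against baseline trials; the sketch below uses placeholder values, not the study's data.

```python
# Sketch: computing action and outcome binding from judgment errors (ms).
import numpy as np

# judgment errors for one participant (positive = perceived later)
action_operant = np.array([35., 50., 42., 61.])
action_baseline = np.array([-5., 8., 2., -1.])
tone_operant = np.array([-70., -55., -63., -48.])
tone_baseline = np.array([-10., 5., -2., 0.])

# Action binding: operant actions shift later, toward the outcome.
action_binding = action_operant.mean() - action_baseline.mean()
# Outcome binding: operant tones shift earlier, toward the action.
outcome_binding = tone_operant.mean() - tone_baseline.mean()
print(f"action binding: {action_binding:.0f} ms, "
      f"outcome binding: {outcome_binding:.0f} ms")
```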
References
Haggard P, Clark S, Kalogeras J (2002) Voluntary action and conscious awareness. Nat Neurosci 5(4):382-385. doi:10.1038/nn827
Moore J, Fletcher P (2012) Sense of agency in health and disease: a review of cue integration approaches. Conscious Cogn 21(1):59-68. doi:10.1016/j.concog.2011.08.010
Moore J, Obhi S (2012) Intentional binding and the sense of agency: a review. Conscious Cogn 21(1):546-561. doi:10.1016/j.concog.2011.12.002
Moore JW, Wegner DM, Haggard P (2009) Modulating the sense of agency with external cues. Conscious Cogn 18(4):1056-1064. doi:10.1016/j.concog.2009.05.004
Wegner DM, Sparrow B (2004) Authorship processing. In: Gazzaniga M (ed) The cognitive neurosciences III. MIT Press, Cambridge, MA, pp 1201-1209

Memory disclosed by motion: predicting visual working memory performance from movement patterns
Johannes Lohmann, Martin V. Butz
Cognitive Modeling, Department of Computer Science, University of Tübingen, Germany
Abstract
Embodied cognition proposes a close link between cognitive and motor processes. Empirical support for this notion comes from research applying hand-tracking in decision-making tasks. Here we investigate whether similar systematics can be revealed in a visual working memory (VWM) task. We trained recurrent neural networks (RNNs) to predict memory performance from the velocity patterns of mouse trajectories. In contrast to previous studies, the responses were not speeded. The results presented here reflect work in progress and more detailed analyses are pending; in particular, the low generalization performance on unknown data requires a more thorough investigation. So far, the results indicate that even small RNNs can predict participants' working memory state from raw mouse-tracking data.
Keywords
Mouse tracking, Recurrent neural networks, Visual working memory
Introduction
With the embodied turn in cognitive science, and due to the reconsideration of cognition in terms of a dynamic system (Spivey and Dale 2006), the dynamic coupling between real-time cognition and motor responses has become a prominent topic in cognitive psychology. Freeman et al. (2011) provided a first review of this body of research, concluding that movement trajectories convey rich and detailed information about ongoing cognitive processes. Most studies investigating this coupling applied speeded responses, where participants were instructed to respond as accurately and as fast as possible. Here we investigate whether movement characteristics are also predictive of higher cognitive functions in the case of non-speeded responses. More precisely, we analyze mouse trajectories obtained in a visual working memory (VWM) experiment and try to predict recall performance (how well an item was remembered) from movement characteristics.
Experimental Setup
Mouse trajectories were obtained during a VWM experiment applying a delayed cued-recall paradigm with continuous stimulus spaces (see Zhang and Luck 2009 for a detailed description of the paradigm). In each trial, participants had to remember three or six stimuli. After a variable interstimulus interval (ISI), they had to report the identity of one of them. The stimuli consisted of either colored squares or Fourier descriptors. Memory performance in terms of precision was quantified as the angular distance between the reported and the target stimulus. At the end of the ISI, one of the previous stimulus locations was highlighted and the mouse cursor appeared at the center of the screen. Around the center, either a color or a shape wheel, depending on the initial stimuli, was presented, and participants had to click at the location that matched the stimulus at the cued location. The responses were not speeded, and participants were instructed to take as much time as they wanted for the decision. The trajectory of the mouse cursor was continuously tracked at a rate of 50 Hz. We obtained 4,000 trajectories per participant.
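The precision measure can be written as a small helper that computes the circular distance between the reported and the cued position on the response wheel; the function below is an illustrative reconstruction, not the authors' code.

```python
# Sketch: circular (angular) distance on a continuous response wheel.
def angular_distance(reported_deg: float, target_deg: float) -> float:
    """Smallest-magnitude angle between two wheel positions, in degrees."""
    diff = (reported_deg - target_deg + 180.0) % 360.0 - 180.0
    return abs(diff)

print(angular_distance(350.0, 10.0))   # 20.0, not 340.0
```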
Network Training
We used the trajectory data to train Long Short-Term Memory (LSTM; Gers et al. 2003) networks to predict memory performance based on the velocity pattern of the first twenty samples of a mouse trajectory. We chose LSTMs over other possible classifiers because LSTMs are well suited to precisely identify predictive temporal dependencies in time series, which is difficult for other algorithms such as Hidden Markov Models. We used the raw velocity vectors of the trajectories as inputs, without applying any normalization.
We did not require the network to learn a direct mapping between movement trajectories and reported angular distances (referred to as D in the plots). Rather, we labeled each trajectory, based on the data obtained in the respective condition, as either low distance or high distance and trained the network as a binary classifier. Trajectories that led to an angular distance below the 33 % quantile (Q(33) in Fig. 1) were labeled as low distance; trajectories that led to angular distances above the 66 % quantile (Q(66) in Fig. 1) were labeled as high distance. The intermediate range between the 33 % and 66 % quantiles was not used for training. Labels were assigned based on the response distribution of the respective experimental condition. Hence, the same angular distance did not always lead to the same label assignment, and a suitable network had to learn to perform a relative quality judgment instead of a mere median split. Half of the 4,000 trajectories of a participant were used for the training of a single network; from these 2,000 trajectories, the 33 % labeled as low distance and the 33 % labeled as high distance were used in the training. We compared the performance of networks consisting of either 5, 10, or 20 LSTM blocks. For each network size, ten networks were trained.
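The condition-wise quantile labeling can be sketched as follows; the error values and shapes are placeholders for a single experimental condition, not the study's data.

```python
# Sketch: quantile-based binary labeling of recall errors within a condition.
import numpy as np

rng = np.random.default_rng(2)
angular_dist = rng.uniform(0, 90, size=2000)   # recall errors in degrees

q33, q66 = np.quantile(angular_dist, [0.33, 0.66])
labels = np.full(angular_dist.shape, -1)       # -1 = intermediate, unused
labels[angular_dist < q33] = 0                 # "low distance"  (good recall)
labels[angular_dist > q66] = 1                 # "high distance" (poor recall)

train_mask = labels >= 0                       # middle third excluded from training
```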
Results
The presented results were obtained with the 4,000 trajectories of one
participant. Depending on the network size, classification performance increased from 60 % after the first learning epochs up to 80 %
at the end of the training.
Fig. 1 provides an overview of the results. One-sample t tests revealed that both the proportions of correct classifications and of misclassifications differed significantly from chance level for the two trained categories. Paired t tests indicated that the proportions of correct classifications and misclassifications differed significantly within the trained categories.
Despite the apparent ability of the networks to acquire criteria to
distinguish trajectories associated with either low or high angular
distance, cross-validation results were rather poor, yet still significantly above chance level.
Fig. 1 Aggregated evaluation of the 30 networks after 50 learning epochs. Black bars indicate correct classification performance for the two trained categories. Error bars indicate the standard error of the mean. Significant differences between classifications within one category are marked with an asterisk
Discussion
In this study we investigated whether motion patterns are still predictive of cognitive performance in the case of non-speeded responses. We trained comparatively small recurrent neural networks to predict the precision of memory recall from mouse movement trajectories. Even if the generalization performance obtained so far is rather low, our preliminary results show that characteristics of non-speeded movements can be predictive of the performance of higher cognitive functions like VWM state and retrieval.
References
Freeman JB, Dale R, Farmer TA (2011) Hand in motion reveals mind in motion. Front Psychol 2:59
Gers FA, Schraudolph NN, Schmidhuber J (2003) Learning precise timing with LSTM recurrent networks. JMLR 3:115–143
Spivey MJ, Dale R (2006) Continuous dynamics in real-time cognition. Curr Dir Psychol Sci 15(5):207–211
Zhang W, Luck SJ (2009) Sudden death and gradual decay in visual working memory. Psychol Sci 20(4):423–428

Role and processing of translation in biological motion perception

Jana Masselink, Markus Lappe
University of Münster, Germany
Keywords
Human walking, Biological motion processing, Translation
Visual perception of human movement is often investigated using
point-light figures walking on a treadmill. However, real human
walking does not only consist of a change in the position of the limbs
relative to each other (referred to as articulation), but also of a change
in body localization in space over time (referred to as translation). In
point-light displays this means that the motion vector of each dot is
composed of both an articulatory and a translatory component. We
have examined the influence and processing mechanisms of this
translation component in perception of point-light walkers. In three
experiments, each with a two-alternative forced-choice task, observers judged the apparent facing orientation and articulation (in terms of walking direction and forward/backward discrimination, respectively) of a point-light walker viewed from the side. Translation could be either consistent or inconsistent with facing/articulation, or absent altogether (treadmill walking).
(treadmill walking). Additionally, stimuli differed in point lifetime to
manipulate the presence of local image motion. Stimuli were presented for 200 ms to prevent eye movements to the translating
stimulus. Although participants were explicitly instructed to judge
facing orientation and articulation regardless of translation, results
revealed an effect of translation in terms of a response bias in translation direction in all three tasks. As translation had an effect even on walkers with an absent local motion signal in the facing orientation and walking direction tasks, we conclude that the global motion of the center-of-mass of the dot pattern is relevant to the processing of translation. Overall, translation direction seems to influence both
perception of form and motion of a walker. This supports the idea that
translation interacts with both the posture-based analysis of form and
the posture-time-based analysis of articulation in the perception of
human body motion.
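The decomposition the study rests on can be made explicit in a small sketch (our code; the array layout and units are assumptions, not taken from the abstract):

```python
import numpy as np

# Each dot's frame-to-frame motion vector is the sum of a translatory
# component (motion of the pattern's center of mass) and an articulatory
# component (residual motion of the dot relative to the center of mass).
def decompose_motion(frames):
    """frames: array of shape (T, n_dots, 2) with dot positions over time."""
    com = frames.mean(axis=1, keepdims=True)       # center of mass per frame
    translation = np.diff(com, axis=0)             # global (translatory) motion
    articulation = np.diff(frames - com, axis=0)   # residual (articulatory) motion
    return translation, articulation

# A treadmill walker yields (near-)zero translation; a translating walker
# adds the same global vector to every dot on top of its articulation.
```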

How to remember Tübingen? Reference frames in route and survey knowledge of one's city of residency

Tobias Meilinger1, Julia Frankenstein2, Betty J. Mohler1, Heinrich H. Bülthoff1
1 Max Planck Institute for Biological Cybernetics, Tübingen, Germany; 2 Cognitive Science, ETH Zurich, Switzerland
Knowledge underlying everyday navigation is distinguished into
route and survey knowledge (Golledge 1999). Route knowledge
allows re-combining and navigating familiar routes. Survey
knowledge is used for pointing to distant locations or finding novel
shortcuts. We show that within one's city of residency route and survey knowledge root in separate memories of the same environment and are represented within different reference frames.
Twenty-six Tübingen residents, who had lived there for seven years on average, faced a photorealistic virtual model of Tübingen and completed a survey task in which they pointed to familiar target locations from various locations and orientations. Each participant's performance was most accurate when facing north, and errors increased as participants' deviation from a north-facing orientation increased. This suggests that participants' survey knowledge was organized within a single, north-oriented reference frame.
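This analysis logic can be illustrated with a short sketch (our code; all numbers are invented for illustration):

```python
import numpy as np

def angdiff(a, b):
    """Smallest signed angular difference a - b in degrees, in (-180, 180]."""
    return (a - b + 180.0) % 360.0 - 180.0

facing = np.array([0.0, 45.0, 90.0, 180.0, 270.0])     # facing orientations
pointed = np.array([30.0, 80.0, 140.0, 250.0, 355.0])  # reported directions
target = np.array([28.0, 70.0, 120.0, 220.0, 320.0])   # true directions

error = np.abs(angdiff(pointed, target))               # absolute pointing error
deviation_from_north = np.abs(angdiff(facing, 0.0))
r = np.corrcoef(deviation_from_north, error)[0, 1]
# A positive correlation r is the signature of a single, north-oriented
# reference frame.
```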
One week later, 23 of the same participants completed route knowledge tasks comprising the very same start and goal locations used in the survey task before. Now participants did not point to a goal location, but used the arrow keys of a keyboard to enter route decisions along an imagined route leading to the goal. Deviations from the correct number of left, straight, etc. decisions and response latencies were completely uncorrelated with errors and latencies in pointing. This suggests that participants employed different and independent representations for the matched route and survey tasks.
Furthermore, participants made fewer route errors when asked to respond from an imagined horizontal walking perspective rather than from an imagined constant aerial perspective, which replaced left/straight/right decisions with up/left/right/down decisions as on a map (the order of the tasks was balanced). This performance advantage suggests that participants did not rely on the single, north-up reference frame used for pointing. Route and survey knowledge were thus organized along different reference frames.
We conclude that our participants' route knowledge employed multiple local reference frames acquired from navigation, whereas their survey knowledge relied on a single north-oriented reference frame learned from maps. Within their everyday environment, people seem to use map-based or navigation-based knowledge according to which best suits the task.
Reference
Golledge RG (1999) Wayfinding behavior. The Johns Hopkins University Press, Baltimore

The effects of observing other people's gaze: faster intuitive judgments of semantic coherence

Romy Müller
Technische Universität Dresden, Germany
Introduction
Our actions are modulated by the observation of others' behavior, especially when we represent others as intentional agents. However, inferring intentions can even be accomplished on the basis of seeing someone's gaze. Do eye movements also exert a stronger influence on an observer when they are ascribed to a human instead of a machine? Indeed, reflexive shifts of attention in response to gaze shifts are modulated by subjects' beliefs (Wiese et al. 2012): a human face elicited stronger gaze cueing effects than a robot face, but this difference disappeared when the instruction stated that both stimuli were of the same origin (i.e. either produced by a human or by a machine). This suggests that beliefs about someone else's visual attention can exert a direct influence on our own processing of the same stimuli.
A possible way in which the interpretation of gaze as human can
affect our processing is that we try to infer the meaning of the things
another person is attending to. In this case, observers of object-
directed gaze should be more likely to perceive a coherent relation in
the objects that they see being looked at. To test this, the present study
used the Remote Associates Test (Mednick 1962), in which subjects decide whether word triads are coherent, i.e. whether they allow meaningful combinations with a fourth word. Before each decision, a dot moved across the words, and subjects were either told that it represented the eye movements of a human trying to find word associations, or a computer-generated control. It was hypothesized that interpreting the dot as someone's gaze would increase the frequency of and reduce the time for intuitive judgments, namely those for which subjects assume a coherent relation but cannot name a solution.
Methods
Sixteen subjects participated in the experiment, and their eye movements were tracked with an SR EyeLink 1000. Each trial comprised a preview video with cursor overlay and a word triad. Videos showed a 5 × 4 grid of rectangles containing 20 words, three of which had to be rated for coherence later. A purple dot cursor (15 px) moved across the grid, resting either on the three words that were chosen later or on three other words. Contrary to what subjects were told, the cursor always was a real eye movement recording. Each subject saw 100 triads, one after each video. All triads were composed of words from the respective video, but only in half of the trials had these words been cued by the cursor.
Subjects were instructed that the cursor depicted eye movements ("gaze") or a computer-generated control ("dot"). No strategy of using the cursor was instructed. Each trial started with a video, which was followed by a triad that remained on the screen until subjects pressed a key to indicate whether it was coherent or not. If they negated, the response was counted as incoherent. After a positive response, they were asked to submit the solution word. If they gave no solution or a wrong solution, this was counted as a "yes + unsolved" response, whereas trials with correct solution words were classified as "yes + solved". Subjects worked through two blocks of 50 trials, with each block corresponding to one of the cursor conditions.
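The response coding just described can be summarized in a small sketch (function name and labels are ours):

```python
# A "no" answer is coded as incoherent; a "yes" answer is split by whether
# the submitted solution word matches the correct fourth word.
def code_response(said_yes, solution_word, correct_word):
    if not said_yes:
        return "incoherent"
    if solution_word is not None and solution_word == correct_word:
        return "yes + solved"
    return "yes + unsolved"   # intuitive: judged coherent, solution not named

code_response(True, "cheese", "cheese")  # -> 'yes + solved'
code_response(True, None, "cheese")      # -> 'yes + unsolved'
```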
Results
The frequency distributions of the three response types (yes + solved, yes + unsolved, incoherent) were compared between both cursors and there was no difference, χ²(2) = 1.546, p = .462. Specifically, the amount of yes + unsolved (intuitive) responses was similar for gaze and dot (24.6 and 22.5 %), and this also did not depend on whether the triad had been cued by the cursor during the video or not, both Fs < 1, both ps > .3. Mean response times did not differ between gaze and dot overall (9.5 vs. 9.4 s), F < 1, p > .8, but cursor interacted with response type, F(2,28) = 6.052, p = .007, indicating that only the yes + unsolved responses were faster for gaze than for dot (8.9 vs. 11.3 s), p = .02. In contrast, there was no difference for yes + solved and incoherent responses, both ps > .6. There was no main effect of or interaction with cueing, both Fs < 1, both ps > .6, suggesting that the speed advantage for yes + unsolved responses in gaze was unspecific, i.e. it also occurred for triads that had not been cued by the gaze cursor (Fig. 1).
To investigate the impact of the two cursors on subjects' visual attention, subjects' eye movements were analyzed in terms of the time spent on the three cued areas within a grid. This time did not differ between gaze and dot (39.0 vs. 41.9 %), t(15) = 1.36, p = .193. Thus, although there was quite some interindividual variation in subjects' strategies of using the cursor, most subjects looked at the gaze and dot cursor in a similar manner.
Fig. 1 Percentage of responses (A) and response times (B) depending on cursor and response type. The percentage of time spent on the cued areas for every single subject (C) was similar for both cursors
Discussion
The present results indicate that observing another person's eye movements can affect the coherence we assume in the things being looked at. When subjects believed that they saw a depiction of gaze on word triads, their intuitive classifications as coherent were no more frequent (perhaps due to a lack of sensitivity) but faster than when
they interpreted the exact same cursor as non-human. Thus, it appears
that seeing someone else looking at objects makes people assume that
there must be something in it, especially when they cannot name it.
Interestingly, the effect was not specific to cued triads, suggesting that
with gaze transfer the overall readiness for assuming coherence was
higher. In light of this result, it is possible that gaze increased subjects' openness to uncertain judgments more than it affected their actual processing of the objects. This question will have to remain for
future research.
In contrast to what could be predicted on the basis of previous
work (Wiese et al. 2012), subjects' visual attention allocation did not differ between gaze and dot. First, this rules out the possibility that differences between both cursors occurred only because subjects had ignored the presumably irrelevant dot. Moreover, it raises the question of to what degree and on what level of processing more abstract
depictions of intentional behavior (such as cursors) can exert an
influence. This has implications for basic research on social attention
and joint action as well as for applied topics such as the visualization
of eye movements or computer-mediated cooperation with real and
virtual agents.
References
Mednick SA (1962) The associative basis of creativity. Psychol Rev 69(3):220–232
Wiese E, Wykowska A, Zwickel J, Müller HJ (2012) I see what you mean: how attentional selection is shaped by ascribing intentions to others. PLoS ONE 7(9):e45391

Towards a predictive processing account of mental agency

Iuliia Pliushch, Wanja Wiese
Johannes Gutenberg University Mainz, Mainz, Germany
The aim of this paper is to sketch conceptual foundations for a predictive
processing account of mental agency. Predictive processing accounts
define agency as active inference (as opposed to perceptual inference,
cf. Hohwy 2013, Friston 2009, Friston et al. 2012, Friston et al. 2013).
Roughly speaking, perceptual inference is about modeling the causal
structure of the world internally; active inference is about making the
world more similar to the internal model. Existing accounts, however,
so far mainly deal with bodily movements, but not with mental actions
(cf. Proust 2013, Wu 2013; the only conceptual connection between
active inference and mental action we know of is made in Hohwy 2013,
pp 197–199). Mental actions are important because they do not just
determine what we do, they determine who we are.
The paper is structured as follows. (I) First, we will briefly explain
the notion of active inference. (II) After that, we will review purely

S55
philosophical accounts of mental agency. (III) Finally, we will highlight aspects of mental agency that need to be explained by predictive
processing accounts and, more specifically, suggest possible conceptual connections between mental actions and active inference.
(I) What is active inference? Two aspects of agency explanations
have been emphasized in the predictive processing literature:
1. The initiation of action: According to the framework provided
by Karl Friston's free-energy principle, agency emerges from active
inference. In active inference, changes in the external world that can
be brought about by action are predicted. In order to cause these
changes (instead of adjusting the internal model to the sensory input),
the organism has to move. In beings like us, this involves changing
the states of our muscles. Therefore, changes in proprioceptive sensors are predicted. These evoke proprioceptive prediction errors
(PPE). If these errors are just used to adjust proprioceptive predictions, no action occurs. Therefore, PPEs that are sent up the
processing hierarchy have to be attenuated by top-down modulation,
in other words: their expected precision must be lowered (Brown et al.
2013). Overly precise PPEs just lead to a change of the hypothesis,
while imprecise PPEs lead to action. The initiation of action therefore
crucially depends on precision optimization at the lower end of the
processing hierarchy (the expected precision of bottom-up sensory
signals has to be low, relative to the precision of top-down proprioceptive predictions).
2. The choice and conductance of action: Agency (a goal-directed
kind of behavior) has been explained as active inference (e.g., Friston
et al. 2013; Moutoussis et al. 2014). Agents possess a representation
of a policy which is a sequence of control states (where control states
are beliefs about future action, cf. Friston et al. 2013, p 3): "[A]ction is selected from posterior beliefs about control states. […] these posterior beliefs depend crucially upon prior beliefs about states that will be occupied in the future" (Friston et al. 2013, p 4). In this
process, precision is argued to play a dual biasing role: biasing perception toward goal states and enhancing confidence in action choices
(cf. 2013, p 11). The latter fact may influence the phenomenology of
the agent (cf. Mathys et al. 2011, p 17).
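This dual role can be glossed, in a simplified rendering of the formalism of Friston et al. (2013) rather than their exact equations, as a precision-weighted softmax over policies:

P(\pi) \propto \exp\bigl(-\gamma\, G(\pi)\bigr)

where G(\pi) is the expected free energy of policy \pi and the precision \gamma controls how sharply the posterior concentrates on the best policies, and hence the agent's confidence in its action choices.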
From the point of view of predictive processing, two aspects are
central to the explanation of agency: precision and the fact that
possible, attainable counterfactual states are represented. Determining
which counterfactual states minimize conditional uncertainty about
hidden states corresponds to action selection (cf. Friston et al. 2012,
p 4). Optimizing precision expectations enables action, which is
ultimately realized by attenuating proprioceptive prediction error
through classical reflex arcs (Brown et al. 2013, p 415).
Anil Seth (2014) also emphasizes the importance of counterfactually-rich generative models: models that "[…] encode not only the likely causes of current sensory inputs, but also the likely causes of those sensory inputs predicted to occur given a large repertoire of possible (but not necessarily executed) actions […]" (p 2). Seth
(2014) argues that counterfactually-rich generative models lead to the
experience of perceptual presence (subjective veridicality). This
suggests that counterfactual richness could possibly also play a role in
explaining the phenomenal sense of mental agency.
(II) What is a mental action? Here, we briefly review accounts
proposed by Joëlle Proust (2013) and Wayne Wu (2013), respectively.
According to Proust, mental actions depend on two factors: an
informational need and a specific epistemic norm (cf. 2013, p 161).
As an example of an informational need, Proust gives remembering
the name of a play. Crucially, the agent should not be satisfied with
any possible name that may pop into her mind. Rather, the agent must
be motivated by epistemic norms like accuracy or coherence. Agents
who are motivated by epistemic norms have epistemic feelings
reflecting the extent to which fulfilling the informational need is a
feasible task: These feelings predict the probability for a presently
activated disposition to fulfill the constraints associated with a given
norm []. (2013, p 162). Wu (2013) defines mental action as

123

S56
selecting a path in behavioral space with multiple inputs (memory
contents) and outputs (possible kinds of behavior). The space of
possible paths is constrained by intentions (cf. 2013, p 257). This is
why it constitutes a kind of agency, according to Wu.
(III) In what follows, we provide a list of possible conceptual
connections between active inference and mental agency, as well as
targets for future research.
1. Mental and bodily actions have similar causal enabling conditions.
The initiation of mental as well as bodily action depends on the
right kind of precision expectations. This renders mental and bodily
actions structurally similar. Mental actions can be initiated at every
level of the processing hierarchy. At each level, the magnitude of
expected precisions may vary. As in bodily active inference, the
precisions of prior beliefs about desired events must be high
enough; otherwise, the informational need will simply be ignored.
Furthermore, allocating attention may be a factor contributing to
the success of mental actions (e.g., attending away from one's
surroundings if one wants to remember something).
2. The contents of a mental action cannot typically be determined at
will prior to performing the action (cf. Proust 2013, p 151).
Example: I cannot try to remember the name "John Wayne". (But I
can try to remember the name of the famous American western
movie actor.) Similarly, in active inference, actions themselves
need not be represented, only hidden states that are affected by
action (cf. Friston et al. 2012, p 4). The system possesses
counterfactual representations whose content is "[…] what we would infer about the world, if we sample it in a particular way" (Friston et al. 2012, p 2). In the case of perception, it could be the "[…] visual consequences of looking at a bird" (p 4). In the case
of remembering, it could be the consequences that the remembered content would produce in the generative model. A central
question that remains to be answered here is to what extent this
would call for an extension of the predictive processing
framework, in the sense that counterfactuals about internal
consequences would also have to be modeled. Interestingly,
conducting mental actions may often be facilitated by refraining
from certain bodily actions. Imagining, for instance, may be
easier with one's eyes closed. In terms of predictive processing,
this means that visual input is predicted to be absent, and bodily
action ensues in order to make the world conform to this
prediction (i.e., one closes one's eyes).
3. Both bodily and mental actions can be accompanied by a
phenomenal sense of agency. For the sense of bodily agency,
comparator models have been proposed (cf. Frith 2012; Seth
2013). For the sense of mental agency (cf. Metzinger 2013), at
least the following questions need to be answered: (1) Is it
possible to explain the sense of mental agency with reference to a
comparison process? (2) If yes, what kinds of content are
compared in this process? A possible mechanism could compare
the predicted internal consequences with the actual changes in the
generative model after the mental action has been performed.
4. Proust (2013) argues that mental agency is preceded and followed
by epistemic feelings. The latter reflect the uncertainty that the
right criteria for the conductance of a mental action have been
chosen and that it has been performed in accordance with the
chosen criteria. We speculate that the phenomenal certainty that a
mental action will be successful depends both on the prior
probability of future states, and on the conditional probabilities of
those states given (internal) control states (thereby, it indirectly
depends on counterfactual richness: the more possibilities to
realize a future state, the higher the probability that the state will
be obtained).
5. A possible problem for predictive processing accounts of mental
agency arises from the role of attention. Predictive processing
accounts define attention as the optimization of precision estimates
(cf. Feldman & Friston 2010; Hohwy 2012). Precision estimates,
and therefore attention, play a crucial role both in active inference
and in mental agency. However, some attentional processes, like
volitional attention, have also been described as a kind of mental
action (cf. Metzinger 2013, p 2; Hohwy 2013, pp 197–199). It is
thus an open challenge to show how attentional processes that are constitutive aspects of mental action differ from those that are a kind of mental action themselves.

Acknowledgment
The authors are funded by the Barbara Wengeler Foundation.
References
Brown H, Adams RA, Parees I, Edwards M, Friston K (2013) Active inference, sensory attenuation and illusions. Cogn Process 14(4):411–427
Feldman H, Friston KJ (2010) Attention, uncertainty, and free-energy. Front Human Neurosci 4:215
Friston K (2009) The free-energy principle: a rough guide to the brain? Trends Cogn Sci 13(7):293–301
Friston K, Adams RA, Perrinet L, Breakspear M (2012) Perceptions as hypotheses: saccades as experiments. Front Psychol 3:151
Friston K, Schwartenbeck P, FitzGerald T, Moutoussis M, Behrens T, Dolan RJ (2013) The anatomy of choice: active inference and agency. Front Human Neurosci 7
Frith C (2012) Explaining delusions of control: the comparator model 20 years on. Conscious Cogn 21(1):52–54
Hohwy J (2012) Attention and conscious perception in the hypothesis testing brain. Front Psychol 3:96
Hohwy J (2013) The predictive mind. Oxford University Press, Oxford
Mathys C, Daunizeau J, Friston KJ, Stephan KE (2011) A Bayesian foundation for individual learning under uncertainty. Front Human Neurosci 5
Metzinger T (2013) The myth of cognitive agency: subpersonal thinking as a cyclically recurring loss of mental autonomy. Front Psychol 4:931
Moutoussis M, Fearon P, El-Deredy W, Dolan RJ, Friston KJ (2014) Bayesian inferences about the self (and others): a review. Conscious Cogn 25:67–76
Proust J (2013) Philosophy of metacognition: mental agency and self-awareness. Oxford University Press, Oxford
Seth AK (2013) Interoceptive inference, emotion, and the embodied self. Trends Cogn Sci 17(11):565–573
Seth AK (2014) A predictive processing theory of sensorimotor contingencies: explaining the puzzle of perceptual presence and its absence in synesthesia. Cogn Neurosci 1–22
Wu W (2013) Mental action and the threat of automaticity. In: Clark A, Kiverstein J, Vierkant T (eds) Decomposing the will. Oxford University Press, Oxford, pp 244–261

The N400 ERP component reflects implicit prediction error in the semantic system: further support from a connectionist model of word meaning

Milena Rabovsky1, Daniel Schad2, Ken McRae3
1 Department of Psychology, Humboldt University at Berlin, Germany; 2 Charité Universitätsmedizin Berlin, Germany; 3 University of Western Ontario, London, Ontario, Canada
Even though the N400 component of the event-related brain
potential (ERP) is widely used to investigate language and semantic
processing, the specific mechanisms underlying this component are
still under active debate (Kutas, Federmeier 2011). To address this
issue, Rabovsky and McRae (2014) recently used a feature-based
connectionist attractor model of word meaning to simulate seven
N400 effects. We observed a close correspondence between N400
amplitudes and semantic network error, that is, the difference
between the activation pattern produced by the model over time and
the activation pattern that would have been correct. Here, we present
additional simulations further corroborating this relationship, using
the same network as in our previous work, with 30 input units
representing word form that directly map onto 2,526 semantic feature units representing word meaning, according to empirically
derived semantic feature production norms (McRae et al. 2005). The
present simulations focus on influences of orthographic neighbors,
which are words that can be derived from a target by exchanging a
single letter, preserving letter positions. Specifically, empirical ERP
research has shown that words with many orthographic neighbors
elicit larger N400 amplitudes. We found that a model analogue of
this measure (i.e., the number of word form representations differing
in a single input unit from the target) increases network error.
Furthermore, the frequency of a word's orthographic neighbors has
been shown to play an important role, with orthographic neighbors
that occur more frequently in language producing larger N400
amplitudes than orthographic neighbors that occur less frequently.
Again, our simulations showed a similar influence on network error.
In psychological terms, network error has been conceptualized as
implicit prediction error, and we interpret our results as yielding
further support for the notion that N400 amplitudes reflect implicit
prediction error in the semantic system (McClelland 1994; Rabovsky, McRae 2014).
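A minimal sketch of the two model analogues (our reconstruction; the model's exact error function and representations may differ):

```python
import numpy as np

# "Semantic network error": summed discrepancy between the feature pattern
# the network settles into and the correct pattern for the word.
def network_error(output_features, target_features):
    return np.sum(np.abs(output_features - target_features))

# Model analogue of orthographic neighborhood size: word forms in the
# lexicon that differ from the target in exactly one of the 30 input units.
def n_neighbors(target_form, lexicon_forms):
    return sum(int(np.sum(form != target_form) == 1) for form in lexicon_forms)
```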
References
Kutas M, Federmeier KD (2011) Thirty years and counting: finding meaning in the N400 component of the event-related brain potential (ERP). Annu Rev Psychol 62:621–647
McClelland JL (1994) The interaction of nature and nurture in development: a parallel distributed processing perspective. In: Bertelson P, Eelen P, d'Ydewalle G (eds) International perspectives on psychological science, vol 1. Erlbaum, UK
McRae K, Cree GS, Seidenberg MS, McNorgan C (2005) Semantic feature production norms for a large set of living and nonliving things. Behav Res Methods 37(4):547–559
Rabovsky M, McRae K (2014) Simulating the N400 ERP component as semantic network error: insights from a feature-based connectionist attractor model of word meaning. Cognition 132:68–89

Similar and differing processes underlying carry and borrowing effects in addition and subtraction: evidence from eye-tracking

Patricia Angela Radler1, Korbinian Moeller2,3, Stefan Huber2, Silvia Pixner1
1 Institute for Psychology, UMIT - Health and Life Sciences University, Hall, Tyrol, Austria; 2 Knowledge Media Research Center, Tübingen, Germany; 3 Department of Psychology, Eberhard-Karls University, Tübingen, Germany
Keywords
Eye fixation behavior, Addition, Subtraction, Carry-over, Borrowing
Recent research indicated that investigating participants' eye fixation behavior (Rayner 1998; Rakoczi 2012) can be informative for evaluating processes underlying numerical cognition (Geary et al. 1993;
Green et al. 2007; Moeller et al. 2011a; Moeller et al. 2011b).
However, so far, there are only a few studies using this methodology to
better understand the processes involved in mental arithmetic with a
specific focus on addition (Green et al. 2007; Moeller et al. 2011a;
Moeller et al. 2011b). In this context, Moeller and colleagues (2011b)
suggested that successful application of the carry-over procedure in
addition (e.g., 23 + 41 = 64 vs. 28 + 36 = 64) involves at least three
underlying processes. First, the sum of the unit digits is computed
already in first pass encoding (i.e., 3 + 1 = 4 vs. 8 + 6 = 14 in above
examples). Second, based on this unit sum the need for a carry-over
procedure is evaluated (with the need for a carry-over indicated by a
unit sum of ≥ 10). Third, the carry-over procedure has to be executed
by finally adding the decade digit of the unit sum to the sum of the
tens digits of the summands to derive the correct result (i.e.,
2 + 4 + 0 = 6 vs. 2 + 3 + 1 = 6). Interestingly, the authors found
that the first two processes were specifically associated with the
processing of the unit digits of the summands reflecting increased
processing demands when the sum of the unit digits becomes B 10
and it is recognized that a carry is needed. In particular, it was found
that already during the initial encoding of the problem first fixation
durations (FFD) on the second summand increased continuously with
the sum of the unit digits indicating that unit sum indeed provides the
basis for the decision whether a carry is needed or not. Additionally,
after the need for a carry procedure was detected carry addition
problems were associated with particular processing of the unit digits
of both summands as indicated by an increase in refixations.
In the current study, we aimed to evaluate how far these results on the specific processing of unit digits associated with the carry-over procedure in addition can be generalized to the borrowing procedure in subtraction. Similar to the case of the carry-over procedure, the necessity of a borrowing procedure can also be evaluated when processing the unit digit of the subtrahend during first pass encoding (i.e., by checking whether the difference between the unit digits of minuend and subtrahend is < 0) (Geary et al. 1993; Imbo et al. 2007). Furthermore, after the need for a borrowing procedure was detected, later processing stages may well involve particular processing of the unit digits of minuend and subtrahend. Therefore, we expected the influence of the necessity of a borrowing procedure in subtraction problems on participants' eye fixation behavior to mirror the influence of the carry-over procedure in addition.
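The two evaluation rules can be stated compactly (our code, with two-digit operands as in the examples above):

```python
def needs_carry(summand1, summand2):
    return (summand1 % 10) + (summand2 % 10) >= 10   # unit sum of 10 or more

def needs_borrow(minuend, subtrahend):
    return (minuend % 10) < (subtrahend % 10)        # unit difference below 0

needs_carry(28, 36)    # True  (8 + 6 = 14)
needs_carry(23, 41)    # False (3 + 1 = 4)
needs_borrow(64, 28)   # True  (4 - 8 < 0)
```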
Forty-five students [9 males, mean age: 23.9 years;
SD = 7.2 years] solved both 48 addition and 48 subtraction problems
in a choice reaction time paradigm. Their fixation behavior was
recorded using an EyeLink 1000 eye-tracking device (SR-Research,
Kanata, Canada) providing a spatial resolution of less than 0.5
degrees of visual angle at a sampling rate of 500 Hz. In a 2 × 2 design, arithmetic procedure (addition vs. subtraction) and the necessity of a carry-over or borrowing procedure were manipulated orthogonally, with problem size being matched. Problems were displayed in white
against a black background in non-proportional font Courier New
(style bold, size 50). Each problem was presented together with two
solution probes of which participants had to indicate the correct one
by pressing a corresponding button. The order in which participants completed the addition and subtraction tasks was counterbalanced
across participants. For the analysis of the eye-tracking data, areas of
interest were centered around each digit (height: 200 pixels, width: 59 pixels). All fixations falling within a respective area of interest were
considered fixations upon the corresponding digit.
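This assignment rule amounts to a simple rectangle test (our illustration, with pixel coordinates and (cx, cy) the center of a digit's area of interest):

```python
def in_aoi(fix_x, fix_y, cx, cy, width=59, height=200):
    return abs(fix_x - cx) <= width / 2 and abs(fix_y - cy) <= height / 2
```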
Generally, additions were solved faster than subtractions
(3,766 ms vs. 4,581 ms) and carry/borrow problems were associated
with longer reaction times (4,783 ms vs. 3,564 ms). Importantly,
however, effects of carry-over and borrowing were also observed in
participants' eye fixation behavior. Replicating previous results, the
necessity of a carry-over led to a specific increase of FFD on the unit
digit of the second summand (323 ms vs. 265 ms) during first pass
encoding. Interestingly, this was also observed for a required
borrowing procedure. FFD were specifically elevated on the unit
digits of the subtrahend (415 ms vs. 268 ms). However, in contrast to
our hypothesis, we did not observe such a congruity between the
influences of carry in addition and borrowing in subtraction on later
processing stages. While the need for a carry procedure led to a
specific increase of the processing of the unit digits of both summands
(as indicated by an increase of fixations on these digits, 2.04 vs. 1.55
fixations), this specificity was not found for borrowing subtraction
problems, for which the number of fixations increased evenly on tens
(2.15 vs. 1.74 fixations) and units (2.33 vs. 1.84 fixations) due to the
need for a borrowing procedure.
Taken together, these partly consistent but also differing results for the carry-over procedure in addition and the borrowing procedure in subtraction indicate that evaluating the need for both is associated with specific processing of the unit digit of the second operand (i.e., the second summand or the subtrahend). This is plausible, as in both addition and subtraction the sum of or the difference between the unit digits is indicative of whether a carry-over or borrowing procedure is necessary. Importantly, both the sum of the unit digits as well as their difference can only be evaluated after having considered the unit digit of the second operand. However, later processes underlying the carry-over and borrowing procedures seem to differ. While the need for a carry procedure is associated with specific reprocessing of the unit digits of both summands, this was not the case for a required borrowing procedure. Thereby, these data provide first direct evidence suggesting that similar cognitive processes underlie the recognition of whether a carry or borrowing procedure is needed to solve the problem at hand. On the other hand, further processing steps may differ between addition and subtraction. Future studies are needed to investigate the processes underlying the execution of the borrowing procedure in subtraction more closely.

References
Geary DC, Frensch PA, Wiley JG (1993) Simple and complex mental subtraction: strategy choice and speed-of-processing differences in younger and older adults. Psychol Aging 8(2):242–256
Green HJ, Lemaire P, Dufau S (2007) Eye movement correlates of younger and older adults' strategies for complex addition. Acta Psychol 125:257–278. doi:10.1016/j.actpsy.2006.08.001
Imbo I, Vandierendonck A, Vergauwe E (2007) The role of working memory in carrying and borrowing. Psychol Res 71:467–483. doi:10.1007/s00426-006-0044-8
Moeller K, Klein E, Nuerk HC (2011a) (No) Small adults: children's processing of carry addition problems. Dev Neuropsychol 36(6):702–720
Moeller K, Klein E, Nuerk HC (2011b) Three processes underlying the carry effect in addition – evidence from eye tracking. Br J Psychol 102:623–645. doi:10.1111/j.2044-8295.2011.02034.x
Rakoczi G (2012) Eye Tracking in Forschung und Lehre. Möglichkeiten und Grenzen eines vielversprechenden Erkenntnismittels. In: Gottfried C, Reichl F, Steiner A (eds) Digitale Medien: Werkzeuge für exzellente Forschung und Lehre. Waxmann, Münster, pp 87–98
Rayner K (1998) Eye movements in reading and information processing: 20 years of research. Psychol Bull 124:372–422

Simultaneous acquisition of words and syntax: contrasting implicit and explicit learning

Patrick Rebuschat, Simon Ruiz
Lancaster University, UK
The topic of implicit learning plays a central role in cognitive psychology, and recent years have witnessed an increasing amount of
research dedicated to this issue. However, comparatively little
research has focused on the implicit learning of vocabulary and, to
our knowledge, no study has examined whether syntax and vocabulary can be acquired simultaneously. This is an important question,
given that in language acquisition outside of the experimental lab,
subjects are exposed to (and learn) many linguistic features at the
same time. This paper reports the results of an experiment that
investigated the implicit learning of second language (L2) syntax and
vocabulary by adult learners. The linguistic focus was on verb
placement in simple and complex sentences (Rebuschat, Williams
2009, 2012; Tagarelli, Borges, Rebuschat 2011, in press). The novel
vocabulary items were ten pseudowords, taken from Hamrick and
Rebuschat (2012, 2013).
Sixty native speakers of English were exposed to an artificial
language consisting of German syntax and English words, including
ten pseudowords that followed English phonotactics. Subjects in the
incidental group (n = 30) did not know they were going to be tested,
nor that they were supposed to learn the grammar or vocabulary of a
novel language. The exposure task required subjects to judge the
semantic plausibility of 120 different sentences, e.g. "Chris placed today the boxes on the dobez" (plausible) and "Sarah covered usually the fields with dobez" (implausible). The task thus required subjects
to process the sentences for meaning. Subjects were provided with a
picture that matched the meaning of the pseudowords, in the examples above with a black-and-white drawing of a table underneath the
sentence. Subjects in the intentional group (n = 30) read the same
120 sentences but were asked to discover the word-order rules and to
memorize the meaning of the pseudowords. In the testing phase, all
subjects completed two tests, a grammaticality judgment task to
assess whether they had learned the novel syntax and a forced-choice
task to assess their knowledge of the pseudowords. In both tasks,
subjects were also asked to report how confident they were and to
indicate what the basis of their judgment was. Confidence ratings and
source attributions were employed to determine whether exposure had
resulted in implicit or explicit knowledge (see Rebuschat 2013, for a
review).
Data collection has recently concluded but data have not yet been
fully analyzed. Given our previous research (e.g. Tagarelli et al. 2011,
in press; Grey, Williams, Rebuschat 2014; Rogers, Revesz, Rebuschat, in press; Rebuschat, Hamrick, Sachs, Riestenberg, Ziegler
2013), we predict that subjects will be able to acquire both the syntax
and the vocabulary of the artificial language simultaneously and that
the amount of implicit and explicit knowledge will vary depending on
the learning context, with subjects in the incidental group acquiring
primarily implicit knowledge and also some explicit knowledge, and
vice versa in the intentional group. The paper concludes with implications for future research.
References
Grey S, Williams JN, Rebuschat P (2014) Incidental exposure and L3 learning of morphosyntax. Stud Second Lang Acquis 36:1–34
Hamrick P, Rebuschat P (2012) How implicit is statistical learning? In: Rebuschat P, Williams JN (eds) Statistical learning and language acquisition. Mouton de Gruyter, Berlin, pp 365–382
Hamrick P, Rebuschat P (2013) Frequency effects, learning conditions, and the development of implicit and explicit lexical knowledge. In: Connor-Linton J, Amoroso L (eds) Measured language: quantitative approaches to acquisition, assessment, processing and variation. Georgetown University Press, Washington
Rebuschat P, Williams JN (2009) Implicit learning of word order. In: Taatgen NA, van Rijn H (eds) Proceedings of the 31st annual conference of the cognitive science society. Cognitive Science Society, Austin, pp 425–430
Rebuschat P (2013) Measuring implicit and explicit knowledge in second language research. Lang Learn 63(3):595–626
Rebuschat P, Hamrick P, Sachs R, Riestenberg K, Ziegler N (2013) Implicit and explicit knowledge of form-meaning connections: evidence from subjective measures of awareness. In: Bergsleithner J, Frota S, Yoshioka JK (eds) Noticing: L2 studies and essays in honor of Dick Schmidt. University of Hawaii Press, Honolulu, pp 255–275
Rogers J, Révész A, Rebuschat P (in press) Implicit and explicit knowledge of L2 inflectional morphology: an incidental learning study
Tagarelli KM, Borges Mota M, Rebuschat P (in press) Working memory, learning context, and the acquisition of L2 syntax. In: Zhisheng W, Borges Mota M, McNeill A (eds) Working memory in second language acquisition and processing: theory, research and commentary. Multilingual Matters, Bristol
Tagarelli K, Borges Mota M, Rebuschat P (2011) The role of working memory in the implicit and explicit learning of languages. In: Carlson L, Hölscher C, Shipley T (eds) Proceedings of the 33rd annual conference of the cognitive science society. Cognitive Science Society, Austin, pp 2061–2066

Towards a model for anticipating human gestures in human-robot interactions in shared space

Patrick Renner1, Thies Pfeiffer2, Sven Wachsmuth2
1 Artificial Intelligence Group, Bielefeld University, Germany; 2 CITEC, Bielefeld University, Germany
Abstract
Human-robot interaction in shared spaces might benefit from human
skills of anticipating movements. We observed human-human interactions in a route planning scenario to identify relevant
communication strategies with a focus on hand-eye coordination.
Keywords
Shared-space interaction, Hand-eye coordination, 3D eye tracking
Introduction
A current challenge in human-robot interaction is to advance from
using robots as tools to solving tasks cooperatively with them in close
interaction. When humans and robots interact in shared space, by the
overlap of the peripersonal spaces of the interaction partners, an
interaction space is formed (Nguyen and Wachsmuth 2011). Here, the
actions of both partners have to be coordinated carefully in order to
ensure a safe cooperation as well as a flawless, successful task
completion. This requires capabilities beyond collision avoidance,
because the robot needs to signal a mutual understanding of situations
where both interaction partners interfere. With a dynamic representation of its peripersonal space (Holthaus and Wachsmuth 2012), a
robot can be aware of its immediate surrounding and this way, e.g.,
avoid collisions before they are actually perceived as a potentially
harmful situation. However, shared-space interactions of humans and
robots are still far from being as efficient as those between humans.
Modeling human skills for anticipating movements could help
robots to increase robustness and smoothness of shared-space interactions. Our eyes often rest on objects we want to use or to refer to. In
a specific pointing task, Prablanc et al. (1979) found that the first
saccade to the target occurs around 100 ms before the hand movement is initiated. If the robot were able to follow human eye gaze and
to predict upcoming human gestures, several levels of interaction
could be improved: First, anticipated gesture trajectories could be
considered during action planning to avoid potentially occupied areas.
Second, action executions could be stopped if the robot estimates a
human movement conflicting with its current target. Third, the robot
could turn its sensors towards the estimated target to facilitate communication robustness and increase the humans confidence in the
grounding of the current target (Breazeal et al. 2005).

S59
Shared-space interaction study
Identifying corresponding human communication strategies requires
studying humans in free interaction. Therefore, we investigate face-to-face, goal-oriented interactions in a natural setting which comprises spatial references with gaze and pointing gestures. In a route
planning scenario, participants are to plan paths to rooms on three
floors of a university building. The three corresponding floor plans are
located on a table between them. The scale of the 32 × 32 cm plans is
approximately 1:180, each floor having about 60 rooms. The floor
plans are printed on a DIN A0 format poster. This way, each participant has one floor plan directly in front of him or her, one is shared
with the interaction partner, and one plan is not reachable. The difficulty of the task is increased by introducing blocked areas in the
hallways: Detours have to be planned (forcing participants to
repeatedly change the floor), which lead to more complex interactions, ensuring a lively interaction rather than a rigid experimental design with artificial stimuli.
Recorded data
In the experiments, multimodal data were recorded: Two video
cameras observed the participants during the interactions. One participant was equipped with mobile eye-tracking glasses. Pointing
directions and head positions of both participants were recorded by
an external tracking system. As analyzing eye-tracking data usually
requires time-consuming manual annotation, an automatic approach
was developed combining fiducial marker tracking and 3D modeling of stimuli in virtual reality as proxies for intersection testing between the calculated line of sight and the real objects (Pfeiffer and Renner 2014). The occurrence of pointing gestures to rooms, stairs, elevators and markers for blocked areas was annotated semi-automatically.
Results
The results of our experiments show that at the time of a pointing gesture's onset, it is indeed possible to predict its target area when taking into consideration fixations which occurred in the last 200 ms before the onset. When allowing a maximum deviation of 20 cm, the target area was predicted in 75 % of the cases, and with a maximum
deviation of 10 cm for 50 % of the cases. Figure 1 shows an example
of a fixation on the target of a pointing gesture, preceding the hand
movement. In the same study, we also analyzed body movements, in
particular leaning forward: Participants almost exclusively used
leaning forward to point to targets more distant than 65 cm (from the
edge of the table).
Fig. 1 An example of a fixation (highlighted by the ring) anticipating the pointing target. The three floor plans can be seen. The black tokens serve for marking blocked areas
Conclusion
Altogether, our findings provide quantitative data to develop a prediction model considering both eye-hand coordination and leaning forward. This could enable the robot to have a detailed concept of an upcoming human pointing movement. For example, based on current gaze information of its interlocutor, a robot could predict that a starting pointing gesture would end within a 20 cm radius around the currently fixated point (with a 75 % chance). This will allow the robot to decide whether the predicted target space is in conflict with its own planned actions, and it might react accordingly, e.g. by avoiding the area or pausing.
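Such a heuristic could be sketched as follows (our code; the 200 ms window and the 20 cm / 75 % figures come from the results above, everything else is an assumption):

```python
import numpy as np

def predict_pointing_target(fixations, onset_ms, window_ms=200):
    """fixations: list of (timestamp_ms, x_cm, y_cm) on the table plane."""
    recent = [f for f in fixations
              if onset_ms - window_ms <= f[0] <= onset_ms]
    if not recent:
        return None
    _, x, y = recent[-1]                  # last fixation before gesture onset
    return np.array([x, y]), 20.0         # predicted target and radius in cm
```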
Acknowledgments
This work has been partly funded by the DFG in the SFB 673 "Alignment in Communication".
References
Breazeal C, Kidd CD, Thomaz AL, Hoffman G, Berlin M (2005) Effects of nonverbal communication on efficiency and robustness in human-robot teamwork. In: Intelligent Robots and Systems 2005 (IROS 2005), 2005 IEEE/RSJ International Conference on, IEEE, pp 708–713
Holthaus P, Wachsmuth S (2012) Active peripersonal space for more intuitive HRI. In: International Conference on Humanoid Robots, IEEE RAS, Osaka, Japan, pp 508–513
Nguyen N, Wachsmuth I (2011) From body space to interaction space - modeling spatial cooperation for virtual humans. In: 10th International Conference on Autonomous Agents and Multiagent Systems, International Foundation for Autonomous Agents and Multiagent Systems, Taipei, Taiwan, pp 1047–1054
Pfeiffer T, Renner P (2014) EyeSee3D: a low-cost approach for analyzing mobile 3D eye tracking data using computer vision and augmented reality technology. In: Proceedings of the Symposium on Eye Tracking Research and Applications, ACM, pp 195–202
Prablanc C, Echallier J, Komilis E, Jeannerod M (1979) Optimal response of eye and hand motor systems in pointing at a visual target. Biol Cybern 35:113–124

Preserved expert object recognition in a case of unilateral visual agnosia

Johannes Rennig, Hans-Otto Karnath, Marc Himmelbach
Center of Neurology, Hertie-Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany
We examined a stroke patient (HWS) with a unilateral lesion of the
right medial ventral visual stream. A high resolution MR scan showed
a severe involvement of the fusiform and parahippocampal gyri, sparing large parts of the lingual gyrus. In a number of object recognition tests with lateralized presentations of target stimuli, HWS showed remarkable deficits for contralesional presentations only. His performance on the ipsilesional side was unaffected. We further explored his residual capabilities in object recognition by confronting him with objects for which he was an expert. These were items he knew from his job as a trained car mechanic that were occupationally and personally relevant for him. Surprisingly, HWS was able to identify
these complex and specific objects on the contralesional side while he
failed in recognizing even highly familiar everyday objects. This
observation of preserved expert object recognition in visual agnosia
gives room for several explanations. First, these results may be
caused by enhanced information processing of the ventral system in
the intact hemisphere that is exclusively available for expert objects.
On the other hand, expert knowledge could also trigger top-down mechanisms supporting object recognition despite impaired basic functions of object processing. Finally, a more efficient stimulus processing for expert objects might simply not require the complete resources of an intact ventral stream.

Visual salience in human landmark selection

Florian Röser, Kai Hamburger
University of Giessen, Germany
Abstract
Visual aspects of landmarks are a main component in almost every theory of landmark preference, selection, and definition. But could this aspect be moderated by other factors, for example the object's position?
Keywords
Spatial cognition, Landmarks, Visual salience
Introduction
Visual aspects of objects play an elementary role in the landmark selection process during a wayfinding task (Sorrows and Hirtle 1999; for an overview see Caduff and Timpf 2008). Here, the contrast between the object and its surroundings, namely the contrast to other objects, is elementary. Our assumption is that the visual aspect of an object in a wayfinding context is only necessary to recognize this object. The decision in favor of or against an object will be based on other, more cognitive aspects, for example an object's position.
Two contrary assumptions exist. On the one hand, preliminary experiments showed that the ideal position at an intersection (allocentric perspective) is the position in front of the intersection in the direction of turn (Röser, Hamburger, Krumnack and Knauff 2012). On the other hand, it has been shown that in an arrangement of different objects the pop-out object (the single one, or singleton) is preferred (Röser, Krumnack and Hamburger 2013). Here we want to discuss in how far these two contrasting assumptions go together and which influence different tasks or instructions can have.
Experiment
Method
A total of 32 students (21 female; mean age: 27 years; range: 19–56) participated. All participants provided informed written consent. All had normal or corrected-to-normal visual acuity and color vision (tested with Velhagen and Broschmann 2003). They received course credits or money for participation.
Materials and Procedure
The material consisted of four grey (apartment) blocks, each with one white square at the medially oriented corner. These represented the facades of the buildings (Röser et al. 2012) at an intersection. Within these blocks the different objects were placed; we call them landmarks, since they could help with orientation in such an environment.
All landmark objects consist of a cross and five thin lines in different arrangements so that they are generally distinct (Fig. 1). Three of these landmarks had the same color, one was different (singleton). The color differences ranged from 0° to 180°. The color gradient is visible in Fig. 1 (left top and bottom). The singleton was presented once at each position at the intersection and once at each position for a left and right turn (2nd and 3rd experimental condition). This results in 64 different pictures/intersections, which were presented in a randomized order.
Each participant was assigned to one of three experimental conditions. In the first condition ("intersection") the intersections were presented without a route direction arrow and the task was to choose the object that pops out most. In the second one ("intersection and arrow") an arrow indicated the direction in which a change of route
was about to happen (Fig. 1); the task still remained the same as in the first condition. In the third condition the intersections looked the same as in the second one, but now the task was to choose the object that the participant would use in order to give a route description. All experiments were run on the same computer (19 inches).
Results
Figure 1 (bottom right) depicts the frequency of choosing the singleton object. In the condition "intersection" it can be seen that the singleton object is clearly identifiable from a color difference of 11° on (~100 %). 0° and 3° are at chance level, and 6° lies somewhere in between. A similar result is observable for the condition "intersection and arrow". The remaining condition, in which the participants had to decide which object they would prefer for giving a route description, shows a different curve. First of all, it increases more slowly and, secondly, it reaches its top at around 60 %. On the other hand, participants chose the ideal position in 60 % of the cases. This differs significantly from chance level (t(9) = 4.576, p = .001).
Fig. 1 Left (top and bottom): the used colors of the objects and the color gradient. Top (middle and right): examples of the intersections with and without an arrow. Bottom (right): results. The x-axis represents the single experimental variation (low difference on the left and high on the right); the y-axis represents the participants' relative selection of the singleton object (in percent)
Discussion
Participants are capable of identifying the singleton object if the color difference exceeds 6°. The instruction "which one would you choose to give a route description" led to different landmark selections. Here the position seems to play a major role. Thus, we may conclude that the perception of the color distribution at the intersection is moderated by the task at hand.
One interpretation could be that the contrast between landmarks and their surroundings at an intersection is strongly moderated by the participant's task. This will be examined in more detail in further experiments.
Acknowledgments
We thank Anna Bosch and Sarah Jane Abbott for their help with data
recording.
References
Caduff D, Timpf S (2008) On the assessment of landmark salience for human navigation. Cogn Process 9:249–267
Klippel A, Winter S (2005) Structural salience of landmarks for route directions. In: Cohn AG, Mark DM (eds) Spatial information theory, International Conference COSIT. Springer, Berlin, pp 346–362
Röser F, Hamburger K, Krumnack A, Knauff M (2012) The structural salience of landmarks: results from an online study and a virtual environment experiment. J Spatial Sci 5:37–50
Röser F, Krumnack A, Hamburger K (2013) The influence of perceptual and structural salience. In: Knauff M, Pauen M, Sebanz N, Wachsmuth I (eds) Proceedings of the 35th annual conference of the cognitive science society. Cognitive Science Society, Austin, TX, pp 3315–3320
Sorrows ME, Hirtle SC (1999) The nature of landmarks for real and electronic spaces. In: Freksa C, Mark DM (eds) Spatial information theory: cognitive and computational foundations of geographic information science, International Conference COSIT. Springer, Stade, pp 37–50
Velhagen K, Broschmann D (2003) Tafeln zur Prüfung des Farbsinns, 33., unveränderte Auflage. Georg Thieme Verlag, Stuttgart

Left to right or back to front? The spatial flexibility of time
Susana Ruiz Fernandez1, Juan Jose Rahona2, Martin Lachmair1
1 Leibniz Knowledge Media Research Center (KMRC), Tübingen, Germany; 2 Complutense University, Madrid, Spain
How is time represented in space? Strong evidence was found for a
spatial representation of time that goes from left-to-right with past
represented on the left and future on the right side (Santiago et al.
2007). There is also evidence for a back-to-front timeline with past
represented behind and future ahead (Ulrich et al. 2012). Based on the
notion of a flexible representation of time onto space (Torralbo et al.
2006) the present study compared both time representations directly.
Embodied theories suggest that internal representations of abstract
concepts include multimodal perceptual and motor experiences.
Assuming richer back-to-front spatial experiences through our senses,
we expect faster responses for the back-to-front than for the left-to-right response mapping.
Method
After the presentation of a future- or past-related word (e.g., yesterday,
tomorrow), forty-four participants (all right-handed) had to classify
the time word by moving the slider of a response device along one of two
axes (left-to-right or back-to-front) according to the temporal content of the word. In the congruent condition, participants
had to perform a right or forward movement in response to a future-related word and a left or backward movement if a past-related word
was presented. In the incongruent condition, a backward or left
movement was performed in response to a future-related word and a
forward or right movement in response to a past-related word. For the
performance of the movement, a response device was used that
recorded continuous movements of the manual response in the left-to-right and the back-to-front plane (see Ulrich et al. 2012). Touch-sensitive devices registered the onset of the response and the time
when the slider of the device reached one of the two endpoints.
Reaction time (RT), measured from the onset of the presentation of the
word to the onset of the response (leaving the start position of the
slider), was recorded. Additionally, the movement time (MT)
from response onset to one of the two endpoints was measured. Depending on the required response axis, the response device was
rotated by 90° or 180°.
The experiment consisted of four experimental blocks. The
experiment combined the factors response axis (back-to-front vs. left-to-right) and response congruency (congruent: forward or right
movement to future-related words and backward or left movement to
past-related words vs. incongruent: forward or right movement to past-related words and backward or left movement to future-related
words). Each combination resulted in one block that included 120
trials (including 20 practice trials). Separate repeated-measures analyses of variance (ANOVA) were conducted on mean RT and mean
MT taking participants (F1) as well as items (F2) as random factors.
When necessary, p-values were adjusted for violations of the sphericity assumption using the Greenhouse-Geisser correction.
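As an illustration, the by-participants (F1) analysis could be run as in the following sketch; the file and column names are hypothetical, and the by-items (F2) analysis would be analogous with items as the random factor.

```python
# Sketch of the by-participants (F1) repeated-measures ANOVA on mean RT;
# file and column names are hypothetical. pingouin reports sphericity-
# corrected p-values where applicable (two-level factors need no correction).
import pandas as pd
import pingouin as pg

df = pd.read_csv("rt_data.csv")  # columns: participant, axis, congruency, rt

aov_f1 = pg.rm_anova(data=df, dv="rt", within=["axis", "congruency"],
                     subject="participant")
print(aov_f1)
```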




Fig. 1 Mean RT depending on response congruency and response axis
Results
RT results are shown in Fig. 1, which depicts mean RT as a function
of response congruency and response axis. An ANOVA on RT
showed shorter RT for the congruent (768.84 ms) compared to the
incongruent condition (812.15 ms), F1(1, 43) = 29.42, p < .001; F2(1, 19) = 104.37, p < .001. Participants were faster initiating a right
or forward movement for future-related words and a left or backward
movement for past-related words than initiating a left or backward
movement for future-related words and a right or forward movement
for past-related words. RT were marginally shorter for the left-to-right
axis (783.07 ms) compared to the back-to-front axis (797.92 ms), F1(1, 43) = 3.73, p = .060; F2(1, 19) = 14.06, p = .001. The interaction between response congruency and response axis failed
to reach significance, F1 and F2 < 1.
An ANOVA on MT did not reveal significant effects of response
congruency [F1(1, 43) = 0.52, p = .476; F2(1, 19) = 2.59, p = .124],
response axis [F1(1, 43) = 1.58, p = .216], or their interaction [F1 and
F2 < 1]; only the F2-analysis revealed an effect of response axis,
F2(1, 19) = 23.04, p < .001. Accordingly, response congruency and
response axis affected movement initiation but not movement execution.
Discussion
Results support a flexible projection of time onto space. Unexpectedly,
a trend toward faster responses for the left-to-right mapping was found,
suggesting an influence of reading direction on response axis. A
possible explanation is that reading temporal words could activate the
left-to-right response axis; this activation needs to be inhibited when
a back-to-front response is performed. This explanation is supported
by recent experiments that show a higher activation of the time–space
congruency when visual (instead of auditory) stimuli were used
(Rolke et al. 2013).
Acknowledgments
We thank R. Bahlinger, V. Engel, N. Feldmann, P. Huber, S. Kaiser, J.
Kinzel, H. Kriening, S. Riedel, K. Wessolowski, E. Wiedemann and
K. Zeeb for their assistance.
References
Rolke B, Ruiz Fernandez S, Schmid M, Walker M, Lachmair M, Rahona Lopez JJ, Hervas G, Vazquez C (2013) Priming the mental time-line: effects of modality and processing mode. Cogn Process 14:231–244
Santiago J, Lupianez J, Perez E, Funes MJ (2007) Time (also) flies from left to right. Psychon Bull Rev 14:512–516
Torralbo A, Santiago J, Lupianez J (2006) Flexible conceptual projection of time onto spatial frames of reference. Cogn Sci 30:745–757
Ulrich R, Eikmeier V, de la Vega I, Ruiz Fernandez S, Alex-Ruf S, Maienborn C (2012) With the past behind and the future ahead: back-to-front representation of past and future sentences. Mem Cognit 40:483–495


Smart goals, slow habits? Individual differences in processing speed and working memory capacity moderate the balance between habitual and goal-directed choice behavior
Daniel Schad1, Elisabeth Junger2, Miriam Sebold1, Maria Garbusow2, Nadine Bernhart2, Amir Homayoun Javadi3, Ulrich S. Zimmermann2, Michael Smolka2, Andreas Heinz1, Michael A. Rapp4, Quentin Huys5
1 Charité Universitätsmedizin Berlin, Germany; 2 Technische Universität Dresden, Germany; 3 University College London (UCL), UK; 4 Universität Potsdam, Germany; 5 TNU, ETH and University of Zurich, Switzerland
Choice behavior is shaped by cognitively demanding goal-directed
and by more automatic habitual processes. External cognitive load
manipulations alter the balance of these systems. However, it is
unclear how individual differences in specific cognitive abilities
contribute to the arbitration between habitual and goal-directed
decision-making.
Twenty-nine adults performed a two-step decision task explicitly designed to
capture the two systems' computational characteristics. We also
collected measures of fluid and crystallized intelligence.
There was an inverted U-shaped relationship between processing
speed and habitual choice, together with a linear relationship between
processing speed and goal-directed behavior. Working memory
capacity impacted this balance only amongst those subjects with
high processing speed.
Different aspects of intelligence thus make specific contributions to
complex human decision-making, and individual differences in such
cognitive abilities moderate the balance between habitual and goal-directed choice behavior.
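One conventional way to probe such an inverted U-shape is a regression that includes a quadratic term, as in the sketch below; file and variable names are hypothetical, and a significant negative coefficient on the squared term is consistent with an inverted U.

```python
# Sketch: testing for an inverted U-shape with a quadratic regression term;
# file and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("subjects.csv")  # columns: habitual_weight, speed

model = smf.ols("habitual_weight ~ speed + I(speed ** 2)", data=df).fit()
print(model.params)   # a negative I(speed ** 2) coefficient suggests an inverted U
print(model.pvalues)
```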

Tracing the time course of n − 2 repetition costs
Juliane Scheil, Thomas Kleinsorge
Leibniz Research Centre for Working Environment and Human
Factors, Dortmund, Germany
Introduction
In order to flexibly adapt to a permanently changing environment, it is
necessary to inhibit previously activated but now irrelevant processing pathways. Empirically, this inhibition manifests itself only
indirectly in terms of a cost of reengaging a previously inhibited
pathway, that is, the so-called n - 2 repetition costs: When switching
among three tasks A, B, and C, higher reaction times and error rates
occur when the task in the current trial equals the task in trial n - 2
(i.e., sequences of type ABA) compared to two consecutive switches
to another task (sequences CBA). Although n - 2 repetition costs
have been reported in many studies, it remains an open question when
and how inhibition is triggered and how it develops over time.
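The cost itself is straightforward to compute once trial triplets are classified as ABA or CBA; the sketch below shows one way to do this, with hypothetical file and column names.

```python
# Sketch: classifying trial sequences (ABA vs. CBA) and computing the
# n - 2 repetition cost; file and column names are hypothetical.
import pandas as pd

df = pd.read_csv("trials.csv")  # columns: task ('A'/'B'/'C'), rt; trials in order

task_n1 = df["task"].shift(1)  # task in trial n - 1
task_n2 = df["task"].shift(2)  # task in trial n - 2
double_switch = (df["task"] != task_n1) & (task_n1 != task_n2)

df.loc[double_switch & (df["task"] == task_n2), "sequence"] = "ABA"
df.loc[double_switch & (df["task"] != task_n2), "sequence"] = "CBA"

means = df.groupby("sequence")["rt"].mean()
print("n - 2 repetition cost:", means["ABA"] - means["CBA"], "ms")
```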
One possibility to capture the time course of inhibition lies in the
variation of different time intervals in the cued task-switching paradigm. The cue-stimulus interval (CSI) allows participants to prepare


for the next task. On the other hand, no specific preparation for the
next task is possible during the response-cue interval (RCI), in which
usually a fixation mark is presented that contains no information
about the next task. Therefore, effects of the RCI cover passive
processes, like decaying inhibition or activation.
The present study aimed at investigating the time course of inhibition in a fine-grained manner. For this purpose, the length of the
RCI (the time between the response of trial n - 1 and the cue of trial
n) was manipulated in five steps separated by 125 ms each. This
allowed us to also capture non-linear trends in the size of n - 2
repetition costs that could be overlooked in designs using only two
distinct RCIs.
Method
In two experiments, subjects (Exp I: 10 men, 21 women, mean age
23.8 years; Exp II: 6 men, 15 women, mean age 22.7 years) switched
between three tasks in an explicitly cued task switching experiment.
In Exp I, participants had to indicate via keypress whether the digit
serving as imperative stimulus was smaller or larger than five, odd or
even, or central or peripheral regarding its position along the number
line relative to the digit five. In Exp II, participants had to judge
shapes regarding their size (big or small), color (yellow or blue), or
shape (x or +). Stimuli were presented centrally on a 17-inch monitor on
light-grey background. Viewing distance approximated 60 cm. The
experimental design resulted from a factorial combination of the
within-subjects factors RCI, varied in five steps (50, 175, 300, 425,
and 550 ms), and Task Sequence (ABA vs. CBA).
Results
Both experiments revealed significant n - 2 repetition costs that were
modulated by the RCI. Costs were highest for RCIs of 300 ms and
differed significantly from those of RCIs of length 50 and 175 ms
(Experiment I and II), 425 ms (Experiment I), and 550 ms (Experiment II, cf. Fig. 1).
Discussion
In both experiments, the size of n - 2 repetition costs was modulated by the length of the RCI. Highest n - 2 repetition costs could
be observed for the RCI of 300 ms, while they were smaller for
shorter RCIs (50 ms or 175 ms). Furthermore, the size of n - 2
repetition costs declined again when the RCI exceeded 300 ms, that
is, with a RCI of 425 and 550 ms. This pattern can be interpreted in
terms of an overlap of two different time courses involved in
inhibition.
On the one hand, inhibition seems to need about 200–300 ms
to reach its full extent, reflecting a process of building up a sufficient amount of inhibition in order to cope with interference of
recently established task sets. Importantly, while there have been
investigations focusing on how and when inhibitory processes
decline, the present study is the first trying to identify the time
needed for inhibition to build up. On the other hand, our results
suggest that n - 2 repetition costs, after reaching their maximum
at about 300 ms, start to decay. Therefore, the results are in line
with the assumption of inhibition that, once exerted, decays during
the RCI.

Fig. 1 Mean n - 2 repetition cost [ms] as a function of RCI [ms] (50, 175, 300, 425, 550), shown separately for Experiment I and Experiment II (*p < .05; **p < .01). Error bars represent SEM

Language cues in the formation of hierarchical representation of space
Wiebke Schick, Marc Halfmann, Gregor Hardiess,
Hanspeter A. Mallot
Cognitive Neuroscience, Department of Biology, University of Tübingen, Germany
Keywords
Region effect, Linguistic categories, Whole-part relations, Interaction language–spatial knowledge
The formation of a hierarchical representation of space can be
induced by the spatial adjacency of landmark objects belonging to the
same semantic category, as was demonstrated in a route planning
experiment (Wiener and Mallot 2003). Using the same paradigm, we
tested the efficiency of linguistic cues with various hierarchical categorization principles in regional structuring. In different conditions,
the experimental environment was parceled (i) with landmarks of
different semantic categories, (ii) with superordinate fictive proper
names, (iii) with superordinate prototypical names, (iv) with names
from different linguistic semantic categories, and (v) with holonym-meronym relations (semantic whole-part relations). A region effect
comparable to the landmark-object condition was found only for the
holonym-meronym condition, which combined spatial proximity with
a shared context.
Wiener and Mallot (2003) investigated the influence of regions on
human route planning behavior in a hexagonal, iterated y-maze in a
virtual environment. All of the 12 decision places were marked by a
landmark belonging to one of three different semantic categories
(vehicles, animals and paintings), thus defining three regions composed of four adjacent places. When asked to navigate routes which
allowed for two equidistant alternatives, subjects consistently preferred the one that crossed fewer regional borders (61.6 % against
chance level). These routes also passed more places of the target
region.
In the present investigation, we repeated the experiment and also
modified it to test whether such a region perception can be evoked
linguistically as well.
Procedure
The test phase consisted of 18 navigation trials including 12 with
equidistant but region-sensitive route alternatives, and six distractors.
The subjects were asked to choose the shortest route passing three
places and had access to the place names on a second screen.
Participants
Only the data of those who performed at least 50 % of the test routes
correctly were included in the analysis. This applied to 65 subjects
(37 female, 28 male, all 19–43 years of age).
Variables of interest
Test trials allowed for two equidistant route alternatives to the goal,
differing in the number of region boundaries that had to be crossed.
We call the route choices with the smaller number of region crossings
region-consistent and count the total number of region-consistent
routes for each subject, expecting a chance level of 50 % if route
choice were based solely on distance. A significant preference for one
route type is regarded as evidence for a regionalized representation of

the experimental environment. We also measured the navigational
errors.
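A per-subject preference can be tested against this 50 % chance level, for example with an exact binomial test as in the sketch below; the counts are illustrative, not observed data.

```python
# Sketch: exact binomial test of region-consistent route choices against
# the 50 % chance level; counts below are illustrative.
from scipy.stats import binomtest

n_trials = 12       # region-sensitive test trials per subject
n_consistent = 10   # hypothetical number of region-consistent choices

result = binomtest(n_consistent, n_trials, p=0.5, alternative="greater")
print(f"{n_consistent}/{n_trials} region-consistent, p = {result.pvalue:.3f}")
```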
Results
The results of the landmark condition confirmed the findings of
Wiener and Mallot (2003). For the linguistic conditions, higher error rates
as well as strong differences in the prevalence of region-consistent
route choices were found. A significant preference was found only for
the holonym-meronym condition. We therefore suggest that language-based induction of hierarchies must itself be of a spatial nature
to induce a regionalized representation of space.
Reference
Wiener JM, Mallot HA (2003) Fine-to-coarse route planning and navigation in regionalized environments. Spat Cogn Comput 3(4):331–358

Processing of co-articulated place information in lexical access
Ulrike Schild1, Claudia Teickner2, Claudia K. Friedrich1
1 University of Tübingen, Germany; 2 University of Hamburg, Germany
Listeners do not have any trouble identifying assimilated word forms
such as the spoken string "gardem bench" as an instance of
"garden bench". Assimilation of place of articulation, such as the
coronal place of articulation of the final speech sound of "garden" to
the dorsal place of articulation of the initial speech sound of
"bench", is common in continuous speech. It is a matter of debate how
the recognition system handles systematic variation resulting from
assimilation. Here we test the processing of place variation as soon as
it appears in the signal. We used co-articulated information in speech
sounds. For example, the /o/ in "jog" already encodes the dorsal
place of articulation of the following /g/.
It is still a matter of debate whether subphonemic information is
normalized at a pre-lexical level of representation or is maintained
and used for lexical access. On the one hand, many traditional
models of spoken word recognition such as Cohort (Marslen-Wilson
1987) or TRACE (McClelland and Elman 1986) favor abstract pre-lexical
representations; here, sub-phonemic variation is resolved at a pre-lexical level. On the other hand, full-listing exemplar approaches
(Goldinger 1998) assume that phonetic detail is fully represented in
lexical access with no need for pre-lexical representations that normalize for variation. Variation in co-articulation information should
be less disruptive in the former than in the latter account. Somewhere in between both types of models, the featurally underspecified
lexicon (FUL) model (Lahiri and Reetz 2002) avoids pre-lexical
representations by means of sparse abstract lexical representations
storing only those features that do not frequently undergo variation
in the signal. According to FUL, non-coronal features like the labial
or dorsal place of articulation are stored in the lexicon. If the input
contains another place of articulation, the respective candidate is not
considered further. For example, "foan" would not be able to activate "foam". By contrast, coronal place features are not stored in
the lexicon. Thus, utterances containing a coronal feature at a certain position should be activated by any input containing a non-coronal feature at that position. For example, "gardem" can activate
"garden".
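The asymmetry predicted by FUL can be stated compactly; the following sketch is our simplified rendering of the matching logic just described, not the model's actual implementation.

```python
# Simplified sketch of FUL-style place matching: non-coronal place features
# are stored in the lexicon, coronal place is left unspecified (None), so a
# coronal entry tolerates any input place while a non-coronal entry requires
# an exact match. Illustration only.
from typing import Optional

def ful_match(input_place: str, stored_place: Optional[str]) -> bool:
    if stored_place is None:               # coronal entry: underspecified,
        return True                        # any input place is tolerated
    return input_place == stored_place     # non-coronal: exact match required

print(ful_match("labial", None))           # True:  "gardem" activates "garden"
print(ful_match("coronal", "labial"))      # False: "foan" cannot activate "foam"
```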
Here we investigate the processing of co-articulatory place
information in cross-modal word onset priming. We presented 41
German target words with coronal place of the word-medial consonant (e.g., "Rinne", Engl. chute) and 41 German target words with
non-coronal place of the word-medial consonant (e.g., "Dogge", Engl.
mastiff). In addition we presented 41 pseudowords that diverged from
the coronal target words in medial place (e.g., "Rimme") and 41
pseudowords that diverged from the non-coronal targets in medial
place (e.g., "Dodde"). Spoken prime fragments preceded the
visual target words and pseudowords, which were presented in capitals. In Experiment 1, the spoken primes were the onsets of the target
words and of the pseudowords up to the first nucleus. Those cv-primes differed only in the place feature co-articulated in the vowel,
such as "ri[n]" and "ri[m]". In Experiment 2, the spoken primes were
the onsets of the target words and of the pseudowords up to the
consonant following the first nucleus. Those cvc-primes differed in a
complete phoneme, such as "rin" and "rim". In a Match condition, the
primes were followed by their carrier words (e.g., "rin"-RINNE) or
carrier pseudowords (e.g., "rim"-*RIMME); in a Variation condition,
the primes were followed by their respective pseudoword pair
member (e.g., "rin"-*RIMME) or their respective word pair member
(e.g., "rim"-RINNE). Unrelated prime-target pairs were taken as
controls ("dog"-RINNE). Taken together, we manipulated Condition
(Match vs. Variation vs. Control), Lexicality (words vs. pseudowords)
and word-medial place of the target (coronal vs. non-coronal) as
within-subject factors, and Prime Length (cv-primes in Experiment 1
vs. cvc-primes in Experiment 2) as between-subject factor. Parallel to
classical psycholinguistic research, we analyzed only the first presentation of the target. Presentation order was counterbalanced across
participants.
With respect to the role of features in lexical access, we tested
whether word recognition cascades from features to the lexicon. If so,
we should not find different results for cv-primes vs. cvc-primes. With
respect to a pre-lexical level of representation, we tested whether
subphonemic variation is maintained up to the lexical level. If so, the
Match condition and the Variation condition should differ for words,
but not for pseudowords in Experiment 1. With respect to the
assumptions of the FUL model, we tested whether lexical representations are sparse for coronal place. If so, responses in the Match
condition and in the Variation condition should differ only for non-coronal
targets, but not for coronal targets.
Results of a four-way ANOVA with the factors Prime Length
(Experiment 1 vs. Experiment 2), Lexicality (Word Targets vs.
Pseudoword Targets), Place (Targets with Coronal Segment vs. Targets with Non-coronal Segment) and Condition (Match vs. Variation
vs. Control) are informative for our hypotheses.
First, there was no significant interaction with the factor Prime
Length. That is, behavioral results were comparable across both
experiments. This is support for cascaded activation of lexical representations from features to word forms.
Second, there was an interaction of the factors Condition and
Lexicality. For word targets and for pseudoword targets, responses
were slowest for the Control condition. For pseudowords, the Match
condition and the Variation condition did not differ. However, for
words, responses for the Match condition were faster than responses
for the Variation condition (Fig. 1, left panel). This is support for the
assumption that the lexicon is involved in processing sub-phonemic
variation.
Third, there was an interaction of the factors Condition and Place.
Responses to coronal targets in the Match condition and in the Variation condition did not differ from each other, but both were faster
than responses in the control condition. Responses to non-coronal
targets were fastest in the Match condition, intermediate in the Variation condition and slowest in the Control condition (Fig. 1, right
panel). This is evidence for the assumption of unspecified coronal
place. However, it does not appear that this effect is mediated by the
lexicon because it was not modulated by the factor Lexicality.
The results suggest that information of anticipatory co-articulation
is maintained and used in lexical access. Completely matching
information activates the target word's lexical representation more
effectively than partially mismatching information. Even subtle sub-phonemic variation reduces lexical activation. Thus, subphonemic

detail appears to be used for lexical access in a similar way as phonemic information. Furthermore, our results are evidence for the FUL
model.

Fig. 1 Mean lexical decision latencies [ms], collapsed across both experiments, in two panels (Lexicality × Condition; Place × Condition). The left panel shows responses to words (black) and pseudowords (white), the right panel responses to coronal targets (black) and non-coronal targets (white), each in the Match, Variation and Control conditions. Error bars indicate standard errors
References
Goldinger SD (1998) Echoes of echoes? An episodic theory of lexical access. Psychol Rev 105(2):251–279
Lahiri A, Reetz H (2002) Underspecified recognition. In: Gussenhoven C, Warner N (eds) Laboratory phonology 7. Mouton de Gruyter, Berlin, pp 638–675
Marslen-Wilson WD (1987) Functional parallelism in spoken word-recognition. Cognition 25(1–2):71–102
McClelland JL, Elman JL (1986) The TRACE model of speech perception. Cogn Psychol 18(1):1–86

Disentangling the role of inhibition and emotional coding on spatial stimulus devaluation
Christine Scholtes, Kerstin Dittrich, Karl Christoph Klauer
Universität Freiburg, Abteilung Sozialpsychologie und Methodenlehre, Germany
Keywords
Spatial position, Stimulus devaluation, Emotional coding,
Edge aversion, Eye tracking
In a study investigating the influence of visual selective attention on affective evaluation, Raymond, Fenske, and Tavassoli
(2003) observed a distractor devaluation effect: previously to-be-ignored stimuli were emotionally devaluated compared to to-be-selected stimuli and neutral stimuli not previously presented.
According to Raymond et al. (2003), this stimulus devaluation can
be explained by assuming that cognitive inhibition is applied to the
to-be-ignored stimulus. This inhibition is assumed to be stored with
the mental representation of this stimulus and applied to the
evaluation task where the stimulus is presented again. An alternative
account is provided by Dittrich and Klauer (2012):
according to their account, the act of ignoring attaches a negative
emotional code to the to-be-ignored stimulus. This negative code
is assumed to be stored with the mental representation of the to-be-ignored stimulus, leading to devaluation when the stimulus is
encountered again.

Aside from ignoring, the spatial position of a stimulus has also
proven to influence the evaluation of (e.g., Valenzuela and Raghubir
2009, 2010) and the preference for (e.g., Christenfeld 1995; Rodway,
Schepman and Lambert 2012) certain stimuli. Meier and Robinson
(2004) showed that upper positions are associated with positive affect
and lower positions with negative affect. In another study, products in
a supermarket context were evaluated as more expensive when presented in upper shelves compared to lower positioned products
(Valenzuela and Raghubir 2010). In horizontal arrangements though,
there is evidence for an advantage of the central stimulus position. In
several studies, participants preferred the central stimulus to laterally
presented stimuli (e.g., Christenfeld 1995; Rodway et al. 2012; Valenzuela and Raghubir 2009). This effect pattern was called the center-stage effect by Valenzuela and Raghubir (2009). Attali and Bar-Hillel
(2003) suggested that this pattern is not based on preferences for the
central position but might rather be explained by an aversion against
the edges of this stimulus configuration.
The present research combines affective stimulus devaluation and
the concept of spatial position effects and measures their influence on
later stimulus evaluation. It is assumed that lateral stimuli will be
devaluated either due to a negative code which is applied to them (via
edge aversion; extending the emotional coding account to other
possible stimulus-connoting factors such as spatial position) or due to
a (passive) inhibition applied to lateral positions compared to central
products and comparable baseline stimuli. Moreover, the present
research is set on disentangling these just mentioned possible explanation accounts of this lateral devaluation effect by using a
combination of stimulus evaluation patterns and eye tracking
measurements.
Experiment 1 (N = 20) was conducted to investigate the affective
evaluations of centrally and laterally presented products compared to
neutral baseline stimuli. In a presentation task, three cosmetics were
presented simultaneously in a row. The subsequent evaluation task
revealed a devaluation of lateral stimuli compared to central and,
more important, compared to baseline stimuli. This lateral devaluation below baseline level is a new finding which points to a bias
against the edges rather than to a center-stage effect when comparing central and
lateral stimuli. However, the underlying mechanisms that might have
led to this lateral devaluation are not yet resolved. A devaluation of
lateral products might either be based on affective coding, a positively
connoted center position contrasted with a negatively connoted lateral
position (see Attali and Bar-Hillel 2003; Dittrich and Klauer 2012;
Valenzuela and Raghubir 2009), or on an
attentional focus on the center product (e.g., Tatler 2007) and a
possible consequential neglect of the lateral stimuli. In Experiment 2
(planned N = 80), we are currently trying to disentangle these two
possible mechanisms. Again, three cosmetics are presented simultaneously, this time either in a horizontal row (Condition 1;
replicating Experiment 1) or in a vertical column (Condition 2).
Subsequently, one single product either previously presented or not
presented has to be emotionally evaluated by the participants. During
the experiment, the participants' eye gaze is tracked. Of interest is
the dwell time in three previously defined areas of interest containing
the three cosmetic products. We expect that products in the vertical
arrangement will be evaluated more positively the higher they are
placed in the column (see Meier and Robinson 2004; Valenzuela and
Raghubir 2010); they are also assumed to be evaluated more positively than novel products. Products in the horizontal arrangement
will be devaluated when presented laterally compared to central or
novel baseline products (see results Experiment 1). However, the
participants' attentional focus is assumed to rest on the central
product in both arrangements (Tatler 2007). A respective result pattern would indicate emotional coding as the underlying mechanism,

as the attentional focus on the central product would implicate,
following the inhibition account, that in both conditions the lateral
products would be inhibited and thus devaluated. Preliminary analyses of the eye-tracking data of 40 participants revealed the expected
gaze pattern: participants in both conditions focused on the central
product. Implications for the two competing explanation accounts as
well as for the transfer of the lateral devaluation effect to consumer
psychology will be discussed.
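For the dwell-time measure, gaze data might be assigned to the three areas of interest as in the sketch below; AOI boundaries and column names are hypothetical.

```python
# Sketch: summing dwell time per area of interest (AOI) from fixation data;
# AOI x-ranges (in pixels) and column names are hypothetical.
import pandas as pd

AOIS = {"left": (100, 500), "center": (700, 1100), "right": (1300, 1700)}

fixations = pd.read_csv("fixations.csv")  # columns: x, duration_ms

def aoi_of(x: float) -> str:
    for name, (x0, x1) in AOIS.items():
        if x0 <= x <= x1:
            return name
    return "outside"

fixations["aoi"] = fixations["x"].apply(aoi_of)
print(fixations.groupby("aoi")["duration_ms"].sum())  # dwell time per AOI
```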

References
Attali Y, Bar-Hillel M (2003) Guess where: the position of correct answers in multiple-choice test items as a psychometric variable. J Educ Meas 40(2):109–128
Christenfeld N (1995) Choices from identical options. Psychol Sci 6(1):50–55
Dittrich K, Klauer KC (2012) Does ignoring lead to worse evaluations? A new explanation of the stimulus devaluation effect. Cogn Emot 26:193–208
Meier B, Robinson M (2004) Why the sunny side is up. Psychol Sci 15:243–247
Raymond JE, Fenske MJ, Tavassoli NT (2003) Selective attention determines emotional responses to novel visual stimuli. Psychol Sci 14(6):537–542
Rodway P, Schepman A, Lambert J (2012) Preferring the one in the middle: further evidence for the centre-stage effect. Appl Cogn Psychol 26:215–222
Tatler B (2007) The central fixation bias in scene viewing: selecting an optimal viewing position independently of motor biases and image feature distributions. J Vis 7(14):1–17
Valenzuela A, Raghubir P (2009) Position-based beliefs: the center-stage effect. J Consum Psychol 19(2):185–196
Valenzuela A, Raghubir P (2010) Are consumers aware of top-bottom but not of left-right inferences? Implications for shelf space positions (working paper). Baruch College, City University of New York, Marketing Department

The role of working memory in prospective and retrospective motor planning
Christian Seegelke1,2, Dirk Koester1,2, Bettina Blaesing1,2,
Marnie Ann Spiegel1,2, Thomas Schack1,2,3
1 Neurocognition and Action Research Group, Bielefeld University, Germany; 2 Center of Excellence Cognitive Interaction Technology, Bielefeld University, Germany; 3 CoR-Lab, Bielefeld University, Germany
A large corpus of work demonstrates that humans plan and represent
actions in advance, taking into account future task demands (i.e.,
prospective planning). Empirical evidence exists that action plans
are not always generated from scratch for each movement; rather,
features of previously generated plans are recalled, modified
appropriately, and then used for subsequent actions (e.g., van der
Wel et al. 2007). This retrospective planning is likely to serve the
purpose of reducing the cognitive costs associated with motor
planning. In general, these findings support the notion that action
planning is contingent on both future and past events (cf. Rosenbaum et al. 2012). In addition, there is considerable evidence to
suggest that motor planning and working memory (WM) share
common cognitive resources (e.g., Weigelt et al. 2009; Spiegel et al.
2012; 2013). In two experiments, we further explored the role of
WM in prospective and retrospective motor planning using different
dual-task paradigms.



Experiment 1 examined the mutual influence of reduced attentional resources on the implementation of a new action plan and of
movement planning on the transfer of information into visuospatial
WM. To approach these two questions, we used a dual-task design in
which participants grasped a sphere and planned a placing movement
toward a left or right target, according to a directional arrow. (Previous research using a single memory task suggested that visuospatial WM is more affected by a grasp-to-place task than verbal
WM; Spiegel et al. 2012.) Subsequently, participants encoded visuospatial information, i.e., a centrally presented memory stimulus
(4 × 4 symbol matrix). While maintaining the information in WM, a
visual stay/change cue (presented on the left, center or right) either
confirmed or reversed the direction of the planned movement (indicated by its color). That is, participants had to execute either the
prepared or a re-planned movement, before they reported the symbols
of the matrix without time pressure. The results show that both
movement re-planning and shifting spatial attention to the
location of the incongruent stay/change cues constitute processing
bottlenecks, presumably because both actions rely on visuospatial WM. Importantly, the spatial attention shifts and
movement re-planning appeared to be independent of each other.
Further, we found that the initial preparation of the placing movement
influenced the report of the memorized items. Preparing a leftward
movement resulted in better memory performance for the left matrix
half, while the preparation of a rightward movement resulted in better
memory performance for the right matrix half. Hence, movement
planning influenced the transfer of information into WM. Therefore,
experiment 1 suggests that movement planning, spatial attention and
visuospatial WM are functionally correlated but not linked in a
mandatory fashion.
Experiment 2 examined the role of WM on action plan modification
processes (retrospective motor planning) using a hand path priming
paradigm. Participants performed a sequential manual tapping task
comprised of nine movements in time with a metronome. In a defined
part of the trials, tapping movement had to cross an obstacle between the
two center targets. Participants executed this task alone (motor only
conditions) or while concurrently performing a WM task of varied
difficulty (i.e., counting backwards in steps of one or three; motor-WM −1 and motor-WM −3 conditions, respectively). In addition, participants
performed the WM tasks without simultaneously executing the motor
task (WM −1 and WM −3 conditions, respectively). As the generation of
a new motor plan from scratch is thought to require more WM resources
compared to recall of a previously generated plan, we expected the
retrospective effect on motor planning (as measured by means of peak
movement height after clearing the obstacle) to increase with task difficulty (i.e., motor-WM −3 > motor-WM −1 > motor only).
Corroborating findings from earlier studies (van der Wel et al. 2007), we
found that after clearing an obstacle, peak heights of the manual tapping
movements were only gradually reduced. This hand path priming effect
has been interpreted as indicating that participants recalled the previously generated motor plan and only slightly modified it for the
subsequent movements, thereby saving cognitive processing resources.
Contrary to our expectation, the results showed that the magnitude of
the hand path priming effect was similar regardless of whether participants performed the motor task alone or together with a WM task. This
finding suggests that WM has no moderating influence on retrospective
motor planning. However, peak heights of the tapping movements were,
on average, higher during the dual-task conditions compared to the
single-task condition, suggesting an influence of WM on movement
execution in general. In addition, WM performance was not influenced
by task condition (i.e., single vs. dual-task). These two experiments
point toward a tight functional interaction between action control and
(spatial) WM processes and attentional load. However, retrospective
and prospective planning may draw differentially on WM and attentional resources.

Cogn Process (2014) 15 (Suppl 1):S1S158


References
Rosenbaum DA, Chapman KM, Weigelt M, Weiss DJ, van der Wel R (2012) Cognition, action, and object manipulation. Psychol Bull 138:924–946
Spiegel MA, Koester D, Schack T (2013) The functional role of working memory in the (re-)planning and execution of grasping movements. J Exp Psychol Hum Percept Perform 39:1326–1339
Spiegel MA, Koester D, Weigelt M, Schack T (2012) The costs of changing an intended action: movement planning, but not execution, interferes with verbal working memory. Neurosci Lett 509:82–86
van der Wel R, Fleckenstein RM, Jax SA, Rosenbaum DA (2007) Hand path priming in manual obstacle avoidance: evidence for abstract spatiotemporal forms in human motor control. J Exp Psychol Hum Percept Perform 33:1117–1126
Weigelt M, Rosenbaum DA, Huelshorst S, Schack T (2009) Moving and memorizing: motor planning modulates the recency effect in serial and free recall. Acta Psychol 132:68–79

Temporal preparation increases response conflict by advancing direct response activation
Verena C. Seibold, Freya Festl, Bettina Rolke
Evolutionary Cognition, Department of Psychology, University of Tübingen, Germany
Temporal preparation refers to processes of selectively attending and
preparing for specific moments in time. Various studies have shown
that these preparatory processes allow for faster and more efficient
stimulus processing, as reflected in shorter reaction time (RT) and
higher accuracy in a variety of tasks (e.g. Rolke and Ulrich 2010).
Recently, however, Correa et al. (2010) showed that temporal preparation impairs performance in tasks with conflicting response
information. Specifically, these authors observed that temporal
preparation magnified compatibility effects in a flanker task. The
flanker compatibility effect refers to an increase in RT to a target that
is flanked by response-incompatible stimuli. According to dual-route
models (e.g. Eimer et al. 1995), this effect arises because stimuli
activate responses at a cortical level along two parallel routes: a
slower controlled route, which activates responses according to task
instructions, and a fast direct route, which activates responses via
direct response priming. In case of incompatible flankers the direct
route thus activates the incorrect response, leading to conflict. Within
this framework, temporal preparation may increase conflict effects by
giving direct-route processing a head start. We investigated this idea
by measuring the stimulus-locked lateralized readiness potential
(LRP) of the event-related potential (ERP) in a flanker task. We
picked the LRP because it reflects response hand-specific ERP lateralization in motor areas and thus enabled us to separate controlled
from direct response activation in incompatible flanker trials: whereas
controlled (correct) response hand activation shows up in a negative-going LRP, direct activation of the incorrect response hand emerges
as an early positive LRP dip. Accordingly, if temporal preparation
advances direct-route-based response activation, we expected to
observe an earlier occurring positive LRP in incompatible trials. In
addition, this latency shift may also affect response activation in the
controlled route, as indexed by the negative LRP.
Method
Twelve participants performed an arrowhead version of the flanker
task. In each trial participants had to indicate the orientation of a
central arrowhead with either a left or right hand response. This target
was flanked by two vertically aligned stimuli that were either response-compatible (arrowheads pointing in the same direction), incompatible
(arrowheads pointing in the opposite direction) or neutral (rectangles).
To maximize compatibility effects and disentangle the time course of
incorrect and correct response activation, we included a constant
flanker-to-target delay of 100 ms (see Kopp et al. 1996). A blocked
foreperiod (FP) paradigm (FPs of 800 and 2,400 ms) served as
manipulation of temporal preparation, whereby the short FP leads to
good temporal preparation.
The LRP was derived at electrode sites C4/C3 in the digitally
filtered (0.05–10 Hz), artifact-free (horizontal EOG < 30 µV; all
other electrodes < 80 µV) ERP as the average of contra- minus
ipsilateral activity for left- and right-hand responses. The 100 ms pre-flanker interval served as baseline. Jackknife-based onset latency
(50 % relative amplitude criterion) was calculated for positive and
negative LRPs (time windows: 140–240 ms and 150–400 ms).
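To illustrate these two steps, the sketch below derives a contra-minus-ipsilateral LRP and jackknife-based onset estimates with a 50 % relative amplitude criterion; array shapes and variable names are assumptions, not the authors' pipeline.

```python
# Sketch: LRP derivation (contralateral minus ipsilateral activity, averaged
# over hands) and jackknife-based onset latency (50 % relative amplitude).
import numpy as np

def lrp(c3, c4, response_hand):
    """c3, c4: (n_trials, n_samples) ERPs; response_hand: array of 'left'/'right'."""
    left = response_hand == "left"
    contra_left = (c4[left] - c3[left]).mean(axis=0)     # left hand: C4 contralateral
    contra_right = (c3[~left] - c4[~left]).mean(axis=0)  # right hand: C3 contralateral
    return (contra_left + contra_right) / 2

def jackknife_onsets(subject_lrps, times, criterion=0.5):
    """Leave-one-out onset latencies, later tested with corrected F values (FC)."""
    onsets = []
    for i in range(len(subject_lrps)):
        grand = np.delete(subject_lrps, i, axis=0).mean(axis=0)
        threshold = criterion * np.abs(grand).max()      # 50 % of peak amplitude
        onsets.append(times[np.argmax(np.abs(grand) >= threshold)])
    return np.array(onsets)
```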
Results
Statistical analysis was performed via repeated-measures analysis of
variance (rmANOVA) and pairwise t-tests for post hoc comparisons
(with Bonferroni corrected p-values). Mean RT in correct trials, mean
percentage error (PE), and negative LRP onsets were submitted to
separate rmANOVAs with factors foreperiod (short, long) and compatibility (compatible, neutral, incompatible). Positive LRP onset was
analyzed via an rmANOVA with factor foreperiod (short, long).
Analysis of mean RT revealed a compatibility main effect,
F(2,22) = 75.8, p < .001 (RTcompatible < RTneutral < RTincompatible;
both ps < .001; Fig. 1). Furthermore, FP had a main effect on RT,
F(1,11) = 37.1, p < .001, which was further qualified by a FP ×
Compatibility interaction, F(2,22) = 4.6, p = .02. RTs were shorter
after the short FP, but only in compatible and neutral trials (both
ps < .001), not in incompatible trials (p = .16). PE was affected by
compatibility, F(2,22) = 9.1, p = .01, and FP, F(1,11) = 10.6,
p = .008, as well as by their interaction, F(2,22) = 10.4, p = .006. PE
was selectively higher in incompatible trials, t(11) = 3.0, p = .02,
specifically after the short FP, t(11) = 3.4, p = .02.
Negative LRP onset (Fig. 2a) was affected by compatibility,
FC(2,22) = 28.0, p < .001, with increasing latency from compatible
to neutral to incompatible trials (both ps < .001). The FP main effect
was not significant, FC(1,11) = 2.64, p = .13, nor was the FP × Compatibility interaction (FC < 1). The positive LRP in incompatible
trials was clearly affected by FP, FC(1,11) = 18.8, p = .001, with
shorter latency after the short FP (Fig. 2b).
Discussion
By means of ERPs, we examined how temporal preparation affects
response activation in conflict tasks. Replicating previous studies
(Kopp et al. 1996), we observed clear compatibility effects, as RT and
(negative) LRP latency increased from compatible to incompatible
trials. Furthermore, temporal preparation increased the size of the
behavioral response conflict. Most importantly, temporal preparation
reduced the latency of the positive LRP in incompatible trials
indexing direct response activation, but it did not affect negative LRP

latency indexing controlled response activation. This finding suggests
that temporal preparation modulates response activation along the
direct route and thereby increases response conflict.

Fig. 1 Mean RT (correct responses) and PE as a function of compatibility and FP
Fig. 2 a Negative LRP as a function of compatibility. b Positive LRP in incompatible trials as a function of FP. Flanker (F) and target (T) onset are marked on the x-axis
References
Correa A, Cappucci P, Nobre AC, Lupianez J (2010) The two sides of temporal orienting: facilitating perceptual selection, disrupting response selection. Exp Psychol 57:142–148. doi:10.1027/1618-3169/a000018
Eimer M, Hommel B, Prinz W (1995) S-R compatibility and response selection. Acta Psychol 90:301–313. doi:10.1016/0001-6918(95)00022-M
Kopp B, Rist F, Mattler U (1996) N200 in the flanker task as a neurobehavioral tool for investigating executive control. Psychophysiology 33:282–294. doi:10.1111/j.1469-8986.1996.tb00425.x
Rolke B, Ulrich R (2010) On the locus of temporal preparation: enhancement of pre-motor processes. In: Nobre AC, Coull JT (eds) Attention and time. Oxford University Press, Oxford, pp 228–241

The flexibility of finger-based magnitude representations
Elena Sixtus, Oliver Lindemann, Martin H. Fischer
Cognitive Science Division, University of Potsdam, Germany
Finger counting is a crucial step towards accomplishing counting
and understanding number. Consistent with the theoretical stance
of embodied cognition (see e.g., Glenberg, Witt, Metcalfe 2013),
recent studies reported evidence suggesting that adults show an
influence of finger counting on cognitive number processing in

123

Cogn Process (2014) 15 (Suppl 1):S1S158


various tasks (e.g., Domahs, Moeller, Huber, Willmes, Nuerk
2010; Fischer 2008). Di Luca and Pesenti (2007) demonstrated in
adults that pictures of finger counting postures prime numerical
size in an Arabic number classification task. This suggests that
finger representations become automatically activated during
number processing. The present study reports further interactions
between the execution of finger counting postures and the processing of numbers; it provides evidence for an activation of
number representations through finger postures.
In Experiment 1, 25 right-handed adult participants were instructed to compare two successively presented digits while performing
finger postures. Each trial comprised a reference number ranging from
2 to 4, followed by a target number that was either smaller or larger
by 1, thus ranging from 1 to 5. Responses were given verbally (i.e.,
saying "ta" for bigger and "to" for smaller). The postures were
executed behind a visual occluder with the dominant hand, with 2 to 4
fingers stretched out in either a canonical (finger counting, starting
from the thumb) or non-canonical way. Crucially, the number of
extended fingers sometimes corresponded with the presented target
number (congruent trials). The current posture was instructed by the
experimenter before each block of 15 trials. Each trial started with a
button press before the finger posture was readopted to refresh participants' proprioceptive experience. Results showed a significant
comparison-time advantage for congruent trials only when canonical
finger postures were adopted (RT advantage of 13 ms, SD = 27 ms,
for congruent compared to incongruent trials; t(24) = 2.39,
p < .03). These data suggest that, although most participants reported
not being aware that they were occasionally adopting finger counting
postures, these finger movements pre-activated the representation of
specific numbers, which facilitated number processing.
In Experiment 1, almost all participants were right-starters in
finger counting. It is possible that congruency effects only emerge for
the hand that is usually used to represent these specific numbers. It
also remains unclear whether the coding of numbers larger than 5
benefits from adopting finger postures. We therefore conducted a
second experiment in which both hands were used and numbers
between 2 and 9 served as stimuli.
In Experiment 2, 26 right-handed participants verbally classified
numbers (2, 3, 4, 7, 8, 9) as odd or even, while again executing
canonical or non-canonical finger postures with one hand. In contrast
to Experiment 1, participants were required to perform two blocks, in
which they adopted finger postures with the left and with the right
hand. Responses were again given verbally, by saying "odd" or
"even" (German: "ungerade" and "gerade", respectively). We subtracted from each vocal RT the subject's individual mean RT per
response. In this design, at least four different congruencies can be
distinguished. Again, the number of extended fingers could coincide
with the classified number (exact congruency), the numerical size of
the stimulus could correspond to the respective hand in finger
counting (counting hand congruency), both the number of fingers and
the digit could be odd or even (parity congruency), and both the
finger posture and the digit could be relatively small or large (with a
range of 2–4 for finger postures and a range of 2–9 for presented
digits; relative size congruency). While no significant exact or
counting hand congruency effects were found, and only a trend toward an
RT advantage for parity-congruent trials (4 ms, SD = 12 ms;
t(25) = 1.89, p = .07), there was a significant relative size congruency effect for canonical (but not for non-canonical) postures 2 and 4
(12 ms, SD = 23 ms, for congruent compared to incongruent
trials; t(25) = 2.56, p < .02): executing a relatively small counting
posture led to faster parity decisions for small than for large digits, and
vice versa for a relatively big counting posture, while a medium
counting posture had no such effect.
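The four congruency codes can be derived per trial as in the sketch below; all file and column names are hypothetical.

```python
# Sketch: coding the four congruencies described above for each trial;
# file and column names are hypothetical.
import numpy as np
import pandas as pd

df = pd.read_csv("trials.csv")
# expected columns: digit (2-9), n_fingers (2-4), posture_hand,
#                   counting_hand_for_digit

df["exact"] = df["digit"] == df["n_fingers"]
df["counting_hand"] = df["posture_hand"] == df["counting_hand_for_digit"]
df["parity"] = (df["digit"] % 2) == (df["n_fingers"] % 2)

finger_size = df["n_fingers"].map({2: "small", 3: "medium", 4: "large"})
digit_size = np.where(df["digit"] <= 4, "small", "large")
df["rel_size"] = finger_size == digit_size  # medium postures never match
```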
Together, these results clarify our understanding of embodied
number processing. First, the presence of the exact congruency effect
was limited to a situation in which the numbers did not exceed the

Cogn Process (2014) 15 (Suppl 1):S1S158


counting range of one hand, suggesting that finger counting postures
only activate the corresponding mental number representations when
embedded in an appropriate task. Second, the absence of a counting
hand congruency effect shows that using the non-starting hand does
not necessarily activate the respective mental representation for larger
numbers. Third, the finding that finger postures and numbers interact
based on their respective relative sizes demonstrates a more flexible
size activation through finger postures than previously assumed. This
is in line with the idea of a generalized magnitude system, which is
assumed "to encode information about the magnitudes in the external
world that are used in action" (Walsh 2003, p 486). Specifically,
showing almost all fingers of one hand is associated with large magnitudes and showing very few fingers with small magnitudes. The
present study shows that only under certain task demands do subjects
activate a one-to-one correspondence between fingers and numbers.
In other situations, magnitudes might not have to be exactly the same,
but rather proportional, to be associated.

Acknowledgments
This research is supported by DFG grant FI 1915/2-1 "Manumerical cognition".
References
Di Luca S, Pesenti M (2007) Masked priming effect with canonical finger numeral configurations. Exp Brain Res 185(1):27–39. doi:10.1007/s00221-007-1132-8
Domahs F, Moeller K, Huber S, Willmes K, Nuerk HC (2010) Embodied numerosity: implicit hand-based representations influence symbolic number processing across cultures. Cognition 116(2):251–266. doi:10.1016/j.cognition.2010.05.007
Fischer MH (2008) Finger counting habits modulate spatial-numerical associations. Cortex 44(4):386–392. doi:10.1016/j.cortex.2007.08.004
Glenberg AM, Witt JK, Metcalfe J (2013) From the revolution to embodiment: 25 years of cognitive psychology. Perspect Psychol Sci 8(5):573–585. doi:10.1177/1745691613498098
Walsh V (2003) A theory of magnitude: common cortical metrics of time, space and quantity. Trends Cogn Sci 7(11):483–488. doi:10.1016/j.tics.2003.09.002

Object names correspond to convex entities


Rahel Sutterlutti, Simon Christoph Stein, Minija Tamosiunaite,
Florentin Worgotter
Faculty of Physics: Biophysics and Bernstein Center for Computational Neuroscience, Göttingen, Germany
Commonly one assumes that object identification (and recognition)
requires complex cognitive processes, innate as well as acquired
(Carey 2011); however, it remains unclear how objects can be individuated, segregated into parts, and identified (named), given the high
degree of variability of the sensory features that arise even from
similar objects (Geisler 2008). Gestalt laws, relying on shape
parameters and their relations (for example edge relations, compactness, or others), seem to play a role in this process (Spelke et al. 1993).
Specifically, there exist several results from psychophysics (Hoffman
and Richards 1984, Biederman 1987, Bertamini and Wagemans 2013)
and machine vision (Siddiqi and Kimia 1995, Richtsfeld et al. 2012),
which demonstrate that convex-concave surface transitions can be
used for object partitioning.
Here we are now trying to discern to what degree such a partitioning corresponds to our language-expressible object
understanding. To this end, a total of 10 real scenes, consisting of

3D point cloud data and the corresponding RGB image, have been
analyzed. Scenes were recorded by RGB-D sensors (Kinect), which
provide 3D point cloud data and matched 2D RGB images. Scenes
were taken from openly available machine vision databases (Richtsfeld et al. 2012; Silberman et al. 2012). We segmented the scenes
into 3D entities using convex-concave transitions in the point cloud
by a model-free machine vision algorithm, the details of which are
described elsewhere (LCCP Algorithm, Stein et al. 2014). This is a
purely data-driven segmentation algorithm, which does not use any
additional features for segmentation and works reliably for in-door
RGB-D scenes with a depth range of approx. 0.5 to 5 meters using
only 2 parameters to set the resolution. Note that, due to the limited spatial
resolution of the RGB-D sensors, small objects cannot be consistently
labeled; thus, segments smaller than 3 % of the image size were
manually blackened out, as they most often represent sensor
noise. We obtained a total of 247 segments (i.e. about 20–30 per
image). Segments are labeled on the 2D RGB image with different
colors to make them distinguishable for the observer. To control for
errors introduced by image acquisition and/or by the computer vision
algorithm, we use the known distance error function of the Kinect
sensor (Smisek et al. 2011) to calculate a reliability score for every
segment.
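To make the segmentation criterion concrete, below is a simplified sketch of a convex-concave transition test between two adjacent surface patches; it captures the spirit of the LCCP criterion (normals opening away from each other along the vector joining the patch centroids) but is not the published algorithm, which adds further checks.

```python
# Simplified convexity test between two adjacent surface patches, inspired
# by (but not identical to) the LCCP criterion of Stein et al. (2014).
import numpy as np

def is_convex(p1, n1, p2, n2, tol_deg=5.0):
    """p1, p2: patch centroids; n1, n2: unit surface normals."""
    d = p1 - p2
    margin = np.dot(n1 - n2, d)          # > 0 when the normals diverge (convex)
    return margin > np.linalg.norm(d) * np.sin(np.deg2rad(tol_deg))

# toy example: two faces meeting at a convex ridge -> True
print(is_convex(np.array([0.0, 1, 1]), np.array([-0.7, 0, 0.7]),
                np.array([1.0, 1, 1]), np.array([0.7, 0, 0.7])))
```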
We asked 20 subjects to compare the obtained 247 color-labeled
segments with the corresponding original RGB image, asking "How
precisely can you name it?", and recorded their utterances, obtaining
4,940 data points. Subsequently we analyzed the utterances and
divided them into three groups: (1) precise naming of a segment (e.g.
"table leg"), where it does not matter whether or not subjects
used unique names (e.g. "table leg", "leg", and "table support"
are equally valid), (2) definite failure or impossibility to name a segment,
and (3) unclear cases, where subjects stated that they were not sure about
the identification.
One example scene is shown in Fig. 1a. Using color-based segmentation (Ben Salah et al. 2011), the resulting image segments rarely
correspond to objects in the scene (Fig. 1b), and the outcome is also extremely
dependent on illumination. Unwanted merging or splitting of objects
will, regardless of the chosen segmentation parameters, generically
happen (e.g. throat + face, fridge fragments, etc.; Fig. 1b).
Instead of using 2D color information, here point clouds were 3D-segmented along concave/convex transitions. We observed (Fig. 1d)
that subjects many times used different names (e.g. "face" or "head")
to identify a segment, which are equally valid as both describe a valid
conceptual entity (an object). There are, however, several cases
where segments could not be identified. We find that on average 64 %
of the segments could be identified, 30 % could not, and 6 % were
unclear cases. Are these 30 % non-identified segments possibly
(partially) due to machine vision errors? To assess this, we additionally considered the reliability of the individual segments. Due to
the discretization error of the Kinect (stripy patterns in Fig. 1c), data
at larger distances become quadratically more unreliable (Smisek
et al. 2011), leading to merging of segments. When considering this
error source, we find that subjects could more often identify reliable
segments (Fig. 1e, red) and unrecognized cases dropped accordingly
(green). The red lettering in Fig. 1d marks less reliable segments and,
indeed, identification is lower or more ambivalent for those segments
as compared to the more reliable ones.
The segmentation performed here generically renders identifiable object parts (e.g. "head", "arm", "handle of fridge"). Clearly, no purely data-driven method exists which would allow detecting complex, compound objects (e.g. "woman"), as this requires additional conceptual knowledge. Furthermore, we note that we are not concerned here with higher cognitive aspects relating to context analysis, hierarchization, categorization, and other complex processes. Our main observation is that the purely geometrical (low-level) breaking up of a 3D scene most often leads to entities for which we have an internal object or object-part concept, which may reflect the low-level perceptual grounding of the "bounded region" hypothesis formulated by Langacker (1990) as a possible foundation for grammatical entity construal.

Fig. 1 Humans can with high reliability identify image segments that result from splitting images along concave-convex surface transitions. a One example scene used for analysis. b Color-based segmentation of the scene. c Point cloud image of parts of the scene. d 3D-segmented scene and segment names used by our subjects to identify objects. Missing percentages are the non-named cases. Red lettering indicates segments with reliability less than 50. e Fraction of identified (red), not-identified (green) and unclear (blue) segments for the complete data set plotted against their reliability. Fat dots represent averages across reliability intervals [0,10], [10,20], ..., [150,160]. The ability to identify a segment increases with reliability. Grand averages (red 0.64, green 0.30, blue 0.06) for all data are shown, too
It is known that color, texture and other such statistical image
features vary widely (Geisler 2008). Thus, object individuation cannot
rely on them. By contrast, here we find that convex-concave transitions between 3D surfaces might represent the required prior to which a contiguous object concept can be unequivocally bound. These transitions render object boundaries and, consequently, lead to the situation that we can name them.
In addition, we note that this bottom-up segmentation can easily be combined with other image features (edge, color, etc.) and also, if desired, with object models, where one can now go beyond object individuation towards true object recognition.

References
Ben Salah M, Mitiche A, Ayed IB (2011) Multiregion image segmentation by parametric kernel graph cuts. IEEE Trans Image Proc 20(2):545-557
Bertamini M, Wagemans J (2013) Processing convexity and concavity along a 2-D contour: figure-ground, structural shape, and attention. Psychon Bull Rev 20(2):191-207
Biederman I (1987) Recognition-by-components: a theory of human image understanding. Psychol Rev 94:115-147
Carey S (2011) Precis of The origin of concepts (and commentaries). Behav Brain Sci 34(3):113-167
Geisler W (2008) Visual perception and the statistical properties of natural scenes. Annu Rev Psychol 59:167-192
Hoffman D, Richards W (1984) Parts of recognition. Cognition 18(1-3):65-96
Langacker RW (1990) Concept, image, and symbol: the cognitive basis of grammar. Mouton de Gruyter, Berlin
Richtsfeld A, Mörwald T, Prankl J, Zillich M, Vincze M (2012) Segmentation of unknown objects in indoor environments. In: Proceedings of IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 4791-4796
Siddiqi K, Kimia BB (1995) Parts of visual form: computational aspects. IEEE Trans Pattern Anal Mach Intel 17:239-251
Silberman N, Hoiem D, Kohli P, Fergus R (2012) Indoor segmentation and support inference from RGB-D images. In: Proceedings of European conference on computer vision (ECCV), pp 746-760
Smisek J, Jancosek M, Pajdla T (2011) 3D with Kinect. In: Proceedings of international conference on computer vision (ICCV), pp 1154-1160
Spelke ES, Breinlinger K, Jacobson K, Phillips A (1993) Gestalt relations and object perception: a developmental study. Perception 22(12):1483-1501
Stein S, Papon J, Schoeler M, Wörgötter F (2014) Object partitioning using local convexity. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR). http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Stein_Object_Partitioning_using_2014_CVPR_paper.pdf

The role of direct haptic feedback in a compensatory tracking task

Evangelia-Regkina Symeonidou, Mario Olivari, Heinrich H. Bülthoff, Lewis L. Chuang
Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Haptic feedback systems can be designed to assist vehicular steering
by sharing manual control with the human operator. For example,
direct haptic feedback (DHF) forces that are applied to the control device can guide the operator towards an optimized trajectory, which he can either augment, comply with, or resist according to his preferences. DHF has been shown to improve performance (Olivari et al.
submitted) and increase safety (Tsoi et al. 2010). Nonetheless, the
human operator may not always benefit from the haptic support
system. Depending on the amount of the haptic feedback, the operator
might demonstrate over-reliance or opposition to this haptic
assistance (Forsyth and MacLean 2006). Thus, it is worthwhile to
investigate how different levels of haptic assistance influence shared
control performance.
The current study investigates how different gain levels of DHF
influence performance in a compensatory tracking task. For this
purpose, 6 participants were evenly divided into two groups according
to their previous tracking experience. During the task, they had to
compensate for externally induced disturbances that were visualized
as the difference between a moving line and a horizontal reference
standard. Briefly, participants observed how an unstable aircraft
symbol, located in the middle of the screen, deviated in the roll axis
from a stable artificial horizon. In order to compensate for the roll
angle, participants were instructed to use the control joystick.
Meanwhile, different DHF forces were presented over the control
joystick for gain levels of 0, 12.5, 25, 50 and 100 %. The maximal
DHF level was chosen according to the procedure described in
(Olivari et al. 2014) and represents the best stable performance of
skilled human operators. The participants' performance was defined
as the reciprocal of the median of the root mean square error (RMSE)
in each condition.
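As an illustration of this measure (variable names and data layout are our assumptions, not the authors' code), the computation reduces to a few lines of Python:

import numpy as np

def performance(errors_per_run):
    """Performance as the reciprocal of the median root mean square
    error (RMSE) over the runs of one condition."""
    rmse = [np.sqrt(np.mean(np.square(e))) for e in errors_per_run]
    return 1.0 / np.median(rmse)

# Example: three runs of simulated roll-angle error samples (made up).
runs = [np.random.default_rng(i).normal(0.0, 0.1, 500) for i in range(3)]
print(performance(runs))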
Figure 1a shows that performance improved with increasing DHF gain, regardless of experience level. To evaluate the operator's contribution relative to the DHF contribution, we calculated the ratio of overall performance to estimated DHF performance without human input.

Fig. 1 a Performance of the experienced and inexperienced participants, as well as the baseline of direct haptic feedback (DHF) assistance without human input, for increasing haptic gain. b The ratio of overall system performance to DHF performance without human input for increasing haptic gain
Figure 1b shows that the subjects' contribution in both groups decreased with increasing DHF up to the 50 % condition. The contribution of experienced subjects plateaued between the 50 and 100 % DHF levels. Thus, the increase in performance for the 100 % condition can mainly be attributed to the higher DHF forces alone. In contrast, the inexperienced subjects seemed to rely completely on the DHF during the 50 % condition, since the operator's contribution approximated 1. However, this changed for the 100 % DHF level. Here, the participants started to actively contribute to the task (operator's contribution > 1). This change in behavior resulted in performance values similar to those of the experienced group. Our findings suggest that the increase of haptic support with our DHF system does not necessarily result in over-reliance and can improve performance for both experienced and inexperienced subjects.
References
Forsyth BAC, MacLean KE (2006) Predictive haptic guidance: intelligent user assistance for the control of dynamic tasks. IEEE Trans Visual Comput Graph 12(1):103-113
Olivari M, Nieuwenhuizen FM, Bülthoff HH, Pollini L (2014) An experimental comparison of haptic and automated pilot support systems. In: AIAA modeling and simulation technologies conference, pp 1-11
Olivari M, Nieuwenhuizen F, Bülthoff H, Pollini L (submitted) Pilot adaptation to different classes of haptic aids in tracking tasks. J Guidance Control Dyn
Tsoi KK, Mulder M, Abbink DA (2010) Balancing safety and support: changing lanes with a haptic lane-keeping support system. In: 2010 IEEE international conference on systems, man and cybernetics, pp 1236-1243

Comprehending negated action(s): embodiment perspective

Nemanja Vaci1, Jelena Radanovic2, Fernando Marmolejo-Ramos3, Petar Milin2,4
1 Alpen-Adria University Klagenfurt, Austria; 2 University of Novi Sad, Serbia; 3 University of Adelaide, Australia; 4 Eberhard Karls Universität Tübingen, Germany
Keywords
Embodied cognition, Negation, Mental simulation, Sentence comprehension
According to the embodied cognition framework, comprehension of language involves activation of the same sensorimotor areas of the brain
that are activated when entities and events described by language
structures (e.g., words, sentences) are actually experienced (Barsalou
1999). Previous work on the comprehension of sentences showed
support for this proposal. For example, Glenberg and Kaschak (2002)
observed that judgment about sensibility of a sentence was facilitated
when there was congruence between the direction of an action implied
by the sentence and the direction of a movement required for making a
response, while incongruence led to slower responses. It was also shown
that linguistic markers (e.g., negation) could modulate mental simulation of concepts (Kaup 2001). This finding was explained by two-step negation processing: (1) a reader simulates a sentence as if there were no negation; (2) she negates the simulated content to reach the full meaning. However, when a negated action was announced in preceding text, the negated clause was processed as fast as the affirmative one (Lüdtke and Kaup 2006). These results suggest the mechanism of negation processing can be altered contextually.
In this study, we aimed at further investigating the effects of
linguistic markers, following the assumptions of the embodied framework. To obtain a manipulation of sentence context that would
target mental simulations we made use of materials from De Vega
et al. (2004). These researchers created sentences by manipulating
whether or not two actions described in a sentence were competing
for the cognitive resources. They showed that sentences with two
actions being performed at the same time were easier to process
when they aimed at different sensorimotor systems (whistling and
painting a fence), than when described actions involved the same
sensorimotor system (chopping wood and painting a fence). We
hypothesized that given two competing actions negation could
provide suppression for one of them and, thus, change the global
simulation time course.
Experiment 1 was a modified replication of the De Vega et al. (2004) study in Serbian. We constructed sentences by manipulating
whether or not two actions described in a sentence were performed
using the same or different sensorimotor systems. We also manipulated temporal ratio of the two actions (simultaneous vs.
successive). Finally, actions within a sentence could be physically
executed or mentally planned (reading a book vs. imagining reading
a book). This way we included both descriptions of real actions
as well as descriptions of mental states. Introduction of this factor aimed at testing whether a linguistic marker for mentally planned actions would induce second-order simulation, similar to the two-step processing, or suppress the mental simulation, which then would match the one-step processing. The participants' task in
this experiment was to read the sentences and to press a button
when they finished. To ensure comprehension, in 25 % randomly
chosen trials participants were instructed to repeat the meaning of a
sentence to the experimenter.
In the following two experiments, we focused on the mechanism
of negation, using similar sentences as in Experiment 1. Here, we
manipulated the form of the two parts (affirmative vs. negative). The task used in Experiments 2 and 3 was a modified self-paced reading task, allowing by-clause reading rather than by-word or by-sentence
reading. This way we obtained response times for each of the two
parts (clauses). We were also interested in measuring the time (and
accuracy) required for judging sensibility of the whole sentence.
Therefore, we included an equal number of nonsensible filler sentences.
Linear mixed-effects modeling was applied to the response times and logistic mixed-effects modeling to the accuracy rates. We controlled for trial order, clause and sentence length, and sensibility of a sentence (the sensibility ratings were obtained in a separate study using different participants). Results from Experiment 1 confirmed the findings of De Vega et al. (2004): sentences with two actions from the same sensorimotor system were comprehended more slowly (t(489.90) = 4.21, p < .001). In addition, we observed a stronger inhibitory effect of length for sentences with simultaneously executed actions, which indicates additional comprehension load for this type of sentence (t(499.40) = -2.00, p < .05). Finally, processing time was longer when sentences described mentally planned actions as opposed to real ones (t(489.70) = 3.21, p < .01).
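As a sketch of the kind of model reported here (not the authors' exact specification; column names and file are hypothetical), such a linear mixed-effects analysis could be set up in Python with statsmodels:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per clause reading time.
data = pd.read_csv("reading_times.csv")

# Fixed effects: sensorimotor system (same/different), clause form
# (affirmative/negative) and their interaction, plus the reported
# controls (trial order, lengths, sensibility rating); random
# intercepts per participant (items could be added analogously).
model = smf.mixedlm(
    "rt ~ system * form + trial + clause_len + sent_len + sensibility",
    data,
    groups=data["participant"],
)
print(model.fit().summary())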
The analyses of Experiments 2 and 3 showed consistent results between the clause (local) and sentence (global) response times. The interaction between sensorimotor systems (same vs. different) and the form of a clause (affirmative vs. negative) was significant (t(303.70) = 2.95, p < .01): different sensorimotor actions/systems combined with negation led to slower processing times and lower accuracy; when the sensorimotor system was the same, affirmative and negated markers did not induce significant differences. However, in cases when actions addressed the same sensorimotor system, the accuracy of the sensibility judgments was higher if the second action was negated (z(303.70) = 4.36, p < .001). Taken together, this pattern of results suggests that in the case of competing actions negation might be processed in one step, as opposed to two-step processing in the case of non-competing actions.
The present results support the claim about mental simulation as
influenced by linguistic markers. We showed, however, that such an
influence depends on more general contextual factors. The present results suggest that negation might have a regulatory purpose in sentence comprehension. The negated content is comprehended in a two-step
simulation only if actions do not compete for the cognitive resources.
Contrariwise, when actions within a sentence are in sensorimotor
competition, negation can suppress the second action to facilitate the
comprehension.
References
Barsalou LW (1999) Perceptual symbol systems. Behav Brain Sci 22:577-660
De Vega M et al (2004) On doing two things at once: temporal constraints on actions in language comprehension. Mem Cogn 33:1033-1043
Glenberg AM, Kaschak MP (2002) Grounding language in action. Psychon Bull Rev 9:558-565
Kaup B (2001) Negation and its impact on the accessibility of text information. Mem Cogn 29:960-967
Lüdtke J, Kaup B (2006) Context effects when reading negative and affirmative sentences. In: Sun R (ed) Proceedings of the 28th annual conference of the cognitive science society. Lawrence Erlbaum Associates, Mahwah, pp 1735-1740

Effects of action signaling on interpersonal coordination

Cordula Vesper1, Lou Safra2, Laura Schmitz1, Natalie Sebanz1, Günther Knoblich1
1 CEU, Budapest, Hungary; 2 École Normale Supérieure, Paris, France
How do people coordinate actions such as lifting heavy objects
together, clapping in synchrony or passing a basketball from one
person to another? In many joint action tasks (Knoblich et al. 2011),
talking is not needed or simply too slow to provide useful cues for
coordination. Instead, two people who coordinate their actions
towards a joint goal often adapt the way they perform their own
actions to facilitate performance for a task partner.
One way of supporting a co-actor is by providing relevant information about one's own action performance. This can be achieved non-verbally by exaggerating specific movement aspects so that another person can more easily understand and predict the action. This communicative modulation of one's own actions is often referred to as signaling (Pezzulo et al. 2013) and includes common action exaggerations such as making a distinct, disambiguating step towards the right to avoid a collision with another person on the street.
The present study investigated signaling in a joint action task in
which sixteen pairs of participants moved cursors on a computer screen
towards a common target with the goal of reaching the target synchronously. Short feedback tones at target arrival indicated the
coordination accuracy of their actions. To investigate whether actors
modulate their signaling depending on what is perceptually available to
their partners, we compared several movement parameters between two
conditions: In the visible condition, co-actors could see each other's movements towards the target (i.e. both computer screens were visible to both co-actors); in the hidden condition, an occluder between the co-actors prevented them from receiving visual feedback about each other.
Analyses of participants' movements showed that signaling in the form of exaggerating the trajectory towards the target (by increasing the
curvature of the movement) was specifically used in the visible condition, whereas a temporal strategy of reducing the variability of target
arrival times (Vesper et al. 2011) was used in the hidden condition.
Furthermore, pairs who signaled more were overall better coordinated.
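One simple way to quantify such trajectory exaggeration (our illustration, not necessarily the measure used in the study) is the maximum perpendicular deviation of the cursor path from the straight start-to-target line:

import numpy as np

def exaggeration_index(traj):
    """Maximum perpendicular distance of a 2D trajectory (Nx2 array)
    from the chord connecting its first and last point."""
    traj = np.asarray(traj, dtype=float)
    start, end = traj[0], traj[-1]
    d = (end - start) / np.linalg.norm(end - start)
    rel = traj - start
    perp = rel - np.outer(rel @ d, d)  # component orthogonal to the chord
    return np.linalg.norm(perp, axis=1).max()

# A curved movement scores higher than a straight one.
t = np.linspace(0.0, 1.0, 50)
straight = np.column_stack([t, t])
curved = np.column_stack([t, t + 0.3 * np.sin(np.pi * t)])
print(exaggeration_index(straight), exaggeration_index(curved))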
Together these findings suggest that signaling is specifically
employed in cases where a task partner is able to use the information
(i.e. can actually see the action modulation) and that this can be
beneficial for successful joint action performance. Thus, co-actors
take into account what their partners can perceive in their attempts to
coordinate their actions with them. Moreover, our study demonstrates
how, depending on the type and amount of perceptual information
available between co-actors, different mechanisms support interpersonal coordination.
References
Knoblich G, Butterfill S, Sebanz N (2011) Psychological research on joint action: theory and data. In: Ross B (ed) The psychology of learning and motivation 54. Academic Press, Burlington, pp 59-101
Pezzulo G, Donnarumma F, Dindo H (2013) Human sensorimotor communication: a theory of signaling in online social interactions. PLoS ONE 8:e79876
Vesper C, van der Wel RPRD, Knoblich G, Sebanz N (2011) Making oneself predictable: reduced temporal variability facilitates joint action coordination. Exp Brain Res 211:517-530


Physiological changes through sensory augmentation in path integration: an fMRI study

Susan Wache1,*, Johannes Keyser1, Sabine U König1, Frank Schumann1, Thomas Wolbers2,3, Christian Büchel2, Peter König1,4
1 Institute of Cognitive Science, University of Osnabrück; 2 Institute of Systems Neuroscience, University Medical Center Hamburg-Eppendorf; 3 German Center for Neurodegenerative Diseases, Magdeburg; 4 Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf
The theory of sensorimotor contingencies (SMCs) describes qualitative experience as based on the dependency between sensory input and its preceding motor actions. To investigate sensory processing and learning of new SMCs we used sensory augmentation in a virtual path integration task. Specifically, we built a belt that maps directional information of a compass to a set of vibrating elements such that the element pointing north is always activated. The belt changes its tactile signals only through motor actions of the belt-wearing participant, i.e. when turning around.
Nine subjects wore the belt during all waking hours for seven weeks; 5 control subjects actively trained their navigation, but without a belt (age 19-32 years, seven female). Before and after the training period
we presented in the fMRI scanner a virtual path integration (PI) task
and a corresponding control task with identical visual stimuli. In half
of the trials of both tasks the belt was switched on, coherently
vibrating with the virtual movements of the subjects.
We used ROI analysis to concentrate on regions relevant for spatial navigation and for sensory processing. We used a mixed-effects ANOVA to decompose the four factors belt on/off, belt/control subjects, PI/control task, and before/after training. The main effect PI > control task shows large-scale differences in areas
that have been found to be active in similar navigational tasks
such as medial superior temporal cortices (MST), posterior parietal
cortex (PPC), ventral intraparietal areas, and caudate nucleus.
Additionally we found sensorimotor regions such as supplementary
motor areas (SMA), insula, primary sensory cortex, and precentral
gyrus. The main effect belt on > off reveals processing of the tactile signals in expected sensory areas such as the primary sensory cortex, supramarginal gyri, and Rolandic opercula. In second-level analyses, significant two-way interactions between the belt on/off and pre/post training conditions indicate an involvement of the Rolandic opercula, insula, MST, and PPC. Inspection of the activation intensities shows a significant difference belt on > off only in the first measurement before the training period, but not after the training period.
In summary, in fMRI we observe differential activations in areas expected for path integration tasks and tactile stimulation. Additionally, we also found activation differences for the belt signals well beyond the somatosensory system, indicating that processing is not limited to sensory areas but also includes higher-level and motor regions, as predicted by the theory of sensorimotor contingencies. It is demonstrated that the belt's signal is processed differently after the training period. Our fMRI results are also in line with subjective reports indicating a qualitative change in the perception of the belt signals.

Do you believe in Mozart? The influence of beliefs about composition on representing joint action outcomes in music

Thomas Wolf, Cordula Vesper, Natalie Sebanz, Günther Knoblich
CEU, Budapest, Hungary
Actors in joint action situations represent the outcomes of their joint actions and use these to guide their actions (Vesper, Butterfill, Knoblich, Sebanz 2010). However, it is not clear how conceptual and perceptual
Sebanz 2010). However, it is not clear how conceptual and perceptual
information affect the representations of joint action outcomes. In the
present experiment, we investigated whether beliefs about the intended
nature of joint action outcomes are sufficient to elicit changes in their
representation. As recent studies provide evidence that participants represent joint action outcomes in musical paradigms (Loehr, Kourtis,
Vesper, Sebanz, Knoblich 2013), we used a piano paradigm to investigate
the hypothesis that beliefs about the composer's intentions can influence
representations of jointly produced tones.
In our paradigm, we used a within-subjects 2 × 2 design with the factors Belief (together, separate) and Key (same, different). Two
adult piano novices played 24 melody-sets with the help of templates.
In the Belief condition "together", the participants were told that the melodies they were going to play were intended to be played together as duets. In the condition "separate", participants were told that their melodies were not intended to be played together. With the Key
manipulation, we manipulated the cognitive costs of joint action
outcome representations as follows. All 24 melody-sets were generated by a Python script and followed the same simple chord progression (I-IV-V7-I). They differed only along the Key manipulation: In 12 melody-sets, the aforementioned chord progression was
implemented in the same musical key. When the two melodies are
realized following the same chord progression in the same key, the
cognitive cost of representing the joint action outcome should be
lower than in the other 12 melody-sets, where the same chord progression was implemented in different keys. Representing the joint
action outcome of two melodies in different keys demands more
resources, even though representing only one's own action outcome is equally costly in both key conditions. During the experiment, accuracy, tempo and synchrony were measured.
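The authors' generation script is not reproduced in the abstract; a minimal sketch of how such melody-sets over I-IV-V7-I could be produced in Python (note choices, voicings, and MIDI roots are our assumptions) might look as follows:

import random

# Chord roots of I, IV, V7, I in semitones relative to the tonic
# (V7 reduced to a plain triad on V for brevity).
PROGRESSION = [0, 5, 7, 0]

def melody(key_root, seed, notes_per_bar=4):
    """One simple melody: per chord, draw chord tones (root/third/fifth)
    in the major key whose MIDI tonic is key_root."""
    rng = random.Random(seed)
    out = []
    for degree in PROGRESSION:
        chord = [key_root + degree + i for i in (0, 4, 7)]
        out.extend(rng.choice(chord) for _ in range(notes_per_bar))
    return out

def melody_set(same_key, seed):
    """A pair of melodies for one trial: same key (both C major here)
    or different keys (C major and F sharp major, an assumption)."""
    roots = (60, 60) if same_key else (60, 66)
    return melody(roots[0], seed), melody(roots[1], seed + 1)

print(melody_set(same_key=False, seed=0))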
Following our hypothesis that beliefs about the composition affect the representation of the joint action outcome, we predicted that the differences between the same-Key and the different-Key melody-sets would be significantly higher when participants believed the melodies were meant to be played together, attesting that the participants' beliefs had led to an increase of joint action representations. In other words, we predicted that an ANOVA with the independent variables Belief and Key would show a significant interaction.
References
Vesper C, Butterfill S, Knoblich G, Sebanz N (2010) A minimal architecture for joint action. Neural Netw 23:998-1003
Loehr JD, Kourtis D, Vesper C, Sebanz N, Knoblich G (2013) Monitoring individual and joint action outcomes in duet music performance. J Cogn Neurosci 25(7):1049-1061

Processing sentences describing auditory events: only pianists show evidence for an automatic space-pitch association

Sibylla Wolter, Carolin Dudschig, Irmgard de La Vega, Barbara Kaup
Universität Tübingen, Germany
Embodied language understanding models suggest that language comprehension is grounded in experience. It is assumed that during reading of words and sentences these experiences become reactivated and can be used for mental simulation (Barsalou 1999; Zwaan and Madden 2005). Despite a growing body of evidence supporting the importance of sensory-motor representations during language understanding (e.g., Glenberg and Kaschak 2002), rather little is known regarding the representation of sound during language processing. In the current study, we aim
to close this gap by investigating whether processing sentences
describing auditory events results in similar action-compatibility effects
as have been reported for physical tone perception.
With regard to physical tone perception it is known that real tones
of different pitch heights trigger specific spatial associations on a
vertical as well as horizontal axis. The vertical association is typically
activated automatically for all participant groups (Lidji, Kolinsky,
Lochy, Morais 2007; Rusconi, Kwan, Giordano, Umilta, Butterworth
2006). In contrast, the horizontal axis seems to be mediated by
musical expertise. Specifically, only pianists with a considerable
amount of experience with the piano keyboard and other musicians
show an automatic association between low tones and the left side and
high tones and the right side (Lidji et al. 2007; Trimarchi and Luzatti 2011). This suggests that the experiences pianists acquire when playing the piano lead to a space-pitch association that is automatically elicited when processing high or low auditory sounds.
The aim of the present study was to investigate whether experience-specific space-pitch associations in the horizontal dimension can
also be observed during the processing of sentences referring to high
or low auditory sounds. For pianists, we expected to find faster
responses on the right compared to the left for sentences implying
high pitch and faster responses on the left compared to the right for
sentences implying low pitch. For non-musicians no such interaction
was expected. Finding the respective differences between pianists and
non-musicians would strongly support the idea that during language
processing specific experiential associations are being reactivated.
20 skilled pianists with an average training period of 14.85 years (Experiment 1) and 24 non-musicians with no musical training or with less than 2 years of musical training that took place at least 10 years ago (Experiment 2) were presented with sentences expressing high/low auditory events, such as "the bear growls deeply" vs. "the soprano singer sings a high aria". Half of the sentences contained the words "high" or "low" (explicit condition); the other half only implicitly expressed pitch height (implicit condition). Nonsensical sentences served as filler
height (implicit condition). Nonsensical sentences served as filler
items. Participants judged whether the sentence was semantically
correct or incorrect by pressing either a left or right response key. The

123

Fig. 1 Mean response times for left/right-responses to sentences


implying high/low pitch for pianists (left panel) and non-musicians
(right panel). The error bars represent the 95% confidence interval
and are conducted according to Masson and Loftus (2003)
response position (sensible is right vs. left) was varied between
blocks, starting position was balanced between participants. Each
sentence was presented only once to each participant. By-participant (F1) and by-item (F2) ANOVAs were conducted, one treating participants and one treating items as the random factor. The results are displayed in Fig. 1. The pianists (Exp 1) showed a significant interaction between implied pitch and response hand (F1(1,19) = 4.8, p < .05; F2(1,56) = 6.77, p < .05), with faster responses to sentences implying high pitch with a right compared to a left keypress response and faster responses to sentences implying low pitch with a left compared to a right keypress response. Sentence type (explicit vs. implicit) did not modify this interaction (Fs < 1). For the non-musicians, no interaction between implied pitch and response hand was found (Fs < 1). Additionally, the data showed significant main effects of implied pitch and sentence type in the by-participants analysis for both participant groups (pianists: F1(1,19) = 21.42, p < .001, F2(1,56) = 1.4, p = .24; F1(1,19) = 29.87, p < .001, F2(1,56) = 2.56, p = .12; non-musicians: F1(1,23) = 20.01, p < .001, F2(1,56) = 1.21, p = .28; F1(1,23) = 27.14, p < .001, F2(1,56) = 1.17, p = .28). Sentences implying high pitch yielded faster responses compared to sentences implying low pitch, and implicit sentences were responded to faster than explicit sentences.
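As a sketch of the by-participant analysis (hypothetical column names; the by-item F2 analysis is analogous, with items as the random factor), such a repeated-measures ANOVA could be run in Python with statsmodels:

import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one mean RT per participant and cell
# of the implied pitch (high/low) x response side (left/right) design.
df = pd.read_csv("pitch_rts.csv")

f1 = AnovaRM(df, depvar="rt", subject="participant",
             within=["pitch", "side"]).fit()
print(f1)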
The results show that specific musical experiences can influence a
linguistically implied space-pitch association. This is in line with the
mental simulation view of language comprehension suggesting that
language understanding involves multimodal knowledge representations
that are based on experiences acquired during interactions with the world.
References
Barsalou LW (1999) Perceptual symbol systems. Behav Brain Sci 22:577-660
Glenberg AM, Kaschak MP (2002) Grounding language in action. Psychon Bull Rev 9(3):558-565
Lidji P, Kolinsky R, Lochy A, Morais J (2007) Spatial associations for musical stimuli: a piano in the head? J Exp Psychol 33(5):1189-1207
Masson MEJ, Loftus GR (2003) Using confidence intervals for graphically based data interpretation. Can J Exp Psychol 57(3):203-220
Rusconi E, Kwan B, Giordano BL, Umiltà C, Butterworth B (2006) Spatial representation of pitch height: the SMARC effect. Cognition 99:113-129
Trimarchi PD, Luzatti C (2011) Implicit chord processing and motor representation in pianists. Psychol Res 75:122-128
Zwaan RA, Madden CJ (eds) (2005) Embodied sentence comprehension. CUP, Cambridge

Cogn Process (2014) 15 (Suppl 1):S1S158

A free energy approach to template matching in visual attention: a connectionist model

Keyvan Yahya1, Pouyan R. Fard2, Karl J. Friston3
1 University of Birmingham, Edgbaston, Birmingham, UK; 2 Graduate School of Neural Information Processing, University of Tübingen, Germany; 3 The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, UK
Abstract
In this work, we propose a free energy model for visual template
matching (FR-SAIM) based on the selective visual attention and
identification model (SAIM).
Keywords
Selective Visual Attention, Template Matching, Free Energy
Principle
Introduction
Visual search is a perceptual task that has been extensively studied in the cognitive processing literature. It is widely known that this process rests on matching the input from the visual field against a top-down attentional set, namely, a search template. However, the way this attentional set is formed and how it guides visual search is still not clear. The free energy principle is an emerging neurocognitive framework which tries to account for how interactions within a self-organizing system, like the brain, lead it to represent, perceive and interpret sensory data by minimizing a free energy that can be considered a prediction error (Friston 2009). By extending the SAIM (Heinke and Humphreys 2003), we demonstrate how connectionist models can shed light on how free energy minimization mediates template matching in a visual attention model.
Overview of the FR-SAIM Model
The architecture of the FR-SAIM model is illustrated in Fig. 1a. In
brief, visual input sampling is carried out by the content network (CN),
while controlled by the selection network (SN), and mapped to the
focus of attention (FOA). When multiple objects appear in the retina,
there is a property called inhibition of return to make the model select
one and only one object to avoid them being overlapped in the FOA.
At the same time, the content network rectifies the already selected
objects. Every neuron in the CN (sigma-pi nodes) holds a correspondence between the retina and the FOA. On the other hand, the SN
determines which one of them is instantiated. By using a top-down
control mechanism, the knowledge network (KN) identifies the content of the FOA in comparison with the template it entails. Moreover,
the location map complements the matching task by imposing another
top-down control that supervises the selection of the input image.
In the FR-SAIM, every network is associated with a free energy
function in a hierarchical fashion. Each lower-level network makes a
prediction and sends it up to the level above; in turn, each higher-level network calculates the top-down prediction error signal and returns it to the level below.
The Generative Model: To model sensory information in a hierarchical structure, we define a nonlinear function, say f, to represent our state-space in terms of the sensory states (input data), as the following equation suggests:

s_i = f(x_i) + ω,  ω ~ N(0, Σ(x, m, γ))    (1)

where the causal states m are mediated by hidden states x; thereby the hierarchical states link together, bring about a memory for the model, and establish the local dynamics. ω_m and ω_x are both random fluctuations produced through observation.
Concerning equation (1), the model dynamics can be written in a hierarchical fashion as follows:

x_0 = f(x_1)    (2)
x_1 = f(x_2) + U(i)    (3)
x_2 = f(x_1)  (bottom-up)    (4)
x_2 = f(x_3)  (top-down prediction)    (5)

where U(i) is the action the network takes to modify the selection process of sensory data and is given by U(i) = max(x_2, x_3).
The Energy Functions: The energy functions of the neural networks in the FR-SAIM are derived by combining the original SAIM network energy functions and the prediction errors computed using the free energy principle. The details of the mathematical derivation of these functions are discussed in Yahya (2013). These energy functions can be written as follows:

E_CN(x^CN_ij, x^SN_kl) = (b_CN / 2) Σ_ij ( x^CN_ij - Σ_kl y^SN_kl x^VF_kl y^SN_(k-i,l-j) )² + a ( Σ_kl y^SN_kl - 1 )²

E_KN(y^KN_m, x^CN_ij) = (a_KN / 2) ( Σ_l y^KN_l - 1 )² + b_KN Σ_l ( y^KN_l - Σ_ij x^CN_ij w_lij y^KN_l )²

E_LM(y^LM_kl, x^SN_kl) = (a_LM / 2) ( Σ_kl y^LM_kl - 1 )² + b_LM Σ_kl ( x^SN_kl - y^LM_kl )²

Finally, the gradient descent method, at time step t, is imposed on all of the network energy functions in order to minimize them:

x_i(t+1) = x_i(t) - ∂E(x_i)/∂x_i

Fig. 1 a Architecture of the FR-SAIM model. b Visual field input to the model. c Activation patterns of the content network during simulation. d Time course of activation of the content network
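As a generic illustration of this descent scheme (a stand-in quadratic energy, not the FR-SAIM network energies themselves):

import numpy as np

def gradient_descent(grad_E, x0, lr=0.1, steps=200):
    """Iterate x(t+1) = x(t) - lr * dE/dx for a fixed number of steps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad_E(x)
    return x

# Stand-in energy E(x) = ||x - target||^2 with gradient 2 * (x - target);
# the FR-SAIM energy functions above would take its place.
target = np.array([1.0, -2.0])
print(gradient_descent(lambda x: 2.0 * (x - target), x0=[0.0, 0.0]))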
Simulation Results
Simulation results are shown in Fig. 1b-d. Here, the model starts processing visual input and puts the result into the FOA. These results illustrate how the target template "2" won the competition over the distractor template "+" by dominating the activation of the content network as time passes. Furthermore, the time plot of the content network shows how the obtained network energy functions are minimized with regard to the free energy principle.
References
Friston KJ (2009) The free-energy principle: a rough guide to the brain? Trends Cogn Sci 13(7):293-301
Heinke D, Humphreys GW (2003) Attention, spatial representation, and visual neglect: simulating emergent attention and spatial memory in the selective attention for identification model (SAIM). Psychol Rev 110:29-87
Yahya K (2013) A computational study of visual template identification in the SAIM: a free energy approach. MPhil thesis, University of Birmingham

Oral Presentations
Analyzing psychological theories with F-ACT-R:
an example F-ACT-R application
Rebecca Albrecht, Bernd Westphal
Informatik, Universität Freiburg, Germany
Abstract
The problem to decide whether an ACT-R model predicts experimental data is, today, solved by simulation. This, of course, needs a
complete ACT-R model and fixed global parameter settings. Such an
ACT-R model may include implementation details, e.g. the use of
control variables as part of declarative knowledge, in order to yield
expected results in simulation. Some of these implementation details
are not part of a psychological theory but, nevertheless, may change
the model's behavior. On the other hand, the crucial parts of a psychological theory modelled in ACT-R may only depend on very few
rules. Based on a formal semantics for the ACT-R architecture we
present preliminary results on a method to formally analyze whether a
partial ACT-R model predicts experimental data, without the need for
simulation.
Keywords
ACT-R, Formal Methods, Model Analysis, SMT, Model Checking
Introduction
In cognitive modelling, computer models are used to describe human
cognitive processes wrt. psychological assumptions. Unified theories
of cognition and their implementations (called cognitive architectures) provide means for cognitive modelling. A widely used unified
theory of cognition and cognitive architecture is ACT-R (Anderson 1983, 2007). ACT-R is a so-called hybrid architecture which consists of a symbolic and a subsymbolic layer. As part of the symbolic layer, declarative knowledge (chunks) and procedural knowledge (production rules) are defined. The interface between the symbolic and the subsymbolic layer in ACT-R is given by so-called modules. Modules are requested by production rules to process declarative information and make it accessible through associated buffers. The subsymbolic layer is defined by the behavior of modules, i.e. the responses of modules to given requests. For some modules, these responses depend on numerical parameters, e.g. the decay rate for the implementation of base-level learning as part of the declarative module.
The process of cognitive modelling in ACT-R can be described as
defining a model which adequately predicts average human data
collected in experiments. Today this process is typically performed as
follows. There is a psychological theory, i.e., a hypothesis on how a
given task is principally solved by humans. In order to validate the
psychological theory, an ACT-R model which implements the theory
is constructed and evaluated wrt. experimental data. Practically, figures like average error rates or response times are derived from
several executions of the ACT-R model and compared to average
human data collected in experiments. If the figures obtained from
executions of the ACT-R model deviate too far from experimental
data, there are two possible ways to adjust the model's behavior. On
the one hand, numerical parameters can be adjusted, on the other
hand, a different implementation of the psychological theory can be
provided. If there is no implementation and parameter setting with
which the cognitive architecture yields adequate predictions, the
psychological theory needs to be rejected.
Today, the only available method for ACT-R model validation is
simulation, i.e. repeated model execution. Using this method for the
validation of psychological theories requires an ACT-R model which
is suitable for simulation. Creating such a sufficiently complete ACT-R model may take a significant effort even if issues of a theory may depend on only a few production rules of a model. For example, a
psychological theory may be invalid because according to the theory a

S77
certain number of rules must be executed whose sequential execution
takes more time than it takes humans to solve corresponding (sub-)
tasks in the experiments.
In this work we propose a new method to investigate the validity of a psychological theory with ACT-R models. Based on a formal semantics (Albrecht 2013; Albrecht and Westphal 2014) of ACT-R, we reduce the question whether global parameter settings exist such that, e.g., a timely execution of a set of ACT-R rules is possible, to a satisfiability problem, i.e. a formula in first-order logic. We analyze the resulting satisfiability problem with a satisfiability modulo theories (SMT) solver (de Moura and Bjørner 2011). If the SMT solver proves the given formula unsatisfiable, we can conclude that there are no appropriate global parameter settings, thus there is an issue with the given implementation of the psychological theory. If the SMT solver proves the given formula satisfiable, we obtain valuable hints on global parameter settings and can check them for plausibility. As our approach is not based on actual executions of an ACT-R model, it in particular applies to partial ACT-R models, i.e., to small sets of rules essential for the psychological theory. This may save significant modelling effort.
Motivating Example
Experimental Setting. A typical task in the domain of relational
spatial reasoning with mental models is the following. Objects are
visually presented to participants either on the left or on the right of a
computer screen (cf. Fig. 1). The position of objects on two subsequently shown screens implicitly encodes a relation between two
objects. For example, the two leftmost screens in Fig. 1 together
encode the relation "A is to the left of B".
The psychological experiment consists of n different tasks, where task i consists of showing six screens at times t_0^i, ..., t_5^i. The two relations encoded by the first four screens are called premises; the relation encoded by the last two screens, shown at t_4^i and t_5^i, is called the conclusion. After the sixth screen of a task has been shown, participants
should state whether the two premises and the conclusion are contradictory. In the example shown in Fig. 1, they are not contradictory
because objects A, B, and C can be arranged in an order which
satisfies both premises and the conclusion.
The Theory of Preferred Mental Models
In the preferred mental model theory (Ragni, Knauff and Nebel
2005), it is assumed that participants construct a mental spatial array
of dynamic size which integrates information given by the premises.
Whether a conclusion contradicts the given premises is checked by
inspecting the spatial array. Furthermore, it is assumed that only one
preferred mental model is constructed immediately when the premises
are presented. Only if the given conclusion does not hold in the
preferred mental model an alternative mental model is constructed.
For example, a possible model of the premises shown in Fig. 1 is to
order the objects as A, C, B. This model does not imply the
conclusion.
Modelling the Theory of Preferred Mental Models. When modelling
the theory of preferred mental models in ACT-R, a crucial aspect is
the use of the declarative memory. In the ACT-R theory, the time and
probability for retrieving a chunk from declarative memory depend on
the activation of chunks. Activation in turn depends on different assumptions on human memory processing, e.g. spreading activation, where the content of the declarative memory is considered, and base-level learning, where the history is considered. In an ACT-R cognitive architecture where only base-level learning is considered, the activation is calculated based on two global parameters: the decay rate d, which determines how fast the activation of a chunk decays over time, and the threshold τ, which defines a lower bound on activation values for successful chunk retrieval.

Fig. 1 Example relational reasoning task with id i. Premise 1 is "A is to the left of B", premise 2 is "A is to the left of C", and the conclusion is "B is to the left of C". The time when the j-th stimulus is presented is denoted by t_j^i
A fundamental assumption of the theory of preferred mental models is
that the preferred mental model for the two premises is constructed
before the conclusion is presented. That is, the behavior of the
environment imposes hard deadlines on the timing of the model: any
valid ACT-R model for the theory of preferred mental models must
complete the processing of all rules needed to construct the preferred
mental model before the next stimulus is presented.
Consider the top row of Fig. 2 for a more formal discussion.
During a task, stimuli are presented to the participants at fixed points
in time. For example, let E1 denote the third screen (i.e. the onset of
the first element of premise 2) and E2 denote the fifth screen (i.e. the onset of the first element of the conclusion), shown at times t_2^i and t_4^i, respectively, in the i-th task. This is the interval in which the second premise has to be processed. Then, according to the assumption stated above, processing of premise 2 has to be completed within t_b := t_4^i - t_2^i time units. An ACT-R model for this task in particular needs to model successful solutions of the task. That is, in an ACT-R model which is valid given experimental data, the execution of all rules which are involved in constructing the mental model must complete in at most t_b time units.
In Fig. 2, we illustrate cognitive states by the circular nodes,
arrows indicate the execution of one rule which transforms one
cognitive state into another. In addition to rules which request the
declarative module, an ACT-R model of the theory of preferred
mental models may comprise rules with deterministic timing and
outcome, e.g., when modifying buffers of the imaginal module. In
Fig. 2, we assume that there is only one request to the declarative
module by rule r, i.e. a request for the already constructed mental
model comprising premise 1, which has two qualitatively different
outcomes: a correct reply, and a wrong reply or no reply at all. Now
given corresponding rules, if it is impossible to choose the decay rate d and the threshold τ such that all involved rule executions complete within t_b time units, then the considered rules definitely do not constitute a valid (partial) ACT-R model for the preferred mental model theory.
A Valid ACT-R Model for the Theory of Preferred Mental Models.
The preferred mental model theory has been implemented in ACT-R
(Ragni, Fangmeier and Brüssow 2010). In this model, each premise is
represented by a mental model chunk which is constructed in the
imaginal buffer. A mental model chunk specifies a number of positions pos1, pos2, and assigns objects presented on the computer
screen accordingly. When premise 2 is presented, the mental model
chunk representing the first premise has to be retrieved from declarative memory in order to construct a new mental model chunk which
integrates both premises. In the ACT-R model for the preferred
mental model theory, only base-level learning is considered.
In the following, we use a part of this ACT-R model to illustrate our
approach. As the ACT-R model predicts the experimental data
appropriately for a certain choice of parameters, we expect our
approach to confirm this result.
Formal Analysis of ACT-R Models
Formally, an ACT-R production rule is a pair r = (p, a) which
comprises a precondition p and an action a. An ACT-R model is a set
of production rules. A cognitive state c = (s, t) consists of a mapping s from buffers to chunks or to the symbol nil, and a time-stamp t ∈ ℝ≥0.
The F-ACT-R formal semantics (Albrecht, Westphal 2014)
explains how a set of production rules induces a timed transition
system on cognitive states given a set of global parameters, including

decay d and threshold τ. Two cognitive states c = (s, t) and c' = (s', t') are in transition relation, denoted by c →_r c', if there is a rule r = (p, a) such that precondition p is satisfied in s, s' is obtained by applying a to s, and t' - t is the time needed to execute action a.
Now the ACT-R model validity problem stated in Section 2 basically reduces to checking whether, given a start cognitive state (c, t) and a goal state (c', t'), there exist values for d and τ such that there is a sequence of transitions

(c_0, t_0) →_r1 (c_1, t_1) →_r2 ... →_rn (c_n, t_n)

with c_0 = c and c_n = c'.
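A toy rendering of such a timed transition system in Python (the types, buffer names, and the 0.2 s duration are assumptions for illustration, not the F-ACT-R semantics itself):

from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass(frozen=True)
class State:
    buffers: tuple  # sorted (buffer, chunk-or-None) pairs; None is nil
    time: float     # time stamp t >= 0

@dataclass
class Rule:
    pre: Callable[[Dict], bool]   # precondition p
    act: Callable[[Dict], Dict]   # action a
    dur: float                    # time needed to execute a

def step(state: State, rule: Rule) -> Optional[State]:
    """One transition c ->_r c', defined only if p holds in s."""
    buffers = dict(state.buffers)
    if not rule.pre(buffers):
        return None
    return State(tuple(sorted(rule.act(buffers).items())),
                 state.time + rule.dur)

# Toy rule: request the premise-1 model into the retrieval buffer.
r = Rule(pre=lambda b: b.get("retrieval") is None,
         act=lambda b: {**b, "retrieval": "premise1-model"},
         dur=0.2)
print(step(State((("retrieval", None),), 0.0), r))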
For an example, consider the phase of the preferred mental model theory shown in Fig. 2, as discussed in Section 2. More specifically, consider a rule r which requests the declarative module for a mental model representing premise 1 when the first screen of the second premise is presented at time t_2^i.
In the following, we consider for simplicity a model where r is the only nondeterministic rule which is ever enabled between t_2^i and t_4^i, and where the sequence of rules executed before and after rule r is deterministic. Then the time to execute the model only varies wrt. the time for executing r. The model is definitely not valid if there are no choices for decay d and threshold τ such that there is a transition c →_r c', where c = (s, t) is a cognitive state associated with a realistic history and c' = (s', t') is a cognitive state in which the mental model chunk representing premise 1 has correctly been recalled.
This can be encoded as a satisfiability problem as follows. A cognitive state can be characterized by a formula over variables V which model buffer contents, i.e. cognitive states. We can assume a formula φ_s over variables V to encode cognitive state s and a formula φ_s' over variables V' to encode cognitive state s'. The precondition p of rule r can be seen as a formula over V; the action a relates s and s', so it is a formula over V and V'.
Furthermore, we use A(c, t) to denote the activation of chunk c at time t. We use optimized base-level learning to calculate activation values: A(c, t) = ln(2) - ln(1 - d) - d(t - t_c), where t_c is the first time chunk c was presented. For our experiments, we consider two chunks: c1, which correctly represents premise 1, and c2, which does not.

Fig. 2 Example sequence of cognitive states (circles) in between environment events E1 and E2 (rectangles). A cognitive state which leads to a correct reply is denoted by a check mark, and a state which leads to a wrong reply or no reply at all by an X. Label r indicates a state where a retrieval request is posed to the declarative module
The formula to be checked for satisfiability, then, is

∃ d, τ : φ_s ∧ p ∧ A(c1, t) > τ ∧ A(c2, t) < A(c1, t) ∧ a ∧ φ_s' ∧ t' - t < t_b.    (2)

As a proof-of-concept, we have used the SMT solver SMTInterpol (Christ, Hoenicke and Nutz 2012) to check an instance of (2). With an appropriate start cognitive state, SMTInterpol reports satisfiability of (2) and provides a satisfying valuation for d and τ in about 1 s in total. If we choose an initial cognitive state where the activation of c1 is too low, SMTInterpol proves (2) unsatisfiable, as expected.
By adding, e.g., constraints on τ and d to (2), we can use the same procedure to check whether the model is valid for particular values of d and τ which lie within a range accepted by the community.
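Because the SMTInterpol encoding is not reproduced here, the following Python sketch only mimics the core of query (2) numerically: a grid search for a decay d and threshold tau satisfying the activation constraints (the time stamps are made up):

import math
import numpy as np

def activation(d, t, t_c):
    """Optimized base-level activation A(c, t) for decay d, where t_c
    is the time at which chunk c was first presented."""
    return math.log(2) - math.log(1 - d) - d * (t - t_c)

t, t_c1, t_c2 = 10.0, 6.0, 1.0  # assumed presentation history

def find_parameters():
    """Return some (d, tau) with A(c1,t) > tau and A(c2,t) < A(c1,t),
    or None if no grid point satisfies the constraints."""
    for d in np.linspace(0.01, 0.99, 99):
        a1, a2 = activation(d, t, t_c1), activation(d, t, t_c2)
        if a2 < a1:
            return d, a1 - 1e-6  # any tau just below A(c1, t) works
    return None

print(find_parameters())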
Note that our approach is not limited to the analysis of single rules.
Given an upper bound n on the number of rules possibly executed
between two points in time, a formula similar to (2) can be
constructed.
Conclusion
We propose a new method to check whether and under which conditions a psychological theory implemented in ACT-R predicts
experimental data. Our method is based on stating the modelling
problem as a satisfiability problem which can be analyzed by an SMT
solver.
With this approach it is in particular no longer necessary to write a
complete ACT-R model in order to evaluate a psychological theory. It
is sufficient to provide those rules which are possibly enabled during
the time considered for the analysis.
For example, in Albrecht and Ragni (2014) we propose a cognitive
model for the Tower of London task, where an upper bound on the
time to complete a retrieval request for the target position of a disk is
defined as the time it takes the visual module to encode the start
position. We expect the evaluation of such mechanisms to become
much more efficient using our approach as compared to simulation-based approaches.
In general, we believe that by using our approach the overall
process of cognitive modelling can be brought to a much more efficient level by analyzing crucial aspects of psychological theories
before entering the often tedious phase of complete ACT-R
modelling.
References
Albrecht R (2013) Towards a formal description of the ACT-R unified theory of cognition. Unpublished master's thesis, Albert-Ludwigs-Universität Freiburg
Albrecht R, Ragni M (2014) Spatial planning: an ACT-R model for the Tower of London task. In: Proceedings of spatial cognition conference 2014, to appear
Albrecht R, Westphal B (2014) F-ACT-R: defining the architectural space. In: Proceedings of KogWis 2014, to appear
Anderson JR (1983) The architecture of cognition, vol 5. Psychology Press
Anderson JR (2007) How can the human mind occur in the physical universe? Oxford University Press
Christ J, Hoenicke J, Nutz A (2012) SMTInterpol: an interpolating SMT solver. In: Donaldson AF, Parker D (eds) SPIN, vol 7385. Springer, pp 248-254
de Moura L, Bjørner N (2011) Satisfiability modulo theories: introduction and applications. Commun ACM 54(9):69-77. doi:10.1145/1995376.1995394
Ragni M, Fangmeier T, Brüssow S (2010) Deductive spatial reasoning: from neurological evidence to a cognitive model. In: Proceedings of the 10th international conference on cognitive modeling, pp 193-198
Ragni M, Knauff M, Nebel B (2005) A computational model for spatial reasoning with mental models. In: Proceedings of the 27th annual cognitive science conference, pp 1064-1070

F-ACT-R: defining the ACT-R architectural space


Rebecca Albrecht, Bernd Westphal
Informatik, Universität Freiburg, Germany
Abstract
ACT-R is a unified theory of cognition and a cognitive architecture
which is widely used in cognitive modeling. However, the semantics
of ACT-R is only given by the ACT-R interpreter. Therefore, an
application of formal methods from computer science in order to, e.g.,
analyze or compare cognitive models wrt. different global parameter
settings is not possible. We present a formal abstract syntax and
semantics for the ACT-R cognitive architecture as a cornerstone for
applying formal methods to symbolic cognitive modeling.
Keywords
ACT-R, Cognitive Architectures, Formal Methods, Abstract Syntax,
Formal Semantics
Introduction
In Cognitive Science researchers describe human cognitive processes in order to explain human behavioral patterns found in
experiments. One approach is to use cognitive architectures which
implement a set of basic assumptions about human cognitive processes and to create cognitive models with respect to these
assumptions. ACT-R (Anderson 1983, 2007) is one such cognitive
architecture, which provides a programming language to create a
cognitive model and an interpreter to execute the model. The ACT-R architecture is a hybrid architecture which includes symbolic and
subsymbolic mechanisms. Symbolic mechanisms consist of three
concepts, namely assuming a modular structure of the human brain,
using chunks as basic declarative information units and production
rules to describe processing steps. Subsymbolic processes are
associated with the modules' behavior. The modules' behavior is
controlled by so-called global parameters, which enable the execution of a cognitive model with respect to different assumptions
about human cognition.
In this work we introduce a formal abstract syntax and semantics
for the ACT-R cognitive architecture. An ACT-R architecture is
defined as a structure which interprets syntactic components of an
ACT-R model with respect to psychological assumptions, e.g. global
parameter settings. As a result, we construct a complete transition
system which describes all possible computations of an ACT-R model
with respect to one ACT-R architecture. The architectural space of
ACT-R is defined as the set of all possible ACT-R architectures.
State of the Art
ACT-R. The functionality of ACT-R is based on three concepts.
Firstly, there are basic information units (chunks) which describe
objects and their relationship to each other. A chunk consists of a
chunk type and a set of slots which reference other chunks.
Secondly, the human brain is organized in a modular structure,
that is, information processing is localized according to the kind of
information being processed. There are different modules for perception,
interaction with an environment, and internal mental processes. When
requested, each module can process one chunk at a time and the
processing of chunks needs time to be completed. The processed
chunk is made accessible by the module through an associated buffer.
A state in cognitive processing (cognitive state) is the set of chunks
made accessible by modules through associated buffers.
Thirdly, there are cognitive processing steps, i.e. changing a
cognitive state by altering or deleting chunks which are made
accessible by modules, or requesting new chunks from modules. This
is accomplished by the execution of production rules. A production
rule consists of a precondition, which characterizes cognitive states
where the production rule is executable, and an action which
describes changes to cognitive states when the production rule is
executed. Basically, actions request modules to process certain
chunks. Which chunk is processed and how long this processing takes
depends on the implementation of psychological assumptions within
the modules and may be controlled by global parameters.
Formalization of Symbolic Cognitive Architectures. To the best of our
knowledge, there is no formal abstract description of ACT-R. Other
works try to make cognitive modelling more accessible by utilizing
other modelling formalisms, like GOMS (Card, Moran and Newell
1983) as high level languages for ACT-R (Salvucci, Lee 2003; St.
Amant, Freed and Ritter 2005; St. Amant, McBride and Ritter 2006).
In other approaches the authors propose high-level languages which
can be compiled into different cognitive architectures, e.g. ACT-R
and SOAR (Laird, Newell and Rosenbloom 1987). This includes
HERBAL (Cohen, Ritter and Haynes 2005; Morgan, Haynes, Ritter
and Cohen 2005) and HLSR (Jones, Crossman, Lebiere and Best
2006). None of these approaches reports a formal description of
ACT-R; they only describe the high-level language and compilation
principles.
In Schultheis (2009), the author introduces a formal description for
ACT-R in order to prove Turing completeness. However, this formal
description is too abstract to be used as a complete formal semantics
for ACT-R. In Stewart and West (2007) the authors analyze the
architectural space of ACT-R. In general, this idea is similar to the
idea presented in this work. However, the result of their analysis is a
new implementation of ACT-R in the Python programming language.
Therefore, it is not abstract and, e.g., not suitable for applying formal
methods from software engineering.
A Formal Definition of ACT-R
In this section, we describe the basic building blocks of our formalization of ACT-R. The formalization complies with the ACT-R theory as
defined by the ACT-R interpreter. Firstly, we provide an abstract
syntax for ACT-R models which includes chunk instantiations, abstract
modules, and production rules. In our sense, an ACT-R model is simply
a syntactic representation of a cognitive process. Secondly, we formally
introduce the notion of architecture as an interpretation of syntactic
entities of an ACT-R model with respect to psychological assumptions,
i.e. subsymbolic mechanisms. This yields a representation of cognitive
states and finite sequences thereof. Thirdly, for a given model we
introduce an infinite state, timed transition system over cognitive states
which is induced by an architecture.
Abstract Syntax. We consider a set of abstract modules as a generalization of the particular modules provided by the ACT-R tool. A
module M consists of a finite set of buffers B, a finite set of module
queries Q, and a finite set of action symbols A. Buffers are represented as variables which can be assigned to chunks. Module queries
are represented as Boolean variables. As action symbols we consider
the standard ACT-R action symbols +, =, and -.
In order to describe the ACT-R syntax we define the signature of
ACT-R models which is basically a set of syntactic elements of a
model. The signature of an ACT-R model consists of a set of modules, a set of chunk types, a set of ACT-R variables, and a set of
relation symbols. A chunk type consists of a type name and a finite set
of slot names (or slots for short).
An abstract production rule consists of a precondition and an
action. A precondition is basically a set of expressions over a model's
signature, i.e. over the content of buffers of a module parameterized
by ACT-R variables and module queries. An action is also an
expression over the model's signature which uses action symbols of
modules.
An abstract ACT-R model consists of a finite set of production
rules R, a finite set of initial buffer actions A0 in order to define the
initial state, and a finite set of chunk type instantiations C0.
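As an illustration, the abstract syntax can be rendered directly as data structures. The following Python sketch is our own; all class and field names are hypothetical, not the authors' notation.

```python
# Hedged sketch of the abstract syntax: modules with buffers, queries and
# action symbols; chunk types with slots; rules with precondition/action.
from dataclasses import dataclass

@dataclass(frozen=True)
class Module:
    name: str
    buffers: frozenset                    # buffer variables B
    queries: frozenset                    # Boolean module queries Q
    actions: frozenset = frozenset('+=-') # standard action symbols A

@dataclass(frozen=True)
class ChunkType:
    name: str
    slots: frozenset                      # slot names referencing chunks

@dataclass(frozen=True)
class ProductionRule:
    name: str
    precondition: tuple                   # expressions over the signature
    action: tuple                         # expressions using action symbols

@dataclass
class AbstractACTRModel:
    rules: tuple                          # finite rule set R
    initial_buffer_actions: tuple         # A0, defines the initial state
    chunk_instantiations: tuple           # C0
```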

Architecture. In this section we describe the formal interpretation of
an abstract ACT-R model with respect to psychological assumptions.
We propose to denote by architecture a structure which provides all
necessary means to interpret an abstract ACT-R model. To this end,
an architecture defines chunks as the building blocks of declarative
knowledge, i.e. instantiations of chunk types, an interpretation function for each action symbol of a module, and a production rule
selection mechanism.
In order to describe all computations of an ACT-R model as a transition
system, we introduce the notion of a cognitive state and finite sequences
thereof which are induced by the execution of production rules. The
introduction of finite sequences of cognitive states is necessary as the
interpretation of actions depends on the history of a model. As the most
prominent example consider base-level learning in ACT-R. A request to
the declarative module yields one chunk as a result. In general, it is
possible that more than one chunk fits the request. Which chunk is
returned by the declarative module depends on how often and when all
fitting chunks were processed by the model before.
We use D to denote the set of chunks, where a chunk c ∈ D is a
unique entity which has a chunk type and maps each slot (as defined
by the chunk type) to another chunk.
A cognitive state γ is a function which maps each buffer b ∈ B to a
chunk c ∈ D and a delay d ∈ ℝ≥0. The delay corresponds to the timing
behavior of modules. By mapping buffer b to a delay d > 0 we
indicate that there is an action pending and that it will be completed in
d time units. If there is no action pending, d is 0. This is a slightly
different view than is common in ACT-R, where a chunk is accessible in a
cognitive state only after it has been processed by the module, i.e. if
the module's delay d is 0. Intuitively, in our representation, an
interpreter is able to look ahead when scheduling actions. In the
following, we use Γ to denote the set of all cognitive states and Γpart
to denote the set of all partial cognitive states, i.e., functions which do
not necessarily assign all buffers. A finite trace π is simply a finite
sequence γ0, γ1, …, γn ∈ Γ of cognitive states. In the following, we
use Π to denote the set of all finite traces.
Given an action symbol a ∈ A of a module M, an interpretation of a
is a function I⟨a⟩ : Π → 2^(Γpart × K × 2^D) which assigns to each finite trace π
a set of possible effects of the action. An effect is a triple
(γpart, k, C) consisting of a partial cognitive state γpart, a valuation
k ∈ K of module queries, and a set C ∈ 2^D of chunks. The partial
cognitive state γpart defines an update of the buffer contents, k provides
new values for module queries, and C comprises the chunks which are
removed from buffers and which have to be considered for an update of
the declarative memory. Similarly, the production rule selection
mechanism is formally a function S : Π → 2^R which yields a set of
production rules. The production selection mechanism decides whether
a precondition is satisfied in a cognitive state given an interpretation of
relation symbols and an assignment of ACT-R variables to chunks.
Note that our notion of architecture provides a clear interface
between the symbolic layer of ACT-R, i.e. the world of rules and
chunks, and the sub-symbolic layer, i.e. formal principles corresponding to human cognitive processing captured by the
interpretation functions of action symbols associated with modules.
Furthermore, each choice of global parameters (e.g. the decay rate in
base-level learning) corresponds to exactly one architecture as defined
above. Architectures differ in the definitions of the interpretation
functions I, i.e. which effects are obtained for a given finite trace, and
in the production rule selection function S.
Behavior of ACT-R Models. In this section we introduce the computational space of an ACT-R model given an ACT-R architecture.
This is done by introducing a labelled, timed transition system as
induced by a model and an architecture. To this end, we define the
following transition relation. Two time-stamped cognitive states
(γ, t) and (γ′, t′) are in transition relation wrt. a production rule r ∈ R,
an execution delay s ∈ ℝ≥0 for production rule r, a set of chunks
x ⊆ D, and a finite trace π ∈ Π, i.e.

(γ, t) →_π^{r,s,x} (γ′, t′),

if and only if production rule r is executable in cognitive state γ wrt. the
finite trace π, i.e. if r ∈ S(π, γ), if the effect of the actions in
r according to the interpretation functions of the action symbols yields
γ′, and if time-stamp t′ is t + s.
The introduced transition relation corresponds to a cognitive
processing step in ACT-R, i.e. the execution of one production rule.
The transition relation → induces an (infinite state) timed transition
system with the initial state defined by the cognitive state given by
initial buffer actions A0. Given an ACT-R model, there is a one-to-one
correspondence between the set of simulation runs obtainable from
the ACT-R tool (for a given set of parameters) and computation paths
in the timed transition system induced by the architecture corresponding to the chosen parameters. We validated the formal
description by comparing a prototype implementation to the ACT-R
interpreter for several models described in the ACT-R tutorial.
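The following toy sketch (our own illustration, not the authors' prototype) shows how one such timed transition step could be computed from a rule selection function S and the interpretation of action symbols; all names are hypothetical.

```python
# Hedged sketch of one timed transition step (gamma, t) -> (gamma', t + s).
def step(state, t, trace, select, interpret):
    """state: dict mapping buffers to chunks; trace: history of states."""
    enabled = select(trace, state)         # S(pi): enabled rules with delays
    if not enabled:
        return None                        # no production rule executable
    rule, s = enabled[0]                   # chosen rule r, execution delay s
    buffer_update = interpret(rule, trace) # effect of r's action symbols
    new_state = {**state, **buffer_update} # partial cognitive state update
    return new_state, t + s, trace + [new_state]

# Illustrative use: a single rule copying the goal buffer into retrieval.
select = lambda trace, st: [('copy-goal', 0.05)] if 'goal' in st else []
interpret = lambda rule, trace: {'retrieval': trace[-1]['goal']}
print(step({'goal': 'chunk-a'}, 0.0, [{'goal': 'chunk-a'}], select, interpret))
```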
Conclusion
In this work, we presented the first comprehensive, high-level formal
semantics for the ACT-R programming language as defined by the
ACT-R interpreter. By our notion of ACT-R architectures, we have
precisely captured the architectural space of ACT-R.
Our formalization lays the groundwork for approaching a number of known
issues with the ACT-R modelling language. Firstly, our notion of
architecture can be used to explicitly state all assumptions under which
cognitive models are created and evaluated. Then the architectures used
for different cognitive models can be compared precisely due to the
formal nature of our definition. We expect such comparisons to provide
deeper insights into human cognition as such. Today, mechanisms and
parameter settings employed for modelling and evaluation are often
neither reported nor discussed, mainly due to the opaque integration of these principles in the ACT-R interpreter.
Secondly, our formal semantics allows us to compare different ACT-R
models. Whether two models with (syntactically) different rule sets
describe the same behavior now amounts to proving that the induced
timed transition systems are equivalent.
Thirdly, our formal view on ACT-R models allows us to go beyond
today's quantitative evaluation of ACT-R models with the ACT-R
interpreter towards a qualitative evaluation. Today, the ACT-R interpreter is typically used to compute abstract quantitative figures like the
average time needed by the model to solve a certain task. Our formalization provides a stepping stone to, e.g., formal analysis
techniques. With these techniques we can, for instance, analyze whether and under what conditions certain aspects of psychological
theories (Albrecht, Westphal 2014) can possibly predict empirical data,
or check whether and under what conditions a certain cognitive state
which is crucial for a modelled psychological theory is reachable.
Last but not least, formal techniques can be applied to improve the
software engineering aspect of ACT-R modelling, which is often
perceived in the literature to be rather inefficient and error-prone
(Morgan et al. 2005; Jones et al. 2006).
Furthermore, the scope of our work is not limited to ACT-R but
has a clear potential to affect the whole domain of rule-based cognitive architectures. Firstly, efforts to provide alternative ACT-R
interpreters like (Stewart, West 2007) can refer to a common reference semantics. Secondly, we are able to formally establish
connections between different cognitive architectures ranging from
general purpose architectures like SOAR to special purpose architectures like CasCas (Lüdtke et al. 2006).
In future work, our formalization needs to be extended to cover
probabilistic aspects. Furthermore, we plan to extend the prototype
implementation of our formal description (Albrecht, Giewein and
Westphal 2014) to support more ACT-R models before we investigate
options for improved high level model description languages that are
explicitly suitable for the ACT-R theory.

References
Albrecht R, Giewein M, Westphal B (2014) Towards formally founded ACT-R simulation and analysis. In: Proceedings of KogWis 2014, to appear
Albrecht R, Westphal B (2014) Analyzing psychological theories with F-ACT-R. In: Proceedings of KogWis 2014, to appear
Anderson JR (1983) The architecture of cognition, vol 5. Psychology Press
Anderson JR (2007) How can the human mind occur in the physical universe? Oxford University Press
Card SK, Moran TP, Newell A (1983) The psychology of human computer interaction. CRC
Cohen MA, Ritter FE, Haynes SR (2005) Herbal: a high-level language and development environment for developing cognitive models in Soar. In: Proceedings of 14th conference on behavior representation in modeling and simulation, pp 177–182
Jones RM, Crossman JA, Lebiere C, Best BJ (2006) An abstract language for cognitive modeling. In: Proceedings of 7th international conference on cognitive modeling. Lawrence Erlbaum, Mahwah
Laird JE, Newell A, Rosenbloom PS (1987) Soar: an architecture for general intelligence. Artif Intell 33(1):1–64
Lüdtke A, Cavallo A, Christophe L, Cifaldi M, Fabbri M, Javaux D (2006) Human error analysis based on cognitive architecture. In: HCI-Aero, pp 40–47
Morgan GP, Haynes SR, Ritter FE, Cohen MA (2005) Increasing efficiency of the development of user models. In: SIEDS, pp 82–89
Salvucci DD, Lee FJ (2003) Simple cognitive modeling in a complex cognitive architecture. In: CHI, pp 265–272
Schultheis H (2009) Computational and explanatory power of cognitive architectures: the case of ACT-R. In: Proceedings of 9th international conference on cognitive modeling, pp 384–389
St. Amant R, Freed AR, Ritter FE (2005) Specifying ACT-R models of user interaction with a GOMS language. Cogn Syst Res 6(1):71–88
St. Amant R, McBride SP, Ritter FE (2006) An AI planning perspective on abstraction in ACT-R modeling: toward an HLBR language manifesto. In: Proceedings of ACT-R workshop
Stewart TC, West RL (2007) Deconstructing and reconstructing ACT-R: exploring the architectural space. Cogn Syst Res 8(3):227–236

Defining distance in language production: extraposition of relative clauses in German
Markus Bader
Goethe-Universität Frankfurt, Institut für Linguistik, Frankfurt am Main, Germany
Abstract
This paper presents results from a corpus study and two language production experiments that have investigated the position of relative clauses
in German. A relative clause can appear either adjacent to its head noun
or extraposed behind the clause-final verb. The corpus data show that the
major factor deciding whether to extrapose or not is the distance that has
to be crossed by extraposition. Relative clause length has an effect too,
but a much smaller one. The experimental results show that distance is
not defined as number of words but as new discourse referents in the
sense of the Dependency Locality Theory of Gibson (2000).
Keywords
Sentence production, Extraposition, Dependency length, Dependency
Locality Theory (DLT)
Introduction
A large body of research into word order variation has shown that
constituent weight is a major factor determining the choice between
competing syntactic alternatives (e.g., Hawkins 1994; Wasow 2002).
More recently, it has become common to define weight in terms of the

length of syntactic dependencies, like the dependencies between verbs
and their arguments (e.g., Hawkins 2004; Gildea, Temperley 2010).
This raises the question of how dependency length is to be measured.
The syntactic alternation considered in this paper concerns the
position of relative clauses in German. As shown in (1), a relative
clause in German can appear either adjacent to its head noun (1-a) or
extraposed behind the clause-final verb (1-b).

When deciding whether to keep the relative clause adjacent to its
head noun or to extrapose it behind the clause-final verb, two
dependencies have to be considered. One is the dependency between
head noun and relative clause and the second one is the dependency
between head noun and clause-final verb. As shown in (2) and (3),
these two dependencies stand in a trade-off relation.
When the relative clause is adjacent to the head noun, the head
noun–relative clause dependency (solid arrow) is optimal whereas
the head noun–verb dependency (dashed arrow) is not because the
relative clause intervenes between head noun and verb. Extraposition
of the relative clause shortens the head noun–verb dependency but
lengthens the head noun–relative clause dependency; that is, while
the former dependency improves, the latter one becomes worse.
Corpus studies (Hawkins 1994; Uszkoreit et al. 1998) show that
the decision to extrapose depends on both dependencies. First, the rate
of extraposition increases with increasing length of the relative clause.
Second, the rate of extraposition declines with increasing extraposition distance, that is, with an increasing amount of material that
intervenes between head noun and relative clause in the extraposed
variant. In (3), for example, extraposition has to cross two words
(Geschenke geben).
In studies of language production (Stallings, MacDonald 2011)
and corpus research (Gildea, Temperley 2010), distance is measured
as number of words. The same is true for the efficiency theory proposed in Hawkins (2004), which is neutral with regard to language
production or language comprehension. This contrasts with the
Dependency Locality Theory (DLT) of Gibson (2000), which is a
theory of processing load during language comprehension. According
to the DLT, dependency length is not measured in number of words
but in number of new discourse referents.
The aim of the present work is to test the hypothesis that dependency length for purposes of language production is defined in the
same way as proposed by the DLT for language comprehension,
namely in terms of new discourse referents, not in terms of words.
Corpus Analysis
About 2000 sentences containing a relative clause in either adjacent
or extraposed position were drawn from the deWaC corpus (Baroni,
Bernardini, Ferraresi and Zanchetta 2009) and analyzed. Preliminary
results of the ongoing analysis are shown in Figs. 1 and 2. Figure 1
shows the effect of relative clause length. Figure 2 shows the effect of
the post head-noun region, which is the region between head noun/relative clause and clause-final verb (Geschenke in (3)). The verb is
not included because the verb always has to be crossed when extraposing, and additional analyses show that the length of the verbal
complex has only very small effects on the rate of extraposition.
When only the total extraposition distance is considered, as in the
older corpus literature, one misses the point that it is the length of the
post head-noun region which is crucially involved in determining
extraposition.
In accordance with earlier corpus studies of relative clause
placement in German, the rate of extraposition increases with
increasing length of the relative clause and decreases with increasing
length of the post head-noun region. Furthermore, the length of the
post head-noun region is a much more important predictor of relative
clause placement than the length of the relative clause. When the post
head-noun region is empty, extraposition is almost obligatory, but
already a post head-noun region of four words drives the extraposition
rate down to less than 10 %.
Fig. 1 Proportion of extraposition depending on the length of the relative clause (in words)

Fig. 2 Proportion of extraposition depending on the length of the pre-verbal material (in words)

In the following experiments, the post head-noun region will range
from 0 to 2 words. As shown in Fig. 2, this relatively small increase
has strong effects on the decision to extrapose when averaged across
all different kinds of intervening material. In this case, the extraposition rate goes down from ca. 90 % for 0 words to 60 % for one word
and to 35 % for two words. The question addressed by the following
two experiments is whether more fine-grained distinctions show up
when looking at particular types of intervening material.
Experiment 1
In order to decide between defining dependency length in terms of
number of words or number of new discourse referents 32 students
participated in an oral production experiment which was a variant of
the production-from-memory task (Bock, Warren 1985). Participants
first read a main clause as in (4). After a visual prompt like 'Max said
that', the initial main clause had to be repeated orally from memory in
the form of an embedded clause. While the initial main clause fixed
the lexical content of the to-be-produced embedded clause, participants were completely free with regard to the position of the relative
clause.

The experiment varied the amount of material that had to be
crossed by extraposition in addition to the verb: nothing (4-a), a bare
NP object (4-b), or an NP object containing a determiner (4-c). The
latter two conditions differ in number of words but are identical in
number of new discourse referents. As shown by the corpus analysis,
a difference of one word has a strong effect on the rate of extraposition in the length range under consideration.
The percentages of sentences with extraposed relative clauses are
presented in Table 1. Table 1 shows that the rate of extraposition
decreases substantially in the presence of an object but the difference
between one- and two-word objects is quite small. The results were
analyzed by means of mixed-effect logistic regression using the
R package lme4 (Bates, Maechler 2010). The experimental factors
were coded in such a way that all contrasts test whether differences
between means are significant (so-called contrast coding). Table 2
shows the results of the statistical analysis. The difference between 0
words and 1 word was significant but the difference between 1 word
and 2 words was not. In sum, the results of Experiment 1 suggest that
distance is defined as number of new discourse referents, as in the
DLT, and not as number of words.
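To illustrate the logic of the contrast coding, the following sketch is our own reconstruction assuming standard sliding-difference contrasts (the exact coding scheme is not spelled out above); for transparency it uses a plain linear fit on the Experiment 1 cell means from Table 1 rather than the full mixed-effects logistic regression.

```python
# Hedged sketch: sliding-difference contrasts make each coefficient equal
# to the difference between adjacent condition means.
import numpy as np

contrasts = np.array([
    [-2/3, -1/3],   # no object
    [ 1/3, -1/3],   # bare noun N
    [ 1/3,  2/3],   # Det + N
])
means = np.array([0.38, 0.15, 0.11])          # Exp 1 extraposition rates
X = np.column_stack([np.ones(3), contrasts])  # intercept + contrast columns
beta = np.linalg.lstsq(X, means, rcond=None)[0]
print(beta)  # [grand mean, N - no object = -0.23, Det+N - N = -0.04]
```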
Experiment 2
To corroborate the results of Experiment 1, Experiment 2 uses the
same experimental procedure for testing material that differs only in
one respect from the material investigated in the first experiment. As
shown in (5), the condition with one additional word before the verb
now contains the indefinite pronoun etwas ('something') instead of a
bare noun.

Both the indefinite pronoun and a bare noun introduce a new
discourse referent and should thus block extraposition in the same
way. However, because the indefinite pronoun lacks lexical content, it
incurs lower semantic processing costs. Since the cost of semantic
processing is the underlying reason why distance is measured in
terms of new discourse referents in the DLT, it could be expected that
it is easier to extrapose across an indefinite pronoun than across a bare
noun.
Twenty-seven students participated in Experiment 2. The results, which are
also shown in Table 1, reveal a 14 % drop in extraposition rate in the
presence of a one-word object and a further 9 % drop when going
from one- to two-word objects. The results were analyzed as described for Experiment 1. The results of the logistic mixed-effects
regression are shown in Table 3. The difference between 0 words and
1 word was significant but the difference between 1 word and 2 words
failed to reach significance.
Discussion
The experimental results presented in this paper show that the decision between keeping a relative clause adjacent to its head noun and
extraposing the relative clause behind the clause-final verb is strongly
affected by the amount of material that intervenes between head noun
(including the relative clause) and the clause-final verb. When a new
discourse referent intervenes, the rate of extraposition is substantially
reduced. Whether the new discourse referent was introduced by a one-word NP or a two-word NP had no significant effect, although
numerically there were some differences in the expected direction.
The results thus suggest that dependency length is defined in the
same way for language production and language comprehension,
namely in terms of new discourse referents. This in turn argues that
the DLT has a broader coverage than just processing load during
language comprehension.
An alternative to defining weight in terms of new discourse
referents is the prosodic theory proposed by Anttila, Adams and
Speriosu (2010) in their analysis of the English dative alternation. In
a nutshell, Anttila et al. (2010) propose that dependency length
should be measured as the number of intervening phonological phrases,
where a phonological phrase consists of an accented lexical word
possibly preceded by unaccented function words.

Table 1 Percentages of extraposition in Experiments 1 and 2

Structure      % Extraposed in Exp 1    % Extraposed in Exp 2
no object      38                       54
N              15                       40
Det + N        11                       31

Table 2 Results of mixed-effects model for Experiment 1

Contrast           Estimate    Std. Error    z value    Pr(>|z|)
no object vs. N    2.3233      0.6387        3.638      0.0002
N vs. Det + N      0.5776      0.8060        0.717      0.4735

Table 3 Results of mixed-effects model for Experiment 2

Contrast           Estimate    Std. Error    z value    Pr(>|z|)
no object vs. N    1.0587      0.3855        2.747      0.0060
N vs. Det + N      0.5054      0.3313        1.526      0.1271

According to this definition, an NP consisting of a bare noun like Gedichte ('poems')
and an NP consisting of a determiner and a noun like einige Gedichte ('some poems') both constitute a single phonological phrase.
This would be compatible with the finding of Experiment 1 that the
rate of extraposition did not differ significantly between these two
types of NPs.
In contrast to a bare noun like Gedichte, an indefinite pronoun like
etwas ('something') does not form a phonological phrase because etwas
is an unaccented function word. This predicts that the intervening
indefinite pronoun etwas should be invisible with regard to extraposition. However, as shown by the results for Experiment 2, the rate of
extraposition decreased significantly when etwas was present. The
rate of extraposition decreased even further when an NP consisting of
a determiner and a noun intervened, but this further decrease was not
significant. The results of Experiment 2 thus do not support the
prosodic definition of distance proposed by Anttila et al. (2010).
In sum, the results of the two experiments reported in this
paper favor a definition of dependency length in terms of intervening new discourse referents. The two alternatives that were
considered (distance measured as number of words or number of
phonological phrases) could not account for the complete pattern
of results.
References
Anttila A, Adams M, Speriosu M (2010) The role of prosody in the English dative alternation. Lang Cogn Process 25(7–9):946–981
Baroni M, Bernardini S, Ferraresi A, Zanchetta E (2009) The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Lang Resour Eval 43(3):209–226. doi:10.1007/s10579-009-9081-4
Bates DM, Maechler M (2010) lme4: linear mixed-effects models using S4 classes
Bock JK, Warren RK (1985) Conceptual accessibility and syntactic structure in sentence formulation. Cognition 21:47–67
Gibson E (2000) The dependency locality theory: a distance-based theory of linguistic complexity. In: Marantz A, Miyashita Y, O'Neil W (eds) Image, language, brain. Papers from the first mind articulation project symposium. MIT Press, Cambridge, pp 95–126
Gildea D, Temperley D (2010) Do grammars minimize dependency length? Cogn Sci 34:286–310
Hawkins JA (1994) A performance theory of order and constituency. Cambridge University Press, Cambridge
Hawkins JA (2004) Efficiency and complexity in grammars. Oxford University Press, Oxford
Stallings LM, MacDonald MC (2011) It's not just the heavy NP: relative phrase length modulates the production of heavy-NP shift. J Psycholinguist Res 40(3):177–187
Uszkoreit H, Brants T, Duchier D, Krenn B, Konieczny L, Oepen S, Skut W (1998) Studien zur performanzorientierten Linguistik: Aspekte der Relativsatzextraposition im Deutschen. Kognitionswissenschaft 7:129–133
Wasow T (2002) Postverbal behavior. CSLI Publications, Stanford

How is information distributed across speech and gesture? A cognitive modeling approach
Kirsten Bergmann, Sebastian Kahl, Stefan Kopp
Bielefeld University, Germany
Abstract
In naturally occurring speech and gesture, meaning is organized
and distributed across the modalities in different ways. The underlying cognitive processes are largely unexplored. We propose a model
based on activation spreading within dynamically shaped multimodal
memories, in which coordination arises from the interplay of visuospatial and linguistically shaped representations under given cognitive
resources. A sketch of this model is presented together with simulation results.
Keywords
Speech, Gesture, Conceptualization, Semantic coordination, Cognitive modeling
Introduction
Gestures are an integral part of human communication and they are
inseparably intertwined with speech (McNeill, Duncan 2000). The
detailed nature of this connection, however, is still a matter of considerable debate. The data that underlie this debate have for the most
part come from studies on the coordination of overt speech and
gestures showing that the two modalities are coordinated in their
temporal arrangement and in meaning, but with considerable variations. When occurring in temporal proximity, the two modalities
express the same underlying idea, however, not necessarily identical
aspects of it: Iconic gestures can be found to be redundant with the
information encoded verbally (e.g., 'round cake' + gesture depicting
a round shape), to supplement it (e.g., 'cake' + gesture depicting a
round shape), or even to complement it (e.g., 'looks like
this' + gesture depicting a round shape). These variations in meaning
coordination, together with temporal synchrony, led to different
hypotheses about how the two modalities encode aspects of meaning
and what mutual influences between the two modalities could
underlie this. However, a concrete picture of this, and in particular of
the underlying cognitive processes, is still missing.
A couple of studies have investigated how the frequency and
nature of gesturing, including its coordination with speech, is influenced by cognitive factors. There is evidence that speakers indeed
produce more gestures at moments of relatively high load on the
conceptualization process for speaking (Kita, Davies 2009; Melinger,
Kita 2007). Moreover, supplementary gestures are more likely in
cases of problems of speech production (e.g. disfluencies) or when the
information conveyed is introduced into the dialogue (and thus conceptualized for the first time) (Bergmann, Kopp 2006). Likewise,
speakers are more likely to produce non-redundant gestures in face-to-face dialogue than with addressees who are not visible
(Bavelas, Kenwood, Johnson and Philips 2002).
Chu et al. (Chu, Meyer, Foulkes and Kita 2013) provided data
from an analysis of individual differences in gesture use demonstrating that poorer visual/spatial working memory is correlated with
a higher frequency of representational gestures. However, despite this
evidence, Hostetter and Alibali (Hostetter, Alibali 2007) report findings suggesting that speakers who have stronger visual-spatial skills
than verbal skills produce higher rates of gestures than other speakers.
A follow-up study demonstrated that speakers with high spatial skills
also produced a higher proportion of non-redundant gestures than
other speakers, whereas verbal-dominant speakers tended to produce
such gestures more in case of speech disfluencies (Hostetter, Alibali
2011). Taken together, this suggests that non-redundant gesture-speech combinations are the result of speakers having both strong
spatial knowledge and weak verbal knowledge simultaneously, and
avoiding the effort of transforming the one into the other.
In the literature, different models of speech and gesture production
have been proposed. One major distinguishing feature is the point
where in the production process cross-modal coordination can take
place. The Growth Point Theory (McNeill, Duncan 2000) assumes
that gestures arise from idea units combining imagery and categorical
content. Assuming that gestures are generated pre-linguistically,
Krauss, Chen and Gottesman (2000) hold that the readily
planned and executed gesture facilitates lexical retrieval through
cross-modal priming. De Ruiter (2000) proposed that speech-gesture
coordination arises from a multimodal conceptualization process that
selects the information to be expressed in each modality and assigns a
perspective for the expression. Kita and Özyürek (2003) agree that gesture and speech are two separate systems interacting during the
conceptualization stage. Based on crosslinguistic evidence, their
account holds that language shapes iconic gestures such that the
content of a gesture is determined by bidirectional interactions
between speech and gesture production processes at the level of
conceptualization, i.e. the organization of meaning. Finally, Hostetter,
Alibali (2008) proposed the Gestures as Simulated Action framework
that emphasizes how gestures may arise from an interplay of mental
imagery, embodied simulations, and language production. According
to this view, language production evokes enactive mental representations which give rise to motor activation.
Although a consistent theoretical picture is starting to emerge, many
questions about the detailed mechanisms remain open. A promising
approach to explicating and testing hypotheses is cognitive models that
allow for computational simulation. However, such modeling attempts
for the production of speech and gestures are almost nonexistent. Only
Breslow, Harrison and Trafton (2010) proposed an integrated production model based on the cognitive architecture ACT-R
(Anderson, Bothell, Byrne, Lebiere and Qin 2004). This model,
however, has difficulty explaining gestures that clearly complement
or supplement verbally encoded meaning.
A Cognitive Model of Semantic Coordination
In recent and ongoing work we develop a model for multimodal
conceptualization that accounts for the range of semantic coordination
we see in real-life speech-gesture combinations. This account is
embedded into a larger production model that comprises three stages:
(1) conceptualization, where a message generator and an image
generator work together to select and organize information to be
encoded in speech and gesture, respectively; (2) formulation, where a
speech formulator and a gesture formulator determine appropriate
verbal and gestural forms for this; (3) motor control and articulation
to finally execute the behaviors. Motor control, articulation, and
formulation have been the subject of earlier work (Bergmann, Kopp
2009). In the following we provide a sketch of the model; details can
be found in (Kopp, Bergmann and Kahl 2013; Bergmann, Kahl and
Kopp 2013).
Multimodal Memory
The central component in our model is a multimodal memory which
is accessible by modules of all processing stages. We assume that
language production requires a preverbal message to be formulated in
a symbolic-propositional representation that is linguistically shaped
(Levelt 1989) (SPR, henceforth). During conceptualization the SPR,
e.g., a function-argument structure denoting a spatial property of an
object, needs to be extracted from visuo-spatial representations
(VSR), i.e., the mental image of this object. We assume this process
to involve the invocation and instantiation of memorized supramodal
concepts (SMC, henceforth), e.g. the concept 'round', which links the
corresponding visuo-spatial properties to a corresponding propositional denotation. Figure 1 illustrates the overall relation of these
tripartite multimodal memory structures.
To realize the VSR and part of the SMC, we employ a model of
visuo-spatial imagery called Imagistic Description Trees (IDT)
(Sowa, Kopp 2003). The IDT model unifies models from (Marr,
Nishihara 1978; Biederman 1987; Lang 1989) and was designed,
based on empirical data, to cover the meaningful visuo-spatial features in shape-depicting iconic gestures. Each node in an IDT contains
an imagistic description which holds a schema representing the shape
of an object or object part. Important aspects include (1) a tree
structure for shape decomposition, with abstracted object schemas as
nodes, (2) extents in different dimensions as an approximation of
shape, and (3) the possibility of dimensional information to be underspecified. The latter occurs, e.g., when the axes of an object
schema cover less than the three dimensions of space or when an
exact dimensional extent is left open but only a coarse relation
between axes (like 'dominates') is given. This allows representing the
visuo-spatial properties of SMCs such as 'round', 'left-of' or 'longish'. Applying an SMC to a VSR is realized through graph unification and
similarity matching between object schemas, yielding similarity values that assess how well a certain SMC applies to a particular visuo-spatially represented entity (cf. Fig. 1). SPRs are implemented straightforwardly as predicate-argument sentences.

Fig. 1 Overall production architecture
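As a rough illustration of the similarity matching, the following sketch (our own drastic simplification, not the IDT implementation) scores how well an SMC like 'round' applies to an object schema, given approximate extents per dimension with underspecified axes allowed.

```python
# Hedged sketch: a toy similarity value for the SMC 'round' based on the
# ratio of dimensional extents; None marks an underspecified axis.
def roundness(extents):
    dims = [e for e in extents if e is not None]
    if len(dims) < 2:
        return 0.0                    # too underspecified to judge
    return min(dims) / max(dims)      # 1.0 = equal extents = maximally round

cake = (3.0, 3.0, 1.0)                # flat, disc-like object schema
print(roundness(cake[:2]))            # 1.0: 'round' fits well in this plane
print(roundness((3.0, None, 1.0)))    # ~0.33: poor fit, one axis unspecified
```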
Overall production process
Figure 1 shows an outline of the overall production architecture.
Conceptualization consists of cognitive processes that operate upon
the abovementioned memory structures to create a more or less
coherent multimodal message. These processes are constrained by
principles of memory retrieval, which we assume can be modeled by
principles of activation spreading (Collins, Loftus 1975). As in cognitive architectures like ACT-R (Anderson et al. 2004), activations
float dynamically, spread across linked entities (in particular via
SMCs), and decay over time. Activations of more complex SMCs are
assumed to decay more slowly than activations in lower VSRs or SPRs.
Production starts with the message generator and image generator
inducing local activations of modal entries, evoked by a communicative goal. VSRs that are sufficiently activated invoke matching
SMCs, leading to an instantiation of SPRs representing the corresponding visuo-spatial knowledge in linguistically shaped ways. The
generators independently select modal entries and pass them on to the
formulators. As in ACT-R, highly activated features or concepts are
more likely to be retrieved and thus to be encoded. Note that, as
activation is dynamic, feature selection depends on the time of
retrieval and thus available resources. The message generator has to
map activated concepts in SPR onto grammatically determined categorical structures, anticipating what the speech formulator is able to
process (cf. Levelt 1989). Importantly, interaction between generators
and formulators in each modality can run top-down and bottom-up.
For example, a proposition being encoded by the speech formulator
results in reinforced activation of the concept in SPR, and thus
increased activation of associated concepts in VSR.
As a result, semantic coordination emerges from the local choices
generators and formulators take, based on the activation dynamics in
multimodally linked memory representations. Redundant speech and
gesture result from focused activation of supramodally linked mental
representations, whereas non-redundant speech and gesture arise
when activations scatter over entries not connected via SMCs.
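A minimal sketch of the assumed activation dynamics follows (our own toy version; the decay and spreading constants and the memory entry names are invented for illustration).

```python
# Hedged sketch: one update cycle of activation spreading with decay over
# multimodally linked memory entries (VSR <-> SMC <-> SPR).
import math

def update_activations(act, links, decay=0.5, spread=0.2):
    new = {}
    for node, a in act.items():
        incoming = sum(act[n] for n in links.get(node, ()))
        new[node] = a * math.exp(-decay) + spread * incoming
    return new

act = {'vsr:window-shape': 1.0, 'smc:round': 0.0, 'spr:round(x)': 0.0}
links = {'smc:round': ['vsr:window-shape'], 'spr:round(x)': ['smc:round']}
for _ in range(3):
    act = update_activations(act, links)
print(act)  # activation flows from the mental image towards the proposition
```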
Results and outlook
To quantify our modeling results we ran simulation experiments in
which we manipulated the available time (in terms of memory update
cycles) before the model had to come up with a sentence and a gesture
(Kopp et al. 2013; Bergmann et al. 2013). We analyzed the resulting
multimodal utterances with respect to semantic coordination: Supplementary (i.e., non-redundant) gestures were dominant in those runs
with stricter temporal limitations, while redundant ones became more
likely when the available time was increased. The model thus offers a
natural account for the empirical finding that non-redundant gestures
are more likely when conceptualization load is high, based on the
assumption that memory-based cross-modal coordination consumes
resources (memory, time), and is reduced or compromised when such
resources are limited.
To enable a direct evaluation of our simulation results in comparison with empirical data, we are currently conducting experiments to set
up a reference data corpus. In this study, participants are engaged in a
dyadic description task and we manipulate the preparation time
available for utterance planning. The verbal output will subsequently
be analyzed with respect to semantic coordination of speech and
gestures based on a semantic feature coding approach as already
applied in (Bergmann, Kopp 2006).
In ongoing work we extend the model to also account for complementary speech-gesture ensembles in which deictic expressions in
speech refer to their co-speech gesture, as in 'the window looks like
this'. To this end, we advance and refine the feedback signals provided
by the behavior generators to allow for the fine-grained coordination
necessary for the production of this kind of utterance. With this
extension, the model will allow us to further investigate predictions as
postulated in the lexical retrieval hypothesis (Krauss, Chen and Chawla
1996; Rauscher, Krauss and Chen 1996; Krauss et al. 2000). Although
that model was set up on the basis of empirical data, it was subject to
much criticism based on psycholinguistic experiments and data. Data
from detailed simulation experiments based on our cognitive model can
provide further arguments in this debate.

References
Anderson J, Bothell D, Byrne M, Lebiere C, Qin Y (2004) An integrated theory of the mind. Psychol Rev 111(4):1036–1060
Bavelas J, Kenwood C, Johnson T, Philips B (2002) An experimental study of when and how speakers use gestures to communicate. Gesture 2(1):1–17
Bergmann K, Kahl S, Kopp S (2013) Modeling the semantic coordination of speech and gesture under cognitive and linguistic constraints. In: Aylett R, Krenn B, Pelachaud C, Shimodaira H (eds) Proceedings of the 13th international conference on intelligent virtual agents. Springer, Berlin, pp 203–216
Bergmann K, Kopp S (2006) Verbal or visual: how information is distributed across speech and gesture in spatial dialog. In: Proceedings of SemDial 2006, pp 90–97
Bergmann K, Kopp S (2009) GNetIc: using Bayesian decision networks for iconic gesture generation. In: Proceedings of IVA 2009. Springer, Berlin, pp 76–89
Biederman I (1987) Recognition-by-components: a theory of human image understanding. Psychol Rev 94:115–147
Breslow L, Harrison A, Trafton J (2010) Linguistic spatial gestures. In: Proceedings of cognitive modeling 2010, pp 13–18
Chu M, Meyer AS, Foulkes L, Kita S (2013) Individual differences in frequency and saliency of speech-accompanying gestures: the role of cognitive abilities and empathy. J Exp Psychol Gen 143(2):694–709
Collins AM, Loftus EF (1975) A spreading-activation theory of semantic processing. Psychol Rev 82(6):407–428
de Ruiter J (2000) The production of gesture and speech. In: McNeill D (ed) Language and gesture. Cambridge University Press, Cambridge, pp 284–311
Hostetter A, Alibali M (2007) Raise your hand if you're spatial: relations between verbal and spatial skills and gesture production. Gesture 7:73–95
Hostetter A, Alibali M (2008) Visible embodiment: gestures as simulated action. Psychon Bull Rev 15(3):495–514
Hostetter A, Alibali M (2011) Cognitive skills and gesture-speech redundancy. Gesture 11(1):40–60
Kita S, Davies TS (2009) Competing conceptual representations trigger co-speech representational gestures. Lang Cogn Process 24(5):761–775
Kita S, Özyürek A (2003) What does cross-linguistic variation in semantic coordination of speech and gesture reveal? Evidence for an interface representation of spatial thinking and speaking. J Mem Lang 48:16–32
Kopp S, Bergmann K, Kahl S (2013) A spreading-activation model of the semantic coordination of speech and gesture. In: Proceedings of the 35th annual conference of the cognitive science society (CogSci 2013). Cognitive Science Society, Austin, pp 823–828
Krauss R, Chen Y, Chawla P (1996) Nonverbal behavior and nonverbal communication: what do conversational hand gestures tell us? Adv Exp Soc Psychol 28:389–450
Krauss R, Chen Y, Gottesman R (2000) Lexical gestures and lexical access: a process model. In: McNeill D (ed) Language and gesture. Cambridge University Press, Cambridge, pp 261–283
Lang E (1989) The semantics of dimensional designation of spatial objects. In: Bierwisch M, Lang E (eds) Dimensional adjectives: grammatical structure and conceptual interpretation. Springer, Berlin, pp 263–417
Levelt WJM (1989) Speaking: from intention to articulation. MIT Press
Marr D, Nishihara H (1978) Representation and recognition of the spatial organization of three-dimensional shapes. In: Proceedings of the Royal Society of London, vol 200, pp 269–294
McNeill D, Duncan S (2000) Growth points in thinking-for-speaking. In: Language and gesture. Cambridge University Press, Cambridge, pp 141–161
Melinger A, Kita S (2007) Conceptualisation load triggers gesture production. Lang Cogn Process 22(4):473–500
Rauscher F, Krauss R, Chen Y (1996) Gesture, speech, and lexical access: the role of lexical movements in speech production. Psychol Sci 7:226–231
Sowa T, Kopp S (2003) A cognitive model for the representation and processing of shape-related gestures. In: Proceedings of European cognitive science conference

Towards formally well-founded heuristics in cognitive AI systems
Tarek R. Besold
Institute of Cognitive Science, University of Osnabrück, Germany
Abstract
We report on work towards the development of a framework for
the application of formal methods of analysis to cognitive systems
and computational models (putting special emphasis on aspects
concerning the notion of heuristics in cognitive AI) and explain
why this requires the development of novel theoretical methods
and tools.
Keywords
Cognitive Systems, Heuristics, Complexity Theory, Approximation
Theory
Heuristics in Cognitive Systems
An ever-growing number of researchers in cognitive science and
cognitive psychology, starting in the 1970s with Kahneman, Slovic
and Tversky's (1982) 'heuristics and biases' program and today
prominently heralded, for instance, by Gigerenzer, Hertwig and Pachur (2011), argues that humans in their common-sense reasoning do
not apply any full-fledged form of logical or probabilistic reasoning to
possibly highly complex problems, but instead rely on heuristics as
(mostly automatic and unconscious) mechanisms that allow them to
circumvent the impending complexity explosion and nonetheless
reach acceptable solutions to the original problems. All of these
mechanisms are commonly subsumed under the general term 'heuristics' and, following the paradigmatic example given by Newell and
Simon's (1976) notion of heuristic search, under this label are also
often (re)implemented in cognitive AI.4
Still, on theoretical grounds, from a computational point of view at
least two quite different general types of approach can be imagined:
Either the complexity of solving a problem can be reduced by reducing
the problem instance under consideration to a simpler (but solution
equivalent) one, or the problem instance stays untouched but, instead of
being perfectly (i.e., precisely) solved, is dealt with in a 'good enough'
(i.e., approximate) way. Against this background, two crucial questions
arise: Which problems can actually be solved by applying heuristics?
And how can the notion of heuristics be theoretically modeled on a
sufficiently high level so as to allow for a general description?

4 Whilst this type of work clearly has lost some of its popularity over
the years, and has been replaced with efforts invested in finding
answers to questions where an optimal solution can provably be
achieved (although under possibly unrealistic or impractical time and/
or space requirements), the study of heuristics-based approaches and
techniques is still a lively field of active research; see, for example,
(Bridewell, Langley 2011; MacLellan 2011).
In what follows we want to provide a sketch of work towards an
approach to answering these questions using techniques originating
from complexity theory and hardness of approximation analysis. This
choice of formal methods is justified by the observation that, although
computational in nature, systems as developed in cognitive AI and
cognitive systems research can be considered as physical systems
which need to perform their tasks in limited time and with a limited
amount of space at their disposal and thus formal computational
properties (and restrictions on these) are relevant parameters.
Two and a Half Formal Perspectives on Heuristics in Cognitive
Systems
Returning to the two different types of heuristics identified above and
having a look at recent work in complexity and approximation theory,
we find a natural correspondence between the outlined conceptual
approaches and well-known concepts from the respective fields.
The Reduction Perspective:
In recent years, complexity theory has turned its attention more
and more towards examples of problems which have algorithms that
have worst-case exponential behavior, but tend to work quite well in
practice if certain parameters of the problem are restricted. This has
led to the introduction of the class of fixed-parameter tractable
problems FPT (see, e.g., (Downey, Fellows 1999)):
Definition 1 (FPT) A problem P is in FPT if P admits an O(f(κ) · n^c)
algorithm, where n is the input size, κ is a parameter of the input
constrained to be small, c is an independent constant, and f is some
computable function.
A non-trivial corollary can be derived from FPT-membership: Any
instance of a problem in FPT can be reduced to a problem kernel.
Definition 2 (Kernelization) Let P be a parameterized problem. A
kernelization of P is an algorithm which takes an instance x of P with
parameter κ and maps it in polynomial time to an instance y such that
x ∈ P if and only if y ∈ P, and the size of y is bounded by f(κ) (f a
computable function).
Theorem 1 (Kernelizability (Downey, Fellows 1999)) A problem P
is in FPT if and only if it is kernelizable.
This essentially entails that, if a positive FPT result can be obtained,
then (and only then) there is a downward reduction for the underlying
problem to some sort of smaller or less-complex instance of the same
problem, which can then be solved. Returning to the initial quest for
finding a formal characterization of reduction-based heuristics, we
notice that, by categorizing problems according to kernelizability, we
can establish a distinction between problem classes which are solvable
by the presented type of reduction and those which are not, and can
thus decide a priori whether a system implementing a mechanism
based on a kernelization account is generally (un)able to solve a
certain class. What remains to be shown is the connection between
kernelization and the notion of reduction-based heuristics (or rather the
suitability of kernelization as conceptual characterization of the notion
of reduction in the examined type of heuristics).
The connection is explicated by the correspondence between FPT-membership and kernelizability of a problem: If heuristics are to be as
'fast and frugal' as commonly claimed, considering them anything but (at
worst) polynomial-time bounded processes seems questionable. But
now, if the reduced problem shall be solvable under resource-critical
conditions, using the line of argument introduced above, we can just
hope for it to be in FPT. Finally, combining the FPT-membership of the
reduced problem with the polynomial-time complexity of the reduction
process (i.e., the presumed heuristics), already the original problem had
to be fixed-parameter tractable. This should not come as a surprise, as the
contrary (i.e., a heuristic reducing the overall complexity of solving a
superpolynomial problem to polynomial-time computation by means of
a reduction of the original problem within the same class) would contradict the class membership of the original problem and thus break the
class hierarchy (assuming P ≠ NP). Still, kernelization-based heuristics
are not trivialized by these considerations: Although original and reduced
problem are in FPT, the respective size of the parameters may still differ
between instances (making an important difference in application scenarios for implemented systems).
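To make the reduction perspective concrete, consider the classic Buss kernelization for Vertex Cover, a textbook example (not taken from this abstract) of mapping an instance (G, k) in polynomial time to a solution-equivalent kernel whose size is bounded in k alone.

```python
# Hedged sketch: Buss's rule. Any vertex of degree > remaining budget must
# be in every small cover; afterwards a yes-instance has at most k^2 edges.
def buss_kernel(edges, k):
    """edges: iterable of frozensets {u, v}; returns (kernel, budget) or 'no'."""
    edges, forced = set(edges), set()
    changed = True
    while changed:
        changed = False
        degree = {}
        for e in edges:
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        for v, d in degree.items():
            if d > k - len(forced):              # v is forced into the cover
                forced.add(v)
                edges = {e for e in edges if v not in e}
                changed = True
                break
    if len(forced) > k or len(edges) > (k - len(forced)) ** 2:
        return 'no'                               # no cover of size <= k exists
    return edges, k - len(forced)

g = [frozenset(p) for p in [(1, 2), (1, 3), (1, 4), (2, 3)]]
print(buss_kernel(g, 2))   # vertex 1 is forced; a one-edge kernel remains
```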
The Approximation Perspective:
The second perspective on heuristics uses approximation algorithms:
Instead of precisely solving a kernel as proposed by reduction-based
heuristics, we try to compute an approximate solution to the original
problem (i.e., the solution to a relaxed problem). The idea is no longer
to solve the problem perfectly (or an equivalent instance of the same
class), but instead to solve the problem to some satisfactory degree.
A possible analog to FPT in the Tractable AGI thesis is APX, the
class of problems allowing polynomial-time approximation
algorithms:
Definition 3 (APX) An optimization problem P is in APX if P admits
a constant-factor approximation algorithm, i.e., there is a constant
factor ε > 0 and an algorithm which takes an instance of P of size
n and, in time polynomial in n, produces a solution that is within a
factor 1 + ε of being optimal (or 1 − ε for maximization problems).
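For concreteness (again an illustration of mine, not an example from the text), a textbook member of APX is Vertex Cover: taking both endpoints of a greedily built maximal matching runs in polynomial time and is never worse than twice the optimum, i.e., within a constant factor of optimal:

```python
def vertex_cover_2approx(edges):
    """Maximal-matching heuristic: the cover is at most 2x optimal, since
    any cover must contain at least one endpoint of each matched edge."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))  # greedily take both endpoints
    return cover

# Path 1-2-3-4: the optimum {2, 3} has size 2; the heuristic returns 4 vertices.
print(vertex_cover_2approx([(1, 2), (2, 3), (3, 4)]))
```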
In practice this notion crucially depends on the bounding constant for
the approximation ratio: If the latter is meaningfully chosen with
respect to the problem, constant-factor approximation allows for
quantifying the 'good enough' aspect of the solution and thus might
even offer a way of modeling the notion of satisficing introduced
by Simon (1956) (which in turn is central to many heuristics considered
in cognitive science and psychology, providing additional empirical
grounding for the computational systems in cognitive AI).
Joining Perspectives:
What if the system architect, instead of deciding whether to solve a
certain type of task by applying one of the two types of heuristics and
then conducting the respective analysis, just wants to check directly
whether the problem at hand might be solvable by either of the two
paradigms? Luckily, FPT and APX can be integrated via the concept of
fixed-parameter approximability and the corresponding problem class FPA:
Definition 4 (FPA) The fixed-parameter version P of a minimization
problem is in FPA if, for a recursive function f, a constant c, and
some fixed recursive function g, there exists an algorithm such that,
for any given problem instance I with parameter k and question
OPT(I) ≤ k, the algorithm, which runs in O(f(k)·n^c) time (where
n = |I|), either outputs 'no' or produces a solution of cost at most g(k).
As shown by Cai and Huang (2006), both polynomial-time approximability
and fixed-parameter tractability with witness (Cai, Chen 1997)
independently imply the more general fixed-parameter approximability.
On the level of interpretation, too, FPA naturally combines both views
of heuristics: its approximability character accommodates the notion of
satisficing, while its fixed-parameter character accounts for the
possibility of complexity reduction by kernelizing whilst keeping key
parameters of the problem fixed.
The Wrong Type of Approximation?
Approximation-based heuristics have been introduced as solution
procedures for a problem producing solutions which are not optimal
but (at least when using a standard like the proposed APX) fall within
a certain defined neighborhood of the optimal one. Here, the degree of
optimality of a solution is measured in terms of the proximity of the
solution's value to the optimal value for the optimization problem at
hand. But this is not the only possible way of conceptualizing
approximation: What if emphasis were put on finding a solution which
is structurally as similar as possible to the original one, so that the
quality of approximation were measured in similarity of structure
instead of proximity of values?
At first sight this seems to be either a trivial issue, or not an issue
at all, depending on whether it is assumed that value similarity and
structural similarity coincide, or it is decided that structure is not of
interest. Still, we believe that dismissing the issue this easily would be
ill-advised: Especially in the context of cognitive systems and high-level
AI, in many cases the structure of a problem's solution can be of
great relevance. As an example, consider a cognitive system built for
maintaining a network of maximally coherent beliefs about complex
domains as, e.g., presented by Thagard (2000). Whilst, for instance,
Millgram (2000) has shown that the required form of maximal
coherence over this type of network in its full form is NP-hard,
Thagard and Verbeurght (1998) proposed several (value-based)
approximation algorithms. Still, a mere value-based approximation scheme
does not yield the desired results: As also demonstrated by Millgram
(2000), two belief assignments can be arbitrarily close in coherence
value and at the same time still arbitrarily far from each other in terms
of which beliefs are accepted and which are rejected.
Unfortunately, whilst our knowledge and command of value-based
approximation has greatly developed over the last decades,
structure-based approximation has rarely been studied. Hamilton, Müller,
van Rooij and Wareham (2007) present initial ideas and define basic
notions possibly forming the foundations of a formal framework for
structure-based approximation. And although these are still only very
first steps towards a complete and well-studied theory, the presented
concepts already allow for several important observations. The most
relevant for the introduced cognitive-systems setting is the following:
Value approximation and structural approximation are distinct in
general, and whilst very careful use of the tools of value-based
approximation might partially mitigate this divergence (the most
naive ad-hoc remedy being the use of problem-specific and highly
non-generalizable optimization functions which also take into account
some basic form of structural similarity and not only the outcome values
of solutions), it cannot be assumed in general that both notions
coincide in a meaningful way.
Future Work
In the long run we therefore want to develop the presented roots into
an overall framework addressing empirically inspired aspects of
cognitive systems in general. Also, in parallel to the corresponding
theoretical work, we want to put emphasis on showing the usefulness
and applicability of the proposed methods in different prototypical
examples from relevant fields (such as, for example, models of epistemic reasoning and interaction, cognitive systems in general
problem-solving, or models for particular cognitive capacities),
allowing for a mutually informed development process between
foundational theoretical work and application studies.
Acknowledgments
I owe an ever-growing debt of gratitude to Robert Robere (University
of Toronto) for introducing me to the fields of parameterized complexity theory and approximation theory, reliably providing me with
theoretical/technical backup and serving as a willing partner for
feedback and discussion.
References
Bridewell W, Langley P (2011) A computational account of everyday abductive inference. In: Proceedings of the 33rd annual meeting of the cognitive science society, pp 2289–2294
Cai L, Chen J (1997) On fixed-parameter tractability and approximability of NP optimization problems. J Comput Syst Sci 54(3):465–474
Cai L, Huang X (2006) Fixed-parameter approximation: conceptual framework and approximability results. In: Bodlaender H, Langston M (eds) Parameterized and exact computation. Springer, pp 96–108
Downey RG, Fellows MR (1999) Parameterized complexity. Springer
Gigerenzer G, Hertwig R, Pachur T (eds) (2011) Heuristics: the foundation of adaptive behavior. Oxford University Press
Hamilton M, Müller M, van Rooij I, Wareham T (2007) Approximating solution structure. In: Dagstuhl seminar proceedings Nr. 07281. IBFI, Schloss Dagstuhl
Kahneman D, Slovic P, Tversky A (1982) Judgment under uncertainty: heuristics and biases. Cambridge University Press
MacLellan C (2011) An elaboration account of insight. In: AAAI fall symposium: advances in cognitive systems
Millgram E (2000) Coherence: the price of the ticket. J Philos 97:82–93
Newell A, Simon HA (1976) Computer science as empirical inquiry: symbols and search. Commun ACM 19(3):113–126
Simon HA (1956) Rational choice and the structure of the environment. Psychol Rev 63:129–138
Thagard P (2000) Coherence in thought and action. The MIT Press
Thagard P, Verbeurght K (1998) Coherence as constraint satisfaction. Cogn Sci 22:1–24

Action planning is based on musical syntax in expert pianists: ERP evidence
Roberta Bianco1, Giacomo Novembre2, Peter Keller2, Angela Friederici1, Arno Villringer1, Daniela Sammler1
1 Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany; 2 MARCS Institute, University of Western Sydney, Australia
Action planning of temporally ordered elements within a coherent structure is a key element in communication. The specifically human ability of the brain to variably combine discrete meaningful units into rule-based hierarchical structures is what is referred to as syntactic processing, and has been defined as a core aspect of language and communication (Friederici 2011; Hauser et al. 2002; Lashley 1952). While similarities in the syntactic organization of language and Western tonal music have been increasingly consolidated (Katz, Pesetsky 2011; Patel 2003; Rohrmeier, Koelsch 2012), analogies with the domain of action, in terms of hierarchical and combinatorial organization (Fitch, Martins 2014; Pastra, Aloimonos 2012; Pulvermüller 2014), remain conceptually controversial (Moro 2014). To investigate the syntax of actions, piano performance based on tonal music is an ideal substrate. First, playing chord progressions is the direct motoric translation of musical syntax, a theoretically established hierarchical system of rules governing musical structure (Rohrmeier 2011). Second, it makes it possible to investigate action planning at different levels of the action hierarchy, from lower, immediate levels of movement selection to higher levels of distal goals (Grafton, Hamilton 2007; Haggard 2008; Uithol et al. 2012). Finally, it offers the opportunity to investigate the influence of expertise on the relative weighting of different action features (i.e., goal and manner) in motor programming (Palmer, Meyer 2000; Wohlschläger et al. 2003).
Novembre and Keller (2011) and Sammler et al. (2013) have shown that expert pianists, during intense practice, might motorically learn the syntactic regularities governing musical sequences and therefore generate motor predictions based on their acquired long-term syntactic knowledge. In a priming paradigm, pianists were asked to imitate on a mute piano silent videos of a hand playing chord sequences. The last chord was either syntactically congruent or incongruent with the preceding musical context. Despite the absence of sounds, the authors found slower imitation times for syntactically incongruent chords as well as motor facilitation (i.e., faster responses) for syntactically congruent chords. In the ERPs (Sammler et al. 2013), the imitation of the incongruent chord elicited a late posterior negativity, an index of the reprogramming of an anticipated motor act (Leuthold, Jentzsch 2002) primed by the syntactic structure of the
musical sequence (i.e., the musical goal). In line with models of incremental planning of serial actions (Palmer, Pfordresher 2003), these findings suggest that the notion of syntax translates to a grammar of musical action in expert pianists.
According to the notion of goal priority over the means in the action hierarchy (Bekkering et al. 2000; Grafton 2009; Wohlschläger et al. 2003), in musical motor acts the musical goal determined by the context (Syntax) should take priority over the specific movement selection adopted for the execution (Manner), especially at advanced skill levels (Novembre, Keller 2011; Palmer, Meyer 2000). However, through intensive musical training, frequently occurring musical patterns (i.e., scales, chord progressions) may have become codified into fixed matching fingering configurations (Gellrich, Parncutt 2008; Sloboda et al. 1998). Thus, from this perspective, it may also be that motor pattern familiarity has a role in motor predictions during the execution of common chord progressions. To what extent motor predictive mechanisms operate at the level of musical syntax or arise from motor pattern familiarity will be addressed here. Whether a progressively more syntax-based motor control, independent of the manner, correlates with expertise will also be discussed.
To this end, we asked pianists to watch and simultaneously execute on a mute piano chord progressions played by a performing pianist's hand presented in a series of pictures on a screen. To negate exogenously driven auditory predictive processes, no sound was used. To explore the effect of expertise on syntax-based predictions, pianists ranging from 12 to 27 years of experience were tested behaviorally and with electroencephalography (EEG). To induce different strengths of predictions, we used 5-chord or 2-chord sequences (long/short Context) presenting the target chord in the last position. In a 2 × 2 factorial design, we manipulated the target chord of the sequences in terms of keys (Syntax congruent/incongruent), to violate the syntactic structure of the sequence, and in terms of fingering (Manner correct/incorrect), to violate motor familiarity. Crucially, manipulating the manner while keeping the syntax congruent allowed us to dissociate behavioral and neural patterns elicited by the execution of either a violation of the syntactic structure of the sequence (Syntax) or a general violation of familiar movements (Manner). Additionally, the 2 × 2 factorial design permitted us to investigate syntax-related mechanisms on top of the concurrent manner violation, in order to test whether in motor programming high-level syntactic operations are prioritized over mechanisms of movement parameter specification.
We hypothesized that, if motor predictions during the execution of musical chord sequences are driven by musical syntax rather than motor pattern familiarity, then the violation of the Syntax should evoke specific behavioral and electrophysiological patterns, different from those related to the Manner. Also, we expected to observe syntax-based prediction effects irrespective of the fingering used to play, thus even in the presence of the concurrent manner violation. Finally, if at advanced skill levels the more abstract musical motor goals carry increasing weight in motor programming, we expected to observe a positive dependency between the strength of syntax-based prediction and expertise level.
We found that the production of syntactically incongruent compared to congruent chords showed a response delay that was larger in the long compared to the short context and that was accompanied by a central posterior negativity (520–800 ms) in the long but not in the short context. Conversely, the execution of the unconventional manner was not delayed as a function of Context, and elicited an opposite electrophysiological pattern (a posterior positivity between 520 and 800 ms). Hence, while the effects associated with the Syntax might reflect a signal of movement reprogramming of a prepotent response in the face of the incongruity to be executed (Leuthold, Jentzsch 2002; Sammler et al. 2013), the effects associated with the Manner were stimulus- rather than response-related and might reflect the perceptual surprise
(Polich 2007) at the salient fingering manipulation, recognized by the pianists as an obvious target manipulation. Finally, syntax-related effects held when only the manner-incorrect trials were considered, and their context dependency was sharper with increasing expertise level (computed as cumulated training hours across all years of piano playing). This suggests that syntactic mechanisms take priority over movement specifications, with more expert pianists being more affected by the priming effect of the contextual syntactic structure.
Taken together, these findings indicate that, given a contextual musical structure, motor plans for the distal musical goal are generated coherently with the context and take precedence over those underlying specific, immediate movement selection. Moreover, the increase of syntax-based motor control with expertise might hint at action planning based on musical syntax being a slowly acquired skill built on top of the acquisition of motor flexibility. More generally, this finding indicates that, similarly to music perception, music production too relies on generative syntactic rules.
References
Bekkering H, Wohlschläger A, Gattis M (2000) Imitation of gestures in children is goal-directed. Quart J Exp Psychol Human Exp Psychol 53(1):153–164. doi:10.1080/713755872
Fitch WT, Martins MD (2014) Hierarchical processing in music, language, and action: Lashley revisited. Ann N Y Acad Sci 1–18. doi:10.1111/nyas.12406
Friederici AD (2011) The brain basis of language processing: from structure to function. Physiol Rev 91(4):1357–1392. doi:10.1152/physrev.00006.2011
Gellrich M, Parncutt R (2008) Piano technique and fingering in the eighteenth and nineteenth centuries: bringing a forgotten method back to life. Br J Music Educ 15(01):5–23. doi:10.1017/S0265051700003739
Grafton ST, Hamilton AFDC (2007) Evidence for a distributed hierarchy of action representation in the brain. Human Movement Sci 26(4):590–616. doi:10.1016/j.humov.2007.05.009
Haggard P (2008) Human volition: towards a neuroscience of will. Nature Rev Neurosci 9(12):934–946. doi:10.1038/nrn2497
Hauser MD, Chomsky N, Fitch WT (2002) The faculty of language: what is it, who has it, and how did it evolve? Science (New York, N.Y.) 298(5598):1569–1579. doi:10.1126/science.298.5598.1569
Katz J, Pesetsky D (2011) The identity thesis for language and music (January)
Lashley K (1952) The problem of serial order in behavior. In: Jeffress LA (ed) Cerebral mechanisms in behavior. Wiley, New York, pp 112–131
Leuthold H, Jentzsch I (2002) Spatiotemporal source localisation reveals involvement of medial premotor areas in movement reprogramming. Exp Brain Res 144(2):178–188. doi:10.1007/s00221-002-1043-7
Moro A (2014) On the similarity between syntax and actions. Trend Cogn Sci 18(3):109–110. doi:10.1016/j.tics.2013.11.006
Novembre G, Keller PE (2011) A grammar of action generates predictions in skilled musicians. Conscious Cogn 20(4):1232–1243. doi:10.1016/j.concog.2011.03.009
Palmer C, Meyer RK (2000) Conceptual and motor learning in music performance. Psychol Sci 11(1):63–68. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11228845
Palmer C, Pfordresher PQ (2003) Incremental planning in sequence production. Psychol Rev 110(4):683–712. doi:10.1037/0033-295X.110.4.683
Pastra K, Aloimonos Y (2012) The minimalist grammar of action. Philos Trans R Soc Lond Ser B Biol Sci 367(1585):103–117. doi:10.1098/rstb.2011.0123
Polich J (2007) Updating P300: an integrative theory of P3a and P3b. Clin Neurophysiol 118(10):2128–2148. doi:10.1016/j.clinph.2007.04.019
Pulvermüller F (2014) The syntax of action. Trend Cogn Sci 18(5):219–220. doi:10.1016/j.tics.2014.01.001
Rohrmeier M (2011) Towards a generative syntax of tonal harmony. J Math Music 5(1):35–53. doi:10.1080/17459737.2011.573676
Rohrmeier M, Koelsch S (2012) Predictive information processing in music cognition. A critical review. Int J Psychophysiol 83(2):164–175. doi:10.1016/j.ijpsycho.2011.12.010
Sammler D, Novembre G, Koelsch S, Keller PE (2013) Syntax in a pianist's hand: ERP signatures of embodied syntax processing in music. Cortex 49(5):1325–1339. doi:10.1016/j.cortex.2012.06.007
Sloboda JA, Clarke EF, Parncutt R, Raekallio M (1998) Determinants of finger choice in piano sight-reading. J Exp Psychol Human Percept Performance 24(1):185–203. doi:10.1037//0096-1523.24.1.185
Uithol S, van Rooij I, Bekkering H, Haselager P (2012) Hierarchies in action and motor control. J Cogn Neurosci 24(5):1077–1086. doi:10.1162/jocn_a_00204
Wohlschläger A, Gattis M, Bekkering H (2003) Action generation and action perception in imitation: an instance of the ideomotor principle. Philos Trans R Soc Lond Ser B Biol Sci 358(1431):501–515. doi:10.1098/rstb.2002.1257

Motor learning in dance using different modalities: visual vs. verbal models
Bettina Bläsing1, Jenny Coogan2, José Biondi2, Liane Simmel3, Thomas Schack1
1 Neurocognition and Action Research Group & Center of Excellence Cognitive Interaction Technology (CITEC), Bielefeld University, Germany; 2 Palucca Hochschule für Tanz Dresden, Germany; 3 tamed Tanzmedizin Deutschland e.V., Fit for Dance Praxis und Institut für Tanzmedizin, München, Germany
Keywords
Motor learning, Observation, Visual model, Verbal instruction, Dance
Introduction
Observational learning is viewed as the major mode of motor learning
(Hodges et al. 2007). Empirical evidence shows that observational
learning primarily takes place in an implicit way, by activating shared
neural correlates of movement execution, observation and simulation
(Jeannerod 2004; Cross et al. 2006, 2009). It has been shown that the
use of language (in terms of verbal cues) can facilitate or enhance
motor learning by guiding attention towards relevant features of the
movement and making these aspects explicit (see Wulf and Prinz
2001). In dance training (and other movement disciplines), observational learning from a visual model is most commonly applied, and is
often supported by verbal cue-giving. Evidence from practice suggests that explicit verbal instructions and movement descriptions play
a major role in movement learning by supporting the understanding,
internalizing and simulating of movement phrases. In modern and
contemporary dance, however, choreographers often do not expect the
dancers to simply reproduce movement phrases in adequate form, but
to develop movement material on their own, in accordance with a
given idea, description or instruction, aiming at a more personal
expression and higher artistic quality of the developed movement
material.
In this study, we investigate dancers' learning of movement
phrases based on the exclusive and complementary use of visual
model observation and verbal instruction (movement description).
Dance students learned comparable movement material via two different modes: via observation of a model and via listening to a verbal movement description (as an example, part of a model sequence is displayed in Fig. 1). In a second step, the complementary mode was added. After both learning steps, the students' performance of the learned movement phrases was recorded and rated by independent experts. A retention test was applied to evaluate long-term effects of the learning processes. We expected the dance students to learn successfully from the visual model, their most commonly practiced mode of movement learning. From the verbal instruction, we expected that the performed movement phrases would vary more strongly, but could possibly be performed with more artistic quality. We also expected performance after the second learning step to be improved compared to the first learning step in both conditions.
Method
Learning task: Eighteen students (age: 18.4 1.0 years, 11 female)
from the BA Dance study program at the Palucca Hochschule fur
Tanz Dresden learned two dance phrases of similar length (approx.
30 s) and complexity, one via visual observation of a demonstration
video, the other one via a recorded verbal description (see Fig. 1). In a
first learning step (Step 1), one of the dance phrases was presented
five times either visually (video) or verbally (audio), and the participant was instructed to learn it by watching or listening, and by
marking movements as required. After a short practice, the participant
performed the learned dance phrase while being recorded on video. In
a second learning step (Step 2), the participant was twice presented
the same dance phrase in the complementary presentation mode (i.e.,
video for the verbally learned phrase and vice versa), and the performance was recorded again. The other dance phrase was then
learned and performed using the same procedure, but was presented in

Fig. 1 Images illustrating approximately two-thirds of Phrase 1, choreographed by Jenny Coogan and performed by Robin Jung. The phrase was presented as a video of 26 s and as an audio recording of a verbal description (speaker: Alex Simkins). Phrase 2, choreographed by José Biondi, was of similar length and complexity and contained similar movement elements as Phrase 1, and was performed and spoken by the same dancer and speaker in the video and audio recording, respectively. The verbal description of the dance sequence shown in the pictures reads as follows: 'Stand facing the front left diagonal of the room in first position. At the same time extend your left leg forward and your two arms sideways to the horizontal. Allow your right hand to continue moving until it arrives to a high diagonal. Gradually let the shape melt back into its beginning position as you shift your weight into the right hip, bending both knees, sinking your head to the left to make a big C-curve. Continue into falling, then catch the weight with a step of the left leg crossing to the right. Follow with two steps sideward, in the same direction while throwing both arms in front of your shoulders. Keeping your arms close to you, spiral to the right diagonal, then, kick your right leg, left arm and head forward as you throw your right arm behind you. Bring the energy back into you quickly bending both elbows and the right knee close to the body, spine vertical. Drop your arms and take a step back onto your right leg turning fully around while dragging your left leg behind you. Finish with the weight low, left leg behind, spine rounded forward, arms wrapped around the body, right arm front, left arm back. Stretch your legs and gradually lengthen your spine horizontally. Allow your arms to follow the succession of your spine, right front, left back.'

the remaining learning mode (verbal or visual) in Step 1, complemented by the other mode in Step 2. The order of the dance phrases (Phrase 1, Phrase 2) and of the initial learning modes (visual, verbal) was balanced between the participants (the experimental design of the study is illustrated in Table 1). The experimental procedure took place in a biomechanics laboratory and lasted approximately one hour for each participant. In addition to the evaluation of the recorded performances, questionnaires and psychometric tests were applied to investigate the students' learning success and their personal impressions of the different learning processes.
Expert ratings of the reproduced material: Two independent experts rated the recorded performance trials from the recorded and cut video clips, one from each demonstration condition (visual, visual + verbal, verbal, verbal + visual). The experts rated each of the recorded performances by filling out a questionnaire consisting of six-point Likert-scale type questions assigned to two categories, accordance with the model (AM; 10 questions) and artistic performance quality (PQ; 5 questions). For each category of questions, ratings were averaged to obtain general measures for the main criteria AM and PQ. Each expert independently watched the recordings of the students' performances and marked one answer for each question, without knowing the learning condition of the recorded performance. Non-parametric tests (Wilcoxon signed-rank, Mann–Whitney U) were used to compare the averaged ratings of the two experts for the different conditions (visual, visual + verbal, verbal, verbal + visual) within each criterion (AM, PQ) and for the two criteria within each demonstration condition.
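As a rough sketch of the comparisons just described, the following assumes scipy and uses synthetic stand-in numbers (the real per-participant ratings are not reported here):

```python
import numpy as np
from scipy.stats import wilcoxon, mannwhitneyu

rng = np.random.default_rng(0)
n = 18  # participants in the learning task

# Synthetic stand-in: one averaged expert rating (criterion AM) per
# participant and learning condition, on the six-point scale.
am = {cond: rng.uniform(1, 6, n)
      for cond in ("visual", "visual+verbal", "verbal", "verbal+visual")}

# Paired within-participant comparison of two conditions.
stat, p = wilcoxon(am["verbal"], am["verbal+visual"])
print(f"verbal vs verbal+visual: W = {stat:.1f}, p = {p:.3f}")

# Independent-samples comparison, e.g. between two groups of participants.
u, p = mannwhitneyu(rng.uniform(0, 11, 5), rng.uniform(0, 11, 8),
                    alternative="two-sided")
print(f"Group 1 vs Group 2: U = {u:.1f}, p = {p:.3f}")
```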
Retention test: Thirteen of the dance students (8 female) participated in a retention test that was carried out 10–13 days after the experimental learning task. The retention test included the video-recorded performance of the remembered movement material, psychometric tests and questionnaires. In the performance part of the test, each student was asked to perform both dance phrases as completely as possible. Students were allowed to practice for several minutes before being recorded, but were not given any assistance in reproducing the phrases. Each student was recorded individually and on his/her own in a separate dance studio. The video recordings of the students' performance in the retention test were annotated for the completeness of the phrases by two annotators. Each phrase was segmented into eleven partial phrases, or elements, of similar content (note that the phrases had been choreographed to resemble each other in complexity, duration and structure). The annotators independently watched the recordings and marked the completeness of each of the eleven elements as a value between 0 and 1 (0: the element was not danced at all, or was not recognizable; 1: the element was clearly recognizable and was performed without error); ratings of the two annotators were then averaged. Each student thereby received, for each of the two phrases, a value between 0 (no partial phrase was reproduced at all) and 11 (all partial phrases were reproduced perfectly), as sketched below. Non-parametric tests (Wilcoxon signed-rank, Mann–Whitney U) were used to compare averaged completeness scores between dance phrases (Phrase 1, Phrase 2) and learning modes (visual first, verbal first).
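The completeness scoring itself reduces to a small computation; a minimal sketch with invented annotator marks (numpy assumed):

```python
import numpy as np

# Invented marks: completeness (0-1) of the eleven elements of one phrase,
# as judged independently by the two annotators.
annotator_1 = np.array([1, 1, .5, 0, 1, .5, 0, 0, 1, .5, 0])
annotator_2 = np.array([1, .5, .5, 0, 1, .5, 0, .5, 1, .5, 0])

per_element = (annotator_1 + annotator_2) / 2  # average the two annotators
phrase_score = per_element.sum()               # 0 (nothing) .. 11 (perfect)
print(f"completeness score: {phrase_score:.2f} / 11")
```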
Results
Expert ratings: Ratings of the two experts were positively correlated for both criteria, AM (r = 0.528; p < .001) and PQ (r = 0.513; p < .001). After Step 1, ratings of PQ were significantly better than ratings of AM (visual: 3.82, 3.33; Z = -2.987, p = .003; verbal: 3.73, 2.69; Z = -3.529, p < .001), whereas ratings did not differ after Step 2. AM ratings after learning only from the verbal description were lower (2.69) than after all other conditions (verbal + visual: 3.48, Z = -3.724, p < .001; visual: 3.33, Z = -3.624, p < .001; visual + verbal: 3.65, Z = -3.682, p < .001), and AM ratings after visual + verbal learning were higher than after visual learning (Z = -2.573, p = .01). PQ ratings did not differ between any of the learning conditions.
Table 1 Experimental design of the learning task

Learning task | Group 1a (N = 4) | Group 2a (N = 4) | Group 2b (N = 5) | Group 1b (N = 5)
Pre-test | questionnaires (all groups)
Step 1 | Phrase 1: Verbal (5x) | Phrase 1: Visual (5x) | Phrase 2: Verbal (5x) | Phrase 2: Visual (5x)
Step 2 | +Visual (2x) | +Verbal (2x) | +Visual (2x) | +Verbal (2x)
Performance | Record 1–3x | Record 1–3x | Record 1–3x | Record 1–3x
Step 1 | Phrase 2: Visual (5x) | Phrase 2: Verbal (5x) | Phrase 1: Visual (5x) | Phrase 1: Verbal (5x)
Step 2 | +Verbal (2x) | +Visual (2x) | +Verbal (2x) | +Visual (2x)
Performance | Record 1–3x | Record 1–3x | Record 1–3x | Record 1–3x
Post-test | questionnaire, psychometric tests, interview (all groups)
Retention | N = 3 | N = 4 | N = 4 | N = 2
Performance | Phrases 1, 2: Record 1x | Phrases 1, 2: Record 1x | Phrases 1, 2: Record 1x | Phrases 1, 2: Record 1x
Retention questionnaire, psychometric tests (all groups)

Step 1, 2: successive learning steps; Phrase 1, 2: movement material; visual, verbal: demonstration mode; Performance: video-recorded performance of the learned dance phrase
Retention test: Completeness scores given by the two annotators were highly correlated for both sequences (Phrase 1: r = 0.942, p < .001; Phrase 2: r = 0.930, p < .001). No differences were found between the groups (Group 1: Phrase 1 verbal first, N = 5; Group 2: Phrase 1 visual first, N = 8) in general, and no differences were found between the two sequences (Phrase 1: 7.64; Phrase 2: 6.90). Scores were better for the first visually learned phrase (8.32) than for the phrase first learned from the verbal description (6.23) (Z = -1.992, p = .046). When the sequences were regarded separately, groups differed for Phrase 2 (Group 1: 9.17; Group 2: 5.48), but not for Phrase 1 (Group 1: 7.42; Group 2: 7.78), with Group 1 performing better than Group 2 (Z = -2.196, p = .028) (see Fig. 2). When comparing ratings for the individual elements (1 to 11), primacy effects were found for both dance phrases, in terms of higher scores for the first 3 and 2 elements in Phrase 1 and Phrase 2, respectively (Phrase 1: element 1 differed from 6, 7, 8, 9 and 11; 2 differed from 5, 6, 7, 8; 3 differed from 4, 5, 6, 7, 8, 9 and 10; Phrase 2: 1 differed from 3, 4, 5, 6, 7, 8, 9, 10 and 11; 2 differed from 4, 7, 9 and 10; all p ≤ .05).
Discussion
Interdisciplinary projects linking dance and neurocognitive research have recently attracted increasing attention in artistic and scientific communities (see Bläsing et al. 2012; Sevdalis, Keller 2011). The presented project on observational (implicit) and verbal (explicit) movement learning in dance has been developed within

Fig. 2 Left: mean expert ratings of students' performance for accordance with the model (AM; dark grey columns) and performance quality (PQ; light grey columns) after learning from one (visual, verbal) and two (visual + verbal, verbal + visual) modalities (ratings for both dance phrases are pooled). Right: completeness scores for students' performance in the retention test for Phrases 1 and 2; dark grey columns: Group 1 (Phrase 1 verbal, verbal + visual; Phrase 2 visual, visual + verbal); light grey columns: Group 2 (Phrase 1 visual, visual + verbal; Phrase 2 verbal, verbal + visual)
an interdisciplinary network (Dance engaging Science; The Forsythe Company | Motion Bank), motivated by scientific, artistic
and (dance-)pedagogical questions. We compared expert ratings for the recorded performance of two different movement phrases in 18 dance students who had learned one phrase initially via verbal description and the other via observation of a video model. After dancing the phrase and being recorded, students received the complementary modality to learn from, and were recorded performing again. Ratings for performance quality were better than ratings for model reproduction after the first learning step (one modality), but not after the second learning step (two modalities). After learning from only one modality, ratings for accordance with the model were better if the first learning modality was visual rather than verbal, whereas ratings for performance quality did not differ for visual vs. verbal learning. When the students had to reproduce the learned movement material in a retention test, the (initially) visually learned material was reproduced more completely than the verbally learned material; however, when the dance phrases were regarded separately, this result was only significant for one of the phrases. The results corroborate findings regarding observational learning of movements in dance and other disciplines or tasks, but also suggest a dissociation between the exact execution of a model phrase and the artistic quality of dance, even in the learning phase. As expected, accordance with the model phrases was stronger after visual learning and after two modalities compared to one (which might also have been influenced by the additional practice, as this was always the second learning step). Regarding artistic quality of performance, the students danced the newly learned material as well after learning from the verbal description as after learning from visual observation, but not better, as we had expected. Questionnaires and psychometric tests are currently being analyzed to complement the reported findings of this study. We expect the outcomes to contribute to our understanding of explicit and implicit motor learning on the basis of different modalities, and also to yield potential implications for teaching and training in dance-related disciplines. While explicit learning (via verbal instruction) and implicit learning (via observation and practice) have been found to work synergistically in skilled motor action (Taylor and Ivry 2013), the situation might be different for dance, and potentially for dance-like movement in general (see Schachner and Carey 2013), in which skilful movement execution largely depends on kinesthetic awareness; further research is needed at this point. Further implications could be derived for
learning in general, specifically regarding the potential benefit of combining different modes (or modalities) for conveying information in order to shape and optimize learning success.
References
Bläsing B, Calvo-Merino B, Cross ES, Jola C, Honisch J, Stevens CJ (2012) Neurocognitive control in dance perception and performance. Acta Psychol 139:300–308
Cross ES, Hamilton AF, Grafton ST (2006) Building a motor simulation de novo: observation of dance by dancers. NeuroImage 31:1257–1267
Cross ES, Kraemer DJ, Hamilton AF, Kelley WM, Grafton ST (2009) Sensitivity of the action observation network to physical and observational learning. Cereb Cortex 19:315–326
Hodges NJ, Williams AM, Hayes SJ, Breslin G (2007) What is modelled during observational learning? J Sport Sci 25:531–545
Jeannerod M (2004) Actions from within. Int J Sport Exercise Psychol 2:376–402
Schachner A, Carey S (2013) Reasoning about irrational actions: when intentional movements cannot be explained, the movements themselves are seen as the goal. Cognition 129:309–327
Sevdalis V, Keller PE (2011) Captured by motion: dance, action understanding, and social cognition. Brain Cogn 77:231–236
Wulf G, Prinz W (2001) Directing attention to movement effects enhances learning: a review. Psychon B Rev 8:648–660
Taylor JA, Ivry RB (2013) Implicit and explicit processes in motor learning. Action Sci:63–87

A frontotemporoparietal network common to initiating and responding to joint attention bids
Nathan Caruana, Jon Brock, Alexandra Woolgar
ARC Centre of Excellence in Cognition and its Disorders,
Department of Cognitive Science, Macquarie University, Sydney,
Australia
Joint attention is the ability to interactively coordinate attention with
another person to objects of mutual interest, and is a fundamental
component of daily interpersonal relationships and communication.
According to the Parallel Distributed Processing model (PDPM;
Mundy, Newell 2007), responding to joint attention bids (RJA) is
supported by posterior-parietal cortical regions, while initiating joint
attention (IJA) involves frontal regions. Although the model
emphasizes their functional and developmental divergence, it also
suggests that the integration of frontal and posterior-parietal networks is crucial for the emergence of complex joint attention
behavior, allowing individuals to represent their own attentional
perspective as well as the attentional focus of their social partner in
parallel. However, little is known about the neural basis of these
parallel joint attention processes, due to a lack of ecologically valid
paradigms.
In the present study, we used functional magnetic resonance imaging to directly test the claims of the PDPM. Thirteen subjects (9 male, mean age = 24.85 years, SD = 5.65) were scanned as they engaged with an avatar whom they believed to be operated by another person outside the scanner, but who was in fact controlled by a gaze-contingent computer algorithm. The task involved catching a burglar who was hiding inside one of six houses displayed on the screen. Each trial began with a search phase, during which there was a division of labor between the subject and their virtual partner. Subjects were required to search a row of three houses located at either the top or bottom of the screen, whilst the avatar searched the other row. When the subject fixated one of their designated houses, the door opened to reveal an empty house or the burglar (see Fig. 1a). The location of the subject's designated

Fig. 1 a This is an example of the stimuli used in the social conditions (i.e. RJA and IJA). b This is an example of the stimuli used in the control conditions (i.e. RJAc and IJAc). Note that for a and b, the eye-shaped symbol represents the subject's eye movement resulting in joint attention. This was not part of the stimulus visible to subjects
houses was counterbalanced across acquisition runs. Subjects were instructed that whoever found the burglar on each trial had to guide their partner to that location by first establishing mutual gaze and then looking at the appropriate house.
On RJA trials, subjects searched their designated houses, each of which would be empty. The avatar would then complete his search and guide the subject to the burglar's location. Once the subject responded and joint attention was achieved, positive feedback was provided, with the burglar appearing behind bars to symbolize that he had been successfully captured. On IJA trials, the subject would find the burglar inside one of their designated houses. Once the avatar had completed his search and mutual gaze was established, the subject was required to initiate joint attention by saccading towards the correct location. The avatar responded by gazing at the location fixated by the subject, regardless of whether it was correct or not. Again, positive feedback was provided when joint attention was achieved at the burglar's location. Negative feedback was also provided if the subject failed to make a responsive eye movement within three seconds, or if they responded or initiated by fixating an incorrect location.
During the search phase, the avatar's gaze behavior was controlled so that he only completed his search after the subject had completed their search and fixated back on the avatar. This meant that subjects were required to monitor the avatar's attention during the interaction, before responding to, or initiating, a joint attention bid. In this paradigm, as in ecological interactions, establishing mutual gaze was therefore essential in determining whether the avatar was ready to guide the subject, or to respond to the subject's initiation of joint attention. The onset latencies of the avatar's gaze behavior (i.e. alternating between search houses, establishing mutual gaze, and executing responding or initiating saccades) were also jittered with a uniform distribution between 500 and 1,000 ms. This served to enhance the avatar's ecological appearance.
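A minimal sketch of this gaze-contingent logic, with hypothetical eye-tracker hooks (the callables and the polling loop are illustrative assumptions, not the study's actual code):

```python
import random
import time

def run_avatar_search_phase(subject_done_searching, subject_fixating_avatar):
    """Sketch of the contingency described above: the avatar completes its
    search only once the subject has finished searching and looked back at
    the avatar; each gaze event is preceded by a 500-1000 ms jitter.

    Both arguments are hypothetical callables polling an eye tracker.
    """
    while not (subject_done_searching() and subject_fixating_avatar()):
        time.sleep(0.01)                   # poll the eye tracker
    time.sleep(random.uniform(0.5, 1.0))   # jittered onset latency (s)
    # ...the avatar would now finish its search and establish mutual gaze
```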
The subject's social role as a 'responder' or 'initiator' only became apparent over the course of each trial. Our paradigm thereby created a social context that (1) elicited intentional, goal-driven joint attention, (2) naturally informed subjects of their social role without overt instruction, and (3) required subjects to engage in social attention monitoring.
In order to account for the effect of non-social task features, the neural correlates of RJA and IJA were investigated relative to non-social control conditions that were matched on attentional demands, number of eye movements elicited, and task complexity. During these trials, the avatar remained on the screen with his eyes closed, and subjects were told that both partners were completing the task independently. In the IJA control condition (IJAc), subjects found the burglar, looked back to a central fixation point and, when this turned green, saccaded towards the burglar location. In the RJA control condition (RJAc), the fixation point became an arrow directing them to the burglar location (see Fig. 1b).
A synchronization pulse was used at the beginning of each
acquisition run to allow for the BOLD and eye tracking data to be


Fig. 2 Threshold maps are displayed for a Responding to joint attention (RJA - RJAc), b Initiating joint attention (IJA - IJAc), c Initiating over and above Responding [(IJA - IJAc) - (RJA - RJAc)], and d Activation common to Responding and Initiating; t > 3.70, equivalent to p < 0.05 FDR correction in a, with extent threshold 10 voxels. The threshold for p < 0.05 FDR correction would have been 2.87, 3.18 and 3.10 in b, c and d respectively. No voxels survived FDR correction for the Responding over and above Initiating contrast [(RJA - RJAc) - (IJA - IJAc)]
temporally aligned. Our analyses of BOLD data focused on the joint attention phase of each trial. Accordingly, event onset times were defined as the time at which the participant opened the last empty house (RJA and RJAc) or found the burglar (IJA and IJAc). Events were modelled as boxcars lasting until the time at which joint attention was achieved and the burglar captured. This assisted in accounting for variation in reaction times between trials. All second-level t-images were thresholded at t > 3.70, equivalent to p < 0.05 with a false discovery rate (FDR) correction for multiple comparisons in the comparison of RJA and RJAc (see Fig. 2a). This threshold was more conservative than p < 0.05 with FDR correction in any other contrast tested. The use of a single threshold for visualization allowed the results to be more easily compared.
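To make the event model concrete, here is a rough numpy/scipy sketch of one regressor built from trial onsets and durations and convolved with a canonical double-gamma HRF; the TR, the timings, and the HRF parameters are illustrative assumptions, not the study's reported pipeline:

```python
import numpy as np
from scipy.stats import gamma

TR, n_scans, dt = 2.0, 200, 0.1  # assumed acquisition parameters

# Invented trial timings: onset = last empty house opened (RJA/RJAc) or
# burglar found (IJA/IJAc); duration = time until joint attention achieved.
onsets = np.array([12.0, 55.5, 101.0, 150.2])
durations = np.array([3.1, 2.4, 4.0, 2.8])

hi_res_t = np.arange(0, n_scans * TR, dt)
boxcar = np.zeros_like(hi_res_t)
for onset, dur in zip(onsets, durations):
    boxcar[(hi_res_t >= onset) & (hi_res_t < onset + dur)] = 1.0

# SPM-like double-gamma HRF (peak around 6 s, undershoot around 16 s).
hrf_t = np.arange(0, 32, dt)
hrf = gamma.pdf(hrf_t, 6) - 0.35 * gamma.pdf(hrf_t, 16)
hrf /= hrf.sum()

regressor = np.convolve(boxcar, hrf)[: hi_res_t.size]
design_column = np.interp(np.arange(n_scans) * TR, hi_res_t, regressor)
```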
Relative to their corresponding control conditions, both RJA (Fig. 2a) and IJA (Fig. 2b) activated a broad frontotemporoparietal network, largely consistent with previous findings (Redcay et al. 2010; Schilbach et al. 2010). Additionally, IJA resulted in more distributed activation across this network, relative to RJA, after controlling for non-social attention (Fig. 2c).
A conjunction analysis identified a right-lateralized subset of this network that was common to both RJA and IJA, over and above activation associated with the non-social control conditions (Fig. 2d). Regions included the dorsal portion of the middle frontal gyrus (MFG), inferior frontal gyrus (IFG), middle temporal gyrus (MTG), precentral gyrus, posterior superior temporal sulcus (pSTS), temporoparietal junction (TPJ) and precuneus. The existing literature associates many of these regions with tasks involving perspective-taking processes. Specifically, TPJ has been implicated in tasks where subjects form representations of others' mental states (Samson, Apperly, Chiavarino and Humphreys 2004). The precuneus has been recruited in tasks that involve representing first-person (self) and third-person (other) visual perspectives (Vogeley et al. 2004). Involvement of IFG has been reported in dyadic tasks where subjects make competitive profit-oriented decisions which intrinsically involve self-other comparisons (Halko, Hlushchuk, Hari and Schürmann 2009). Finally, modulation of pSTS activation has been reported during tasks where
subjects determine the intentionality of another's behavior (Morris, Pelphrey and McCarthy 2008).
Together with previous findings, the frontotemporoparietal network identified in our study is consistent with the PDPM's claim that the neural mechanisms of RJA and IJA have a shared neural basis in adulthood. This may support the ability to simultaneously represent the attentional state of the self and others during interactions (Mundy, Newell 2007). These self-other representations are essential for the achievement of joint attention in ecological contexts, as one must represent the attentional focus of one's partner to determine when one can respond to or initiate joint attention. One must also represent one's own attentional focus so as to plan initiations of joint attention, and to shift one's attentional focus when guided.
Furthermore, a portion of the frontoparietal network common to RJA and IJA (including IFG, TPJ and precuneus) revealed additional activation during IJA trials, compared to RJA trials (see Fig. 2c). This is again consistent with the role of this network in simultaneously representing self- and other-oriented attention perspectives, as IJA trials required subjects to represent an additional shift in their partner's attentional focus (avatar searches, then waits for guidance, then responds), relative to RJA trials (avatar searches, then guides).
Our data contribute to ongoing debates in the social neuroscience literature concerning the social specificity of many of the regions included in this network, such as TPJ (Kincade, Abrams, Astafiev, Shulman and Corbetta 2005). Due to the implementation of closely matched non-social conditions, the present study provides further evidence that these substrates may be particularly sensitive to social engagement.
This is the first imaging study to directly investigate the neural correlates common to RJA and IJA engagement, and it thus supports the PDPM's claim that a broad integrated network supports the parallel aspects of both initiating and responding to joint attention. These data inform a neural model of joint attention in adults, and may guide future clinical applications of our paradigm to investigate whether the developmental delay of joint attention in autism is associated with a differential organization of this integrated network.

References
Halko M-L, Hlushchuk Y, Hari R, Schürmann M (2009) Competing with peers: mentalizing-related brain activity reflects what is at stake. NeuroImage 46:542–548. doi:10.1016/j.neuroimage.2009.01.063
Kincade JM, Abrams RA, Astafiev SV, Shulman GL, Corbetta M (2005) An event-related functional magnetic resonance imaging study of voluntary and stimulus-driven orienting of attention. J Neurosci 25:4593–4604. doi:10.1523/JNEUROSCI.0236-05.2005
Morris JP, Pelphrey KA, McCarthy G (2008) Perceived causality influences brain activity evoked by biological motion. Soc Neurosci 3:16–25. doi:10.1080/17470910701476686
Mundy P, Newell L (2007) Attention, joint attention and social cognition. Curr Dir Psychol Sci 16:269–274
Redcay E, Dodell-Feder D, Pearrow MJ, Mavros PL, Kleiner M, Gabrieli JDE, Saxe R (2010) Live face-to-face interaction during fMRI: a new tool for social cognitive neuroscience. NeuroImage 50:1639–1647. doi:10.1016/j.neuroimage.2010.01.052
Samson D, Apperly IA, Chiavarino C, Humphreys GW (2004) Left temporoparietal junction is necessary for representing someone else's belief. Nat Neurosci 7:499–500. doi:10.1038/nn1223
Schilbach L, Wilms M, Eickhoff SB, Romanzetti S, Tepest R, Bente G, Vogeley K (2010) Minds made for sharing: initiating joint attention recruits reward-related neurocircuitry. J Cogn Neurosci 22:2702–2715. doi:10.1162/jocn.2009.21401
Vogeley K, May M, Ritzl A, Falkai P, Zilles K, Fink GR (2004) Neural correlates of first-person perspective as one constituent of human self-consciousness. J Cogn Neurosci 16:817–827. doi:10.1162/089892904970799


Action recognition and the semantic meaning of actions: how does the brain categorize different social actions?
Dong-Seon Chang1, Heinrich H. Bülthoff1, Stephan de la Rosa1
1 Max Planck Institute for Biological Cybernetics, Dept. of Human Perception, Cognition and Action, Tübingen, Germany
Introduction
The visual recognition of actions occurs at different levels (Jellema and Perrett 2006; Blake and Shiffrar 2007; Prinz 2013). At a kinematic level, an action can be described as the physical movement of a body part in space and time, whereas at a semantic level, an action can carry various social meanings, such as the goals or intentions of an action. In the past decades, a substantial amount of neuroscientific research has been devoted to various aspects of action recognition (Casile and Giese 2005; Blake and Shiffrar 2007; Prinz 2013). Still, the question at which level the representations of different social actions might be encoded and categorically ordered in the brain remains largely unanswered. Does the brain categorize different actions according to their kinematic similarities, or in terms of their semantic meanings? In the present study, we wanted to find out whether different actions are ordered according to their semantic meaning or their kinematic motion by employing a visual action adaptation aftereffect paradigm as used in our previous studies (de la Rosa et al. 2014).
Materials and methods
We used motion capture technology (MVN Motion Capture Suit from XSense, Netherlands) to record different social actions often observed in everyday life. The four social actions chosen as our experimental stimuli were handshake, wave, punch, and yopunch (fistbump); each of the actions was similar to or different from the other actions either in terms of semantic meaning (e.g. handshake and wave both meant a greeting, whereas punch meant an attack and yopunch meant a greeting) or kinematic motion (e.g. the movements of a punch and a yopunch were very similar, whereas the movements of a punch and a wave were very different). To quantify these similarities and differences between the actions, a total of 24 participants rated the four social actions pairwise in terms of their perceived differences in either semantic meaning or kinematic motion on a visual analogue scale ranging from 0 (exactly the same) to 10 (completely different). All actions were processed into short movie clips (< 2 s) showing only the joint movements of an actor (point-light stimuli), presented to the participants from the side view. Then, the specific perceptual bias for each action was determined by measuring the size of the action adaptation aftereffect in each participant. Each of the four social actions was shown as a visual adaptor in each block (30 s of prolonged exposure at the start, 3× repetitions on each trial) while participants engaged in a 2-alternative forced-choice (2AFC) task in which they had to judge which action was shown. The test stimuli in the 2AFC task were action morphs in 7 steps between two actions, presented repeatedly (18 repetitions per block) in randomized order. Finally, the previously obtained meaning and motion ratings were used to predict the measured adaptation aftereffect for each action using linear regression, as sketched below.
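The regression step might look roughly as follows; the numbers are synthetic placeholders for the real ratings and aftereffect sizes, and statsmodels is assumed:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 24 * 6  # e.g., one row per participant and action pair (assumed layout)

# Synthetic stand-ins: pairwise dissimilarity ratings (0-10) and the
# corresponding adaptation-aftereffect measures.
meaning = rng.uniform(0, 10, n)
motion = rng.uniform(0, 10, n)
aftereffect = 0.4 * meaning + rng.normal(0, 0.5, n)  # fake outcome

# Predictors: meaning, motion, and their interaction, plus an intercept.
X = sm.add_constant(np.column_stack([meaning, motion, meaning * motion]))
fit = sm.OLS(aftereffect, X).fit()
print(fit.summary())
```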
Results
The perceived differences in the ratings of semantic meaning significantly predicted the differences in the action adaptation aftereffects (p < 0.001). The rated differences in kinematic motion alone were not able to significantly predict the differences in the action adaptation aftereffects, although the interaction of meaning and motion was also able to significantly predict the changes in the action adaptation aftereffect for each action (p < 0.01).
Discussion
Previous results have demonstrated that the action adaptation aftereffect paradigm can be a useful tool for determining the specific perceptual bias for recognizing an action, since, depending on the adaptor stimulus (e.g. if the adaptor was the same action as in one of the test stimuli), a significant shift of the point of subjective equality (PSE) was consistently observed in the psychometric curve for judging the difference between two actions (de la Rosa et al. 2014). This shift of the PSE represents a specific perceptual bias for each recognized action, because it is assumed that this shift (adaptation aftereffect) would not be found if there were no specific adaptation of the underlying neuronal populations recognizing each action (Clifford et al. 2007; Webster 2011). Using this paradigm we showed for the first time that perceived differences between distinct social actions might be encoded in the brain in terms of their semantic meaning rather than their kinematic motion. Future studies should identify the neuroanatomical correlates of this action adaptation aftereffect. The current experimental paradigm also serves as a useful method for further mapping the relationship between different social actions in the human brain.
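The PSE logic can be illustrated with a small psychometric fit: a logistic function of morph step whose midpoint is the PSE, which adaptation shifts. The choice proportions below are invented, and scipy is assumed:

```python
import numpy as np
from scipy.optimize import curve_fit

def psychometric(x, pse, slope):
    """Probability of reporting action B at morph level x (logistic)."""
    return 1.0 / (1.0 + np.exp(-(x - pse) / slope))

morph_steps = np.arange(1, 8)  # the 7 morph steps between two actions
p_report_b = np.array([.05, .10, .25, .55, .80, .92, .97])  # invented data

(pse, slope), _ = curve_fit(psychometric, morph_steps, p_report_b,
                            p0=(4.0, 1.0))
print(f"PSE = {pse:.2f}")  # an adaptor shifts the PSE away from itself
```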
References
Blake R, Shiffrar M (2007) Perception of human motion. Ann Rev Psychol 58:47–73. doi:10.1146/annurev.psych.57.102904.190152
Casile A, Giese MA (2005) Critical features for the recognition of biological motion. J Vis 5:348–360. doi:10.1167/5.4.6
Clifford CWG, Webster MA, Stanley GB, et al. (2007) Visual adaptation: neural, psychological and computational aspects. Vision Res 47:3125–3131. doi:10.1016/j.visres.2007.08.023
De la Rosa S, Streuber S, Giese M et al. (2014) Putting actions in context: visual action adaptation aftereffects are modulated by social contexts. PLoS ONE 9:e86502. doi:10.1371/journal.pone.0086502
Jellema T, Perrett DI (2006) Neural representations of perceived bodily actions using a categorical frame of reference. Neuropsychologia 44:1535–1546. doi:10.1016/j.neuropsychologia.2006.01.020
Prinz W (2013) Action representation: crosstalk between semantics and pragmatics. Neuropsychologia. doi:10.1016/j.neuropsychologia.2013.08.015
Webster MA (2011) Adaptation and visual coding. J Vis 11(5):1–23. doi:10.1167/11.5.3

Understanding before language [5]
Anna Ciaunica
Institute of Philosophy, London, England & Institute of Philosophy, Porto, Portugal
Abstract How can an infant unable to articulate meaning in verbal communication be an epistemic agent capable of attributing false beliefs? Onishi, Baillargeon (2005) demonstrated false-belief understanding in young children through completely nonverbal measures such as the violation-of-expectation (VOE) [6] looking paradigm, and showed that children younger than 3 years of age, who consistently fail the standard verbal false-belief task (SFBT), can anticipate others' actions based on the false beliefs attributed to them. This gave rise to the so-called Developmental Paradox (DP): if preverbal human infants have the capacity to respond to others' false beliefs from at least 15 months, why should they be unable to verbally express their capacity to recognize false
5

An extended version of this paper has been recently accepted


for publication in the Review of Philosophy and Psychology,
under the title Under Pressure: Processing Representational Decoupling in False-Belief Tasks.  Springer Science + Business Media
Dordrecht 2014.
6
The VOE task tests whether children look longer when agents act in
a manner that is inconsistent with their false beliefs and relies on the
basic assumption that when an individuals expectations are violated,
she is surprised and thus she looks longer at an unexpected event
rather than at an expected event.

123

S96
beliefs until they are 4-years old, a full 33 months later? The DP teaches
us that visual perception plays a crucial role in processing the implicit
false-belief condition as opposed to the explicit/verbal-report condition.
But why is perception, in some cases, smarter than explicit and
verbalized thinking? In this paper I briefly sketch the solution proposed
by De Bruin, Kastner (2012), the Dynamic Embodied Cognition and I
raise an objection regarding their use of the term metarepresentation
in explaining the puzzle.
Recently, evidence has been mounting to suggest that infants have much more sophisticated social-cognitive skills than previously suspected. The issue at stake is crucial since, as Sommerville and Woodward (2010:84) pointed out, assessing infants' understanding of others' behavior provides "not only a snapshot of the developing mind of the child, but also a panorama of the very nature of cognition itself."
Consider this challenge:
P1. Empirical evidence strongly suggests that basic cognition is "smart" (since 15-month-olds understand false beliefs).
P2. Smart cognition necessarily involves computations and representations (of false beliefs).
P3. Hence, basic cognition necessarily involves computations and representations (of false beliefs).
De Bruin and Kästner (2012) recently proposed a reconciliatory middle-ground solution between representationalist and enactivist accounts, i.e. Dynamic Embodied Cognition (DEC). They claim that the Developmental Puzzle is best addressed in terms of the relation between coupled (online) and decoupled (offline) processes for basic and advanced forms of (social) cognition, as opposed to merely representing/not representing false beliefs. They argue that rephrasing the issue in terms of online/offline processing provides us with an explanation of the Developmental Puzzle. How exactly does this work? First, the authors take for granted the premise that infants are equipped with implicit abilities that start out as grounded in basic online processes, albeit partly decoupled. It is crucial for their project that these basic implicit abilities already involve decoupling. This is in line with the cognitivist distinction between (a) sub-doxastic mental states that do not possess truth-evaluable propositional content and (b) robust mental states (Spaulding 2010:123). In a second step, they hold that infants' implicit abilities develop gradually into more sophisticated explicit abilities that rely on offline processes to a much larger extent. The coupling and decoupling relations between agent and environment advocated by DEC are dynamic in the sense that they "are a matter of degree and never an end in itself. (…) The dynamic interplay of decoupled and coupled processes may be used for optimization of cognitive processing" (De Bruin and Kästner 2012:552, emphasis added). There is definitely much more to be said about DEC, but this gives us the basic flavor. Clearly, DEC borrows from the "weak-strategy" theorists such as Apperly and Butterfill (2009) the idea that early mechanisms are cheap and efficient, while the late-emerging mechanisms are costly but flexible. But they also borrow from "rich" theorists (Baillargeon et al. 2010) the idea that preverbal human infants are already capable of decoupling, i.e. taking their own reality-congruent perspective offline, albeit in a very limited way.
An important concern regards the use of the term "metarepresentation". As S. Scott (2001) pointed out, there is danger of confusion, with serious consequences for the debate about the nature of higher-level cognition, between two distinct notions of metarepresentation, as defined by philosophers (Dennett 1998) and by psychologists dealing with the question of autistic disorders (Leslie 1991). According to Dennett (1998), representations are themselves objects in the world, and therefore potential objects of (second-order or meta-) representations. Call this metarepresentation1. For example, drawing a cat on a piece of paper is a type of non-mental representation, which is represented in the mind of the person viewing it. The mental representation is of the drawing, but since the drawing is itself a representation, the viewer has a (mental) metarepresentation of whatever it is that the drawing represents, namely a cat. By contrast, Leslie uses the term metarepresentation to mean (e.g., in the case of understanding pretence-in-others) "an internal representation of an epistemic relation (PRETEND) between a person, a real situation and an imaginary situation (represented opaquely)" (Leslie 1991:73). Call this metarepresentation2. This definition does not sound at all like the definition of metarepresentation1 as second-order representation pursued above. There is nothing metarepresentational, in the sense of higher-order representation, in Leslie's formulation of the semantics of psychological predicates. Building on this distinction, S. Scott insightfully argues that a representation can contain other representations without being a metarepresentation1.
Consider (P):
(P) The child BELIEVES that Sally BELIEVES that the marble is
in the basket.
In what follows, I shall argue that although (P) is a straight-up second-order belief, it does not necessarily involve second-order representation, or metarepresentation1 in Dennett's sense. Against De Bruin and Kästner (2012), I hold that there are no additional second-order metarepresentational skills involved in SFBTs as compared with VOE trials. Much of what I have to say in this section parallels arguments from Scott (2001), with which I am in close agreement. Scott convincingly argued that second-order beliefs do not necessarily require metarepresentations1: "It is only necessary to have the ability to represent first-order beliefs in order to have second-order beliefs" (Scott 2001:940).
Take the following example of a first-order belief:
(1) Melissa BELIEVES that her dog is dead.
The crucial point here is that, to simply hold a belief, Melissa need not be aware of her belief or hold an explicit representation of it. In other words, she need not think to herself: "I believe my dog is dead" or "It is I who believes that my dog is dead". At this level of interpretation, we can speak of animals having this kind of online, implicit belief, although we may find uncomfortable the idea of dogs having implicit beliefs. Now, consider the following example of a second-order belief:
(2) Anne BELIEVES that Melissa BELIEVES that her dog is dead.
As Scott rightly points out, in order to get (2) Anne needs the representation of Melissa's dog, the predicate DEAD, and so on. What she does not need is a representation of Melissa's representation of her dog, the predicate DEAD, and so on. That is, she does not need a second-order representation of any of these things. She can get by with her own first-order representations. Given that neither Melissa nor Anne has any particular need of belief representation in order to be a believer, Anne's representation of Melissa's belief need not be second-order. In addition, "it would seem that what Anne also needs in order to get (2) is a representation of Melissa's BELIEF. That is to say, she needs a representation of Melissa's mental state of believing in a way that Melissa does not" (Scott 2001:939, emphasis added).
The question is: is there any metarepresentation1 involved here? Indeed, one might object that Melissa's belief state already involves implicit or sub-personal representational processing. Now, the distinction between explicit versus implicit or sub-personal mental representations is a complicated issue and need not concern us here. For present purposes, it is sufficient to insist on the idea that Anne's representation of Melissa's first-order belief (regardless of whether the latter involves subpersonal representational processing in Melissa's mind) does not amount to a second-order metarepresentation1 (in Anne's mind). But let us suppose for the sake of the argument that Anne holds a representation of Melissa's first-order implicit belief (B), which in turn involves a certain sub-personal representational processing (S) in Melissa's brain. Now, if (S) is an implicit, sub-personal representation (in Melissa's mind), then one consequence would be that, in metarepresenting Melissa's belief (B) [which involves (S)], Anne is only half-aware of what she is metarepresenting. Indeed, given that one member of the double
representational layer, namely (S), remains opaque to her, Anne is aware only of what she is representing, namely (B). Note that this is not a problem per se. One could label this half-blind metarepresenting "metarepresentation3", say. If this is so, then it is difficult to see why metarepresenting3 in this sense is supposed to be cognitively more demanding (for Anne) than mere representing. In contrast, recall that in Dennett's drawing example, the viewer is fully aware of the double representational layer: he forms a mental representation of a drawing of a cat, and this makes his metarepresenting1 a genuine second-order cognitive achievement. Hence, it is not clear that metarepresenting1 is at work in the Sally/Anne scenario, and this casts doubt on the idea that elicited-response tasks (ERTs) require that infants not only represent but metarepresent.
To sum up, according to De Bruin and Kästner, ERTs involve a stronger form of decoupling (precisely because they involve metarepresentational skills and language processing), which is supposed to explain the Developmental Puzzle. Although I agree with De Bruin and Kästner in saying that (a) SFBTs require decoupling, and that (b) the verbal interaction with the experimenter during the SFBT plays a crucial role in 3-year-olds' failure to report false-belief understanding, there is still something missing in the picture. Indeed, I fail to see how (a) and (b) alone can solve the Developmental Puzzle, since, as the authors themselves have insisted, decoupling is supposed to lead to an optimization of cognitive processing. Everybody agrees that strong decoupling is an important evolutionary advantage. But the mystery of the Developmental Puzzle stems from the opposite situation. In order to truly solve the DP, they need to answer the following question: why does stronger decoupling impair (at least in some cases) rather than improve the "mental gymnastics" of representational manipulation? In other words: why do weaker forms of decoupling do a better job in a complex task such as false-belief understanding?
Unlike De Bruin and Kästner, I reject the idea that basic forms of mentality are representational and that, during VOE scenarios, infants must rely on internal representations of visual information that is available to the other agent but not available to them. Rather, infants understand others' intentional attitudes as currently and readily available (i.e. directly observable) in the environment. To support this claim, I appeal to empirical findings illustrating that infants' ability to understand other minds is rooted in their capacity to actively engage in interactive scenarios. Consistent with a burgeoning literature suggesting a common basis for both the production and perception of action, evidence has been mounting to illustrate that infants' understanding of others is more robust within interactive contexts. In other words, the more engaged the interactions between infants and agents are, the more robust the infants' understanding of others becomes. Children first learn to discern or establish reference in situations that are not defined by differences in how self and other perceive agents and objects visually, but by differences in their shared experiential backgrounds, i.e. in what they did, witnessed or heard. For example, Moll and Tomasello (2007) tested the child's ability to recall an adult's knowledge of what she has experienced in three conditions: (1) the child and adult together interacted with a toy; (2) the infant handled the toy with another experimenter, while the adult watched (and the infant was alerted to this several times); (3) the adult handled a toy alone, while the infant watched. As Wilby (2012) pointed out, one might describe the difference in evidence that is available to the infant as follows:
(1) X is aware that [I am aware that [X is aware that [p]]].
(2) X is aware that [I am aware that [p]].
(3) X is aware that p.
Now, if we apply De Bruin and Kästner's "degrees of decoupling" explanatory strategy in this specific case, then one would expect infants to find the first condition (1) the hardest, since it involves several embedded layers of decoupling. Yet, the evidence suggests the complete opposite. Hence, it is not clear that crediting infants with an implicit representational decoupling ability is the best strategy here.

References
Apperly I, Butterfill S (2009) Do humans have two systems to track beliefs and belief-like states? Psychol Rev 116:953–970
Baillargeon R, Scott RM, Zijing H (2010) False-belief understanding in infants. Trends Cogn Sci 14(3):110–118
Ciaunica A (2014, in press) Under pressure: processing representational decoupling in false-belief tasks. Rev Philos Psychol. doi:10.1007/s13164-014-0195-2
De Bruin LC, Kästner L (2012) Dynamic embodied cognition. Phenomenol Cogn Sci 11(4):541–563
Dennett D (1998) Making tools for thinking. In: Sperber D (ed) (2000) Metarepresentation. Oxford University Press, New York
Leslie AM (1991) Precursors to a theory of mind. In: Whiten A (ed) Natural theories of mind: evolution, development, and simulation of everyday mindreading. Blackwell, Oxford, pp 63–78
Moll H, Tomasello M (2007) How 14- and 18-month-olds know what others have experienced. Dev Psychol 43(2):309–317
Onishi KH, Baillargeon R (2005) Do 15-month-old infants understand false beliefs? Science 308(8):255–258
Scott S (2001) Metarepresentations in philosophy and psychology. In: Moore J, Stenning K (eds) Proceedings of the twenty-third annual conference of the Cognitive Science Society, University of Edinburgh. LEA, London
Sommerville JA, Woodward A (2010) In: Grammont F et al. (eds) Naturalizing intention in action. MIT Press, Cambridge
Wilby M (2012) Embodying the false-belief tasks. Phenomenol Cogn Sci 11:519–540 (Special Issue on Debates on Embodied Mindreading, ed. S. Spaulding)

An embodied kinematic model for perspective taking

Stephan Ehrenfeld, Martin V. Butz
Cognitive Modeling, Department of Computer Science, University of Tübingen, Germany
Abstract Spatial perspective taking (PT) is an important part of many social capabilities, such as imitation or empathy. It enables an observer to experience the world from the perspective of another actor. Research results from several disciplines suggest that the capability for PT is partially grounded in the postural structure of the own body. We investigate an option for enabling PT by employing a potentially learned kinematic model of one's own body. In particular, we investigate whether the modular modality frame model (MMF), which is a computational model of the brain's postural representation of its own body, can be used for PT. Our results confirm that MMF is indeed capable of PT. In particular, we show that MMF can recruit its own, embodied kinematic knowledge to infer a probabilistic estimate of the spatial transformation necessary for PT, as well as to deduce object positions and orientations from the actor's egocentric perspective.
Keywords
Perspective Taking, Embodiment, Frame of Reference
Introduction
Perspective taking (PT) may be defined as the ability to put oneself into another person's spatial, bodily, social, emotional, or even logical reasoning perspective. Taking on an actor's perspective in one or several of these respects seems mandatory to be able to interact with the observed actor socially, to imitate the actor, to cooperate with the actor, to imagine situations, events, or episodes the actor has been in or may experience in the future, and to show and experience empathy (Buckner and Carroll, 2007). PT has often been associated with the mirror neuron system (Rizzolatti and Craighero, 2004). However, to date it is still under debate where mirror neurons come from, with suggestions ranging from purely associative learning mechanisms, through adaptations for action understanding, to a consequence of epigenetic, evo-devo interactions (Heyes, 2010; Ferrari et al. 2013).
Although PT has been addressed in various disciplines, few explicit computational models exist. For example, PT is often simply attributed to the concept of mirror neurons, without specifying how exactly mirror neurons accomplish PT and which mechanisms are made use of in this process. Fleischer et al. (2012) simulated the development of mirror neurons based on relative object interaction encodings, suggesting and partially confirming that many mirror neurons should be view-point dependent. However, their neural model was mainly hard-coded, and it did not include any information processing interaction mechanisms. Purely feed-forward processing allowed the inference of particular types of object interactions.
We believe that PT is an essential ingredient for learning an interactive mirror-neuron system during cognitive development. Essentially, for establishing view-independent mirror neuron activities, two requirements need to be met. First, in order to understand complex actions, the observer must encode an embodied representation of the actor and its surrounding environment. Second, given that the sensory information comes from the observer's egocentric perspective, a change in the frame of reference (FoR) becomes necessary for transferring the perspective into the actor's egocentric perspective. Biologically and computationally, both requirements may be met by employing attributes of a model of the own body during cognitive development. In doing so, the own body model can be used to continually filter and combine multiple (visual) cues into a rich estimate of an actor's position and orientation. Moreover, the own body model can be used to compute the necessary spatial translations and rotations, both of which are inevitably a part of the kinematics of the own body model. Therefore, we propose that a PT mechanism may recruit a modular body model and its inherent, bodily kinematic mappings.
To the best of our knowledge, no explicit computational models have been applied to model an embodied PT mechanism. A non-embodied approach to PT may be found in Cabido-Lopes and Santos-Victor (2003). To fill this gap, we show here that the modular modality frame model (MMF) (Ehrenfeld and Butz, 2013; Ehrenfeld et al. 2013), which constitutes a biologically inspired model of body state estimation, can exhibit embodied PT.
The modular modality frame model (MMF)
At its core, MMF is a Bayesian model of modular body state estimation. MMF distributes the body state over a set M of local modules m_i, such that each module encodes a part p(x|m_i) of the whole probabilistic body state p(x|M). Bayesian filtering reduces noise over time, and kinematic mappings connect all m_i, leading to continuous information flow between the modules. The modules and their connections are visualized in Fig. 1.
Multiple frames of reference (FoRs) are shown in Fig. 1. The head-centered FoR is used to encode joint positions (first row) and limb orientations (second row), where limb orientations consist of three orthogonal vectors: one parallel to the limb, the other vectors specifying its intrinsic rotation. An additional FoR is centered on each body limb and is used to encode the relative orientation of the next distal limb (third row). Finally, Tait-Bryan angles between adjacent limbs are encoded (fourth row). An in-depth description can be found elsewhere (Ehrenfeld and Butz, 2013; Ehrenfeld et al. 2013). In summary, MMF executes transitions between FoRs and implements complex information fusion.
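As an illustration of the kind of fusion step MMF performs, here is a minimal sketch (not the published MMF implementation) of precision-weighted fusion of independent Gaussian estimates of a single body-state variable, as different modules and FoRs might deliver them; the real model additionally handles multivariate states, kinematic mappings, and plausibility weights.

import numpy as np

def fuse_gaussians(means, variances):
    # Precision-weighted (Bayes-optimal) fusion of independent Gaussian estimates.
    precisions = 1.0 / np.asarray(variances)
    fused_var = 1.0 / precisions.sum()
    fused_mean = fused_var * (precisions * np.asarray(means)).sum()
    return fused_mean, fused_var

# Example: three modules report the same joint angle (rad) with different noise.
mean, var = fuse_gaussians([0.31, 0.27, 0.35], [0.04, 0.01, 0.09])
print(f"fused estimate: {mean:.3f} rad, variance: {var:.4f}")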
MMF's characteristics are ideal to model PT in an embodied way. When observing the body of an actor, MMF features two ways of accomplishing the necessary FoR transformation for PT.
First, any visual information arriving in the first two rows, i.e. position or orientation information relative to the observer's body, can be projected along the inverse kinematics (gray dotted arrows) to the third row (rectangles), thus inferring limb-relative representations.
Fig. 1 The body state is distributed over modules (depicted as circles, filled circles, rectangles and crossed-out rectangles). Along the horizontal axis, different body limbs are shown, and along the vertical axis, different modalities (positions, orientations and angles), which also use different FoRs: relative to a base (first two rows) or relative to the next proximal limb (third and fourth rows). Arrows show the kinematic mappings, which connect the modules (dash-dotted yellow are the forward kinematics, dotted gray the inverse kinematics, and solid red the distal-to-proximal mappings)
The result can be projected back along the forward kinematics (yellow dash-dotted arrows). When the actor's shoulder position and torso orientation (filled circles) are substituted with a base frame of reference during this process (e.g. position (0,0,0) and orientation (1,0,0), (0,1,0), (0,0,1)), the result represents the actor's limbs in the actor-relative FoR.
Second, any visual information arriving in an observer-relative FoR (first two rows of MMF) may also be directly transformed into actor-relative FoRs. As before, the model can accomplish such a transformation by projecting the sensory information along the inverse kinematics (gray dotted) into limb-relative orientation FoRs, only in this case the next proximal input is substituted with the position and orientation of the actor's shoulder and torso, respectively. Due to the substitutions, and because no normalization to the limb length is done, the result in the relative orientation FoR is actually equal to the actor's limbs in the actor-relative FoR. Equally, the second method can be used to transform objects in the environment from the observer's egocentric perspective to the actor's egocentric perspective.
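For illustration, a minimal sketch of the bare geometry underlying the second method: re-expressing an object position, given in the observer's egocentric FoR, in an actor-relative FoR, once an estimate of the actor's torso pose is available. This is only the deterministic coordinate transformation; in MMF itself the transformation is realized by the substitution described above within the model's kinematic mappings, and all quantities are probabilistic. All values below are illustrative.

import numpy as np

def to_actor_frame(obj_obs, torso_pos, torso_rot):
    # Translate to the actor's torso origin, then rotate into the torso axes
    # (columns of torso_rot are the actor's torso axes in observer coordinates).
    return torso_rot.T @ (obj_obs - torso_pos)

torso_pos = np.array([1.0, 0.5, 0.0])    # actor torso position, observer FoR
torso_rot = np.array([[0.0, -1.0, 0.0],  # example orientation: actor rotated
                      [1.0,  0.0, 0.0],  # by 90 degrees about the vertical
                      [0.0,  0.0, 1.0]])
obj_obs = np.array([1.5, 1.0, 0.2])      # object position, observer FoR
print(to_actor_frame(obj_obs, torso_pos, torso_rot))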
As both methods rely exclusively on interactions that are initially built for establishing a distributed representation of the observer's own body, the observer can simply recruit its own body model to infer the actor's perspective.
When the actor's shoulder position and orientation are not visible to the observer, this base perspective can also be inferred by MMF, given at least one shoulder- and torso-relative position and orientation signal. By transforming multiple cues, particularly along the distal-to-proximal kinematic mappings, and fusing them, MMF can build a robust estimate of the actor's shoulder and torso. In the following, we evaluate these three capabilities of MMF in more detail.
Simulations
An observer might not always be able to perceive an actor's torso, while still being able to perceive other parts of the actor's body. The torso might be occluded, or additional cues might be available for other body parts (e.g. the observer could touch the actor's hand, providing additional cues; the actor's hand could be placed on a well-established landmark, such as a door handle; or attention could be
focused on the hand). In the following, we show how MMF can use cues from the actor's hand state and relative relations between adjacent limbs to build a probabilistic estimate of the actor's torso orientation and shoulder position.
In the following simulations, we assume that the actor's torso is static and its hand moves along an unknown trajectory. To this end, we model the hand's movement as Gaussian noise with mean zero and a standard deviation of 0.2 rad per angle. The arm has nine degrees of freedom (three on each joint). In each time step, noisy sensory input arrives in all modules depicted in Fig. 1 with a crossed-out rectangle (standard deviation of 0.02 per dimension in units of limb length) or a non-crossed-out rectangle (standard deviation of 0.2). Thus, while the fingertip position and hand orientation are perceived rather accurately in the observer's egocentric perspective, relations between adjacent limbs are perceived rather inaccurately. In each time step, MMF projects the sensory information along the solid red arrows to the torso's position and orientation (filled circles), where Bayesian filtering reduces the sensor noise. The Euclidean distance of the resulting torso estimate from the real torso state is shown in Fig. 2. The results show that, despite the high sensory noise in the relative FoRs and the high movement noise, the orientation of the actor's torso can be inferred with a lower estimation error than the one inherent in most of the FoRs perceived. Results are averaged over 100 individual runs, where each run samples a different shoulder position, torso orientation, and arm trajectory.
Fig. 2 Error of the estimation of an actor's shoulder position and torso orientation in an observer's egocentric FoR. The shoulder and torso themselves are occluded and are inferred via the body model. The vertical axis is in units of limb lengths, and error bars are SEM
To infer the actor-relative orientation of objects, the second projection method is evaluated. For this purpose, in each run a new object with random position and orientation is created in a sphere of one limb length around the actor's torso. The error of the object's projection into the actor's egocentric FoR is shown in Fig. 3. It depends on both the shoulder position and torso orientation estimates (cf. Fig. 2). In accordance with the improvement of those estimates, the object's projection into the actor's FoR improves.
Fig. 3 The estimate of the actor's torso in the observer's FoR is used to project an object from the observer's FoR to the actor's FoR. As the torso estimate improves (cf. Fig. 2), the object projection improves as well. The vertical axis is in units of limb lengths, error bars are SEM
Last, we evaluate the effects of multimodal sensor fusion and Bayesian filtering on the representation of the actor's fingertip position in the actor's egocentric perspective. At first glance, sensory input of the relative relations between adjacent limbs (non-crossed-out rectangles in Fig. 1) seems sufficient to infer the fingertip position. The resulting estimation error is shown in Fig. 4, red. It is, however, advantageous to also include the eye-relative measurements (crossed-out rectangles in Fig. 1). They are projected into the actor's egocentric perspective in the same way environmental objects are projected. This time, however, the result is fused with the fingertip estimate inferred from the relative relations. The error of this fused estimate is shown in Fig. 4, green. The improvement of the green performance over the red performance is only possible because the torso estimate is filtered over time. The results show how continuous filtering and information fusion can improve the body state estimate.
Fig. 4 Error of the fingertip position estimate in the actor's egocentric FoR, with and without the global (eye-relative) measurements. The vertical axis is in units of limb lengths, and error bars are SEM
Conclusion
Recently, we applied MMF to multimodal sensor fusion, Bayesian filtering, and sensor error detection and isolation. As shown in Butz et al. (2014), MMF is also well-suited to model the Rubber Hand Illusion. Two important characteristics of MMF are its modularly distributed state representation and its rigorous multimodal Bayesian fusion, making it highly suitable to model PT in an embodied way. Our results show that MMF is able to infer position and orientation estimates of an actor's body and of objects in the environment from the actor's egocentric perspective. We showed that this is even possible when the actor's head and torso are occluded. Moreover, we showed that Bayesian filtering is able to improve the process. All results are obtained by exclusively using the observer's own body model, i.e. no new abilities are required. Thus, the proposed PT approach is fully embodied.
The resulting PT capability sets the stage for many skills that at least partially rely on PT. As an estimate of the actor's whole body
state is maintained over time, angular estimates and changes in these angular estimates, which allow the inference of current motor activities, are readily available. Because MMF represents these estimates body-relatively, the inferred motor activities are the same no matter whether the observer or another actor acts. As a consequence, motor primitives can be activated and movements may be classified according to these activities. The observer could, for example, recognize a motion as a biological motion or even infer the desired effect an actor is trying to achieve. Even more so, the close integration of PT in the body model should allow for easy online imitation learning. Overall, MMF can be considered to either precede the mirror neuron system and provide it with input, or to be part of the mirror neuron system for simulating and understanding the bodily motions of an observed actor.
References
Buckner RL, Carroll DC (2007) Self-projection and the brain. Trends Cogn Sci 11:49–57
Butz MV, Kutter EF, Lorenz C (2014) Rubber hand illusion affects joint angle perception. PLoS ONE 9(3):e92854
Cabido-Lopes M, Santos-Victor J (2003) Visual transformations in gesture imitation: what you see is what you do. In: IEEE international conference on robotics and automation, vol 2, pp 2375–2381
Ehrenfeld S, Butz MV (2013) The modular modality frame model: continuous body state estimation and plausibility-weighted information fusion. Biol Cybern 107(1):61–82
Ehrenfeld S, Herbort O, Butz MV (2013) Modular neuron-based body estimation: maintaining consistency over different limbs, modalities, and frames of reference. Front Comput Neurosci 7
Ferrari PF, Tramacere A, Simpson EA, Iriki A (2013) Mirror neurons through the lens of epigenetics. Trends Cogn Sci 17(9):450–457
Fleischer F, Christensen A, Caggiano V, Thier P, Giese MA (2012) Neural theory for the perception of causal actions. Psychol Res 76(4):476–493
Heyes C (2010) Where do mirror neurons come from? Neurosci Biobehav Rev 34(4):575–583
Rizzolatti G, Craighero L (2004) The mirror-neuron system. Annu Rev Neurosci 27:169–192

The under-additive effect of multiple constraint violations

Emilia Ellsiepen, Markus Bader
Goethe-Universität, Institut für Linguistik, Frankfurt am Main, Germany
Keywords
Gradient grammaticality, Harmonic grammar, Quantitative linguistics
Introduction
The quest for quantitative evidence in syntactic and semantic research has led to the development of experimental methods for investigating linguistic intuitions with experimental rigor (see overview in Schütze and Sprouse, 2014). This in turn has inspired a renewed interest in grammar formalisms built on constraints with numerical weights (Smolensky and Legendre, 2006; see overview in Pater, 2009). These formalisms assign each sentence a numerical harmony value as defined in (1): the harmony H of a sentence S is the negative weighted sum over all grammatical constraints Ci that are violated by S, where w(Ci) is the weight of constraint Ci and v(S, Ci) counts its violations by S.
(1) H(S) = -Σi w(Ci) · v(S, Ci)

As discussed in detail by Pater (2009), such formalisms have great potential for bringing generative linguistics and cognitive science into close contact again.


One of the challenges brought about by these developments is how harmony relates to quantitative linguistic evidence. With regard to corpus
frequencies, a large body of research has shown that this relationship is
non-linear (e.g., Goldwater and Johnson, 2003). With regard to gradient
linguistic judgments, the most transparent relationship between constraint weight and perceived grammaticality is postulated by Linear
Optimality Theory (LOT) (Keller, 2006) which explicitly aims at providing a model of gradient grammaticality judgments, in particular as
obtained by the method of magnitude estimation (ME). This method
allows participants to judge sentences on an open-ended continuous
numerical scale relative to a predefined reference sentence. Following
Bard et al. (1996), magnitude estimation has become a kind of gold
standard for assessing grammaticality, although more recently its validity
has been questioned (e.g., Sprouse, 2011). LOT claims that the weight of
a constraint reflects the decrease in acceptability that results from violating the constraint. A further claim of LOT is that multiple constraints
combine additively. Thus, if a sentence contains two constraint violations, its decrease in acceptability should be the sum of the acceptability
decrease of violating each constraint in isolation.
Some evidence against this assumption was found by Hofmeister
et al. (2014). The combined effect of two syntactic constraint violations was under-additive, that is, less than the sum of the separate
effects of the two constraints. However, Hofmeister et al. (2014) used
a non-standard judgment procedure (the thermometer judgment
methodology of Featherston, 2007), and the interaction between the
two constraints was only marginally significant.
We ran two experiments using a standard magnitude estimation
procedure in order to investigate the effect of multiple constraint
violations. Experiment 1 investigates the effect of two severe (hard)
constraint violations; Experiment 2 investigates the effect of a severe
constraint violation coinciding with a mild (soft) constraint violation.
In both cases, we find evidence for under-additivity. In the last part,
we therefore develop the idea of mapping harmony values to
acceptability judgments using a sigmoid linking function to preserve
linear cumulativity in harmony, while allowing for under-additive
effects in acceptability judgments.
Experiment 1
Experiment 1 tested German sentences that contained either no violation at all (2-a), a violation of the position of the finite auxiliary (2-b), an agreement violation (2-c), or both violations at once (2-d). While sentences (2-b)-(2-d) are all ungrammatical in a binary system, LOT predicts that sentence (2-d) is even less acceptable than (2-b) and (2-c). The corresponding constraints AuxFirst and Agree are both considered hard constraints in the sense of Sorace and Keller (2005); that is, both violations should cause severe decreases in acceptability.
(2) Ich finde, dass die Eltern im Winter an die See ...
    I think that the parents in winter at the sea ...
a. hätten reisen sollen.
   have travel should
b. *reisen sollen hätten.
c. *hätte reisen sollen.
d. *reisen sollen hätte.
Method
The ME procedure closely follows the description of the ME method
in Bard et al. (1996) and consisted of a customization phase, where
participants were acquainted with the method by judging the length of
lines and the acceptability of ten training sentences, and the experimental phase.
In each phase, participants first saw the reference stimulus (either a
line or a sentence) and assigned it a numerical value. Afterwards, the
experimental stimuli were displayed one by one, and participants
judged each stimulus relative to the reference stimulus, which
remained visible throughout the experiment. The reference sentence (3), almost literally taken from Keller (2000, sentence (B.18), page 377), is a sentence with non-canonical word order.
(3) Ich glaube, dass den Bericht der Chef in seinem Büro gelesen hat.
    I believe that the.ACC report the.NOM boss in his office read has
32 sentences were created, all appearing in four versions according to the two violation types introduced above (2). The experimental sentences were distributed onto four lists according to a Latin square design and combined with 278 filler sentences. 36 students, all native speakers of German, took part in the study.
Results
The acceptability judgments obtained by the ME procedure were first normalized with the judgment of the reference sentence and then log-transformed, as is standard practice with ME data. A repeated measures ANOVA revealed significant main effects for both factors (AuxFirst and Agree), as well as a significant interaction between the two (F(1,35) = 14.7, p < .001). As illustrated on the left-hand side of Fig. 1, the effects were not additive (the predicted mean for fully additive effects is indicated by the x). The difference between conditions (2-b) and (2-d), however, was still significant, as indicated by a paired t-test (t(35) = 3.55, p < .01).
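The normalization step is simple enough to state directly; the following sketch illustrates it with made-up numbers (the raw judgments and the reference value are purely illustrative, not the experimental data):

import numpy as np

raw_judgments = np.array([80.0, 40.0, 35.0, 20.0])  # one participant, (2-a)-(2-d)
reference = 50.0                                    # judgment of the reference sentence
log_ratios = np.log(raw_judgments / reference)      # normalize, then log-transform
print(log_ratios)  # positive: better than the reference; negative: worse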
Experiment 2
Experiment 2 has a similar design as Experiment 1, the difference being that instead of an agreement violation, a violation of subject-object order (S > O) is investigated, again in combination with a violation of AuxFirst. As shown in (4), the normal order between subject and object is SO. When the order constraint on subject and object is violated, sentence acceptability decreases. However, in contrast to Agree, this is not a hard but a soft constraint in the sense of Sorace and Keller (2005), resulting in a comparatively mild decrease in acceptability.
(4) Ich glaube, dass ...
    I think that ...
a. der Doktor dem Patienten hätte helfen können.
   the doctor the patient have helped could
b. *der Doktor dem Patienten helfen können hätte.
c. ?dem Patienten der Doktor hätte helfen können.
d. *dem Patienten der Doktor helfen können hätte.
Method
The procedure was the same as in Experiment 1. 32 sentences were created, all appearing in four versions according to the two violation types introduced above (4). The experimental sentences were distributed onto four lists according to a Latin square design and combined with 62 filler sentences. 36 students took part in the study.
Results
Similar to Experiment 1, a repeated measures ANOVA yielded two main effects of AuxFirst and S > O as well as an interaction (F(1,35) = 20.32, p < .001). As can be seen on the right-hand side of
Fig. 1, the difference between conditions (4-b) and (4-d) is even smaller than in Experiment 1, but this difference was still significant (t(35) = 3.32, p < .01). We can conclude that both hard and soft constraints affect acceptability in a cumulative fashion, in that additional violations systematically lead to lower acceptability. The effects of the two constraint violations in isolation, however, do not add up in the case of sentences containing both constraint violations. Instead, they combine in an under-additive way.
Fig. 1 Results of Experiments 1 and 2 (acceptability in log ratios by verb cluster order, Aux first vs. Aux last; left panel: Agree vs. noAgree with the predicted additive value for noAgree; right panel: SO vs. OS with the predicted additive value for OS)
Modelling under-additivity: a sigmoid function to link harmony with acceptability
While the results above suggest a cumulative effect of constraint violations, in that additional violations always lower acceptability, this cumulativity does not result in a linear decrease. There are at least two explanations for this under-additivity effect: either harmony is not a linear combination of constraint weights multiplied by the number of violations, or acceptability is not proportional to harmony. In the latter case, it is not unlikely that acceptability can still be derived from harmony, but by a linking function that accounts for the apparent floor effect we found in the experiments. In this section we explore the possibility of using a sigmoid function to link harmony values to acceptability judgments.
If we assume that the constraint weights and thus the harmony values are given, the appropriate linking function should intuitively preserve the relative distance between harmony values in the upper range, while progressively reducing the difference between harmony values in the lower range, possibly leveling off at a horizontal asymptote which would correspond to the lower bound of acceptability. A sigmoid function with its inflection point at 0 and an asymptote which corresponds to the maximal difference in acceptability could serve this requirement, as the 0-point (a structure which does not violate any constraint) is mapped to zero while increasingly lower values are mapped to values that are closer together. If we want to estimate the weights from the acceptability data itself, however, it gets more complicated. If we were to use the differences between acceptability judgments as the weights, we would subsequently predict higher acceptability than observed for structures which exhibit only one violation. This problem, however, can be avoided by first applying the inverse of the sigmoid linking function to the weights as derived from acceptability judgments. As for choosing the correct asymptote, this seems to be an empirical question. As an example, we chose the hyperbolic tangent function with a reduced asymptote of -0.75 instead of -1 and its corresponding inverse (Fig. 2):

a. linking(xh) = tanh(xh) * 0.75
b. linking_inv(xh) = artanh(xh / 0.75)
Fig. 2 A linking function between harmony and adjusted harmony
To estimate the weights to be used, we first run a linear model on the acceptability judgment data of Experiment 1 and extract the coefficients of the two simple effects of AuxFirst and Agree, disregarding the interaction term. We apply the inverse of the linking function to these coefficients to obtain the weights to be used in our model, which allow us to calculate the harmony values for our four candidates:


Constraint    Coefficient    Weight
AuxFirst      0.46           0.72
Agree         0.40           0.59

(5) H(a) = -(0 + 0*0.72 + 0*0.59) = 0
    H(b) = -(0 + 1*0.72 + 0*0.59) = -0.72
    H(c) = -(0 + 0*0.72 + 1*0.59) = -0.59
    H(d) = -(0 + 1*0.72 + 1*0.59) = -1.31
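The computation in (5), and the mapping back to predicted judgments, can be reproduced in a few lines; the following sketch uses the numbers from the text (the original analysis was, of course, not necessarily carried out this way):

import numpy as np

ASYMPTOTE = 0.75
def linking(h):      # harmony -> predicted acceptability
    return np.tanh(h) * ASYMPTOTE
def linking_inv(a):  # acceptability coefficient -> constraint weight
    return np.arctanh(a / ASYMPTOTE)

coef = {"AuxFirst": 0.46, "Agree": 0.40}                # simple effects, Experiment 1
weights = {c: linking_inv(v) for c, v in coef.items()}  # ~0.72 and ~0.59, cf. the table

violations = {"a": (0, 0), "b": (1, 0), "c": (0, 1), "d": (1, 1)}
for cond, (v_aux, v_agr) in violations.items():
    h = -(v_aux * weights["AuxFirst"] + v_agr * weights["Agree"])
    print(cond, "harmony:", round(h, 2), "predicted:", round(linking(h), 2))

# For condition (d), harmony is about -1.31 but the predicted acceptability is
# about -0.65, i.e. less negative than the linear prediction -(0.46 + 0.40) = -0.86:
# the sigmoid linking function yields the observed under-additivity.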

To compare the harmony values to the acceptability judgments, we now apply the linking function to the harmony values and plot the values in Fig. 3, after aligning the 0-point of the transformed harmony scale (axis on the right) to the zero-violation case.
Because the weights were rescaled with the inverse, the predicted values for the two single-violation cases match exactly. For the condition with two violations, the predicted value comes close to the measured value, and it is much closer than the value predicted under the assumption of a direct proportionality between harmony and acceptability, i.e., a linear decrease in acceptability.
Fig. 3 Measured vs predicted values in Experiment 1 (Agree vs. noAgree; sigmoid vs. linear prediction for noAgree)
While it is possible to determine a function that leads to an exact match by choosing a different multiplier in (5), we leave this step to further research, as ideally this function should be fitted to a variety of experiments and not only a single one. The existence of a single linking function for all acceptability judgment experiments
presupposes a fixed lower bound of acceptability relative to the fully grammatical candidate. To test whether this assumption is reasonable, we apply the same linking function as above to the results of Experiment 2 and plot all values in Fig. 4.
The transformed harmony value for the two-violation case here diverges slightly more from the measured mean, but is still much closer than the one predicted by a linear combination of weights.
Fig. 4 Measured vs predicted values in Experiment 2 (SO vs. OS; sigmoid vs. linear prediction for OS)
Discussion
The current study makes two independent contributions to the area of gradient grammaticality research. Firstly, it provides strong evidence that multiple constraint violations combine in an under-additive fashion in acceptability judgments measured by magnitude estimation. This holds both for concurrent violations of two hard constraints, that is, constraints that are context-independent and cause severe unacceptability, and for soft constraints, which can depend on context and only cause mild degradation. Secondly, we demonstrated that, using an appropriate linking function that maps harmony to ME acceptability judgments, we are able to model the under-additive effect in judgments while preserving fully additive cumulativity in harmony values. It remains to be tested whether this linking function, or an alternative function based on more data, can account for the whole set of ME judgment data.
The under-additivity in ME judgments that we observed suggests the existence of a lower bound of perceived grammaticality. If such a lower bound exists, this questions the appropriateness of ME in two ways: if there is a cognitive lower bound, the motivation for using an open-ended scale rather than a bounded scale, such as a Likert scale of suitable size, seems to disappear. Alternatively, it is possible that the method itself is not well-suited to capture differences below a certain threshold, as the perception of differences might not be linear in general, as is the case with loudness.
References
Bard EG, Robertson D, Sorace A (1996) Magnitude estimation of linguistic acceptability. Language 72(1):32–68
Featherston S (2007) Data in generative grammar: the stick and the carrot. Theor Linguist 33(3):269–318
Goldwater S, Johnson M (2003) Learning OT constraint rankings using a maximum entropy model. In: Spenader J, Eriksson A, Dahl S (eds) Proceedings of the Stockholm workshop on variation within optimality theory. University of Stockholm, pp 111–120
Hofmeister P, Casasanto LS, Sag IA (2014) Processing effects in linguistic judgment data: (super-)additivity and reading span scores. Lang Cogn 6(1):111–145
Keller F (2000) Gradience in grammar: experimental and computational aspects of degrees of grammaticality. PhD thesis, University of Edinburgh
Keller F (2006) Linear optimality theory as a model of gradience in grammar. In: Fanselow G, Féry C, Vogel R, Schlesewsky M (eds) Gradience in grammar: generative perspectives. Oxford University Press, New York, pp 270–287
Pater J (2009) Weighted constraints in generative linguistics. Cogn Sci 33:999–1035
Schütze CT, Sprouse J (2014) Judgment data. In: Podesva RJ, Sharma D (eds) Research methods in linguistics. Cambridge University Press, Cambridge, pp 27–50
Smolensky P, Legendre G (2006) The harmonic mind: from neural computation to optimality-theoretic grammar (2 volumes). MIT Press, Cambridge
Sorace A, Keller F (2005) Gradience in linguistic data. Lingua 115(11):1497–1524
Sprouse J (2011) A test of the cognitive assumptions of magnitude estimation: commutativity does not hold for acceptability judgments. Language 87(2):274–288


Strong spatial cognition


Christian Freksa
University of Bremen, Germany
Motivation
The ability to solve spatial tasks is crucial for everyday life and thus of great importance for cognitive agents. A common approach to modeling this ability in artificial intelligence has been to represent spatial configurations and spatial tasks in the form of knowledge about space and time. Augmented by appropriate algorithms, such representations allow the computation of knowledge-based solutions to spatial problems. In comparison, natural embodied and situated cognitive agents often solve spatial tasks without detailed knowledge about underlying geometric and mechanical laws and relationships; they can directly relate actions and their effects due to spatio-temporal affordances inherent in their bodies and their environments. Against this background, we argue that spatial and temporal structures in the body and the environment can substantially support (or even replace) reasoning effort in computational processes. While the principle underlying this approach is well known (for example, it is applied in descriptive geometry for geometric problem solving), it has not been investigated as a paradigm of cognitive processing. The relevance of this principle lies not only in overcoming the need for detailed knowledge that is required by a knowledge-based approach; it also lies in understanding the efficiency of natural problem solving approaches.
Architecture of cognitive systems
Cognitive agents such as humans, animals, and autonomous robots comprise brains (or computers, respectively) connected to sensors and actuators.
These are arranged in their (species-specific) bodies to interact with
their (species-typical) environments. All of these components need to
be well tuned to one another to function in a fully effective manner.
For this reason, it is appropriate to view the entire aggregate (cognitive agent including body and environment) as a full cognitive
system (Fig. 1).
Our work aims at investigating the distribution, coordination, and
execution of tasks among the system components of embodied and
situated spatial cognitive agents. From a classical information processing/AI point of view, the relevant components outside the brain or
computer would be formalized in some knowledge representation
language or associated pattern in order to allow the computer to
perform formal reasoning or other computational processing on this
representation. In effect, physical, topological, and geometric relations are transformed into abstract information about these relations
and the tasks are then performed entirely on the information processing level, where true physical, topological, and geometric
relations no longer persist.
This classical, information-processing-oriented division between brain/computer on the one hand and perception, action, body, and environment on the other hand is only one way of distributing the activities involved in cognitive processing [Wintermute and Laird, 2008].
Fig. 1 Structure of a full cognitive system
Alternative ways would be (1) to maintain some of the spatial relations in their original form or (2) to use only mild abstraction for their representation. Maintaining relations in their original form corresponds to what Norman [1980] named "knowledge in the world".
Use of knowledge in the world requires perception of the world to solve a problem. The best-known example of mild abstraction is the geographic paper map; here certain spatial relations can be represented by identical spatial relations (e.g. orientation relations), while others can be transformed (e.g. absolute distances can be scaled). As a result, physical operations such as perception, route-following with a finger, and manipulation may remain enabled similarly as in the original domain. Again, perception is required to use these mildly abstracted representations, but the perception task can be easier than the same task under real-world conditions, for example due to the modified scale.
A main research hypothesis for studying physical operations and processes in spatial and temporal form, in comparison to formal or computational structures, is that spatial and temporal structures in the body and the environment can substantially support reasoning effort in computational processes. One major observation we can make when comparing the use of such different forms of representation (formal, mild abstraction, original) is that the processing structures of problem solving processes differ [Marr 1982]. Different processing structures facilitate different ease of processing [Sloman 1985].
Our hypothesis can be plainly formulated as:
manipulation + perception simplify computation
While the principle underlying this hypothesis is well known (for example, it is applied in descriptive geometry for geometric problem solving), it has not been investigated as a principle of cognitive processing.
Reasoning about the world can be considered the most advanced
level of cognitive ability; this ability requires a comprehensive
understanding of the mechanisms responsible for the behavior of
bodies and environments. But many natural cognitive agents
(including adults, children, and animals) lack a detailed understanding
of their environments and still are able to interact with them rather
intelligently. For example, they may be able to open and close doors
in a goal-directed fashion without understanding the mechanisms of the doors or locks on a functional level. This suggests that knowledge-based reasoning may not be the only way to implement problem solving in cognitive systems.
In fact, alternative models of perceiving and moving goal-oriented
autonomous systems have been proposed in biocybernetics and AI
research to model aspects of cognitive agents [e.g. Braitenberg 1984;
Brooks 1991; Pfeifer and Scheier, 2001]. These models physically
implement perceptual and cognitive mechanisms rather than
describing them formally and coding them in software. Such systems
are capable of intelligently dealing with their environments without
encoding knowledge about the mechanisms behind the actions.
The background of the present work has been discussed in detail in
[Freksa 2013; Freksa and Schultheis, in press].
Approach
With our present work, we go an important step beyond previous
embodied cognition approaches to spatial problem solving. We
introduce a paradigm shift which not only aims at preserving spatial structure, but also makes use of identity preservation; in other words, we represent spatial objects and configurations by themselves or by physical spatial models of themselves, rather than by abstract representations. This has a number of advantages: we can avoid loss of information due to early representational commitments, since we do not have to decide prematurely which aspects of the world to represent and which aspects to abstract from. This can be decided partly during the problem solving procedure. At this stage, additional contextual information may become available that can guide the choice of the specific representation to be used.

Perhaps more importantly, objects and configurations frequently
are aggregated in a natural and meaningful way; for example, a
chair may consist of a seat, several legs, and a back; if I move one
component of a chair, I automatically (and simultaneously!) move
the other components and the entire chair, and vice versa. This
property is not intrinsically given in abstract representations of
physical objects; but it may be a very useful property from a cognitive point of view, as no computational processing cycles are
required for simulating the physical effects or for reasoning about
them. Thus, manipulability of physical structures may become an
important feature of cognitive processing, and not merely a property
of physical objects.
Similarly, we aim at dealing with perception dynamically, for
example allowing for on-the-fly creation of suitable spatial reference frames: by making direct use of spatial configurations, we can
avoid deciding a priori on a specific spatial reference system in which
to perceive a configuration. As we know from problem solving in
geometry and from spatial cognition, certain reference frames may
allow a spatial problem to collapse in dimensionality and difficulty.
For example, determining the shortest route between two points on a
map boils down to a 1-dimensional problem [Dewdney 1988].
However, it may be difficult or impossible to algorithmically determine a reference frame that reduces the task given on a 2- or
3-dimensional map to a 1-dimensional problem. A spatial reconfiguration approach that makes use of the physical affordance "shortcut" easily reduces the problem from 3D or 2D to 1D. In other cases, it
may be easier to identify suitable spatial perspectives empirically in
the field than analytically by computation. Therefore we may be better
off by allowing certain operations to be carried out situation-based in
the physical spatial configuration as part of the overall problem
solving process.
In other words, our project investigates an alternative architecture
of artificial cognitive systems that may be more closely based on role
models of natural cognitive systems than our purely knowledge-based AI approaches to cognitive processing. We focus on solving
spatial and spatio-temporal tasks, i.e. tasks having physical aspects
that are directly accessible by perception and can be manipulated by
physical action. This will permit outsourcing some of the intelligence for problem solving into spatial configurations.
Our approach is to first isolate and simplify the specific spatial
problem to be solved, for example by identifying an appropriate task-specific spatial reference system, by removing task-irrelevant entities
from the spatial configuration, or by reconstructing the essence of the
spatial configuration by minimal abstraction. In general, it may be
difficult to prescribe the precise steps to preprocess the task; for the
special case of spatial tasks it will be possible to provide rules or
heuristics for useful preprocessing steps; these can serve as meta-knowledge necessary to control actions on the physical level. After
successful preprocessing, it may be possible in some cases to read
an answer to the problem through perception directly off the resulting
configuration; in other cases the resulting spatial configuration may be
a more suitable starting point for a knowledge-based approach to
solving the problem.
Discussion
The main hypothesis of our approach is that the intelligence of
cognitive systems is located not only in specific abstract
problem-solving approaches, but also, and perhaps more importantly, in
the capability of recognizing characteristic problem structures and of
selecting particularly suitable problem-solving approaches for given
tasks. Formal representations may not facilitate the recognition of
such structures, due to a bias inherent in the abstraction. This is
where mild abstraction can help: mild abstraction may abstract from
only a few aspects while preserving important structural properties.
The insight that spatial relations and physical operations are
strongly connected to cognitive processing may lead to a different
division of labor between the perceptual, the representational, the
computational, and the locomotive parts of cognitive interaction than
the one we currently pursue in AI systems: rather than putting all the
intelligence of the system into the computer, the proposed approach
aims at putting more intelligence into the interactions between
components and structures of the full cognitive system. More specifically, we aim at exploiting intrinsic structures of space and time to
simplify the tasks to be solved.
We hypothesize that this flexible assignment of physical and
computational resources for cognitive problem solving may be closer
to natural cognitive systems than the almost exclusively computational approach; for example, when we as cognitive agents search for
certain objects in our environment, we have at least two different
strategies at our disposal: we can represent the object in our mind and
try to imagine and mentally reconstruct where it could or should be
(this would correspond to the classical AI approach), or we can visually
search for the object in our physical environment. Which approach is
better (or more promising) depends on a variety of factors including
memory and physical effort; frequently a clever combination of both
approaches may be best.
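This strategy choice can be made concrete in a toy Python sketch; all
class names, cost values, and the example environment below are
hypothetical illustrations, not part of the proposed system:

class Memory:
    def __init__(self, known):           # known: {object: location}
        self.known = known
    def recall_cost(self, obj):          # cheap if the location is memorized
        return 1.0 if obj in self.known else 10.0
    def reconstruct(self, obj):
        return self.known.get(obj)

class Environment:
    def __init__(self, layout):          # layout: {location: object}
        self.layout = layout
    def scan_cost(self, obj):            # proportional to the number of places
        return float(len(self.layout))
    def scan_for(self, obj):
        for loc, o in self.layout.items():
            if o == obj:
                return loc
        return None

def find_object(obj, memory, env):
    """Pick the cheaper strategy; fall back to perception if recall fails."""
    if memory.recall_cost(obj) <= env.scan_cost(obj):
        loc = memory.reconstruct(obj)
        if loc is not None and env.layout.get(loc) == obj:
            return loc                   # classical, representation-based route
    return env.scan_for(obj)             # embodied route: use the world itself

mem = Memory({"keys": "drawer"})
env = Environment({"drawer": "keys", "table": "cup", "shelf": "book"})
print(find_object("keys", mem, env))     # recalled -> 'drawer'
print(find_object("cup", mem, env))      # not memorized -> physical search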
Although the general principle outlined may apply to a variety of
domains, we will constrain our work in the proposed project to the
spatio-temporal domain. This is the domain we understand best in
terms of computational structures; it has the advantage that we have
well-established and universally accepted reference systems to
describe and compute spatial and temporal relations.
Our research aims at identifying a bag of cognitive principles and
ways of combining them to obtain cognitive performance in
spatio-temporal domains. We bring together three different perspectives
in this project: (1) the cognitive systems perspective, which addresses
cognitive architecture and trade-offs between explicit and implicit
representations; (2) the formal perspective which characterizes and
analyzes the resulting structures and operations; and (3) the implementation perspective which constructs and explores varieties of
cognitive system configurations. In the long term, we see potential
technical applications of physically supported cognitive
configurations, for example in the development of future intelligent
materials (e.g. smart skin, where distributed spatio-temporal
computation is
required but needs to be minimized with respect to computation
cycles and energy consumption).
Naturally, the proposed approach will not be as broadly applicable
as some of the approaches we pursue in classical AI. But it might
discover broadly applicable cognitive engineering principles, which
will help the design of tomorrow's intelligent agents. Our philosophy
is to understand and exploit pertinent features of space and time as
modality-specific properties of cognitive systems that enable powerful
specialized approaches in the specific domain of space and time.
However, space and time are most basic for perception and action and
ubiquitous in cognitive processing; therefore we believe that understanding and use of their specific structures may be particularly
beneficial.
In analogy to the notion of "strong AI" (implementing intelligence
rather than simulating it [Searle 1980]), we call this approach "strong
spatial cognition", as we employ real space rather than simulating its
structure.
Acknowledgments
I acknowledge discussions with Holger Schultheis, Ana-Maria Olteteanu, and the R1-[ImageSpace] project team of the SFB/TR 8 Spatial
Cognition. This work was generously supported by the German
Research Foundation (DFG).
References
Braitenberg V (1984) Vehicles: experiments in synthetic psychology.
MIT Press, Cambridge
Brooks RA (1991) Intelligence without representation. Artif Intell
47:139-159
Dewdney AK (1988) The armchair universe. W.H. Freeman & Company,
San Francisco
Freksa C (2013) Spatial computing - how spatial structures replace
computational effort. In: Raubal M, Mark D, Frank A (eds) Cognitive
and linguistic aspects of geographic space. Springer, Heidelberg
Freksa C, Schultheis H (in press) Three ways of using space. In:
Montello DR, Grossner KE, Janelle DG (eds) Space in mind: concepts
for spatial education. MIT Press, Cambridge
Marr D (1982) Vision. MIT Press, Cambridge
Norman DA (1980) The psychology of everyday things. Basic Books,
New York
Pfeifer R, Scheier C (2001) Understanding intelligence. MIT Press,
Cambridge
Searle J (1980) Minds, brains and programs. Behav Brain Sci
3(3):417-457
Sloman A (1985) Why we need many knowledge representation formalisms.
In: Bramer M (ed) Research and development in expert systems.
Cambridge University Press, New York, pp 163-183
Wintermute S, Laird JE (2008) Bimodal spatial reasoning with
continuous motion. In: Proceedings of AAAI, pp 1331-1337

Inferring 3D shape from texture: a biologically inspired model
architecture

Olman Gomez, Heiko Neumann

Inst. of Neural Information Processing, Ulm University, Germany
Abstract
A biologically inspired model architecture for inferring 3D shape
from textures is proposed. The model is hierarchically organized into
modules roughly corresponding to visual cortical areas in the ventral
stream. Initial orientation-selective filtering decomposes the input into
low-level orientation and spatial frequency representations. Grouping
of spatially anisotropic orientation responses builds sketch-like representations of surface shape. Gradients in orientation fields and
subsequent integration infer local surface geometry and globally
consistent 3D depth.
Keywords
3D Shape, Texture, Gradient, Neural Surface Representation
Introduction
The representation of depth structure can be computed from various
visual cues such as binocular disparity, kinetic motion and texture
gradients. Based on findings from experimental investigations (Liu
et al. 2004; Tsutsui et al. 2002), we suggest that the depth of textured
surfaces is inferred from monocular images by a series of processing
stages along the ventral stream in visual cortex. Each of these stages
is related to individual cortical areas or a strongly clustered group of
areas (Markov et al. 2013). Based on previous works that develop
generic computational mechanisms of visual cortical network processing
(Thielscher and Neumann 2003; Weidenbacher et al. 2006), we propose a
model that transforms initial texture gradient patterns into
representations of the intrinsic structure of curved surfaces (lines of
minimal curvature, local self-occlusions) and 3D depth (Li and Zaidi
2000; Todd 2004).
Previous work
Visual texture can assume different component structures, which suffer
from compression along the direction of surface slant when the object's
appearance curves away from the viewer's sight. Texture gradients
provide a potent cue to local relative depth (Gibson, 1950). Several
studies have investigated how size, orientation or density of texture
elements convey texture gradient information (Todd and Akerstrom,
1987). Evidence suggests that patterns of changing energy convey the
basic information to infer shape from texture, which needs to be
integrated along characteristic intrinsic surface lines (Li and Zaidi,
2000). Previous computational models try to estimate surface
orientation from
distortions of the apparent optical texture in the image. The approaches
can be subdivided according to their task specificity and the
computational strategies involved. Geometric approaches attempt to
reconstruct the metric surface geometry (e.g., Aloimonos and Swain
1985; Bajcsy and Lieberman 1976; Super and Bovik 1995). Neural models,
on the other hand, infer the relative or even ordinal structure from
initial spatial-frequency-selective filtering, subsequent grouping of
the resulting output responses, and a depth mapping step (Grossberg
et al. 2007; Sakai and Finkel 1997). The LIGHTSHAFT model of Grossberg
et al. (2007) utilizes scale-selective initial orientation filtering
and subsequent long-range grouping. Relative depth in this model is
inferred by a depth-to-scale mapping associating coarse-to-fine filter
scales to depth, using orientation-sensitive grouping cells which
define scale-sensitive spatial compartments to fill in qualitative
depth. Grouping mechanisms can be utilized to generate a raw surface
sketch that establishes lines of minimal surface curvature as a
ridge-based qualitative geometry
representation (Weidenbacher et al. 2006). Texture gradients can be
integrated to derive local maps of relative surface orientation (as
suggested in Li and Zaidi (2000); Sakai and Finkel (1997)). Such
responses may be integrated to generate globally consistent relative
depth maps from such local gradient responses (Liu et al. 2004).
The above-mentioned models are limited to simple objects, most dealing
only with regular textures, and do not explain how the visual system
mechanistically produces a multiple-depth-order representation of
complex objects.
Model description
Our model architecture consists of a multi-stage network of interacting
areas that are coupled bidirectionally (an extension of Weidenbacher
et al. 2006; Fig. 1). The architecture is composed of four functional
building blocks or modules, each of which consists of three stages
corresponding to the compartment structure of cortical areas:
feedforward input is initially filtered by a mechanism specific to the
model area, then resulting activity is modulated by multiplicative
feedback signals to enhance their gain, and finally a normalization via
surround competition utilizes a pool of cells in the space-feature
domain.
The different stages can be formally denoted by the following
steady-state equations (with the filter output modulated by feedback
and inhibition by activities from a pool of cells (Eq. 1) and the
inhibitory pool integration (Eq. 2)):
\[
r^{I}_{i,\mathrm{feat}} =
\frac{\beta\, f_F\big(r^{(0)}_{i,\mathrm{feat}}\big)\,\big(1+\mathrm{net}^{I,\mathrm{FB}}_{i,\mathrm{feat}}\big) - \eta\, q^{I}_{i,\mathrm{feat}}}
{\alpha + \gamma\, f_F\big(r^{(0)}_{i,\mathrm{feat}}\big)\,\big(1+\mathrm{net}^{I,\mathrm{FB}}_{i,\mathrm{feat}}\big)}
\qquad (1)
\]

\[
q^{I}_{i,\mathrm{feat}} =
\delta \Big( \sum_{\mathrm{feat}} r^{I,\mathrm{in}}_{i,\mathrm{feat}}
+ \varepsilon \sum_{\mathrm{feat}} \max_{\mathrm{feat}}\big(r^{I}_{i,\mathrm{feat}}\big) \star K^{\mathrm{pool}}_{ij} \Big)
\qquad (2)
\]

where the feedback signal is defined by
$\mathrm{net}^{I,\mathrm{FB}}_{i,\mathrm{feat}} =
\big[k_{\mathrm{FB}}\, r^{II}_{i,\mathrm{feat}}\big]^{+}
\sum_{z \in \{\mathrm{feat},\,\mathrm{loc}\}} r^{II}_{z}$.
Here $r^{I}$, $r^{II}$ denote the output activations of the
generic modules (I, II: two subsequent modules in the hierarchy). The
different three-stage modules roughly correspond to different cortical
areas with different feature dimensions represented neurally (compare
Fig. 1): Cortical area V1 computes orientation-selective responses
using a spatial frequency decomposition of the input; area V2
accomplishes orientation-sensitive grouping of initial items into
boundaries in different frequency channels to generate representations
of surface curvature properties. Different sub-populations of cells in
V4/IT are proposed to detect different surface features from
distributed responses: one is used to extract discontinuities in the
orientation fields (indicative of self-occlusions), another extracts and
analyzes anisotropies in the orientation fields of grouping responses to
determine slanted surface regions, and one that integrates patches of
anisotropic orientation field representations in order to infer local
3D depth.

Fig. 1 General overview of the model's schematics. Texture inputs are
decomposed into a space-orientation-frequency domain representation.
The cascaded processing utilizes computational stages with cascades of
filtering, top-down modulation via feedback, and competition with
activity normalization

The approach suggests that the generation of a 2D sketch representation
of surface invariants seeks to enhance surface border lines, while
integrating regions with high response anisotropies in the orientation
domain (over spatial frequencies) allows the inference of
qualitative depth from texture gradients. The proposed network
architecture is composed of four blocks, or modules, each of which
defines a cascade of processing stages as depicted in Fig. 1. Module I
employs 2D Gabor filters resembling simple cells in area V1. In
module II output responses from the previous module are grouped to
form extended contour arrangements. Activations are integrated by
pairs of 2D anisotropic Gaussian filters separated along the major axis
along the target orientation axis of each orientation band (like in area
V2). Grouping is computed separately in each frequency band. This is
all similar to the LIGHTSHAFT model (Grossberg et al. 2007), which
computes initial spatial-frequency-selective responses and subsequently
groups them into internal boundaries. Unlike LIGHTSHAFT, we employ
frequency-related response normalization such that the relative
frequency energy in different channels provides direct input for
gradient estimation. The sum of the responses here gives a measure of
texture compression. In module III the grouping responses are in turn
filtered by
mechanisms that employ oriented dark-light-dark anisotropic Gaussian
spatial weightings with subsequent normalization (like in Thielscher
and Neumann (2003)). The output is fed back to module II to selectively
enhance occlusion boundaries and edges of the apparent object surface
shape. This recurrence helps to extract a sketch-like representation of
the surface structure, similar to Weidenbacher et al. (2006). Module IV
combines the output of the previous modules and serves as a gradient

detector using coarse-grained oriented filters with on/off-subfields (like
area V4). In addition, model area IT functions as a directed integrator of
gradient responses using pairs of anisotropic Gaussian long-range
grouping mechanisms truncated by a sigmoid function. These integrate
the gradient cell responses to generate an activation that is related to the
surface depth profile.
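As a minimal illustration of one such generic three-stage module, the
following NumPy sketch follows the steady-state equations as
reconstructed above; the rectified filter function, the single-pass
treatment of the pool, the parameter values, and the pooling kernel are
assumptions for illustration, not the authors' implementation:

import numpy as np

def three_stage_module(net_in, net_fb, k_pool,
                       alpha=1.0, beta=2.0, gamma=0.5,
                       delta=1.0, eta=0.5, eps=0.5):
    """Sketch of one generic module (Eqs. 1-2 as reconstructed above):
    feedforward filtering, multiplicative feedback modulation, and
    divisive normalization by an inhibitory pool. All parameter values
    are illustrative. net_in, net_fb: (n_features, n_locations)."""
    f = np.maximum(net_in, 0.0)              # rectified filter output f_F(.)
    g = f * (1.0 + net_fb)                   # feedback enhances the gain
    # Inhibitory pool (Eq. 2 style): summed input plus kernel-weighted
    # max response, evaluated in one pass as a steady-state approximation.
    pool = delta * (f.sum(axis=0)
                    + eps * np.convolve(g.max(axis=0), k_pool, mode="same"))
    # Steady-state activity (Eq. 1 style): pool inhibition in the
    # numerator, divisive normalization in the denominator.
    r = (beta * g - eta * pool) / (alpha + gamma * g)
    return np.maximum(r, 0.0)

# Example: 8 orientation channels at 64 spatial positions.
rng = np.random.default_rng(0)
net_in = rng.random((8, 64))
net_fb = np.zeros_like(net_in)               # no top-down signal yet
k_pool = np.ones(5) / 5.0                    # smooth spatial pool kernel
print(three_stage_module(net_in, net_fb, k_pool).shape)   # (8, 64)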
Results
We show a few results in order to demonstrate the functionality of the
newly proposed model architecture. In Fig. 2 the results of computing
surface representations from initial orientation-sensitive filtering
and subsequent grouping to create a sketch-like shape representation
are shown, together with a map of strong anisotropy in the texture
energy. These anisotropies refer to locations of local slant in the
surface orientation relative to the observer's viewpoint and are
independent of the particular texture pattern that appears on the
surface.
In Fig. 3 the result of orientation-sensitive integration of texture
gradient responses is shown, which leads to a viewer-centric surface
depth representation.
These results are compared against the ground truth surface height
map in order to demonstrate the invariance of the inferred shape
independent of the texture pattern in the input.
Discussion and conclusion
A neural model is proposed that extracts 3D relative depth shape
representations of complex textured objects. The architecture utilizes
a hierarchical computational scheme of different stages referring to
cortical areas V1, V2, V4 and IT along the ventral pathway to generate representations of shape and support the recognition of objects. The
model also generates a 2D surface sketch from texture images. Such a
sketch contains depth cues such as T-junctions or occlusion
boundaries as well as ridge-like structures depicting lines of minimum
surface curvature. Unlike previous approaches, the model goes beyond a
simple detection of local energies of oriented filtering to explain how
such localized responses are integrated into a coherent depth
representation. It also does not rely on a heuristic scale-to-depth
mapping, like LIGHTSHAFT, to assign relative depth to texture
gradients, and it does not require diffusive filling-in of depth
(steered by a boundary web representation). Instead, responses
distributed anisotropically in the orientation feature domain are
selectively integrated for different orientations to generate
qualitative surface depth.
Acknowledgments
O.G. is supported by a scholarship of the German DAAD, ref.no.
A/10/90029.

Fig. 2 Result of grouping initial filter responses in the
space-orientation domain (separately for individual frequency channels)
for the input image (upper left). Texture gradient information is
calculated over the normalized responses of cells in different
frequency channels (upper right). Stronger response anisotropies are
mapped to white. The short axis of the anisotropies (strongest
compression) coheres with the slant direction (surface tilt). The
maximum responses over frequency and orientation (white) create a
sketch-like representation of the ridges of a surface corresponding
with the orientation of local minimal curvature (bottom left). Local
junctions also occur due to self-occlusions generated by concave
surface geometry. The result of orientation contrast detection (bottom
right) is fed back to enhance the sketch edges

Fig. 3 3D depth structure computed for different input textures for the
same surface geometry (left). Results of inferred depth structure are
shown (right) for the given ground truth pattern (bottom). Relative
error (RE) measures are calculated to determine the deviation of the
depth estimation from the true shape

References
Aloimonos J, Swain MJ (1985) Shape from texture. In: Proceedings of
the 9th IJCAI, Los Angeles, CA, pp 926-931
Bajcsy R, Lieberman L (1976) Texture gradient as a depth cue. Comput
Graph Image Process 5(1):52-67
Gibson JJ (1950) The perception of the visual world. Houghton Mifflin,
Boston
Grossberg S, Kuhlmann L, Mingolla E (2007) A neural model of 3D
shape-from-texture: multiple-scale filtering, boundary grouping, and
surface filling-in. Vision Res 47(5):634-672
Li A, Zaidi Q (2000) Perception of three-dimensional shape from
texture is based on patterns of oriented energy. Vision Res
40(2):217-242
Liu Y, Vogels R, Orban GA (2004) Convergence of depth from texture
and depth from disparity in macaque inferior temporal cortex.
J Neurosci 24(15):3795-3800
Markov NT, Ercsey-Ravasz M, Van Essen DC, Knoblauch K, Toroczkai Z,
Kennedy H (2013) Cortical high-density counterstream architectures.
Science 342(6158):1238406
Sakai K, Finkel LH (1997) Spatial-frequency analysis in the perception
of perspective depth. Netw Comput Neural Syst 8(3):335-352
Super BJ, Bovik AC (1995) Shape from texture using local spectral
moments. IEEE Trans PAMI 17(4):333-343
Thielscher A, Neumann H (2003) Neural mechanisms of cortico-cortical
interaction in texture boundary detection: a modeling approach.
Neuroscience 122(4):921-939
Todd JT (2004) The visual perception of 3D shape. Trends Cogn Sci
8(3):115-121
Todd JT, Akerstrom RA (1987) Perception of three-dimensional form
from patterns of optical texture. J Exp Psychol Hum Percept Perform
13(2):242
Tsutsui KI, Sakata H, Naganuma T, Taira M (2002) Neural correlates
for perception of 3D surface orientation from texture gradient.
Science 298(5592):409-412
Weidenbacher U, Bayerl P, Neumann H, Fleming R (2006) Sketching shiny
surfaces: 3D shape extraction and depiction of specular surfaces.
ACM Trans Appl Percept 3(3):262-285

An activation-based model of execution delays of specific task steps

Marc Halbrügge, Klaus-Peter Engelbrecht


Quality and Usability Lab, Telekom Innovation Laboratories,
Technische Universität Berlin, Germany
Abstract
When humans use devices like ticket vending machines, their actions
can be categorized into task-oriented (e.g. selecting a ticket) and
device-oriented (e.g. removing the bank card after having paid).
Device-oriented steps contribute only indirectly to the user's goal;
they take longer than their task-oriented counterparts and are more
likely to be forgotten. A promising explanation is provided by the
activation-based memory for goals model (Altmann and Trafton
2002). The objectives of this paper are, first, to replicate the step
prolongation effect of device-orientation in a kitchen assistance
context, and second, to investigate whether the activation construct
can explain this effect using cognitive modeling. Finally, a necessity
and sensitivity analysis provides more insights into the relationship
between goal activation and device-orientation effects.
Keywords
Cognitive Modeling, Human-Computer Interaction, ACT-R,
Memory, Human Error
Introduction and related work
While the research on task completion times in human-computer
interaction (HCI) has brought many results of both theoretical and
practical nature during the last decades (see John and Kieras 1996, for
an overview), the relationship between interface design and user error is
still unclear in many parts. Notable exceptions are post-completion
errors, when users fail to perform an additional step in a procedure after
they have already reached their main goal (Byrne and Davis 2006). This
concept can be extended to any step that does not directly support the
user's goals, independently of its position in the action sequence, and
has been termed device-orientation in this context (Ament et al. 2009).
The opposite (i.e. steps that do contribute to the goal) is analogously
called task-orientation. Device-oriented steps take longer and are more
prone to omission than task-oriented ones (Ament 2011).
A promising theoretical explanation for the effects of device-orientation is provided by the memory for goals model (MFG; Altmann
and Trafton 2002). The main assumption of the MFG is that goals
underlie effects that are usually connected to memory traces, namely
time-dependent activation and associative priming. Within the theoretical framework of the MFG, post-completion errors and increased
execution times for post-completion steps are caused by lack of
activation of the respective sub-goal. A computational implementation of the MFG that can be used to predict sequence errors has been
created by Trafton et al. (2009).
This paper aims at investigating the concept of device-orientation
on the background of the MFG using cognitive modeling with ACT-R
(Anderson et al. 2004). The basic research question is whether human
memory constructs as formalized within ACT-R can explain the
completion time differences between task- and device-oriented steps
found in empirical data.
Experiment
As the empirical basis for our investigation, we decided not to rely on
synthetic laboratory tasks like the Tower of Hanoi game, but instead
use an application that could be used by everyone in an everyday
environment. Our choice fell on an HTML-based kitchen assistant that
had been created for research on ambient assisted living. Among other
things, the kitchen assistant allows searching for recipes by regional
cuisine (French, Italian, German, Chinese) and
type of dish (main dish, appetizer, dessert, pastry). Our experiment
was built around this search feature.
12 subjects (17 % female, M_age = 28.8, SD_age = 2.4) were invited
into the lab kitchen and performed 34 individual search tasks of varying
difficulty in five blocks. The user interface (UI) of the kitchen assistant
was presented on a personal computer with integrated touch screen.
Task instructions were given verbally and all user clicks were recorded
by the computer system. (The experiment described here was embedded in
a larger usability study; see Quade et al. (2014) for more details. The
instructions are available for download at
http://www.tu-berlin.de/?id=135088.) Individual trials consisted of
five phases:
1. Listening to and memorizing the instructions for the given trial.
2. Entering the search criteria (e.g. "German" and "Main dish") by
clicking on the respective buttons on the screen. This could also
involve deselecting criteria from previous trials.
3. Initiating the search using a dedicated "Search" button. This also
initiated switching to a new screen containing the search results
list if this list was not yet present.
4. Selecting the target recipe (e.g. "Lamb chops") in the search
results list.
5. Answering a simple question about the recipe (e.g. "What is the
preparation time?") as displayed by the kitchen assistant after
having selected the recipe.
We did not analyze the first and last phase as they do not create
observable clicks on the touch screen. Of the remaining three phases,
entering search criteria and recipe selection are task-oriented, while
the intermediate Search-click is device-oriented.
Results
We recorded a total of 18 user errors. Four were intrusions, nine were
omissions, and five were selections of wrong recipes. The application
logic of the kitchen assistant inhibits overt errors during the
device-oriented step. We therefore focused on completion time as the
dependent variable and discarded all erroneous trials.
As our focus is on memory effects, we concentrated on steps that tax
only the memory and motor systems. We removed all subtasks that require
visual search and encoding (phase 4: searching for the target recipe in the
results list and clicking on it), and steps that incorporated substantial
computer system response times (i.e. moving to another UI screen).
817 clicks remained for further analysis; 361 (44 %) of these were
device-oriented. The average time to perform a click was 764 ms
(SD = 381) for task-oriented and 977 ms (SD = 377) for device-oriented steps.
As the kitchen assistant has been created for research in an area
different from HCI, it introduces interfering variables that need to be
controlled. The motor time needed to perform a click on a target
element (i.e. button) depends strongly on the size and distance of the
target as formalized in Fitts law (Fitts 1954). Fitts index of difficulty
(ID) cannot be held constant for the different types of clicks, we
therefore introduced it into the analysis. As the click speed (i.e. Fitts
law parameters) differs between subjects, we used linear mixed
models (NLME; Pinheiro et al. 2013) with subject as grouping factor
and Fitts law intercept and slope within subject. We also observed a
small, but consistent speed-up during the course of the experiment
that led us to the introduction of the trial block as additional interfering variable. The analysis of variance was conducted using R (R
Core Team 2014). All three factors yielded significant results, we
obtained a prolongation effect for device-oriented steps of 104 ms.
The results are summarized in Table 1.
Discussion
The first objective of this paper is met: we could identify a
significant execution time delay for device-oriented steps. How does
this effect relate to the existing literature? Ament et al. (2009)
report a non-significant difference of 181.5 ms between task-oriented
and device-oriented steps. This fits well with the empirical averages
reported at the beginning of the results section, although the
experimental procedure used there (a flight simulation game) led to
longer steps with
completion times well above two seconds.
What remains open is whether the proposed cognitive mechanism
behind the time difference, namely lack of activation, can account for
this time difference. The next section addresses this question.

Table 1 Regression coefficients (coef.) with confidence intervals (CI)
and analysis of variance results for the experiment

Factor name       Coef.     95 % CI of coef.    F(1,802)   p
Fitts' ID         165 ms    126 to 204 ms       111.1      <.001
trial block       -55 ms    -71 to -39 ms       45.9       <.001
Device-orient.    104 ms    53 to 154 ms        16.4       <.001

Individual slopes for Fitts' difficulty (ID) ranged from 121 to
210 ms/bit
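For illustration, the model specification described in the text (a
linear mixed model with subject as grouping factor and a per-subject
Fitts' law intercept and slope) can be written as the following Python
sketch. The analysis in the paper was done in R with nlme; the column
names and the simulated data below are hypothetical, with coefficients
chosen to roughly mimic Table 1:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 720                                  # hypothetical click log
df = pd.DataFrame({
    "subject":  np.repeat(np.arange(12), n // 12),
    "block":    np.tile(np.repeat(np.arange(5), 12), 12),
    "device_oriented": rng.integers(0, 2, n),
    "dist_px":  rng.uniform(50, 600, n),
    "width_px": rng.uniform(30, 120, n),
})

# Fitts' (1954) index of difficulty: ID = log2(2D / W).
df["fitts_id"] = np.log2(2 * df["dist_px"] / df["width_px"])

# Simulated click times, roughly mimicking the reported coefficients.
df["time_ms"] = (400 + 165 * df["fitts_id"] - 55 * df["block"]
                 + 104 * df["device_oriented"] + rng.normal(0, 150, n))

# Linear mixed model with subject as grouping factor and a per-subject
# Fitts'-law intercept and slope, analogous to the nlme analysis.
fit = smf.mixedlm("time_ms ~ fitts_id + block + device_oriented",
                  df, groups=df["subject"], re_formula="~fitts_id").fit()
print(fit.summary())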


Table 2 Average click time (Mtime), average memory retrieval time
(Mmem), determination coefficient (R2), root mean squared error (RMSE),
maximum likely scaled difference (MLSD), and maximum relative
difference (%diff) for different amounts of activation spreading (mas)

mas   Mtime     Mmem     R2     RMSE     MLSD   %diff
      1785 ms   591 ms   .759   982 ms   16.5   66 %
      1509 ms   315 ms   .738   687 ms   12.1   58 %
      1291 ms    99 ms   .881   477 ms    8.5   50 %
      1231 ms    37 ms   .912   422 ms    7.9   48 %
10    1210 ms    15 ms   .893   406 ms    7.8   48 %

The MFG model

We implemented the memory for goals theory based on the mechanisms
provided by the cognitive architecture ACT-R (Anderson et al. 2004), as
the MFG is originally based on the ACT-R theory (Altmann and Trafton
2002). Within ACT-R, memory decay is implemented based on a numerical
activation property belonging to every chunk (i.e. piece of knowledge)
in declarative memory. Associative priming is added by a mechanism
called spreading activation.

This led to the translation of the tasks used in our experiment into
chains of goal chunks. Every goal chunk represents one step towards the
target state of the current task. One element of the goal chunk ("slot"
in ACT-R terms) acts as a pointer to the next action to be taken. After
completion of the current step, this pointer is used to retrieve the
following goal chunk from declarative memory. The time required for
this retrieval depends on the activation of the chunk to be retrieved.
If the activation is too low, the retrieval may fail completely,
resulting in an overt error.

The cognitive model receives the task instructions through the auditory
system, just like the human participants did. For reasons of
simplicity, we reduced the information as much as possible. The user
instruction "Search for German dishes and select lamb chops", for
example, translates to the model instruction "German on; search push;
lamb-chops on". The model uses this information to create the necessary
goal chunks in declarative memory. No structural information about the
kitchen assistant is hard-coded into the model, only the distinction
that some buttons need to be toggled on, while others need to be
pushed.

While the model should in principle be able to complete the recipe
search tasks of our experiment with the procedural knowledge described
above, it actually breaks down due to lack of activation. Using
unaltered ACT-R memory parameters, the activation of the goal chunks is
too low to reach the target state (i.e. the recipe) of a given task. We
therefore need to strengthen our goals, and spreading activation is the
ACT-R mechanism that helps us do so. How we apply spreading activation
in our context is inspired by close observation of one of our subjects,
who used self-vocalization for memorizing the current task information.
The self-vocalization contained only the most relevant parts of the
task, which happen to be identical to the task-oriented steps of the
procedure. We analogously theorize that the goal states representing
task-oriented steps receive more spreading activation than their
device-oriented counterparts. This assumption is also in line with the
discussion of post-completion errors on the basis of the memory for
goals model in Altmann and Trafton (2002).

For the evaluation of the model, we used ACT-CV (Halbrügge 2013) to
connect it directly to the HTML-based user interface of the kitchen
assistant. In order to study the effect of spreading activation in
isolation, we disabled activation noise and manipulated the value of
the ACT-R parameter that controls the maximum amount of spreading
activation (mas). The higher this parameter, the more additional
activation is possible. (The ACT-R code of the model is available for
download at http://www.tu-berlin.de/?id=135088.)

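The activation arithmetic this account rests on can be sketched
compactly. The following Python lines implement the standard ACT-R
equations for base-level learning, spreading activation, and retrieval
latency; the parameter values and the toy chunk history are
illustrative assumptions, not the model's fitted settings:

import math

def base_level(ages, d=0.5):
    """ACT-R base-level learning: B = ln(sum of t_k^-d over presentations)."""
    return math.log(sum(t ** -d for t in ages))

def spreading(sources, fan, mas):
    """Spreading activation: sum of W_j * S_ji with S_ji = mas - ln(fan_j)
    and source weights W_j = 1/n (standard ACT-R assumptions)."""
    w = 1.0 / len(sources)
    return sum(w * (mas - math.log(fan[j])) for j in sources)

def retrieval_time(ages, sources, fan, mas, tau=0.0, latency_factor=1.0):
    """Return retrieval latency F * exp(-A) in seconds, or None if the
    total activation A falls below the retrieval threshold tau."""
    a = base_level(ages) + spreading(sources, fan, mas)
    if a < tau:
        return None                       # retrieval failure: overt error
    return latency_factor * math.exp(-a)  # more activation -> faster step

# Illustrative goal chunk, presented 5 s and 20 s ago, with one context
# source connected to two chunks (fan = 2). Values are demo assumptions.
fan = {"task-context": 2}
for mas in (1.0, 2.0, 4.0, 8.0):
    print(mas, retrieval_time([5.0, 20.0], ["task-context"], fan, mas))

With growing mas, the retrieval latency approaches zero, which is the
ceiling effect invoked in the discussion below; with too little
spreading activation, retrieval fails outright.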
Results
We evaluated the overall fit of the model by dividing the clicks into
eight groups by the screen areas of the origin and target click position
(e.g. from "type of dish" to "search"; from "search" to "recipe selection") and
compared the average click times per group between our human
sample and the model. Besides the traditional goodness of fit measures R2 and root mean squared error (RMSE), we applied the
maximum likely scaled difference (MLSD; Stewart and West 2010)
which also takes the uncertainty in the human data into account. The
relative difference between the empirical means and the model predictions is given in percent (%diff). The results for five different
amounts of activation spreading are given in Table 2.
The model is overall slower than the human participants, resulting in
moderately high values for RMSE, MLSD, and relative difference. The
explained variance (R2) on the other hand is very promising and hints at
the model capturing the differences between different clicks quite well.
Sensitivity and necessity analysis
In order to test whether our model also displays the device-orientation effect, we conducted a statistical analysis identical to the one
used on the human data and compared the resulting regression
coefficients. While an acceptable fit of the model is necessary to
support the activation spreading hypothesis, it is not sufficient to
prove it. By manipulating the amount of activation spreading, we can
perform a sensitivity and necessity analysis that provides additional
insight about the consequences of our theoretical assumptions (Gluck
et al. 2010). Average coefficients from a total of 400 model runs are
displayed in Fig. 1. It shows an inverted U-shaped relationship
between spreading activation and the device-orientation effect. For
intermediate spreading activation values, the time delay predicted by
the model falls within the confidence interval of the empirical coefficient, i.e. a perfect fit given the uncertainty in the data.

Fig. 1 Device orientation effect size [ms] as a function of the ACT-R
activation spreading parameter (mas). The shaded area between the
dotted lines demarks the 95 % confidence interval of the effect in the
human sample

Discussion
The MFG model is able to replicate the effects that we found in our
initial experiment. The model being overall slower than the human
participants could be caused by the rather low Fitts' law parameter
used within ACT-R (100 ms/bit) compared to the 165 ms/bit that we
observed.
Spreading activation is not only necessary for the model to be able
to complete the tasks, but also to display the device-orientation effect
(Fig. 1). We can infer that the activation assumption is a sound
explanation of the disadvantage of device-oriented steps. Too much
spreading activation reduces the effect again, though. This can be
explained by a ceiling effect: The average retrieval time gets close to
zero for high values of mas (Mmem in Table 2), thereby diminishing
the possibility for timing differences.
How relevant is a 100 ms difference in real life? Probably not very
relevant by itself. What makes it important is its connection to user
errors. Errors themselves are hard to provoke in the lab without adding
secondary tasks that interrupt the user or create strong working
memory strain, thereby substantially lowering external validity.
Conclusions
The concept of device-orientation versus task-orientation is an
important aspect of humancomputer interaction. We could replicate
that the device-oriented parts of simple goal-directed action
sequences take approximately 100 ms longer than their task-oriented
counterparts. With the help of cognitive modeling, associative priming could be identified as a possible explanation for this effect.
Acknowledgments
The authors gratefully acknowledge financial support from the German
Research Foundation (DFG) for the project "Automatische
Usability-Evaluierung modellbasierter Interaktionssysteme für Ambient
Assisted Living" (AL-561/13-1).

References
Altmann EM, Trafton JG (2002) Memory for goals: an activation-based
model. Cogn Sci 26(1):39-83
Ament MG, Blandford A, Cox AL (2009) Different cognitive mechanisms
account for different types of procedural steps. In: Taatgen NA,
van Rijn H (eds) Proceedings of the 31st annual conference of the
cognitive science society, Amsterdam, NL, pp 2170-2175
Ament MGA (2011) The role of goal relevance in the occurrence of
systematic slip errors in routine procedural tasks. Dissertation,
University College London
Anderson JR, Bothell D, Byrne MD, Douglass S, Lebiere C, Qin Y (2004)
An integrated theory of the mind. Psychol Rev 111(4):1036-1060
Byrne MD, Davis EM (2006) Task structure and postcompletion error in
the execution of a routine procedure. Hum Factors 48(4):627-638
Fitts PM (1954) The information capacity of the human motor system in
controlling the amplitude of movement. J Exp Psychol 47(6):381-391
Gluck KA, Stanley CT, Moore LR, Reitter D, Halbrügge M (2010)
Exploration for understanding in cognitive modeling. J Artif Gen
Intell 2(2):88-107
Halbrügge M (2013) ACT-CV: bridging the gap between cognitive models
and the outer world. In: Brandenburg E, Doria L, Gross A, Günzler T,
Smieszek H (eds) Grundlagen und Anwendungen der
Mensch-Maschine-Interaktion - 10. Berliner Werkstatt
Mensch-Maschine-Systeme. Universitätsverlag der TU Berlin, Berlin,
pp 205-210
John BE, Kieras DE (1996) Using GOMS for user interface design and
evaluation: which technique? ACM Trans Comput Hum Interact (TOCHI)
3(4):287-319
Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team (2013) nlme:
linear and nonlinear mixed effects models. R package version 3.1-113
Quade M, Halbrügge M, Engelbrecht KP, Albayrak S, Möller S (2014)
Predicting task execution times by deriving enhanced cognitive models
from user interface development models. In: Proceedings of the 2014
ACM SIGCHI symposium on engineering interactive computing systems.
ACM, New York, pp 139-148
R Core Team (2014) R: a language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria.
http://www.R-project.org. Accessed 7 May 2014
Stewart TC, West RL (2010) Testing for equivalence: a methodology for
computational cognitive modelling. J Artif Gen Intell 2(2):69-87
Trafton JG, Altmann EM, Ratwani RM (2009) A memory for goals model of
sequence errors. In: Howes A, Peebles D, Cooper RP (eds) Proceedings
of the 9th international conference on cognitive modeling,
Manchester, UK

How action effects influence dual-task performance

Markus Janczyk, Wilfried Kunde

Department of Psychology III, University of Würzburg, Germany
Doing multiple tasks at once typically involves performance costs in
at least one of these tasks. This unspecific dual-task interference
occurs regardless of the exact nature of the tasks. On top of that,
several task characteristics determine how well tasks fit with each
other. For example, if two tasks require key press responses with the
left and right hands, performance (even in the first performed Task
1) is better if both responses entail the same spatial characteristic
(i.e., if two left or two right responses are required compared with
when one left and one right response is required), the so-called
backward-crosstalk effect (BCE; Hommel, 1998). Similarly, a
mental rotation is faster when it is preceded by or performed simultaneously with a manual rotation into the same direction compared to
when both rotations go into opposite directions (Wexler, Kosslyn, &
Berthoz, 1998; Wohlschläger & Wohlschläger, 1998). These examples
are cases of specific dual-task interference.
Given that the aforementioned tasks require some form of motor
output, one may ask: how is this motor output selected? A simple
solution to this question has been offered already by philosophers of
the 19th century (e.g., Harle, 1861) and has experienced a revival in
psychology in recent decades (e.g., Hommel, Musseler, Aschersleben,
Prinz, 2001): the ideomotor theory (IT). The basic idea of IT is that,
first, bidirectional associations between motor output and its consequences (= action effects) are learned. Later on, this bidirectionality is
exploited for action selection: motor output is accessed by mentally
anticipating the action effects. Conceptually, action effects can be
distinguished as being environment-related (e.g., a light that is switched on by pressing a key) or body-related (e.g., the proprioceptive
feedback from bending the finger).
Against this background, consider again the case of mental and
manual rotations. Turning a steering wheel clockwise gives rise to
body-related proprioceptive feedback resembling a clockwise turn
and even to obvious environment-related action effects, because one
sees his/her hand and the wheel turning clockwise. According to IT,
exactly these effects are anticipated to select the motor output.
However, the rotation directions of (anticipated) effects and of the
actual motor output are confounded then. Consequently, one may
wonder whether the manual rotation or rather the (anticipated) effect
rotation is what determines the specific interference with a mental
rotation. The same argument applies to the BCE: pressing the left of
two response keys requires anticipation of, for example, a left body-
related action effect, which is thus similarly confounded with the
spatial component of the actual motor output. Given the importance
IT attributes to action effects for action selection, we hypothesized
that action effects determine the size and direction of specific interference in such cases. We here present results from two studies that
aimed to disentangle the contributions of motor output and the
respective action effects. Since it is rather difficult to manipulate
body-related action effects, the approach was to couple the motor
output with environment-related action effects.
In a first study we have investigated the interplay of manual and
mental rotations (Janczyk, Pfister, Crognale, Kunde, 2012). To disentangle the directions of manual and effect rotations, we resorted to
an instrument from aviation known as the attitude indicator or artificial horizon. This instrument provides the pilot with information
about deviations from level-flight (perfect horizontal flying). Notably,
two versions of this instrument are available (Previc, Ercoline, 1999;
see also Fig. 1). In a plane-moving display, the horizon remains fixed
and turns of a steering wheel are visualized by the corresponding
turns of the plane. Consequently, turning a steering wheel counterclockwise results in an action effect rotating into the same direction.
Obviously, manual and effect rotation are confounded but provide a
benchmark against which the critical condition using a horizonmoving display can be compared. In this display, the plane remains
fixed but the horizon rotates. Consequently, turning the steering wheel
counter-clockwise gives rise to an action effect turning clockwise. In
our experiments, a mental rotation task (Shepard, Metzler, 1971) was
followed by a manual rotation task that required turning a steering
wheel. The plane's curve due to this steering wheel turn was either
visualized with the plane-moving or the horizon-moving display.
First, with the plane-moving display the manual rotation was initiated
faster when the preceding mental rotation went into the same direction (essentially replicating the Wohlschläger and Wohlschläger, 1998,
and Wexler et al. 1998, results but with the reversed task order).
Second, with the horizon-moving display, the manual rotation was
initiated faster when the preceding mental rotation was into the
opposite direction (Exp 3). Here, however, the mental and the effect
rotation were into the same direction. Thus, these results suggest that
important for the specific interference between mental and manual
rotations is not so much the motor output itself but rather what follows from this motor output as a consequence.
In a second study we used a similar strategy to investigate the
origin of the BCE (Janczyk, Pfister, Hommel, Kunde, 2014). In
Experiment 1 of this study, participants were presented with colored
letters as stimuli (a green or red H or S). Task 1 required a response to
the color with a key press of the left hand (index or middle finger) and
Task 2 required a subsequent key press depending on the letter
identity with the right hand (index or middle finger). The BCE in this
case would be reflected in better performance in Task 1 if both tasks
required a left or a right response (compatible R1-R2 relations)
compared to when one task required a left response and the other a
right response (incompatible R1-R2 relations). The critical addition

was that pressing a response key with the right hand briefly flashed a
left or right light (i.e., an environment-related action effect), and this
was the participants' goal. One group of participants (the R2-E2
compatible group; see Fig. 2, left part) flashed the left light with a left
key press (of the right hand) and the right light with a right key press
(of the right hand). This group produced a BCE, that is, better Task 1
performance was observed with compatible R1-R2 relations (see
Fig. 2, middle part). Again though, relative locations of motor output
and action effects were confounded. Therefore, another group of
participants (the R2-E2 incompatible group; see Fig. 2, right part)
flashed the left light with a right key press (of the right hand) and the
right light with a left key press (of the right hand). Now Task 1
performance was better with incompatible R1-R2 relations (see
Fig. 2, middle part). This, however, means that the relative locations
of (body-related) action effects of the Task 1 response and the
environment-related action effects of Task 2 were compatible. This
basic outcome was replicated with continuous movements and action
effects (Exp 2) and also when both tasks resulted in environment-related action effects (Exp 3).
The generative role of anticipated action effects for action
selection, a pillar of IT, has been investigated in single task settings
in numerous studies. The studies summarized in this paper extend
this basic idea to dual-task situations and tested our assertion that
mainly the (anticipated) action effects determine the size and
direction of specific interference phenomena. In sum, the results
presented here provide evidence for this (see also Janczyk, Skirde,
Weigelt, Kunde, 2009, for converging evidence). In broader terms,
action effects can be construed as action goals. Thus, it is not so
much the compatibility of motor outputs and effectors but rather the
compatibility/similarity of action goals that induces performance
costs or facilitation. Such an interpretation also bears potential for
improving dual-task performance and ergonomic aspects in, for
example, working environments.

Fig. 1 Illustration of the tasks used by Janczyk et al. (2012)

Fig. 2 Illustration of the tasks used by Janczyk et al. (2014) and the
results of their Experiment 1. Error bars are within-subject standard
errors (Pfister, Janczyk 2013), computed separately for each R2-E2
relation group (see also Janczyk et al. 2014)

Acknowledgments
This research was supported by the German Research Foundation
(Deutsche Forschungsgemeinschaft, DFG; projects KU 1964/2-1, 2).
References
Harleß E (1861) Der Apparat des Willens. Z Philos philos Kritik
38:50-73
Hommel B (1998) Automatic stimulus-response translation in dual-task
performance. J Exp Psychol Human 24:1368-1384
Hommel B, Müsseler J, Aschersleben G, Prinz W (2001) The theory of
event coding: a framework for perception and action. Behav Brain Sci
24:849-878
Janczyk M, Pfister R, Crognale MA, Kunde W (2012) Effective rotations:
action effects determine the interplay of mental and manual
rotations. J Exp Psychol Gen 141:489-501
Janczyk M, Pfister R, Hommel B, Kunde W (2014) Who is talking in
backward crosstalk? Disentangling response- from goal-conflict in
dual-task performance. Cognition 132:30-43
Janczyk M, Skirde S, Weigelt M, Kunde W (2009) Visual and tactile
action effects determine bimanual coordination performance. Hum
Movement Sci 28:437-449
Pfister R, Janczyk M (2013) Confidence intervals for two sample means:
calculation, interpretation, and a few simple rules. Adv Cogn Psychol
9:74-80
Previc FH, Ercoline WR (1999) The outside-in attitude display concept
revisited. Int J Aviat Psychol 9:377-401
Shepard RN, Metzler J (1971) Mental rotation of three-dimensional
objects. Science 171:701-703
Wexler M, Kosslyn SM, Berthoz A (1998) Motor processes in mental
rotation. Cognition 68:77-94
Wohlschläger A, Wohlschläger A (1998) Mental and manual rotation.
J Exp Psychol Human 24:397-412

Introduction of an ACT-R based modeling approach to mental rotation

Fabian Joeres, Nele Russwinkel

Technische Universität Berlin, Department of Cognitive Modeling
in Dynamic Human-Machine Systems, Berlin, Germany
Introduction
The cognitive processes of mental rotation as postulated by Shepard
and Metzler (1971) have been extensively studied throughout the last
decades. With the introduction of numerous human-machine interface
concepts that are integrated into the human's spatial environment
(e.g. augmented-reality interfaces such as Google Glass or
virtual-reality interfaces such as Oculus Rift), human spatial
competence and
its understanding have become more and more important. Mental
rotation is seen as one of three main components of human spatial
competence (Linn and Petersen, 1985). A computational model of
mental rotation was developed to help understand the involved cognitive processes. This model integrates a wide variety of empirical
findings on mental rotation. It was validated in an experimental study
and can be seen as a promising approach for further modeling of more
complex, application-oriented tasks that include spatial cognitive
processes.
Mental Rotation
In an experiment on object recognition, Shepard and Metzler (1971)
displayed two abstract three-dimensional objects from different perspectives to their participants. Both images showed either the same
object (same-trials) or mirrored versions of the same object (different-trials). The objects were rotated around either the vertical axis or
within the picture plane. Subjects were asked to determine if both
images showed the same object. The authors found that the reaction
time needed to match two objects forms a linear function of the
angular disparity between those objects. The slope of that linear
function is called the rotation rate. Following Shepard and Metzler's
interpretation of an analogue rotation process, this means that a
high rotation rate represents slow rotation, whereas fast rotation is
expressed by a low rotation rate.
Since Shepard and Metzler's (1971) experiment, numerous studies
have been conducted on the influences that affect the rotation rate of
mental rotation. Based on these findings and on process concepts
suggested by various authors, a cognitive model has been developed.
The following section summarizes the three main assumptions that
the model is based on.
First, it is assumed that the linear dependence of angular displacement
and reaction time is based on an analogue transformation of mental
images. Besides this widely found linearity, this claim is supported by
psychophysiological findings that are summarized in Kosslyn (1996).
The second assumption stems from findings about the influence of
object complexity (e.g. Bethell-Fox and Shepard, 1988; Yuille and
Steiger, 1982). It is assumed that objects can be rotated holistically if
they are sufficiently familiar. If an object is not, it will be broken down
into its components until these components are simple (i.e., familiar)
enough to rotate. These components are then rotated one after another.
Third, the mental images that are maintained and transformed
throughout the rotation task are assumed to be subject to activation
processes. This means that they have to be reactivated during the
process. This assumption is suggested by Kosslyn (1996) and fits
Cowan's (1999) activation of working memory contents. It is furthermore
supported by Just and Carpenter's (1976) results: analyzing eye
movements during a mental rotation task, the authors found frequent
fixation changes between both object images.
Cognitive Model
A full description of the model is beyond the scope of this paper. This
section gives a short overview of the process steps that were derived from
the above-mentioned assumptions. The described process applies to
mental rotation tasks in which both stimuli are presented simultaneously.
1. Stimulus encoding
The first image is encoded and a three-dimensional object representation (mental image) is created.
2. Memory retrieval
Based on the three-dimensional representation, long-term memory is
queried to check whether the encoded object is familiar enough to process
its representation. If so, the representation is stored in working
memory and the second image is encoded. The created representation
is used as reference in the following process steps (reference image).
If the object is not familiar enough, the same retrieval is conducted for
an object component and the information about the remaining component(s) is stored in working memory.
3. Initial Search
Several small transformations (i.e., rotations around different axes
by only a few degrees) are applied to the mental image that was first
created (the target image). After each small rotation, the model evaluates whether
the rotation reduced the angular disparity between both mental images.
The most promising rotation axis is chosen. The decision (as well as the
monitoring process in the following step) is based on previously identified corresponding elements of the object representations.
4. Transform and Compare
After defining the rotation axis in step 3, the target image is rotated
about this axis. During this process, the target representation's
orientation is constantly monitored and compared to the reference
representation.
The rotation is stopped when both representations are aligned.
5. Confirmation
If the object was processed piecemeal, the previously defined
rotation is applied to the remaining object components. After that,
propositional descriptions of all object parts are created for both
mental images. A comparison of these delivers a decision for "same
object" or "different objects".
6. Reaction
Based on the decision, a motor response is triggered.
Steps 3, 4, and 5 are inspired by the equally named process steps
suggested by Just and Carpenter (1976). However, although their
purpose is similar to that of Just and Carpenter's steps, the details of
these sub-processes are different.

Furthermore, due to the abovementioned activation processes, steps
3 to 5 can be interrupted if the activation of one or both mental images
falls below a threshold. In that case, a reactivation sub-process is triggered that includes re-encoding of the corresponding stimulus.
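To illustrate the control flow of steps 3 and 4, the following Python sketch treats mental images as 3-D point clouds, probes small rotations about each signed axis, and then rotates stepwise until alignment. This is only an illustration of the described loop under these assumptions, not the authors' ACT-R implementation; all names and the point-cloud representation are invented here.

import numpy as np

def rot(axis, deg):
    # Rotation matrix about a unit axis by deg degrees (Rodrigues' formula).
    a = np.asarray(axis, float) / np.linalg.norm(axis)
    t = np.radians(deg)
    K = np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(t) * K + (1 - np.cos(t)) * (K @ K)

def disparity(target, reference):
    # Mean point distance as a simple stand-in for angular disparity.
    return np.mean(np.linalg.norm(target - reference, axis=1))

def transform_and_compare(target, reference, step=5.0, tol=1e-3, max_steps=200):
    # Step 3 (initial search): probe small rotations about each signed axis
    # and keep the one that reduces the disparity the most.
    axes = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    best = min(axes, key=lambda ax: disparity(target @ rot(ax, step).T, reference))
    # Step 4 (transform and compare): rotate in small increments about the
    # chosen axis while monitoring alignment with the reference image.
    for n in range(max_steps):
        if disparity(target, reference) < tol:
            return n * step          # total rotation applied until alignment
        target = target @ rot(best, step).T
    return max_steps * step

obj = np.random.default_rng(1).normal(size=(8, 3))             # a toy "mental image"
print(transform_and_compare(obj @ rot((0, 0, 1), 60).T, obj))  # prints 60.0

In such a sketch the returned total rotation angle is proportional to the predicted reaction time, mirroring the linearity assumption stated above.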
The model was implemented within the cognitive architecture ACT-R (Anderson et al. 2004). Since ACT-R does not provide structures for modeling spatial cognitive processes, an architecture extension based on Gunzelmann and Lyon's (2007) concept was developed. Adding the structures for spatial processes to the architecture will enable ACT-R modelers to address a broad range of applied tasks that rely on spatial competence.
Experiment
Empirical validation is an integral part of the model development process. Therefore, an experimental study was conducted to test the assumptions about training effects and to predict these effects with the abovementioned model.
Experimental approach
The experimental task was a classic mental rotation task with three-dimensional objects, as in Shepard and Metzler's (1971) study. In this task, two images were displayed simultaneously. Reaction times were measured for correctly answered same-trials; different-trials and erroneous responses involve cognitive processes that are not addressed by the discussed cognitive model.
To test the assumption of object-based learning, the objects occurred with different frequencies. The entire stimulus set consisted of nine objects, adopted from the stimulus collection of Peters and Battista (2008). One third of all trials included the same object, making this familiar object occur four times as often as each of the other eight unfamiliar objects. Which object served as the familiar one was counterbalanced across participants.
To capture training effects, the change of rotation rates was
monitored. Following the approach of Tarr and Pinker (1989), the
experiment was divided into eight blocks. In each block one rotation
rate for the familiar object and one rotation rate for all unfamiliar
objects were calculated, based on the measured reaction times.
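The abstract does not spell out how a rotation rate is derived from the reaction times; a standard approach in mental rotation research estimates it from the slope of reaction time over angular disparity. A minimal sketch with invented numbers:

import numpy as np

# Hypothetical reaction times (ms) of correct same-trials in one block,
# one mean per angular disparity (degrees); values invented for illustration.
disparity_deg = np.array([20, 40, 60, 80, 100, 120, 140, 160])
rt_ms = np.array([620, 700, 810, 880, 990, 1060, 1150, 1240])

slope_ms_per_deg, intercept_ms = np.polyfit(disparity_deg, rt_ms, 1)
rotation_rate = 1000.0 / slope_ms_per_deg   # degrees per second
print(round(rotation_rate, 1))              # rate estimate for this block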
The described model is designed to predict learning-induced changes in the rotation rate. As Schunn and Wallach (2005) suggest, two measures of the model's goodness of fit were used to evaluate the model. As the proportion of data variance that can be accounted for by the model, r² measures how well the model explains trends in the experimental data. RMSSD (Root Mean Squared Scaled Deviation), in contrast, represents the model's absolute deviation from the data, scaled to the experimental data's standard error.
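A minimal sketch of both measures, assuming vectors of model predictions, observed means, and standard errors; the numbers below are invented, and r² is computed here as the squared Pearson correlation, one common reading of "proportion of variance accounted for":

import numpy as np

def rmssd(model, data, se):
    # Absolute model-data deviation scaled to the data's standard error.
    model, data, se = map(np.asarray, (model, data, se))
    return np.sqrt(np.mean(((model - data) / se) ** 2))

def r_squared(model, data):
    # Squared Pearson correlation between predictions and observed means.
    return np.corrcoef(model, data)[0, 1] ** 2

data = [150, 170, 185, 200, 210, 215]    # invented block-wise rates (deg/s)
model = [120, 145, 165, 180, 190, 196]
se = [8, 8, 9, 9, 10, 10]
print(rmssd(model, data, se), r_squared(model, data))

A model can thus fit the trend well (high r²) while deviating strongly in absolute terms (high RMSSD), which is exactly the pattern reported below.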
Experimental Design
The study used a two-factorial within-subjects design with repeated measurements. The first independent variable, experimental block, had eight levels and represents the participants' state of practice. The second independent variable was object familiarity (two levels: familiar object vs. unfamiliar objects). As dependent variable, two rotation rates were calculated per block and subject (one for each class of objects).
Sample
27 subjects (18 female, 9 male) participated in the study. The participants' age ranged from 20 to 29 years (M = 26.1). Two persons received course credit; the others were paid 10 € for participation.
Procedure
After receiving instructions, the participants completed eight experimental blocks, each including 48 trials. Of these 48 trials, 16 displayed the familiar object. Half of the trials were same-trials, the other half different-trials.
Results
The experiment was repeated for two subjects because the number of
correct same-trials was too low to calculate valid rotation rates in numerous blocks. Also, the last two experimental blocks were
excluded from data analysis because fatigue interfered with the
training effects. Generally, the expected training and object familiarity effects occurred, as reported in Joeres and Russwinkel
(accepted).
The effects found in the experiment (Ex) and predicted by the model (M) are displayed in Fig. 1. It can be seen that the predicted rotation rates are considerably lower than those found experimentally. A possible explanation for this disparity can be found in the abovementioned reactivation process that includes re-encoding of the stimuli. The model, however, does not claim to address stimulus encoding validly; duration differences in this process can therefore cause the deviation from the data. Nevertheless, the trends, i.e. the shape of the learning curves, are validly predicted by the model, both for the familiar object and for the unfamiliar objects.
This first impression is confirmed by the goodness-of-fit measures listed in Table 1. Although no gold standard exists for these measures, it can be said that the absolute value deviation is rather high, with a mean RMSSD = 4.53. The data trends, however, were matched rather well, as indicated by the high r² values (Fig. 1).
Discussion
The presented study showed that the model can validly replicate certain training effects in mental rotation. It can therefore be seen as a promising approach for modeling mental rotation and, with further research, mental imagery.
As briefly discussed, the model assumptions are partially based on eye movement data. Therefore, further model validation data should be provided in a follow-up study in which eye movements during a mental rotation task are predicted by the model and evaluated experimentally.

Table 1 Goodness-of-fit measures

Condition            RMSSD    r²
Familiar object      5.03     .74
Unfamiliar object    4.02     .80
Mean                 4.53     .77

Fig. 1 Experimental (Ex) and model (M) data

If that study is successful, the model can be extended to further
types of stimuli and to more complex, application-oriented tasks
including mental imagery.
References
Anderson JR, Bothell D, Byrne MD, Douglass S, Lebiere C, Qin Y (2004) An integrated theory of the mind. Psychol Rev 111(4):1036–1060. doi:10.1037/0033-295X.111.4.1036
Bethell-Fox CE, Shepard RN (1988) Mental rotation: effects of stimulus complexity and familiarity. J Exp Psychol Human Percept Performance 14(1):12–23
Cowan N (1999) An embedded-process model of working memory. In: Miyake A, Shah P (eds) Models of working memory: mechanisms of active maintenance and executive control. Cambridge University Press, Cambridge, pp 62–101
Gunzelmann G, Lyon DR (2007) Mechanisms for human spatial competence. In: Barkowsky T, Knauff M, Ligozat G, Montello DR (eds) Spatial cognition V: reasoning, action, interaction, pp 288–308
Joeres F, Russwinkel N (accepted) Object-related learning effects in mental rotation. In: Proceedings of the Spatial Cognition 2014, Bremen
Just MA, Carpenter PA (1976) Eye fixations and cognitive processes. Cogn Psychol 8(4):441–480. doi:10.1016/0010-0285(76)90015-3
Kosslyn SM (1996) Image and brain: the resolution of the imagery debate, 1st edn. A Bradford book. MIT Press, Cambridge
Linn MC, Petersen AC (1985) Emergence and characterization of sex differences in spatial ability: a meta-analysis. Child Dev 56:1479–1498
Peters M, Battista C (2008) Applications of mental rotation figures of the Shepard and Metzler type and description of a mental rotation stimulus library. Brain Cogn 66(3):260–264. doi:10.1016/j.bandc.2007.09.003
Schunn CD, Wallach D (2005) Evaluating goodness-of-fit in comparison of models to data. In: Psychologie der Kognition: Reden und Vorträge anlässlich der Emeritierung von Werner Tack
Shepard RN, Metzler J (1971) Mental rotation of three-dimensional objects. Science 171:701–703
Shepard S, Metzler D (1988) Mental rotation: effects of dimensionality of objects and type of task. J Exp Psychol Human Percept Performance 14(1):3–11
Tarr MJ, Pinker S (1989) Mental rotation and orientation-dependence in shape recognition. Cogn Psychol 21(2):233–282. doi:10.1016/0010-0285(89)90009-1
Yuille JC, Steiger JH (1982) Nonholistic processing in mental rotation: some suggestive evidence. Percept Psychophys 31(3):201–209

Processing linguistic rhythm in natural stories: an fMRI study
Katerina Kandylaki1, Karen Bohn1, Arne Nagels1, Tilo Kircher1, Ulrike Domahs2, Richard Wiese1
1 Philipps-Universität Marburg, Germany; 2 Universität zu Köln, Germany
Keywords
Rhythm rule, Speech comprehension, rhythmic irregularities, fMRI
Abstract
Language rhythm is assumed to involve an alternation of strong and weak beats within a certain linguistic domain, although the beats are not necessarily isochronously distributed in natural language. However, in certain contexts, as for example in compound words, rhythmically induced stress shifts occur in order to comply with the so-called Rhythm Rule (Liberman, Prince 1977). This rule operates when two stressed adjacent syllables create a stress clash or adjacent unstressed syllables
(stress lapse) occur. Experimental studies on speech production, judgment of stress perception, and event-related potentials (ERPs) (Bohn,
Knaus, Wiese, Domahs 2013) have found differences in production,
ratings, and ERP components respectively, between well-formed
structures and rhythmic deviations. The present study builds up on these
findings by using functional magnetic resonance imaging (fMRI) in
order to localize rhythmic processing (within the concept of Rhythm
Rule) in the brain. Other fMRI studies on linguistic stress found effects
in the supplementary motor area, insula, precuneus, superior temporal
gyrus, parahippo- campal gyrus, calcarine gyrus and inferior frontal
gyrus (Domahs, Klein, Huber, Domahs 2013; Geiser, Zaehle, Jancke,
Meyer 2008; Rothermich, Kotz 2013). However, what other studies
have not investigated yet is rhythm processing in natural contexts, thus
in the course of a story which is not further controlled for a metrically
isochronous speech rhythm. Here we examine the hypotheses that a)
well-formed structures are processed differently than rhythmic deviations in compound words for German, b) this happens in speech
processing of stories in the absence of a phonologically related task
(implicit rhythm processing).
Our compounds consisted of three parts (A(BC)) that form a premodifier-noun combination. The premodifier was either a monosyllabic noun (Holz, 'wood') or a disyllabic noun (Plastik, 'plastic') with lexical stress on the initial syllable. The premodifier was followed by a disyllabic noun bearing compound stress on the initial syllable in isolation (Spielzeug, 'toy'). When these two word structures are combined, the premodifier bears overall compound stress and the initial stress of the disyllabic noun should be shifted rightwards to its final syllable, in order to be in accordance with the Rhythm Rule: Holz-spiel-zeug ('wooden toy(s)'). On the other hand, if the disyllabic noun is combined with a preceding disyllabic noun bearing initial stress, a shift is unnecessary, allowing for the stress pattern Pla-stik-spiel-zeug ('plastic toy(s)'). The first condition we call SHIFT and the second NO SHIFT. In contrast to these well-formed conditions we induced rhythmically ill-formed conditions: CLASH for the case that Holz-spiel-zeug keeps the initial stress of the disyllabic noun, and LAPSE when we introduce the unnecessary shift in Pla-stik-spiel-zeug. We constructed 20 word pairs following the same stress patterns as Holz-/Plastikspielzeug and embedded them in 20 two-minute-long stories. Our focus when embedding the conditions was the naturalness of the stories. For example, the word pair Holzspielzeug vs. Plastikspielzeug would appear in the following contexts: 'The clown made funny grimaces, reached into his red cloth bag and threw a small wooden toy to the lady in the front row.' vs. 'The toys, garden chairs and pillows remained outside, however. The mother wanted to tidy up the plastic toys from the garden after dinner.'
We obtained images (3 T) of 20 healthy right-handed German monolinguals (9 male), employing a 2×2 design: well-formedness (rhythmically well-formed vs. ill-formed) × rhythm-trigger (monosyllabic vs. disyllabic premodifier). Subjects were instructed to listen carefully and were asked two comprehension questions after each story. On the group level we analyzed the data in the 2×2 design mentioned above. Our critical events were the whole compound words. We report clusters of p < .005 and volumes of at least 72 voxels (Monte Carlo corrected).
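For clarity, the condition-to-factor mapping can be restated in code form; the snippet below merely re-expresses the design described above and is not analysis code.

# The four conditions of the 2x2 design, coded by their factor levels
# (well-formedness x rhythm-trigger); labels follow the text, the
# dictionary itself is only an illustration.
conditions = {
    "SHIFT":    {"well_formed": True,  "premodifier": "monosyllabic"},
    "CLASH":    {"well_formed": False, "premodifier": "monosyllabic"},
    "NO SHIFT": {"well_formed": True,  "premodifier": "disyllabic"},
    "LAPSE":    {"well_formed": False, "premodifier": "disyllabic"},
}
# Main effects contrast the two levels of one factor; the reported
# interaction compares well-formedness effects across premodifier types.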
For the main effect of well-formedness we found effects in the left cuneus, precuneus and calcarine gyrus. For the main effect of rhythm-trigger we found no significant differences at this supra-threshold level, which was expected since we did not hypothesize an effect of the length of the premodifier. Our main finding is the interaction of well-formedness and rhythm-trigger in the precentral gyrus bilaterally and in the right supplementary motor area (SMA). Since the interaction was significant, we calculated theoretically motivated pairwise contrasts within one rhythm-trigger level. For the monosyllabic premodifier, CLASH vs. SHIFT revealed no significant clusters, but, interestingly, the opposite contrast (SHIFT vs. CLASH) showed differences in the right superior frontal gyrus, right inferior frontal gyrus
(rIFG, BA 45), right lingual and calcarine gyrus, bilateral precentral gyrus (BA 6, BA 4), and left precentral gyrus (BA 3a). For the disyllabic premodifier, LAPSE vs. NO SHIFT significantly activated the left inferior temporal gyrus, left parahippocampal gyrus, left insula, bilateral superior temporal gyrus (STG), and right pre- and postcentral gyrus. NO SHIFT vs. LAPSE significantly activated the right lingual gyrus and the calcarine gyrus bilaterally. We finally compared the two rhythmically ill-formed structures, LAPSE vs. CLASH, and found significant activation in the right supplementary motor area and premotor cortex.
Our findings are in line with previous fMRI findings on rhythmic processing. Firstly, the superior temporal gyrus is robustly involved in rhythmic processing irrespective of the task of the study: a semantic and a metric task (Rothermich, Kotz 2013), speech perception of incorrectly vs. correctly stressed words (Domahs, Klein, Huber, Domahs 2013), and explicit and implicit isochronous speech rhythm tasks (Geiser, Zaehle, Jäncke, Meyer 2008). To this we can add our careful-listening task, which is comparable to the semantic task of Rothermich and Kotz (2013). Our contribution is that we found activations for the implicit task of careful listening that had previously only been found for explicit tasks: these include the left insula, the bilateral precentral gyrus, the precuneus and the parahippocampal gyrus. Lastly, the activation in the supplementary motor areas completes the picture of rhythm processing regions in the brain. This finding is of special interest since it was strong for the comparison within the rhythmically ill-formed conditions, LAPSE vs. CLASH. This might be due to the fact that stress lapse structures contain two violations, i.e. a deviation from word stress that is not rhythmically licensed, while clash structures contain only a rhythmic deviation but keep the original word stress.
The differences in activations found for well-formedness show that even in implicit rhythmic processing the language parser is sensitive to subtle deviations in the alternation of strong and weak beats. This is particularly evident in the STG activation, associated with the processing of linguistic prosody; the SMA activation, which has been suggested to be involved in temporal aspects of the processing of sequences of strong and weak syllables; and the IFG activation, associated with tasks requiring more demanding processing of suprasegmental cues.
References
Bohn K, Knaus J, Wiese R, Domahs U (2013) The influence of rhythmic (ir)regularities on speech processing: evidence from an ERP study on German phrases. Neuropsychologia 51(4):760–771
Domahs U, Klein E, Huber W, Domahs F (2013) Good, bad and ugly word stress – fMRI evidence for foot structure driven processing of prosodic violations. Brain Lang 125(3):272–282
Geiser E, Zaehle T, Jäncke L, Meyer M (2008) The neural correlate of speech rhythm as evidenced by metrical speech processing. J Cogn Neurosci 20(3):541–552
Liberman M, Prince A (1977) On stress and linguistic rhythm. Linguist Inq 8:249–336
Rothermich K, Kotz SA (2013) Predictions in speech comprehension: fMRI evidence on the meter–semantic interface. Neuroimage 70:89–100

Numbers affect the processing of verbs denoting movements in vertical space
Martin Lachmair1, Carolin Dudschig2, Susana Ruiz Fernández1, Barbara Kaup2
1 Leibniz Knowledge Media Research Center (KMRC); 2 Psychology, University of Tübingen, Germany
Recent studies have shown that nouns referring to objects that typically appear in the upper or lower visual field (e.g., roof vs. root) or verbs referring to movements in vertical space (e.g., rise vs. fall)
facilitate upwards or downwards oriented sensorimotor processes,
depending on the meaning of the word that is being processed
(Lachmair, Dudschig, De Filippis, de la Vega and Kaup 2011;
Dudschig, Lachmair, de la Vega, De Filippis and Kaup 2012). This
finding presumably reflects an association of words with experiential
traces in the brain that stem from the reader's interactions with the respective objects and events in the past. When the words are later processed in isolation, the respective experiential traces become
reactivated, providing the possibility of interactions between language
processing and the modal systems (cf. Zwaan and Madden 2005).
Such interactions are also known from other cognitive domains, such
as for instance number processing (Fischer, Castel, Dodd and Pratt
2003). Here high numbers facilitate sensorimotor processes in upper
vertical space and low numbers in lower vertical space (Schwarz and
Keus 2004). The question arises whether the observed spatial-association effects in the two domains are related. A recent study
conducted in our lab investigated this question. The reasoning was as
follows: If number processing activates spatial dimensions that are
also relevant for understanding words, then we can expect that processing numbers may influence subsequent lexical access to words.
Specifically, if high numbers relate to upper space, then they can be expected to facilitate the understanding of an up-word such as bird. The opposite should hold for low numbers, which should facilitate the understanding of a down-word such as root. This is exactly what we found in an experiment in which participants saw one of four digits (1, 2, 8, 9) prior to the processing of up- and down-nouns in a lexical decision task (Lachmair, Dudschig, de la Vega and Kaup 2014). In the present study we aimed at extending these findings by investigating whether priming effects can be observed for the processing of verbs referring to movements in the vertical dimension (e.g., rise vs. fall).
Method
Participants (N = 34) performed a lexical decision task with 40 verbs
denoting an up- or downwards oriented movement (e.g., rise vs. fall)
and 40 pseudowords. Verbs were controlled for frequency, length and
denoted movement direction. The words were preceded by a number,
one of the set {1, 2, 8, 9}. Correctly responding to the verbs required a
key press on the left in half of the trials and on the right in the other
half. The order of the response mapping was balanced across participants. Each trial started with a centered fixation cross (500 ms),
followed by a number (300 ms). Afterwards the verb/pseudo-word
stimulus appeared immediately and stayed until response. Response
times (RTs) were measured as the time from stimulus onset to the key
press response. Each stimulus was presented eight times, resulting in a total of 640 experimental trials (320 verb trials + 320 pseudoword trials), subdivided into 8 blocks separated by a self-paced break with error feedback. Each experimental half started with a short practice block. To ensure the processing of the digits, the participants were informed beforehand that they would have to report the numbers they had seen in a short questionnaire at the end of the experiment. The design of the experiment was a 2 (number magnitude: low vs. high) × 2 (verb direction: up vs. down) × 2 (response mapping) design with repeated measures on all variables.
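To make the trial structure concrete, the following Python sketch lays out the timeline and the congruency coding. The timings follow the text; the example verbs and all names are hypothetical, and the original experiment software is not known.

import random

FIXATION_MS, PRIME_MS = 500, 300                   # timings from the text
primes = [1, 2, 8, 9]                              # low vs. high number primes
words = {"up": ["steigen"], "down": ["fallen"]}    # hypothetical example verbs

trials = [(p, d, w) for p in primes for d, ws in words.items() for w in ws]
random.shuffle(trials)

for prime, direction, word in trials:
    magnitude = "high" if prime > 5 else "low"
    congruent = (magnitude == "high") == (direction == "up")
    # Present fixation (FIXATION_MS), then the prime (PRIME_MS), then the
    # word until the key press; RT is measured from word onset.
    print(prime, magnitude, word, direction,
          "congruent" if congruent else "incongruent")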
Results
The data of six participants were excluded due to a high number of errors (>10 %) in all conditions. Responses to pseudowords, responses faster than 200 ms, and errors were excluded from further analyses. We found no main effect of number magnitude (Fs < 1.8), no effect of response mapping (Fs < 1), but a main effect of verb direction, with faster responses for down- compared to up-verbs (F1(1,26) = 5.61, p < .05; F2 < 1; 654 ms vs. 663 ms). Interestingly, we also found a significant interaction of number magnitude and verb direction, F1(1,26) = 5.23, p < .05; F2(1,38) = 3.46, p = .07, with slower responses in congruent compared to incongruent trials (up-verbs: 668 ms vs. 658 ms; down-verbs: 654 ms vs. 654 ms). To obtain
more information with regard to whether this effect depends on how deeply the number primes were processed, we conducted post hoc analyses. Participants were subdivided into two groups, with Group 1 including all participants who had correctly reported the numbers at the end of the experiment (N = 14), and Group 2 including the remaining participants. Group 1 again showed an interaction between number magnitude and verb direction (F1(1,26) = 9.47, p < .01; F2(1,38) = 3.38, p = .07); however, Group 2 did not (Fs < 1). Mean RTs of both groups are displayed in Fig. 1.
Discussion
The present findings show that reading verbs denoting an up- or
downwards oriented movement is affected by the preceding processing of high and low numbers. As such, the presented findings
provide evidence for the view that spatial associations observed in
number and word processing may share a common basis (Barsalou
2008). Interestingly, in contrast to Lachmair et al. (2014), the results
show interference instead of facilitation in spatially congruent conditions. Possibly, this deviating findings reflect the fact that verbs
referring to movements in vertical space (such as rise or fall) are
rather complex and implicitly refer to two spatial locations, namely
the starting point and the end point of the movement. Maybe participants dynamically simulated the described movements beginning
with the starting point. Considering that verbs are assumed to trigger
complex comprehension processes (see Vigliocco, Vinson, Druks,
Barber and Cappa 2011), it seems plausible to assume that our
experimental task may have tapped into early rather than late simulation processes. This in turn may explain why interference rather
than facilitation was observed in the present experiments.
One could of course argue that this explanation is not very convincing considering that the study by Dudschig et al. (2012) also
presented participants with verbs referring to upwards or downwards
directed movements (as in the current study) and nevertheless
observed facilitation in spatially congruent conditions, not interference. However, we think that differences concerning temporal aspects
of the experimental task may explain the different results. The study
by Dudschig et al. (2012) investigated the speed with which upwards
or downwards directed movements could be initiated by the participants after having processed the motion verbs, whereas the current
study investigated the effects of number primes that were presented
prior to the processing of the motion verbs. Thus, it seems quite
possible that the task in Dudschig et al. tapped into later simulation
processes than the task in the current study.
Of course, future studies are needed to directly investigate this
post hoc explanation of our results. One possibility would be to
change the temporal aspects of the experimental task in the current
paradigm such that participants spend more time processing the verbs,
giving later simulation processes a chance to occur. Another possibility would be to present participants with verbs in the past perfect
denoting a movement that has already taken place in the past (e.g.,
gestiegen [had risen]). Maybe participants then focus more on the end
point of the denoted movement, leading to facilitation effects in
spatially congruent conditions.
One further aspect of the results obtained in the present study calls
for discussion. The interaction effect between number and word
processing was only observed for those participants who could correctly recall the number primes at the end of the experiment. One
possible explanation is that the number primes need to be processed at
a certain level of processing in order for them to affect the subsequent
processing of direction associated words. This would suggest that the
interaction effect between number and word processing is not fully
automatic. Another possibility is that those participants who did not
recall the number primes correctly at the end of the experiment
simply did not adequately follow the instructions and strategically
ignored the number primes because they were of no relevance to the
experimental task. If so, it would be of no surprise that these participants did not experience any interference from the number primes,
and no further conclusions could be drawn. One interesting manipulation for follow-up studies to the current experiment would be to
present the number primes for a very short duration and/or framed by
a visual mask (see Dudschig et al. 2014). An interaction effect
between number and word processing under these conditions would
provide strong evidence for the view that spatial associations in
number and word processing indeed share a common basis independent of any strategic behavior of inter-relating the two domains.
Acknowledgments
We thank Elena-Alexandra Plaetzer for her assistance in data collection. This work was supported by a grant from the German
Research Foundation (SFB 833/B4 [Kaup/Leuthold]).

Fig. 1 Mean RT of correct responses as a function of verb direction (up vs. down) and number magnitude (high vs. low). Participants in Group 1 correctly recalled the number primes at the end of the experiment, participants in Group 2 did not. Error bars represent the 95 % confidence interval for within-subject designs (Masson and Loftus 2003)


References
Barsalou LW (2008) Grounded cognition. Annu Rev Psychol 59:617–645
Dudschig C, de la Vega I, De Filippis M, Kaup B (2014) Language and vertical space: on the automaticity of language action interconnections. Cortex 58:151–160
Dudschig C, Lachmair M, de la Vega I, De Filippis M, Kaup B (2012) Do task-irrelevant direction-associated motion verbs affect action planning? Evidence from a Stroop paradigm. Mem Cogn 40(7):1081–1094
Fischer MH, Castel AD, Dodd MD, Pratt J (2003) Perceiving numbers causes spatial shifts of attention. Nat Neurosci 6(6):555–556
Lachmair M, Dudschig C, de la Vega I, Kaup B (2014) Relating numeric cognition and language processing: do numbers and words share a common representational platform? Acta Psychol 148:107–114
Lachmair M, Dudschig C, De Filippis M, de la Vega I, Kaup B (2011) Root versus roof: automatic activation of location information during word processing. Psychon Bull Rev 18:1180–1188
Masson MEJ, Loftus GR (2003) Using confidence intervals for graphically based data interpretation. Can J Exp Psychol 57:203–220
Schwarz W, Keus IM (2004) Moving the eyes along the mental number line: comparing SNARC effects with saccadic and manual responses. Percept Psychophys 66:651–664
Vigliocco G, Vinson DP, Druks J, Barber H, Cappa SF (2011) Nouns and verbs in the brain: a review of behavioural, electrophysiological, neuropsychological and imaging studies. Neurosci Biobehav Rev 35(3):407–426
Zwaan RA, Madden CJ (2005) Embodied sentence comprehension. In: Pecher D, Zwaan RA (eds) Grounding cognition: the role of perception and action in memory, language, and thinking. Cambridge University Press, Cambridge, pp 224–245

Is joint action necessarily based on shared intentions?


Nicolas Lindner, Gottfried Vosgerau
Department of Philosophy, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany
Abstract
Regarding joint action, the majority of researchers in the field assumes that underlying collective or joint intentions are the glue that holds the respective actions of the participants together (Searle 1990; Bratman 1993; Tuomela 1988). A major part of the debate thus focuses on the nature of these particular intentions. In this talk, we will describe one major account and argue that it cannot explain joint action as displayed by small children. Based on this critique, we will formulate an alternative view, which suggests that some non-demanding cases of (seemingly) joint action (including those displayed by small children) are rather effects of the lack of representing one's own intentions as one's own (an intention is just represented as being there). This account has the advantage of offering a way to specify the pivotal role that joint action is supposed to play in the acquisition of socio-cognitive abilities.
A prominent approach to joint intentions by Michael Bratman (1993, 2009) construes shared intentions, as he calls them, as being derived from singular intentions, a conception of which he developed in his book Intention, Plans, and Practical Reason from 1987. In a nutshell, Bratman characterizes intentions in this book as conduct-controlling pro-attitudes, a term by Davidson (1980) describing an agent's mental attitude directed toward an action under a certain description. For Bratman, intentions are typically parts of larger plans concerning future actions. He regards these plans as mental states, which often are only partial and involve a hierarchy of general and more specific intentions.
Bratman's account of shared intention (1993) relies on his conception of individual intentions, the attitudes of the participants in joint action, and their interrelations, and is thus constructivist in nature. In his account, a shared intention doesn't consist in a particular type of intention. He proposes a complex of intentions and attitudes that, if they have the appropriate content and function properly, do the job of a shared intention and can be identified with it. This complex is supposed to do three interrelated jobs: it should 1) coordinate the agents' intentional actions in such a way that the joint goal can be achieved by acting together, 2) help in coordinating the relevant planning of the participants, and 3) provide a framework that helps to structure relevant bargaining. According to Bratman, fulfilling this three-fold function is a necessary condition for any kind of shared intention.
With regard to a complex that does all of these jobs, Bratman suggests three sufficient conditions to describe a substantial account of shared intention: 1) There should be an individual intention of each participant of the joint action in the form of 'I intend that we J'. 2) These individual intentions should be held in part because of and in accordance with the relevant intentions of the other partakers in the joint action. 3) The two aforementioned conditions have to be common knowledge among the participants.

It is Bratman's main argument that the described complex of interrelated intentions and attitudes functions together as one characteristic form of shared intention (2009). Due to the constructivist and functionalist nature of his approach, it may not be the only kind of shared intention. The author himself admits the possibility that there may be other kinds and that shared intention may thus be multiply realizable.
Bratman's conception of shared intention seems to be a convincing characterization of how cognitively mature agents act together. In contrast to this, some researchers doubt whether his approach is suited to account for joint action in young children. This issue is closely related to the developmental onset of socio-cognitive abilities. The common knowledge condition of Bratman's substantial account presupposes that the system of intentions in question is in the public domain. Furthermore, there has to be mutual knowledge of the others' intentions plus knowledge of the others' knowledge. The cognitive basis for common knowledge thus rests on a variety of capacities. The agents in joint action ought to have: a) the ability to form beliefs and higher-order beliefs (beliefs about beliefs), b) the ability to attribute mental states to themselves and others, and c) the capacities needed for recursive mindreading. All in all, they must thus have a robust theory of mind. With respect to this, critics of Bratman's account state that he characterizes shared intention in a way that is too complex to accommodate the joint action of young children (Tollefsen 2005; Pacherie 2011; Butterfill 2012).
Tollefsen's (2005) critique is based on evidence suggesting that young children lack a robust theory of mind, particularly a proper understanding of others' beliefs. This evidence comes from different false-belief tasks (Wellman et al. 2001). Without such a proper understanding of other agents' beliefs, so Tollefsen argues, the common knowledge condition in Bratman's conception could not be fulfilled. Hence, children could not take part in shared intentional activities of such a sort. Similarly, Pacherie (2011) claims that Bratman's shared intention requires cognitively sophisticated agents who have both concepts of mental states like intentions and attitudes and the ability to represent the mental states of others. From her point of view, small children lack such fully developed mentalizing and meta-representational capacities. Therefore, shared intention cannot account for joint action in young children. A problem for Bratman's account thus stems from the fact that there is evidence of children engaging in joint activities before they develop the putatively necessary socio-cognitive abilities. Findings from developmental psychology (Brownell 2011) suggest that children engage in different forms of joint action together with adults from around 18 months of age and, from the end of the second year of life, also with peers.
We will show that said criticisms rest on rather shaky grounds. First, they both attack Bratman's substantial account, which only presents sufficient conditions for the presence of a shared intention. Thus, there might be other constructions in a Bratmanian sense that avoid these flaws. Furthermore, both critiques rely on controversial empirical claims about the onset of children's mindreading capacities, for example with respect to the starting point of false-belief understanding in children (Baillargeon et al. 2010; De Bruin and Newen 2012) and the development of an early understanding of common knowledge (Carpenter 2009). Thus, the critiques by Pacherie and Tollefsen do not present convincing arguments against Bratman's account per se. Still, they highlight an important issue by questioning the cognitive standards imposed on participating agents in joint action.
Butterfill (2012) takes a different route in criticizing Bratman's approach. His objection focuses on the necessary conditions for shared intention: the functional roles that shared intention is supposed to play. Butterfill claims that the coordinating and structuring of relevant bargaining, which shared intention is supposed to ensure, sometimes requires monitoring or manipulating other agents' intentions. With regard to accounts that stress the importance of joint action for cognitive and socio-cognitive development in infants (Tomasello et al. 2005; Moll and Tomasello 2007), joint action would thus presuppose psychological concepts and capacities whose development it is supposed to explain in the first place. The contribution of joint activities to the development of our cognitive capacities is the core argument of Tomasello and colleagues' hypothesis on shared intentionality. As long as one stresses its role in cognitive and socio-cognitive development, Butterfill claims, early joint action of children can hence not involve shared intention in Bratman's sense.
Bratman's conception thus cannot account for children's joint actions, at least if it is supposed to explain the development of their understanding of minds. Yet his approach is suited to explain the joint action of adults as mature cognitive agents, especially those kinds of joint action that involve planning, future-directed intentions and deliberation.
We will conclude our talk by offering an alternative account of children's ability for joint action, which turns, in a way, the circularity upside down: if joint action is indeed pivotal for the development of socio-cognitive abilities, these abilities cannot already be developed in small children. Thus, joint action as displayed by small children has to be grounded in other abilities. Our proposal is that it is the lack of the concept of a mental state (especially intentions) that produces behavior which looks like joint action (we will not discuss whether the term should be applied to these cases or not). If a child has not yet learned that a mental state is something that belongs to single persons, it cannot be said to have acquired the concept of a mental state. However, the child might be, at the same time, able to introspect the content of its own intentions, such that the child's introspection can be paraphrased as 'there is the intention to J'. In other words, the child has not yet learned to distinguish between its own intentions and those of others. This lack of abilities will result in behavior that looks like joint action (at least in cases in which the intention of the adult and the child match). Such behavior might be initiated by different triggers in the surrounding world that establish a common goal in the first place. Candidates for this could be pointing gestures, affordances and alignment between the agents.
This account not only offers new perspectives for the explanation of autism (Frith 1989; Vosgerau 2009), it also offers a way to specify the thesis that (seemingly) joint action is pivotal to the acquisition of socio-cognitive abilities: joint action sets up an environment in which children are able to gradually learn that intentions can differ between individuals. The result of this learning phase will ultimately be the acquisition of the concept of a mental state, which includes that mental states belong to persons and can thus differ between individuals (this knowledge is then tested in the false-belief task). In other words, the learning of a theory of mind starts with acquiring the concept of a mental state, and this concept can be best acquired in (seemingly) joint action scenarios, in which children directly experience the effects of differing mental states (intentions and beliefs). Accordingly, empirical research has already suggested that the acquisition of mental state concepts is dependent on the use of mental state terms (Rakoczy et al. 2006), which are presumably most often used in joint action scenarios.
Some empirical results have been interpreted to show that very young children already possess the socio-cognitive abilities needed for cooperative activities and act on a rather sophisticated understanding of the mental states of self and other (Carpenter 2009). Following this line of argument, researchers propose that infants already understand others' knowledge and ignorance (Liszkowski et al. 2008), that they can act on a shared goal (Warneken et al. 2006; Warneken and Tomasello 2007), and that they exploit the common ground they share with an adult (Liebal et al. 2009; Moll et al. 2008). While appreciating the importance of this research as such, we will present alternative interpretations of these findings that are cognitively less demanding and thus consistent with our proposal.

Our alternative account is primarily designed to explain the behavior of small children. However, we point to the possibility that non-demanding cases of cooperation (e.g., buying an item in a grocery store) can be explained by similar mechanisms in adults. In such cases, adults would not explicitly represent their own intentions as their own intentions, thereby generating actions that are structurally similar to those of small children. Nevertheless, other more complex cases of joint action certainly also exist in adults. In the light of our proposal, we thus also conclude that Bratman's account of shared intention should not be abandoned altogether. Although a uniform account of joint action for both children and mature agents would have the benefit of being parsimonious, candidates for such a comprehensive explanation (Tollefsen and Dale 2012; Vesper et al. 2010; Gold and Sugden 2007) do not seem to have the resources to explain the development of qualitatively differing stages of joint action.
References
Baillargeon R, Scott RM, He Z (2010) False-belief understanding in infants. Trends Cogn Sci 14(3):110–118. doi:10.1016/j.tics.2009.12.006
Bratman M (1987) Intention, plans, and practical reason. CSLI Publications/Center for the Study of Language & Information
Bratman M (1993) Shared intention. Ethics 104(1):97–113
Bratman M (2009) Shared agency. In: Philosophy of the social sciences: philosophical theory and scientific practice. Cambridge University Press
Brownell CA (2011) Early developments in joint action. Rev Philos Psychol 2(2):193–211. doi:10.1007/s13164-011-0056-1
Butterfill S (2012) Joint action and development. Philos Quart 62(246):23–47. doi:10.1111/j.1467-9213.2011.00005.x
Carpenter M (2009) Just how joint is joint action in infancy? Top Cogn Sci 1(2):380–392. doi:10.1111/j.1756-8765.2009.01026.x
Davidson D (1980/2001) Essays on actions and events, 2nd edn. Oxford University Press, USA
De Bruin LC, Newen A (2012) An association account of false belief understanding. Cognition 123(2):240–259. doi:10.1016/j.cognition.2011.12.016
Frith U (1989/2003) Autism: explaining the enigma, 2nd edn. Blackwell Publ, Malden
Gold N, Sugden R (2007) Collective intentions and team agency. J Philos 104(3):109–137
Liebal K, Behne T, Carpenter M, Tomasello M (2009) Infants use shared experience to interpret pointing gestures. Dev Sci 12(2):264–271. doi:10.1111/j.1467-7687.2008.00758.x
Liszkowski U, Carpenter M, Tomasello M (2008) Twelve-month-olds communicate helpfully and appropriately for knowledgeable and ignorant partners. Cognition 108(3):732–739. doi:10.1016/j.cognition.2008.06.013
Moll H, Carpenter M, Tomasello M (2007) Fourteen-month-olds know what others experience only in joint engagement. Dev Sci 10(6):826–835. doi:10.1111/j.1467-7687.2007.00615.x
Moll H, Tomasello M (2007) Cooperation and human cognition: the Vygotskian intelligence hypothesis. Philos Trans R Soc B Biol Sci 362(1480):639–648. doi:10.1098/rstb.2006.2000
Pacherie E (2011) Framing joint action. Rev Philos Psychol 2(2):173–192
Rakoczy H, Tomasello M, Striano T (2006) The role of experience and discourse in children's developing understanding of pretend play actions. Br J Dev Psychol 24(2):305–335. doi:10.1348/026151005X36001
Searle J (1990) Collective intentions and actions. In: Cohen P, Morgan J, Pollack ME (eds) Intentions in communication. Bradford Books, MIT Press, Cambridge
Tollefsen D (2005) Let's pretend! Children and joint action. Philos Soc Sci 35(1):75–97. doi:10.1177/0048393104271925

Tollefsen D, Dale R (2012) Naturalizing joint action: a process-based approach. Philos Psychol 25(3):385–407. doi:10.1080/09515089.2011.579418
Tomasello M, Carpenter M, Call J, Behne T, Moll H (2005) Understanding and sharing intentions: the origins of cultural cognition. Behav Brain Sci 28(5):675–691
Tuomela R, Miller K (1988) We-intentions. Philos Stud 53(3):367–389
Vesper C, Butterfill S, Knoblich G, Sebanz N (2010) A minimal architecture for joint action. Neural Netw 23(8–9):998–1003. doi:10.1016/j.neunet.2010.06.002
Vosgerau G (2009) Die Stufentheorie des Selbstbewusstseins und ihre Implikationen für das Verständnis psychiatrischer Störungen. J für Philos Psychiatrie 2
Warneken F, Chen F, Tomasello M (2006) Cooperative activities in young children and chimpanzees. Child Dev 77(3):640–663. doi:10.1111/j.1467-8624.2006.00895.x
Warneken F, Tomasello M (2007) Helping and cooperation at 14 months of age. Infancy 11(3):271–294. doi:10.1111/j.1532-7078.2007.tb00227.x
Wellman HM, Cross D, Watson J (2001) Meta-analysis of theory-of-mind development: the truth about false belief. Child Dev 72(3):655–684. doi:10.1111/1467-8624.00304

A general model of the multi-level architecture of mental phenomena. Integrating the functional paradigm and the mechanistic model of explanation
Mike Ludmann
University of Duisburg-Essen, Germany
The central aim of this contribution is to provide a conceptual foundation of psychology in terms of the formulation of a general model of an architecture of mental phenomena. It will be shown that the mechanistic model of explanation (Bechtel, Richardson 1993; Machamer, Darden and Craver 2000; Bechtel 2007, 2008, 2009; Craver 2007) offers an appropriate founding approach to psychology as well as to its integration within the framework of the cognitive and brain sciences. Although the computational model of mind provides important models of mental properties and abilities, it fails to provide an adequate multi-level model of mental properties. The mechanistic approach, however, can be regarded as a conceptually coherent and scientifically plausible extension of the functional paradigm (see Polger 2004; Eronen 2010). While a functionalist conception of the mind mostly focuses on the mysterious relationship of mental properties as abstract or second-order properties to their physical realizers (if such issues are not generally excluded), the mechanistic approach allows establishing a multi-level architecture of mental properties and their unambiguous localization in the overall scientific system.
The functionalist models of the mind are usually based on the computer metaphor of man that construes human beings as information processing systems. They postulate relatively abstract theoretical models of mental processes that allow generally very reliable predictions of the subsequent behavior of the system under consideration of known input variables. The models provide a way to put some cognitive (functionalist) operators, such as thinking, decision making, and planning, into the black box of behaviorism. Taking into account the current interdisciplinary research on the mind, the functionalist conception of mind, which defines these operators as abstract information processing that can be described independently of neuroscientific constraints, is problematic. If the question is raised how the connection between functional models and the mind is established, Marr (1982) proposes that computational
processes of his model of visual information processing (e.g., the generation of a three-dimensional depth structure) are specified by particular formal algorithms, which are physically implemented in the human brain. Therefore it is recognized that functional processes also have a physical reality, but functional models fail to provide a framework for the exact circumstances, conditions, constraints, etc. of such implementation relations. Admittedly, the connectionist approach has fulfilled this task better by generating models of neural networks that are more likely to describe the actual processes in our minds (see Rumelhart, McClelland 1986; Smolensky 1988), but ultimately it does not offer a clear multi-level model of the mind either.
It is important that the kind of physical implementation described by Marr is usually understood in terms of physical realization. Therefore, the causal profile of an abstract functional property (behavioral inputs and outputs) must be determined by a conceptual analysis in order to identify those physical (neural) structures that have exactly that causal profile (cf. Levine 1993; Kim 1998, 2005). The realization theory may be intended to provide an explanatory approach of how abstract, functionally characterized properties as postulated by the cognitive sciences can be part of the physical world. An abstract, theoretical phenomenon is realized (quasi materialized) in this sense through concrete physical conditions, while different physical systems can bring the computational or connectionist formalism into the world (see Fodor 1974). The ontological status of an abstract functional description or a second-order property remains highly questionable.
In contrast, much is gained if the functionalist approach is extended and partially adjusted by the mechanistic rendition of mental properties. A mechanism can be understood as a set of activities organized 'such that they exhibit the phenomenon to be explained' (Craver 2007, p 5). The mechanistic approach individuates a phenomenon by the tasks or causal roles it holds for the system concerned. So if the mechanism behind a phenomenon is explored, one has explained the phenomenon itself. As Bechtel (2008) says, a mechanism is 'a structure performing a function in virtue of its component parts, component operations, and their organization' (p 13). Figure 1 shows the general formal structure of mechanistic levels.
The explanatory phenomenon at the top of the mechanism (S) is ψ. The suffix '-ing' and the course of the arrows are meant to express the process-related nature of mechanisms. The phenomenon ψ can be decomposed into subcomponents: Craver uses X as a term for the entities functioning as components of S, and φ as a name for their activity patterns. While functionalism and realization theory focus on the relationship of abstract information processing and certain processes in the brain, the

Fig. 1 Formal structure of a mechanism (from Craver 2007, p 189)

mechanistic approach extends this concern to a question of
embedding a given (mental) phenomenon in a structural hierarchy
of natural levels of organization characterized by the part-whole
relationship.
If we take a cognitive property like spatial orientation or spatial memory, it is not simply a question of which brain structure realizes this property; rather, it has to be shown which causally relevant mechanisms are installed at various levels of a mereologically construed mechanistic hierarchy (see Craver 2007). Thus the functional structure, as described by cognitive science, is undoubtedly an explanatorily essential description of this mental property. So we can, for example, explain the behavior of a person in a given situation in terms of the components and predictions of working memory theory (Baddeley 1986). But the same mental event can be described at different levels of organization. In this way the mental event has a neuronal structure which, among other things, consists of hippocampal activity. In addition, the mental property has a molecular reality which is primarily characterized by NMDA receptor activation, and so on. So a mental phenomenon has a (potentially infinite) sequence of microstructures, none of which can be understood as the actual reality of the target property.
From the fact that the installed part-whole relation implies a spatio-temporal coextensivity of the different microstructures, so I will argue, it can be deduced that we have a mereologically based form of psychophysical identity. Nevertheless, this identity thesis does not have the crude reductionistic implications of the classical philosophical thesis of psychophysical identity (see Place 1956; Smart 1959). Likewise, it can be shown that the dictum that only functionalism guarantees the autonomy of psychology (Fodor 1974) and that this autonomy is jeopardized by every conception of psychophysical identity is fundamentally wrong. Quite the opposite is true. If we strictly follow Fodor, then psychological concepts and theories that have little inter-theoretical fit or a low degree of correspondence to physical processes are to be preferred. Especially under these conditions, psychology risks falling prey to crudely reductionist programs such as new wave reductionism, which holds that an inferior theory without good inter-theoretical fit should be replaced by lower-level theories (Bickle 1998, 2003). Even worse, because of the rejection of any conception of psychophysical identity, psychologists would have to accept a microphysicalism entailing that micro levels have an ontological and explanatory priority. On the basis of the mechanistic approach (and its identity-theoretical interpretation), both the integrity of psychology and the inter-theoretical fit of its concepts and theories can be justified. Mental properties form a higher level in the natural organization of a (human) organism, but at the same time they form a mutually inseparable unit with their physical microstructures.
It is the mental properties that characterize diffuse nexuses of neuronal events in terms of certain functional units in the first place. In this sense the mind is the structure-forming or shaping principle at all levels of the natural organization of the brain. Not despite but because of its coextensivity with diverse natural organizational levels is the mental both a real and a causally potent phenomenon. Despite the fact that some characteristics of mental phenomena can be well explained with recourse to these micro levels of, e.g., neurobiology, there is neither an ontological nor an explanatory primacy of the micro levels or their explanations. The adoption of such primacy is merely the product of a cognitive bias, a misguided interpretation of scientific explanations and the process of scientific knowledge discovery (Wimsatt 1976, 1980, 2006).

References
Baddeley AD (1986) Working memory. Oxford University Press,
Oxford

Bechtel W (2007) Reducing psychology while maintaining its autonomy via mechanistic explanation. In: Schouten M, Looren de Jong H (eds) The matter of the mind: philosophical essays on psychology, neuroscience and reduction. Blackwell, Oxford, pp 172–198
Bechtel W (2008) Mental mechanisms: philosophical perspectives on cognitive neuroscience. Psychology Press, New York
Bechtel W (2009) Looking down, around, and up: mechanistic explanation in psychology. Philos Psychol 22:543–564
Bechtel W, Richardson RC (1993) Discovering complexity: decomposition and localization as strategies in scientific research. MIT Press, Cambridge
Bickle J (1998) Psychoneural reduction: the new wave. MIT Press, Cambridge
Bickle J (2003) Philosophy and neuroscience: a ruthlessly reductive account. Kluwer, Dordrecht
Craver CF (2007) Explaining the brain: mechanisms and the mosaic unity of neuroscience. Clarendon Press, Oxford
Eronen MI (2010) Replacing functional reduction with mechanistic explanation. Philosophia Naturalis 47/48:125–153
Fodor JA (1974) Special sciences (or: the disunity of science as a working hypothesis). Synthese 28:97–115
Kim J (1998) Mind in a physical world. MIT Press, Cambridge
Kim J (2005) Physicalism, or something near enough. Princeton University Press, Princeton
Levine J (1993) On leaving out what it's like. In: Davies M, Humphreys GW (eds) Consciousness: psychological and philosophical essays. Blackwell, Oxford, pp 121–136
Machamer P, Darden L, Craver CF (2000) Thinking about mechanisms. Philos Sci 67:1–25
Marr D (1982) Vision. Freeman and Company, New York
Place UT (1956) Is consciousness a brain process? Br J Psychol 47:44–50
Polger TW (2004) Natural minds. MIT Press, Cambridge
Rumelhart DE, McClelland JL (1986) Parallel distributed processing: explorations in the microstructure of cognition. MIT Press, Cambridge
Smart JJC (1959) Sensations and brain processes. Philos Rev 68:148–156
Smolensky P (1988) On the proper treatment of connectionism. Behav Brain Sci 11:1–23
Wimsatt WC (1976) Reductionism, levels of organization, and the mind–body problem. In: Globus G, Maxwell G, Savodnik I (eds) Consciousness and the brain. Plenum, New York, pp 205–267
Wimsatt WC (1980) Reductionistic research strategies and their biases in the units of selection controversy. In: Nickles T (ed) Scientific discovery: case studies. D. Reidel, Dordrecht, pp 213–259
Wimsatt WC (2006) Reductionism and its heuristics: making methodological reductionism honest. Synthese 151:445–475

A view-based account of spatial working and long-term memories: model and predictions
Hanspeter A. Mallot, Wolfgang G. Röhrich, Gregor Hardiess
Cognitive Neuroscience, Dept. of Biology, University of Tübingen,
Germany
Abstract
Space perception provides egocentric, oriented views of the environment from which working and long-term memories are constructed.
Allocentric (i.e. position-independent) long-term memories may be
organized as graphs of recognized places or views, but the interaction
of such cognitive graphs with egocentric working memories is
unclear. Here, we present a simple coherent model of view-based
working and long-term memories, and review supporting evidence

Cogn Process (2014) 15 (Suppl 1):S1S158


from behavioral experiments. The model predicts (i) that within a
given place, memories for some views may be more salient than
others, (ii) imagery of a target place should depend on the location
where the recall takes place and (iii) that ecall avors views of the
target place which would be obtained when approaching it from the
current recall location.
Keywords
Spatial cognition, Working memory, Imagery, View-based
representation, Spatial updating
Introduction
Sixteen years before his famous paper on the cognitive map, Edward C. Tolman gave an account of rat spatial learning in terms of what he called the "means-ends-field" (Tolman 1932, 1948), of which a key diagram is reproduced in Fig. 1. The arrows indicate means-ends-relations, i.e. expectations that a rat has learned about which objects can be reached from which other ones, and how. In modern terms, the means objects (MO in the figure) are intermediate goals or representational states that the rat is in or expects to get into. This graph approach to spatial memory was later elaborated by Kuipers (1978) and is closely related to the routes vs. maps distinction discussed by O'Keefe and Nadel (1978). Behavioral evidence for a graph-like organization of human spatial memory has been reviewed, e.g., by Wang, Spelke (2002) and Mallot, Basten (2009).
The graph-based approach to cognitive mapping, powerful as it may appear, leaves open a number of important questions, two of which will be addressed in this paper. First, what is the nature of the nodes of the graph? In Tolman's account, the means objects are intervening objects, the passage along each object being a means to reach the next one or the eventual goal. Kuipers (1978) thinks of the nodes as places defined by a set of sensory stimuli prevailing at each place. The resulting idea of a place-graph can be relaxed to a view-graph in which each node represents an observer pose (position plus orientation), again characterized by sensory inputs, which are now egocentric, oriented views (Schölkopf, Mallot 1995).
The second question concerns the working memory stage needed, among other things, as an interface between the cognitive graph and perception and behavior, particularly in the processes of planning routes from long-term memory and of encoding new spatial information into long-term memory. For such working memory structures, local, metric maps are generally assumed, representing objects and landmarks at certain egocentric locations (Byrne et al. 2007; Tatler, Land 2011; Loomis et al. 2013). While these models offer plausible explanations for many effects in spatial behavior, they are hard to reconcile with a view-based rather than object-based organization of long-term memory, which will have to interact with the working

Fig. 1 Tolman's notion of spatial long-term memory as a means-ends-field (from Tolman 1932). This seems to be the first account of the cognitive map as a graph of states (objects) and actions (means-ends-relations) in which alternative routes can be found by graph search

memory. As a consequence, computationally costly transformations
between non-egocentric long-term memories and egocentric working
memories are often assumed.
In this paper, we give a consistently view-based account of spatial
working- and long-term memories and discuss a recent experiment
supporting the model.
View-based spatial memory
Places
By the term view, we denote an image of an environment taken at a view-point x and oriented in a direction u. Both x and u may be specified with respect to a reference frame external to the observer, but this is not of great relevance for our argument. Rather, we assume that each view is stored in relation to other views taken at the same place x but with various viewing directions u. The views of one place combine into a graph with a simple ring topology in which views taken with neighboring viewing directions are connected by a graph link (see Fig. 2a). This model of a place representation differs from the well-known snapshot model from insect navigation (Cartwright, Collett 1982; for the role of snapshots in human navigation, see Gillner et al. 2008) by replacing the equally sampled, panoramic snapshot with a set of views that may sample different viewing directions with different numbers of views. It is thus similar to view-based models of object recognition, where views may also be sampled inhomogeneously over the sides or aspects of an object (Bülthoff, Edelman 1992). As in object recognition, places may therefore have canonical views from which they are most easily recognized.
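A minimal sketch may make the ring-topology place representation concrete (our own illustration, not the authors' implementation; all names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class View:
    place: str           # label of the place where the view was taken (x)
    heading_deg: float   # viewing direction u, in degrees
    features: object = None  # e.g., an image descriptor used for recognition

def place_ring(views):
    """Connect the views of one place into a ring ordered by heading.

    Returns a list of (view, view) links between directionally
    neighboring views. Sampling may be inhomogeneous: canonical
    viewing directions are simply represented by more views.
    """
    ordered = sorted(views, key=lambda v: v.heading_deg)
    return [(ordered[i], ordered[(i + 1) % len(ordered)])
            for i in range(len(ordered))]
```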
Long-term memory
The graph approach to spatial long-term memory has been extended from place-graphs to graphs of oriented views by Schölkopf, Mallot (1995). As compared to the simple rings sufficient to model place memory, we now also allow for view-to-view links representing movements with translatory components such as "turn left and move ahead" or "walk upstairs". The result is a graph of views with links labeled by egocentric movements. Schölkopf, Mallot (1995) provide a formal proof that this view-graph contains the same information as a graph of places with geocentric movement labels.

Fig. 2 Overview of view-based spatial memory. a Memory for places is organized as a collection of views obtained from a place, arranged in a circular graph. View multiplicity models salience of view orientation. b Spatial long-term memory organized as a graph of views and movements leading to the transition from one view to another. c View-based spatial working memory consisting of a subgraph of the complete view-graph, centered at the current view and including an outward neighborhood of the current view. For further explanation see text. (Tübingen Holzmarkt icons are sections of a panoramic image retrieved by permission from www.kubische-panoramen.de. Map source: Stadtgrundkarte der Universitätsstadt Tübingen, Stand: 17.3.2014.)

A sketch of the view-graph for an extended area is given in
Fig. 2b. The place-transitions are shown as directed links whereas the
turns within a place work either way. In principle, the view-graph
works without metric information or a global, geocentric reference
frame, but combinations with such data types are possible.
Working memory
Spatial working memory tasks may or may not involve interaction with long-term memory. Examples of stand-alone processes in working memory include path integration, perspective taking, and spatial updating, while spatial planning requires interaction with spatial long-term memory. Models of spatial working memory presented, e.g., by Byrne et al. (2007), Tatler, Land (2011), or Loomis et al. (2013) assume a local egocentric map in which information about landmarks and the environment is inscribed. In contrast, Wiener, Mallot (2003) suggested a working-memory structure formed as a local graph of places in which more distant places are collapsed into regional nodes. In order to reconcile this approach with the view-graph model for long-term memory, we consider a local subgraph of the view-graph containing (i) the current view, (ii) all views connected to this view by a fixed number of movement steps, and (iii) some local metric information represented either by egocentric position-labeling of the included views or by some view transformation mechanism similar to the one suggested in object recognition by Ullman, Basri (1991), or both. This latter component is required to account for spatial updating, which is a basic function of spatial working memory.
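A minimal sketch of this selection (again our own illustration; the graph encoding and function names are assumptions) is a breadth-first collection of views within a fixed number of movement steps:

```python
from collections import deque

def working_memory(view_graph, current_view, max_steps=2):
    """Breadth-first collection of the local view subgraph.

    view_graph: dict mapping a view id to a list of
    (movement_label, successor_view) pairs, as in the view-graph above.
    Returns the set of view ids held in working memory.
    """
    included = {current_view}
    frontier = deque([(current_view, 0)])
    while frontier:
        view, depth = frontier.popleft()
        if depth == max_steps:
            continue
        for _movement, successor in view_graph.get(view, []):
            if successor not in included:
                included.add(successor)
                frontier.append((successor, depth + 1))
    return included
```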
Frames of reference
While perception and spatial working memory are largely organized in an egocentric way, long-term memory must be independent of the observer's current position and orientation and is therefore often called allo- or geocentric. These terms imply that a frame of reference is used, much as a mathematical coordinate system within which places are represented by their coordinates (e.g., Gallistel 1990). Clearly, the assumption of an actual coordinate frame in the mental map leads to severe problems, not the least of which are the representation of (coordinate) numbers by neurons and the choice of the global coordinate origin. The view-graph approach avoids these problems. Long-term memory is independent of ego's position and orientation since the views and their connections are carried around like a portfolio, i.e. as abstract knowledge that does not change upon ego's movements. Working memory may rightly be called egocentric since it collects views as they appear from the local or a close-by position. In the view-based model, the transform between the pose-independent long-term memory and the pose-dependent (egocentric) working memory reduces to a simple selection process: the views corresponding to the current pose are selected and transferred into working memory.
Predictions and experimental results
The sketched model of the interplay of spatial memories makes predictions about the recollections that subjects may make of distant places. The task of imagining a distant place is a working-memory task in which an image of that place may be built by having an imagined ego "move" to the target place. Bisiach, Luzzatti (1978) showed that hemilateral neglect in the recall of landmarks around the Piazza del Duomo in Milan, Italy, affects the landmarks appearing on the left when viewed from an imagined view-point, but not the landmarks on the respective right side. This result can be expressed by assuming that neglect entails a loss of the left side of spatial working memory, into which no long-term memory items can be loaded; the long-term memory items themselves are unaffected by the neglect condition.
For the imagery of distant places, two mechanisms can be assumed. In a mental-travel mechanism, an observer might imagine traveling from his or her current position to the requested target place, generate a working memory, and recall the image from this working memory. In a recall-from-index mechanism, place names might be recalled from long-term memory without mental travel, e.g., by some sort of indexing mechanism, which is then likely to recall a canonical view of the target place.
The mental-travel mechanism is illustrated in Fig. 3. Assume that the subject is currently located at position A in a familiar downtown environment. When asked to recall a view of the central square appearing in Fig. 3, mental travel will generate a southward view in spatial working memory, which is then recalled. In contrast, when asked at position B, the mental-travel mechanism will yield a westward view, and so on. We therefore predict that recall, or imagery, of a distant place will result in oriented views whose orientation depends on the interview location. Preliminary data (Röhrich et al. 2013) support this prediction: passers-by who were approached in downtown Tübingen and asked to sketch a map of the Holzmarkt (a landmark square in central Tübingen) produced maps whose orientation depended on the interview site. As predicted, orientations were preferred that coincided with the direction of approach from the current interview location. This effect was not found for additional interview locations some 2 km away from downtown, indicating that a different recall mechanism might operate there.
Oriented recall can also be triggered by explicitly asking subjects to perform a mental travel before sketching a map. Basten et al. (2012) asked subjects to imagine walking one of two ways in downtown Tübingen, passing the Holzmarkt square in either westward or eastward direction. In this phase of the experiment, the Holzmarkt was not mentioned explicitly. When subjects were asked afterwards to draw sketches of the Holzmarkt, the produced view orientations were clearly biased towards the view orientation occurring in the respective direction of mental travel carried out by each subject. This indicates that oriented, view-like memories are generated during mental travel and affect subsequent recall and imagery.
Conclusion
We suggest that spatial long-term memory consists of a graph of
views linked together according to the movements effecting each
view transition. Working memory contains local views as well as
those nearby views which are connected to one of the local views.
When walking onwards, views of approached places are added from
long-term memory, thereby maintaining orientation continuity (spatial updating). In recall, views are selected from either working or

Fig. 3 Mental-travel mechanism of spatial recall. When located at a nearby place, but out of sight of the target (places A–D), recall by mental travel towards the target place will result in different views. Preliminary data suggest that this position-dependence of spatial recall exists. (For image sources see Fig. 2.)

long-term memory. For places more than 2 km away, recall reflects
the long-term memory contents only.
Acknowledgment
WGR was supported by the Deutsche Forschungsgemeinschaft within the Center for Integrative Neuroscience (CIN) Tübingen.
References
Basten K, Meilinger T, Mallot HA (2012) Mental travel primes place orientation in spatial recall. Lect Notes Artif Intell 7463:378–385
Bisiach E, Luzzatti C (1978) Unilateral neglect of representational space. Cortex 14:129–133
Bülthoff HH, Edelman S (1992) Psychophysical support for a two-dimensional view interpolation theory of object recognition. Proc Natl Acad Sci 89:60–64
Byrne P, Becker S, Burgess N (2007) Remembering the past and imagining the future: a neural model of spatial memory and imagery. Psychol Rev 114:340–375
Cartwright BA, Collett TS (1982) How honey bees use landmarks to guide their return to a food source. Nature 295:560–564
Gallistel CR (1990) The organization of learning. The MIT Press, Cambridge
Gillner S, Weiß AM, Mallot HA (2008) Visual place recognition and homing in the absence of feature-based landmark information. Cognition 109:105–122
Kuipers B (1978) Modeling spatial knowledge. Cogn Sci 2:129–153
Loomis JM, Klatzky RL, Giudice NA (2013) Representing 3D space in working memory: spatial images from vision, hearing, touch, and language. In: Lacey S, Lawson R (eds) Multisensory imagery: theory and applications. Springer, New York
Mallot HA, Basten K (2009) Embodied spatial cognition: biological and artificial systems. Image Vision Comput 27:1658–1670
O'Keefe J, Nadel L (1978) The hippocampus as a cognitive map, chapter 2: Spatial behaviour. Clarendon, Oxford
Röhrich WG, Binder N, Mallot HA (2013) Imagery of familiar places varies with interview location. In: Proceedings of the 10th Göttingen meeting of the German Neuroscience Society, pp T242C. www.nwg-goettingen.de/2013/upload/file/Proceedings NWG2013.pdf
Schölkopf B, Mallot HA (1995) View-based cognitive mapping and path planning. Adapt Behav 3:311–348
Tatler BW, Land MF (2011) Vision and the representation of the surroundings in spatial memory. Phil Trans R Soc Lond B 366:596–610
Tolman EC (1932) Purposive behavior in animals and men, chapter XI. The Century Co., New York
Tolman EC (1948) Cognitive maps in rats and man. Psychol Rev 55:189–208
Ullman S, Basri R (1991) Recognition by linear combinations of models. IEEE Trans Pattern Anal Mach Intell 13:992–1006
Wang RF, Spelke ES (2002) Human spatial representation: insights from animals. Trends Cogn Sci 6:376–382
Wiener JM, Mallot HA (2003) Fine-to-coarse route planning and navigation in regionalized environments. Spatial Cogn Comput 3:331–358

Systematicity and Compositionality in Computer Vision

Germán Martín García, Simone Frintrop, Armin B. Cremers
Institute of Computer Science III, Universität Bonn, Germany
Abstract
The systematicity of vision is a topic that has been discussed thoroughly in the cognitive science literature; however, few accounts of it exist in relation to computer vision (CV) algorithms. Here, we argue that the implications of the systematicity of vision, in terms of what behavior is expected from CV algorithms, are important for the development of such algorithms. In particular, the fact that systematicity is a strong argument for compositionality should be taken into account when designing computer vision algorithms and the representations they work with. In this paper, we discuss compositionality and systematicity in CV applications and present a CV system that is based on compositional representations.
Keywords
Systematicity, Compositionality, Computer Vision
Systematicity and Compositionality
In their seminal paper, Fodor and Pylyshyn (1988) address the question of the systematicity of cognition. Systematicity is the property by which related thoughts or sentences are understood. Anyone able to understand the sentence "John loves the girl" should be able to understand the related sentence "The girl loves John". This can be explained because both sentences are syntactically related. It is because there is a structure to the sentences that language, and thought, exhibit systematic behavior. The compositionality principle states that the meaning, or the content, of a sentence is derived from the semantic contribution of its constituents and the relations between them (Szabó 2013). It is because "John", "the girl", and "loves" make the same semantic contribution to the sentence "John loves the girl" and to "The girl loves John" that we are able to systematically understand both of them. In the case of language, systematicity is achieved by a compositional structure of constituents. In general, systematicity is a strong argument for compositionality (Szabó 2013): we are able to understand an immense number of sentences which we have never seen before.
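The point can be made concrete with a toy sketch (ours, purely illustrative): once meanings combine compositionally, the systematic variant comes for free, because each constituent makes the same contribution in both structures.

```python
# Toy sketch (ours): with a compositional representation, understanding
# "John loves the girl" brings understanding of "The girl loves John"
# for free, since constituents contribute the same meaning in both.
lexicon = {"John": "JOHN", "the girl": "GIRL", "loves": "LOVES"}

def meaning(subject, verb, obj):
    """Sentence meaning as a structured combination of constituent meanings."""
    return (lexicon[verb], lexicon[subject], lexicon[obj])

print(meaning("John", "loves", "the girl"))   # ('LOVES', 'JOHN', 'GIRL')
print(meaning("the girl", "loves", "John"))   # ('LOVES', 'GIRL', 'JOHN')
```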
This can be extended to vision: we are able to make sense of scenes we have never seen before because they are composed of items we know. The systematicity of vision is defended by several authors. Already in their 1988 paper, Fodor and Pylyshyn foresee that systematicity is probably a general property of cognition that is not limited to verbal capabilities. In the cognitive science literature, there are several arguments supporting the view that vision is systematic (Aparicio 2012; Tacca 2010): "if a subject is capable of visually representing a red ball then he must be capable of representing: i) the very same red ball from a large number of different viewpoints (and retinal inputs); ii) a number of similar red balls […]; and iii) red objects and ball-shaped objects in general." (Aparicio 2012).
In this paper, we are concerned with the sort of systematic behavior that should be expected when a scene is observed from different points of view: a systematic CV algorithm should be able to determine the visual elements that compose the images and find the correspondences between them over time. Some authors claim that systematicity in vision can be achieved without compositionality (Edelman and Intrator 2000, 2003). However, the models they provide have not been shown to be applicable to real-world CV problems. We argue that, from a computer scientist's point of view, resorting to compositionality is beneficial when designing CV algorithms.
Compositionality in Computer Vision Algorithms
The systematicity problem is rarely addressed in computational models of vision. In Edelman and Intrator (2000), the authors acknowledge that structural descriptions are the preferred theory of human vision that allows for viewpoint abstraction and novel shape recognition. In the structural approaches to vision, the visual information is explained in terms of atomic elements and the spatial relations that hold between them (Edelman 1997). One example is the Recognition-by-Components theory of Biederman (1987). In this theory, object primitives are represented by simple geometric 3D components called geons. However, extracting such primitive elements from images is by no means a trivial task in CV. Approaches that attempt to extract such primitives to explain the visual phenomena are hard to realize in practice and, according to Andreopoulos and Tsotsos (2013), there is no method that works reliably with natural images. Here, we suggest generating such primitive elements by grouping mechanisms realized by segmentation methods, which are well investigated in CV. In the following section, we propose a computer vision system that builds on such perceptually coherent segments to represent scenes in a compositional way.
A Compositional Approach for Visual Scene Matching
Here, we present a compositional vision system that is able to represent a scene in terms of perceptually coherent components and the relations between them, with the help of a graph representation. A graph matching algorithm makes it possible to match components between different viewpoints of a scene and thus enables a scene representation that is temporally consistent. In contrast to geons, our segments are easily extracted with standard segmentation algorithms; we use the well-known Mean Shift segmentation algorithm (Comaniciu and Meer 2002). Mean Shift produces a segmentation based on the proximity of pixels in spatial and color spaces. We construct a graph where the nodes represent segments, and the edges the neighborhood of segments. We use labeled edges, where the labels correspond to the relations between segments. These are of two types, "part of" and "attached to", and can be obtained automatically from the image by simple procedures. To determine whether two segments share a common border ("attached to" relation), it is enough to perform two morphological operations: first dilate, then intersect both segments. The remaining pixels constitute the shared contour and indicate that this relation is present. To find whether segment A is "part of" segment B, it is enough to check whether the outer contour of segment B is the same as the outer contour of the union of A and B.
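Both relation tests can be sketched directly on binary segment masks. The following is our own illustration of the procedures just described; the use of scipy is our choice, not necessarily the authors' toolchain:

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_fill_holes

def attached_to(mask_a, mask_b):
    """'Attached to': dilate one segment, then intersect with the other;
    any remaining pixels form the shared contour."""
    return bool(np.logical_and(binary_dilation(mask_a), mask_b).any())

def part_of(mask_a, mask_b):
    """'Part of': the outer contour of B equals the outer contour of the
    union of A and B; comparing hole-filled masks tests exactly this."""
    union = np.logical_or(mask_a, mask_b)
    return bool(np.array_equal(binary_fill_holes(union),
                               binary_fill_holes(mask_b)))
```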
Once the graphs are built, we can apply a graph matching algorithm to establish correspondences between nodes and, thus, between segments. Suppose we have two graphs G1 = (V1, E1, X1) and G2 = (V2, E2, X2), each defined by a set of nodes V, edges E, and attributes X measured on the nodes. We want to find a labelling function f that assigns nodes from G1 to nodes in G2, f: G1 → G2. We base our approach for matching on Wilson and Hancock (1997). The authors propose a relaxation algorithm for graph matching that locally updates the label of each node based on an energy functional F defined on the labelling function f. By defining F(f) as the maximum
a posteriori probability of the labelling given the measurements, F(f) = P(f | X1, X2), and by applying Bayes' rule, we get:

P(f | X1, X2) = P(X1, X2 | f) P(f) / P(X1, X2)     (1)

Hereby, P(X1, X2 | f) is the appearance term that denotes the probability that the nodes of a given match f have certain attributes X1 and X2; we used colour average and dimensions of the minimum fitting rectangle as attributes. P(f) is the structural term and is high if a matching preserves the structure of the graph; for this term to have a high value, if node A is mapped to A′, then the neighbors of A should be mapped to the neighbors of A′. The algorithm works by iteratively assigning to each node u in G1 the node v in G2 that maximises Equation 1:

f(u) = argmax_{v ∈ V2} p(x_u, x_v | u, v) P(f)

We extended the original algorithm so that it is able to deal with directed graphs as well as with labeled edges. The labels represent the two different relations, "part of" and "attached to". The directions of the edges denote the order in which the segments appear in the relation predicates; e.g., in the "part of" relation, the edge points towards the node that contains the other, and in the "attached to" relation, the edge points towards the node that is either under or on the right side of the other. The details of the algorithm are out of the scope of this paper and can be found in García (2014).
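The core update of Equation 1 can be sketched as follows (our own simplification of the relaxation scheme; the appearance and structural terms are passed in as callables because their exact definitions are only summarized above):

```python
def relax_match(nodes1, nodes2, appearance, structural, f, n_iter=10):
    """Iterative relaxation labelling for graph matching.

    nodes1, nodes2: node ids of G1 and G2.
    appearance(u, v): the term p(x_u, x_v | u, v).
    structural(f, u, v): P(f) evaluated for the tentative assignment f[u] = v.
    f: initial labelling as a dict from nodes1 to nodes2; updated in place.
    """
    for _ in range(n_iter):
        for u in nodes1:
            f[u] = max(nodes2,
                       key=lambda v: appearance(u, v) * structural(f, u, v))
    return f
```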
We evaluated the algorithm on a real-world video sequence recorded at our office by matching pairs of consecutive and non-consecutive frames. In the first case, 84 % of the segments were correctly matched; in the second case, 57 %. Some non-consecutive frames are shown in Fig. 1: the matched segments are displayed with the same color, and those that were missed are displayed in black. It can be seen that some missing matches originate from non-repeatable segmentations over frames, i.e., the boundaries of the segments are not always consistent when the viewpoint changes (see, for example, the segmentation of the sponge in frames d and e in Fig. 1). This is a known problem of image segmentation algorithms (Hedau et al. 2008) that has two effects: a segment in frame 1 is segmented as two in frame 2, or the other way round. As a consequence, the graphs that are built on top of these segmentations are structurally different.

Fig. 1 First row: original non-consecutive images. Rows 2 & 3: results of the matching between the corresponding pair of frames. Matches are
displayed with the same colors. Segments for which no match was found are shown in black



In future work, we will extend the matching algorithm so that merging of segments is performed. With the presented system, we show in an exemplary way how the concept of compositionality can be integrated into CV algorithms and how, by making use of well-established segmentation and graph-matching methods, a simple visual representation can be achieved that is coherent over time.
References
Andreopoulos A, Tsotsos JK (2013) 50 years of object recognition: directions forward. Comput Vis Image Understand
Aparicio VMV (2012) The visual language of thought: Fodor vs. Pylyshyn. Teorema: Revista Internacional de Filosofía 31(1):59–74
Biederman I (1987) Recognition-by-components: a theory of human image understanding. Psychol Rev 94(2):115–147
Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619
Edelman S (1997) Computational theories of object recognition. Trends Cogn Sci, pp 296–304
Edelman S, Intrator N (2000) (Coarse coding of shape fragments) + (retinotopy) ≈ representation of structure. Spatial Vision 13(2–3):255–264
Edelman S, Intrator N (2003) Towards structural systematicity in distributed, statically bound visual representations. Cogn Sci 27(1):73–109
Fodor JA, Pylyshyn ZW (1988) Connectionism and cognitive architecture: a critical analysis. Cognition 28(1):3–71
García GM (2014) Towards a graph-based method for image matching and point cloud alignment. Tech. rep., University of Bonn, Institute of Computer Science III
Hedau V, Arora H, Ahuja N (2008) Matching images under unstable segmentations. In: IEEE conference on computer vision and pattern recognition (CVPR), IEEE
Szabó ZG (2013) Compositionality. In: Zalta EN (ed) The Stanford Encyclopedia of Philosophy, fall 2013 edn
Tacca MC (2010) Seeing objects: the structure of visual representation. Mentis
Wilson RC, Hancock ER (1997) Structural matching by discrete relaxation. IEEE Trans Pattern Anal Mach Intell 19:634–648

Control and flexibility of interactive alignment: Möbius syndrome as a case study
John Michael 1,2,3, Kathleen Bogart 4, Kristian Tylén 3,5, Joel Krueger 6, Morten Bech 3, John Rosendahl Østergaard 7, Riccardo Fusaroli 3,5
1 Department of Cognitive Science, Central European University, Budapest, Hungary; 2 Center for Subjectivity Research, Copenhagen University, Copenhagen, Denmark; 3 Interacting Minds Centre, Aarhus University, Aarhus, Denmark; 4 School of Psychological Science, Oregon State University, Corvallis, USA; 5 Center for Semiotics, Aarhus University, Aarhus, Denmark; 6 Department of Sociology, Philosophy, and Anthropology, University of Exeter, Amory Building, Exeter, UK; 7 Aarhus University Hospital, Aarhus, Denmark
Keywords
Möbius syndrome, social interaction, social cognition, alignment
When we interact with others, there are many concurrent layers of implicit bodily communication and mutual responsiveness at work: from the spontaneous temporal synchronization of movements (Richardson et al. 2007), to gestural and postural mimicry (Chartrand and Bargh 1999; Bernieri and Rosenthal 1991), to multiple dimensions of linguistic coordination (Garrod and Pickering 2009; Clark 1996; Fusaroli and Tylén 2012). These diverse processes may serve various important social functions. For example, one individual's facial expressions, gestures, bodily postures, and tone and tempo of voice can provide others with information about her emotions, intentions and other mental states, and thereby help to sustain interpersonal understanding and support joint actions. And when such information flows back and forth among two or more mutually responsive participants in an interaction, the ensuing alignment can promote social cohesion, enhancing feelings of connectedness and rapport (Lakin and Chartrand 2003; Bernieri 1988; Valdesolo et al. 2010). Indeed, by enhancing rapport, interactive alignment may also increase participants' willingness to cooperate with each other (van Baaren et al. 2004; Wiltermuth and Heath 2009) and, equally importantly, their mutual expectations of cooperativeness even when interests are imperfectly aligned, as in scenarios such as the prisoner's dilemma (Rusch et al. 2013). Moreover, interactive alignment may even enhance interactants' ability to understand each other's utterances (Pickering and Garrod 2009) and to communicate their level of confidence in their judgments about situations (Fusaroli et al. 2012), thereby enhancing performance on some joint actions. Finally, interactive alignment may also increase interactants' ability to coordinate their contributions to joint actions (Valdesolo et al. 2010), because synchronization increases interactants' attention to one another's movements, and because it may be easier to predict and adapt to the movements of another person moving at a similar tempo and initiating movements of a similar size, duration, and force as oneself.
It is no surprise, then, that recent decades have seen a dramatic increase in the amount of attention paid to various kinds of interactive alignment in the cognitive sciences. However, although there is a broad consensus about the importance of interactive alignment processes for social interaction and social cognition, there are still many open questions. How do these diverse processes influence each other? Which ones contribute, and in what ways, to interpersonal understanding, cooperativeness and/or performance in joint actions? Is alignment sometimes counterproductive? To what extent can alignment processes be deliberately controlled and flexibly combined, replaced, tweaked or enhanced? This latter question may be especially relevant for individuals who have impairments in some form of bodily expressiveness, and who therefore may benefit from compensating with some other form of expressiveness. In the present study, we investigated social interactions involving just such individuals, namely a population of teenagers with Möbius syndrome (MS), a form of congenital, bilateral facial paralysis resulting from maldevelopment of the sixth and seventh cranial nerves (Briegel et al. 2006).
Since people with MS are unable to produce facial expressions, it is unsurprising that they often experience difficulties in their social interactions and in terms of general social well-being. We therefore implemented a social skills intervention designed to train individuals with facial paralysis owing to MS to adopt alternative strategies to compensate for the unavailability of facial expression in social interactions (e.g. expressive gesturing and prosody). In order to evaluate the effectiveness of this intervention, each of the 5 participants with MS (MS-participants) engaged in interactions before and after the intervention with partners who did not have MS (non-MS-participants). These social interactions consisted of two separate tasks: a casual getting-to-know-you task and a task designed to tap interpersonal understanding. Participants filled out rapport questionnaires after each interaction. In addition, the interactions were videotaped and analyzed by independent coders, and we extracted two kinds of linguistic data relating to the temporal organization of the conversational behavior: prosody (fundamental frequency) and speech rate. We used these data to calculate indices of individual behavioral complexity and of alignment using cross-recurrence quantification analysis (CRQA).
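As a minimal illustration of the kind of index CRQA yields (a toy sketch of ours; the actual analysis used embedded time series and a full CRQA toolbox whose parameters we do not reproduce here):

```python
import numpy as np

def cross_recurrence_rate(x, y, radius=0.5):
    """Fraction of time-point pairs (i, j) at which the two z-scored
    series are within `radius` of each other -- the simplest
    cross-recurrence index of alignment between two speakers."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    recurrences = np.abs(x[:, None] - y[None, :]) < radius
    return recurrences.mean()
```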
We found several interesting results. First, the intervention increased observer-coded rapport. Secondly, observer-coded gesture and expressivity increased in participants with and without MS after the intervention. Thirdly, fidgeting and repetitiveness of verbal behavior decreased in both groups after the intervention. Fourthly, while we did in general observe alignment (compared to surrogate pairs), overall linguistic alignment actually decreased after the intervention, and pitch alignment was negatively correlated with rapport.
These results suggest that the intervention had an impact on the MS interlocutors, which in turn affected the non-MS interlocutors, making them less nervous and more engaged. Behavioral dynamics statistically predicted observer-coded rapport, suggesting a direct link between these dynamics and the experience of the interaction.
This pattern of findings provides initial support for the conjecture that a social skills workshop like the one employed here can not only affect the participants with MS but also, and perhaps even more importantly, affect the interaction as a whole as well as the participants without MS. One reason why this is important is that some of the difficulties experienced by individuals with MS in social interactions may arise from other people's discomfort or uncertainty about how to behave. In other words, individuals without MS who interact with individuals with MS may interrupt the smooth flow of interaction through their uncertainty about how to interact in what is for them a new and sensitive situation. Moreover, this may also be true in other instances in which people interact with others who appear different or foreign to them (because of other forms of facial difference, skin color, etc.). Thus, this issue points to a possible direction in which further research may be conducted that would extend the findings far beyond the population of individuals with MS. More concretely, one obvious comparison would be to individuals with expressive impoverishment due to Parkinson's disease. Do these individuals also employ some of the same kinds of compensatory strategies as individuals with MS? If so, what effects does that have upon interactive alignment within social interactions? What difference does it make that their condition is an acquired rather than a congenital one?
Finally, one additional question for further research is whether some compensatory strategies are more easily automated than others. For example, it is possible that increasing hand gesturing or eye contact can be quickly learned and routinized, but that modulating one's prosody cannot. If there are such differences among the degrees to which different processes can be automated, it would be important to understand just what underlies them. On a theoretical level, this could provide useful input to help us understand the relationship between automatic and controlled processes. On a more practical level, this could be important for three concrete reasons. First of all, it may be taxing and distracting to employ deliberate strategies for expressing oneself in social interactions, and people may therefore find it tiring and be less likely to continue doing it. Secondly, it may be important that some interactive alignment processes occur without people's awareness. Thus, attempting to bring them about deliberately may actually interfere with the implicit processes that otherwise generate alignment. Indeed, there is evidence that behavioral mimicry actually undermines rapport if people become aware that it is being enacted deliberately (Bailenson et al. 2008). Thirdly, it would be important for future social skills workshops to examine whether some compensatory strategies are more effectively taught indirectly: e.g., rather than telling people to use more gestures, it may be advantageous to employ some other means which does not require them to deliberately attend to their gestures or prosody, for example by using more gestures and prosody when interacting with children with MS, by asking them to watch videos in which actors are highly expressive in their gestures and prosody, or by engaging them in role-playing games in which a high level of gesture and/or prosody is appropriate.
References
Bailenson JN, Yee N, Patel K, Beall AC (2008) Detecting digital chameleons. Comput Hum Behav 24:66–87
Bernieri FJ, Rosenthal R (1991) Interpersonal coordination: behavior matching and interactional synchrony. In: Feldman RS, Rimé B (eds) Fundamentals of nonverbal behavior. Cambridge University Press, Cambridge, pp 401–432
Bogart KR, Tickle-Degnen L, Ambady N (2012) Compensatory expressive behavior for facial paralysis: adaptation to congenital or acquired disability. Rehabil Psychol 57(1):43–51
Bogart KR, Tickle-Degnen L, Joffe M (2012) Social interaction experiences of adults with Moebius syndrome: a focus group. J Health Psychol. Advance online publication
Bogart KR, Matsumoto D (2010) Facial mimicry is not necessary to recognize emotion: facial expression recognition by people with Moebius syndrome. Soc Neurosci 5(2):241–251
Bogart KR, Matsumoto D (2010) Living with Moebius syndrome: adjustment, social competence, and satisfaction with life. Cleft Palate-Craniofac J 47(2):134–142
Briegel W (2007) Psychopathology and personality aspects of adults with Moebius sequence. Clin Genet 71:376–377
Chartrand TT, Bargh JA (1999) The chameleon effect: the perception–behavior link and social interaction. J Person Soc Psychol 76:893–910
Clark HH (1996) Using language. Cambridge University Press, Cambridge
Derogatis LR (1977) SCL-90-R: administration, scoring and procedures manual-I for the revised version. Johns Hopkins University School of Medicine, Baltimore
Fahrenberg J, Hampel R, Selg H (2001) FPI-R. Das Freiburger Persönlichkeitsinventar, 7th edn. Hogrefe, Göttingen
Garrod S, Pickering MJ (2009) Joint action, interactive alignment, and dialog. Top Cogn Sci 1(2):292–304
Helmreich R, Stapp J (1974) Short forms of the Texas Social Behavior Inventory (TSBI), an objective measure of self-esteem. Bull Psychon Soc
Kahn JB, Gliklich RE, Boyev KP, Stewart MG, Metson RB, McKenna MJ (2001) Validation of a patient-graded instrument for facial nerve paralysis: the FaCE scale. Laryngoscope 111(3):387–398
Lakin J, Chartrand T (2003) Using nonconscious behavioral mimicry to create affiliation and rapport. Psychol Sci 14:334–339
Mattick RP, Clarke JC (1998) Development and validation of measures of social phobia scrutiny fear and social interaction anxiety. Behav Res Therapy 36(4):455–470
Meyerson MD (2001) Resiliency and success in adults with Moebius syndrome. Cleft Palate Craniofac J 38:231–235
Oberman LM, Winkielman P, Ramachandran VS (2007) Face to face: blocking facial mimicry can selectively impair recognition of emotional expressions. Soc Neurosci 2(3):167–178
Richardson MJ, Marsh KL, Isenhower RW, Goodman JR, Schmidt RC (2007) Rocking together: dynamics of intentional and unintentional interpersonal coordination. Hum Mov Sci 26:867–891
Robinson E, Rumsey N, Partridge J (1996) An evaluation of social interaction skills training for facially disfigured people. Br J Plast Surg 49:281–289
Rosenberg M (1965) Rosenberg self-esteem scale (RSE). Acceptance and Commitment Therapy. Measures Package, 61
Tickle-Degnen L, Lyons KD (2004) Practitioners' impressions of patients with Parkinson's disease: the social ecology of the expressive mask. Soc Sci Med 58:603–614
Valdesolo P, Ouyang J, DeSteno D (2010) The rhythm of joint action: synchrony promotes cooperative ability. J Exp Soc Psychol 46:693–695
van Baaren RB, Holland RW, Kawakami K, van Knippenberg A (2004) Mimicry and pro-social behavior. Psychol Sci 15:71–74
Zigmond AS, Snaith R (1983) The hospital anxiety and depression scale. Acta Psychiatr Scand 67(6):361–370


Efficient analysis of gaze behavior in 3D environments


Thies Pfeiffer 1, Patrick Renner, Nadine Pfeiffer-Lessmann
1 Center of Excellence Cognitive Interaction Technology, Bielefeld University, Germany; 2 SFB 673: Alignment in Communication, Bielefeld University, Germany
Abstract
We present an approach coined EyeSee3D to identify the 3D point of regard and the fixated object in real time based on 2D gaze videos, without the need for manual annotation. The approach does not require additional hardware except for the mobile eye tracker. It is currently applicable for scenarios with static target objects and requires fiducial markers to be placed in the target environment. The system has already been tested in two different studies. Possible applications are visual-world paradigms in complex 3D environments, research on visual attention, or human–human and human–agent interaction studies.
Keywords
3D eye tracking, Natural environments
Introduction
Humans evolved to live in a 3D spatial world. This affects our perception, our cognition and our action. If human behavior, and in particular visual attention, is analyzed in scientific studies, however, practical reasons often force us to reduce the three-dimensional world to two dimensions within a small field of view presented on a computer screen. In many situations, such as spatial perspective taking, situated language production, or understanding of spatial references, just to name a few, a restriction to 2D experimental stimuli can render it impossible to transfer findings to our natural everyday environments.
One of the reasons for this methodological compromise is the effort required to analyze gaze data in scenarios where the participant is allowed to move around and inspect the environment freely. Current mobile eye-tracking systems use a scene camera to record a video from the perspective of the user. Based on one or two other cameras directed at the participant's eyes, the gaze fixation of the participant is then mapped onto the video of the scene camera. While binocular systems are already able to compensate for parallax by estimating the distance of the fixation from the observer, they have no representation of the 3D world and still only work on the 2D projection of the world visible in the scene camera video. The most important part then is identifying in the video stream the particular object the participant has been fixating. This currently requires manual annotation, which takes several times as long as the recorded material. Depending on the complexity of the annotation (target object count and density), we had cases where one minute of recorded video required fifteen minutes of annotation or more.
With our EyeSee3D approach, we provide a software tool that is able to identify the fixated objects automatically, provided that the environment can be covered with some visible markers that do not affect the visual behavior and that the target objects remain static.
Related Work
There are approaches for semi-automatic gaze annotation based on 2D computer vision, such as the SemantiCode approach by Pontillo et al. (2010), which still requires manual annotation but achieves a speed-up by incrementally learning the labeling of the targets using machine learning and computer vision techniques. Still, the experimenter has to at least validate every label. Approaches that also use 3D models are those of Toyama et al. (2012), who target human–computer interaction rather than scientific studies, and Paletta et al. (2013), who use a 3D scan of the target environment to later identify the target position. Their approach requires much more effort during preparation but then does not require an instrumentation of the environment with markers.

Application Areas
The presented EyeSee3D approach can be applied as a method to accurately annotate fixations in 3D environments, as required for scientific studies. We have already tested this approach in two studies. Both studies involve settings with two interacting interlocutors (no confederates) sitting face-to-face at a table.
In the first study, we were interested in gaze patterns of joint attention (Pfeiffer-Lessmann, Pfeiffer, Wachsmuth 2013). We placed 23 figures of a LEGO Duplo set on a table, each facing one of the interlocutors. The experimenter then describes a certain figure and the interlocutors have to team up to identify the figure. The task, however, is not as simple as it sounds: the information given might only be helpful for one of the interlocutors, as it might refer to features of the figure only visible from a certain perspective. Moreover, the interlocutors are instructed neither to speak nor to gesture to communicate. This way we force the participants to use their gaze to guide their partner's attention towards the correct figure. The set-up used in this experiment will be used later in this paper to illustrate the EyeSee3D method.
In the second study, we were interested in creating computational models for predicting the targets of pointing gestures and, more generally, areas which will be occupied by a human interlocutor in the near future during an interaction (Renner, Pfeiffer, Wachsmuth 2014). This research is motivated by human-robot interaction, in which we want to enable robots to anticipate human movements in order to be more responsive, e.g., in collision-avoidance behavior.
Besides eye tracking, in this study we also combined the EyeSee3D approach with an external motion-tracking system to track the hands and the faces of the interlocutors. Using the same principles as presented in the next section, the targets of pointing gestures, as well as gazes towards the body of the interlocutor, can also be identified computationally without the need for manual annotation.
EyeSee3D
The EyeSee3D approach is easy to set up. Figure 1 (left) shows a snapshot from one of our own studies on joint attention between two human interlocutors (Pfeiffer-Lessmann, Pfeiffer, Wachsmuth 2013). In this study we had 12 pairs of interaction partners and a total of about 160 min of gaze video recordings. It would have taken about 40 h to manually annotate the gaze videos, excluding any additional second annotations to test for annotation reliability.
The process followed by EyeSee3D is presented in Fig. 2. In a preparation phase, we covered the environment with so-called fiducial markers, highly visible printable structures that are easy to detect using computer-vision methods (see Fig. 1, mid upper half). We verified that these markers did not attract significant attention by the participants. As a second step, we created proxy geometries for the relevant stimuli, in this example small toy figures (see Fig. 3). For our set-up, a simple approximation using bounding boxes is sufficient, but any complex approximation of the target may be used. When aiming for maximum precision, it is possible to use 3D scans with exact replications of the hull of the target structure. The whole process of setting up such a table takes about 30 min. These preparations have to be made once, as the created model can be used for all study recordings.
Based on these preparations, we are now able to conduct the study
and record the eye-tracking data (gaze videos and gaze data). EyeSee3D then automatically annotates the recorded gaze videos. For
each frame of the video, the algorithms detect fiducial markers in the
image and estimate the position and orientation of the scene camera in
3D space. For this process to succeed at least one fiducial marker has
to be fully visible in each frame. The camera position and orientation
are then used together with the gaze information provided by the eye
tracker itself to cast a gaze ray into the 3D proxy geometries. This
gaze ray intersects the 3D proxy geometries exactly at the point (see
Fig. 1, right) that is visualized by the gaze cursor in the scene camera


Fig. 1 The left snapshot is taken from a 2D mobile eye-tracking video taken from the egocentric perspective of the scene camera. The point of regard is visualized using a green circle and a human annotator would have to manually identify the fixated object, here the figure of a girl. With EyeSee3D, gaze rays can be computed and cast into a 3D abstract model of the environment (simple white boxes around the figures); the intersection with the fixation target (the box corresponding to the figure of the girl) is computed automatically and in real time

Fig. 2 The EyeSee3D method requires a one-time preparation phase. During study recording there are two alternatives: either (a) use the standard tools and run EyeSee3D offline to annotate the data, or (b) use EyeSee3D online during the study

Fig. 3 The 3D proxy geometries that had to be created to determine the fixated objects. The different figures are textured with original pictures, which is not needed for the process but useful for controlling the orientation of the figures when setting up the experiment

video provided by the standard eye-tracking software (see Fig. 1, left). As each of the proxy geometries is labeled, we can identify the target object automatically.
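This final step can be sketched as a standard ray/box intersection (our own illustration, assuming axis-aligned bounding boxes as proxy geometries and taking the marker-based camera pose as given):

```python
import numpy as np

def ray_aabb(origin, direction, box_min, box_max):
    """Slab test: distance along the ray to the box, or None on a miss."""
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = (box_min - origin) / direction
        t2 = (box_max - origin) / direction
    t_near = np.nanmax(np.minimum(t1, t2))
    t_far = np.nanmin(np.maximum(t1, t2))
    return t_near if t_near <= t_far and t_far >= 0 else None

def fixated_object(origin, direction, proxies):
    """proxies: dict mapping a label to (box_min, box_max) arrays.
    Returns the label of the nearest proxy hit by the gaze ray, or None."""
    best_label, best_t = None, np.inf
    for label, (box_min, box_max) in proxies.items():
        t = ray_aabb(origin, direction, box_min, box_max)
        if t is not None and t < best_t:
            best_label, best_t = label, t
    return best_label
```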
This annotation process can either be used online during the study, so that the annotation results are already available when the study session is completed, or, alternatively, EyeSee3D can be used in offline mode to analyze the previously recorded gaze videos and data files. This offline mode has the advantage that it can be repeatedly applied to the same data. This is useful in cases where the number and placement of the proxy geometries are not known beforehand and are incrementally refined as the understanding of the problem domain progresses. For example, at the moment we are only interested in locating the target figure. Later on we might be working together with psycholinguists on language processing following a visual-world paradigm. We might then also be interested in whether the participants have looked at the headdress, the head, the upper body or the lower body of the figures during sentence processing. After updating the 3D proxy models, we could use EyeSee3D to re-annotate all videos and have the more fine-grained annotation ready within minutes.
In our example study, we were able to cover about 130 min of the 160 min of total recordings using this technique. In the remaining 30 min, participants were either moving their head so quickly that the scene camera only provided a motion-blurred image, or they turned towards the interaction partner or the experimenter for questions, so that no marker was visible in the image (but also no target stimuli). Thus, the remaining 30 min were not relevant for the evaluation of the study.
More technical details about the EyeSee3D approach have been
presented at ETRA 2014 (Pfeiffer and Renner 2014).
Discussion and Future Work
The presented initial version of our EyeSee3D approach can already significantly speed up the annotation of mobile eye-tracking studies. There are no longer economic reasons to restrict studies to short sessions and low numbers of participants. The accuracy of the system depends, on the one hand, on the accuracy of the eye-tracking system; in this respect EyeSee3D does not differ from normal 2D video-based analysis. On the other hand, the accuracy depends on the quality with which the fiducial markers are detected: the larger the detected marker and the better the contrast, the higher the accuracy of the estimated camera position and orientation.
EyeSee3D is not only applicable to small setups, as the selected example of two interaction partners sitting at a table might suggest at first glance. The size of the environment is not restricted as long as at least one fiducial marker is in the field of view for every relevant target object. The markers might, for example, be sparsely distributed in a museum just around the relevant exhibits.
We are currently working on further improving the speed and the accuracy of the system. In addition, we are planning to integrate other methods for tracking the scene camera's position and orientation in 3D space, e.g., based on tracking arbitrary but distinctive images. In settings such as a museum or a shelf in a shopping center, this would allow for automatic tracking without any dedicated markers.
In future work, we are planning to compare the results obtained by human annotators with those calculated by EyeSee3D. In a pilot evaluation we were able to identify situations of disagreement, i.e. situations in which EyeSee3D comes to slightly different results than a human annotator, when two target objects overlap in space (which is more likely to happen with a freely moving participant than in traditional screen-based experiments) and the fixation is somewhere in between.
Such situations are likewise difficult to annotate consistently
between human annotators, because of their ambiguity. Investigating
the differences between the systematic and repeatable annotations
provided by EyeSee3D and the interpretations of human annotators,
which might depend on different aspects, such as personal preferences
or the history of preceding fixations, could be very informative.
Besides the described speed-up achieved by EyeSee3D, it might also
provide more objective and consistent annotations.
In summary, using EyeSee3D, the analysis of mobile gaze-tracking studies has become as easy as that of desktop-computer-based studies using remote eye-tracking systems.
Acknowledgments
This work has been partly funded by the DFG in the SFB 673
Alignment in Communication.
References
Paletta L, Santner K, Fritz G, Mayer H, Schrammel J (2013) 3D attention: measurement of visual saliency using eye tracking glasses. In: CHI '13 extended abstracts on human factors in computing systems, pp 199–204, ACM, Paris, France
Pfeiffer T, Renner P (2014) EyeSee3D: a low-cost approach for analysing mobile 3D eye tracking data using augmented reality technology. In: Proceedings of the symposium on eye tracking research and applications, ACM
Pfeiffer-Lessmann N, Pfeiffer T, Wachsmuth I (2013) A model of joint attention for humans and machines. In: Book of abstracts of the 17th European Conference on Eye Movements (vol 6), p 152, Lund, Sweden
Pontillo DF, Kinsman TB, Pelz JB (2010) SemantiCode: using content similarity and database-driven matching to code wearable eyetracker gaze data. In: ACM ETRA 2010, pp 267–270, ACM
Renner P, Pfeiffer T, Wachsmuth I (2014) Spatial references with gaze and pointing in shared space of humans and robots. In: Proceedings of Spatial Cognition 2014
Toyama T, Kieninger T, Shafait F, Dengel A (2012) Gaze guided object recognition using a head-mounted eye tracker. In: ACM ETRA 2012, pp 91–98, ACM

The role of the posterior parietal cortex in relational reasoning
Marco Ragni, Imke Franzmeier, Flora Wenczel, Simon Maier
Center for Cognitive Science, Freiburg, Germany
Abstract
Inferring information from given relational assertions is at the core of human reasoning ability. The cognitive processes involved include the understanding and integration of relational information into a mental model and the drawing of conclusions. In this study we are interested in identifying the role of the associated brain regions. Hence, (i) we reanalyzed 23 studies on relational reasoning with healthy participants from PubMed, Science Direct, and Google Scholar, focusing on a peak-voxel analysis of single subregions of the posterior parietal cortex that allows a more fine-grained analysis than before, and (ii) we interpret the identified regions in light of findings on reasoning phases from our own transcranial magnetic stimulation (TMS) and fMRI studies. The results indicate a relevant role of the parietal cortex, especially the lateral superior parietal lobule (SPL), for the construction and manipulation of mental models.
Keywords
Relational Reasoning, Brain Regions, Posterior Parietal Cortex
Introduction
Consider a relational reasoning problem of the following form:
The red car is to the left of the blue car.
The yellow car is to the right of the blue car.
What follows?
The assertions formed in reasoning about (binary) relations consist
of two premises connecting three terms (the cars above). Participants
process each piece of information and integrate it in a mental model
(Ragni, Knauff 2013). A mental model (Johnson-Laird 2006) is an
analogue representation of the given information. For the problem
above we could construct a mental model of the following form:
red car blue car yellow car
From this analogical representation (for a complete discussion of
how much information might be represented please refer to Knauff
2013) the missing relational information, namely that the red car is to
the left of the yellow car, can easily be inferred. The premise
description above is determinate, i.e., it elicits only one mental model.
There are, however, indeterminate descriptions, i.e., descriptions with
which multiple models are consistent, and sometimes alternative
models have to be constructed. The associated mental processes in
reasoning are the model construction, mental inspection, and model variation phase.
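To make these phases concrete, here is a minimal sketch of premise integration and inference over a one-dimensional spatial model (our illustration in Python, not the authors' implementation; it assumes one term of each premise already occurs in the model):

  # Minimal sketch (not the authors' implementation): premises are integrated
  # into an ordered array; unstated relations can then simply be read off.
  def integrate(model, premise):
      """premise: (a, rel, b); assumes b (or a) already occurs in the model."""
      a, rel, b = premise
      if b not in model:                        # anchor on the term already present
          a, b = b, a
          rel = 'left-of' if rel == 'right-of' else 'right-of'
      pos = model.index(b)
      model.insert(pos if rel == 'left-of' else pos + 1, a)
      return model

  model = ['blue']                                   # 'blue car' as the anchor term
  integrate(model, ('red', 'left-of', 'blue'))       # premise 1
  integrate(model, ('yellow', 'right-of', 'blue'))   # premise 2
  print(model)                                       # ['red', 'blue', 'yellow']
  # Inspection: the unstated relation is read off the constructed model.
  print(model.index('red') < model.index('yellow'))  # True: red left of yellow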
The neural activation patterns can also help unravel
the cognitive processes underlying reading and processing of the
premise information. First experiments utilizing recorded neural
activation with PET (and later with fMRI) were conducted by Goel
et al. in 1998. The initial motivation was to determine which of several then-popular psychological theories was correct (cf. Goel, Dolan 2001), simply by examining the involvement of the respective brain areas that are connected to specific processing functions. Such an easy
answer has not yet been found. However, previous analyses (e.g.,
Knauff 2006; Knauff 2009) across multiple studies showed the
involvement of the frontal and the posterior parietal cortex (PPC),
especially for relational reasoning. Roughly speaking, the role of the
PPC is to integrate information across modalities (Fig. 1) and its
general involvement has been shown consistently in studies (Knauff
2006).
In this article we briefly introduce state-of-the-art neural findings
for relational reasoning. We present an overview of the current studies and report two studies from our lab. Subregions within the PPC, e.g., the SPC, are differentiated to allow a more fine-grained description of their role in the mental model construction and manipulation process.
The Associated Activation in Relational Reasoning
A meta-analysis conducted by Prado et al. (2011) identified the PPC and the middle frontal gyrus (MFG) as the main activations during reasoning about relations. Although we know that these regions are involved in the process of reasoning about relations, the exact functions of these areas, the role of the subregions, and problem-specific differences remain unclear. Studies by Fangmeier, Knauff and colleagues (Fangmeier et al. 2006; Fangmeier, Knauff 2009; Knauff et al. 2003) additionally compared activations across the reasoning process. They analyzed the function of the PPC during the processing and integration of premise information and the subsequent model validation phase. The PPC is mainly active in the last phase, model validation.
We included all studies mentioned in Prado et al. (2012) and Knauff (2006) and additionally searched the three databases PubMed, Google Scholar, and ScienceDirect with the keywords "relational reasoning" or "deductive reasoning" in combination with the terms "neuroimaging" or "fMRI", and searched for studies that were cited in the respective articles.
Of these 26 studies we included 23 in our analysis: all those which (i) report coordinates (either Talairach or MNI), (ii) had a reasoning vs. non-reasoning contrast, and (iii) used healthy participants, i.e., excluding patient studies. We transformed all coordinates to the MNI coordinate system for the peak-voxel analysis.
Only a few studies report temporal activation. Mainly activation in the middle temporal gyrus was found, possibly related to language processes. Activation in the occipital cortex is probably due to the visual presentation of the stimuli.

Fig. 1 The posterior parietal cortex and central subregions


Table 1 Key studies and frontal and parietal activation
Anatomical probabilities for the peak coordinates located within the lateral (with the SPL as a subregion) and the medial SPC (with the precuneus as a subregion) according to the SPM anatomy toolbox (Eickhoff et al. 2007) are reported. Reports of SPC activation in the original publications which showed an anatomical probability of less than 30 % for the SPC are depicted in brackets. MC = motor cortex, PMC = premotor cortex, dlPFC = dorsolateral prefrontal cortex, AG = angular gyrus, TPJ = temporoparietal junction, SMG = supramarginal gyrus; left half-circle = left lateral, right half-circle = right lateral, circle = bilateral

Key activations were found in the frontal and parietal lobes (Table 1). Across all studies, only the lateral SPL was consistently involved, while in the frontal regions the activation was more heterogeneous. Hence, we focused on the PPC and its subregions. In Table 1 we report anatomical probabilities for the peak coordinates located within the lateral and medial (incl. the precuneus as a subregion) SPL according to the SPM anatomy toolbox (Eickhoff et al. 2007). Reports of SPL activation in the original publications which showed an anatomical probability of less than 30 % for the region are depicted in brackets.
General Discussion
Table 1 shows that in almost all experimental studies of relational reasoning the PPC is active. However, our goal, a more detailed analysis, shows the bilateral involvement of the lateral SPL and the inferior parietal lobe in the reasoning process. Additionally, and in accordance with findings from Fangmeier et al. (2006), it shows their importance in the core reasoning phase, the model validation phase. To investigate the role of these regions we conducted an fMRI study (Maier et al. 2014) and presented participants with indeterminate problems in which they could construct and vary the mental models. These processes elicited lateral SPL activation. We argue that in this region the mental model of the premise information is constructed and varied (cf. Goel et al. 2004), a result supported by our study. Thus, the lateral SPL is likely to be involved in the reasoning process. A causal connection can be established if a malfunctioning SPL leads to a decrease in reasoning performance. A method to induce virtual lesions is transcranial magnetic stimulation (TMS; Walsh, Pascual-Leone 2003). Hence, in a recent study we investigated the role of the SPL in the construction and alteration of mental models (Franzmeier et al. 2014). TMS on the SPL modulated the performance in deductive


reasoning tasks, i.e., participants needed longer if the SPL was


stimulated during the model validation phase. A performance modulation was achieved by unilateral right and by bilateral stimulation. The modulation direction, i.e., whether performance was enhanced or disrupted, depended on stimulation timing. Real lesions can shed additional light on this. A recent study by Waechter et al. (2013) compared patients with lesions in the rostrolateral prefrontal cortex, patients with PPC lesions, and controls on transitive inference problems; transitive reasoning was impaired by the parietal rather than the prefrontal lesions. These results further support the role of the lateral SPL in drawing inferences and its crucial involvement in mental model construction. All studies show the eminent role of the lateral SPL (and hence of the PPC) in the reasoning process. Premise information is integrated and manipulated in a mental model which is at least partially kept in the lateral SPL and, to a lesser degree, in the inferior parietal cortex.
Acknowledgments
The work has been partially supported by a grant to MR from the
DFG within the SFB/TR 8 in project R8-[CSPACE]. The authors are
grateful to Barbara Kuhnert for drawing the brain picture and
Stephanie Schwenke for proof-reading.

References
Acuna BD, Eliassen JC, Donoghue JP, Sanes JN (2002) Frontal and parietal lobe activation during transitive inference in humans. Cerebral Cortex 12(12):1312–1321
Brzezicka A, Sedek G, Marchewka A, Gola M, Jednoróg K, Królicki L, Wróbel A (2011) A role for the right prefrontal and bilateral parietal cortex in four-term transitive reasoning: an fMRI study with abstract linear syllogism tasks. Acta Neurobiologiae Experimentalis 71(4):479–495
Eickhoff SB, Paus T, Caspers S, Grosbras MH, Evans A, Zilles K, Amunts K (2007) Assignment of functional activations to probabilistic cytoarchitectonic areas revisited. NeuroImage 36(3):511–521
Fangmeier T, Knauff M (2009) Neural correlates of acoustic reasoning. Brain Res 1249:181–190. doi:10.1016/j.brainres.2008.10.025
Fangmeier T, Knauff M, Ruff CC, Sloutsky V (2006) fMRI evidence for a three-stage model of deductive reasoning. J Cogn Neurosci 18(3):320–334
Franzmeier I, Maier SJ, Ferstl EC, Ragni M (2014) The role of the posterior parietal cortex in deductive reasoning: a TMS study. In: OHBM 2014. Human Brain Mapping Conference, Hamburg
Goel V, Gold B, Kapur S, Houle S (1998) Neuroanatomical correlates of human reasoning. J Cogn Neurosci 10(3):293–302
Goel V, Dolan RJ (2001) Functional neuroanatomy of three-term relational reasoning. Neuropsychologia 39(9):901–909
Goel V, Makale M, Grafman J (2004) The hippocampal system mediates logical reasoning about familiar spatial environments. J Cogn Neurosci 16:654–664
Goel V, Stollstorff M, Nakic M, Knutson K, Grafman J (2009) A role for right ventrolateral prefrontal cortex in reasoning about indeterminate relations. Neuropsychologia 47(13):2790–2797
Hinton EC, Dymond S, von Hecker U, Evans CJ (2010) Neural correlates of relational reasoning and the symbolic distance effect: involvement of parietal cortex. Neuroscience 168(1):138–148
Johnson-Laird PN (2006) How we reason. Oxford University Press, New York
Knauff M (2006) Deduktion und logisches Denken. In: Denken und Problemlösen (Enzyklopädie der Psychologie, Vol. 8). Hogrefe, Göttingen
Knauff M (2009) A neuro-cognitive theory of deductive relational reasoning with mental models and visual images. Spatial Cogn Comput 9(2):109–137
Knauff M, Fangmeier T, Ruff CC, Johnson-Laird PN (2003) Reasoning, models, and images: behavioral measures and cortical activity. J Cogn Neurosci 15(4):559–573
Knauff M, Johnson-Laird PN (2002) Visual imagery can impede reasoning. Memory Cogn 30(3):363–371
Knauff M, Mulack T, Kassubek J, Salih HR, Greenlee MW (2002) Spatial imagery in deductive reasoning: a functional MRI study. Brain Res Cogn Brain Res 13(2):203–212
Knauff M (2013) Space to reason: a spatial theory of human thought. MIT Press
Prado J, Chadha A, Booth JR (2011) The brain network for deductive reasoning: a quantitative meta-analysis of 28 neuroimaging studies. J Cogn Neurosci 23(11):3483–3497
Prado J, Mutreja R, Booth JR (2013) Fractionating the neural substrates of transitive reasoning: task-dependent contributions of spatial and verbal representations. Cerebral Cortex 23(3):499–507
Prado J, Noveck IA, Van Der Henst J-B (2010a) Overlapping and distinct neural representations of numbers and verbal transitive series. Cerebral Cortex 20(3):720–729
Prado J, Van Der Henst JB, Noveck IA (2010b) Recomposing a fragmented literature: how conditional and relational arguments engage different neural systems for deductive reasoning. NeuroImage 51(3):1213–1221
Ragni M, Knauff M (2013) A theory and a computational model of spatial reasoning with preferred mental models. Psychol Rev 120(3):561–588
Ruff CC, Knauff M, Fangmeier T, Spreer J (2003) Reasoning and working memory: common and distinct neuronal processes. Neuropsychologia 41(9):1241–1253
Shokri-Kojori E, Motes MA, Rypma B, Krawczyk DC (2012) The network architecture of cortical processing in visuo-spatial reasoning. Scientific Reports 2. doi:10.1038/srep00411
Waechter RL, Goel V, Raymont V, Krueger F, Grafman J (2013) Transitive inference reasoning is impaired by focal lesions in parietal cortex rather than rostrolateral prefrontal cortex. Neuropsychologia 51(3):464–471
Walsh V, Pascual-Leone A (2003) Transcranial magnetic stimulation: a neurochronometrics of mind. MIT Press, Cambridge
Wendelken C, Bunge SA (2010) Transitive inference: distinct contributions of rostrolateral prefrontal cortex and the hippocampus. J Cogn Neurosci 22(5):837–847

How to build an inexpensive cognitive robot: Mind-R


Enrico Rizzardi1, Stefano Bennati2, Marco Ragni1
1 University of Freiburg, Germany; 2 ETH Zurich, Switzerland
Abstract
Research in Cognitive Robotics is dependent on standard robotic platforms that are designed to provide the high precision required by classical robotics; such platforms are generally expensive. In most cases the features provided by the robot are more than are needed to perform the task, and this complexity is not worth the price. In this article we propose a new reference platform for Cognitive Robotics that, thanks to its low price and full-featured set of capabilities, will make research much more affordable and pave the way for more contributions in the field. The article describes the requirements and the procedure to start using the platform and presents some usage examples.
Keywords
Mind-R, Cognitive Robotics, ACT-R, Mindstorms
Introduction
Cognitive Robotics aims to bring human-level intelligence to robotic agents by equipping them with cognition-based control algorithms. This can be accomplished by extending the capabilities of robots with concepts from Cognitive Science, e.g. learning, reasoning and planning abilities. The main difference to classical robotics lies in the requirements: cognitive robots must show robust and adaptable behavior, while precision and efficiency are not mandatory. Standard robotic platforms are designed to comply with the demanding requirements of classical robotics; therefore the entry price is high enough to become an obstacle for most researchers.
To address this issue we present a new robotic platform targeted at Cognitive Robotics research that we call Mind-R. The advantages of Mind-R over other robotic hardware are its low price and customization capabilities. Its greatest disadvantage is that its sensors and actuators are not nearly as precise as other commercial hardware, but this is not a big issue in Cognitive Robotics, which does not aim at solving tasks efficiently and precisely; instead, flexibility and adaptability are the focus.
The article is structured as follows: Section 2 briefly describes the ACT-R theory and how the Mind-R modules fit into its framework, Section 3 provides details of the characteristics of the hardware platform, and Section 4 gives a step-by-step guide on how to install the software and run some examples.
ACT-R
ACT-R (Bothell 2005) is a very well known and widely tested cognitive architecture. It is the implementation of a theory of the mind developed by Anderson et al. (2004) and validated by many experiments over the years. The ACT-R framework has a modular structure that can be easily expanded with new modules, allowing researchers to add new features to the architecture, such as controlling a robotic platform.
ACT-R models the structure and behavior of the human brain. Each module has a specific function (e.g. visual, motor) that reflects the functional organization of the cortex. The modules can exchange information through their buffers.



Fig. 1 ACT-R structure with Mind-R modules


Each module can read all the buffers, but can write only to its own (e.g. to answer queries). Communication is coordinated by the procedural module, which is a serial bottleneck. Extending ACT-R to read from sensors and control actuators required writing new modules that give cognitive models the capability to communicate with the robot as if it were the standard ACT-R virtual device (Fig. 1).
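As a language-neutral illustration of this buffer discipline (a Python sketch with assumed names, not ACT-R code), every module may read all buffers but writes only to its own, and the procedural module serializes the exchange:

  # Toy sketch (assumed names, not ACT-R code) of the buffer discipline:
  # modules write only to their own buffer; the procedural module reads all
  # buffers and fires at most one production per cycle (serial bottleneck).
  class Module:
      def __init__(self, name):
          self.name = name
          self.buffer = None          # the single buffer this module owns

      def write(self, chunk):
          self.buffer = chunk         # a module writes only to its own buffer

  def procedural_cycle(modules, productions):
      state = {m.name: m.buffer for m in modules}   # read access to all buffers
      for condition, action in productions:         # one production fires,
          if condition(state):                      # then the cycle ends
              action(modules)
              break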
Mind-R robot
The robot used to build Mind-R is the LEGO Mindstorms set (LEGO 2009, Fig. 2). Its core relies on the central brick that includes the CPU, batteries and communication interfaces. Mind-R's design is fully customizable using LEGO Mindstorms bricks. The robot can be programmed through the USB interface using many different languages and tools. To keep the interface with ACT-R straightforward, the chosen programming language is Lisp. The Lisp interpreter can command the robot through the NXT-LSP libraries (Hiraishi 2007), which are at the core of the ACT-R interface.
The LEGO Mindstorms in the Mind-R configuration is composed of one ultrasonic sensor, two bumpers, one color sensor and two engines. The ultrasonic sensor, shown in Fig. 3a, provides an approximate distance to the next obstacle inside a solid angle. A bumper, shown in Fig. 3b, provides binary state information, i.e., it can distinguish between pressed and released states. The color sensor, shown in Fig. 3c, can distinguish between basic colors, for example blue, red and green, over a short distance. The engines are stepper motors, able to turn in both directions. Two engines together make up the driving system in Fig. 3d. Each engine drives a wheel, shown in the upper left and upper right-hand corners of Fig. 3d; they can be controlled to navigate in the environment. The third wheel, in the bottom part of Fig. 3d, has no tire and serves only balance purposes.
Mind-R has already shown its effectiveness in spatial planning problems (Bennati and Ragni 2012). As a future development, the effect of complex visual perception, coming from image processing software, on robotic navigation will be investigated.
Setup
This section provides a step-by-step guide on how to install and configure the software needed to run the Mind-R platform. A more detailed guide, containing examples and a troubleshooting section, can be found on the Mind-R website (http://webexperiment.iig.uni-freiburg.de/mind-r/index.html).
The LEGO NXT Fantom driver can be obtained from the LEGO Mindstorms website or from the Mindstorms installation CD and then installed; a reboot of the computer may be required. (Intel-based Mac users: make sure to install the correct driver for your platform. For GNU/Linux no support is provided.)

Fig. 2 Mind-R robot

Fig. 3 The Mindstorms robot and its sensors

The recommended interpreter is SBCL x86; the SBCL installer can be downloaded from its website (http://www.sbcl.org/). ACT-R 6.0 can be downloaded from its website and unpacked into a local folder. The Mind-R software, containing the NXT communication library, the peripheral modules and a demo model, can be downloaded from the Mind-R website.

The source code of the NXT communication library is provided together with a guide about how to compile it. The content of the Mind-R archive has to be unpacked into a local folder.
The robot has to be connected to the computer through a USB port before loading the modules with the interpreter. After the robot is recognized by the OS, the interpreter can be launched. Make sure to start the interpreter from the folder where the Mind-R software has been unpacked. The NXT communication library can be loaded from the Lisp interpreter with the command (load "nxt.lsp"). If the SBCL current working directory is not the one into which Mind-R has been unpacked, the loading of the NXT communication library will fail. This will prevent the demo or any other Mind-R models from running. If the loading was successful, SBCL returns T; if anything else is returned as final output, see the troubleshooting section of the Mind-R website.
The next step is to load ACT-R with the command (load "/path/to/act-r/load-act-r-6.lisp"), replacing the path with the appropriate one. The Mind-R modules can now be loaded. Within the Mind-R archive four modules are provided: nxt-distance.lsp, nxt-touch.lsp, nxt-motor.lsp and nxt-vision.lsp. The first allows ACT-R to communicate with the ultrasonic sensor, the second one with the bumpers, the third commands the engines and the last the color sensor.
When the modules have been loaded by the interpreter with a load command, the robot setup phase can be concluded by executing the function nxt-setup.
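For convenience, the documented sequence can also be scripted. The following sketch (not part of the Mind-R distribution; it assumes SBCL is on the PATH and supports the --non-interactive flag) simply feeds the commands above to the interpreter:

  import subprocess

  # Sketch of a launcher (not part of the Mind-R distribution): feeds the
  # documented load sequence to SBCL. Run it from the folder into which the
  # Mind-R archive was unpacked, otherwise loading nxt.lsp will fail.
  SETUP = """
  (load "nxt.lsp")                           ; NXT communication library -> T
  (load "/path/to/act-r/load-act-r-6.lisp")  ; adjust to the local ACT-R folder
  (load "nxt-distance.lsp")                  ; ultrasonic sensor module
  (load "nxt-touch.lsp")                     ; bumper module
  (load "nxt-motor.lsp")                     ; engine module
  (load "nxt-vision.lsp")                    ; color sensor module
  (nxt-setup)                                ; concludes the robot setup phase
  """

  subprocess.run(["sbcl", "--non-interactive"], input=SETUP, text=True, check=True)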
Once all this software has been successfully installed and loaded, a demo model, consisting of a series of productions that let Mind-R explore the environment, can be started. The demo model is called demo.lsp and can be loaded with a load command or using the graphical interface. This demo model is very simple and is designed to be a starting point for building more complex models.
To run the demo use the ACT-R run function. The engines of the robot might continue running after a model has terminated; this depends both on the model structure and on its final state. To stop the engines the function motor-reset has to be called. The demo model contains some productions used to let the robot interact with the environment. Figures 4 and 5 show productions similar to those in the model. Figure 4 shows two simple productions that are used to read the distance measured by the ultrasonic sensor and to print the read distance to the console by invoking !output!.
Figure 5 shows two productions that send commands to the engines. The left one makes the robot move straight forward. The value next to the duration field indicates how long the engines have to rotate before stopping. The right one makes the robot turn right. Again, the higher the value assigned to duration, the longer the robot will turn in that direction.
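The control logic these productions implement can be summarized as follows (a Python sketch with hypothetical names; the real productions are ACT-R rules in Lisp):

  # Hypothetical sketch of the demo's control logic; the actual behavior is
  # implemented as ACT-R productions, not Python. The threshold and durations
  # are made-up values for illustration.
  def explore_step(robot):
      distance = robot.read_distance()    # analogue of the nxt-distance reading
      print(distance)                     # analogue of printing via !output!
      if distance > 30:                   # assumed obstacle threshold (cm)
          robot.move(duration=1.0)        # engines rotate for `duration`, then stop
      else:
          robot.turn_right(duration=0.5)  # larger duration -> larger turn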
Conclusions
The described platform is a low-priced and flexible cognitive robot that gives researchers in the field of Cognitive Robotics an affordable alternative to the most common and expensive robotic platforms.
The platform is based on the widely accepted architecture ACT-R 6.0, which controls the robot and interacts with the environment through LEGO Mindstorms sensors and actuators.

Fig. 4 Productions that read the distance from the ultrasonic sensor


Fig. 5 Production rules that control the engines: left moves forward, right turns right

The Mind-R robot has already proven effective in spatial navigation tasks (Bennati and Ragni 2012), where it was able to replicate human navigation results and errors.
The flexibility of the platform allows advanced perception capabilities such as computer vision and natural speech to be added. A first step in this direction is Rizzardi (2013), in which a communication module and digital image processing software were used to acquire information from a retail webcam to improve robotic navigation with the use of visual landmarks.
By integrating ACT-R on a LEGO Mindstorms platform it is possible to use other ACT-R models, from driver models (Haring et al. 2012) to reasoning (Ragni and Brüssow 2011) and planning (Best and Lebiere 2006), towards a true unified cognition approach. However, equipping robots with cognitive knowledge is not only important for learning about human and embodied cognition (Clark 1999); it is becoming increasingly important for Human-Robot Interaction, where a successful interaction depends on an understanding of the other agent's behavior (Trafton et al. 2013).
Our hope is that an affordable robot and the bridging function towards ACT-R may be fruitful for research and education purposes.
Acknowledgments
This work has been supported by the SFB/TR 8 Spatial Cognition
within project R8-[CSPACE] funded by the DFG. A special thanks to
Tasuku Hiraishi, the developer of the NXT-LSP communication
library.
References
Anderson J, Bothell D, Byrne M, Douglass S, Lebiere C, Qin Y (2004) An integrated theory of the mind. Psychological Review 111(4):1036–1060
Bennati S, Ragni M (2012) Cognitive robotics: analysis of preconditions and implementation of a cognitive robotic system for navigation tasks. In: Proceedings of the 11th International Conference on Cognitive Modeling, Universitaetsverlag der TU Berlin
Best BJ, Lebiere C (2006) Cognitive agents interacting in real and virtual worlds. In: Cognition and multi-agent interaction: from cognitive modeling to social simulation, pp 186–218
Bothell D (2005) ACT-R. http://act-r.psy.cmu.edu
Clark A (1999) An embodied cognitive science? Trends Cogn Sci 3(9):345–351
Haring K, Ragni M, Konieczny L (2012) A cognitive model of drivers' attention. In: Russwinkel N, Drewitz U, van Rijn H (eds) Proceedings of the 11th International Conference on Cognitive Modeling, Universitaetsverlag der TU Berlin, pp 275–280
Hiraishi T (2007) NXT controller in Lisp. http://super.para.media.kyoto-u.ac.jp/~tasuku/index-e.html
LEGO (2009) LEGO Mindstorms. http://mindstorms.lego.com
Ragni M, Brüssow S (2011) Human spatial relational reasoning: processing demands, representations, and cognitive model. In: Burgard W, Roth D (eds) Proceedings of the 25th AAAI Conference on Artificial Intelligence, AAAI Press, San Francisco, CA
Rizzardi E (2013) Cognitive robotics: cognitive and perceptive aspects in navigation with landmarks. Master's thesis, Università degli Studi di Brescia, Brescia, Italy
Trafton G, Hiatt L, Harrison A, Tamborello F, Khemlani S, Schultz A (2013) ACT-R/E: an embodied cognitive architecture for human-robot interaction. Journal of Human-Robot Interaction 2(1):30–55

Crossed hands stay on the time-line


Bettina Rolke1, Susana Ruiz Fernandez2, Juan Jose Rahona Lopez3,
Verena C. Seibold1
1 Evolutionary Cognition, University of Tübingen, Germany; 2 Leibniz Knowledge Media Research Center (KMRC), Tübingen, Germany; 3 Complutense University, Madrid, Spain
How are objects and concepts represented in our memories? This question has been addressed in the past by two contrasting positions. Whereas some theories suppose that representations are coded in amodal concept nodes (e.g., Kintsch 1998), more recent embodied theories assume that internal representations include multimodal perceptual and motor experiences. One example of the latter conception is the conceptual metaphor view, which assumes that abstract concepts like time are grounded in cognitively more accessible concepts like space (Boroditsky et al. 2010). This view is empirically supported by time-space congruency effects, showing faster left-hand responses to past-related words and faster right-hand responses to future-related words compared to responses with the reversed stimulus-response mapping (e.g., Santiago et al. 2007). This congruency effect implies that time is mentally represented along a line, extending horizontally from left to right. Whereas the existence of this mental time-line has been supported by several empirical findings (see Bonato et al. 2012), the specific properties of the spatial reference frame are still unclear. The aim of the present study was to shed further light on the specific relationship between temporal and spatial codes. Specifically, we examined whether the frame of reference for the association between temporal and spatial codes is based on the structural, embodied side of the motor effectors, meaning that left (right) refers to the left (right) hand independent of the actual hand position, or, alternatively, whether the frame of reference is organized along an egocentric spatial frame which represents things and effectors occurring at the left (right) body side as left (right)-sided. In other words, according to the embodied frame of reference, the left hand represents left irrespective of whether it is placed at the left or right body side, whereas according to an egocentric spatial frame, the left hand represents the left side when it is placed at the left body side, but represents the right side when it is placed at the right body side.
Method
We employed a spatial discrimination task. Participants (N = 20) had to respond with their right or left hand depending on the presentation side of a rectangle. Specifically, when the rectangle was presented at the left (right) side of fixation, participants had to press a response key on their left (right) side. In the uncrossed-hands condition, participants placed their left (right) index finger on the left (right) response key; in the crossed-hands condition they crossed hands and thus responded with their right (left) index finger on the left (right) key to left (right) sided targets. To activate spatial codes by time-related words, we combined the spatial discrimination task with a priming paradigm and presented future- and past-related time words (e.g., yesterday, tomorrow) before the rectangle appeared (see Rolke et al. 2013). To monitor the time course of the time-space congruency effect, we manipulated the SOA between the prime word and the rectangle (SOA = 300, 600, or 1,200 ms). Whereas the uncrossed condition served as baseline to establish the time-space congruency effect, the crossed-hands



condition allowed us to investigate whether the time-space congruency is based on egocentric spatial codes or whether it depends on body-referenced effector sides. Specifically, if the time-space congruency effect is based on an egocentric frame of reference, we expect faster RTs for left (right) key responses following past (future) words regardless of response condition. If, on the other hand, the time-space congruency effect depends on effector side, we expect the pattern to reverse across conditions, i.e., faster RTs should result for left (right) key responses following past (future) words in the uncrossed-hands condition, but faster RTs for left (right) key responses following future (past) words in the crossed-hands condition.
The experiment factorially combined response condition (uncrossed hands, crossed hands), temporal reference (past, future), response key position (left, right), and SOA (300, 600, or 1,200 ms). Repeated-measures analyses of variance (ANOVA) were conducted on mean RT of correct responses and percent correct (PC), taking participants (F1) and items (F2) as random factors. P values were, whenever appropriate, adjusted for violations of the sphericity assumption using the Greenhouse-Geisser correction.
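As an illustration of the central measure (a sketch with assumed column names, not the authors' analysis script), the time-space congruency effect can be computed per participant as the RT difference between incongruent and congruent prime-response pairings:

  import pandas as pd

  # Sketch (assumed column names): mean RT of correct trials per participant,
  # split into congruent (past/left, future/right) and incongruent pairings.
  def congruency_effect(trials: pd.DataFrame) -> pd.Series:
      correct = trials[trials["correct"]]
      congruent = (
          ((correct["reference"] == "past") & (correct["key"] == "left"))
          | ((correct["reference"] == "future") & (correct["key"] == "right"))
      )
      rt = (correct.assign(congruent=congruent)
                   .groupby(["participant", "congruent"])["rt"].mean()
                   .unstack("congruent"))
      return rt[False] - rt[True]   # positive = time-space congruency effect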
Results
RT results are summarized in Fig. 1, which depicts mean RT as a function of temporal reference, response key position, SOA, and response hands condition. An ANOVA on RT revealed shorter RT for the uncrossed compared to the crossed condition, F1(1,19) = 48.9, p < .001; F2(1,11) = 9198.2, p < .001. SOA exerted an influence on RT, F1(2,38) = 28.3, p < .001; F2(2,22) = 243.4, p < .001. Shorter RTs were observed at shorter SOAs (all contrasts between SOAs p < .05).

Fig. 1 Mean RT depending on response key position, temporal reference of prime words, and SOA. Solid lines represent data of the uncrossed response condition; dotted lines represent data of the crossed response condition. For the sake of visual clarity, no error bars were included in the figure



As one should expect, response condition interacted with response key position, F1(1,19) = 8.3, p = .01; F2(1,11) = 93.5, p < .001, indicating a right-hand benefit for right-hand responses at the left side in the crossed condition and at the right side in the uncrossed condition. Theoretically most important, temporal reference and response key position interacted, F1(1,19) = 9.7, p = .01; F2(1,11) = 17.6, p = .002. This time-space congruency effect was neither modulated by response condition, F1(1,19) = 1.1, p = .31; F2(1,11) = 1.4, p = .27, nor by SOA, F1(2,38) = 1.2, p = .30; F2(2,22) = 1.0, p = .37. All other effects were not significant, all ps > .31. Participants made more errors in the crossed than in the uncrossed response condition, F1(1,19) = 24.3, p < .001; F2(1,11) = 264.3, p < .001. The F2 analysis further revealed an interaction between response key position, SOA, and response condition, F2(2,22) = 3.9, p = .04. There were no other significant effects for PC, all ps > .07.
Discussion
By requiring responses on keys placed on the left or right with crossed and uncrossed hands, we disentangled the egocentric spatial frame and the effector-related embodied frame. The presentation of a time word before a lateralized visual target evoked a space-time congruency effect, that is, spatially left (right) responses were faster when a past (future) word preceded the rectangle. Theoretically most important, this space-time congruency effect was not modulated when hands were crossed. This result indicates that temporal codes activate abstract spatial codes rather than effector-related spatial codes.
References
Bonato M, Zorzi M, Umiltà C (2012) When time is space: evidence for a mental time line. Neurosci Biobehav Rev 36:2257–2273. doi:10.1016/j.neubiorev.2012.08.007
Boroditsky L, Fuhrman O, McCormick K (2010) Do English and Mandarin speakers think about time differently? Cognition 118:123–129. doi:10.1016/j.cognition.2010.09.010
Kintsch W (1998) Comprehension: a paradigm for cognition. Cambridge University Press, New York
Rolke B, Ruiz Fernandez S, Schmid M, Walker M, Lachmair M, Rahona Lopez JJ, Hervás G, Vázquez C (2013) Priming the mental time-line: effects of modality and processing mode. Cogn Process 14:231–244. doi:10.1007/s10339-013-0537-5
Santiago J, Lupiáñez J, Pérez E, Funes MJ (2007) Time (also) flies from left to right. Psychon Bull Rev 14:512–516

Is the novelty-P3 suitable for indexing mental workload in steering tasks?
Menja Scheer, Heinrich H. Bülthoff, Lewis L. Chuang
Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Difficulties experienced in steering a vehicle can be expected to place a demand on one's mental resources (O'Donnell, Eggemeier 1986). While the extent of this mental workload (MWL) can be estimated by self-reports (e.g., NASA-TLX; Hart, Staveland 1988), it can also be physiologically evaluated in terms of how a primary task taxes a common and limited pool of mental resources, to the extent that it reduces the electroencephalographic (EEG) responses to a secondary task (e.g. an auditory oddball task). For example, the participant could be primarily required to control a cursor to track a target while attending to a series of auditory stimuli, which would infrequently present target tones that should be responded to with a button-press (e.g., Wickens, Kramer, Vanasse and Donchin 1983). Infrequently presented targets, termed oddballs, are known to elicit a large positive potential approximately 300 ms after their presentation (i.e., the P3).

Indeed, increasing tracking difficulty either by decreasing the predictability of the tracked target or by changing the complexity of the controller dynamics has been shown to attenuate P3 responses in the secondary auditory monitoring task (Wickens et al. 1983; Wickens, Kramer and Donchin 1984).
In contrast, increasing tracking difficulty by introducing more frequent direction changes of the tracked target (i.e. including higher frequencies in the function that describes the motion trajectory of the target) has been shown to bear little influence on the secondary task's P3 response (Wickens, Israel and Donchin 1977; Isreal, Chesney, Wickens and Donchin 1980). Overall, the added requirement of a steering task consistently results in a lower P3 amplitude, relative to performing auditory monitoring alone (Wickens et al. 1983; Wickens et al. 1977; Isreal et al. 1980).
Using a dual-task paradigm for indexing workload is not ideal. First, it requires participants to perform a secondary task. This prevents it from being applied in real-world scenarios; users cannot be expected to perform an unnecessary task that could compromise their critical work performance. Second, it can only be expected to work if the performance of the secondary task relies on the same mental resources as those of the primary task (Wickens, Yeh 1983), requiring a deliberate choice of the secondary task. Thus, it is fortunate that more recent studies have demonstrated that P3 amplitudes can be sensitive to MWL even if the auditory oddball is ignored (Ullsperger, Freude and Erdmann 2001; Allison, Polich 2008). Such a task-irrelevant oddball is said to induce a momentary and involuntary shift in general attention, especially if recognizable sounds (e.g. a dog bark, as opposed to a pure tone) are used (Miller, Rietschel, McDonald and Hatfield 2011).
The current work, comprising two experiments, investigates the conditions that would allow the novelty-P3, the P3 elicited by the ignored, recognizable oddball, to be an effective index for the MWL of compensatory tracking. Compensatory tracking is a basic steering task that can be generalized to most implementations of vehicular control. In both experiments participants were required to use a joystick to counteract disturbances of a horizontal plane. To evaluate the generalizability of this paradigm, we depicted this horizontal plane either as a line in a simplified visualization or as the horizon in a real-world environment. In the latter, participants experienced a large field-of-view perspective of the outside world from the cockpit of an aircraft that rotated erratically about its heading axis. The task was the same regardless of the visualization. In both experiments, we employed a full factorial design for the visualization (instrument, world) and 3 oddball paradigms (in experiment 1) or 4 levels of task difficulty (in experiment 2), respectively. Two sessions were conducted on separate days for the different visualizations, which were counter-balanced for order. Three trials were presented per oddball paradigm (experiment 1) or level of task difficulty (experiment 2) in blocks, which were randomized for order. Overall, we found that steering performance was worse when the visualization was provided by a realistic world environment, in experiments 1 (F(1,11) = 42.8, p < 0.01) and 2 (F(1,13) = 35.0, p < 0.01). Nonetheless, this manipulation of visualization had no consequence on our participants' MWL as evaluated by a post-experimental questionnaire (i.e., NASA-TLX) and EEG responses. This suggests that MWL was unaffected by our choice of visualization.
The first experiment, with 12 participants, was designed to identify the optimal presentation paradigm of the auditory oddball. For the EEG analysis, two participants had to be excluded due to noisy electrophysiological recordings (more than 50 % of rejected epochs). Whilst performing the tracking task, participants were presented with a sequence of auditory stimuli that they were instructed to ignore. This sequence would, in the 1-stimulus paradigm, only contain the infrequent oddball stimulus (i.e., the familiar sound of a dog's bark; Fabiani, Kazmerski, Cycowicz and Friedman 1996). In the 2-stimulus paradigm this infrequently presented oddball (0.1) is accompanied by a more frequently presented pure tone (0.9), and in

the 3-stimulus paradigm the infrequently presented oddball (0.1) is accompanied by a more frequently presented pure tone (0.8) and an infrequently presented pure tone (0.1). These three paradigms are widely used in P3 research (Katayama, Polich 1996). It should be noted, however, that the target-to-target interval is 20 s regardless of the paradigm.
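For illustration only, the three presentation paradigms can be sketched as follows (tone labels assumed; the fixed 20 s target-to-target interval is ignored here):

  import random

  # Sketch of the three oddball paradigms (probabilities from the text, labels
  # assumed). In the 1-stimulus paradigm only the oddball is ever presented.
  def make_sequence(paradigm, n_trials, rng=random.Random(0)):
      if paradigm == 1:
          return ["oddball"] * n_trials
      if paradigm == 2:   # oddball 0.1, frequent pure tone 0.9
          return rng.choices(["oddball", "frequent"], weights=[0.1, 0.9], k=n_trials)
      if paradigm == 3:   # oddball 0.1, frequent tone 0.8, rare tone 0.1
          return rng.choices(["oddball", "frequent", "rare"],
                             weights=[0.1, 0.8, 0.1], k=n_trials)
      raise ValueError("paradigm must be 1, 2, or 3")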
To obtain the ERPs, epochs from 100 ms before to 900 ms after the onset of the recognizable oddball stimulus were averaged. Mean amplitude measurements were obtained in a 60 ms window, centered at the group-mean peak latency of the largest positive component between 250 and 400 ms for the oddball P3, for each of the three mid-line electrode channels of interest (i.e., Fz, Cz, Pz).
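Numerically, this measurement corresponds to the following sketch (array layout and names assumed; not the authors' analysis code):

  import numpy as np

  # Sketch (assumed layout): epochs is (n_trials, n_samples), cut from -100 ms
  # to 900 ms around oddball onset; fs is the sampling rate in Hz; peak_ms is
  # the group-mean peak latency (250-400 ms) determined beforehand.
  def mean_amplitude(epochs, fs, peak_ms, win_ms=60, epoch_start_ms=-100):
      erp = epochs.mean(axis=0)                        # average epochs -> ERP
      center = int(round((peak_ms - epoch_start_ms) * fs / 1000))
      half = int(round(win_ms / 2 * fs / 1000))
      return erp[center - half: center + half].mean()  # mean in 60 ms window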
In agreement with previous work, the novelty-P3 response was smaller when participants had to perform the tracking task compared to when they were only presented with the task-irrelevant auditory stimuli, without the tracking task (F(1,9) = 10.9, p < 0.01). However, the amplitude of the novelty-P3 differed significantly across the presentation paradigms (F(2,18) = 5.3, p < 0.05), whereby the largest response to our task-irrelevant stimuli was elicited by the 1-stimulus oddball paradigm. This suggests that the 1-stimulus oddball paradigm is most likely to elicit novelty-P3s that are sensitive to changes in MWL. Finally, the attenuation of novelty-P3 amplitudes by the tracking task varied across the three mid-line electrodes (F(2,18) = 28.0, p < 0.001). Pairwise comparisons, Bonferroni-corrected for multiple comparisons, revealed the P3 amplitude to be largest at Cz, followed by Fz, and smallest at Pz (all p < 0.05). This stands in contrast with previous work that found control difficulty to attenuate P3 responses at parietal electrodes (cf. Isreal et al. 1980; Wickens et al. 1983). Thus, the current paradigm that uses a recognizable, ignored sound is likely to reflect an underlying process that is different from previous studies, one which could be more sensitive to the MWL demands of a tracking task.
Given the results of experiment 1, the second experiment, with 14 participants, investigated whether the 1-stimulus oddball paradigm would be sufficiently sensitive to index tracking difficulty as defined by the bandwidth of frequencies that contributed to the disturbance of the horizontal plane (cf. Isreal et al. 1980). Three different bandwidth profiles (easy, medium, hard) defined the linear increase in the amount of disturbance that had to be compensated for. This manipulation was effective in increasing subjective MWL, according to the results of a post-experimental NASA-TLX questionnaire (F(2,26) = 14.9, p < 0.001), and demonstrated the expected linear trend (F(1,13) = 23.2, p < 0.001). This increase in control effort was also reflected in the amount of joystick activity, which grew linearly across the difficulty conditions (F(1,13) = 42.2, p < 0.001). For the EEG analysis two participants had to be excluded due to noisy electrophysiological recordings (more than 50 % of rejected epochs). A planned contrast revealed that the novelty-P3 was significantly lower in the most difficult condition compared to the baseline viewing condition, where no tracking was done (F(1,11) = 5.2, p < 0.05; see Fig. 1a). Nonetheless, the novelty-P3 did not differ significantly between the difficulty conditions (F(2,22) = 0.13, p = 0.88), nor did it show the expected linear trend (F(1,11) = 0.02, p = 0.91). Like Isreal et al. (1980), we find that EEG responses do not discriminate for MWL that is associated with controlling increased disturbances. It remains to be investigated whether the novelty-P3 is sensitive to the complexity of controller dynamics, as has been shown for the P3.
The power spectral density of the EEG data around 10 Hz (i.e., alpha) has been suggested by Smith and Gevins (2005) to index MWL. A post hoc analysis of our current data, at electrode Pz, revealed that alpha power was significantly lower for the medium and hard conditions relative to the view-only condition (F(1,11) = 6.081, p < 0.05; F(1,11) = 6.282, p < 0.05). Nonetheless, the expected linear trend across tracking difficulty was not significant (Fig. 1b).
To conclude, the current results suggest that a 1-stimulus oddball task ought to be preferred when measuring general MWL with the novelty-P3.


Fig. 1 a Left: grand average ERP data of Experiment 2 averaged over Fz, Cz, Pz; right: averaged amplitude of the P3 as a function of tracking difficulty. b Left: averaged power spectral density (PSD) at Pz; right: averaged PSD as a function of tracking difficulty
Although changes in the novelty-P3 can identify the control effort required in our compensatory tracking task, it is not sufficiently sensitive to provide a graded response across different levels of disturbance. In this regard, it may not be as effective as self-reports and joystick activity in denoting control effort. Nonetheless, further research can improve upon the sensitivity of EEG metrics to MWL by investigating other aspects that better correlate with the specific demands of a steering task.
Acknowledgments
The work in this paper was supported by the myCopter project, funded
by the European Commission under the 7th Framework Program.
References
Allison BZ, Polich J (2008) Workload assessment of computer gaming using a single-stimulus event-related potential paradigm. Biol Psychol 77(3):277–283
Fabiani M, Kazmerski V, Cycowicz Y, Friedman D (1996) Naming norms for brief environmental sounds. Psychophysiology 33:462–475
Hart SG, Staveland LE (1988) Development of NASA-TLX (Task Load Index): results of empirical and theoretical research
Isreal JB, Chesney GL, Wickens CD, Donchin E (1980) P300 and tracking difficulty: evidence for multiple resources in dual-task performance. Psychophysiology 17(3):259–273
Katayama J, Polich J (1996) P300 from one-, two-, and three-stimulus auditory paradigms. Int J Psychophysiol 23:33–40
Miller MW, Rietschel JC, McDonald CG, Hatfield BD (2011) A novel approach to the physiological measurement of mental workload. Int J Psychophysiol 80(1):75–78
O'Donnell RC, Eggemeier TF (1986) Workload assessment methodology. In: Handbook of Perception and Human Performance, Vol. 2, pp 1–49
Smith ME, Gevins A (2005) Neurophysiologic monitoring of mental workload and fatigue during operation of a flight simulator. In: Defense and Security. International Society for Optics and Photonics, pp 116–126
Ullsperger P, Freude G, Erdmann U (2001) Auditory probe sensitivity to mental workload changes: an event-related potential study. Int J Psychophysiol 40(3):201–209
Wickens CD, Kramer AF, Vanasse L, Donchin E (1983) Performance of concurrent tasks: a psychophysiological analysis of the reciprocity of information-processing resources. Science 221(4615):1080–1082
Wickens CD, Israel J, Donchin E (1977) The event-related potential as an index of task workload. Proceedings of the Human Factors Society Annual Meeting 21:282–286
Wickens CD, Kramer AF, Donchin E (1984) The event-related potential as an index of the processing demands of a complex target acquisition task. Annals of the New York Academy of Sciences 425:295–299
Wickens CD, Yeh Y-Y (1983) The dissociation between subjective workload and performance: a multiple resource approach. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting 27(3):244–248

Modeling perspective-taking by forecasting 3D biological motion sequences
Fabian Schrodt, Martin V. Butz
Cognitive Modeling, Computer Science Department, University of Tübingen, Germany
Abstract
The mirror neuron system (MNS) is believed to be involved in social abilities like empathy and imitation. While several brain regions have been linked to the MNS, it remains unclear how the mirror neuron property itself develops. Previously, we introduced a recurrent neural network which enables mirror-neuron capabilities by learning an embodied, scale- and translation-invariant model of biological motion (BM). The model allows the derivation of the orientation of observed BM by (i) segmenting BM in a common positional and angular space and (ii) generating short-term, top-down predictions of subsequent motion. While our previous model generated short-term motion predictions, here we introduce a novel forecasting algorithm which explicitly predicts sequences of BM segments. We show that the model scales to a 3D simulation of humanoid walking and is robust against variations in body morphology and postural control.
Keywords
Perspective Taking; Embodiment; Biological Motion;
Self-Supervised Learning; Sequence Forecasting; Mirror-Neurons;
Recurrent Neural Networks
Introduction
This paper investigates how we may be able to recognize BM sequences and mentally transform them to the egocentric frame of reference to bootstrap mirror neuron properties. Our adaptive, self-supervised, recurrent neural network model (Schrodt et al. 2014) might contribute to the understanding of the MNS and its implied capabilities. With the previous model, we were able to generate continuous mental rotations to learned canonical views of observed 2D BM, essentially taking on the perspective of an observed person. This self-supervised perspective taking was accomplished by backpropagating errors stemming from top-down, short-term predictions of the BM progression.
In this work, we introduce an alternative or complementary, time-independent forecasting mechanism of motion segment sequences to the model. In the brain, prediction and forecasting mechanisms may be realized by the cerebellum, which is involved in the processing of BM (Grossman et al. 2000). In addition, it has been suggested that the cerebellum may also support the segmentation of motion patterns via the basal ganglia, thereby influencing the learning of motor sequences in parietal and (pre-)motor cortical areas (Penhune and Steele 2012). Along these lines, the proposed model learns to predict segments of motion patterns given embodied, sensorimotor motion signals. Due to the resulting perspective-taking capabilities, the model essentially offers a mechanism to activate mirror neuron capabilities.
Neural Network Model
The model consists of three successive stages, illustrated in the overview given in Fig. 1. The first stage processes relative positional and angular values into mentally rotated, motion-direction sensitive population codes. The second stage performs a modulatory normalization and pooling of those. Stage III is a self-supervised pattern segmentation network with sequence forecasting, which enables the back-propagation of forecast errors. We detail the three stages and the involved techniques in the following sections.
Stage I: Feature Preprocessing
The input of the network is driven by a number of (not necessarily all) relative joint positions and joint angles of a person. Initially, the network can be driven by self-perception to establish an egocentric perspective on self-motion. In this case, the relative joint positions may be perceived visually, while the perception of the joint angles may be supported by proprioception in addition to vision. When actions of others are observed, joint angles may be identified solely visually.
In each single interstage Ia in the relative position pathway, a single positional body-landmark relation is transformed into a directional velocity by time-delayed inhibition, whereby the model becomes translation-invariant. Interstage Ib implements a mental rotation of the resulting directional velocity signals using a neural rotation module Rl. It is driven by auto-adaptive mental rotation angles (Euler angles in 3D space), which are implemented by bias neurons. The rotational module and its influence on the directional velocity signals are realized by gain-field-like modulations of neural populations (Andersen et al. 1985). All positional processing stages apply the same mental rotation Rl, by which multiple error signals can be merged at the module. This enables orientation invariance upon adequate adaptation of the module's biases. In interstage Ic, each (rotated) D-dimensional directional motion feature is convolved into a population of 3^D - 1 direction-responsive neurons.

Fig. 1 Overview of the three-stage neural modeling approach in a 3D example with 12 joint positions and 8 joint angles, resulting in n = 20 features. Boxes numbered with m indicate layers consisting of m neurons. Black arrows describe weighted forward connections, while circled arrowheads indicate modulations. Dashed lines denote recurrent connections. Red arrows indicate the flow of the error signals


The processing of each one-dimensional piece of angular information is done analogously, resulting in 2-dimensional population codes. A rotation mechanism (interstage Ib) is not necessary for angles and thus not applied. In summary, stage I provides a population of neurons for each feature of sensory processing, which is either sensitive to directional changes in a body-relative limb position (26 neurons for each 3D position) or sensitive to directional changes in angles between limbs (2 neurons for each angle).
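One way to picture such a population (an assumed encoding, sketched for illustration; the paper does not commit to this exact scheme) is to take the 3^D - 1 nonzero sign combinations as preferred directions:

  import itertools
  import numpy as np

  # Assumed encoding sketch: preferred directions are all nonzero vectors in
  # {-1, 0, 1}^D; each neuron's activation is the rectified cosine similarity
  # between the (rotated) directional velocity and its preferred direction.
  def direction_population(velocity):
      D = len(velocity)
      prefs = np.array([p for p in itertools.product((-1, 0, 1), repeat=D)
                        if any(p)], dtype=float)        # 3^D - 1 directions
      prefs /= np.linalg.norm(prefs, axis=1, keepdims=True)
      v = np.asarray(velocity, dtype=float)
      norm = np.linalg.norm(v)
      if norm == 0.0:
          return np.zeros(len(prefs))
      return np.maximum(prefs @ (v / norm), 0.0)        # 26 values for D=3, 2 for D=1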
Stage II: Normalization and Pooling
Stage II first implements individual activity normalizations in the direction-sensitive populations. Consequently, the magnitude of activity is generalized over, by which the model becomes scale- and velocity-invariant. Normalization of a layer's activity vector can be achieved by axo-axonic modulations, using a single, layer-specific normalizing neuron (shown as circles in Fig. 1). Next, all normalized direction-sensitive fields are merged by one-to-one connections to a pooling layer, which serves as the input to stage III. To also normalize the activity of the pooling layer, the connections are weighted by 1/√n, where n denotes the number of features being processed.


Stage III: Correlation Learning
Stage III realizes a clustering of the normalized and pooled information from stage II (indexed by i) over time by instar weights fully connected to a number of pattern-responsive neurons (indexed by j). Thus, each pattern neuron represents a unique constellation of positional and angular directional movements. For pattern learning, we use the Hebbian-inspired instar learning rule (Grossberg 1976). To avoid catastrophic forgetting of patterns, we use winner-takes-all competitive learning, in the sense that only the weights to the most active pattern neuron are adapted. We bootstrap the weights from scratch by adding neural noise to the input of each pattern neuron, which consequently activates Hebbian learning of novel input patterns. The relative influence of neural noise decreases while a pattern-sensitive neuron is learned (cf. Schrodt et al. 2014).
In contrast to our previous, short-term prediction approach, here we apply a time-independent forecasting algorithm (replacing the attentional gain control mechanism). This is realized by feedback connections w_ji from the pattern layer to the pooling layer, which are trained to approximate the input net_i of the pooling layer neurons:

(1/η) ∂w_ji(t)/∂t = Δw_ji(t) = net_i(t) - w_ji(t),

where neuron j is the last winner neuron that differed from the current
winner in the pattern layer. In consequence, the outgoing weight
vector of a pattern neuron forecasts the input to the pooling layer
while the next pattern neuron is active. The forecasting error can be
backpropagated through the network to adapt the mental
transformation for error minimization (cf. red arrows in Fig. 1).
Thus, perspective adaptation is driven by the difference between the forecasted and the actually perceived motion. The difference d_i is directly fed into the pooling layer by the outstar weights:

d_i(t) = Δw_ji(t),

where j again refers to the preceding winner.

Fig. 2 Variants of the simulated walker
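In code, the update amounts to the following sketch (variable names assumed):

  import numpy as np

  # Sketch of the outstar forecasting update (names assumed): after the winner
  # changes, the previous winner's outgoing weights are nudged toward the
  # current pooling-layer input, so they come to forecast the next segment.
  def forecast_step(w, net, j_prev, eta=0.01):
      """w: (n_patterns, n_pool) outstar weights; net: pooling-layer input."""
      d = net - w[j_prev]       # forecast error d_i(t), fed into the pooling layer
      w[j_prev] += eta * d      # Delta w_ji(t) = eta * (net_i(t) - w_ji(t))
      return d                  # backpropagated to adapt the mental rotation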


Experiments
In this section, we first introduce the 3D simulation we implemented to evaluate our model. We then show that, after training on the simulated movement, the learned angular and positional correlations can be exploited to take on the perspective of another person who currently executes a similar motion pattern. The reported results are averaged over 100 independent runs (training and evaluating the network starting with different random number generator seeds).
Simulation and Setup
We implemented a 3D simulation of a humanoid walking with 10
angular DOF. The movement is cyclic with a period of 200 time steps

123

(corresponding to one left and one right walking step). The simulation
provides the 3D positions of all 12 limb endpoints relative to the
bodys center x1 . . .x12 as well as 8 angles a1 . . .a8 between limbs
(inner rotations of limbs are not considered). The view of the walker
can be rotated arbitrarily before serving as visual input to the model.
Furthermore, the simulation allows the definition of the appearance
and postural control of the walker. Each of the implied parameters (body scale, torso height, width of shoulders/hips and length of arms/legs, as well as minimum/maximum amplitude of joint angles during movement) can be varied to produce log-normally distributed variants of an average walker, which exhibits either female or male proportions. Randomly sampled resulting walkers are shown in Fig. 2.

Fig. 2 Variants of the simulated walker
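As a rough illustration of how such variants can be generated, consider this sketch; the default parameter values are placeholders, not the simulation's actual settings:

import numpy as np

rng = np.random.default_rng(42)

# Average-walker defaults (placeholder values).
DEFAULTS = {"body_scale": 1.0, "torso_height": 0.55, "shoulder_width": 0.40,
            "hip_width": 0.35, "arm_length": 0.70, "leg_length": 0.90}

def sample_walker(sigma2=0.1):
    """Scale each default by a log-normal factor LN(0, sigma^2), yielding
    a morphological variant of the average walker."""
    sigma = np.sqrt(sigma2)
    return {name: value * rng.lognormal(mean=0.0, sigma=sigma)
            for name, value in DEFAULTS.items()}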
Perspective-Taking on Action Observation with Morphological Variance
We first trained the model on the egocentric perspective of the
average male walker for 40 k time steps. The rotation biases were
kept fixed since no mental rotation has to be applied during self-perception. In consequence, a cyclic series of 4 to 11 winner patterns evolved from noise in the pattern layer. Each represents (i) a sufficiently linear part of the walking via its instar vector and (ii) the next forecasted, sequential part of the movement via its outstar vector.
After training, we fed the model with an arbitrarily rotated (uniform
distribution in orientation space) view of a novel walker, which was
either female or male with 50 % probability. Each default morphology parameter was varied by a log-normal distribution LN(0, σ²) with variance σ² = 0.1; postural control parameters were
not varied. Instar/outstar learning was disabled from then on, but the
mental rotation biases were allowed to adapt according to the
backpropagated forecast error to derive the orientation of the shown
walker.
Figure 3 shows the mismatch of the model's derived walker orientation, which we term orientation difference (OD), over time. We define the OD as the minimal amount of rotation needed to rotate the derived orientation into the egocentric orientation about the optimal axis of rotation. As a result, all trials converged to a negligible OD, which means that the given view of the walker was internally rotated to the previously learned, egocentric orientation. The median remaining OD converged to ≈ 0.15 with quartiles of ≈ 0.03. The time for the median OD to
fall short of 1 was 120 time steps. These results show that morphological differences between the self-perceived and observed walkers could be generalized over. This is because the model's scale-invariance applies to every positional relation perceived by the model.

Fig. 3 The model aligns its perspective to the orientation of observed walkers with different morphological parameters (starting at t = 200). Blue quartiles, black median
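Assuming orientations are represented as rotations, the OD as defined above can be computed as the geodesic angle between them; a sketch of the computation, not the authors' code:

import numpy as np
from scipy.spatial.transform import Rotation

def orientation_difference(derived: Rotation, egocentric: Rotation) -> float:
    """Minimal rotation angle taking the derived orientation into the
    egocentric one, about the optimal axis of rotation (radians in scipy)."""
    return float((derived * egocentric.inv()).magnitude())

# Example: a view tilted by 0.15 about the vertical axis yields OD = 0.15.
ego = Rotation.identity()
derived = Rotation.from_rotvec(0.15 * np.array([0.0, 1.0, 0.0]))
print(orientation_difference(derived, ego))   # ~0.15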
Perspective-Taking on Action Observation with Postural Control Variance
In this experiment, we varied the postural control parameters of the
simulation on action observation by a log-normal distribution with
variance σ² = 0.1, instead of the morphological parameters. Again,
female as well as male walkers were presented. The perspective of all
shown walkers could be derived reliably, but with a higher remaining OD of ≈ 0.67 and more distal quartiles of ≈ 0.32. The median OD took longer to fall short of 1, namely 154 time steps. This is because the directions of joint motion are influenced by angular parameters. Still, variations in postural control could largely be generalized over (Fig. 4).

Fig. 4 The model aligns its perspective to the orientation of observed walkers with different postural control parameters
Conclusions and Future Work
The results have shown that the developed model is able to recognize
novel perspectives on BM independent of morphological and largely independent of postural control variations. With the previous
model, motion segments are also recognized if their input sequence is
reordered, such that additional, implicitly learned attractors may exist
for the perspective derivation. The introduced, explicit learning of
pattern sequences forces the model to deduce the correct perspective
by predicting the patterns of the next motion segment rather than the
current one. It may well be, however, that the combination of both predictive mechanisms generates even more robust results.
Future work needs to evaluate the current model capabilities and
limitations as well as possible combinations of the prediction mechanisms further. Currently, we are investigating how missing or
incomplete data could be derived by our model during action
observation.
We believe that the introduced model may help to infer the current
goals of an actor during action observation somewhat independent of
the current perspective. Experimental psychological and further
cognitive modeling studies may examine the influence of motor
sequence learning on the recognition of BM and the inference of
goals. Also, an additional, dynamics-based modulatory module could
be incorporated, which could be used to deduce emotional properties
of the derived motion, and could thus bootstrap capabilities related
to empathy. These advancements could pave the way for the creation
of a model on the development of a mirror neuron system that supports learning by imitation and is capable of inferring goals,
intentions, and even emotions from observed BM patterns.
References
Andersen RA, Essick GK, Siegel RM (1985) Encoding of spatial location by posterior parietal neurons. Science 230(4724):456–458
Grossberg S (1976) On the development of feature detectors in the visual cortex with applications to learning and reaction-diffusion systems. Biological Cybernetics 21(3):145–159
Grossman E, Donnelly M, Price R, Pickens D, Morgan V, Neighbor G, Blake R (2000) Brain areas involved in perception of biological motion. Journal of Cognitive Neuroscience 12(5):711–720
Penhune VB, Steele CJ (2012) Parallel contributions of cerebellar, striatal and M1 mechanisms to motor sequence learning. Behavioural Brain Research 226(2):579–591
Schrodt F, Layher G, Neumann H, Butz MV (2014) Modeling perspective-taking by correlating visual and proprioceptive dynamics. In: 36th Annual Conference of the Cognitive Science Society, Conference Proceedings

Matching quantifiers or building models? Syllogistic reasoning with generalized quantifiers
Eva-Maria Steinlein, Marco Ragni
Center for Cognitive Science, University of Freiburg, Germany
Abstract
Assertions in the thoroughly investigated domain of classical syllogistic reasoning are formed using one of the four quantifiers: all,
some, some not, or none. In everyday communication, meanwhile,
set-based quantifiers like most and frequency-based quantifiers such
as normally are more often used. However, little progress has been
made in finding a psychological theory that considers such quantifiers.
This article adapts two theories for reasoning with these quantifiers:
the Matching-Hypothesis and a variant of the Mental Model Theory.
Both theories are evaluated experimentally in a syllogistic reasoning
task. Results indicate a superiority of the model-based approach.
Semantic differences between the quantifiers most and normally are
discussed.
Keywords
Reasoning, Syllogisms, Matching-Hypothesis, Mental Models, Minimal Models
Introduction
Consider the following example:
All trains to Bayreuth are local trains.
Normally local trains are on time.
What follows?
You might infer that, normally, trains to Bayreuth are on time; at least, this is what participants in our experiments tend to do. And, in
absence of other information, it might be even sensible to do so.
However, if you understand the second assertion as "Normally, local trains in Germany are on time", then the local trains to Bayreuth could
be an exception. So while different conclusions are possible, none of
them is necessarily true. Hence no valid conclusion (NV) follows, but
participants rarely give this logically correct answer.
Problems like the example above consisting of two quantified
premises are called syllogisms. The classical quantifiers all, some,
some not, and none have been criticized for being too strict or
uninformative, respectively (Pfeifer 2006), and thus being infrequently used in natural language. Hence, so-called generalized quantifiers like most and few have been introduced and
investigated in this field (Chater, Oaksford 1999). In our study we
additionally included the frequency-based term normally that is
used in non-monotonic reasoning. Non-monotonic reasoning (Brewka, Niemela and Truszczynski 2007) deals with rules that describe
what is usually the case, but do not necessarily hold without
exception.
The terms of a syllogistic problem can be in one of four possible
figures (Khemlani, Johnson-Laird 2012). We focus on two: Figure I,
the order A-B and B-C (the example above is of this type with A = trains to Bayreuth, B = local trains, and C = are on time), and Figure IV, the term order B-A and B-C. While Figure I
allows for a transitive rule to be applied, Figure IV does not. Additionally, conclusions can be drawn in two directions, relating A to C
(A-C conclusion) or C to A (C-A conclusion). Several theories of
classical syllogistic reasoning have been postulated based on formal
rules (e.g. Rips 1994), mental models (e.g. Bucciarelli, Johnson-Laird 1999), or heuristics (e.g. Chater, Oaksford 1999). However, none of them provides a satisfying account of naive participants' syllogistic reasoning behavior (Khemlani, Johnson-Laird
2012).
While most theories only provide predictions for reasoning with
the classical quantifiers, some theories apply equally to generalized
quantifiers. One of the most important approaches in this field is the
Probability Heuristics Model (PHM) introduced by Chater and
Oaksford (1999). It states that reasoners solve syllogisms by simple
heuristics, approximating a probabilistic procedure. Within this
framework, generalized quantifiers like most are treated as probabilities of certain events or features. Another theory to explain human
syllogistic reasoning is the Matching Hypothesis (Wetherick, Gilhooly 1995), which states that the choice of the quantifier for the
conclusion matches the most conservative quantifier contained in the
premises. Extending this approach with most and normally could
result in the order:
All < Normally < Most < Some = Some not < None
Considering the example above from this perspective, normally is
preferred over all; hence a reasoner would, incorrectly, respond that
normally trains to Bayreuth are on time.
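A toy implementation makes the extended matching heuristic mechanical; the numeric ranks simply encode the conservativeness order proposed above:

# Conservativeness order: all < normally < most < some = some not < none.
RANK = {"all": 0, "normally": 1, "most": 2, "some": 3, "some not": 3, "none": 4}

def matching_prediction(q1, q2):
    """The Matching-Hypothesis picks the most conservative premise quantifier."""
    return q1 if RANK[q1] >= RANK[q2] else q2

print(matching_prediction("all", "normally"))   # 'normally'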
Do people actually reason when confronted with syllogisms or are
responses the result of a superficial automatic process, as suggested
by the Matching-Hypothesis? Mental Models are an approach that
assumes individuals engage in a reasoning process, thus allowing for
more sophisticated responses. Yet individual differences exist.
Therefore, we suggest Minimal Models as a hybrid approach, combining mental models and heuristics. It is assumed that a deductive
process based on an initial model is guided by the most conservative
quantifier of the premises. Reasoners will try to verify this quantifier
in the initial model, which is minimal with respect to the number of
individuals represented, and tend to formulate a conclusion containing
this quantifier. For example, for the syllogism Most A are B,
Some B are C, some is more conservative and is tested in an initial (minimal) model (illustrated in the sketch below). Some holds in this model, thus the preferred conclusion is Some A are C. While this answer is dominated by heuristics (corresponding to what is known as System 1), some reasoners may engage in a more sophisticated reasoning process (System 2) consisting of the construction of alternative models in order to falsify the initial conclusion. With such an alternative model in mind, the reasoner will arrive at the valid response, i.e. in this case NV.
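The following sketch illustrates the Minimal Models idea on this syllogism, representing individuals as property sets; the two models are illustrative reconstructions, not the exact diagrams used in the study:

def some(model, x, y):
    """'Some X are Y': at least one individual has both properties."""
    return any({x, y} <= individual for individual in model)

def most(model, x, y):
    """'Most X are Y': more than half of the X-individuals are also Y."""
    xs = [ind for ind in model if x in ind]
    return len(xs) > 0 and sum(1 for ind in xs if y in ind) > len(xs) / 2

# Initial minimal model for "Most A are B, Some B are C":
initial = [{"A", "B", "C"}, {"A", "B"}, {"A"}]
assert most(initial, "A", "B") and some(initial, "B", "C")
print(some(initial, "A", "C"))       # True -> System 1 conclusion "Some A are C"

# An alternative model also satisfies both premises but falsifies the conclusion:
alternative = [{"A", "B"}, {"A", "B"}, {"A"}, {"B", "C"}]
assert most(alternative, "A", "B") and some(alternative, "B", "C")
print(some(alternative, "A", "C"))   # False -> System 2 response: NV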
Empirical Investigation
Hypothesis. We tested whether our extension of the Matching-Hypothesis or Minimal Models provides a more accurate account of human syllogistic reasoning. The PHM was not included in our analysis, as it does not provide any predictions for reasoning with the quantifier normally. For our experiment, we assume that Minimal Models make better predictions for the response behavior of naive participants, because 1) they allow for effects of figure, i.e. responses may vary depending on the order of terms in the premises, and 2) they not only predict heuristic System 1 responses, but also System 2 responses, which are logically valid and often do not conform to System 1 responses.
Therefore, we hypothesize that Minimal Models show a higher hit rate
and more correct rejections than the Matching-Hypothesis. Furthermore, System 2 responses should occur as predicted by Minimal
Models. In addition to this comparison of theories, we explored the
semantics of the quantifiers most and normally empirically.
Participants. Fifty-eight native English speakers (21 m, 37 f;
mean(age) = 35.5) participated in this online experiment. They were
recruited via Amazon Mechanical Turk and came from a variety of educational and occupational backgrounds.
Design & Procedure. In this online experiment participants were
asked to generate conclusions to 31 syllogisms in Figures I and IV
reflecting all combinations of the quantifiers all, some, most, and
normally (the simple syllogism AA in Figure I was omitted, as it was
used as an explanatory example in the instruction). Both premises
were presented simultaneously, together with the question What
follows? Participants could either fill in a quantifier and the terms
(X, N, V), or write nothing (meaning NV) in the response field.
After completing this production task, participants were asked about
their understanding of the four quantifiers. For each quantifier they
had to complete a statement of the following form: "If someone says [quantifier] it refers to a minimum of ___ out of 100 objects." Note that
we asked for the minimum, as the lower bounds of the quantifiers are
of greater importance to understanding the semantics of these specific
quantifiers.
Results
Overall calculations show only a small, but significant difference
(Wilcoxon test, z = 2.11, p = .018) between Minimal Models
(67.9 % of responses predicted) and the Matching-Hypothesis
(64.5 %). This trend was confirmed by the more fine-grained analysis
of hits and correct rejections following a method introduced by Khemlani and Johnson-Laird (2012): theoretical predictions are compared to significant choices (as shown in Tables 1 and 2), and hits (i.e.
choices that are predicted by the respective theory) and correct
rejections are counted. In this analysis, Minimal Models perform
better in both categories, hits (90.1 vs. 78.1 %; Wilcoxon test, z = 2.99, p = .001) and correct rejections (92.1 vs. 87.5 %; Wilcoxon test, z = 2.65, p = .004).

Table 1 Significant choices for Figure I and the percentage of participants who drew these conclusions

First premise    Second premise
                 All [A]     Some [I]    Most [M]                        Normally [N]
All [A]          (omitted)   I (78 %)    M (72 %)                        N (60 %)
Some [I]         I (78 %)    I (79 %)    I (67 %)                        I (69 %)
Most [M]         M (74 %)    I (74 %)    M (67 %)                        M (33 %), I (28 %), N (26 %)
Normally [N]     N (69 %)    I (79 %)    I (33 %), M (26 %), N (19 %)    N (69 %)

The AA syllogism in Figure I was omitted (see Design & Procedure)


Table 2 Significant choices for Figure IV and the percentage of participants who drew these conclusions

First premise    Second premise
                 All [A]                Some [I]               Most [M]                                   Normally [N]
All [A]          A (74 %)               I (66 %)               M (50 %)                                   N (50 %)
Some [I]         I (55 %), I* (21 %)    I (57 %), NV (36 %)    I (48 %), NV (31 %)                        I (43 %), NV (31 %)
Most [M]         M (45 %), M* (17 %)    I (50 %), NV (24 %)    M (50 %), NV (22 %), I (21 %)              N (24 %), NV (24 %), M (22 %), I (21 %)
Normally [N]     N (41 %), N* (17 %)    I (57 %), NV (31 %)    M (26 %), I (22 %), N (17 %), NV (17 %)    N (59 %), NV (24 %)

Conclusions marked with * are conclusions in C-A direction, all others are in A-C direction. NV = no valid conclusion
According to our Minimal Model
approach, for 26 tasks System 2 leads to responses differing from the
heuristic ones. In eight cases, this prediction was confirmed by the
data, i.e., in those cases a significant proportion of participants drew
the respective System 2 conclusion.
The quantitative interpretation of the quantifiers is depicted in
Fig. 1 for the quantifiers some, most, and normally. Note that the
values for the quantifier all are not illustrated, as with one exception
all participants assigned a minimal value of 100 to it. For several participants normally is equivalent to all, i.e., no exceptions are possible, in contrast to most. The direct comparison of the quantifiers most and normally revealed that, as expected, normally (mean = 75.5) is attributed a significantly (Wilcoxon test, z = 2.39, p = .008) higher value than most (mean = 69.0).
Discussion
Our distinction between frequency-based quantifiers (e.g., normally) and set-based quantifiers (e.g., most) in reasoning is, to the best of our knowledge, new. Although both, in principle, allow for exceptions depending on the underlying semantics, four reasoners gave the same semantics for normally as for all. For most, all reasoners assumed the possibility of exceptions, possibly applying a principle similar to the Gricean Implicature (Newstead 1995). This principle assumes that whenever we use expressions that allow for exceptions, these can typically be assumed.
So far the PHM (Chater, Oaksford 1999) does not provide any
predictions for reasoning with the quantifier normally; however, given
our quantitative evaluation of this quantifier, the PHM could be extended to address this issue in further research. Furthermore, for the presented experimental results, the theory predictions for the quantifier normally could also be examined using the more fine-grained method of Multinomial Processing Trees (cf. Ragni, Singmann and Steinlein 2014).

Fig. 1 Individual values (points) and quartiles (lines) of participants' understanding of the minimum of the quantifiers some, most, and normally
Non-monotonic reasoning, i.e., reasoning about default assumptions, often uses quantifiers like normally to express knowledge. For instance, Schlechta (1995) characterizes such default assumptions in his formal investigation as generalized quantifiers.
Although there are many theories for syllogistic reasoning
(Khemlani, Johnson-Laird 2012), there are only few that can be generically extended; among these we focused on the Matching-Hypothesis and the Mental Model Theory. In contrast to the Matching-Hypothesis, Mental Model Theory relies on mental representations that can be changed to search for counter-examples (as in relational reasoning, cf. Ragni, Knauff 2013) and generates additional predictions by the variation of the models. The findings indicate that an extension of the Matching-Hypothesis to include the set-based quantifier most and the frequency-based quantifier normally leads to an acceptable prediction of the experimental data. There are,
however, some empirical findings it cannot explain, e.g., System 2
responses and figural effects in reasoning with generalized quantifiers.
It seems that in this case reasoners construct models as representations instead of merely relying on superficial heuristics.
Acknowledgments
The work has been partially supported by a grant to MR from the
DFG within the SPP in project Non-monotonic Reasoning. The
authors are grateful to Stephanie Schwenke for proof-reading.

References
Brewka G, Niemela I, Truszczynski M (2007) Nonmonotonic reasoning. In: Handbook of Knowledge Representation, pp 239–284
Bucciarelli M, Johnson-Laird PN (1999) Strategies in syllogistic reasoning. Cogn Sci 23:247–303. doi:10.1016/S0364-0213(99)00008-7
Chater N, Oaksford M (1999) The probability heuristics model of syllogistic reasoning. Cogn Psychol 38:191–258. doi:10.1006/cogp.1998.0696
Khemlani S, Johnson-Laird PN (2012) Theories of the syllogism: a meta-analysis. Psychol Bull 138:427–457. doi:10.1037/a0026841
Newstead S (1995) Gricean implicatures and syllogistic reasoning. J Memory Lang 34(5):644–664
Pfeifer N (2006) Contemporary syllogistics: Comparative and quantitative syllogisms. In: Kreuzbauer G, Dorn GJW (eds) Argumentation in Theorie und Praxis: Philosophie und Didaktik des Argumentierens. LIT, Wien, pp 57–71
Ragni M, Knauff M (2013) A theory and a computational model of spatial reasoning with preferred mental models. Psychological Review 120(3):561–588
Ragni M, Singmann H, Steinlein E-M (2014) Theory comparison for generalized quantifiers. In: Bello P, Guarini M, McShane M, Scassellati B (eds) Proceedings of the 36th annual conference of the cognitive science society. Cognitive Science Society, Austin, pp 1984–1990
Rips LJ (1994) The psychology of proof: Deductive reasoning in human thinking. The MIT Press, Cambridge
Schlechta K (1995) Defaults as generalized quantifiers. J Logic Comput 5(4):473–494
Wetherick NE, Gilhooly KJ (1995) Atmosphere, matching, and logic in syllogistic reasoning. Curr Psychol 14:169–178. doi:10.1007/BF02686906

What if you could build your own landmark? The influence of color, shape, and position on landmark salience
Marianne Strickrodt, Thomas Hinterecker, Florian Roser,
Kai Hamburger
Experimental Psychology and Cognitive Science, Justus Liebig
University Giessen, Germany
Abstract
This study focused on participants' preferences when building a landmark from eight colors, eight shapes, and four possible landmark positions in order to aid the wayfinding of a nonlocal person. The results suggest that participants did not only select features according to their personal and aesthetic preference (e.g. blue, circle), but also according to a sense of common cognitive availability and utility for learning a route (e.g. red, triangle). Strong preferences for the position of a landmark, namely before the intersection and in direction of the turn, are in line with other studies investigating position preference from an allocentric view.
Keywords
Salience, Landmarks, Position, Color, Shape, Feature preference
Introduction
When travelling in an unknown environment people can use objects
such as buildings or trees for memorizing the paths they walk. These
objects, also called landmarks, are considered to be important reference points for learning and finding one's way (Lynch 1960).
Essentially, almost everything can become a landmark as long as it is
salient (Presson, Montello 1988). Sorrows, Hirtle (1999) defined three
different landmark saliencies, whose precise definitions change
slightly in the literature (e.g. Klippel, Winter 2005; Roser et al. 2012)
but, nevertheless, include the following aspects:

visual/perceptual salience: physical aspects of an object (e.g. color, shape, size);
semantic/cognitive salience: aspects of knowledge and experience; refers to the mental accessibility of an object (i.e. the ease of labeling it);
structural salience: the ease with which one can cognitively conceptualize the position of an object.

Speaking of perceptual salience, which is not absolute but contrast-defined: a high contrast of the landmark to its surroundings will lead to easy and fast identification and recognition (Presson, Montello 1988). Nevertheless, given a high contrast, color preference itself might also influence the visual salience of an object. In a non-spatial
context, blue to purple colors were found to be preferred whereas
yellowish-green colors were most disliked (Hurlbert, Ling 2007).
The cause of these preferences is discussed in light of different theories and explanations, ranging from evolutionary adaptation of our
visual system (finding the red, ripe fruit) to aspects of ecological
valence (experiences we associate with colored objects cause corresponding color preference). In a spatial context, colored environments
have been found to enhance wayfinding behavior of children and
adults when compared to non-colored environments (Jansen-Osmann,
Wiedenbauer 2004). Also, on the level of single colors applied to
landmarks along a virtual route, green led to the worst performance in
a subsequent recognition task, while yellow, cyan, and red were
recognized best (Wahl et al. 2008). Interestingly, even though not being a preferred color per se, yellow seems to be easy to recognize and, therefore, probably helpful for remembering a path and learning the
surrounding. Thus, it seems to be important to differentiate between a
personal preference for color and the memorability and utility of
colors in a spatial context.
A shape, such as a square or an ellipse, comprises both visual and semantic salience: an appearance which is more or less easy to perceive and to reproduce, and a mental conceptualization, a label, a meaning. Shapes, compared to colors, revealed a significantly higher recognition performance (Wahl et al. 2008). Nevertheless, no differences in selecting or recognizing differently shaped landmarks could be found (Roser et al. 2012; Wahl et al. 2008). Outside the spatial context, Bar, Neta (2006) found that angular objects are less preferred than curved objects. They speculated that sharp contours elicit a sense of threat and lead to a negative bias towards the object. Taken together, these findings again suggest that both preference and utility of shapes are to be differentiated, whereby utility might play a more important role when selecting a shape in a wayfinding context.
Besides these low-level features color and shape, this research
concentrates on the position of a landmark at an intersection, covering
an important aspect of structural salience, namely, how different
positions are conceptualized. When instructed to select one out of
four landmarks for a route description, each attached to one of the
four corners of an intersection, participants show a clear position
preference (Roser et al. 2012). From an egocentric perspective the
positions in direction of turn either before or after the intersection are
chosen significantly more often than positions opposite to the direction of turn. With allocentric (map-like) material the position before
the intersection and in direction of turn is most preferred. Therefore,
what accounted most for an object being chosen was its location relative to the direction in which the route continued, not whether it was presented on the left or right. The two types of defining the
position of an object at an intersection are visualized in Fig. 3.
This study addresses all saliencies with the help of a simple selection
task. Participants should choose from color, shape, and position to create a landmark which should aid another person's way through the
same environment. We assume that the landmarks produced by the
participants mirror their own implicit sense of a good, salient landmark,
which everyone should be able to use. Results might in turn be an
indicator for diverging scores of salience within the examined features.
By combining the findings of the aforementioned preference and navigation research, we hypothesize that red and blue and the position in
front of the intersection, in direction of turn (D) are most frequently
chosen. Since shapes seem to induce distinctive preferences, this might
also be reflected in the construction of a landmark, but no clear suggestions can be made at this point.
Material and Method
Participants
The sample consisted of 56 students (46 females) from Giessen
University (mean age 24 yrs, SD = 4.5), who received course credits
for participation. Normal or corrected-to-normal vision was required.
Materials
On an A4 sized paper an allocentric view of a small schematic city
area was printed. The region of interest in this area consisted of four
orthogonal intersections (Fig. 1). The route (two right and left turns)
through this region was displayed by a magenta line. On the four
corners of each intersection a quadratic white field indicated an

optional location for a landmark. Participants could choose from eight colors (violet, blue, green, yellow, orange, red) or luminances (white, black), respectively, and eight shapes (diamond, hexagon, square, rhomboid, ellipse, circle, cross, triangle).

Fig. 1 Schematic city area and range of colors and shapes participants could choose from, as presented to the participants. The route from start (Start) to destination (Ziel) is indicated by a dashed line. White quadratic fields are the optional locations for the created landmarks
Procedure
Instructions were given on a separate paper. Only one landmark was to be built for every intersection, and each color and shape could
only be used once. The landmark was to be positioned at one of the
four corners of an intersection. The shape, therefore, had to be drawn
with the selected color in one of the four white corners of an intersection. The task was to use the subjectively preferred combinations
to build the landmarks in order to facilitate wayfinding for a notional,
nonlocal person. Participants were instructed to imagine giving a
verbal route description to this nonlocal person, including their built
landmarks.
Results
Overall, 224 decisions for shapes, colors, and positions, respectively (56 participants × 4 landmarks to build), were analyzed with non-parametric chi-square tests. Frequencies for the selection of shapes and colors can be seen in Fig. 2.
When analyzing single colors (Bonferroni correction α = .006), red was significantly above uniform distribution (χ²(1) = 21.592, p < .001), while black (χ²(1) = 16.327, p < .001) and white (χ²(1) = 27.592, p < .001) were below. Regarding the shapes, results show that participants have a significant preference for the triangle (χ²(1) = 18, p < .001) and the circle (χ²(1) = 16.327, p < .001). On the other hand, ellipse (χ²(1) = 13.224, p = .001), hexagon (χ²(1) = 14.735, p < .001), and rhomboid (χ²(1) = 19.755, p < .001) were rarely chosen at all. Green, blue, yellow, violet, and orange, as well as square, diamond, and cross, did not deviate from average frequencies.
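Each reported test reduces to a goodness-of-fit comparison of one option's selection count against the uniform expectation; a brief sketch (the count is derived from the reported percentage):

from scipy.stats import chisquare

N_CHOICES, N_OPTIONS = 224, 8    # 56 participants x 4 landmarks; 8 colors (or shapes)

def against_uniform(count):
    """Chi-square test (df = 1) of one option's frequency against the
    uniform expectation of N_CHOICES / N_OPTIONS selections."""
    expected = N_CHOICES / N_OPTIONS
    return chisquare([count, N_CHOICES - count],
                     f_exp=[expected, N_CHOICES - expected])

print(against_uniform(51))   # red: 51/224 = 22.77 % -> chi2 = 21.59, p < .001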
Figure 3 and Table 1 comprise findings comparing landmark positions. When focusing on landmark positions dependent of the direction of turn, it could be shown that position D, in front of the intersection and in direction of turn, is by far the most frequently selected (71.88 %), followed by the other position lying in direction of turn but behind the intersection, position B (25 %). Positions opposite to the direction of turn (A and C) lag far behind, suggesting that the significant difference between direction-independent positions in front of and behind the intersection (1 and 2 against 3 and 4) is solely driven by the popularity of D.

Fig. 2 Relative frequency of selected colors and shapes and their deviation from uniform distribution (dashed line)

Fig. 3 Relative frequency of selected positions. a Independent of direction of turn: 1 (behind, left) 12.5 %; 2 (behind, right) 14.29 %; 3 (in front, left) 37.5 %; 4 (in front, right) 35.71 %. b Dependent of direction of turn: A (behind, opposite) 1.34 %; B (behind, in direction) 25 %; C (in front, opposite) 1.79 %; D (in front, in direction) 71.88 %. Note that the right figure includes both right and left (transposed to right) directions of turn

Discussion
This study examined the selection of three different landmark features, namely color, shape, and location. Participants were instructed to select, according to their own conviction, what kind of landmark is most qualified to aid a nonlocal person in finding her way following a route description. Most favored by the participants was the color red (followed by green and blue, which, due to the α-error correction, did not differ from chance). The least preferred colors were the luminances black and white. As for shapes, triangle and circle were most frequently selected (followed by square, although without significant difference from chance). Least preferred were ellipse, hexagon, and rhomboid. A significant prominence of the position D was found.


Table 1 Multiple Chi Square comparisons for the two types of definition for landmark location

Independ.   χ²(1)     p         Depend.   χ²(1)      p
1-2         0.267     .699      A-B       47.610     <.001*
1-3         28        <.001*    A-C       0.143      1.000
1-4         25.037    <.001*    A-D       152.220    <.001*
2-3         23.310    <.001*    B-C       45.067     <.001*
2-4         20.571    <.001*    B-D       50.806     <.001*
3-4         0.098     .815      C-D       149.388    <.001*

χ² value and significance are shown (Bonferroni correction α = .008)

The neglect of the luminances black and white is in line with the
assumptions concerning visual salience, namely, that a low contrast to
the grey and white background of the experimental material is not
preferable in a wayfinding context. Results suggest that participants
were aware of the positive impact of contrast. Interestingly, neither former results of color preferences (Hurlbert, Ling 2007) nor benefits in recognition (Wahl et al. 2008) are perfectly mirrored in our data, suggesting that the selection process was not based on either of these levels. Instead, it seems plausible to suggest a selection strategy preferring landmark features according to familiarity. As red, blue, and green constitute the three primary colors every western pupil is taught in school, and as they are probably the most used colors in street signs, they might also be best established and conceptualized in the knowledge of an average person selecting these colors. For the visual and semantic salience of shapes a similar explanation may be consulted. Shapes which are highly common and easy for everyone to identify are preferred: triangles and circles. Furthermore, the low complexity of these shapes compared to rhomboid or hexagon might have affected the selection as well. It seems that the sharpness of the contour of an object was
immaterial in this task. The clearest and most reliable result is the
preference for position D (allocentric), the position before the intersection and in direction of the turn. Also Waller, Lippa (2007) pointed
out the advantages of landmarks in directions of turn as they serve as
beacons (as compared to associative cues). Merely recognizing these
landmarks is sufficient to know where to go, since their position reveals
the correct direction response at an intersection.
Overall, it seems that participants did not choose object properties according to a mere personal feature preference. Their selection process probably involved preference with respect to perceptibility, ease of memorization, and usability in terms of wayfinding ("this works fine as a landmark"). To what extent the selection was based on a conscious or unconscious process cannot be determined here. Also, whether guiding another person (compared to oneself) played an important role in creating a landmark cannot be sufficiently answered at this point. Furthermore, whether these preferences really help people to learn a route faster or more easily is yet another question. For the task of building a landmark which shall aid other people to find the same way, we found evidence that people show a clear preference for the best-known and most common colors and shapes. Moreover, the high frequency of the selection of the position before the turn and in the direction of turn is striking. This study, using the task of creating a landmark, is a small contribution to the expanding research on the visual as well as semantic and structural salience of landmarks.
References
Bar M, Neta M (2006) Humans prefer curved visual objects. Psychol Sci 17:645–648
Hurlbert AC, Ling Y (2007) Biological components of sex differences in color preference. Curr Biol 17:R623–R625
Jansen-Osmann P, Wiedenbauer G (2004) The representation of landmarks and routes in children and adults: A study in a virtual environment. J Environ Psychol 24:347–357
Klippel A, Winter S (2005) Structural salience of landmarks for route discrimination. In: Cohn AG, Mark D (eds) Spatial information theory. International Conference COSIT. Springer, Berlin
Lynch K (1960) The image of the city. MIT Press, Cambridge
Presson CC, Montello DR (1988) Points of reference in spatial cognition: Stalking the elusive landmark. Br J Dev Psychol 6:378–381
Roser F, Krumnack A, Hamburger K, Knauff M (2012) A four factor model of landmark salience: A new approach. In: Russwinkel N, Drewitz U, van Rijn H (eds) Proceedings of the 11th International Conference on Cognitive Modeling (ICCM). Universitätsverlag TU Berlin, Berlin
Sorrows ME, Hirtle SC (1999) The nature of landmarks for real and electronic spaces. In: Freksa C, Mark DM (eds) Spatial information theory: cognitive and computational foundations of geographic information science, International Conference COSIT 1999. Springer, Stade
Wahl N, Hamburger K, Knauff M (2008) Which properties define the salience of landmarks for navigation? An empirical investigation of shape, color and intention. International Conference Spatial Cognition 2008, Freiburg
Waller D, Lippa Y (2007) Landmarks as beacons and associative cues: their role in route learning. Mem Cognit 35:910–924

Does language shape cognition?

Alex Tillas
Institut für Philosophie, Heinrich-Heine-Universität, Düsseldorf, Germany
Introduction
In this paper, I investigate the relation between language and thinking
and offer an associationistic view of cognition. There are two main
strands in the debate about the relation between language and cognition. On the one hand there are those that ascribe a minimal role to
language and argue that language merely communicates thoughts
from the Language of Thought-level to the conscious-level (e.g. Grice
1957; Davidson 1975; Fodor 1978). On the other hand, there are those
who argue for a constitution relation holding between the two (Carruthers 1998; Brandom 1994). Somewhere in the middle of these two
extremes lie the supra-communicative views of language that go back to James (1890/1999) and Vygotsky (trans. 1962), and appear more recently in the work of Berk and Garvin (1984). Furthermore, Gauker (1990) argues that language is a tool for effecting changes in the subject's environment, while Jackendoff (1996) argues that linguistic formulation allows us a handle for attention. Finally, Clark (1998), and
Clark and Chalmers (1998) argue for the causal potencies of language
and suggest that language complements our thoughts (see also Rumelhart et al. 1986).
Building upon associationism, the view suggested here ascribes a significant role to language in cognition. This role is not limited to interfacing between the unconscious and the conscious level, yet the relation between the two is not one of constitution. More specifically, in the suggested view linguistic labels (or words) play a crucial role in thinking. Call this position the Labels and Associations in Thinking hypothesis (henceforth LASSO). LASSO is similar to Clark's view in that the utilization of linguistic symbols plays a significant role. However, for Clark, language is important in reducing cognitive load, while in LASSO the utilization of linguistic labels is responsible for the acquisition of endogenous control over thoughts. In particular, I start from the ability that human agents have to manipulate external objects in relationships of agency towards them, and
argue that we can piggyback on that ability to manipulate and direct
our own thinking. Despite sharing with supra-communicative views
that language does not merely serve to communicate thoughts to
consciousness, my focus here is on a more general level. In particular,
I focus on how language influences thinking, rather than on how
specific cognitive tasks might be propped by language. Finally,
LASSO resembles Lupyans (2007) Label Feedback Hypothesis, even
though my agenda is more general than Lupyans (non-linguistic
aspects of cognition such as perceptual processing).
The LASSO Hypothesis
LASSO is based on a view of concepts as structured entities, comprising a set of representations. Members of this set are mainly
perceptual representations from experiences with instances of a given
kind, as well as perceptual representations of the appropriate word.
These representations become associated on the basis of co-occurrence. Crucially they become reactivated when thinking about this
object; to this extent thinking is analogous to perceiving (Barsalou
1999).
To endogenously control the tokening of a given concept is to
activate this concept in the absence of its referents. In turn, to
endogenously control thinking is to token a thought on the basis of
processes of thinking rather than of processes of perceiving the
appropriate stimulus. Endogenously controlled thinking is merely
associative thinking, i.e., current thinking caused by earlier thinking.
The key claim here is that we have endogenous control over our
production of linguistic items given that we are able to produce linguistic utterances at will. It is this executive control over linguistic
utterances that gives us endogenous control over our thoughts.
Admittedly, there are alternative ways to acquire endogenous control
over our thoughts, e.g. via associations with a goal-directed state over
which we already have endogenous control. Once a certain degree of
linguistic sophistication is acquired, the process of activating a concept in a top-down manner is achieved in virtue of activating
associated words.
Language is not constitutive to (conscious) thinking
According to Carruthers (1998; 2005), accounting for our non-inferential access to our thoughts requires inner speech to be constitutively
involved in propositional thinking. Contra Carruthers, I argue that this
is not the only way in which non-inferential thinking can occur. One
alternative is associative thinking. It might be that the transition from
the word to the concept that has the very same content that a given
word expresses is an associationistic link. In the suggested view,
perceptual representations and words are associated in memory. Note
that this is not a case of language being constitutive to thoughts, but a
case of co-activation of a concept's different subparts: perceptual
representations of the appropriate word (A) and representations
formed during perceptual experiences with instances of a given kind
(B). This occurs in virtue of an instance of a word activating A, which
in turn activates B, resulting in the concept's activation as a whole.
Nevertheless, and importantly, this kind of thinking is not interpretative, as Carruthers argues. It is not that an agent hears a word, say "cat", and then tries to guess or infer what the word means. Instead, on hearing the word "cat" the concept cat becomes activated. Access
to thinking is neither interpretative nor constitutive.
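As a toy illustration of this co-activation story (the nodes and weights are invented for the example and carry no empirical claim):

# Word-form node (A) linked to perceptual traces (B); hearing "cat" activates A,
# which spreads activation to B, tokening the concept as a whole.
LINKS = {"word:cat": {"percept:fur": 0.8, "percept:meow": 0.7, "percept:whiskers": 0.6}}

def token_concept(cue, threshold=0.5):
    """Perceptual representations co-activated above threshold by a heard word."""
    return {node for node, weight in LINKS.get(cue, {}).items() if weight >= threshold}

print(token_concept("word:cat"))   # {'percept:fur', 'percept:meow', 'percept:whiskers'}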
Perceptual representations of objects and words are distinct from
each other and are brought together during the process of concept
formation. It is just that we only have conscious access at the level where representations of words and objects converge; consider this in terms of Damasio's (1989) well-known convergence zones
hypothesis. In this sense, an agent can only access representations of
objects and words simultaneously and treat them as if they were
constitutive parts of a concept/thought.
The relationship between a thought and its representation in self-knowledge is brute causation. The particular transition between a first-order thought and a second-order thought is causal and not constitutive. Contra Carruthers, the relationship between a first-order and a second-order thought is not a constitutive but a causal, associative one. Thought and language are not constitutively connected.
Evidence for LASSO: Language & perceptual categorization
The suggested view enjoys significant empirical support, e.g. from
evidence showing that perceptual categorization depends on language. This evidence could in turn be used against the communicative
conception of language. For instance, in a series of experiments, Davidoff and Roberson (2004) examined the abilities of LEW, a patient with language impairments close to the profile of high-level Wernicke's aphasia, to categorize visually presented color stimuli, and found that color categories did not pop out for LEW. Instead, he retreated to a comparison between pairs, which in turn resulted in his poor performance in the categorization tasks. From this, Davidoff and Roberson argue that color categorization is essentially a rule-governed process. And even though colors are assigned to a given category on the basis of similarity, it is similarity to a conventionally named color that underlies this assignment. LEW's inability to categorize simple perceptual stimuli is because names are simply not available to him.
With regard to his performance in the color and shape categorization tasks, they argue that it is not the case that LEW has simply lost color or shape names. He is rather unable to consciously allocate items to perceptual categories. To this extent, they argue that LEW's impairment is not related to a type-of-knowledge but rather to a type-of-thought story. Furthermore, they argue that there is a type of
classification, independent of feature classification, which is
unavailable to aphasics with naming disorders. This evidence does not
suggest a constitutive relation between language and thinking. Instead
it suggests a strong relation between naming and categorization
impairments, which could be explained by appealing to a strong
association between a linguistic label and a concept. This in turn lends
support to LASSO.
Evidence against a constitutive relation between language &
cognition
Evidence in favor of LASSO and against a constitutive relation
between language and cognition can be found in results showing that
grammar, a constitutive part of language, is neither necessary nor sufficient for thinking, and more specifically for Theory of Mind (ToM) reasoning. For instance, Siegal, Varley, Want (2001) show a double dissociation between grammar and ToM reasoning, which in turn indicates that reasoning can occur largely independently from grammatical language. Even though ToM understanding and categorization are not all there is to cognition, had there been a constitutive relation between language and (conscious) cognition, in the way Carruthers argues for instance, then a double dissociation between grammar and ToM reasoning would never have occurred.
Focusing on the relation between grammar and cognition in
aphasia, Varley and Siegal (2000) show that subjects with severe
agrammatic aphasia and minimal access to propositional language
performed well in different ToM tests and were capable of simple
causal reasoning. On these grounds, Siegal, Varley, Want (2001)
argue that reasoning about beliefs as well as other forms of sophisticated cognitive processes involve processes that are not dependent
on grammar. By contrast to the previous evidence, Siegal et al. report
that non-aphasic subjects with right-hemisphere (non-language
dominant) lesions exhibited impaired ToM reasoning and had difficulties understanding sarcasm, jokes and the conversational
implications of questions (Siegal et al. 1996; Happe et al. 1999). This
double dissociation between grammar on the one hand and causal reasoning and ToM on the other suggests a non-constitutive relation between language and cognition, and in turn favors LASSO.

Objections to LASSO
Qua inherently associationistic, LASSO might be subject to the
objection that it cannot account for propositional thinking or for
compositionality of thought. For it might be that LASSO at best
describes how inter-connected concepts become activated without
explaining the propositional-syntactic properties that thoughts in the
form of inner speech have. In reply, a single thought becomes
propositional in structure and content by piggybacking on language.
The conventional grammatical unity and structure of the sentence
unifies these concepts and orders them in a certain way.
Another challenge facing associationistic accounts of thinking is
that it is unclear how they can account for the characteristic of concepts to combine compositionally. In reply, I appeal to Prinz's semantic account (2002), according to which, in order for c to refer to x, the following two conditions have to be fulfilled:
a) x's nomologically covary with tokens of c
b) an x was the (actual) incipient cause of c
In the suggested view the concept petfish, like all concepts, is a
folder that contains perceptual representations. The incipient causes
of petfish can either be instances of petfish or representations of pets
and representations of fish. Crucially, in terms of semantics, petfish
has to nomologically covary with petfish rather than a disjunction of
pet and fish. The reason why petfish nomologically covaries with
petfish is that the concept's functional role is constrained by the constraints on the uses of the word that are set by the agent's locking into the conventions about conjunction formation. In this sense,
agents participate in a convention and it is via the association between
the word and the concept that the functional role of the conjunctive
concept is constrained. In terms of the constitutive representations of
petfish, these can be representations of pets like cats and dogs as well
as representations of fish. Crucially, these representations are idle in
the functional role of the concept; the latter is more constrained by its
link to the words.
Acknowledgments
I am grateful to Finn Spicer, Anthony Everett and Jesse Prinz for
comments on earlier drafts of this paper. Research for this paper has
been partly funded by the Alexander S. Onassis Public Benefit
Foundation (ZF 075) and partly by the Deutsche Forschungsgemeinschaft (DFG) (SFB 991, Project A03).
References
Barsalou LW (1999) Perceptual symbol systems. Behav Brain Sci 22:577–609. doi:10.1017/s0140525x99002149
Berk L, Garvin R (1984) Development of private speech among low-income Appalachian children. Dev Psychol 20(2):271–286. doi:10.1037/0012-1649.20.2.271
Brandom R (1994) Making it explicit: Reasoning, representing, and discursive commitment. Harvard University Press, Cambridge MA
Carruthers P (1998) Conscious thinking: Language or elimination? Mind Lang 13(4):457–476. doi:10.1111/1468-0017.00087
Carruthers P (2005) Consciousness: Essays from a higher order perspective. Clarendon Press, Oxford
Clark A (1998) Magic words: How language augments human computation. In: Carruthers P, Boucher J (eds) Language and thought: Interdisciplinary themes. Cambridge University Press, Cambridge, pp 162–183
Clark A, Chalmers DJ (1998) The extended mind. Analysis 58(1):7–19. doi:10.1111/1467-8284.00096
Damasio AR (1989) Time-locked multiregional retroactivation: A systems-level proposal for the neural substrates of recall and recognition. Cognition 33:25–62. doi:10.1016/0010-0277(89)90005-X
Davidoff J, Roberson D (2004) Preserved thematic and impaired taxonomic categorization: A case study. Lang Cognitive Proc 19(1):137–174. doi:10.1080/01690960344000125
Davidson D (1975) Thought and talk. In: Inquiries into truth and interpretation. Oxford University Press, Oxford, pp 155–170
Dummett M (1975) Wang's Paradox. Synthese 30:301–324
Elman JL, Bates EA, Johnson MH, Karmiloff-Smith A, Parisi D, Plunkett K (1996) Rethinking innateness: A connectionist perspective on development. MIT Press, Cambridge MA
Fodor J (1978) Representations: Philosophical essays on the foundations of cognitive science. MIT Press, Cambridge MA
Gauker C (1990) How to learn a language like a chimpanzee. Phil Psych 3(1):31–53. doi:10.1080/09515089008572988
Grice P (1957) Meaning. Phil Review 66:377–388
Happe F et al. (1999) Acquired theory of mind impairments following stroke. Cognition 70:211–240. doi:10.1016/S0010-0277(99)00005-0
Jackendoff R (1996) How language helps us think. Pragmatics Cogn 4(1). doi:10.1075/pc.4.1.03jac
James W (1890/1999) The Principles of Psychology (2 vols.). Henry Holt, New York (Reprinted Thoemmes Press, Bristol)
Lupyan G (2012) Linguistically modulated perception and cognition: the label feedback hypothesis. Front Psychol 3:54. doi:10.3389/fpsyg.2012.00054
Prinz J (2002) Furnishing the mind: Concepts and their perceptual basis. MIT Press, Cambridge
Rumelhart DE, Smolensky P, McClelland JL, Hinton GE (1986) Parallel distributed models of schemata and sequential thought processes. In: McClelland JL, Rumelhart DE (eds) Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 2: Psychological and Biological Models, pp 7–57
Siegal M, Carrington J, Radel M (1996) Theory of mind and pragmatic understanding following right hemisphere damage. Brain Lang 53:40–50. doi:10.1006/brln.1996.0035
Siegal M, Varley M, Want SC (2001) Mind over grammar: reasoning in aphasia and development. Trends Cogn Sci 5(7). doi:10.1016/S1364-6613(00)01667-3
Varley R, Siegal M (2000) Evidence for cognition without grammar from causal reasoning and theory of mind in an agrammatic aphasic patient. Curr Biol 10:723–726. doi:10.1016/S0960-9822(00)00538-8
Vygotsky LS (1962) Thought and Language. MIT Press, Cambridge

Ten years of adaptive rewiring networks in cortical connectivity modeling: progress and perspectives
Cees van Leeuwen
KU Leuven, Belgium; University of Kaiserslautern, Germany
Activity in cortical networks is generally considered to be governed
by oscillatory dynamics, enabling the network components to synchronize their phase. Dynamics on networks are determined to a large
extent by the network topology (Barahona and Pecora 2002; Steur
et al. 2014). Cortical network topology, however, is subject to change
as a result of development and plasticity. Adaptive network models
enable the dynamics on networks to shape the dynamics of networks,
i.e. the evolution of the network topology (Gross and Blasius 2008).
Adaptive networks show a strong propensity to evolve complex
topologies. In adaptive networks, the connections are selectively
reinforced (Skyrms and Pemantle 2000) or rewired (Gong and van
Leeuwen 2003, 2004; Zimmerman et al. 2004), in adaptation to the
dynamical properties of the nodes. The latter are called adaptively
rewiring networks.
Gong and van Leeuwen (2003, 2004) started using adaptive
rewiring networks in order to understand the relation between large
scale brain structure and function. They applied a Hebbian-like
algorithm, in which synchrony between pairs of network components
(nodes) is the criterion for rewiring. The nodes exhibit oscillatory activity and, just as in the brain, where dynamic synchronization in spontaneous activity shows traveling and standing waves and transitions between them (Ito et al. 2005, 2007), the network nodes collectively move spontaneously in and out of patterns of partial synchrony. Meanwhile, adaptive rewiring takes place. When a pair of
nodes is momentarily synchronized but not connected, from time to
time a link from elsewhere is relayed, in order to connect these nodes.
This is the key principle of adaptive rewiring.
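A minimal sketch of this principle with diffusively coupled logistic maps follows; network size, map parameter, coupling strength, and rewiring schedule are assumed values for illustration, not the settings of Gong and van Leeuwen (2003, 2004):

import numpy as np

rng = np.random.default_rng(1)
N, E = 50, 100                     # nodes and edges (assumed sizes)
a, eps = 1.7, 0.4                  # logistic parameter and coupling strength (assumed)

def f(x):
    return 1.0 - a * x**2          # logistic map in coupled-map-lattice form

A = np.zeros((N, N), dtype=bool)   # random initial topology
while A.sum() < 2 * E:
    i, j = rng.choice(N, size=2, replace=False)
    A[i, j] = A[j, i] = True

x = rng.uniform(-1.0, 1.0, N)
for t in range(20000):
    fx = f(x)
    deg = np.maximum(A.sum(axis=1), 1)
    x = (1.0 - eps) * fx + eps * (A @ fx) / deg    # diffusive coupling (Kaneko-style)
    if t % 10 == 0:                                # rewire from time to time
        i = int(rng.integers(N))
        if not A[i].any():
            continue
        d = np.abs(x - x[i])                       # momentary (de)synchronization
        masked = np.where(A[i] | (np.arange(N) == i), np.inf, d)
        j = int(np.argmin(masked))                 # synchronized but unconnected node
        if np.isfinite(masked[j]):
            k = int(np.argmax(np.where(A[i], d, -np.inf)))   # least synchronized neighbor
            A[i, k] = A[k, i] = False              # relay that link...
            A[i, j] = A[j, i] = True               # ...to connect the synchronized pair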
Adaptively rewiring a network according to synchrony in spontaneous activity gave rise to the robust evolution of a certain class of complex
network structures (Fig. 1). These share important characteristics with
the large-scale connectivity structure of the brain. Adaptive rewiring
models, therefore, became an integral part of the research program of the
Laboratory for Perceptual Dynamics, which takes a complex systems
view to perceptual processes (For a sketch of the Laboratory while at the
RIKEN Brain Science Institute, see van Leeuwen 2005; for its current
incarnation as an FWO-funded laboratory at the KU Leuven, see its
webpage at http://perceptualdynamics.be/). The original adaptive
rewiring model (Gong and van Leeuwen 2003, 2004) was developed over
the years in a number of studies (Jarman et al. 2014; Kwok et al. 2007;
Rubinov et al. 2009a; van den Berg et al. 2012; van den Berg and van
Leeuwen 2004). Here I review these developments and sketch some
further perspectives.
In the original algorithm (Gong and van Leeuwen 2004; van den
Berg and van Leeuwen 2004), the network initially consists of randomly coupled maps. Coupled maps are continuously valued maps
connected by a diffusive coupling scheme (Kaneko 1993). We used
coupled logistic maps; the return plots of these maps are generic and
can be regarded as coarsely approximating that of a chaotic neural
mass model (Rubinov et al. 2009a). Adaptively rewiring the couplings
of the maps showed the following robust tendency: From the initially
random architecture and random initial conditions, a small-world
network gradually emerges as the effect of rewiring. Small worlds are
complex networks that combine the advantages of a high degree of
local clustering from a regular network with the high degree of global
connectedness observed in a random network (Watts and Strogatz
1998). They are, in other words, an optimal compromise for local and
global signal transfer. Small-world networks have repeatedly been
observed in the anatomical and functional connectivity of the human
brain (He et al. 2007; Sporns 2011; Bullmore and Bassett 2011;
Gallos et al. 2012).
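For orientation, the dynamics on such a network can be written in the standard diffusively coupled map form of Kaneko (1993) (a sketch; the exact map and parameter values used in the cited studies are not stated in this abstract):

$$x_i(t+1) = (1-\varepsilon)\, f\bigl(x_i(t)\bigr) + \frac{\varepsilon}{|N(i)|} \sum_{j \in N(i)} f\bigl(x_j(t)\bigr), \qquad f(x) = a\,x\,(1-x),$$

where f is the logistic map, N(i) is the set of nodes currently coupled to node i, and ε is the coupling strength; adaptive rewiring changes N(i) over time.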

Fig. 1 A random network prior to (left) and after (right) several iterations of adaptive rewiring (from van Leeuwen 2008). Note that this version of the model considers topology only; geographical proximity of nodes was heuristically optimized in order to provide a visualization of the propensity of the system to evolve a modular small-world network

The products of rewiring have an additional characteristic that is
relevant to the brain: they are modular networks (Rubinov et al.
2009a). This means that they form community structures that interact
via hubs. The hubs are specialized nodes that network evolution has
given the role of mediating connections between communities. They
synchronize sometimes with one community and sometimes with another, and can be considered agents of change in the behavior of the regions to which they are connected.
Several studies have explored, and helped extend, the notion that adaptive rewiring leads to modular small worlds. It was already shown early on (Gong and van Leeuwen 2003) that combining rewiring with network growth results in a modular network that is also scale-free in the distribution of its connectivity (Barabási and Albert 1999). Kwok et al. (2007) have shown that the behavior of these networks is not limited to coupled maps, but could also be obtained with more realistic, i.e. spiking, model neurons. Unlike the coupled maps, these have directed connections. As the system evolves, the activity in the nodes changes. Initial bursting activity (as observed in immature neurons, see e.g. Leinekugel et al. 2002; this activity is commonly assumed to be random but in fact, like that of the model, shows deterministic structure, see Nakatani et al. 2003) gives way to a mixture of regular and irregular activity characteristic of mature neurons.
Van den Berg et al. (2012) lesioned the model and showed that there is a critical level of connectivity at which the growth of small-world structure can no longer be robustly sustained. Somewhat surprisingly, this results in a breakdown, not primarily in the connections between the clusters, but in the local clustering. In other words, the network shifts towards randomness. This corresponds to observations in patients diagnosed with schizophrenia (Rubinov et al. 2009b). The model, therefore, could suggest an explanation of the anomalies in large-scale connectivity structures found in schizophrenic patients.
Despite these promising results, a major obstacle towards realistic application of the model has been the absence of any geometry. A spatial embedding for the model would allow us to consider the effect of biological constraints such as metabolic costs and wiring length. In a recent study, Jarman et al. (2014) studied networks endowed with a metric, i.e. a definition of distance between nodes, and observed its effects on adaptive rewiring. A cost function that penalizes rewiring to more distant nodes leads to a modular small-world structure with greater efficiency and robustness, compared to rewiring based on synchrony alone. The resulting network, moreover, consists of spatially segregated modules (Fig. 2, left part), in which within-module connections are predominantly of short range and inter-module connections are of long range (Fig. 2, right part). This implies that the topological principle of adaptive rewiring and the spatial principle of rewiring costs operate in synergy to achieve a brain-like architecture. Both principles are biologically plausible. The spatially biased rewiring process, therefore, may be considered as a basic mechanism for how the large-scale architecture of the cortex is formed.
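A sketch of how such a cost function can enter the choice of rewiring target (the exponential penalty and its parameter are illustrative assumptions of mine; Jarman et al. 2014 give the actual formulation):

    import numpy as np

    def rewiring_score(sync, dist, beta=2.0):
        # combine synchrony (higher = better candidate) with an exponential
        # wiring-cost penalty on spatial distance (illustrative functional form)
        return sync * np.exp(-beta * dist)

    # among node i's non-neighbors: momentary synchrony with i, and distance to i
    sync = np.array([0.90, 0.80, 0.40])
    dist = np.array([2.00, 0.30, 0.10])
    print(np.argmax(rewiring_score(sync, dist)))  # -> 1: nearby and well synchronized

With beta = 0 the purely topological, synchrony-only model is recovered.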
The models developed so far have been no more (and no less) than a proof of principle. To some extent, this is how it should be. Efforts at biological realism can sometimes obscure the cognitive, neurodynamical principles on which a model is based. Some predictions, such as what happens when lesioning the model, could already be made with a purely topological version, with its extreme simplification of the neural dynamics. Yet, in order to be relevant, future model development will have to engage more with neurobiology. We are doing this step by step. Jarman et al. (2014) have overcome an important hurdle in applying the model by showing how spatial considerations could be taken into account. Yet more is needed. First, we need to resume our work on realistic (spiking) neurons (Kwok et al. 2007): we will consider distinct (inhibitory and excitatory) neural populations, realistic neural transmission delays, spike-timing-dependent plasticity, and a more differentiated description of the mechanisms that guide synaptogenesis in the transition from immature to mature systems. Second, and only after that, should we start preparing the system for information-processing functions.


Fig. 2 From Jarman et al. 2014. Left: adaptive rewiring on a sphere; differently colored units reveal the community structure (modularity) resulting from adaptive rewiring with a wiring cost function. Right: correlation between spatial distance of connections (x-axis) and their topological betweenness centrality (y-axis); from top to bottom, the initial state and subsequent states during the evolution of the small-world network. The correlation as it emerges with the network evolution shows that links between modules tend to be of long range

References
Barabási A-L, Albert R (1999) Emergence of scaling in random networks. Science 286:509–512
Barahona M, Pecora LM (2002) Synchronization in small-world systems. Phys Rev Lett 89:054101
Bullmore ET, Bassett DS (2011) Brain graphs: graphical models of the human connectome. Annu Rev Clin Psychol 7:113–140
Gallos LA, Makse HA, Sigman M (2012) A small world of weak ties provides optimal global integration of self-similar modules in functional brain networks. PNAS 109:2825–2830
Gong P, van Leeuwen C (2003) Emergence of scale-free network with chaotic units. Physica A Stat Mech Appl 321:679–688
Gong P, van Leeuwen C (2004) Evolution to a small-world network with chaotic units. Europhys Lett 67:328–333
Gross T, Blasius B (2008) Adaptive coevolutionary networks: a review. J R Soc Interface 5:259–271
He Y, Chen ZJ, Evans AC (2007) Small-world anatomical networks in the human brain revealed by cortical thickness from MRI. Cereb Cortex 17:2407–2419
Ito J, Nikolaev AR, van Leeuwen C (2005) Spatial and temporal structure of phase synchronization of spontaneous EEG alpha activity. Biol Cybern 92:54–60
Ito J, Nikolaev AR, van Leeuwen C (2007) Dynamics of spontaneous transitions between global brain states. Hum Brain Mapp 28:904–913
Jarman N, Trengove C, Steur E, Tyukin I, van Leeuwen C (2014) Spatially constrained adaptive rewiring in cortical networks creates spatially modular small world architectures. Cogn Neurodyn. doi:10.1007/s11571-014-9288-y
Kaneko K (ed) (1993) Theory and applications of coupled map lattices. Wiley, Chichester
Kwok HF, Jurica P, Raffone A, van Leeuwen C (2007) Robust emergence of small-world structure in networks of spiking neurons. Cogn Neurodyn 1:39–51
Leinekugel X, Khazipov R, Cannon R, Hirase H, Ben-Ari Y, Buzsáki G (2002) Correlated bursts of activity in the neonatal hippocampus in vivo. Science 296(5575):2049–2052
Nakatani H, Khalilov I, Gong P, van Leeuwen C (2003) Nonlinearity in giant depolarizing potentials. Phys Lett A 319:167–172
Rubinov M, Sporns O, van Leeuwen C, Breakspear M (2009a) Symbiotic relationship between brain structure and dynamics. BMC Neurosci 10:55. doi:10.1186/1471-2202-10-55
Rubinov M, Knock S, Stam C, Micheloyannis S, Harris A, Williams L, Breakspear M (2009b) Small-world properties of nonlinear brain activity in schizophrenia. Hum Brain Mapp 30(2):403–416
Skyrms B, Pemantle R (2000) A dynamic model of social network formation. Proc Natl Acad Sci USA 97:9340–9346
Sporns O (2011) The human connectome: a complex network. Ann N Y Acad Sci 1224(1):109–125
Steur E, Michiels W, Huijberts HJC, Nijmeijer H (2014) Networks of diffusively time-delay coupled systems: conditions for synchronization and its relation to the network topology. Physica D 277:22–39
van den Berg D, Gong P, Breakspear M, van Leeuwen C (2012) Fragmentation: loss of global coherence or breakdown of modularity in functional brain architecture? Front Syst Neurosci 6:20. doi:10.3389/fnsys.2012.00020
van den Berg D, van Leeuwen C (2004) Adaptive rewiring in chaotic networks renders small-world connectivity with consistent clusters. Europhys Lett 65:459–464
van Leeuwen C (2005) The Laboratory for Perceptual Dynamics at RIKEN BSI. Cogn Process 6:208–215
van Leeuwen C (2008) Chaos breeds autonomy: connectionist design between bias and babysitting. Cogn Process 9:83–92
Watts D, Strogatz S (1998) Collective dynamics of 'small-world' networks. Nature 393:440–442
Zimmermann MG, Eguíluz VM, San Miguel M (2004) Phys Rev E 69:065102

Bayesian mental models of conditionals


Momme von Sydow
Department of Psychology, University of Heidelberg, Germany
Conditionals play a crucial role in the psychology of thinking, whether one is concerned with truth table tasks, the Wason selection task, or syllogistic reasoning tasks. Likewise, there has been detailed discussion on normative models of conditionals in philosophy, in logic (including non-standard logics), in epistemology, as well as in philosophy of science. Here a probabilistic Bayesian account of the induction of conditionals based on categorical data is proposed that draws on different traditions and suggests a synthesis of several aspects of some earlier approaches.
Three Main Accounts of Conditionals
There is much controversy in philosophy and psychology over how indicative conditionals should be understood, and over whether they relate to the material implication, to conditional probabilities, or to some other formalization (e.g. Anderson, Belnap 1975; Ali, Chater, Oaksford 2011; Byrne, Johnson-Laird 2009; Edgington 2003; Beller 2003; Evans, Over 2004; Kern-Isberner 2001; Krynski, Tenenbaum 2007; Pfeifer 2013; Johnson-Laird 2006; Leitgeb 2007; Oaksford, Chater 2007, cf. 2010; Oberauer 2006; Oberauer, Weidenfeld, Fischer 2007; Over, Hadjichristidis, Evans, Handley, Sloman 2007). Three main influential approaches, on which we will build, may be distinguished:
One class of approaches is based on the material implication. A psychological variant replaces this interpretation (with a T F T T truth table) by mental models akin either to complete truth tables or to only the first two cases of such a truth table (Johnson-Laird 2006; cf. Byrne, Johnson-Laird 2009). The present approach adopts the idea that a conditional "if p then q" may be represented with reference either to a full 2 × 2 contingency table or simply with reference to the cells relating to the antecedent p (i.e., p & q, p & non-q).
Another class uses a conditional probability interpretation, thus referring only to the first two cells of a contingency table (Stalnaker 1968, cf. Edgington 2003; Evans, Over 2004; Oberauer et al. 2007; Pfeifer 2013). This is often linked to assuming the hypothetical or counterfactual occurrence of the antecedent p (cf. Ramsey test). Here we take conditional probabilities as a starting point for a probabilistic understanding of conditionals, while adding advantages of the mental model approach. Moreover, an extended Bayesian version of this approach is advocated here, concerned not with a hypothetical frequentist (observed or imagined) relative frequency of q given p, but rather with an inference about an underlying generative probability of q given p that now depends on priors and sample size.
A subclass of the conditional probability approach additionally
assumes a high probability criterion for the predication of logical
propositions (cf. Foley 2009). This is essential to important classes of
non-monotonic logic (e.g., System P) demanding a high probability
threshold (a ratio of exceptions e) for the predication of a normic conditional (Adams 1986; Schurz 2001, cf. 2005): P(q|p) > 1 - e.
We here reformulate a high probability criterion in a Bayesian way
using second-order probability distributions (cf. von Sydow 2014).
Third, conditionals sometimes involve causal readings (cf. Hagmayer, Waldmann 2006; Oberauer et al. 2007) and methods of causal
induction (Delta P, Power, and Causal Support; Cheng 1997; Griffiths, Tenenbaum 2005; cf. Ali et al. 2011) that make use of all four
cells of a contingency table. Although conditionals have to be distinguished from causality ("if effect then cause"; "if effect E1 then effect E2"; "if cause C1 then cause C2"), conditional probabilities
may not only form the basis for causality, but conditionals may be
estimated based on causality. Moreover, determining the probability
of conditionals may sometimes involve calculations similar to causal
judgments. In any case, approaches linking conditionals and causality
have not been fully developed for non-causal conditionals in situations without causal model information.
Bayesian Mental Model Approach of Conditionals (BMMC)
The Bayesian Mental Model Approach of Conditionals allows for complete and incomplete models of conditionals (here symbolized as p &> q vs. p *> q). It nonetheless models conditionals in a probabilistic way. It is claimed that the probability of fully represented conditionals (P(p &> q)) need not be equated with a single conditional probability (P(q|p)). In contrast, the probability of conditionals concerned with the antecedent p only, P(p *> q), is taken to be closely related to the relative frequency of the consequent given the antecedent (its extension). However, the model does not merely refer to the extensional probability PE(q|p), but is concerned with subjective generative probabilities affected by priors and sample size.
The postulates of the approach and the modelling steps will be
sketched here (cf. von Sydow 2014, for a related model):
(1) Although BMMC relates to the truth values of conditionals and
biconditionals, etc. (Step 6), it assigns probabilities to these propositions as a whole (cf. Foley 2009, von Sydow 2011).
(2) BMMC distinguishes complete vs. incomplete conditionals. This idea is adopted from mental model theory (Johnson-Laird, Byrne 1991; cf. Byrne, Johnson-Laird 2002). It is likewise assumed that standard conditionals are incomplete. However, whereas mental model theory has focused on cognitive elaboration as the cause for fleshing out incomplete conditionals, the use of complete vs. incomplete conditionals is primarily linked here to the homogeneity or inhomogeneity of the occurrence of q in the negated subclasses of the antecedent p (i.e. non-p) (cf. Beller 2003, closed-world principle). Imagine homogeneity of non-p with P(q|p) = P(q|non-p) = .82 (e.g., if one does p then one gets chocolate q, but for non-p cases one gets chocolate with the same probability as well). Here it seems inappropriate to assign the high probability of P(q|p) to P(p &> q) as well, since the antecedent does not make a difference. However, consider a similar case where non-p is heterogeneous. Take nine subclasses in which P(q|non-p) = .9 and one in which P(q|non-p) = .1 (this yields the same average of P(q|non-p) = .82). For such a heterogeneous contrast class, the conditional is indeed taken to single out only the specific subclass p (similar to the conditional probability approach), since there is at least one potential contrast in one subclass of non-p. For the homogeneous case, however, the probability of the conditional is claimed to reflect the overall situation, and a high probability here would require a difference between P(q|non-p) and P(q|p).
(3) BMMC represents the simpler, antecedent-only models of conditionals not as extensional probabilities, or relative frequencies of (observed or imagined) conditionals, but as subjective estimates of the generative probabilities that have produced them. Although similar to a conditional probability approach, i.e. PE(q|p), this measure depends on priors and sample size. For flat priors, observing a [4; 1] input (f(p&q), f(p&non-q)) yields a lower P(p *> q) than a larger sample size, e.g. [40; 10]. Particularly for low sample sizes, priors may overrule likelihoods, reversing high and low conditional probability judgments.
Formally, the model uses cases of q or non-q, conditional on p, as input (taken as Bernoulli trials with an unchanging generative probability θ). Given a value of θ, the Binomial distribution provides us with the likelihood of the data, P(D|θ), with input k = f(q|p) in n = f(q|p) + f(¬q|p) trials:

$$B(k \mid \theta; n) = \binom{n}{k} \theta^{k} (1-\theta)^{n-k}$$

We obtain a likelihood density function for all θ (cf. middle of Fig. 1), resulting in a Beta distribution, now with the generative probability θ as an unknown parameter (with a - 1 = f(x = q|p) and b - 1 = f(x = ¬q|p)):

$$\mathrm{Beta}(a, b)\colon \quad P(\theta \mid a, b) = \mathrm{const} \cdot \theta^{a-1} (1-\theta)^{b-1}$$

As prior for θ we take the conjugate Beta distribution (e.g., Beta(1,1) as a flat prior), to easily calculate a Beta posterior probability distribution for θ (Fig. 1) that depends on sample size and priors. Its mean is a rational point estimate for the subjective probability of q given p.

Fig. 1 Example for the prior for θ, the Binomial likelihood and the Beta posterior distribution over θ

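As a numerical illustration of this sample-size dependence (hypothetical counts and a flat Beta(1,1) prior, as in the text):

    def posterior_mean(k, n, a0=1, b0=1):
        # mean of the Beta posterior after k cases of q in n trials (conjugate update)
        return (a0 + k) / (a0 + b0 + n)

    print(posterior_mean(4, 5))    # input [4; 1]:   ~0.71
    print(posterior_mean(40, 50))  # input [40; 10]: ~0.79, closer to the observed 0.8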
(4) In contrast, given fully represented conditionals (no heterogeneous contrast class), the probability of a conditional even more clearly differs from (extensional) conditional probabilities (cf. Leitgeb 2007). One option would be to apply a general probabilistic pattern logic (von Sydow 2011) to conditionals. In this case, however, conditionals would yield the same results as inclusive disjunctions, P(p &> q) = P(¬p ∨ q). Albeit here concerned with all four cells of a logical truth table, another option is that conditionals have a direction even in non-causal settings. This assumption will be pursued here, via a hypothetical causal-sampling assumption that asserts hypothetical antecedent-sampling for conditionals (Fiedler 2000), as if the antecedent had caused the data (cf. Stalnaker 1968; Evans, Over 2004). (In the presence of additional causal knowledge, one may correct for this, but this is not modelled here.) Based on the generative models of conditional probabilities (Step 3), generative versions of Delta P (Allan, Jenkins 1980) or causal power (Cheng 1997) are suggested here as another possible formalization of a full conditional.
Formally, the two conditional probability distributions (for q|p and q|non-p) are determined based on Step 3. To proceed from the two Beta posterior distributions on the interval [0, 1] to a distribution for Delta P, relating to P(q|p) - P(q|non-p) in the interval [-1, 1], one can use standard sampling techniques (e.g. the inversion or rejection method, Lynch 2007). For the sequential learning measure for causal power one proceeds analogously. The means of the resulting probability distributions may be taken as point estimates. However, these Delta P and causal power measures may not be flexible enough (see Step 6).
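A minimal sketch of this step using plain Monte Carlo draws from the two Beta posteriors (the frequencies are hypothetical; the inversion or rejection methods mentioned above serve the same purpose):

    import numpy as np

    rng = np.random.default_rng(0)
    # hypothetical frequencies: f(q|p) = 8, f(non-q|p) = 2, f(q|non-p) = 3, f(non-q|non-p) = 7
    th_p    = rng.beta(1 + 8, 1 + 2, size=100_000)  # posterior samples for P(q|p)
    th_nonp = rng.beta(1 + 3, 1 + 7, size=100_000)  # posterior samples for P(q|non-p)
    delta_p = th_p - th_nonp                        # induced distribution over Delta P in [-1, 1]
    print(delta_p.mean())                           # point estimate, here about 0.42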
(5) Let us first return to incomplete conditionals (Step 3). Even here the probability of a conditional P(p *> q) may have to be distinguished from the conditional probability, even if modelled as a generative conditional probability (Step 3). To me there are two other plausible options: One option would be to model probabilities of conditionals along similar lines as other connectives have been modelled in von Sydow (2011). Here I propose another option, closely related to a proposal in von Sydow (2014). This builds on the general idea of high probability accounts (Adams 1986; Schurz 2001, cf. 2005; Foley 2009), here specifying acceptance intervals over θ. This seems particularly suitable if concerned with the alternative testing of the hypotheses p *> q, p *> non-q, and p *> (q∨non-q) (e.g., if one does p then one either gets chocolate q or does not). This links to the debate concerning conjunction fallacies and other inclusion fallacies (given p, q∨non-q refers to the tautology and includes the affirmation q; cf. von Sydow 2011, 2014).
Formally, we start with ideal generative probabilities on the θ scale (θ_q = 1; θ_non-q = 0; and θ_q∨non-q = .5) (cf. von Sydow 2011). We then vary, for each of the three hypotheses H, the acceptance threshold e (over all, or all plausible, values). For e = .2, the closed acceptance interval for the consequent q would be [.8, 1]; for non-q, [0, .2]; and for q∨non-q, [.4, .6]. Based on Step 3 we calculate, for all tested hypotheses, the integral over θ in the specified interval of the posterior probability distribution:

$$\int_{\theta_1}^{\theta_2} \mathrm{Posterior}(\theta; H)\, d\theta$$

This specifies the subjective probability that, for the given observed data, the generative probability lies within the acceptance interval of H (cf. von Sydow 2011). The probability of each hypothesis is determined by adding up the outcomes for H over different levels of e and normalizing the results over the alternative hypotheses (e.g., alternative conditionals). This provides us with a kind of pattern probability Pp of the hypotheses, predicting systematic (conditional) inclusion fallacies (e.g., allowing for Pp(q∨non-q|p) < Pp(q|p)). (Additionally, such intervals over θ may help to model quantifiers: "if x are p then most x are q", cf. Bocklisch 2011.)
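Because the posterior is a Beta distribution, each of these integrals reduces to a difference of Beta cumulative distribution functions; a sketch with hypothetical counts, a flat prior and e = .2:

    from scipy.stats import beta

    k, n = 8, 10                          # hypothetical counts: f(q|p) = 8 of n = 10
    post = beta(1 + k, 1 + (n - k))       # Beta posterior over theta
    print(post.cdf(1.0) - post.cdf(0.8))  # mass in [.8, 1]  for hypothesis q
    print(post.cdf(0.2) - post.cdf(0.0))  # mass in [0, .2]  for hypothesis non-q
    print(post.cdf(0.6) - post.cdf(0.4))  # mass in [.4, .6] for hypothesis q or non-q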
(6) In continuation of Step 4, and analogous to Step 5, we detail the alternative testing of p &> q, p &> non-q, and p &> (q∨non-q) for complete conditionals. Since this includes a representation of non-p as well, we can also model the converse conditionals (<&, probabilistic necessary conditions) and biconditionals (<&>, probabilistic necessary and sufficient conditions) as alternatives to conditionals (&>, probabilistic sufficient conditions). First, to determine the homogeneity of the non-p subclasses (cf. Step 2), Step 5 is to be applied repeatedly, revealing whether each subclass is rather q, non-q, or q∨non-q. If the dominant results for all subclasses do not differ, we can determine the probability of a fully represented conditional. We make use of the results for the incomplete conditionals (for p or non-p; cf. Step 5). Related to conditionals, converse conditionals or biconditionals (or their full mental models), we interpret ideal conditionals p &> q, at least in the presence of alternative biconditionals, as the combination of p *> q and non-p *> (q∨non-q); ideal biconditionals p <&> q as combinations of p *> q and non-p *> non-q; and ideal converse conditionals p <& q as the combination of p *> (q∨non-q) and non-p *> q. Sometimes a connective may refer to more than one truth table: in the absence of biconditionals, P(p &> q) is taken to be the mixture of a conditional and a biconditional. Likewise the approach allows us to model, for instance, "if p then q or non-q" (p &> (q∨non-q)) as the average of two truth-table instantiations (with non-p either being q or, in another model, non-q).
Technically it is suggested that one can obtain the pattern probabilities of the combination of the incomplete models by assuming their independence and multiplying their outcomes, e.g.: Pp(p &> q) = Pp(p *> q) · Pp(non-p *> (q∨non-q)). If the hypothesis space is incomplete or if other logical hypotheses are added (von Sydow 2011, 2014), the results need to be normalized to obtain probabilities for the alternative logical hypotheses.
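A sketch of this combination and normalization step (all component pattern probabilities below are made-up placeholder values):

    import numpy as np

    pp_cond   = 0.6 * 0.7  # Pp(p &> q)  = Pp(p *> q) * Pp(non-p *> (q∨non-q))
    pp_bicond = 0.6 * 0.2  # Pp(p <&> q) = Pp(p *> q) * Pp(non-p *> non-q)
    pp_conv   = 0.1 * 0.5  # Pp(p <& q)  = Pp(p *> (q∨non-q)) * Pp(non-p *> q)
    pp = np.array([pp_cond, pp_bicond, pp_conv])
    print(pp / pp.sum())   # normalized over the alternative logical hypotheses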
Conclusion
Overall the sketched model is suggested to provide an improved
rational model for assessing generative probabilities for conditionals,
biconditionals, etc. The model predicts differences for complete and
incomplete mental models of conditionals, influences of priors,
influences of sample size, probabilistic interpretations of converse
conditionals and biconditionals, hypothesis-space dependence, and
conditional inclusion fallacies. Although all these phenomena seem
plausible in some situations, none of the previous models, each with
their specific advantages, seems to cover all predictions. Throughout
its steps the present computational model may contribute to predicting
a class of conditional probability judgments (perhaps complementing
extensional conditionals) by potentially integrating some divergent
findings and intuitions from other accounts into a Bayesian framework of generative probabilities of conditionals.
Acknowledgments
This work was supported by the grant Sy 111/2-1 from the DFG as part of the priority program "New Frameworks of Rationality" (SPP 1516). I am grateful to Dennis Hebbelmann for an interesting discussion about modelling causal power in sequential learning scenarios
(cf. Step 4). Parts of this manuscript build on von Sydow (2014),
suggesting a similar model for other logical connectives.
References
Adams EW (1986) On the logic of high probability. J Philos Logic 15:255–279
Ali N, Chater N, Oaksford M (2011) The mental representation of causal conditional reasoning: mental models or causal models. Cognition 119:403–418
Allan LG, Jenkins HM (1980) The judgment of contingency and the nature of the response alternative. Can J Psychol 34:1–11
Anderson AR, Belnap N (1975) Entailment: the logic of relevance and necessity, vol I. Princeton University Press, Princeton
Beller S (2003) The flexible use of deontic mental models. In: Alterman R, Kirsh D (eds) Proceedings of the Twenty-Fifth Annual Conference of the Cognitive Science Society. Lawrence Erlbaum, Mahwah, pp 127–132
Bocklisch F (2011) The vagueness of verbal probability and frequency expressions. Int J Adv Comput Sci 1(2):52–57
Byrne RMJ, Johnson-Laird PN (2009) 'If' and the problems of conditional reasoning. Trends Cogn Sci 13:282–286
Cheng PW (1997) From covariation to causation: a causal power theory. Psychol Rev 104:367–405
Edgington D (2003) What if? Questions about conditionals. Mind Lang 18:380–401
Evans JSBT, Over DE (2004) If. Oxford University Press, Oxford
Fiedler K (2000) Beware of samples! A cognitive-ecological sampling approach to judgment biases. Psychol Rev 107:659–676
Foley R (2009) Beliefs, degrees of belief, and the Lockean thesis. In: Huber F, Schmidt-Petri C (eds) Degrees of belief. Synthese Library 342. Springer, Heidelberg
Griffiths TL, Tenenbaum JB (2005) Structure and strength in causal induction. Cogn Psychol 51:334–384
Hagmayer Y, Waldmann MR (2006) Kausales Denken. In: Funke J (ed) Enzyklopädie der Psychologie: Denken und Problemlösen, Band C/II/8. Hogrefe Verlag, Göttingen, pp 87–166
Johnson-Laird PN (2006) How we reason. Oxford University Press, Oxford
Johnson-Laird PN, Byrne RMJ (2002) Conditionals: a theory of meaning, pragmatics, and inference. Psychol Rev 109:646–678
Kern-Isberner G (2001) Conditionals in nonmonotonic reasoning and belief revision. Springer, Heidelberg
Krynski TR, Tenenbaum JB (2007) The role of causality in judgment under uncertainty. J Exp Psychol Gen 136:430–450
Leitgeb H (2007) Belief in conditionals vs. conditional beliefs. Topoi 26(1):115–132
Lynch SM (2007) Introduction to applied Bayesian statistics and estimation for social scientists. Springer, Berlin
Oaksford M, Chater N (eds) (2010) Cognition and conditionals: probability and logic in human reasoning. Oxford University Press, Oxford
Oberauer K (2006) Reasoning with conditionals: a test of formal models of four theories. Cogn Psychol 53:238–283
Oberauer K, Weidenfeld A, Fischer K (2007) What makes us believe a conditional? The roles of covariation and causality. Think Reason 13:340–369
Over DE, Hadjichristidis C, Evans JSBT, Handley SJ, Sloman SA (2007) The probability of causal conditionals. Cogn Psychol 54:62–97
Pfeifer N (2013) The new psychology of reasoning: a mental probability logical perspective. Think Reason 19:329–345
Schurz G (2005) Non-monotonic reasoning from an evolutionary viewpoint. Synthese 146:37–51
von Sydow M (2011) The Bayesian logic of frequency-based conjunction fallacies. J Math Psychol 55(2):119–139
von Sydow M (2014) Is there a monadic as well as a dyadic Bayesian logic? Two logics explaining conjunction fallacies. In: Proceedings of the 36th Annual Conference of the Cognitive Science Society. Cognitive Science Society, Austin


Visualizer verbalizer questionnaire: evaluation and revision of the German translation
Florian Wedell, Florian Roser, Kai Hamburger
Giessen, Germany
Abstract
Many everyday abilities depend on various cognitive styles. With the Visualizer-Verbalizer Questionnaire (VVQ) we here translated a well-established inventory for distinguishing between verbalizers and visualizers into German and evaluated it. In our experiment, 476 participants answered the VVQ in an online study. The results of this experiment suggest that indeed only eight items measure what they are supposed to measure. To find out whether these eight items are usable as a future screening tool, we are currently running further studies. The VVQ translation will be discussed with respect to the original VVQ.
Keywords
Cognitive styles, Evaluation, Translation, Visualizer, Verbalizer,
VVQ
Introduction
"When I learn or think about things, I imagine them very pictorially." People often describe their ability of learning or thinking in one of two possible directions: either they state that they are the "vivid type", whose thoughts are full of colors and images, or they describe themselves as the "word-based person", which often seems a bit cold and more rational.
In the nineteen-seventies, Baddeley and Hitch (1974) demonstrated how important working memory is for everyday life. It seems as if the way we learn and describe things is more or less unconscious, but this fundamental ability is determined by individual preferences. Individual preferences and individual abilities are very important for various human skills, e.g. wayfinding and decision making. Therefore, they have to be taken into account throughout the whole domain of spatial cognition (e.g., Pazzaglia, Moè 2013).
One way of dealing with the necessary interindividual differentiation in wayfinding performance is to distinguish between people's cognitive styles (Klein 1951) or, more precisely, the preferred components of their working memory. In their model, Baddeley and Hitch (1974) assumed that the central executive is a kind of attentive coordinator of verbal and visuo-spatial information. Riding (2001) stated that one of the main dimensions of cognitive styles is the visualizer-verbalizer dimension. It is therefore common in cognitive research to differentiate between preferences for visual (visualizer) and/or verbal (verbalizer) information (e.g. Richardson 1977; Pazzaglia, Moè 2013). Considering this classification, it can be assumed that visualizers are people with high-imagery preferences, whereas verbalizers tend to have low-imagery preferences. These two styles are generally assessed with self-report instruments.
As Jonassen and Grabowski (1993) concluded, the primarily used tool to distinguish between visualizers and verbalizers is the Visualizer-Verbalizer Questionnaire (VVQ; Richardson 1977). The VVQ contains 15 items. Participants have to answer each of the given items by judging how far they apply to their style of thinking (dichotomous: yes/no). Still, there is an unsolved problem concerning the VVQ: the verbal subscale indeed surveys verbal abilities (e.g., Kirby et al. 1988), whereas the items of the visual subscale are only partly connected to visuo-spatial abilities (e.g., Edwards, Wilkins 1981; Kirby et al. 1988). Another problem concerning the VVQ is that it is rather hard to find people that can clearly be assigned to one of the extremes of the visualizer-verbalizer dimension, since most participants are located somewhere in between and may not be assigned

to one of the two dimension poles. Preliminary studies in our research group revealed that in some cases an estimated 50 participants had to be tested for cognitive style with the VVQ in order to clearly assign 2–3 people to one of the two groups, which is not very useful and also not very economical for further research.
In the present study, our aim is to translate the VVQ into German. It seems necessary to translate and evaluate this questionnaire, since no evaluated German version exists and an equivalent tool freely available for research on the visualizer-verbalizer dimension is lacking in the German-speaking area.
Experiment
Method
Participants
A total of 476 participants (377 female/99 male), ranging from 18 to 50 years (M = 24.14 years), were examined anonymously in an online study during the period from 12/16/2013 to 01/14/2014. Most participants' highest educational attainment was a high-school diploma (n = 278), followed by a university degree (n = 195) and other school graduation (n = 3). All participants were told that the study served to evaluate several translated forms of questionnaires, which included the VVQ. Participation was voluntary and was not compensated in any way.
Materials
The material used was the VVQ in its translated form. Table 1 shows the translation of the whole inventory. The questionnaire was translated in three steps. In the first step, the VVQ was translated by the first author of this study; negatively formulated items were formulated negatively in German as well. In step two, the translation was corrected by the two co-authors. In the third step, a bilingual member (native English- and German-speaking) of the research group of Experimental Psychology and Cognitive Science checked the translated items for colloquial subtleties.
After the translation process, the online study was set up with LimeSurvey, a tool for creating and conducting online studies.
Procedure
Participants were recruited with an e-mail containing basic information and a hyperlink to the study webpage. Forwarded to the webpage via the hyperlink, participants first received a short introduction about the aim of the study, followed by three standard demographic questions (gender, age, and level of education; Fig. 1). A specific instruction marked the start of the VVQ. Participants were asked to answer each item with either yes or no and, if they were not able to answer an item clearly with yes or no, they were asked to choose the answer that most likely applied to them. The translated items of the VVQ were presented in the same order as in the original version of the questionnaire.
Results
Before reporting the results of the VVQ, it should be noted that we were unable to compare our findings with the original data, due to the lack of statistical data in the original study by Richardson (1977). After reverse-coding the negatively formulated items, we analyzed the VVQ with a factor analysis and Varimax rotation. The assumed two factors were preset. Each of the two factors had an eigenvalue above two (2.32 and 2.42) and, taken together, these factors explained 31.59 % of the variance. Table 2 shows the results of the factor analysis in detail. We found only eight items matching their predicted scale, with each scale containing four items. The other seven items could not clearly be assigned to one of these scales. Figure 2 shows a diagram of the items in the rotated space to illustrate the distribution of each item to the respective underlying factor.
Cronbach's alpha (α = .04) of the translated version is very weak when considering the whole inventory, but reaches at least a moderate level (α = .57) when items 06, 07, 08, 09, 10, 13 and 14 are eliminated.
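For transparency, both reported statistics can be sketched as follows; the data matrix is a random placeholder standing in for the 476 × 15 recoded yes/no answers, and scikit-learn's maximum-likelihood factor analysis (with varimax rotation, available in recent versions) is one possible implementation that need not reproduce the reported loadings exactly:

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(0)
    data = rng.integers(0, 2, size=(476, 15)).astype(float)  # placeholder for recoded answers

    def cronbach_alpha(items):
        # alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
        k = items.shape[1]
        return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                              / items.sum(axis=1).var(ddof=1))

    print(cronbach_alpha(data))                              # whole inventory
    fa = FactorAnalysis(n_components=2, rotation='varimax').fit(data)
    print(fa.components_.T)                                  # 15 x 2 item loadings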


Table 1 VVQ items (Richardson 1977) and the German translation

VVQ_01 I enjoy doing work that requires the use of words
       Mir machen Aufgaben Spaß, bei denen man mit Wörtern umgehen muss
VVQ_02 My daydreams are sometimes so vivid I feel as though I actually experience the scene
       Meine Tagträume fühlen sich manchmal so lebendig an, dass ich meine, sie wirklich zu erleben
VVQ_03 I enjoy learning new words
       Das Lernen neuer Wörter macht mir Spaß
VVQ_04 I easily think of synonyms for words
       Es fällt mir leicht, Synonyme von Wörtern zu finden
VVQ_05 My powers of imagination are higher than average
       Ich besitze eine überdurchschnittliche Vorstellungskraft
VVQ_06 I seldom dream
       Ich träume selten
VVQ_07 I read rather slowly
       Ich lese eher langsam
VVQ_08 I can't generate a mental picture of a friend's face when I close my eyes
       Wenn ich meine Augen schließe, kann ich mir das Gesicht eines Freundes nicht bildhaft vorstellen
VVQ_09 I don't believe that anyone can think in terms of mental pictures
       Ich glaube nicht, dass jemand in Form mentaler Bilder denken kann
VVQ_10 I prefer to read instructions about how to do something rather than have someone show me
       Ich lese lieber eine Anleitung, als mir von jemand anderem ihren Inhalt vorführen zu lassen
VVQ_11 My dreams are extremely vivid
       Meine Tagträume sind extrem lebhaft
VVQ_12 I have better than average fluency in using words
       Meine Wortgewandtheit ist überdurchschnittlich
VVQ_13 My daydreams are rather indistinct and hazy
       Meine Tagträume sind eher undeutlich und verschwommen
VVQ_14 I spend very little time attempting to increase my vocabulary
       Ich verbringe wenig Zeit damit, meinen Wortschatz zu erweitern
VVQ_15 My thinking often consists of mental pictures or images
       Ich denke sehr häufig in Form von Bildern

Discussion
The investigation of the VVQ reveals a large deviation between the original VVQ and the translated version. The data suggest that the translated VVQ contains the two predicted main factors (visualizer and verbalizer). These two factors, in other words the two extreme poles of the visualizer-verbalizer dimension, are covered with four items each: items 02, 05, 11 and 15 for the visualizer pole and items 01, 03, 04 and 12 for the verbalizer pole. The remaining seven items cannot clearly be attributed to one of the poles.


Fig. 1 Screenshot of the presented demographic items in LimeSurvey: first the dichotomous question for the participant's gender, second a free-text field for age, and third a drop-down box where participants choose their level of education
Table 2 Underlying factors of the translated version of the VVQ items after Varimax rotation

Item   Verbalizer   Visualizer
01     .754         .037
02     .006         .618
03     .684         -.034
04     .655         .068
05     .216         .423
06     .032         -.619
07     -.300        -.030
08     -.097        -.280
09     -.058        -.214
10     .127         -.151
11     -.087        .657
12     .587         .073
13     -.089        -.692
14     -.655        -.131
15     .011         .533

Fig. 2 Diagram of components in the rotated space; the cluster with items VVQ_02, 05, 11, 15 represents the visualizer-based items; the cluster with items VVQ_01, 03, 04, 12 represents the verbalizer-based items

Work in progress: Revising the VVQ


When analyzing the data of the VVQ, the results show that nearly half of the questionnaire does not measure in detail whether a person is a visualizer or a verbalizer. This finding matches data from our research group indicating that the two styles are not cleanly separable from each other and that only a small number of people can clearly be assigned to one of the two groups. This shows that a translated form of the VVQ is not able to exactly distinguish between visualizers and verbalizers, which can also be assumed for the original version of the VVQ.
The results could be explained by the intended item content having changed during the translation process. An aspect that supports this assumption is that in some cases participants answered in the wrong direction, as item 14 illustrates: "I spend very little time attempting to increase my vocabulary" was translated into German as "Ich verbringe wenig Zeit damit, meinen Wortschatz zu erweitern". The problem is that the German translation allows two possible readings under which a participant would answer this item with yes, one marking the participant as a verbalizer and one as a visualizer. The first reading, which marks the participant as a verbalizer, is that the participant wants to say "yes, I spend little time, because there is no need for me to spend more time on learning that stuff, as I already am very good". The second reading would clearly mark the participant as a visualizer, when he or she answers in the intended way with "yes, I spend very little time on it, because I do not care much about that stuff". To solve this problem, it seems necessary to change the phrasing of several items. But when doing so, it becomes inevitable to change most parts of the inventory, or even the whole inventory. We assume that one possible way to work with the translated form of the VVQ is to reduce it to eight items, namely those that are clearly definable as belonging to the visualizer or verbalizer pole, and to use the inventory as a screening test only. We are currently investigating this possibility with a second online study, in which our participants are asked to answer this VVQ screening version. We want to investigate whether a strict distinction between visualizers and verbalizers is possible or whether there is only one cognitive style resulting from the combination of both visual and verbal abilities.
Our research group also plans to use the translated VVQ as a pretest in further investigations of the visual-impedance effect (Knauff, Johnson-Laird 2002). The visual-impedance effect arises with relations that elicit visual images containing details that are irrelevant to an inference problem and thereby (should) impede the reasoning process (Knauff, Johnson-Laird 2002). The VVQ might help to discover whether visualizers or verbalizers are more affected by the visual-impedance effect. We assume that (extreme) verbalizers might not be affected as much as (extreme) visualizers, because their way of imagining is more word-based (or propositional), and therefore their reasoning process might not be disrupted as much as that of the visualizers.
Further research and conclusion
The VVQ is a widely used tool in the German research area. One reason for this is that it is freely available (in contrast to some other questionnaires like the OSIQ; Blajenkova et al. 2006). Therefore, we here consider creating a completely new inventory that fits the German language better, with the eight definable items of the VVQ as a basis. There are two ways to fill the inventory with items. The first is to translate and evaluate the revised version of the VVQ by Kirby et al. (1988) and to put the eight items from the original VVQ together with the best-fitting items from the revised version by Kirby into a new inventory. The other is to create completely new items. We think that the pioneering work of Richardson (1977) is neither lost nor unusable, but we conclude that his work and the VVQ need to be revised for further use.

Acknowledgment
We thank Sarah Jane Abbott for help with the translation process and for proofreading the manuscript.
References
Baddeley AD, Hitch G (1974) Working memory. In: Bower GH (ed) The psychology of learning and motivation: advances in research and theory. Academic Press, New York, pp 47–89
Blajenkova O, Kozhevnikov M, Motes MA (2006) Object-spatial imagery: a new self-report imagery questionnaire. Appl Cogn Psychol 20:239–263
Edwards JE, Wilkins W (1981) Verbalizer-visualizer questionnaire: relationship with imagery and verbal-visual ability. J Mental Imagery 5:137–142
Jonassen DH, Grabowski BL (1993) Handbook of individual differences, learning, and instruction. Erlbaum, Hillsdale
Kirby JR, Moore PJ, Schofield NJ (1988) Verbal and visual learning styles. Contemp Educ Psychol 13:169–184
Klein GS (1951) A personal world through perception. In: Blake RR, Ramsey GV (eds) Perception: an approach to personality. The Ronald Press Company, New York, pp 328–355
Knauff M, Johnson-Laird PN (2002) Visual imagery can impede reasoning. Mem Cogn 30(3):363–371
Kosslyn SM, Koenig O (1992) Wet mind: the new cognitive neuroscience. Free Press, New York
Pazzaglia F, Moè A (2013) Cognitive styles and mental rotation ability in map learning. Cogn Process 14:391–399
Richardson A (1977) Verbalizer-visualizer: a cognitive style dimension. J Mental Imagery 1(1):109–125
Riding RJ (2001) The nature and effects of cognitive style. In: Sternberg RJ, Zhang L (eds) Perspectives on thinking, learning, and cognitive styles. Erlbaum, Mahwah, pp 47–72


Author Index
For each author, references are given to the type of contribution, if (s)he is the first author, or to the first author, if (s)he is a
co-author. Within each type, contributions are ordered alphabetically.
Afsari Z. → POSTER PRESENTATION
Albrecht R. → POSTER PRESENTATIONS (2); ORAL PRESENTATIONS (2)
Alex-Ruf S. → POSTER PRESENTATION
Aschersleben G. → SYMPOSIUM (Koester)
Augurzky P. → SYMPOSIUM (Brauner, Jager, Rolke)
Bader M. → ORAL PRESENTATIONS (2); Ellsiepen, E.
Bahnmueller J. → SYMPOSIUM (Nuerk)
Baier F. → POSTER PRESENTATION; Hamburger, K.
Baumann M. → SYMPOSIUM (Baumann)
Bech M. → POSTER PRESENTATION; Michael, J.
Bekkering H. → KEYNOTE LECTURE; SYMPOSIUM (Koester)
Bennati S. → ORAL PRESENTATION; Rizzardi, E.
Bergmann K. → ORAL PRESENTATION
Bernhart N. → POSTER PRESENTATION; Schad, D.
Besold T. R. → POSTER PRESENTATION; ORAL PRESENTATION
Bianco R. → ORAL PRESENTATION
Biondi J. → ORAL PRESENTATION; Blasing, B.
Blascheck T. → TUTORIAL (Raschke)
Blasing B. → POSTER PRESENTATION; ORAL PRESENTATION; Seegelke, C.
Bogart K. → ORAL PRESENTATION; Michael, J.
Bohn K. → ORAL PRESENTATION; Kandylaki, K.
Bott O. → SYMPOSIUM (Brauner, Jager, Rolke)
Brandenburg S. → SYMPOSIUM (Baumann)
Brandi M. → SYMPOSIUM (Himmelbach)
Brauer R. R. → POSTER PRESENTATION; Fischer, N.M.
Braun C. → POSTER PRESENTATION
Braun D. A. → SYMPOSIUM (de la Rosa); POSTER PRESENTATION; Leibfried, F.
Brauner C. → SYMPOSIUM (Brauner, Jager, Rolke)
Brock J. → ORAL PRESENTATION; Caruana, N.
Buchel C. → POSTER PRESENTATION; Wache, S.
Bulthoff H. H. → POSTER PRESENTATIONS (5); ORAL PRESENTATIONS (2); Chang, D.; Glatz, C.; Hohmann, M.R.; Meilinger, T.; Symeonidou, E.; Chang, D.; Scheer, M.
Burch M. → TUTORIAL (Raschke)
Burigo M. → SYMPOSIUM (Knoeferle, Burigo)
Buschmeier H. → POSTER PRESENTATION
Butz M. V. → POSTER PRESENTATION; ORAL PRESENTATIONS (2); Lohmann, J.; Ehrenfeld, S.; Schrodt, F.
Caruana N. → ORAL PRESENTATION
Chang D. → POSTER PRESENTATION; ORAL PRESENTATION

Chuang L. L. → POSTER PRESENTATIONS (2); ORAL PRESENTATION; Glatz, C.; Symeonidou, E.; Scheer, M.
Ciaunica A. → ORAL PRESENTATION
Colombo M. → SYMPOSIUM (Morgan)
Coogan J. → ORAL PRESENTATION; Blasing, B.
Coyle D. → POSTER PRESENTATION; Limerick, H.
Cremers A. B. → ORAL PRESENTATION; Garcia, G.M.
Damaskinos M. → POSTER PRESENTATION
Daroczy G. → SYMPOSIUM (Nuerk)
de la Rosa S. → SYMPOSIUM (de la Rosa); POSTER PRESENTATIONS (2); ORAL PRESENTATION; Chang, D.; Hohmann, M.R.; Chang, D.
de la Vega I. → POSTER PRESENTATIONS (2); Wolter, S.
de Lange F. P. → SYMPOSIUM (Koester)
Demarchi G. → POSTER PRESENTATION; Braun, S.
Demberg V. → SYMPOSIUM (Knoeferle, Burigo)
Dittrich K. → POSTER PRESENTATION; Scholtes, C.
Domahs U. → ORAL PRESENTATION; Kandylaki, K.
Dorner D. → POSTER PRESENTATION; Damaskinos, M.
Dowker A. → SYMPOSIUM (Nuerk)
Dshemuchadse M. → POSTER PRESENTATION; Frisch, S.
Dudschig C. → POSTER PRESENTATIONS (2); ORAL PRESENTATION; de la Vega, I.; Wolter, S.; Lachmair, M.
Egan F. → SYMPOSIUM (Morgan)
Ehrenfeld S. → ORAL PRESENTATION
Ehrsson H. H. → KEYNOTE LECTURE
Ellsiepen E. → ORAL PRESENTATION
Engelbrecht K. → ORAL PRESENTATION; Halbrugge, M.
Engelhardt P. E. → SYMPOSIUM (Knoeferle, Burigo)
Fard P. R. → POSTER PRESENTATION; Yahya, K.
Fengler A. → POSTER PRESENTATION; Krause, C.
Fernandez L. B. → SYMPOSIUM (Knoeferle, Burigo)
Fernandez S. R. → POSTER PRESENTATION; ORAL PRESENTATIONS (2); Lachmair, M.; Rolke, B.
Festl F. → POSTER PRESENTATION; Seibold, V.C.
Fischer M. H. → SYMPOSIUM (Nuerk); POSTER PRESENTATION; Sixtus, E.
Fischer N. M. → POSTER PRESENTATION
Frankenstein J. → POSTER PRESENTATION; Meilinger, T.
Franzmeier I. → ORAL PRESENTATION; Ragni, M.

Freksa C. → ORAL PRESENTATION
Frey J. → POSTER PRESENTATION; Braun, C.
Friederici A. D. → POSTER PRESENTATION; ORAL PRESENTATION; Krause, C.; Bianco, R.
Friedrich C. K. → POSTER PRESENTATION; Schild, U.
Frintrop S. → ORAL PRESENTATION; Garcia, G.M.
Frisch S. → POSTER PRESENTATION
Friston K. → KEYNOTE LECTURE; SYMPOSIUM (Morgan); POSTER PRESENTATION; Yahya, K.
Fusaroli R. → ORAL PRESENTATION; Michael, J.
Garbusow M. → POSTER PRESENTATION; Schad, D.
Garcia G. M. → ORAL PRESENTATION
Giese M. A. → SYMPOSIUM (de la Rosa)
Giewein M. → POSTER PRESENTATION; Albrecht, R.
Glatz C. → POSTER PRESENTATION
Godde B. → SYMPOSIUM (Koester)
Goebel S. → SYMPOSIUM (Nuerk)
Goldenberg G. → SYMPOSIUM (Himmelbach)
Goltenboth N. → POSTER PRESENTATION
Gomez O. → ORAL PRESENTATION
Goschke T. → POSTER PRESENTATION; Frisch, S.
Grau-Moya J. → POSTER PRESENTATION; Leibfried, F.
Gray W. D. → KEYNOTE LECTURE
Grishkova I. → POSTER PRESENTATION
Grosjean M. → POSTER PRESENTATION
Groer J. → POSTER PRESENTATION
Grosz P. → SYMPOSIUM (Brauner, Jager, Rolke)
Gunter T. → SYMPOSIUM (Koester)
Guss C. D. → POSTER PRESENTATIONS (2); Damaskinos, M.; Goltenboth, N.
Halbrugge M. → SYMPOSIUM (Russwinkel, Prezenski, Lindner); TUTORIAL (Russwinkel, Prezenski, Joeres, Lindner, Halbrugge); ORAL PRESENTATION
Halfmann M. → POSTER PRESENTATION; Hardiess, G.
Hamburger K. → POSTER PRESENTATIONS (2); ORAL PRESENTATIONS (2); Roser, F.; Strickrodt, M.; Wedell, F.
Hardiess G. → POSTER PRESENTATIONS (2); ORAL PRESENTATION; Schick, W.; Mallot, H.A.
Hartl H. → POSTER PRESENTATION; Kotowski, S.
Heege L. → POSTER PRESENTATION
Hein E. → POSTER PRESENTATION; Kutscheidt, K.
Heinz A. → POSTER PRESENTATION; Schad, D.
Hellbernd N. → POSTER PRESENTATION
Henning A. → SYMPOSIUM (Koester)
Herbort O. → SYMPOSIUM (Koester)
Hermsdorfter J. → SYMPOSIUM (Himmelbach)
Hesse C. → SYMPOSIUM (Himmelbach)
Himmelbach M. → SYMPOSIUM (Himmelbach); POSTER PRESENTATION; Rennig, J.
Hinterecker T. → ORAL PRESENTATION; Strickrodt, M.
Hofmeister J. → POSTER PRESENTATION; Lancier, S.
Hohmann M. R. → POSTER PRESENTATION

Holle H. → SYMPOSIUM (Koester)
Huber S. → SYMPOSIUM (Nuerk); POSTER PRESENTATION; Radler, P.A.
Huys Q. → POSTER PRESENTATION; Schad, D.
Jager G. → SYMPOSIUM (Brauner, Jager, Rolke)
Jakel F. → TUTORIAL (Jakel)
Janczyk M. → POSTER PRESENTATIONS; ORAL PRESENTATION; Groer, J.
Javadi A. H. → POSTER PRESENTATION; Schad, D.
Joeres F. → TUTORIAL (Russwinkel, Prezenski, Joeres, Lindner, Halbrugge); POSTER PRESENTATION
Junger E. → POSTER PRESENTATION; Schad, D.
Kahl S. → ORAL PRESENTATION; Bergmann, K.
Kandylaki K. → ORAL PRESENTATION
Karnath H. → POSTER PRESENTATION; Rennig, J.
Kathner D. → SYMPOSIUM (Baumann)
Kaul R. → SYMPOSIUM (Baumann)
Kaup B. → POSTER PRESENTATIONS (2); ORAL PRESENTATION; de la Vega, I.; Wolter, S.; Lachmair, M.
Keller P. → ORAL PRESENTATION; Bianco, R.
Keyser J. → POSTER PRESENTATION; Wache, S.
Kircher T. → ORAL PRESENTATION; Kandylaki, K.
Klauer K. C. → POSTER PRESENTATION; Scholtes, C.
Knoblich G. → POSTER PRESENTATIONS (2); Vesper, C.; Wolf, T.
Knoeferle P. → SYMPOSIUM (Knoeferle, Burigo)
Koester D. → SYMPOSIUM (Koester); POSTER PRESENTATION; Seegelke, C.
Konig P. → POSTER PRESENTATIONS (2); Afsari, Z.; Wache, S.
Konig S. U. → POSTER PRESENTATION; Wache, S.
Kopp S. → POSTER PRESENTATIONS (2); ORAL PRESENTATION; Buschmeier, H.; Grishkova, I.; Bergmann, K.
Kotowski S. → POSTER PRESENTATION
Krause C. → POSTER PRESENTATION
Kroczek L. → SYMPOSIUM (Koester)
Kruegger J. → ORAL PRESENTATION; Michael, J.
Kuhl D. → SYMPOSIUM (Baumann)
Kunde W. → ORAL PRESENTATION; Janczyk, M.
Kurzhals K. → TUTORIAL (Raschke)
Kutscheidt K. → POSTER PRESENTATION
Lachmair M. → POSTER PRESENTATIONS; ORAL PRESENTATION; Fernandez, S.R.
Lancier S. → POSTER PRESENTATION
Lappe M. → POSTER PRESENTATION; Masselink, J.
Le Bigot M. → POSTER PRESENTATION; Grosjean, M.
Leibfried F. → POSTER PRESENTATION
Limerick H. → POSTER PRESENTATION
Lindemann O. → POSTER PRESENTATION; Sixtus, E.
Lindner A. → SYMPOSIUM (Morgan); POSTER PRESENTATION; Kutscheidt, K.
Lindner N. → ORAL PRESENTATION
Lindner S. → SYMPOSIUM (Russwinkel, Prezenski, Lindner); TUTORIAL (Russwinkel, Prezenski, Joeres, Lindner, Halbrugge)

Lingnau A. → SYMPOSIUM (Himmelbach)
Lloyd D. → SYMPOSIUM (Nuerk)
Lohmann J. → POSTER PRESENTATION
Lopez J. J. R. → ORAL PRESENTATION; Rolke, B.
Ludmann M. → ORAL PRESENTATION
Lutsevich A. → POSTER PRESENTATION; Damaskinos, M.
Maier S. → ORAL PRESENTATION; Ragni, M.
Mallot H. A. → POSTER PRESENTATIONS (3); ORAL PRESENTATION; Hardiess, G.; Lancier, S.; Schick, W.
Marmolejo-Ramos F. → POSTER PRESENTATION; Vaci, N.
Masselink J. → POSTER PRESENTATION
Matthews R. → SYMPOSIUM (Morgan)
McRae K. → POSTER PRESENTATION; Rabovsky, M.
Meilinger T. → POSTER PRESENTATION
Meurers D. → SYMPOSIUM (Nuerk)
Michael J. → ORAL PRESENTATION
Milin P. → POSTER PRESENTATION; Vaci, N.
Moeller K. → SYMPOSIUM (Nuerk); POSTER PRESENTATION; Radler, P.A.
Mohler B. J. → POSTER PRESENTATION; Meilinger, T.
Monittola G. → POSTER PRESENTATION; Braun, C.
Moore J. → POSTER PRESENTATION; Limerick, H.
Morgan A. → SYMPOSIUM (Morgan)
Muckli L. → SYMPOSIUM (Morgan)
Muller R. → POSTER PRESENTATION
Myachykov A. → SYMPOSIUM (Knoeferle, Burigo)
Nagels A. → ORAL PRESENTATION; Kandylaki, K.
Neumann H. → TUTORIAL (Neumann); ORAL PRESENTATION; Gomez, O.
Newen A. → POSTER PRESENTATION; Heege, L.
Novembre G. → ORAL PRESENTATION; Bianco, R.
Nuerk H. → SYMPOSIUM (Nuerk)
Obrig H. → POSTER PRESENTATION; Krause, C.
Olivari M. → POSTER PRESENTATION; Symeonidou, E.
Ondobaka S. → SYMPOSIUM (Koester)
Ossandon J. → POSTER PRESENTATION; Afsari, Z.
Ostergaard J. R. → ORAL PRESENTATION; Michael, J.
Patel-Grosz P. → SYMPOSIUM (Brauner, Jager, Rolke)
Pfeiffer T. → POSTER PRESENTATIONS; ORAL PRESENTATION; Renner, P.
Pfeiffer-Lessmann N. → ORAL PRESENTATION; Pfeiffer, T.
Pfister R. → SYMPOSIUM (Koester)
Pfluger H. → TUTORIAL (Raschke)
Pixner S. → POSTER PRESENTATION; Radler, P.A.
Pliushch I. → POSTER PRESENTATION
Popov T. → POSTER PRESENTATION; Braun, C.
Prezenski S. → SYMPOSIUM (Russwinkel, Prezenski, Lindner); TUTORIAL (Russwinkel, Prezenski, Joeres, Lindner, Halbrugge)
Rabovsky M. → POSTER PRESENTATION
Radanovic J. → POSTER PRESENTATION; Vaci, N.
Radler P. A. → POSTER PRESENTATION
Ragni M. → POSTER PRESENTATIONS; ORAL PRESENTATION (3); Albrecht, R.; Rizzardi, E.; Steinlein, E.

Rahona J. J. → ORAL PRESENTATION; Fernandez, S.R.
Rapp M. A. → POSTER PRESENTATION; Schad, D.
Raschke M. → TUTORIAL (Raschke)
Rebuschat P. → POSTER PRESENTATION
Renner P. → POSTER PRESENTATIONS; ORAL PRESENTATION; Pfeiffer, T.
Rennig J. → POSTER PRESENTATION
Rizzardi E. → ORAL PRESENTATION
Roberts M. → SYMPOSIUM (Nuerk)
Rohrich W. G. → ORAL PRESENTATION; Mallot, H.A.
Rolke B. → SYMPOSIUM (Brauner, Jager, Rolke); POSTER PRESENTATION; ORAL PRESENTATION; Seibold, V.C.
Romoli J. → SYMPOSIUM (Brauner, Jager, Rolke)
Roser F. → POSTER PRESENTATIONS (2); ORAL PRESENTATIONS (2); Hamburger, K.; Strickrodt, M.; Wedell, F.
Roth M. J. → POSTER PRESENTATION; Kutscheidt, K.
Ruiz S. → POSTER PRESENTATION; Rebuschat, P.
Russwinkel N. → SYMPOSIUM (Russwinkel, Prezenski, Lindner); TUTORIAL (Russwinkel, Prezenski, Joeres, Lindner, Halbrugge); ORAL PRESENTATION; Joeres, F.
Safra L. → POSTER PRESENTATION; Vesper, C.
Sammler D. → POSTER PRESENTATION; ORAL PRESENTATION; Bianco, R.; Hellbernd, N.
Sandamirskaya Y. → TUTORIAL (Sandamirskaya, Schneegans)
Schack T. → POSTER PRESENTATION; ORAL PRESENTATION; Blasing, B.; Seegelke, C.
Schad D. → POSTER PRESENTATIONS (2); Rabovsky, M.
Scheer M. → ORAL PRESENTATION
Schenk T. → SYMPOSIUM (Himmelbach)
Scherbaum S. → POSTER PRESENTATION; Frisch, S.
Schick W. → POSTER PRESENTATION
Schiltz C. → SYMPOSIUM (Nuerk)
Schmid U. → POSTER PRESENTATION; Damaskinos, M.
Schmitz L. → POSTER PRESENTATION; Vesper, C.
Schneegans S. → TUTORIAL (Sandamirskaya, Schneegans)
Scholtes C. → POSTER PRESENTATION
Schoner G. → KEYNOTE LECTURE
Schrodt F. → ORAL PRESENTATION
Schulz M. → SYMPOSIUM (Russwinkel, Prezenski, Lindner)
Schumacher P. → SYMPOSIUM (Brauner, Jager, Rolke)
Schumann F. → POSTER PRESENTATION; Wache, S.
Sebanz N. → KEYNOTE LECTURE; POSTER PRESENTATIONS (2); Vesper, C.; Wolf, T.
Sebold M. → POSTER PRESENTATION; Schad, D.
Seegelke C. → POSTER PRESENTATION
Sehm B. → POSTER PRESENTATION; Krause, C.
Seibold V. C. → POSTER PRESENTATION; ORAL PRESENTATION; Rolke, B.

Shaki S. → SYMPOSIUM (Nuerk)
Simmel L. → ORAL PRESENTATION; Blasing, B.
Sixtus E. → POSTER PRESENTATION
Smolka M. → POSTER PRESENTATION; Schad, D.
Soltanlou M. → SYMPOSIUM (Nuerk)
Sorg C. → SYMPOSIUM (Himmelbach)
Spiegel M. A. → POSTER PRESENTATION; Seegelke, C.
Steffenhagen F. → POSTER PRESENTATION; Albrecht, R.
Stein S. C. → POSTER PRESENTATION; Sutterlutti, R.
Steinlein E. → ORAL PRESENTATION
Sternefeld W. → SYMPOSIUM (Brauner, Jager, Rolke)
Strickrodt M. → ORAL PRESENTATION
Sutterlutti R. → POSTER PRESENTATION
Symeonidou E. → POSTER PRESENTATION
Szucs D. → SYMPOSIUM (Nuerk)
Tamosinunaite M. → POSTER PRESENTATION; Sutterlutti, R.
Teickner C. → POSTER PRESENTATION; Schick, W.
Thuring M. → SYMPOSIUM (Baumann)
Tillas A. → ORAL PRESENTATION
Trillmich C. M. → POSTER PRESENTATION; Hamburger, K.
Tuason M. T. → POSTER PRESENTATION; Goltenboth, N.
Tylen K. → ORAL PRESENTATION; Michael, J.
Tzelgov J. → SYMPOSIUM (Nuerk)
Ugen S. → SYMPOSIUM (Nuerk)
Ulrich R. → SYMPOSIUM (Brauner, Jager, Rolke)
Unger M. → POSTER PRESENTATION; Fischer, N.M.
Vaci N. → POSTER PRESENTATION
van Leeuwen C. → ORAL PRESENTATION
Van Rinsveld A. → SYMPOSIUM (Nuerk)
Vesper C. → POSTER PRESENTATIONS (2); Wolf, T.
Villringer A. → ORAL PRESENTATION; Bianco, R.
Voelcker-Rehage C. → SYMPOSIUM (Koester)
Vogeley K. → SYMPOSIUM (de la Rosa)
von Sydow M. → ORAL PRESENTATION
Vorwerg C. → POSTER PRESENTATION; Grishkova, I.
Vosgerau G. → ORAL PRESENTATION; Lindner, N.
Wache S. → POSTER PRESENTATION
Wachsmuth S. → POSTER PRESENTATION; Renner, P.
Weber L. → SYMPOSIUM (Baumann)
Wedell F. → ORAL PRESENTATION
Weigelt M. → SYMPOSIUM (Koester)
Weiss-Blankenhorn P. H. → SYMPOSIUM (Himmelbach)
Weisz N. → POSTER PRESENTATION; Braun, C.
Wenczel F. → ORAL PRESENTATION; Ragni, M.
Westphal B. → POSTER PRESENTATIONS (1); ORAL PRESENTATION (2); Albrecht, R.; Albrecht, R.; Albrecht, R.
Wiese R. → ORAL PRESENTATION; Kandylaki, K.
Wiese W. → POSTER PRESENTATION; Pliushch, I.
Wirzberger M. → SYMPOSIUM (Russwinkel, Prezenski, Lindner)
Wittmann M. → SYMPOSIUM (Koester)
Wohlschlager A. → SYMPOSIUM (Himmelbach)
Wolbers T. → POSTER PRESENTATION; Wache, S.
Wolf C. → POSTER PRESENTATION; Hamburger, K.
Wolf T. → POSTER PRESENTATION
Wolska M. → SYMPOSIUM (Nuerk)
Wolter S. → POSTER PRESENTATION
Wong H. Y. → SYMPOSIUM (de la Rosa)
Woolgar A. → ORAL PRESENTATION; Caruana, N.
Worgotter F. → POSTER PRESENTATION; Sutterlutti, R.
Wortelen B. → SYMPOSIUM (Baumann)
Wuhle A. → POSTER PRESENTATION; Braun, C.
Wunsch K. → SYMPOSIUM (Koester)
Yaghoubzadeh R. → POSTER PRESENTATION; Grishkova, I.
Yahya K. → POSTER PRESENTATION
Zimmermann U. S. → POSTER PRESENTATION; Schad, D.
Zohar-Shai B. → SYMPOSIUM (Nuerk)
