
Multisensory Development



Multisensory Development
Edited by

Andrew J. Bremner
Goldsmiths, University of London

David J. Lewkowicz
Florida Atlantic University
and

Charles Spence
University of Oxford

Great Clarendon Street, Oxford OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Oxford University Press 2012
The moral rights of the authors have been asserted
First Edition published in 2012
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
British Library Cataloguing in Publication Data
Data available
Library of Congress Cataloging in Publication Data
Data available
ISBN 978–0–19–958605–9
Printed and bound by
CPI Group (UK) Ltd, Croydon, CR0 4YY
Whilst every effort has been made to ensure that the contents of this work
are as complete, accurate, and up-to-date as possible at the date of writing,
Oxford University Press is not able to give any guarantee or assurance that
such is the case. Readers are urged to take appropriately qualified medical
advice in all cases. The information in this work is intended to be useful to the
general reader, but should not be used as a means of self-diagnosis or for the
prescription of medication
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.
Foreword

Andy and Charles first thought up the idea of producing this book some years ago over a few pints
of Sam Smith’s Nut Brown Ale in the Three Goats Heads in Oxford (a rich multisensory experience if ever there was one). After starting on a book proposal they quickly realised that the project
would not be possible without David’s help. Throughout the process of putting this volume
together we have been very lucky to have been able to attract such a fantastic and willing set of
contributors. We are very proud of the addition that this book will make to the literature and we
have the many authors who have contributed to thank for that. We would also like to acknowledge the support that we have received from Martin Baum and Charlotte Green at OUP, and
also colleagues at Goldsmiths, Oxford, and Florida Atlantic. Fran Knight, JJ Begum, Madeleine
Miller-Bottome, Conor Glover, and Jenn Hiester bear particular mention. Nut Brown Ales all
around!
AJB was supported by European Research Council Grant No. 241242 (European Commission
Framework Programme 7) and DJL was supported by NSF grant BCS-0751888 and NIH grant
D057116 during the preparation of this volume.
Contents

Contributors ix
1 The multisensory approach to development 1
Andrew J. Bremner, David J. Lewkowicz, and Charles Spence

Part A Typical development of multisensory processes from early gestation to old age
2 The role of olfaction in human multisensory development 29
Benoist Schaal and Karine Durand
3 The development and decline of multisensory flavour perception:
Assessing the role of visual (colour) cues on the perception of taste
and flavour 63
Charles Spence
4 Crossmodal interactions in the human newborn: New answers to
Molyneux’s question 88
Arlette Streri
5 The development of multisensory representations of the body and of
the space around the body 113
Andrew J. Bremner, Nicholas P. Holmes, and Charles Spence
6 The development of multisensory balance, locomotion, orientation,
and navigation 137
Marko Nardini and Dorothy Cowie
7 The unexpected effects of experience on the development of
multisensory perception in primates 159
David J. Lewkowicz
8 The role of intersensory redundancy in early perceptual, cognitive,
and social development 183
Lorraine E. Bahrick and Robert Lickliter
9 The development of audiovisual speech perception 207
Salvador Soto-Faraco, Marco Calabresi, Jordi Navarra,
Janet F. Werker, and David J. Lewkowicz
10 Infant synaesthesia: New insights into the development of
multisensory perception 229
Daphne Maurer, Laura C. Gibson, and Ferrinne Spector
11 Multisensory processes in old age 251
Paul J. Laurienti and Christina E. Hugenschmidt

Part B Atypical multisensory development


12 Developmental disorders and multisensory perception 273
Elisabeth L. Hill, Laura Crane, and Andrew J. Bremner
13 Sensory deprivation and the development of multisensory integration 301
Brigitte Röder

Part C Neural, computational, and evolutionary mechanisms in multisensory development
14 Development of multisensory integration in subcortical and cortical
brain networks 325
Mark T. Wallace, Dipanwita Ghose, Aaron R. Nidiffer, Matthew C. Fister,
and Juliane Krueger Fister
15 In search of the mechanisms of multisensory development 342
Denis Mareschal, Gert Westermann, and Nadja Althaus
16 The evolution of multisensory vocal communication in primates and the
influence of developmental timing 360
Asif A. Ghazanfar

Author index 373


Subject index 375
Contributors

Nadja Althaus
Department of Psychology, Oxford Brookes University, Headington Campus, Gipsy Lane, Oxford OX3 0BP, UK

Lorraine E. Bahrick
Department of Psychology, Florida International University, Miami, FL 33199, US

Andrew J. Bremner
Sensorimotor Development Research Unit, Department of Psychology, Goldsmiths, University of London, New Cross, London, SE14 6NW, UK

Marco Calabresi
Department of Information and Communication Technologies, Universitat Pompeu Fabra, Room 55.108, c/Roc Boronat 138, 08018 Barcelona, Spain

Dorothy Cowie
Sensorimotor Development Research Unit, Department of Psychology, Goldsmiths, University of London, New Cross, London, SE14 6NW, UK

Laura Crane
Department of Psychology, Goldsmiths, University of London, New Cross, London, SE14 6NW, UK

Karine Durand
Developmental Ethology and Cognitive Psychology Group, Centre for Taste and Smell Science, CNRS (UMR 6265), Université de Bourgogne, 15 rue Picardet, 21000 Dijon, France

Matthew C. Fister
Vanderbilt University Institute of Imaging Science, 1161 21st Avenue South, Medical Center North, AA-1105, Nashville, TN 37232-2310, US

Asif A. Ghazanfar
Neuroscience Institute, Departments of Psychology and Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08540, US

Dipanwita Ghose
Department of Psychology, Vanderbilt University, 465 21st Avenue South, MRB III, Suite 7110, Nashville, TN 37232–8548, US

Laura C. Gibson
Department of Psychology, Neuroscience and Behavior, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4L8, Canada

Elisabeth L. Hill
Sensorimotor Development Research Unit, Department of Psychology, Goldsmiths, University of London, New Cross, London, SE14 6NW, UK

Nicholas P. Holmes
Department of Psychology, School of Psychology and Clinical Language Sciences, University of Reading, Reading, RG6 6AL, UK

Christina E. Hugenschmidt
Center for Human Genomics, Wake Forest University School of Medicine, Winston-Salem, NC 27157, US

Juliane Krueger Fister
Neuroscience Graduate Program, Vanderbilt University, 465 21st Avenue South, MRB III, Suite 7110, Nashville, TN 37232–8548, US

Paul J. Laurienti
Department of Radiology, Wake Forest University School of Medicine, Winston-Salem, NC 27157, US

David J. Lewkowicz
Department of Psychology, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, US

Robert Lickliter
Department of Psychology, Florida International University, Miami, FL 33199, US

Denis Mareschal
Centre for Brain and Cognitive Development, School of Psychological Sciences, Birkbeck, University of London, Malet Street, London, WC1E 7HX, UK

Daphne Maurer
Department of Psychology, Neuroscience and Behavior, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4L8, Canada

Marko Nardini
Department of Visual Neuroscience, UCL Institute of Ophthalmology, 11–43 Bath Street, London EC1V 9EL, UK

Jordi Navarra
Fundació Sant Joan de Déu, Hospital Sant Joan de Déu, Edifici Docent, 4th floor, C/Santa Rosa, 39–57, 08950 Esplugues del Llobregat, Barcelona, Spain

Aaron R. Nidiffer
Department of Hearing and Speech Sciences, Vanderbilt University, 465 21st Avenue South, MRB III, Suite 7110, Nashville, TN 37232–8548, US

Brigitte Röder
Biological Psychology and Neuropsychology, University of Hamburg, Von-Melle-Park 11, D-20146 Hamburg, Germany

Benoist Schaal
Developmental Ethology and Cognitive Psychology Group, Centre for Taste and Smell Science, CNRS (UMR 6265), Université de Bourgogne, 15 rue Picardet, 21000 Dijon, France

Salvador Soto-Faraco
Department of Information and Communication Technologies, Universitat Pompeu Fabra, Room 55.108, c/Roc Boronat 138, 08018 Barcelona, Spain

Ferrinne Spector
Department of Psychology, Neuroscience and Behavior, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4L8, Canada

Charles Spence
Crossmodal Research Laboratory, Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford, OX1 3UD, UK

Arlette Streri
Paris Descartes University, Institut Universitaire de France, Laboratoire Psychologie de la Perception, CNRS UMR 8158, 45 rue des Saints Pères, 75006 Paris, France

Mark T. Wallace
Vanderbilt Brain Institute, Departments of Hearing and Speech Sciences, Psychology and Psychiatry, Vanderbilt University, 465 21st Avenue South, MRB III, Suite 7110, Nashville, TN 37232–8548, US

Janet F. Werker
Department of Psychology, University of British Columbia, 2136 West Mall, Vancouver, V6T 1Z4, Canada

Gert Westermann
Department of Psychology, Lancaster University, Fylde College, Lancaster, LA1 4YF, UK
Chapter 1

The multisensory approach to development

Andrew J. Bremner, David J. Lewkowicz, and Charles Spence

1.1 Introduction
The question of how humans and other species come to be able to deal with multiple sources of
sensory information concerning the world around them is of great interest currently, but has also been on the minds of philosophers and psychologists alike for centuries (e.g. Berkeley 1709;
Locke 1690). Nonetheless, it has only been in the last two decades or so that we have witnessed dramatic progress in our understanding of how the brain actually integrates the information
available to the different senses in the mature adult (e.g. Calvert et al. 2004; Dodd and Campbell
1987; Lewkowicz and Lickliter 1994; Naumer and Kaiser 2010; Spence and Driver 2004; Stein and
Meredith 1993). This progress has resulted, in part, from the proliferation of new methodologies
(functional imaging, transcranial magnetic stimulation, and so on). Such methods have stimulated a veritable explosion of knowledge about multisensory processes at various levels of neural
organization in humans and other species, in human behavioural performance, and in various
neurally-compromised clinical populations. We are now standing at a point where this growth in the knowledge-base about mature multisensory functioning, and in the availability of newly-developed experimental techniques, is beginning to be applied to developmental questions
concerning the emergence of multisensory processes at a neural level (e.g. see Chapter 14 by
Wallace et al. for work on the development of the superior colliculus (SC) in the animal model),
and at a behavioural level in human infants and children.
However, there is, of course, a great tradition of thinking concerning the development of
multisensory processes, which began well before the advent of modern cognitive neuroscience.
The central question posed by this tradition (and indeed by this volume and all of its contributors) concerns how it is that we come to be able to process the information conveyed by our
multiple senses such that we can perceive the world (and ourselves), thereby enabling us to
function sensibly within it. As has been acknowledged by philosophers and early psychologists
alike, despite the fact that adults are able to achieve this feat in a seemingly effortless manner, we
cannot assume that the same is true for others who have had different degrees, or qualities, of
experience as compared to typically developed adults. From Molyneux’s famous question to
Locke (as reported by Locke 1690) about whether a person who had been blind from birth could,
on the restoration of their vision, see and recognize an object (see Held et al. 2011, for the latest
evidence on this fascinating topic), to William James’s (1890, p. 488) assertion that the newborn
experiences a ‘blooming, buzzing confusion’, the question of how multisensory development comes
about has been put forward time and again as one of seminal importance to developmental
psychology.

Developmental scientists have by no means been idle when it comes to addressing topics
concerning multisensory development (see Lewkowicz and Lickliter 1994). While the literature
on multisensory integration in adult humans has only recently become a topic of substantive
theoretical interest within the fields of experimental psychology and cognitive neuroscience,
theory and research in the field of multisensory development has been a more constant theme in
developmental psychology over the last 50 years or so (Birch and Lefford 1963, 1967; Bryant 1974;
Bryant et al. 1972; Gibson 1969; Meltzoff and Borton 1979; Piaget 1952; for more recent approaches
see Chapter 8 by Bahrick and Lickliter and Chapter 7 by Lewkowicz, as well as Lickliter and
Bahrick 2000 and Lewkowicz 2000).
However, despite the considerable attention that has been devoted to the development of multisensory integration, we would argue that it still remains essentially a niche consideration among
developmental psychologists (see Fig. 1.1). Indeed, many of the most influential current theoretical accounts of development, and indeed empirical investigations, particularly of cognitive
abilities in infants and children, are still addressed within a theoretical framework that is largely
unimodal (e.g. see Baillargeon 2004; Johnson and de Haan 2010; Mareschal et al. 2007; Spelke
2007). This is not to claim that developmental psychology as a field is devoting too much attention to the study of individual sensory modalities (although it is quite clear that the focus of
research has clearly been on visual and auditory processing at the expense of other channels;
indeed, researchers have argued that some of the other senses are profoundly neglected both in
study and stimulation; e.g. Field 2003, 2010; Field et al. 2006; see Fig. 1.1), but rather that
little attention has been paid to the role of the multisensory nature of sensory experience in
mainstream accounts of development.
The provision of multiple sensory modalities bestows both important adaptive benefits and challenges for ontogenetic development. If we are to fully understand how development progresses
and the constraints placed upon it by the biological reality of our bodies and the neural apparatus
contained therein, the multisensory nature of our perceptual world across the course of development will have to be characterized more fully. First, it is our contention that researchers too frequently treat the sensory experience of their participants as being all of a piece, with little regard
to both the subtle and more obvious differences in the kinds of information provided by the different sensory channels. Different sensory channels complement each other by providing different kinds of information (see Section 1.2 for a discussion of this matter). Understanding these
differences, and how individuals at various stages in the lifespan are able to benefit from these
various forms of information, is central to understanding the development of information
processing in humans.
However, a more central consideration of the current volume concerns how it is that we come to be able to use our senses in concert. There are both adaptive benefits and challenges to achieving this. By way of an illustration of the adaptive benefits, suppose that in a given instance, the
provision of several sensory inputs at one and the same time enhances an infant’s ability to make
a perceptual discrimination (e.g. imagine the situation in which a face, the sound of a voice,
the way someone touches us, and even the person’s distinctive smell, provide cues both to their
identity and gender; see Smith et al. 2007). If we consider an infant’s ability to recognize a person
from the point of view of just a single sensory input (as is frequently done in face recognition
research), then we may well be underplaying their social cognitive abilities at any given developmental stage (on this theme, see also Gallace and Spence 2010).
But the problem is not as straightforward as it might at first seem; as researchers studying
multisensory development have been at pains to point out, the senses do not interact in a homogeneous way across the course of development (e.g. see Chapter 9 by Lewkowicz and Chapter 14
by Wallace et al.). So, if a given perceptual skill, which in adults relies on more than one sensory

[Fig. 1.1: line graph. Vertical axis: number of articles retrieved on MEDLINE (0–1000). Horizontal axis: 5-year time-windows from 1957–1961 to 2007–2011. Lines plotted: ‘Multisensory AND Development’, ‘Visual AND Development’, ‘Auditory AND Development’, ‘Tactile AND Development’, ‘Olfactory AND Development’, and ‘Executive function AND Development’.]
Fig. 1.1 The number of articles retrieved by MEDLINE for searches restricted to 5-year time-windows
from 1957–2011. In order to retrieve as many publications related to multisensory development as
possible, we searched for articles with the words ‘multisensory’, ‘multimodal’, ‘crossmodal’,
‘intermodal’, ‘intersensory’ in conjunction with the word ‘development’ (this is represented by the
line labelled ‘Multisensory AND Development’). Articles addressing multisensory development have increased markedly since 2001, with more than double the number published in any 5-year period before that year. Nonetheless, this number is still dwarfed by research on auditory and visual
perceptual development (despite the fact that we conducted more conservative searches for such
articles by searching for ‘Visual perception’ or ‘Auditory perception’ in conjunction with
‘Development’). Research on the development of visual and auditory perception underwent
something of a resurgence of interest in the early 1970s, likely due to the popularity of Piaget’s
sensorimotor theory of development and the advent of new methods for investigating infants’
and children’s abilities in vision and audition (Fantz 1961, 1964). Research addressing olfactory
and tactile development is particularly under-represented (when searching for studies on the development of ‘Tactile perception’ AND ‘Development’ we also included the search terms ‘Haptic perception’ and ‘Tactual perception’). We provide the results of a search for ‘Executive
function’ AND ‘Development’ as a point of reference with a topic of current interest. Research
using the words ‘multisensory’, ‘multimodal’, ‘crossmodal’, ‘intermodal’, and/or ‘intersensory’
has increased exponentially over the last 50 years (5610 entries are retrieved between 2007
and 2011).
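The searches described in this caption amount to boolean queries run over successive 5-year windows. The following is a minimal sketch of how such query strings might be assembled; the field tags and date-range syntax follow common PubMed conventions and are an assumption on our part, not taken from the authors' actual method.

```python
# Sketch (not from the book): approximating the Fig. 1.1 MEDLINE searches as
# PubMed-style boolean query strings, one per 5-year window. The [PDAT]
# date-range syntax is a common PubMed convention, assumed here.

MULTI_TERMS = ["multisensory", "multimodal", "crossmodal", "intermodal", "intersensory"]

def window_query(start_year, terms=MULTI_TERMS, topic="development"):
    """Boolean query string for one 5-year window (e.g. 1957-1961)."""
    end_year = start_year + 4
    term_clause = " OR ".join(terms)
    return (f"({term_clause}) AND {topic} "
            f'AND ("{start_year}"[PDAT] : "{end_year}"[PDAT])')

# One query per 5-year window from 1957-1961 to 2007-2011, as in Fig. 1.1.
queries = [window_query(y) for y in range(1957, 2012, 5)]
```

A query list built this way could then be submitted window by window to a literature database and the hit counts plotted, which is essentially the procedure the caption describes.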

modality, requires a form of multisensory integration which the child has not yet attained, then a
unisensory experiment may well provide a misleading picture of the child’s competence.
In a recent discussion of cognitive development, Neuroconstructivism (Mareschal et al. 2007), a
persuasive argument was put forward for the necessity of considering the constraints that are
placed on development at all levels of functioning of the organism. Such levels of functioning

include the organism’s genetic inheritance, its brain and cellular machinery, its body and physical
interactions with the environment, and, of course, the environment (which can itself influence
the organism at these various different levels of functioning). Whilst many approaches to development have assumed that it is possible to understand ontogeny via changes in the interactions
occurring at one or two levels of functioning (e.g. changes in the organism’s behaviour in relation
to its environment), Mareschal et al. argued that developmental change can only be fully understood if all the factors that influence change are taken into account; they particularly emphasize
the role of neural and more large-scale brain architecture and also the biomechanics of the body
in its environment.
We consider this kind of approach to be the ‘state-of-the-art’ in terms of developmental theory.
However, we also believe that Neuroconstructivism has a glaring omission; it does not consider in
detail the constraints placed on development of the multisensory apparatus we possess
(A.J. Bremner and Spence 2008; although see Chapter 15 by Mareschal et al. and Sirois et al.
2008). As will be made clear in the next section, the multiple sensory inputs that humans (and
other species) are bestowed with provide a rich variety of different kinds of sensory information
and these interact in complex and adaptive ways (see Fig. 1.2 for a classification of multisensory interactions). If there is one overarching lesson from the recent explosion of research into multisensory processes in neurologically-normal mature adults, it is that multisensory interactions are
the rule rather than the exception. However, these interactions are constrained by the nature of
our multisensory apparatus and the way in which that apparatus is embedded in our bodies and
nervous systems. Importantly, they are also constrained by unfolding changes in the ways in
which our multisensory abilities develop (Lewkowicz 2011a; Occelli et al. 2011; Turkewitz 1994;
Turkewitz and Kenny 1982). We would thus like to argue here that a full picture of perceptual,
cognitive, and social development will only emerge once diverse ecological multisensory
constraints are considered and studied.

[Fig. 1.2: schematic contrasting the percepts evoked by separate signal components (a, b) with those evoked by a multisensory composite signal (a+b), grouped by category. Redundant composites yield Equivalence (intensity unchanged) or Enhancement (intensity increased); non-redundant composites yield Independence, Dominance, Modulation, or Emergence; a single signal (a) evoking two percepts corresponds to Synaesthesia.]

Fig. 1.2 A classification of multisensory interactions (from Science, 283, Sarah Partan and Peter Marler,
Communication Goes Multimodal, pp. 1272–1273 © February 1999, The American Association for
the Advancement of Science. Reprinted with permission from AAAS). The distinction between
redundancy and non-redundancy is often compared to the distinction drawn between amodal and
arbitrary correspondences in the developmental literature (see, e.g. Walker-Andrews 1994). Note,
however, that there is some variance in the literature with regard to what exactly constitutes an
amodal relation (see Spence 2011a for discussion). As there has been significant interest in
multisensory synaesthetic percepts since the publication of Partan and Marler’s (1999) paper (e.g. see
Chapter 10 by Maurer et al.), we have included an additional section on their figure in parentheses.

1.2 How do multisensory processes constrain and enrich development?
Despite the fact that there is still quite some uncertainty over how many senses we possess, and
indeed how we should define sensory modalities in the first place (e.g. Durie 2005; Macpherson
2011), we can safely state that we perceive our environment through multiple sensory systems.
Vision, touch, hearing, taste, smell, and proprioception, whilst being the most frequently considered, are just some of the modalities available to humans. Humans have likely evolved the organs necessary
for these multiple channels of information processing for a wide variety of adaptive functions.
If we do not attempt to understand such multisensory benefits in developing individuals we will
likely misrepresent the development of the skills we are interested in (which will typically be multisensory in our everyday ecological settings). But it is critical to bear in mind that the provision
of these multiple channels also poses significant problems for the developing individual. With all
of this information arriving in different representational formats and neural codes, how does the
child or young animal make sense of the environment from which the information originates? In
this section, as an introduction to the matters considered in this volume, we outline both the
adaptive significance and the challenges posed by the provision of multiple different sensory
modalities to developing humans.

1.2.1 Developmental benefits of multiple senses


The provision of multiple sensory modalities gives us complementary sources of information
about the environment (E.J. Gibson 1969; J.J. Gibson 1966; 1979). The sensory modalities
have likely evolved to make use of the constraints of the physical environment around us.
So, for example, vision is particularly good at transducing spatial information about near and far
space from the visible light spectrum. This helps us to recognize objects, people, and spatial
arrays. In contrast, audition is particularly good at encoding rapid temporal patterns of information arising from mechanical disturbances in both near and far space (Bregman 1990), including
from places which are not in a direct line of sight to the body (i.e. for sound sources located
behind the head, or obscured by an occluder), and in darkness. Information from the somatosensory channels (including information about touch from cutaneous receptors and about the
arrangement and movement of our limbs from proprioceptors) helps us to perceive our body and
the environment that is in direct contact with our body. The chemoreceptors (including gustatory and olfactory receptors) provide us with information about chemicals impinging on the
membranes in which they are located, telling us about everything from nutrients and poisons to
social signals in the environment. Such chemicals may arise from objects that happen to be in
contact with our various sensory epithelia, while others may have arisen from more distant
objects (in the case of orthonasal olfaction).
These channels of information are not (always) redundant. ‘Direct’ receptors like taste and
touch do not tell us about the distal environment except in some special situations (e.g. when the
sun’s rays heat one’s skin). Likewise, we cannot hear or see most of the chemicals that make their
way onto our olfactory epithelia. It seems safe to conclude then that we are able to glean more
information about the world by sampling from multiple modalities than we could from just a
single sensory modality. That said, the functional onset and development of the various senses
follow quite different ontogenetic trajectories, with the earliest to begin functioning being touch,
followed by the chemoreceptors, the vestibular sense, and then audition and finally vision (see
Fig. 1.3 and Table 1.1 for a detailed timetable of the developmental emergence of the senses; also
Gottlieb 1971). Not only do the senses develop at different rates, but there are remarkable idiosyncrasies in the ways in which unisensory abilities develop within them. Indeed, as will be
explained later, some researchers have suggested that the heterochronous emergence of multisensory

[Fig. 1.3: timeline from 0 to 38 weeks’ gestation, with birth marked at the right. Bars show the emergence of the anatomy of the trigeminal (touch/chemoreception), cutaneous touch, olfactory and taste, vestibular, auditory, and visual systems, followed by bars showing the onset of function in the same systems.]
Fig. 1.3 The emergence of the anatomy and function of the multiple sensory systems in humans during
gestation (Reproduced from Moore, K.L., and Persaud, T.V.N., The developing human: Clinically oriented
embryology, 8th Ed. Saunders Elsevier: Philadelphia, PA © Elsevier 2008 with permission). Pink bars indicate
the emergence and maturation of the senses (usually provided by histological evidence). Shading of the
pink bars indicates the time between the first observation of sensory apparatus and its full anatomical
maturation (as gleaned from the incomplete evidence summarized in Table 1.1 and Gottlieb 1971). Green
bars indicate the onset of function of the senses. Shading of the green bars indicates the uncertainty in the
literature concerning the first age at which function is observable (as gleaned from the incomplete evidence
summarized in Table 1.1 and Gottlieb 1971). Arrows pointing beyond birth indicate the continued postnatal
development of anatomy and function. Visual receptors continue to mature beyond birth, possibly due to
the relatively low level of stimulation they have received in utero. Both Gottlieb (1971) and Lickliter (Lickliter
and Bahrick 2000; Lickliter and Banker 1994) have noted that the senses emerge in a strikingly similar order
across vertebrate species (but vary with respect to the onset of functionality). It has been suggested that this
ordering and timing of functionality plays an important epigenetic role in the development of information
processing (see also Turkewitz and Kenny 1982). Reproduced in colour in the colour plate section.

Table 1.1 The emergence of anatomy and function of our multiple sensory systems during
gestation (adapted from Gottlieb 1971). Here, we present some of the key pieces of evidence
for prenatal development of the senses. Anatomical development is determined typically from
histological evidence, whereas functional development comes from behavioural and physiological
evidence. Functional development occasionally occurs in advance of anatomical maturation; this is
likely due to partial functionality before full maturation has occurred. Note that the trigeminal nasal
system contributes to both touch and chemosensation as chemicals give rise to tactile stimulation
registered by trigeminal receptors embedded in the olfactory epithelium. The table contains a
number of omissions. Notably, proprioception is not mentioned, since it is difficult to determine
functionality in utero (although there are indications of well-developed proprioceptive functioning
at birth; e.g. van der Meer 1997). Similarly, the various subdivisions of the chemosensory receptors
and cutaneous receptors are not discussed either for the same reason.

Touch
Anatomical development: 4–7 weeks’ gestation: maturation of trigeminal and cutaneous receptors (Humphrey 1964).
Functional development: 7 weeks’ gestation: foetus moves if the lips are touched (Hooker 1952; Humphrey 1964). 12 weeks’ gestation: grasp and rooting reflex responses (Moon and Fifer 2011; Humphrey 1964).

Chemosensation
Anatomical development: 4 weeks’ gestation: trigeminal neurons mature (Chapter 2 by Schaal and Durand). 11 weeks’ gestation: olfactory receptor neurons mature (Chapter 2 by Schaal and Durand). 12–13 weeks’ gestation: taste buds mature (Beidler 1961).
Functional development: 7 weeks’ gestation: foetus responds to touch on trigeminal receptors. It is difficult to determine the functionality of chemoreceptors in utero, but it is likely that they are functional by at least late gestation (LeCanuet and Schaal 1996; Schaal et al. 2004; Chapter 2 by Schaal and Durand).

Vestibular system
Anatomical development: 10–14 weeks’ gestation: sensory cells are present in the semicircular canals (Moon and Fifer 2011). 12–16 weeks’ gestation: lateral vestibular nucleus present (Gottlieb 1971).
Functional development: 11–25 weeks’ gestation: foetus will show a ‘righting’ reflex (see Gottlieb 1971). Birth: newborns show ocular nystagmus in response to rotation (Galebsky 1927).

Auditory system
Anatomical development: 9 weeks’ gestation: the cochlea forms. 24 weeks’ gestation: inner ear has reached its adult size and shape; organ of Corti is fully mature (Bredberg 1968; Moore and Persaud 2008).
Functional development: 21 weeks’ gestation: auditory cortical evoked responses recorded in an infant born prematurely (Weitzman and Graziani 1968). 24 weeks’ gestation: foetus begins to respond to sounds (Abrams et al. 1995; Bernard and Sontag 1947).

Visual system
Anatomical development: 5 weeks’ gestation: eyes begin to form. Primordial neural retina first appears at about 7 weeks, but the retina is not completely mature until around 4 months after birth (Mann 1964; Moore and Persaud 2008). Myelination of retinal ganglion cells is complete at about 3 months (Moore and Persaud 2008).
Functional development: 22–28 weeks’ gestation: earliest age at which premature infants have shown evoked cortical responses to light flashes. Behavioural responses such as tracking have also been measured around this age (Ellingson 1960; Engel 1964; Moon and Fifer 2011; Taylor et al. 1987).
(Continued)
8 THE MULTISENSORY APPROACH TO DEVELOPMENT

Table 1.1 (continued) The emergence of anatomy and function of our multiple sensory systems during gestation

Visual system (continued)
  Anatomical development: 28 weeks’ gestation: laminar structure appears in striate cortex, but adult-like appearance and connections continue to develop postnatally (see Burkhalter et al. 1993).
  Functional development: Postnatal 3–4 months: Braddick and Atkinson (2011) argue that early visual evoked potentials (VEPs) indicate that activity reaches the cortex, but not that cortical neurons are responding to it. VEPs sensitive to binocular contrasts are their preferred marker, and these appear postnatally at 3–4 months. Visual acuity is thought to reach adult levels soon afterwards (Boothe et al. 1985).

functioning plays an important role in development (Chapter 16 by Ghazanfar; Chapter 2 by Schaal and Durand; Lewkowicz 2002; Lewkowicz and Ghazanfar 2009; Lickliter and Bahrick 2000; Turkewitz and Kenny 1982). The more we understand about the trajectory of development of all of the senses, the more complete a picture we will gain of the development of our ability to make sense of the whole array of environmental information to which we are exposed.
However, it would be misleading to suggest a model in which the information gained is related in a simple linear manner to the number of sensory channels available. The senses have an impact
well beyond their own immediate provision of information. By way of an example consider the
sense of touch. Touch is most typically stimulated by objects in direct contact with the body surface.
This is not a unique provision as we can also, in many cases, see when something is touching us.
Importantly however, this redundancy may have important implications beyond the sense of touch
as it can disambiguate which cues from our distal spatial senses (e.g. vision and audition) specify
aspects of our environment that are within reach. Touch can, as such, ‘tutor’ vision to perceive
depth, or at least the environment that affords immediate action (Berkeley 1709; Gori et al. 2010a;
Gregory 1967; Held et al. 2011). Aside from this classic example, there are numerous ways in which
the senses can inform one another across development in this fashion. It is, for instance, well-known
that when humans have bad (and good) experiences with taste and flavour, this can help us to
learn which visually perceived foodstuffs are safe to eat (e.g. Bernstein 1978).
Perhaps a more precise way of expressing this particular advantage of multiple redundant sen-
sory inputs is as a facility to use one sensory channel in order to improve the performance of
another in situations when the latter is used on its own (this is known as ‘crossmodal calibra-
tion’). For some time now, researchers have suggested that the senses are not equal in their ability
to provide accurate information about the environment (e.g. Freides 1974). In particular,
Welch and Warren (1980, 1986) have argued that the superiority of vision in terms of providing
information about the spatial environment can account for the numerous demonstrations that
vision tends to dominate over other modalities in guiding our judgments and responses with
respect to both objects and spatial layouts (e.g. Gibson 1943; Lee 1980; Botvinick and Cohen
1998; Spence 2011b; although see Spence et al. 2001). The fact that the senses are not equal in this
regard can also provide a means of improving performance within the less accurate modality; that
is, the more accurate sense can be used to calibrate, and thereby improve the sensitivity of, the less
accurate sense. This kind of process of calibration has been used to explain developmental
improvements in proprioception in the absence of vision (Chapter 6 by Nardini and Cowie;
Chapter 12 by Hill et al.; Lee and Aronson 1974). It can also explain the poorer balance and
haptic judgment of orientation that has been observed in blind participants (Edwards 1946;
Gori et al. 2010b)1.
Redundant multisensory information can also improve the accuracy with which we can make per-
ceptual judgements when more than one sense is available concurrently. As has been extensively dis-
cussed in the recent literature on multisensory integration, mature adults, rather than being dominated
by information from the most reliable modality (Welch and Warren 1980), actually appear to integrate
the information from several available modalities according to their relative reliabilities in the context
of the task at hand (see Ernst and Bülthoff 2004; Trommershäuser et al. 2011). Typically, the senses
appear to be weighted optimally in accordance with the maximum likelihood estimation model (Alais
and Burr 2004; Ernst and Banks 2002; although see Brayanov and Smith 2010 for a recent exception).
Interestingly, as discussed particularly in Chapter 6 by Nardini and Cowie (see also recent work by
Gori et al. 2008, 2010b; Nardini et al. 2008; Nava et al. in press), this is not the case at all stages of per-
ceptual development, with young children apparently weighting the senses in a sub-optimal fashion.
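The optimal weighting scheme referred to above has a simple standard form (as formulated by Ernst and Banks 2002). In the bimodal case, with two cues providing estimates $\hat{s}_1$ and $\hat{s}_2$ with variances $\sigma_1^2$ and $\sigma_2^2$, the maximum-likelihood combined estimate weights each cue by its relative reliability:

\[
\hat{s} = w_1 \hat{s}_1 + w_2 \hat{s}_2, \qquad w_i = \frac{1/\sigma_i^2}{1/\sigma_1^2 + 1/\sigma_2^2},
\]

with combined variance

\[
\sigma_{12}^2 = \frac{\sigma_1^2 \, \sigma_2^2}{\sigma_1^2 + \sigma_2^2} \leq \min(\sigma_1^2, \sigma_2^2),
\]

so the bimodal estimate is never less precise than the better of the two unimodal estimates. Developmental ‘sub-optimality’ can therefore be diagnosed empirically by comparing children’s observed bimodal variance against this prediction (the logic of the studies by Gori et al. 2008 and Nardini et al. 2008).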
The integration of sensory stimuli across multiple senses has also been shown to speed responses
to them (e.g. Hershenson 1963; Miller 1991; Roland et al. 2007; see also Spence et al. 2004). This
multisensory speeding has also been shown to develop at various stages across early development,
and to continue changing into old age (Chapter 11 by Laurienti and Hugenschmidt; Chapter 9 by
Lewkowicz; Chapter 6 by Nardini and Cowie; Barutchu et al. 2009, 2011; Neil et al. 2006). Clearly,
the development of this kind of multisensory integration, which can lead to more accurate and
faster responses to the stimuli/events in the environment, will have important implications for the
development of ecological perception and cognition.
The senses are also combined in particular ways by virtue of the action system in which they are
situated. Going back to the example of touch, two forms of tactile sensation that are frequently
discussed in the literature are passive and active touch (with active touch sometimes being referred
to as ‘haptics’; see Chapter 4 by Streri for a discussion of haptics). Whilst passive touch can, in
some sense, be considered as a purer form of sensory transduction in that it relies solely on pres-
entation of stimuli to cutaneous receptors, haptic exploration of an object involves the combina-
tion of cutaneous sensory inputs with other sensory inputs; specifically, proprioceptive information
concerning the arrangement of our limbs (as the hand is the primary haptic organ, our finger
joints are particularly important here), and efferent motor signals concerning our movement.
This multisensory combination enhances our ability to identify objects through active tactile
exploration (see Klatzky and Lederman 1993; Klatzky et al. 1987).
Up until now, we have discussed how multisensory processes can have an impact at any given
stage in development, and how differences in multisensory processes between these stages have an
important bearing on ecological perception across the lifespan. However, it is particularly impor-
tant to consider how multiple senses can play a role in driving or constraining development from
one stage to the next. One particularly influential recent account of multisensory development
has been offered by the ‘intersensory redundancy hypothesis’ (IRH) proposed by Bahrick and
Lickliter (2000, see also Chapter 8 by Bahrick and Lickliter). According to these researchers,
infants selectively attend to redundant information across the senses and this then enables them

1 Actually, various accounts of developmental crossmodal calibration have been put forward to date. Whilst
some researchers have argued that the principal role of crossmodal calibration is to improve accuracy
(sensitivity) in the calibrated sense (e.g. Lee 1980), Burr and colleagues (Burr et al. in press) have, more
recently, presented a somewhat different perspective. They suggest that crossmodal calibration serves the
purpose of tuning sensory inputs to environmental information to which they do not have ‘direct’ access
(as in the case of actual size perceived via the visual modality, which requires more computational
transformations than does actual size perceived through haptics; see Gori et al. 2008; 2010b).
to learn about the cues in the environment that specify unified objects and events. This account
shows how an interaction between the multisensory infant and his/her environment can give rise
to a simple orienting preference, which can elegantly explain a number of the perceptual develop-
ments typically observed in early life (Chapter 8 by Bahrick and Lickliter; see also Lewkowicz
2011b).
Another argument that multiple senses ‘scaffold’ development comes from Piaget, who in this
case (and more widely) was more concerned with the development of knowledge about the envi-
ronment. In fact, the interaction of multiple sensory modalities was a key aspect of Piaget’s constructivism. He argued that the initial steps toward objectivity in infants’ representations of their
world were achieved, in part, through the integration of separate, modality-specific schemas
(what he termed the ‘organisation of reciprocal assimilation’, Piaget 1952). Piaget describes a
number of observations in which his young children (at around 4 months of age) first noticed
that an object could be apprehended in more than one modality (or sensory schema) at once.
Piaget goes on to describe this as a new kind of schema in which infants can, for instance, grasp
what they see, and see what they grasp: a reciprocal relationship. Because of this organisation, an
object is no longer a thing of looking or a thing of grasping, but an entity existing in a more objec-
tive representation. There are, of course, problems with the view that infants require visually
guided reaching in order to develop with typical intelligence, as there are many instances of blind
children or children without limbs who manage to do just this (e.g. Gouin Décarie 1969).2
Nevertheless, regardless of the specific importance of visually-guided reaching in development,
Piaget’s conception of a reciprocal schema shows us how two initially modality-specific represen-
tations of environmental stimulation can be enriched almost fortuitously by virtue of the spatial
coincidence of the modalities that occurs when an infant, for example, picks up an object.
This gives us another example of an enhancement of development by the provision of multiple
senses.

1.2.2 Developmental challenges of multiple senses


The ease with which we typically accomplish multisensory integration belies the complexity of the
computational processes involved. The senses convey information about the environment using
different neural codes. Spatial codes vary substantially from one sensory channel to the next. For
instance, vision is initially coded in terms of a retinocentric spatial frame of reference, auditory
stimuli in terms of head-centred coordinates, touch in terms of the location of stimulation on the
skin-surface, and proprioception in terms of limb-centred coordinates (Spence and Driver 2004).
This clearly poses a problem for simultaneously locating a stimulus or object within more than
one sense. Do we relate everything to a single sensory coordinate system (e.g. a visual frame of
reference), or do we re-code it all into a multisensory, receptor-independent, coordinate system
(see Avillac et al. 2005; Ma and Pouget 2008)? How do we translate information between sensory
frames of reference? For some consideration of how infants and young children come to be able
to solve these kinds of problems see Chapter 5 by Bremner et al. and Chapter 13 by Röder.
Temporal codes also vary between the senses, causing computational problems in terms of the
perception of synchrony (King 2005; Mauk and Buonomano 2004; Meck 1991; Petrini et al.
2009a, 2009b; Spence and Squire 2003).

2 In fact, there is reason to believe that Piaget considered visually-guided reaching to be merely one example
of how the organization of reciprocal assimilation could occur, and that he viewed evidence concerning
infants without reaching or vision to be irrelevant to this matter (Gouin Décarie and Ricard 1996).
Despite the multiplicity of the spatial and temporal codes in which information is provided by
the senses, in some ways, once one has settled on the frame of reference which one is going to use,
this problem constitutes a relatively easy hurdle to surmount. If the infant (or foetus) can learn
the relationships between the various sensory spatial frames of reference and temporal codes
(perhaps through the common occurrence of stimuli arising from specific objects or spatial lay-
outs), then the problem is solved (although of course this ignores a sticky issue: when there are
multiple stimuli in each modality, how does one know which stimulus goes with which? This is
known as ‘the crossmodal correspondence problem’ (Spence et al. 2010a; also see Lewkowicz
2011b, for a developmental solution)). However, at least for spatial binding, the problem is much
more difficult than this because the spatial relationship between the sensory frames of reference
frequently changes when, for example, the body changes posture (e.g. when the eyes move in their
sockets, or when the hands move relative to the body; see Pöppel 1973; Spence and Driver 2004).
This difficulty is compounded across development, as, especially in the early years but continuing
right across the lifespan, the relative shapes and sizes of the body, limbs, and head change, as do
the number and variety of postural changes that an individual can readily make.3 These compu-
tational problems prompt an important question which represents the key focus of the present
volume: how is it that we develop the ability to integrate the senses? As many of the chapters in
this volume propose answers to these questions we will not pre-empt them here, apart from
noting that researchers have provided a wide variety of solutions to these problems (see Chapter 8
by Bahrick and Lickliter, Chapter 16 by Ghazanfar, Chapter 7 by Lewkowicz, Chapter 10 by
Maurer et al., and Chapter 14 by Wallace et al.).
Interestingly, some theorists have made a virtue of certain of the challenges posed by multisen-
sory integration to ontogenetic development. For instance, Turkewitz and his colleagues
(Turkewitz 1994; Turkewitz and Kenny 1982; Kenny and Turkewitz 1986) have both argued, and presented evidence to support the claim, that the particular heterogeneous way in which the functions of the different sensory systems emerge prenatally facilitates perceptual development. The suggestion is that the heterogeneous emergence of multisensory function reduces the amount of sensory information that the developing foetus has to assimilate at early stages in development, and reduces competition between the separate sensory systems, thereby facilitating their functional development.

1.3 Where are we now?


The question of how multisensory development occurs has been approached in several ways, but
perhaps the clearest delineation has been between theorists who suggest that the senses become
integrated across development and those who suggest that the senses become differentiated across
development. The integration account is perhaps best exemplified by Piaget who, as we have just
explained, considered the senses and their associated action schemas to be separate at birth. The
gradual integration of sensory schemas during the first two years of life was a hallmark of Piaget’s
sensorimotor period of development. On the other hand, following William James’s (1890) assertion that newborn infants perceive a ‘blooming, buzzing confusion’ in which the senses are united and undifferentiated, Eleanor Gibson (1969) proposed that the senses are initially fused, resulting in unified perceptual experiences. In contrast with James’s position, however, Gibson’s argument did not assume that young infants are confused by this lack of differentiation, but instead suggested that such a unity of the senses allows young infants to more easily pick up information

3 Developments in the ability to move the body in certain ways are also likely to have an important impact
on senses which are active by nature such as haptic tactile exploration (see Section 1.2.1).
from the structure presumably available in the environment. In direct contrast with the Piagetian
position, Gibson argued that as development progresses and as perceptual learning and differen-
tiation permit the gradual pick-up of increasingly finer stimulus structure, infants become capa-
ble of differentiating the senses one from another.
One traditional way to assess the relative merits of these early and opposing theoretical views
was to test whether newborn infants can perceive links between stimulation in different sensory
modalities. At the last major overview of multisensory development in the mid-1990s (Lewkowicz
and Lickliter 1994), an opening chapter by Linda Smith stated that this question had been com-
prehensively answered in the affirmative; infants perceive in a multisensory way from the outset.
Such a claim provided strong support for a differentiation account. However, one of the principal concerns of Lewkowicz and Lickliter’s volume was to delineate the kinds of multisensory skills
that infants and children have to master. One particular distinction was drawn between amodal
and arbitrary crossmodal correspondences (see also Parise and Spence in press; Spence 2011a).
Amodal correspondences were described as those in which information about the world is sup-
plied redundantly across the senses (e.g. intensity, duration, tempo, rhythm, shape, texture, and
spatial location), whereas arbitrary correspondences are those that are naturally linked in the
environment but which provide different information according to the modality in which they
are delivered (such as that between the colour of a person’s hair and the sound of their voice).
Research has indicated that infants are rather good at detecting amodal information from early on
in life (e.g. Chapter 7 by Lewkowicz; Lewkowicz and Turkewitz 1980; Lewkowicz et al. 2010) but
develop more slowly in their ability to learn about naturally-occurring arbitrary crossmodal
correspondences (Bahrick 1994; Hernandez-Reif and Bahrick 2001).
More recently, the IRH has provided a useful framework with which to explain this develop-
mental precedence of amodal correspondences and at the same time explain how infants learn
about the multisensory environment (see our earlier discussion of this hypothesis). The IRH pro-
poses that infants’ attention is captured early on by amodal correspondences, requiring little or no
integration, and that infants’ attention to these amodal properties of the environment bootstraps
their subsequent ability to perceive arbitrary crossmodal correspondences. In that sense, then, the
IRH represents a reconciliation between Piagetian integration and Gibsonian differentiation.
Since the publication of Lewkowicz and Lickliter’s (1994) seminal volume, there have been
numerous advances in terms of the empirical and theoretical contributions to our understanding
of multisensory development (see Fig. 1.1). At the same time, however, we have also witnessed
a huge increase in research into multisensory processes in mature adults across a range of
disciplines (Stein et al. 2010). This research has pointed to the pervasiveness of multisensory
interactions both at the behavioural (e.g. Spence and Driver 2004) and at the neural levels (e.g.
Calvert et al. 2004; Driver and Noesselt 2008; Ghazanfar and Schroeder 2006; Murray and Wallace
2011; Stein and Meredith 1993; Stein and Stanford 2008). This new knowledge about adult mul-
tisensory functioning from the domain of cognitive neuroscience is now beginning to have an
impact on the field of multisensory development and we hope that the current volume reflects
this fact. Below, we highlight some of the issues that we think are of key current interest.

1.4 Important themes in this book


A number of volumes have been published on the subject of multisensory perception. Typically
these have dealt with multisensory processes from a broad and multimethodological perspective,
but are largely dominated by research into mature multisensory processes in adults (e.g. Calvert
et al. 2004; Spence and Driver 2004). The themes emphasised in the research on mature adults at
the time focused on the problem of spatial and temporal integration across multiple sensory
modalities. These themes echo the priorities of a particular model of multisensory integration put
forward by Stein and Meredith (1993), which dealt specifically with the neurophysiological
aspects of multisensory integration within the SC.
More recently, we have witnessed a shift, at least in the adult behavioural literature, towards a
consideration of other factors that modulate multisensory integration such as, for example,
semantic congruency (Doehrmann and Naumer 2008; Hein et al. 2007; Chen and Spence 2010,
2011; Laurienti et al. 2004; Naumer and Kaiser 2010) and synaesthetic congruency (otherwise
known as crossmodal correspondence; see Spence 2011a; Parise and Spence in press). This move
has been motivated, in part, by the observation that spatial coincidence appears to play a far less important role in multisensory integration when it comes to stimulus identification than it does in stimulus detection/localization (see Spence in press). There
is also a move towards a consideration of the development of multisensory processes, as it is
increasingly acknowledged that developmental findings are important in validating models of
multisensory functioning in mature adults (e.g. Lewkowicz and Ghazanfar 2009; Spence and
A.J. Bremner 2011). Indeed, some of the new topics of research in the adult literature are particu-
larly well-suited to developmental investigation (especially the role of semantic knowledge and
congruency in multisensory integration).
As should be clear from the title, the focus of the current volume is an examination of how
multisensory perception develops. The closest book in terms of this emphasis is the seminal
edited volume by Lewkowicz and Lickliter (1994). In choosing the chapter coverage offered in the
current volume we have attempted to strike a balance between updating the progress with regard
to the questions that book addressed in 1994, and coverage of new and emerging topics in
multisensory development. Below, we describe some of the particular emphases of the current
volume.

1.4.1 Chemosensory and gustatory information processing: odours, tastes, textures, and flavours
The chapters in this volume by Schaal and Durand, and Spence address, respectively, the litera-
tures on the development of olfaction and gustation and multisensory flavour perception more
generally (i.e. including vision). Chapter 2 by Schaal and Durand reports on the emergence of
multisensory interactions with olfaction, from the womb through infancy to childhood.
Chapter 3 by Spence focuses on crossmodal interactions between vision and the traditional
flavour senses.
Chemosensory and gustatory information processing is a relatively new area for developmental
research. In fact, the development of the chemical senses was not an issue that was covered in
separate chapters in Lewkowicz and Lickliter’s (1994) earlier volume on multisensory develop-
ment. Indeed, this comes as little surprise given that the chemical senses have typically not received
much interest from developmental psychologists. That said, the reasons for this neglect are
unclear. Is it perhaps due to an implicit assumption that these are ‘lower’ senses, which are in
some way less worthy of study? Likely more relevant here is the difficulty associated with stimulat-
ing the taste buds or olfactory epithelium in a controlled manner (in adults let alone in infants).
Notwithstanding the above, there is a relatively large body of research tracking changes in chemi-
cal sensitivity from birth (and before, see Chapter 2 by Schaal and Durand). However, much of
this research has appeared in food science and nutrition journals (e.g. Journal of Sensory Studies, Chemical Senses, Food Quality and Preference, and Appetite), which are not
typically read by developmental psychologists. Furthermore, much of the research has tended to
investigate development within a unisensory framework (e.g. asking how salt sensitivity changes
over the first few years of life). To date, there has been much less multisensory research concerning
the interaction between the chemical senses and vision, audition, and somatosensory channels
(although we must acknowledge that there have been numerous considerations of interactions
between taste, retronasal olfaction, and trigeminal stimulation; i.e. those sensory inputs that
together contribute to the classical definition of flavour; see ISO 1992, 2008; Lundström et al.
2011; Spence et al. 2010b). There is certainly less in terms of a theoretical framework for under-
standing the development of multisensory integration of the chemical senses. We predict that,
given the difficulty of stimulating the chemical senses, models of multisensory integration
developed on the basis of the study of interactions between audition, vision, and touch will have
an important impact on progress in understanding the chemical senses.
Despite the dearth of prior research, interest in the chemical senses within the more main-
stream developmental psychology literature is unquestionably on the increase (see Fig. 1.1).
As food choice and preferences become an increasingly important social issue, researchers are
beginning to investigate how multisensory processes are involved in the development of these
kinds of behaviours. Spence highlights the importance of expectancy and cultural variation on
multisensory interactions (the latter is an issue that multisensory research has yet to tackle with
any vigour; see also Howes 1991; 2006). Furthermore, as emphasized in Chapter 2 by Schaal and
Durand, the chemical senses are among the most useful for early development, functioning in the
womb well before birth. As these authors elegantly describe, the chemical senses may play a vital
role in setting the sensory canvas for the development of the other senses that become functional
later (Turkewitz 1994; see Table 1.1).

1.4.2 Embodiment
Multisensory representations of the body, and of the world as experienced from an embodied perspective, play a fundamental role in all of our mental processes by providing a point of reference to objects in the external world (this is what is known as ‘embodied cognition’; Bermúdez et al. 1995; Gallagher
2005; Varela et al. 1991). At an even more basic level, multisensory perception of the body is
required if we are to manipulate and move around our environments in a physically competent
and functional manner.
Despite the above, developmental psychology as a field has been rather underwhelming in its
attempt to understand the development of perceptual and cognitive processes from an embodied
perspective. This is perhaps due to the successful proliferation of looking-time techniques, which
have themselves brought a particular bias in the knowledge that has been gathered about early
cognitive development. Because it is difficult to record infants’ looking behaviours whilst they are
manipulating objects, the methods we have available have typically been used to present stimuli
via more indirect modalities (i.e. vision and audition). These limitations have also been inherited
by the more recently developed functional imaging measures of early cognition (see Johnson and
de Haan 2010 for an up-to-date review of neuroscientific studies of perceptual and cognitive
development). In order to better understand perceptual development from an embodied per-
spective we badly need more research that investigates the development of multisensory abilities
that bridge between vision, audition, and the more body-specific sensory channels: touch and
proprioception. As can be seen from Fig. 1.1, research on the development of tactile perception is
hugely underrepresented compared to that on visual and auditory development. Nonetheless, a
number of recent advances have been made (see, e.g., A.J. Bremner et al. 2008; J.G. Bremner et al.
in press; Nardini et al. 2008; Gori et al. 2008; Zmyj et al. 2011).
In this volume two chapters specifically examine the development of infants’ and children’s
sensitivity to, and perception of, tactile stimuli. Streri (Chapter 4) reports the most up-to-date
perspective on her long-standing programme of research into the perception of crossmodal
links across vision and haptics in very young infants (some of them newborns). Meanwhile,
Bremner et al. (Chapter 5) describe recent research into tactile orienting responses in infants
and the perception of the position of their limbs. Chapter 6 by Nardini and Cowie covers our
understanding of the development of multisensory processes underlying the employment of the
body in balancing, orienting behaviours, and more large-scale sensorimotor abilities including
locomotion and spatial navigation.
As pointed out by Mareschal and his colleagues (2007), the body plays an important role in
constraining the ways in which environment and inheritance interact to give rise to developmen-
tal change. The way in which the body influences multisensory processes is clearly no exception
to this rule (see above). Without a concrete understanding of the way in which the body struc-
tures multisensory experience at various stages of human development, it will be difficult for us
to understand the emergence of perceptual abilities. A particularly nice example of this is pro-
vided by Bahrick and Lickliter (Chapter 8). They describe developmental robotics research that
suggests that ‘infant-like’ movements can help structure the multisensory canvas in such a way as
to enhance the availability of multisensory cues such as amodal synchrony (e.g. Arsenio and
Fitzpatrick 2005).

1.4.3 Neuroscience
Related to the issue of how we integrate inputs from the sensory apparatuses distributed across
our dynamic body is the question of how the structural and functional organisation of the brain constrains
multisensory development. The sensory apparatus we possess feeds into the central nervous system at
different points of entry (via the primary cortices). The structural and functional constraints that
the brain places on our ability to integrate these inputs at any given stage in development are, of
course, critical to our understanding of the way in which the developing infant, child, or even
adult can process multisensory information (Johnson 2011; Mareschal et al. 2007). Whilst behav-
ioural studies can, to a certain extent, tell us about the neural processes that underlie behaviours,
research is increasingly indicating that behaviour and associated neural processing do not appear
at the same time in development (e.g. see Elsabbagh and Johnson 2010; Halit et al. 2003). Such
findings highlight the possibility that perceptual abilities develop in advance of the behaviours
which developmentalists have typically used to identify them, and thus provide an additional
reason to consider a neuroscientific approach to understanding multisensory development.
The great bulk of what we now know about the neural basis of multisensory development
comes from research on the development of the multisensory responsiveness of cells in the SC
from recordings in cats and monkeys (Stein et al. 1994; Stein and Stanford 2008; Wallace 2004).
Two chapters in the current volume tackle the development of neural processes linked to the SC
model of multisensory integration. Wallace et al. (Chapter 14) describe research in animal
preparations on the development of, and the role of sensory experience in, the SC
model of integration, while Laurienti and Hugenschmidt (Chapter 11) examine changes in
multisensory integration during aging, with specific reference to marker tasks of SC function
(e.g. tasks in which participants' responses to bimodal stimuli are faster or more accurate than
their responses to unimodal stimuli; such enhancements are typically taken as providing a
behavioural marker of multisensory integration in the SC; see Spence et al. 2004).
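The logic of such marker tasks can be illustrated with a small simulation. The sketch below uses made-up response times (none of the numbers come from the studies discussed) to compute the redundancy gain for bimodal stimuli and to test it against the race-model inequality, a standard analysis in this literature (cf. the co-activation approach of Barutchu et al. 2009):

```python
# Illustrative sketch (simulated data only) of the behavioural marker described
# above: responses to bimodal stimuli faster than to unimodal stimuli, tested
# against the race-model inequality P(RT_AV <= t) <= P(RT_A <= t) + P(RT_V <= t).
import random

random.seed(1)
rt_a = [random.gauss(320, 40) for _ in range(2000)]   # auditory-only RTs (ms)
rt_v = [random.gauss(340, 40) for _ in range(2000)]   # visual-only RTs (ms)
rt_av = [random.gauss(280, 35) for _ in range(2000)]  # bimodal RTs (ms)

def ecdf(sample, t):
    """Empirical probability of having responded by time t."""
    return sum(rt <= t for rt in sample) / len(sample)

mean = lambda xs: sum(xs) / len(xs)

# Redundancy gain: speed-up of bimodal responses over the faster unisensory mean.
gain = min(mean(rt_a), mean(rt_v)) - mean(rt_av)

# Violations of the inequality at fast RTs are taken as evidence that the two
# signals were integrated (co-activation) rather than merely racing.
violations = [t for t in range(200, 320, 10)
              if ecdf(rt_av, t) > ecdf(rt_a, t) + ecdf(rt_v, t)]
print(round(gain), len(violations))
```

With these simulated distributions the bimodal condition both shows a sizeable mean redundancy gain and violates the race-model bound at fast response times, which is the pattern usually read as a behavioural signature of integration.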
The SC animal model has contributed a great deal to our understanding of multisensory devel-
opment; a particularly important contribution has been to enable the assessment of the role of
specific kinds of sensory experience in developmental change in the neural functions of
multisensory processes (Chapter 14 by Wallace et al.; Yu et al. 2010). Nonetheless, it has been rather
difficult to make the leap from these neurophysiological studies in animals to the development of
multisensory processing in human infants and children. Primarily, this has been because studies
of SC function in cats and monkeys indicate that the kinds of experiences required to foster adult-
like responsiveness in SC neurons occur early in life and have their impact quickly (although see
Yu et al. 2010). Such findings are rather at odds with recent evidence, which has demonstrated
that changes in multisensory integration in human behaviour continue well into childhood and
early adolescence (Barutchu et al. 2009, 2011; Gori et al. 2008, 2010b; Hillock et al. 2010; Nardini
et al. 2008; Ross et al. 2011). They are also difficult to reconcile with the knowledge that
subcortical structures such as the SC are largely mature before birth in humans.
Thus, an important direction for future research will be to understand the neural basis of
changes in multisensory functioning in human infants and children. Functional imaging research
on cortical multisensory processes in developing human populations is just starting to appear in
the literature (e.g. Brandwein et al. 2011; Kushnerenko et al. 2008; Russo et al. 2010). Furthermore,
researchers are now beginning to consider neural models of multisensory development (e.g.
A.J. Bremner et al. 2008). Certainly, as more and more data arise it will become increasingly
important to consider whether our explanations of the development of multisensory processes
are consistent with the development of the central nervous system that subserves them. It will also
be particularly important to consider how the extant animal models can inform our understand-
ing of neural and behavioural development seen in humans.4 Chapter 16 by Ghazanfar presents a
particularly useful contribution to this matter by showing how a comparative perspective on
brain development can help explain why certain developmental processes occur in the way they
do in different species.
Lastly in this section, we mention an area of behavioural research in human infants, young
children, and clinical populations that has been brought to bear on the question of how neural
communications between sensory cortical areas change over the course of human development.
In Chapter 10, Maurer et al. explain how synaesthesia, a clinical condition in which stimuli to one
sensory system give rise to additional (often arbitrarily-related) atypical sensations in the same or
in a different sensory modality, could result from atypical neural development involving connec-
tions between sensory cortical areas (see also Marks and Odgaard 2005; Maurer and Mondloch
2005). Maurer et al. describe different developmental explanations of synaesthesia and cover the
literature on behavioural development in typical infants and children that relates to the typical
developmental process which seems to have gone awry in synaesthetes (see
Lewkowicz 2011a; Walker et al. 2010; Wagner and Dobkins 2011, for recent data and discus-
sion).

1.4.4 Understanding the developmental process (inheritance and environment)
As we discussed earlier, theory and evidence indicate that multisensory development is strongly
influenced by both context and timing (e.g. Turkewitz 1994). This role for timing and context in
development has been highlighted a great deal in recent times, particularly by theorists
arguing for ‘process-oriented’ approaches to understanding development (e.g. Lewkowicz 2000;
2011a; Lickliter and Bahrick 2000; Karmiloff-Smith 2009; Mareschal et al. 2007) as an alternative
to the more deterministic viewpoints on development, which peaked during the 1990s
(e.g. Barkow et al. 1992; Baron-Cohen 1998; Leslie 1992; Spelke and Kinzler 2007). Unfortunately

4 Indeed, Chapter 14 by Wallace et al. and Chapter 11 by Laurienti and Hugenschmidt tackle these issues
head-on, drawing links between behavioural findings in humans and the implication of the SC model, and
considering the role of SC–cortical interactions.
for developmental psychologists, ‘process-oriented’ approaches are complex in that they require
the integration of information from multiple disciplines. That is, they require an understanding
of the ontogeny of behaviour through a detailed analysis of the dynamic interactions between
different levels of organization including the genetic, neural, and environmental.
Although a process-oriented approach to multisensory development is a complex one, some of
the chapters in this volume tackle these matters head-on by discussing the
developmental processes involved in the emergence of multisensory functions. For example,
Ghazanfar (Chapter 16) examines the impact of context and timing on multisensory develop-
ment through a comparative approach, Lewkowicz (Chapter 7) examines the paradoxical effects
of early experience, namely perceptual narrowing, on the development of audiovisual perception,
and Soto-Faraco et al. (Chapter 9) investigate the effects of growing up in a multilingual environ-
ment on audiovisual integration in speech processing (see Pons et al. 2009). Meanwhile, Mareschal
et al. describe their computational modelling approach, which considers how the constraints of
connectionist neural architectures and environment can give rise to developmental change in
cognition and behaviour (see Elman et al. 1996; Mareschal et al. 2007; O’Reilly and Munakata
2000).
Lastly, the chapters by Hill et al. (Chapter 12) and Röder (Chapter 13) specifically address the
question of how multisensory processes might go awry in atypical human development (see also
Chapter 9 by Soto-Faraco et al.). Röder reviews the significant body of research that has now
accrued regarding the influence of the loss of a single sensory modality (vision) on multisensory
processes in the remaining modalities. In Chapter 12, Hill et al. address multisensory processes in
developmental disorders. Given the number of disorders in which multisensory abnormalities are
implicated and, indeed, the number of multisensory therapeutic interventions that are currently
available (e.g. Ayres 1979; see Chapter 12 by Hill et al.), it is surprising that so little attention had
been given to atypical multisensory development prior to the publication of this chapter.
Nonetheless, research on the atypical development of multisensory processes in developmental
disorders such as autism spectrum disorder and developmental dyslexia is certainly on the
increase (e.g. see recent articles by Bahrick 2010; Blau et al. 2009; Facoetti et al. 2010; Foss-Feig
et al. 2010; Klin et al. 2009; Russo et al. 2010; Hairston et al. 2005). It is our hope that this multi-
sensory research will shed new light on these disorders and perhaps offer some validation for
multisensory interventions. Furthermore, as Hill et al. demonstrate, a careful analysis of the role
of multisensory abilities in atypical development (and vice versa) will likely shed additional light
on our understanding of the processes of multisensory development across both atypical and
typical populations.

1.4.5 Development, learning, and aging


As we have already discussed, the task of determining how perceptual processes emerge through
the complex dynamic interactions underlying developmental change is a difficult puzzle.
Furthermore, it is perhaps even more difficult to determine what processes underlie multisensory
development than it is to understand unisensory development. This is best illustrated by means
of an example. Developmental changes in the relative weighting of the senses do not necessarily
result from changes in the multisensory integrative process itself but, in line with current thinking
regarding integration, could also result from changes in the reliability of one or more of the contrib-
uting senses across the course of development. Thus, unisensory developments can have subsequent
influences on multisensory processes. One way to help solve this kind of problem is through
carefully designed experimentation in which developmental changes in unisensory and
multisensory abilities are investigated in parallel (e.g. Gori et al. 2008; and see Chapter 6 by Nardini
and Cowie). However, another way to approach this general problem of understanding the
drivers of developmental change is to consider the wider context of development. Perhaps the
most informed picture of how context affects development is provided by a life-span approach in
which developmental changes are compared across life-periods in which the processes of
development are quite different.
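The reliability-based account referred to above (the maximum-likelihood integration scheme of Ernst and Banks 2002, cited in the references) can be made concrete with a short sketch. All of the estimates and variances below are illustrative assumptions, chosen only to show how the weighting shifts when a sense becomes less reliable while the integration rule itself stays fixed:

```python
# Hedged sketch of reliability-weighted (maximum-likelihood) cue combination.
# Units and numbers are hypothetical, not taken from any study in this chapter.

def combine(est_a, var_a, est_b, var_b):
    """Weight each unisensory estimate by its relative reliability (1/variance)."""
    w_a = (1 / var_a) / (1 / var_a + 1 / var_b)
    w_b = 1 - w_a
    combined_est = w_a * est_a + w_b * est_b
    # The combined variance is always below either unisensory variance.
    combined_var = (var_a * var_b) / (var_a + var_b)
    return combined_est, combined_var

# A 'mature' observer: vision more reliable than haptics for a size judgement.
est, var = combine(10.0, 1.0, 12.0, 4.0)
print(round(est, 2), round(var, 2))    # → 10.4 0.8 (weighted toward vision)

# If visual reliability declines (e.g. in old age), the weights shift toward
# haptics even though the integration rule itself has not changed.
est2, var2 = combine(10.0, 4.0, 12.0, 1.0)
print(round(est2, 2), round(var2, 2))  # → 11.6 0.8 (weighted toward haptics)
```

The point of the sketch is the one made in the text: a change in the apparent weighting of the senses across development can arise purely from a change in unisensory reliability, with no change in the multisensory integrative process itself.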
Chapter 11 by Laurienti and Hugenschmidt provides a much needed review of the literature cov-
ering the development of multisensory processes in old age. Chapter 3 by Spence and also Chapter
9 by Soto-Faraco et al. adopt life-span approaches by covering and comparing development at vari-
ous points across the life-span (from infancy to old age). For instance, Spence’s chapter reviewing
the extant literature on multisensory flavour development highlights changes in multisensory inte-
gration in early life, adulthood (on the acquisition of perceptual expertise), and in old age.
But how does this richer context help us solve the problem of determining a developmental
process in multisensory abilities? From the wider literature we can learn much about the context
of developmental change across different periods of life. For instance, it is well known that, during
infancy and early childhood, unisensory performance improves, whereas in old age unisensory
acuity declines. Interestingly, researchers have reported increases in multisensory integration
across infancy, childhood, and old age (Chapter 11 by Laurienti and Hugenschmidt; Barutchu
et al. 2009; Gori et al. 2008; Nardini et al. 2008; Laurienti et al. 2006; Neil et al. 2006). Others have
tentatively suggested that multisensory integration can help to hide the consequences of unisen-
sory decline (see Laurienti et al. 2006). Thus we can conclude that developmental change in mul-
tisensory integration is occurring in the same direction across quite different contexts of
development in terms of unisensory functioning (improvement in infancy and childhood, decline
in old age). This points either to an independence of multisensory and unisensory developmental
processes, or toward there being qualitatively different kinds of changes in multisensory function-
ing at play across the different periods of the lifespan. Taking a life-span approach to development
therefore enriches our understanding of the factors involved in developmental processes.

1.5 Summary
Multisensory processes are at play in almost everything we do. It has taken mainstream psycholo-
gists (developmental and otherwise) some time to appreciate this fact, and we still think that more
needs to be done in order to integrate our understanding of multisensory perceptual functioning
into the developmental literature. Recently, however, there has been dramatic progress in our
understanding of multisensory development, as witnessed by the rich set of contributions in this
volume. As is often the case, the more we learn about multisensory development, the more com-
plex the picture becomes and the more questions are raised. Nonetheless, it is our conviction (and
we hope that, after reading this book, you will agree with us) that a full picture of perceptual,
cognitive, and social development will only emerge once we consider the fact that all of these
processes depend crucially on multisensory interactions. We hope that this volume goes some
way toward documenting this fact and, even more importantly, that it spurs the next generation
of researchers to plunge headlong into the complex world of multisensory interaction, for
ultimately it is at this level of organization that accurate explanations of behaviour lie.

Acknowledgements
AJB is supported by European Research Council Grant No. 241242 (European Commission
Framework Programme 7) and DJL was supported by NSF grant BCS-0751888 and NIH grant
D057116 during the work on this volume.
References
Abrams, R.M., Gerhardt, K.J., and Peters, A.J.M. (1995). Transmission of sound and vibration to the fetus.
In Fetal development: A psychobiological perspective (eds. J. LeCanuet, W. Fifer, N. Krasnegor, and
W. Smotherman), pp. 315–30. Lawrence Erlbaum Associates, Hillsdale NJ.
Alais, D., and Burr, D. (2004). The ventriloquist effect results from near-optimal cross-modal integration.
Current Biology, 14, 257–62.
Arsenio, A., and Fitzpatrick, P. (2005). Exploiting amodal cues for robot perception. International Journal
of Humanoid Robotics, 2, 125–143.
Avillac, M., Deneve, S., Olivier, E., Pouget, A., and Duhamel, J.-R. (2005). Reference frames for
representing visual and tactile locations in parietal cortex. Nature Neuroscience, 8, 941–49.
Ayres, A.J. (1979). Sensory integration and the child. Western Psychological Services, Los Angeles, CA.
Baillargeon, R. (2004). Infants’ reasoning about hidden objects: evidence for event-general and
event-specific expectations. Developmental Science, 7, 391–414.
Bahrick, L.E. (1994). The development of infants’ sensitivity to arbitrary intermodal relations. Ecological
Psychology, 6, 111–23.
Bahrick, L.E. (2010). Intermodal perception and selective attention to intersensory redundancy:
implications for typical social developmental and autism. In The Wiley-Blackwell handbook of infant
development, 2nd edn. (eds. J.G. Bremner, and T.D. Wachs), pp. 120–66. Wiley-Blackwell, Oxford, UK.
Bahrick, L.E., and Lickliter, R. (2000). Intersensory redundancy guides attentional selectivity and perceptual
learning in infancy. Developmental Psychology, 36, 190–201.
Barkow, J.H., Cosmides, L., and Tooby, J. (eds.) (1992). The adapted mind: Evolutionary psychology and the
generation of culture. Oxford University Press, New York.
Baron-Cohen, S. (1998). Does the study of autism justify minimalist innate modularity? Learning and
Individual Differences, 10, 179–91.
Barutchu, A., Crewther, D.P., and Crewther, S.G. (2009). The race that precedes co-activation:
Development of multisensory facilitation in children. Developmental Science, 12, 464–73.
Barutchu, A., Crewther, S.G., Fifer, J., et al. (2011). The relationship between multisensory integration and
IQ in children. Developmental Psychology, 47, 877–85.
Beidler, L.M. (1961). Taste receptor stimulation. Progress in Biophysics and Biophysical Chemistry, 12,
107–32.
Berkeley, G. (1948). An essay toward a new theory of vision (originally published in 1709). In Readings in
the history of psychology (ed. W. Dennis), pp. 69–80. Appleton-Century-Crofts, East Norwalk, CT.
Bermudez, J.L., Marcel, A.J., and Eilan, N. (1995). The body and the self. MIT Press, Cambridge, MA.
Bernard, J., and Sontag, L.W. (1947). Fetal reactivity to tonal stimulation: a preliminary report. Journal of
Genetic Psychology, 70, 205–210.
Bernstein, I.L. (1978). Learned taste aversion in children receiving chemotherapy. Science, 200, 1302–1303.
Birch, H.G., and Lefford, A. (1963). Intersensory development in children. Monographs of the Society for
Research in Child Development, 28(5), 1–48.
Birch, H.G., and Lefford, A. (1967). Visual differentiation, intersensory integration, and voluntary motor
control. Monographs of the Society for Research in Child Development, 32(2), 1–87.
Blau, V., van Atteveldt, N., Ekkebus, M., Goebel, R., and Blomert, L. (2009). Reduced neural integration of
letters and speech sounds links phonological and reading deficits in adult dyslexia. Current Biology, 19,
503–508.
Boothe, R.G., Dobson, V., and Teller, D.Y. (1985). Postnatal development of vision in human and
nonhuman primates. Annual Review of Neuroscience, 8, 495–545.
Botvinick, M., and Cohen, J. (1998). Rubber hands ‘feel’ touch that eyes see. Nature, 391, 756.
Braddick, O., and Atkinson, J. (2011). Development of human visual function. Vision Research, 51,
1588–1609.
Brandwein, A.B., Foxe, J.J., Russo, N.N., Altschuler, T.S., Gomes, H., and Molholm, S. (2011). The
development of audiovisual multisensory integration across childhood and early adolescence:
a high-density electrical mapping study. Cerebral Cortex, 21, 1042–55.
Brayanov, J.B., and Smith, M.A. (2010). Bayesian and ‘anti-Bayesian’ biases in sensory integration for
action and perception in the size-weight illusion. Journal of Neurophysiology, 103, 1518–31.
Bredberg, G. (1968). Cellular pattern and nerve supply of the human organ of Corti. Acta Oto-
Laryngologica, Suppl. 236, 1–135.
Bregman, A.S. (1990). Auditory scene analysis: The perceptual organization of sound. MIT Press,
Cambridge, MA.
Bremner, A.J., and Spence, C. (2008). Unimodal experience constrains while multisensory experiences
enrich cognitive construction. Behavioral and Brain Sciences, 31, 335–36.
Bremner, A.J., Holmes, N.P., and Spence, C. (2008). Infants lost in (peripersonal) space? Trends in
Cognitive Sciences, 12, 298–305.
Bremner, J.G., Hatton, F., Foster, K.A., and Mason, U. (2011). The contribution of visual and vestibular
information to spatial orientation by 6- to 14-month-old infants and adults. Developmental Science, 14,
1033–1045.
Bryant, P.E. (1974). Perception and understanding in young children. Methuen, London.
Bryant, P.E., Jones, P., Claxton, V., and Perkins, G.H. (1972). Recognition of shapes across modalities by
infants. Nature, 240, 303–304.
Burkhalter, A., Bernardo, K.L., and Charles, V. (1993). Development of local circuits in human visual
cortex. Journal of Neuroscience, 13, 1916–31.
Burr, D., Binda, P., and Gori, M. (2011). Combining vision with audition and touch, in adults and in
children. In Sensory cue integration (eds. J. Trommershäuser, M.S. Landy, and K.P. Körding).
Oxford University Press, New York.
Calvert, G.A., Spence, C., and Stein, B.E. (eds.) (2004). The handbook of multisensory processes. MIT Press,
Cambridge, MA.
Chen, Y.-C., and Spence, C. (2010). When hearing the bark helps to identify the dog: semantically-
congruent sounds modulate the identification of masked pictures. Cognition, 114, 389–404.
Chen, Y.-C., and Spence, C. (2011). Crossmodal semantic priming by naturalistic sounds and spoken words
enhances visual sensitivity. Journal of Experimental Psychology: Human Perception and Performance, 37,
1554–1568.
Dodd, B., and Campbell, R. (eds.) (1987). Hearing by eye: the psychology of lip-reading. Lawrence Erlbaum
Associates, Hillsdale, NJ.
Doehrmann, O., and Naumer, M.J. (2008). Semantics and the multisensory brain: How meaning
modulates processes of audio-visual integration. Brain Research, 1242, 136–50.
Driver, J., and Noesselt, T. (2008). Multisensory interplay reveals crossmodal influences on
‘sensory-specific’ brain regions, neural responses, and judgments. Neuron, 57, 11–23.
Durie, B. (2005). Future sense. New Scientist, 2484, 33–36.
Edwards, A. (1946). Body sway and vision. Journal of Experimental Psychology, 36, 526–35.
Ellingson, R.J. (1960). Cortical electrical responses to visual stimulation in the human infant.
Electroencephalography and Clinical Neurophysiology, 12, 663–77.
Elman, J.L., Bates, E., Johnson, M.H., Karmiloff-Smith, A., Parisi, D., and Plunkett, K. (1996). Rethinking
innateness: a connectionist perspective on development. MIT Press, Cambridge, MA.
Elsabbagh, M., and Johnson, M.H. (2010). Getting answers from babies about autism. Trends in Cognitive
Science, 14, 81–87.
Engel, R. (1964). Electroencephalographic responses to photic stimulation, and their correlation with
maturation. Annals of the New York Academy of Science, 12, 663–77.
Ernst, M.O., and Banks, M.S. (2002). Humans integrate visual and haptic information in a statistically
optimal fashion. Nature, 415, 429–33.
Ernst, M.O., and Bülthoff, H.H. (2004). Merging the senses into a robust percept. Trends in Cognitive
Sciences, 8, 162–69.
Facoetti, A., Trussardi, A.N., Ruffino, M., et al. (2010). Multisensory spatial attention deficits are predictive
of phonological decoding skills in developmental dyslexia. Journal of Cognitive Neuroscience, 22,
1011–25.
Fantz, R. (1961). The origin of form perception. Scientific American, 204(5), 66–72.
Fantz, R. (1964). Visual experience in infants: Decreased attention to familiar patterns relative to novel
ones. Science, 146, 668–70.
Field, T. (2003). Touch. MIT Press, Cambridge, MA.
Field, T. (2010). Massage therapy facilitates weight gain in preterm infants. Current Directions in
Psychological Science, 10, 51–54.
Field, T., Diego, M.A., Hernandez-Reif, M., Deeds, O., and Figuereido, B. (2006). Moderate versus light
pressure massage therapy leads to greater weight gain in preterm infants. Infant Behavior and
Development, 29, 574–78.
Foss-Feig, J.H., Kwakye, L.D., Cascio, C.J., et al. (2010). An extended multisensory temporal binding
window in autism spectrum disorders. Experimental Brain Research, 203, 381–89.
Freides, D. (1974). Human information processing and sensory modality: Cross-modal functions,
information complexity, memory, and deficit. Psychological Bulletin, 81, 284–310.
Galebsky, A. (1927). Vestibular nystagmus in new-born infants. Acta Oto-Laryngologica, 11, 409–23.
Gallace, A., and Spence, C. (2010). The science of interpersonal touch: An overview. Neuroscience and
Biobehavioral Reviews, 34, 246–59.
Gallagher, S. (2005). How the body shapes the mind. Oxford University Press, Oxford.
Ghazanfar, A.A., and Schroeder, C.E. (2006). Is neocortex essentially multisensory? Trends in Cognitive
Sciences, 10, 278–85.
Gibson, E.J. (1969). Principles of perceptual learning and development. Appleton-Century-Crofts,
East Norwalk, CT.
Gibson, J.J. (1933). Adaptation, after-effect and contrast in the perception of curved lines. Journal of
Experimental Psychology, 16, 1–31.
Gibson, J.J. (1966). The senses considered as perceptual systems. Houghton Mifflin, Boston, MA.
Gibson, J.J. (1979). The ecological approach to visual perception. Lawrence Erlbaum Associates, Hillsdale NJ.
Gori, M., Del Viva, M., Sandini, G., and Burr, D.C. (2008). Young children do not integrate visual and
haptic information. Current Biology, 18, 694–98.
Gori, M., Giuliana, L., Alessandra, S., Sandini, G., and Burr, D. (2010a). Calibration of the visual by the
haptic system during development. Journal of Vision, 10, Article 470.
Gori, M., Sandini, G., Martinoli, C., and Burr, D. (2010b). Poor haptic orientation discrimination in
nonsighted children may reflect disruption of cross-sensory calibration. Current Biology, 20, 223–25.
Gottlieb, G. (1971). Ontogenesis of sensory function in birds and mammals. In The biopsychology of
development (eds. E. Tobach, L.R. Aronson, and E. Shaw), pp. 67–128. Academic Press, New York.
Gouin Décarie, T. (1969). A study of the mental and emotional development of the thalidomide child. In
Determinants of infant behavior, Vol. 4 (ed. B.M. Foss), pp. 167–87. Barnes and Noble, New York.
Gouin Décarie, T., and Ricard, M. (1996). Revisiting Piaget revisited, or the vulnerability of Piaget’s infancy
theory in the 1990s. In Development and vulnerability in close relationships (eds. G.G. Noam, and
K.W. Fischer), pp. 113–34. Lawrence Erlbaum Associates, Mahwah, NJ.
Gregory, R.L. (1967). Origin of eyes and brains. Nature, 213, 369–72.
Hairston, W.D., Burdette, J.H., Flowers, D.L., Wood, F.B., and Wallace, M.T. (2005). Altered temporal
profile of visual-auditory multisensory interactions in dyslexia. Experimental Brain Research, 166,
474–80.
Halit, H., de Haan, M., and Johnson, M.H. (2003). Cortical specialisation for face processing: face-sensitive
event-related potential components in 3 and 12 month-old infants. NeuroImage, 19, 1180–93.
Hein, G., Doehrmann, O., Müller, N.G., Kaiser, J., Muckli, L., and Naumer, M.J. (2007). Object familiarity
and semantic congruency modulate responses in cortical audiovisual integration areas. Journal of
Neuroscience, 27, 7881–87.
Held, R., Ostrovsky, Y., de Gelder, B., et al. (2011). The newly sighted fail to match seen with felt. Nature
Neuroscience, 14, 551–53.
Hernandez-Reif, M., and Bahrick, L.E. (2001). The development of visual-tactile perception of objects:
Amodal relations provide the basis for learning arbitrary relations. Infancy, 2, 51–72.
Hershenson, M. (1963). Reaction time as a measure of intersensory facilitation. Journal of Experimental
Psychology, 63, 289–93.
Hillock, A.R., Powers, A.R., and Wallace, M.T. (2010). Binding of sights and sounds: age-related changes
in multisensory temporal processing. Neuropsychologia, 49, 461–67.
Hooker, D. (1952). The prenatal origin of behavior. University of Kansas Press, Lawrence, KS.
Howes, D. (ed.) (1991). The varieties of sensory experience: A sourcebook in the anthropology of the senses.
University of Toronto Press, Toronto.
Howes, D. (2006). Cross-talk between the senses. The Senses and Society, 1, 381–90.
Humphrey, T. (1964). Some correlations between the appearance of human fetal reflexes and the
development of the nervous system. Progress in Brain Research, 4, 93–135.
ISO (1992). Standard 5492: Terms relating to sensory analysis. International Organization for
Standardization.
ISO (2008). Standard 5492: Terms relating to sensory analysis. International Organization for
Standardization.
James, W. (1890). Principles of psychology, Vol. 1. Henry Holt, New York.
Johnson, M.H. (2011). Interactive specialization: A domain-general framework for human functional brain
development? Developmental Cognitive Neuroscience, 1, 7–21.
Johnson, M.H., and de Haan, M. (2010). Developmental cognitive neuroscience, 3rd edn. Wiley-Blackwell,
Oxford.
Karmiloff-Smith, A. (2009). Nativism versus neuroconstructivism: rethinking the study of developmental
disorders. Developmental Psychology, 45, 56–63.
Kenny, P., and Turkewitz, G. (1986). Effects of unusually early visual stimulation on the development of
homing behaviour in the rat pup. Developmental Psychobiology, 19, 57–66.
King, A.J. (2005). Multisensory integration: strategies for synchronization. Current Biology, 15, R339–41.
Klatzky, R.L., and Lederman, S.J. (1993). Toward a computational model of constraint-driven exploration
and haptic object identification. Perception, 22, 597–621.
Klatzky, R.L., Lederman, S.J., and Reed, C. (1987). There’s more to touch than meets the eye: the salience of
object attributes for haptics with and without vision. Journal of Experimental Psychology: General, 116,
356–69.
Klin, A., Lin, D.J., Gorrindo, P., Ramsay, G., and Jones, W. (2009). Two-year-olds with autism orient to
non-social contingencies rather than biological motion. Nature, 459, 257–61.
Kushnerenko, E., Teinonen, T., Volein, A., and Csibra, G. (2008). Electrophysiological evidence of illusory
audiovisual speech percept in human infants. Proceedings of the National Academy of Sciences U.S.A.,
105, 11442–45.
Laurienti, P.J., Kraft, R.A., Maldjian, J.A., Burdette, J.H., and Wallace, M.T. (2004). Semantic congruence is
a critical factor in multisensory behavioral performance. Experimental Brain Research, 158, 405–414.
Laurienti, P.J., Burdette, J.H., Maldjian, J.A., and Wallace, M.T. (2006). Enhanced multisensory integration
in older adults. Neurobiology of Aging, 27, 1155–63.
LeCanuet, J.P., and Schaal, B. (1996). Fetal sensory competences. European Journal of Obstetrics,
Gynaecology, and Reproductive Biology, 68, 1–23.
Lee, D.N. (1980). The optic flow field: the foundation of vision. Philosophical Transactions of the Royal
Society London B, 290, 169–79.
Lee, D.N., and Aronson, E. (1974). Visual proprioceptive control of standing in human infants. Perception
and Psychophysics, 15, 529–32.
Leslie, A.M. (1992). Pretence, autism, and the theory-of-mind-module. Current Directions in Psychological
Science, 1, 18–21.
Lewkowicz, D.J. (2000). The development of intersensory temporal perception: an epigenetic systems/
limitations view. Psychological Bulletin, 126, 281–308.
Lewkowicz, D.J. (2002). Heterogeneity and heterochrony in the development of intersensory perception.
Cognitive Brain Research, 14, 41–63.
Lewkowicz, D.J. (2011a). The biological implausibility of the nature-nurture dichotomy and what it means
for the study of infancy. Infancy, 16, 1–37.
Lewkowicz, D.J. (2011b). The development of multisensory temporal perception. In Frontiers in the neural
bases of multisensory processes (eds. M.M. Murray, and M.T. Wallace). Taylor and Francis, New York.
Lewkowicz, D.J., and Ghazanfar, A.A. (2009). The emergence of multisensory systems through perceptual
narrowing. Trends in Cognitive Sciences, 13, 470–78.
Lewkowicz, D.J., and Lickliter, R. (eds.) (1994). The development of intersensory perception: Comparative
perspectives. Lawrence Erlbaum Associates, Hillsdale, NJ.
Lewkowicz, D.J., and Turkewitz, G. (1980). Cross-modal equivalence in early infancy: auditory-visual
intensity matching. Developmental Psychology, 16, 597–607.
Lewkowicz, D.J., Leo, I., and Simion, F. (2010). Intersensory perception at birth: Newborns match
nonhuman primate faces and voices. Infancy, 15, 46–60.
Lickliter, R., and Bahrick, L.E. (2000). The development of infant intersensory perception: advantages of a
comparative convergent-operations approach. Psychological Bulletin, 126, 260–80.
Lickliter, R., and Banker, H. (1994). Prenatal components of intersensory development in precocial birds.
In The development of intersensory perception: comparative perspectives (eds. D.J. Lewkowicz, and
R. Lickliter), pp. 59–80. Lawrence Erlbaum Associates, Hillsdale, NJ.
Locke, J. (1690). An essay concerning human understanding (ed. P.H. Nidditch 1979). Oxford University
Press, Oxford.
Lundström, J.N., Boesveldt, S., and Albrecht, J. (2011). Central processing of the chemical senses: an
overview. ACS Chemical Neuroscience, 2, 5–16.
Ma, W.J., and Pouget, A. (2008). Linking neurons to behavior in multisensory perception: a computational
review. Brain Research, 1242, 4–12.
Macpherson, F. (ed.) (2011). The senses: classic and contemporary philosophical perspectives. Oxford
University Press, Oxford.
Mann, I. (1964). The development of the human eye, 3rd edn. Grune and Stratton, New York.
Mareschal, D., Johnson, M.H., Sirois, S., Spratling, M., Thomas, M., and Westermann, G. (2007).
Neuroconstructivism, Vol. I: How the brain constructs cognition. Oxford University Press, Oxford.
Marks, L.E., and Odgaard, E.C. (2005). Developmental constraints on theories of synaesthesia. In
Synaesthesia: perspectives from cognitive neuroscience (eds. L. Robertson, and N. Sagiv), pp. 214–36.
Oxford University Press, New York.
Mauk, M.D., and Buonomano, D.V. (2004). The neural basis of temporal processing. Annual Review of
Neuroscience, 27, 307–40.
Maurer, D., and Mondloch, C.J. (2005). Neonatal synaesthesia: a re-evaluation. In Synaesthesia: Perspectives
from cognitive neuroscience (eds. L.C. Robertson, and N. Sagiv), pp. 193–213. Oxford University Press,
Oxford.
Maurer, D., Pathman, T., and Mondloch, C. (2006). The shape of boubas: sound-shape correspondences in
toddlers and adults. Developmental Science, 9, 316–22.
Meck, W.H. (1991). Modality-specific circadian rhythmicities influence mechanisms of attention and
memory for interval timing. Learning and Motivation, 22, 153–79.
Meltzoff, A.N., and Borton, R.W. (1979). Intermodal matching by human neonates. Nature, 282, 403–404.
Miller, J.O. (1991). Channel interaction and the redundant targets effect in bimodal divided attention.
Journal of Experimental Psychology: Human Perception and Performance, 17, 160–69.
Moon, C., and Fifer, W.P. (2011). Prenatal development. In An Introduction to Developmental Psychology,
2nd edn (eds. A. Slater and J.G. Bremner). Wiley-Blackwell, Oxford.
Moore, K.L., and Persaud, T.V.N. (2008). The developing human: clinically oriented embryology, 8th edn.
Saunders Elsevier, Philadelphia, PA.
Murray, M.M., and Wallace, M.T. (eds.) (2011). The neural bases of multisensory processes. Taylor and
Francis, New York.
Nardini, M., Jones, P., Bedford, R., and Braddick, O. (2008). Development of cue integration in human
navigation. Current Biology, 18, 689–93.
Naumer, M.J., and Kaiser, J. (eds.) (2010). Multisensory object perception in the primate brain. Springer,
New York.
Nava, E., Scala, G.G., and Pavani, F. (in press). Changes in sensory preference during childhood:
Converging evidence from the Colavita effect and the sound-induced flash illusion. Child Development.
Neil, P.A., Chee-Ruiter, C., Scheier, C., Lewkowicz, D.J., and Shimojo, S. (2006). Development of
multisensory spatial integration and perception in humans. Developmental Science, 9, 454–64.
Occelli, V., Spence, C., and Zampini, M. (2011). Audiotactile interactions in temporal perception.
Psychonomic Bulletin and Review, 18, 429–54.
O’Reilly, R.C., and Munakata, Y. (2000). Computational explorations in cognitive neuroscience:
understanding the mind by simulating the brain. MIT Press, Cambridge, MA.
Parise, C.V., and Spence, C. (in press). Audiovisual crossmodal correspondences. To appear in Oxford
handbook of synaesthesia (eds. J. Simner, and E. Hubbard). Oxford University Press, Oxford, UK.
Partan, S., and Marler, P. (1999). Communication goes multimodal. Science, 283, 1272–73.
Petrini, K., Dahl, S., Rocchesso, D., et al. (2009). Multisensory integration of drumming actions: Musical
expertise affects perceived audiovisual asynchrony. Experimental Brain Research, 198, 339–52.
Petrini, K., Russell, M., and Pollick, F. (2009). When knowing can replace seeing in audiovisual integration
of actions. Cognition, 110, 432–39.
Piaget, J. (1952). The origins of intelligence in the child. Routledge and Kegan-Paul, London.
Pons, F., Lewkowicz, D.J., Soto-Faraco, S., and Sebastián-Gallés, N. (2009). Narrowing of intersensory
speech perception in infancy. Proceedings of the National Academy of Sciences U.S.A., 106, 10598–10602.
Pöppel, E. (1973). Comments on ‘Visual system’s view of acoustic space’. Nature, 243, 231.
Ross, L.A., Molholm, S., Blanco, D., Gomez-Ramirez, M., Saint-Amour, D., and Foxe, J.J. (2011).
The development of multisensory speech perception continues into the late childhood years.
European Journal of Neuroscience, 33, 2329–37.
Rowland, B.A., Quessy, S., Stanford, T.R., and Stein, B.E. (2007). Multisensory integration shortens
physiological response latencies. Journal of Neuroscience, 27, 5879–84.
Russo, N., Foxe, J.J., Brandwein, A.B., Altschuler, T., Gomes, H., and Molholm, S. (2010). Multisensory
processing in children with autism: high-density electrical mapping of auditory-somatosensory
integration. Autism Research, 3, 253–67.
Schaal, B., Hummel, T., and Soussignan, R. (2004). Olfaction in the fetal and premature infant: functional
status and clinical implications. Clinics in Perinatology, 31, 261–85.
Sirois, S., Spratling, M.W., Thomas, M.S.C., Westermann, G., Mareschal, D., and Johnson, M.H. (2008).
Precis of neuroconstructivism: how the brain constructs cognition. Behavioral and Brain Sciences, 31,
321–56.
Smith, E.L., Grabowecky, M., and Suzuki, S. (2007). Auditory-visual crossmodal integration in perception
of face gender. Current Biology, 17, 1680–85.
Spelke, E.S., and Kinzler, K.D. (2007). Core knowledge. Developmental Science, 10, 89–96.
Spence, C. (2011a). Crossmodal correspondences: A tutorial review. Attention, Perception, and
Psychophysics, 73, 971–95.
Spence, C. (2011b). The multisensory perception of touch. In Art and the senses (eds. F. Bacci, and
D. Melcher), pp. 85–106. Oxford University Press, Oxford, UK.
Spence, C. (in press). Multisensory perception, cognition, and behaviour: Evaluating the factors
modulating multisensory integration. To appear in The new handbook of multisensory processing
(ed. B.E. Stein). MIT Press, Cambridge, MA.
Spence, C. and Bremner, A.J. (2011). Crossmodal interactions in tactile perception. In The handbook of
touch: neuroscience, behavioral, and health perspectives (eds. M. Hertenstein, and S. Weiss), pp. 189–216.
Springer, New York.
Spence, C., and Driver, J. (eds.) (2004). Crossmodal space and crossmodal attention. Oxford University Press,
Oxford.
Spence, C., and Squire, S. (2003). Multisensory integration: Maintaining the perception of synchrony.
Current Biology, 13, R519–21.
Spence, C., Shore, D.I., and Klein, R.M. (2001). Multisensory prior entry. Journal of Experimental
Psychology: General, 130, 799–832.
Spence, C., McDonald, J., and Driver, J. (2004). Exogenous spatial cuing studies of human crossmodal
attention and multisensory integration. In Crossmodal space and crossmodal attention (eds. C. Spence,
and J. Driver), pp. 277–320. Oxford University Press, Oxford.
Spence, C., Ngo, M., Lee, J.-H., and Tan, H. (2010a). Solving the correspondence problem in haptic/
multisensory interface design. In Advances in haptics (ed. M.H. Zadeh), pp. 47–74. In-Teh Publishers,
Vukovar, Croatia.
Spence, C., Levitan, C., Shankar, M. U., and Zampini, M. (2010b). Does food color influence taste and
flavor perception in humans? Chemosensory Perception, 3, 68–84.
Stein B.E. (ed.) (in press). The new handbook of multisensory processes. MIT Press, Cambridge, MA.
Stein, B.E., and Meredith, M.A. (1993). The merging of the senses. MIT Press, Cambridge, MA.
Stein, B.E., and Stanford, T.R. (2008). Multisensory integration: current issues from the perspective of the
single neuron. Nature Reviews Neuroscience, 9, 255–67.
Stein, B.E., Meredith, M.A., and Wallace, M.T. (1994). Development and neural basis of multisensory
integration. In The development of intersensory perception: Comparative perspectives (eds.
D.J. Lewkowicz, and R. Lickliter), pp. 81–107. Lawrence Erlbaum Associates, Hillsdale, NJ.
Stein, B.E., Burr, D., Constantinidis, C., et al. (2010). Semantic confusion regarding the development of
multisensory integration: a practical solution. European Journal of Neuroscience, 31, 1713–20.
Taylor, M.J., Menzies, R., MacMillan, L.J., and Whyte, H.E. (1987). VEPs in normal full-term and
premature neonates: longitudinal versus cross-sectional data. Electroencephalography and Clinical
Neurophysiology, 68, 20–27.
Trommershäuser, J., Landy, M.S., and Körding, K.P. (eds.) (2011). Sensory cue integration. Oxford
University Press, New York.
Turkewitz, G. (1994). Sources of order for intersensory functioning. In The development of intersensory
perception: comparative perspectives (eds. D.J. Lewkowicz, and R. Lickliter), pp. 3–18. Lawrence
Erlbaum Associates, Hillsdale, NJ.
Turkewitz, G., and Kenny, P.A. (1982). The role of developmental limitations of sensory input on
perceptual development: a preliminary theoretical statement. Developmental Psychobiology, 15, 357–68.
Van der Meer, A.L. (1997). Keeping the arm in the limelight: Advanced visual control of arm movements in
neonates. European Journal of Paediatric Neurology, 1, 103–108.
Varela, F.J., Thompson, E., and Rosch, E. (1991). The embodied mind: cognitive science and human
experience. MIT Press, Cambridge, MA.
Wagner, K., and Dobkins, K.R. (2011). Synaesthetic associations decrease during infancy. Psychological
Science, 22, 1067–72.
Walker, P., Bremner, J.G., Mason, U., et al. (2010). Preverbal infants’ sensitivity to synaesthetic
cross-modality correspondences. Psychological Science, 21, 21–25.
Walker-Andrews, A. (1994). Taxonomy for intermodal relations. In The development of intersensory
perception: comparative perspectives (eds. D.J. Lewkowicz, and R. Lickliter), pp. 39–56. Lawrence
Erlbaum Associates, Hillsdale, NJ.
Wallace, M.T. (2004). The development of multisensory processes. Cognitive Processing, 5, 69–83.
Wallace, M.T., and Stein, B.E. (1997). Development of multisensory neurons and multisensory integration
in cat superior colliculus. Journal of Neuroscience, 17, 2429–44.
Wallace, M.T., et al. (2004). Visual experience is necessary for the development of multisensory integration.
Journal of Neuroscience, 24, 9580–84.
Weitzman, E.D., and Graziani, L.J. (1968). Maturation and topography of the auditory evoked response of
the prematurely born infant. Developmental Psychobiology, 1, 79–89.
Welch, R.B., and Warren, D.H. (1980). Immediate perceptual response to intersensory discrepancy.
Psychological Bulletin, 88, 638–67.
Yu, L., Rowland, B.A., and Stein, B.E. (2010). Initiating the development of multisensory integration by
manipulating sensory experience. Journal of Neuroscience, 30, 4904–13.
Zmyj, N., Jank, J., Schütz-Bosbach, S., and Daum, M.M. (2011). Detection of visual-tactile contingency in
the first year after birth. Cognition, 120, 82–89.
Part A

Typical development of
multisensory processes from
early gestation to old age
Chapter 2

The role of olfaction in human multisensory development

Benoist Schaal and Karine Durand

The omission of taste, touch and olfaction … is … an accurate reflection of the disproportionate
concern which has been afforded to vision and audition in studies of infancy … Such imbalance
would detract from a comprehension of functioning at any stages of development, but may be
particularly distorting with regard to our understanding of the world of the infant.
(G. Turkewitz 1979, p. 408)

2.1 Introduction
Olfaction is an unavoidable and ubiquitous source of perceptual experience from the earliest
steps of mammalian development.1 Olfaction is unavoidable because, in all mammals, nasal
chemosensors develop functionally in advance of other sensory systems (with the exception of
somesthesic/kinesthesic sensors), and are thus in a position of neurologically-imposed readiness
to ‘feed’ the brain before the functional inception of hearing and vision, and to bind with their
inputs when these latter sensory systems come online. Olfaction is ubiquitous because nasal
chemosensors are in direct contact with stimuli that result from the normal biological functioning of a
mother–infant relationship. In all placental mammals, the foetuses are bathed in an odorous
amniotic fluid (which is also likely to give rise to tastes). All infants are fed flavourful milk and
cared for in the fragrant bubble of a parent’s body, and they are then all introduced to non-milk
foods whose chemosensory properties are essential to the establishment of liking and wanting,
and thereby sorting out the unwanted and the disliked. All through these stages, either as fore-
ground or as background elements, odours co-occur, interact, and merge into percepts elabo-
rated with the other senses.
The present chapter attempts to summarize the current understanding of how olfaction
functions in concert with the other senses during human development, and the various ways early
multisensory effects involving the chemical senses operate. We will survey some of the available
results concerning odour-based intersensory effects. There are several ways for a sense modality

1 In fact, olfaction appears to be universal in the early developmental steps of any animal species, but our
focus here will be restricted to mammals.
to be caught up in multisensory functioning. As defined broadly, ‘intersensory functioning is
involved whenever there are joint influences of stimulation in more than one modality’ (Turkewitz
1994, p. 3). Such intersensory influences are multiple and change over the course of development
(Turkewitz and Mellon 1989; Tees 1994), involving either:
◆ no effects of one modality on another
◆ facilitatory or inhibitory effects of stimulation in one modality on responsiveness to stimuli in
another modality
◆ the association of stimuli between modalities
◆ the extraction of properties that can be picked up by any sensory modality, therefore desig-
nated as amodal properties (E.J. Gibson 1969).
Transposed into the domain of olfaction, these categories of multisensory effects raise the
following questions:
1) Do odours exist independently from other percepts? More specifically, do some odour
perceptions function uninfluenced by other kinds of sensory information and does this
change over development?
2) How do odours alter what one sees, hears, and feels by touch, kinesthesis, proprioception,
pain, or interoception? For instance, how do odours modulate organismic states (attention,
motivation, emotion, mood) that bias the perception of stimuli in other modalities?
Reciprocally, do stimuli in other sense modalities shape olfactory perception?
3) How do olfactory and proprioceptive/tactile/auditory/visual linkages univocally or reciprocally
grow out of, and depend on, experience with objects, persons, and events? Are there biological
constraints to bind odours more easily with certain stimuli rather than with others?
4) To what extent can odours present information that is redundant with information presented
by coordinated sensory systems? Can properties of objects, events, or states be concurrently
specified by olfaction and other sense modalities? In other words, which amodal properties of
objects or events, such as their location, temporal synchrony, tempo, duration, intensity,
variety, complexity, substance, affective value, or common affordances (Walker-Andrews
1994), can be carried by odours, and through which mechanisms are they acquired?
Taken together, all the above issues lead to two overarching research questions. The first
considers how odours are mutually articulated and bound with other sensory impressions to
become integral attributes of multisensory objects, persons, events, or contexts. The second ques-
tion concerns how multisensory processes involving olfaction affect behaviour and contribute
to emotional and cognitive development.
The reader is warned that these questions can be answered only partially today. Indeed, if ‘our
understanding of intersensory functioning is truly in its infancy’ in the course of ontogenesis
(Turkewitz 1994, p. 15), our developmental appreciation of how the chemical senses participate
in multisensory processing is only in its very initial, ‘embryonic’ stage. Thus, the first step that is
required before we can better understand mechanisms underlying the developmental interactions
between the chemical senses and the other senses is a concentrated collection of descriptive
and normative data (at least in humans). Only after such data are obtained can the focus shift
to underlying processes and mechanisms, both in humans and non-humans. Whenever the avail-
able results make it possible, we will here interweave the patchy data from humans with the more
substantial data from non-human animal studies. These latter studies provide research hypothe-
ses and theoretical scaffolding for the human domain, and enrich explanations of human data.
However, sensory functions, and hence the different categories of intersensory functions, unques-
tionably depend on the level of maturity of an infant’s sensory modalities. We will therefore
address them at various developmental stages, namely in foetuses, newborns, infants, and
children. Again, however, it is clear that developmental data on the participation of chemorecep-
tion in multisensory processes are not only scant, but they have been unevenly addressed across
these periods of development.

2.2 Some specificities of olfaction in human development


This section briefly summarizes aspects of current biological and psychological knowledge
concerning olfaction in humans. The aim is to provide some functional principles useful to
understand the following exposition on the development of chemosensation in the context of
multisensory processes. At this stage, nasal chemoreception will be considered in combination
with oral chemoreception to highlight their basic functional interconnectedness. But the empha-
sis of the subsequent sections of the chapter will essentially be on olfaction (for taste and flavour,
see Chapter 3 by Spence).

2.2.1 Chemoreception is anatomically and functionally complex


Chemoreception is a multisensory system from the outset, constituted by the accretion of ana-
tomically-distinct systems of sensors placed close to the aerial and digestive entries (cf. Fig. 2.1).2
In humans, two main chemoreceptive subsystems are located in the nasal cavities and two others
in the mouth (for reviews on these subsystems, see Doty 2003).3 The major structures to establish
nasal chemoreception are the olfactory and the trigeminal subsystems. Olfactory sensory neurons
(OSNs) of the main olfactory system dwell in the roof of the nasal cavity, connecting through
olfactory nerves to the main olfactory bulbs (OBs). Branches of the trigeminal system innervate
the epithelium lining the upper respiratory tract. The olfactory system is tuned to detect innu-
merable compounds carried in low concentrations in the incoming airflow. The trigeminal sys-
tem is mainly sensitive to higher concentrations of chemostimulants bearing irritating properties,
mediating sensations such as stinging, burning, or cooling (Doty and Cometto-Muniz 2003).
These nasal subsystems appear, however, well interconnected, with functional overlaps or syner-
gies (Murphy and Cain 1980). Thus unless stated otherwise, olfaction will be considered here as a
multi-channel, nasal event without further specification of the separate ‘subsystem’ responsible
for corresponding percepts. Oral chemoreception is based on the specific gustatory (or taste)
pathway via the taste buds, and on a tacto-chemical pathway via trigeminal innervation of the oral
cavity. The gustatory subsystem mediates several principal tastes (sweet, sour, salty, bitter, and
savoury or umami) and their combinations, while the oral trigeminal subsystem mediates astringency,
burning, and cooling, at the same time as temperature and ‘mouthfeel’ (Rolls 2005; Simon
et al. 2006). Thus, from the very periphery of the systems, olfaction, taste, and trigeminal sensa-
tion operate concomitantly (and perhaps sequentially) while food is processed, to provide multi-
sensory information to higher brain structures.

2 Chemoreception is also effective through taste receptors present in the gut (duodenum) where it can
mediate associative processes between intestinal chemoreception and post-ingestive interoception (e.g.
Raybould 1998; Sclafani 2007).
3 Two other chemoreceptive subsystems dwell in the nose: the vomeronasal and the terminal subsystems. The
vomeronasal system comprises bilateral invaginations on the nasal septum (the middle-wall of the nose) lined
with sensory-like cells. The terminal system distributes free nerve endings to the anterior part of the nasal
septum. The exact sensory functions of these two subsystems remain unclear in humans (cf. Doty 2003).
Fig. 2.1 Schematic representation of the human nasal and oral chemosensory systems. Upper left:
the main olfactory system: odorant molecules access the olfactory region at the top of the nasal
cavity by the orthonasal or retronasal pathways. The odorants interact with the receptor proteins
situated on the dendrites of the olfactory sensory neurons that transmit information to the
olfactory bulbs, which then project to primary and secondary areas. Low middle: the gustatory
system: taste buds situated on the tongue send axons to primary and secondary projection areas.
Upper right: the nasal and oral trigeminal subsystem: free nerve endings distributed in the whole
nasal and oral mucosae connect with the brain stem and somesthesis integrative area (© André
Holley, 2012).

2.2.2 The chemoreceptive modalities engender unitary percepts


Although anatomically and functionally dissociable, the chemoreceptive modalities give rise to
unitary percepts. An organism can sample stimuli arising from outside (odours) and from inside
(flavours) the body. Olfaction is thus involved in two perceptual systems, one concerned with
guiding actions in the physical and social world, and the other concerned with savouring food
and beverages, and controlling eating. Odorants can reach the nasal chemoreceptive subsystems
by the orthonasal (when inhaling through the external nares) or the retronasal routes (when
odorants are drawn up from the mouth into the nasal cavities through the choanae; cf. Fig. 2.1).
Ortho- and retronasal pathways elicit different, and sometimes discrepant, perceptions from the
same odour stimulus (Rozin 1982) and they function at different levels of sensitivity (Heilmann
and Hummel 2004). ‘Inside olfaction’ and taste generally work in synchrony, especially when
eating or drinking is concerned, and give rise to odour–taste perceptual mixtures—so-called
flavours—which result from co-operating oro-nasal chemoreception, passive and active (e.g. in
chewing) somesthesis, and audition and vision (Auvray and Spence 2008; see Chapter 3
by Spence). Olfaction and taste fuse in a single flavour percept, in which gustatory/olfactory
components are difficult to tell apart spontaneously. Some authors insist on the automatic nature
of the odour–taste association and suggest that it has synesthesia-like properties (Stevenson and
Boakes 2004). Such odour–taste associations are common and do persist lifelong, unlike similar
phenomena involving other sense modalities (Stevenson and Tomiczek 2007). This stability of
odour–taste synesthesia has been suggested to date back to the earliest stages of development,
when oronasal sensations do regularly co-occur through chemical cues carried in amniotic fluid
and milk. Thus, the taste–olfaction synesthesia may in part be explained by the lack of early
pruning of neural interconnections (Spector and Maurer 2009; Verhagen and Engelen 2006),
so that these sense modalities remain more closely interconnected in perception (cf. Chapter 10
by Maurer).

2.2.3 Nasal and oral chemoreceptive subsystems are characterized by their ontogenetic precocity

Nasal and oral chemoreceptive systems begin their functional activity early along a non-random
temporal sequence of sensory ontogeny, which is well conserved among vertebrates (Gottlieb
1971). Olfactory and gustatory functions begin somewhere between the onset of somesthesis and
that of kinesthesis, and well before audition and vision (Lecanuet and Schaal 1996). In humans,
the olfactory system shows adult-like ciliated OSNs by the 11th gestational week. Taste buds can
be found as early as gestational week 12, and are morphologically mature by gestational week 13
(Beidler 1961). Nasal and oral trigeminal subsystems do support functional activity even earlier
than olfaction and taste, as they appear by gestational week 4 and respond to touch stimulation
by gestational week 7. Thus, human nasal and oral chemoreceptors undergo an anatomical devel-
opment compatible with sensory function from early gestation, although perceptual processing of
sensory inputs may arise later in the last trimester (Schaal et al. 1995b, 2004). Accordingly, from
this age on, informative odour (and taste) cues can theoretically be detected and transduced to
the brain, and can be perceptually bound together, as well as with other sources of sensation.

2.2.4 The nature of information provided by the chemical senses


The chemical senses potentially mediate a wealth of informative cues that are inherent either to
the stimuli—quality, perceived intensity, variety, complexity—or to the properties derived from
an individual’s idiosyncratic interactions with stimuli—hedonic valence (pleasantness/unpleas-
antness), familiarity/novelty, and utility knowledge or affordances (e.g. edibility, stress-buffer-
ing). These properties are interrelated, quality being indeed linked with intensity, and hedonic
valence with intensity and familiarity (Cain and Johnson 1978; Delplanque et al. 2008). Humans’
initial reactions when facing an odorous object or context, whether autonomic, behavioural, or
declarative, pertain to their pleasantness (Schiffman 1974). Such reactions are not symmetric for
pleasant and unpleasant odours or flavours, the latter being processed more rapidly than the former
(Bensafi et al. 2002). Many of the above properties, namely the hedonic and familiarity properties,
have been shown to be attended to keenly from the earliest stages of development (Schaal 1988,
2006; Soussignan et al. 1997; Steiner 1979; Steiner et al. 2001) and even in premature infants
(Pihet et al. 1996, 1997; Schaal et al. 2004). To a certain extent, chemoreception can also carry
spatial cues, although adults are generally considered poor at localizing on the basis of odour alone, and
trigeminal sensation seems mainly to be involved in laboratory experiments (Kobal et al. 1992). As in
adults, trigeminally-driven orientation responses are observed in newborns for offensive stimuli
(Rieser et al. 1976). But ontogenetically adaptive localizing responses based on odour cues may
be specific to the neonatal period. Indeed, infants display reliable spatial performance in localizing
an odour source of low intensity, and this may be improved by bilateral sampling movements
(rooting behaviour) within odour gradients to eventually bring the nose (and the mouth) in con-
tact with the target breast and nipple (see below, Section 2.4).

2.2.5 The role of the neural architecture of olfaction in multisensory processing

There is a close structural overlap between brain regions related to olfaction (and taste), and those
regions involved in the affective processing of incoming stimuli. Primary and secondary projections
of olfaction (namely the anterior olfactory nucleus, olfactory tubercle, piriform cortex, anterior
nucleus of the amygdala, periamygdalian cortex and entorhinal cortex, hypothalamus, hippocampus,
and orbitofrontal cortex; Price 1990; Carmichael et al. 1994) are indeed known to orchestrate
endocrine, autonomic, visceral, emotional, associative, and memory functions (Gottfried
2006; Rolls 2005). Thus, at any stage in development, odours carry strong intrinsic abilities to
evoke instant affects and to re-evoke affect-laden memories long after they have been encoded
(Yeshurun and Sobel 2010). Olfaction is thus in a position to integrate and associate with inputs
from the visceral field that are related to homeostasis or illness, at the same time as memories
from previous experience, emotion, feelings, and knowledge supported by higher cognitive
competences.

2.2.6 Modularity of the chemosensory systems


The chemosensory systems appear to be ‘modular’, and such functional modules appear to follow
a heterochronous ontogeny. This is exemplified in gustation, where distinct ‘submodalities’ sense
sweet, bitter, sour, savoury, and salty stimuli. All of these prototypical taste modes are operative
from birth, and even before, with the seeming exception of saltiness, which appears to control
responses later in development (Beauchamp et al. 1986). In olfaction, evidence from rodents
suggests that the OBs are built as assemblies of heterogeneous neural subsystems or functional
modules that process distinct chemical compounds (e.g. Johnson and Leon 2002; Kobayakawa
et al. 2007). The developmental dynamics of such modules show that glomeruli in the OBs are
progressively but rapidly specified in early development. In the rat pup, odour activation recruits
only a limited set of bulbar zones during the first 3 postnatal days, but bulbar activation increases
to nearly adult levels by day 15 (Astic and Saucier 1982). Thus, functional modules in the OB are
heterochronous, the earliest being presumably linked with OSNs that become functionally mature
in advance of others. Such early-developing modules may be caught up in the processing
of stimuli mediating vital responses in newborn organisms. This order of functional onset of
modules in the OB is analogous to the ordered onset of sensory systems noted by Gottlieb (1971),
which Turkewitz and Kenny (1985) theorized as an evolved strategy to open the brain progres-
sively to sensory activity, thereby reducing the nature, amount, and complexity of stimuli availa-
ble to maturing neural tissue and regulating competition between emerging sensory systems. This
same logic may be applied intra-modally: a limited set of early functional processing modules
may prevent informational overload in the modalities that support the completion of actions that
are critical to neonatal adaptation and survival. In summary, apparent sensory specializations in
olfaction might emerge from time-ordered neural development, leading certain chemosensory
stimuli to be more readily engaged than others in the control of behaviour, and in uni- and mul-
tisensory learning.

2.2.7 Early functions of olfaction


The first function of olfaction and taste is to divide the world into entities that should be
approached and those that should be avoided, before and after oral incorporation, respectively.
SOME SPECIFICITIES OF OLFACTION IN HUMAN DEVELOPMENT 35

Chemoperceptual abilities support such discrimination from birth. Some chemosensors are con-
genitally connected with distinctive motor-action patterns in the oral–facial field. For example, in
human newborns, the taste of glucose, sucrose, or umami (monosodium glutamate) elicits
sucking and smacking, tongue protrusion, and a relaxed facial configuration with smiling-like actions.
In contrast, bitter or sour stimuli trigger gape and gag responses with grimaces, negative head
turns, and crying (Rosenstein and Oster 1990; Steiner 1979; Stirnimann 1936). These responses
occur even in infants born prematurely, who may not yet have been directly exposed to such
stimuli (Maone et al. 1990; Tatzer et al. 1985). Comparable unconditional reactions seem rarer in
the olfactory domain, although newborns display hedonically-differentiated responses to odorants
before they have had opportunities for extensive postnatal odour learning (Steiner 1979;
Soussignan et al. 1997). One case is the rabbit pup’s response to a single odour compound emitted
in milk by the nursing female. This rabbit ‘mammary pheromone’ is behaviourally active
apparently without inductive specification by postnatal or prenatal exposure to the same compound
(Schaal et al. 2003). So far it is not known whether such unconditioned odour stimuli exist in
humans. Such ‘ready-to-use’ stimulus–response loops have been termed prefunctional (Hogan
1998) or predisposed (Bolhuis 1996), in order to highlight the fact that they work in newly born
organisms in advance of functional demands from the environment and without the intervention
of obvious environmental processes (direct sensory exposure) to induce the stimulus–response
association (Schaal 2012). Interestingly, such predisposed chemosensory stimuli also often act as
potent primary reinforcing agents, which transfer their behavioural effect to any contingent, ini-
tially neutral, stimulus (e.g. Booth 1990; Coureaud et al. 2006).

2.2.8 The effect of environmental exposure on chemosensory structures
Apart from some notable predisposed mechanisms, chemosensory structures and their func-
tional performances are massively influenced by environmental exposure effects and learning
(reviewed in Schaal 2012). The fine-tuning (through neuronal selection, inter-neuronal
connectivity, and neoneurogenesis) of the sensory organs, connecting centres, and corresponding motor
loops has been shown, in deprivation and selective-enrichment experiments, to depend strongly
on prior sensory input. These epigenetic influences may be maximized during sensitive periods
of neurosensory ontogeny. However, it is a general property of olfaction (as well as of taste) to be
extremely susceptible to the local conditions of the environment. This plasticity has been
documented at both the peripheral and central levels, where odour exposure impinges on sensitivity
(Wang et al. 1993) and where learning and expertise improve discrimination (Bende and
Nordin 1997; Rabin 1988). The processes underlying odour learning range from non-associative
familiarization (e.g. Li et al. 2006) through to associative processes, such as those mobilized in
evaluative or classical conditioning (Hermans and Baeyens 2002; Li et al. 2008). As for other brain
functions, the plasticity of olfaction, although generally high, may not be linear during development.
‘Sensitive periods’ can indeed heighten the influence of environmental exposure to, and/or
learning of, odours and flavours. Such sensitive periods have been described during the neonatal
period when the acquisition of arbitrary odours appears to be facilitated in both non-human and
human subjects (Delaunay-El Allam et al. 2010; Leon et al. 1987). But, more broadly, early infancy
represents a period when the range of odour/flavour preferences first becomes established
(Beauchamp and Mennella 1998). Finally, although research on experience-dependent plasticity
has been mainly concerned with odour–odour or odour–taste processes, it is argued that wider
integration of olfaction with other sensory modalities also depends largely on experience
(e.g. Spear and Molina 1987).
36 THE ROLE OF OLFACTION IN HUMAN MULTISENSORY DEVELOPMENT

2.2.9 Subliminal processing of odour stimuli


Olfactory thresholds are highly variable among individuals (Cain 1977), but odour stimuli can be
processed even when they are unnoticed or delivered subliminally. Odours can indeed be proc-
essed subconsciously and affect numerous psychobiological variables in adult humans, including
for example endocrine release (e.g. Wyart et al. 2007), mood fluctuations (e.g. Bensafi et al. 2004;
Lundstrom and Olsson 2005), cognitive performance (e.g. Kirk-Smith et al. 1983; Köster et al.
2002; Zucco et al. 2009), visual exploration (e.g. Michael et al. 2003; Seigneuric et al. 2011), or
likeability assessment of others (e.g. Li et al. 2007). It is interesting to note that the influence of
implicit odour stimuli on memory or on social preferences has been found to be stronger
when participants report being unaware of any odour stimulus (Degel and Köster 1999).
These results suggest that odours can exert strong memory and behavioural effects outside
of conscious notice, a point that should be borne in mind in order to understand the subtle influ-
ence of olfaction in infants’ and children’s attitudes and behaviour, as discussed below.

2.2.10 Persistence of odours in memory


Once encoded, odours generally appear to be more persistent in memory than cues encoded in
the other sense modalities (Engen 1991; Herz and Engen 1990). Adult olfactory memory is indeed
only slightly influenced by the passage of time, over both the short and the long term, and this
stands in contrast with what is observed in vision and audition (Engen and Ross 1973; Engen et al.
1973). Furthermore, the acquisition of olfactory associations appears strongly affected by proac-
tive interference, but only negligibly by retroactive interference, leading the organism to give
more weight to experiences encountered for the first time (Lawless and Engen 1977; Engen 1991).
These fundamental properties have great importance in understanding the developmental roles
of the olfactory sensorium. The early functionality of olfactory memory is exemplified in the
neonatal retention of foetal odours (Abate et al. 2000; Hepper 1995; Faas et al. 2000; Mennella
et al. 2001; Schaal et al. 2000), infantile retention of neonatal odour imprints (Delaunay-El Allam
et al. 2010; Mennella and Beauchamp 1998), and in adults’ reliance on early odour memories
during biographic recollections (Chu and Downes 2000; Willander and Larsson 2007). For
instance, adults can name odours they have not re-encountered since childhood (Goldman and
Seamon 1992) and can display aversions to odours that were negatively conditioned in childhood
(Batsell et al. 2002). Indeed, the profile of their olfactory likes and dislikes (Haller et al. 1999;
Teerling et al. 1994) or differential brain responses to olfactory cues (Poncelet et al. 2010) can be
traced back to early exposure to characteristic ethnic flavours. Such autobiographical memories
are typically composed of sensory percepts in a range of modalities, but odour-based
representations are remembered better, and for longer, than vision- and word-based representations
(Chu and Downes 2002; Herz 2004).

2.2.11 Links of olfactory percepts with language


The links of olfactory percepts with language are weak. Firstly, odours are poorly represented in
languages around the world and, when they are, local terms are mainly devoted to odours bearing
unpleasant connotations or representing potential harm (e.g. Boisson 1997; Mouélé 1977). Thus,
as already noted, the usual way of labelling an odour percept is in terms of a binary hedonic
appreciation. When precise identification is required, such as providing an accurate name, humans
perform, on average, very poorly (Engen 1987). However, when words or icons are provided as cues,
or when odour–word links are taught, odour identification is greatly improved, suggesting that
the odour-related verbal deficit resides largely in limitations of memory access or retrieval.
However, this lexical deficit regarding odours does not necessarily imply a systematic semantic
deficit. For example, when confronted with a range of odours from hazardous household prod-
ucts, children gave the correct name to only 15% of them but accurately rated their edibility in
79% of cases (de Wijk and Cain 1994). Thus, although linguistic competence facilitates odour
discrimination and categorization (e.g. Möller et al. 2004), there is no absolute necessity to master
language in order to make sense of the odour and flavour worlds. Odours, flavours, and
tastes can therefore mediate multiple and sophisticated affordances regarding the environment in the
preverbal stages of human development. Odours or flavours are mainly encoded as associative properties
of objects or contexts: they are accordingly verbalized in reference to these objects or contexts,
and not in terms of attributes that are abstracted from objects as would be the case with colours.
This basic associative character of olfaction has already been noted above in relation to the elabo-
ration of biographical memories, and it will be further highlighted below in our discussion of the
pervasive involvement of odours in multisensory learning.
We conclude from the above points that the involvement of the chemosensory modalities in
early sensory, emotional, cognitive, and behavioural development is multi-determined. Although
some odour-specific perceptual specializations may emerge uninfluenced (or minimally influ-
enced) by experience, the bulk of our chemosensory responses are predominantly canalized by
experience from the earliest stages in ontogeny. Such effects of experience are persistent through-
out the lifespan. Thus the young organism can be considered to be both a perceptual specialist,
able to attend to particular chemosensory stimuli that have survival value, and a skilful generalist,
prone to react to and learn any novel odour stimulus in the environment linked with beneficial or
detrimental consequences.

2.3 Can odours be drawn into multisensory processes in the foetus?
The developmental precocity of the chemical senses makes possible months of epigenetic influ-
ence created by the pregnant mother and the growing foetus itself. Mammalian foetuses, includ-
ing human ones, are indeed exposed to an amniotic ecology that is replete with chemosensory
agents (Schaal et al. 1995; Robinson and Mendéz-Gallardo 2010) that nasal chemosensors can
detect and which connected brain structures can process, as shown by unimodal testing in animal
foetuses (Schaal and Orgeur 1992; Smotherman and Robinson 1987, 1995). It has been estab-
lished that foetuses can process chemosensory information by testing neonates with chemical
compounds they could have encountered only in the womb. In utero, both olfactory and gusta-
tory chemosensors are probably stimulated, but subsequent testing in aerial conditions shows
that the olfactory percept is sufficient to explain detection/retention by the foetal brain. Thus,
when presented with the odour of their amniotic fluid as compared to that of the amniotic fluid
from another foetus, 3-to-4-day-old newborns exhibit a preference for the familiar amniotic fluid
(Schaal et al. 1998). Along similar lines, human foetuses, much like foetuses in other mammalian
species (e.g. Hepper 1988a; Schaal et al. 1995a; Smotherman and Robinson 1995), are able to
extract a variety of odorants transferred in utero from the mother’s diet, which they can retain
in memory for several days, and up to several months, after birth (Hepper 1995; Faas et al. 2000;
Mennella et al. 2001; Schaal et al. 2000; cf. Schaal 2005, for a review).

2.3.1 The associative nature of foetal learning


Do events exist in utero that can create contingent relationships between stimuli captured through
different sense modalities, and is the foetal brain able to detect such crossmodal relationships? Let
us begin with the impact of fortuitous events imposed by experimenters on the foetal acquisition
of co-occurring odour cues. The induction of a visceral malaise is known to create, in a single
pairing episode, a strong aversion to any associated smell in adult humans and animals (Bernstein
1991), as well as in neonatal animals (Rudy and Cheatle 1977). Accordingly, this associative
ability was used to demonstrate foetal chemoreception in the first place. This demonstration
consisted of causing a digestive malaise in the rat foetus by the intraperitoneal injection of lithium
chloride (LiCl) immediately after a flavouring agent (apple) was infused in utero on the last ges-
tational day. In tests conducted 2 months later, the animals exposed to the apple–LiCl association
as foetuses avoided drinking apple-flavoured water more than did the control groups, who were
exposed to either the malaise alone, the apple flavour alone, or to a sham treatment. This result
demonstrated that the rat foetus can associate an arbitrary chemical cue with a contingent
state of sickness, and that an enduring aversive representation of the odour can be
stored beyond birth (Smotherman 1982; Stickrod et al. 1982).
The rat foetus is also responsive to psychobiological alterations that constitute everyday occur-
rences in prenatal life, such as transitory fluctuations of internal or external conditions. For
example, the incidental compression of the umbilical cord caused by foetal posture can result in
brief periods of hypoxia, inducing motor activity that contributes to the relief of the mechanical
cause of hypoxia (Smotherman and Robinson 1988). Such physiological variations can engage
rapid differentiation in the salience of coincidental stimuli. This has been demonstrated by
clamping the umbilical cord in order to mimic brain oxygenation upheavals. The results showed
that an odour infused in utero in association with decreasing brain oxygenation induced by
clamping the cord takes on an aversive value, whereas the same odour made contingent with the
release from hypoxia after unclamping the cord takes on an attractive value in the foetal brain.
This differentiation appears stable, as it is demonstrable in 12-day-old rats (Hepper 1991a, 1993).
Thus the foetal brain has (in rodent foetuses at least) the ability to olfactorily tag somatic or brain
states through the sensing of undefined interoceptive cues caused by physiological events.
Other sources of intersensory contingency that might engage the chemical senses recur
normally in the typical foetal environment. For example, near-term human foetuses exhibit
bursts of pseudo-respiratory movements, whereby amniotic fluid is propelled through the foetal
mouth and nasal passageways (Badalian et al. 1993). Continuous recording of these foetal respira-
tory movements using ultrasonic real-time visualization has shown that they are organized in
cycles (Patrick et al. 1980) closely related with maternal meals and subsequent variations in blood
glucose (Patrick et al. 1980, 1982). Maternal glucose transfer to the foetus most probably parallels
the transplacental transfer of chemosensorially-effective metabolites (aromas, flavour agents)
although the relative transfer kinetics of the different compounds is unknown. Thus, at each
of the pregnant mother’s meals, a foetus may be exposed to more or less coincident inflows of
odorous or flavoursome metabolites and nutrients, which at the same time may affect its sensory
processes and its arousal and physiological state. Hence the foetus might be able to learn that such
chemosensorially-effective metabolites co-occur with certain kinds of interoceptive changes in
brain or somatic states.
Finally, another instance of ecological context promoting the linkage between chemosensory
inputs and inputs from the other senses occurs during the transition from foetal to neonatal
stages of development. Based on the observation that exposure to uterine contractions during
labour was predictive of successful nipple attachment in neonatal rats, Ronca and Alberts (1996)
compared the response to a nipple scented with lemon flavour (citral) in a group of pups exposed
to it concurrently with bodily compression mimicking labour contractions and in a group of pups
exposed to it without compression. The results showed that 88% of compressed pups seized
the scented nipple as compared to only 20% of non-compressed pups. Thus somesthetic,
kinesthetic, and proprioceptive stimuli that simulate uterine contractions induced the learning
of a contingent odour stimulus that promoted later rat pups’ oral responses. The underlying
processes have been suggested to reside in a state of generalized arousal and neurochemical cor-
relates which are triggered by head compression, and which promote the acquisition of concur-
rent stimuli. Facilitated odour-learning related to the type of birth has also been suggested in the
human case, where the impact of a 30-minute exposure to an odour differs after vaginal delivery
and related contractions, or after caesarean delivery, which take place in very different proprio-
ceptive and tactile conditions (Varendi et al. 2002).
Taken collectively, the above studies suggest that a foetus’s sensing of its own
physiological state, and the consequences for arousal, could provide favourable conditions for the
encoding of coincidental stimuli, namely odours and flavours. Such conditions constitute a
functional ground for shaping early intersensory equivalences in terms of the psychophysical properties
or affective meaning of temporally coincident stimuli in any sensory modality. For example,
as foods eaten by pregnant women may change in composition each day, foetuses may record
several odour/flavour cues in association with metabolic changes, leading them to learn a range of
stimuli that will elicit equivalent attractiveness when reencountered in the neonatal environment.
In addition, studies have suggested that foetuses can acquire cues that are contingent on a
maternal behavioural state, such as the distinctive musical tune of a popular television programme acquired
while the mother relaxes watching it (Hepper 1988b). As compared to non-exposed foetuses,
those exposed to this music throughout pregnancy display selective motor activation upon its
playback when tested in utero at gestational age 36–37 weeks, and a discriminative heart-rate
and motor response when tested as newborns 2–4 days after delivery (Hepper 1991b). Strong
conclusions cannot be drawn from this research, as it may have multiple interpretations. Foetuses
may have learned the iteratively played auditory stimulus, or they may have associated the
external sound with the relaxed, silent state of the mother; still another possibility involves a
third component: the television-watching mothers were in the habit of ‘settling down with a cup
of tea’ (Hepper 1988b, p. 1348) and hence introduced chemosensory cues to the unfolding
sequence of sensory events whilst watching television. The contingent exposure to regular flavour
(or psychostimulant) cues may serve in making the distinct melody even more salient and
may also have given rise to an expectation of the melody (or conversely, the melody may give
rise to a chemosensory expectation). Although the relative validity of the above interpretations
cannot be determined from the available data, this study points to a potentially suitable ecological
context within which to assess multisensory, including chemoreceptive, learning in the human
foetus.

2.3.2 Can foetuses extract supramodal properties from odour stimuli?
Another issue concerns whether foetuses can extract supramodal properties, i.e. informative cues
that are common to several sensory modalities. It is generally considered that mammalian
foetuses dwell in an environment where highly intense stimuli are filtered out. Thus, the sensory
flow to which they are ordinarily exposed can be characterized in all sensory modalities as being
low-to-moderate in intensity (Lecanuet and Schaal 1996). Specifically, odorants are probably
presented at very low concentrations to the developing chemosensors, as attested by the ratio of
the amount of aroma that reaches the amniotic fluid to the amount that a pregnant mother takes
in. For example, when mothers ingested capsules containing roughly a gram of powdered
cumin in the days preceding parturition, only traces of molecular markers of cumin aroma could
be chromatographically detected in the amniotic fluid (Schaal 2005). Nevertheless, such trace
amounts of odorants can be detected by the untrained adult nose (Mennella et al. 1995; Schaal
2005; Schaal and Marlier 1998) and by the foetus itself, as inferred from the fact, mentioned above,
that neonates appear to remember those odours that they have encountered prenatally. Thus a
general intensity attribute occurring in the womb context might be olfactorily extractable by the
foetal brain. Such exposure to generally low-intensity stimuli during the prenatal formative
period of perceptual processing may, in part, explain the pattern of intensity-based responsive-
ness of neonatal and infantile organisms. Schneirla (1965) proposed that early reactions to stimu-
lation are fundamentally dichotomous. He argued that low-intensity tactile/thermal stimuli elicit
approach responses whereas high-intensity stimuli elicit avoidance responses and corresponding
patterns of autonomic activity. This hypothesis has been empirically addressed for audition and
vision (Lewkowicz and Turkewitz 1980; Lewkowicz 1991) but not yet for olfaction. It may be
predicted that suprathreshold odorants will be more attractive (or less aversive) to neonates
when presented at attenuated intensities. Furthermore, intensity-based equivalences between
modalities could potentially be assessed by contrasting odour stimuli that have been intensity-
matched to stimuli in other modalities versus odours that are mismatched in intensity.

2.4 Intersensory involvement of olfaction in the newborn


2.4.1 Neonatal chemosensory integration in action
The olfactory processes that operated in the foetus are further solicited in the newborn infant,4
binding with a wider range of sensory inputs and response resources, and feeding into growing
memory abilities. Unisensory investigations of olfactory responsiveness indicate that
mammalian neonates, including human ones, possess olfactory abilities that are strongly
connected with action systems from birth (see reviews by Alberts 1981, 1987; Blass 1990; Ganchrow
and Mennella 2003; Rosenblatt 1983; Schaal 1988, 2005, 2006).
An infant’s chemosensorially-guided actions indeed constitute an important component of the
future integration of multisensory experience. Olfaction directs sensory-motor performance in a
number of different ways (cf. Fig. 2.2). Firstly, odours are potent regulators of activational states,
inhibiting high-arousal states and promoting the advent of calm awake states (Doucet et al.
2007; Schaal et al. 1980; Sullivan and Toubas 1998). Such state control is followed by heart rate
stabilization, the reduction of respiratory variability, the mobilization of low amplitude move-
ments, cephalic alignment, and often eye-opening and ‘gazing’, and the activation of oral and
lingual movements. All of these variations are indicative of autonomous orientation, interest,
attention, and attraction. In olfaction, stimulus sampling is expressed as sniffing, i.e. a respiratory
pattern that optimizes the airflow over the nasal mucosa. Sniffing varies with the quality and
intensity of odorants and their hedonic value in adults (Bensafi et al. 2005). Although sniffing does not
yet operate on a volitional basis in newborns, three-day-olds also adjust their nasal airflow depending
on whether it carries a pleasant or unpleasant odour quality (Soussignan et al. 1997).
Secondly, more active odour-guided responses can be seen in orienting movements of the head.
A two-choice test first developed by Macfarlane (1975) and perfected in subsequent studies
(Cernoch and Porter 1985; Schaal et al. 1980, 1995c, 1998; Schleidt and Genzel 1990) capitalized
upon odour-released head movements towards bilaterally-presented pairs of stimuli in newborns
lying supine in their crib or sitting upright in a seat (cf. Fig. 2.2). Using this kind of test, several
laboratories have analyzed neonatal responsiveness towards odours acquired in the maternal
ecology, either prenatally or postnatally in the context of nursing (see below).

4 The human neonatal period is considered here as corresponding to the first month after birth.

Fig. 2.2 Different ways and devices used to analyse infant behavioural responsiveness to odours
presented unimodally. (A) Odorants are sequentially presented on Q-tips to assess differential
oral–facial responses or corresponding autonomous responses (e.g. Steiner 1979; Soussignan et al.
1997; Doucet et al. 2009). (B) Responsiveness of infants directly exposed to the mother’s breast
after different odour-related treatment (e.g. Doucet et al. 2007). (C–D) Paired-odour test devices
allowing researchers to assess infants’ relative head orientation and general motor responses while
lying supine (C, Macfarlane 1975; D, e.g. Schaal et al. 1980). (E) Paired-odour choice test for the
assessment of differential head-turning and oral activation toward either odour stimulus (e.g. Schaal
et al. 1995c; Delaunay-El Allam et al. 2006). (F) Differential rooting or crawling movements of the
infant toward a target odour source (e.g. Varendi and Porter 2001) (Drawings: A–E, © B Schaal; F,
redrawn after Prechtl 1958).

In sum, the action systems and sensorimotor configurations released by olfactory cues may
have multiple, cascading consequences on intersensory processes in newborns. When an infant
turns its head in a given direction in response to an odorant, usually he or she is also exposed
to other kinds of stimuli associated with the target object or person in a given context. This leads
to opportunities to create novel associations or to update old ones, as well as to refine the spatial
representation of the body and the embodied representation of the self, or of its actions, in space
(Aglioti and Pazzaglia 2010; Bremner et al. 2008; Chapter 5 by Bremner et al.).

2.4.2 Nursing and other social contexts that ‘intersensorialize’ odours
The prototypical context of mother–infant exchanges in any mammalian species is nursing.
The transfer of the multiple biological benefits of milk goes on within a concert of sensory
influences. All kinds of sensory inputs are then available and, in theory, all kinds of multisensory
events can be selected by the neonatal brain. The manifold sensory, motor, and reward-related
elements of the nursing context provide a repertoire of cues towards the recognition that
certain events recur in a similar context (with the same person) with similar consequences.
Nursing therefore provides a potent contextual basis that pulls sensory stimuli into a multisen-
sory perceptual framework.
This process has been well analyzed in the rat where passive (warmth) and active tactile stimuli
from the nursing female and etho-physiological events linked with milk intake have been shown
to be causal in assigning attractive value to any arbitrarily associated odour stimulus (e.g.
Rosenblatt 1983; Alberts 1987; Blass 1990; Brake 1981). This bimodal correspondence between
tactile and olfactory inputs is acquired even more efficiently when the odour has previously
gained some predictability through prenatal exposure (Pedersen and Blass 1982). Although the
database is currently meagre regarding the human newborn, converging evidence exists to suggest
that the rate of contingent odour-nursing exposure affects the development of odour preference.
A human mother’s breast odour elicits increasingly reliable positive head orientation in breastfed
infants as a function of suckling experience. By six days of age, infants can reliably orient their
head to the mother’s breast odour (Macfarlane 1975; but less demanding behavioural variables
led to discriminative responses from postnatal days 2 or 3—see Schaal et al. 1980; Doucet et al.
2007). Likewise, nursing-related exposure to maternal stimuli can help to explain why it is that
15-day-old breast-fed infants, but not bottle-fed infants, are able to recognize their mother’s
(axillary) odour (Cernoch and Porter 1985). Even arbitrary odorants can be engaged in the
approach and appetitive responses of neonates after their recurring association with nursing
(Schleidt and Genzel 1990; Delaunay-El Allam et al. 2010). After they were exposed to a chamo-
mile odorant painted on the breast at each feed, three-day-old infants oriented their head (nose)
towards the odour more than to a paired scentless control stimulus (using the test method pic-
tured in Fig. 2.2; Delaunay-El Allam et al. 2006). However, when the chamomile odour was paired
with the odour of human milk in this two-choice test, both stimuli proved to be equally attractive,
indicating that different odour qualities associated with the same context of reinforcement can
thereafter share similar hedonic value (Delaunay-El Allam et al. 2006).
Other developmental niches also promote the associative linkage of odorants with stimuli in
other modalities, such as non-suckling contact comfort and, in nonhuman newborns, huddling
with siblings. Regarding olfaction, the influence of such tactile stimulation on olfactory learning
has begun to be characterized. For example, Brunjes and Alberts (1979) demonstrated that in rat
pups odours gain attractive meaning in both suckling and non-suckling interactions with the
mother. They suggested that temperature was instrumental in this process. When human infants
are exposed to episodes of gentle massage (for ten 30-second periods) together with lemon odour
on postnatal day 1, the odour elicits positive head turning responses when presented separately
the following day (Sullivan et al. 1991). Control groups of infants exposed either to just the
massage, just the odour, or to the odour followed by the massage did not exhibit any differential
response to the lemon odour. It is noteworthy that the touch-then-odour contingency
was required for the learning of an unfamiliar odour to occur and that the odour-then-touch
condition was unsuccessful in the one-day-old infant. This order effect may be
related to differences in the arousing properties of touch versus olfaction. Touch-related bodily
sensations may be more efficient in mobilizing attention through their alerting and/or pleasurable
properties, whereas an unfamiliar odour may not induce a similar excitatory/hedonic
effect. Thus a hedonically neutral odour can change into an alerting or attractive stimulus after a
short association with pleasurable massage. However, whether a familiar or pleasant odour can
conversely change the meaning of a tactile stimulus remains to be tested.
So far, no experiment has addressed the effect of visual or auditory stimulation on the functioning
of the chemical senses in newborn infants. Of course, this does not mean that there is an absence of
interaction. However, there are a number of studies that have documented effects in the converse
direction, namely influences of the chemical senses on other modalities. Taste has been demon-
strated to have strong modulating potency on neonatal responses to painful stimuli (Blass et al.
1990). An ‘analgesic’ effect similar to that of sweet taste has also been observed with olfactory
stimuli. When, on the occasion of a heel-prick procedure, newborns were exposed to the odour
of their mother’s milk (when separated from their mother) or to a non-maternal but familiar
odour, the behavioural manifestations of pain were attenuated as compared to infants receiving
an unfamiliar odorant or water (Rattaz et al. 2005). The effect of an olfactory stimulus on the
reduction of pain reactions is stronger, however, when it is combined with tactile containment of the whole body (achieved by swaddling) (Goubet et al. 2007). The pacifying effect of
a familiar odour has also been noted in premature infants (Goubet et al. 2003), indicating that an
early-developing process of recognition of olfactory recurrence is a key factor in alleviating the
response to negative or noxious stimuli. Interestingly, not all sensory modalities appear equally able to block pain afferents or efferents (e.g. audition: Arditi et al. 2006; vision: Johnston et al. 2007), and olfaction and taste may bear special properties in this respect because
of their precocious (Johnston et al. 2002) and privileged connections with reward processes
(Anseloni et al. 2002, 2005; Pomonis et al. 2000).
Finally, olfactory stimuli might also modulate visual activity. When exposed to their mother’s
breast odour immediately before nursing, awake and calm newborns display longer episodes of
eye opening as compared to a similar situation where the odour is masked (cf. Fig. 2.3; Doucet
et al. 2007, but this is significant for boys only). The processes underlying such olfactory–visual
interaction may reside in the facts that:
◆ a familiar odour stimulus is arousing and, hence, stimulates all-purpose sensory seeking
activity, including visual orienting;
◆ breast odours are already associated with expectancies for visual/auditory/tactile/taste
reinforcements, hence triggering the multisensory intake of information (e.g. Korner and
Thoman 1970);
◆ in three-day-olds, breast odour is already part of an organized activity pattern mobilizing
vision and touch/temperature sensing to boost the localization of the nipple.
This latter point is backed up by the second aspect of the observed olfactory–visual interaction
in Doucet et al.’s (2007) study, namely that infants exposed to breast odour when their eyes are
open tend to display augmented oral activity (as compared to the situation when olfaction and
vision are not stimulated or are stimulated separately; cf. Fig. 2.3). This finding, as well as the
additive odour–tactile effect on pain response regulation mentioned above (Goubet et al. 2007),
accords with Bahrick and Lickliter’s (2002) intersensory redundancy hypothesis, which states that
‘information presented redundantly and in temporal synchrony to two or more sense modalities
recruits infant attention. . .more effectively than does the same information presented to one
sense modality at a time’ (2002, p. 165; see also Chapter 8 by Bahrick and Lickliter). Here, concur-
rent and complementary olfactory and visual inputs recruit more intensive oro-motor actions
than either of these inputs on their own. In this case, the actions are primarily aimed at orienting to the mother and grasping the nipple.
In sum, at the start of postnatal life, olfaction, so far mostly understood unimodally, supports
adaptive responsiveness. One prominent goal of adaptive behaviour in any mammalian newborn organism is to acquire milk and to rapidly improve its ability to do so at minimal cost.
There is a considerable urge for human newborns to ingest colostrum and milk to counteract the
dangers of bacterial predation (Edmond et al. 2007a, b). Thus, any perceptual means that can help
speed-up neonatal performance in the maternal environment can only be beneficial for survival.
Fig. 2.3 Odour–vision interaction at the breast. (A) The testing situation: awake 3–4 day-old infants
held in a cradle prior to a feed are exposed to their mother’s breast without contact; the breast is
either uncovered or covered with a plastic film to mask its odour (Photograph: Sébastien Doucet).
(B) Relative duration of eye opening of infants when exposed to the breast as a function of breast
odour availability (in the no-odour condition, the breast odour was masked). The duration of eye
opening was longer when the infants were facing the odorous breast as compared with the odour-
masked breast. (C) Duration (sec) of oral activation (rooting, licking, sucking) as a function of
olfactory (breast odour present or masked) and visual inputs (eyes open or closed). Longer oral
activation was noted when infants were simultaneously exposed to the sight and odour of the
breast as compared to the other conditions, which were equivalent. (Adapted from Sébastien Doucet, Robert Soussignan, Paul Sagot, and Benoist Schaal, The 'smellscape' of mother's breast: Effects of odor masking and selective unmasking on neonatal arousal, oral, and visual responses, Developmental Psychobiology, 49 (2), pp. 129–38, © 2007, John Wiley and Sons, with permission.)

Multisensory integration is certainly an essential mechanism in the rapid improvement of neonatal performance (Rosenblatt 1983). The observed facts suggest that ‘olfaction may be important
in early behavioural development precisely because it is particularly well suited to mediate the
transition from neonatal responses based upon the intensity characteristics of tactile and thermal
stimulation to those based upon stimulus meaning’ (Rosenblatt 1983, p. 363). This proposal, raised for altricial mammalian newborns (rat, cat, dog), certainly has general validity for other mammalian newborns regardless of their state of maturation at birth.

2.5 Intersensory binding of olfaction in infancy and early childhood
During the months following the neonatal period, olfaction will be subject to further multisen-
sory optimization with respect to maturing motivation, action, and cognitive systems. During the
first year of life, infants progressively categorize objects, differentiate self from others, delineate
selective attachments, and become capable of producing increasingly complex and intentional
actions to explore the physical and biological world, and to communicate with their social
surroundings. What roles does olfaction play in the development of knowledge about the physical
and social environment? How and when is it involved in the dynamic processes that underlie
adaptive cognition, in terms of attention, motivation, learning, memory, preferences, abstraction,
and predictive abilities?

2.5.1 Odours as foreground cues to multisensory events


A small set of experiments indicates that infants and children are adept learners of
multisensory correspondences involving olfaction. Fernandez and Bahrick (1994) studied the ability of 4-month-old infants to pair an arbitrary odorant—cherry—with an arbitrary object.
Following a period during which an object (A) was systematically linked with the odour,
the infants were given a preference test between objects A and B, with and without the
odour. After familiarizing the infants with the appearance of both objects, the previously
odorized object and the control object were presented alternately for two 30-second trials each.
In the test session, the infants looked more at object A in the presence of the cherry odour than
in its absence, showing that they were able to associate an object with a distinctive odour. It
may be noted, however, that this capacity was observed only in female infants, pointing to
the possibility of early sex-related differentiation in the ability to detect contingencies between,
or to bind, odours and stimuli in other sensory modes. Using a similar, well-counterbalanced
paradigm, Reardon and Bushnell (1988) presented 7-month-old infants with apple sauce, flavoured either sour or sweet and served in red or blue cups. The infants were then explicitly introduced
to the colour of the cups, and fed alternately from each cup with the contrasting flavours of
apple sauce. After this colour-flavour pairing session, each infant was invited to choose one or
the other cup by arm-reaching to the pair of cups presented at a distance. A significant proportion
of the infants selected the cup associated with the sweet stimulus, a choice that could only rely
on visual cues in the conditions used. Thus, both of these experiments provide evidence
that infants aged 4–7 months are prone to associate arbitrary odours/flavours with co-occurring
visual cues. Indeed, very few pairing trials were needed to acquire the contingency (two 30-second
odour–visual trials in Fernandez and Bahrick’s study and three taste–visual trials in Reardon
and Bushnell’s study). The opposite matching tasks in both studies (odour to visual in Fernandez
and Bahrick, visual to flavour in Reardon and Bushnell) suggest the possibility of symmetric
binding processes. In addition, because only the visual cue was presented in the choice task, the Reardon and Bushnell study indicates that the infants readily used colour to predict an absent flavour.
The occurrence of intersensory contingency learning involving biologically-relevant stimuli
further confirms the early associative readiness of olfaction, but with important qualifica-
tions. So far, such associations have been analyzed mostly in the context of ingestion and
social interaction so that one element in the intersensory process is, or concerns, adaptive
psychobiological responses. The domain of feeding is indeed especially suitable to investigate
the integration of chemosensory events with either beneficial or harmful interoceptive
consequences.
In the first case, children exposed to distinct novel flavours in drinks that differ in carbohydrate-
based energetic content express subsequent preference for the flavour of the more energetic drink
over the less energetic drink (Birch et al. 1990). In a replication, where sweet taste was decoupled
from post-ingestive sensation, a similar linkage between novel flavours and monitored interoceptive consequences based on post-ingestive nutritional effects was obtained for foods differing in
fat content in children aged 2–5 years (Johnson et al. 1991). So far, the exact nature and locus of
food-related interoceptive cues that associatively bind with flavours remain unclear (Yeomans
2006), and their discussion is beyond the scope of the present chapter.
The affectively-opposite linkage between flavours and post-ingestive consequences has been
established in conditioned aversion. Children (aged 2–16 years) who are being treated with toxic
chemotherapy exhibit radical changes in the chemosensory appreciation of any associated food
(Bernstein 1978). When given an unusually flavoured ice-cream before drug administration,
the participants exposed to a toxic drug inducing gastrointestinal malaise rejected the ice-cream
2–4 weeks later, as opposed to control children receiving the drug unpaired with the ice-cream or
the ice-cream unpaired with a toxic drug. This aversion persisted for at least 4.5 months after
the initial flavour–nausea pairing. It is notable that the positive flavour–interoception associations mentioned above required several pairing trials to become established (e.g. eight trials in Johnson et al. 1991), whereas Bernstein’s negative flavour–interoception association was formed in a single pairing trial
and remembered over long periods of time. The crossmodal incorporation of chemosensory
stimuli involving interoceptive cues thus appears differentiated in terms of adaptive outcome.
In real-life conditions, these early interactions between negative interoceptive cues and flavour
percepts are highly prevalent and leave persistent memories that inhibit the intake of similarly
flavoured foods over the course of a lifetime (Garb and Stunkard 1974; Logue et al. 1981). But
even stimuli that do not provoke interoceptive malaise but evoke disgust or fear, for example
visual stimuli (e.g. a cute kitten versus the open mouth of a bat), can be enough to durably influence
the hedonic meaning attached to odorants (Hvastja and Zanuttini 1989).
Another essential setting for the establishment of arbitrary relations between multisensory
events involving olfaction is the social environment. Indeed, caretakers and other conspecifics
provide recurrent occasions for an infant to acquire a complex set of cues that characterizes their
appearance (face, eyes, hands, dynamic behaviour) at the same time as their vocal/verbal, tactile,
vestibular, and olfactory features. These perceptual properties of people are idiosyncratically
arranged in time, space, and multisensory complexity on different occasions, as are dispositions to
interact. Much of the research on infants’ and children’s multisensory perception of these percep-
tual and dispositional properties of people has focused on responsiveness to auditory and visual
cues and their relationship (cf. Lewkowicz and Lickliter 1994, and the chapters therein), but there
has been little research on the developmental mechanisms by which olfaction contributes to social
perception (beyond the established fact that odours are part of the cues that mediate person iden-
tification; e.g. Ferdenzi et al. 2010; Mallet and Schaal 1998; Olsson et al. 2006; Weisfeld et al.
2003). Again, olfaction is assumed to operate in human multisensory social cognition in two
ways. Firstly, odours gain meaning as a result of interaction with the multisensory reinforcing
base constituted by the mother and other people. Secondly, by virtue of their reinforcing proper-
ties, odours can precipitate the learning of social stimuli in other sensory modalities.
Human newborns display subtle abilities to recognize significant individuals (mother) or
classes of individuals (lactating women), as inferred from their differential attraction towards
their odour. The reliability of such early recognition abilities is directly linked with the recurrence
of exposure to conspecifics (Macfarlane 1975). Such early socially-acquired olfactory memories
can persist into infancy for months or years. For example, exposure to an arbitrary odorant while suckling during the first postnatal days engenders memories that are traceable at
the age of 7 months, and up to 21 months (Delaunay-El Allam et al. 2010). Thus odour cues
acquired in the multisensory context of the mother’s body can be transferred into competent
responses in domains in which the multisensory assortment of the initially-learned sensory cues
is radically dissimilar, such as when interacting with inanimate objects or toys (Delaunay-El
Allam et al. 2010; Mennella and Beauchamp 1998). This also applies to the food domain, where
familiarization to a given flavour in utero or in lacto influences the subsequent appreciation of the
same flavour despite blatant departure of the actual multisensory context (a non-milk food in a
cup or a spoon) from the acquisition context (breast feeding) (Mennella et al. 2001). In this way,
stimuli unprecedented in the context of food (e.g. texture, temperature) may gain attention-
evoking properties and affective equivalence with the chemosensory stimulus experienced
earlier.
Olfaction has a manifest role in an infant’s building of multisensory social representations,
although we do not yet fully understand this role. There are multiple ways to assess the influence
of odours in early social cognition. One is to observe the effects of adulterating the previously
encoded odour features associated with a given conspecific, via olfactory masking or suppression.
This approach has been successful in showing the prominence of odours in the interaction with
the mother in non-human altricial infants (Rosenblatt 1983; Blass 1990), as well as in human
newborns (Kroner 1882; Preyer 1885; Doucet et al. 2007). In puppies, kittens, or rat pups, the
perturbation of the maternal (or nest) odour tends to induce restlessness and distress responses
as if the mother (or nest) were not present or not recognized. Similarly, in human neonates,
spreading intense alien odorants on the nipple causes aversion and crying (Kroner 1882; Preyer
1885); simply removing the natural odour of the breast markedly reduces the infants’ responses
that indicate their wanting to grasp the nipple (Doucet et al. 2007).
Little evidence exists of such effects in older human infants, but some interesting data are at
hand in non-human primates. Harlow, in his early maternal deprivation and surrogate-rearing
experiments, deconstructed the multisensory array of cues that female rhesus monkeys convey to
infants. Following this research, Harlow emphasized the importance of comfort contact (Harlow
and Harlow 1965; see also Gallace and Spence 2010), but never seemed to recognize the importance of the confounded smell that infants spread onto, and experience on, a cuddly surrogate. The effec-
tive role of olfactory cues in the representation of the mother was noted subsequently in infant
squirrel monkeys (Saimiri). When mothers were sprayed with artificial odorants, their offspring
did not display the typical visual preference for her over a control female (Redican and Kaplan
1978). Thus, in Saimiri infants aged 1–5 months, the olfactory ‘disfiguration’ of the mother was
not compensated for by her visual identity and behaviour, and accordingly might be considered
to have altered the multisensory representation of the mother. A previous experiment had shown
that Saimiri infants rely more heavily on odour properties than on (static) visual properties of
the (anaesthetized) mother (Kaplan et al. 1977). Similarly, surrogate-reared Saimiri infants (aged
1–3 months) exhibit a clear recognition of and preference for their own body odour impregnated
in the cloth covering the surrogate, regardless of its visual aspect (colour; Kaplan and Russell
1973). Using the surrogate-rearing paradigm, Kaplan et al. (1977) further explored the infants’
ability to associate a given odour and colour in two groups exposed to two conditions of colour–
odour pairing (green-floral, GF, and black-clove, BC) over the first 6 months of life. Much to the
researchers’ surprise, both groups differed in the salience assigned to the odour or the colour in
choice tests between surrogates contrasting in odour, colour, or both: While the GF group was
more consistent and precocious in choosing the rearing odour regardless of the colour, the BC
group responded conversely—these infants based their choice much earlier on the rearing colour,
and did not care about the odour. As both colours were pilot-tested to elicit equal attraction, this
difference resided in the odours that indeed elicited differential sniffing behaviours. The clove
odorant appeared a posteriori to either induce avoidance or to have pharmacological effects,
which led the Saimiri infants exposed to it to rely on colour in their selective response.
This set of studies is interesting, first because it illustrates that odorants are not easy to manipulate: their multiple impacts (olfactory, but also trigeminal, or even pharmacological) and
dose-related qualitative variations can strongly affect outcomes. Second, despite evidence for an
apparent dominance of olfaction in Saimiri infants, Kaplan et al.’s (1977) study serendipitously
indicates that Saimiri infants concurrently monitor the visual and olfactory properties of
conspecifics. Furthermore, visual cues can compensate when olfactory cues fail in some way.
Finally, these studies raise important questions about underlying mechanisms responsible for the
effects of odours attached to social relations: Do they have a psychobiological impact by them-
selves in acting unimodally on the pathways that control arousal and distress responses? Or do
they act through higher cognitive mediation, gating the multisensory perceptual gestalt of the
mother or of the self that regulates affective responses? Or, finally, do both types of perceptual loop
come into play in the control of behaviour at different ages or developmental stages? In older
infants (Schaal et al. 1980), children (Ferdenzi et al. 2008), and adults (McBurney et al. 2006;
Shoup et al. 2008) it is clear that the odour of significant others is sought for its calming effects.
But even then, despite easy introspective assessment (in both children and adults), it is not
yet clear what the underlying perceptual/affective processes governing these behaviours are.
Nevertheless, providing the olfactory essence of the mother (or parent) is a common and appar-
ently effective practice, which can be used to manage an infant or child’s affective upheavals
caused by separation or distress. This ‘transitional object’ practice, whereby an odour is (for a
certain time) substituted for the physical presence of a significant other, may be a productive
context in which to empirically explore the emergence and development of multisensory social
processes involving vision, touch, and olfaction, and their affective and cognitive mediation and
consequences.
Thus during early development, odour percepts appear to become integral parts of object or
person representations, as suggested by the fact that altering only the olfactory facet of objects/
persons appears to degrade the recognition of such objects/persons as being familiar. The proc-
esses underlying such multisensory integration of odours are certainly variable at different ages
(e.g. Pirogovsky et al. 2009). They may be facilitated in early developmental stages, when odour
pairing is mandatory during presumed sensitive periods or when chemosensation may have
greater salience relative to vision (Lewkowicz 1988). Suggestive results indicate that when the
odour is novel, no matter whether it is pleasant or unpleasant, the object that carries it appears to
be treated indiscriminately with regard to the object’s visual–auditory–tactile novelty (Schmidt
and Beauchamp 1989, 1990; in 9-month-olds). In contrast, when the object is presumably novel
in terms of visual–auditory–tactile features, and when the odour is familiar, a different outcome
becomes apparent. Mennella and Beauchamp (1998; in 6–13-month-olds) and Delaunay-El
Allam et al. (2010; in 5–23-month-olds) noted that infants sequentially presented with identical
toys differing in odour, one novel and one to which they had been previously exposed during
breast-feeding, explored the object with the familiar odour more. Finally, when neither the odour
nor the object are familiar, 7–15-month-old infants prefer to interact with the unscented rather
than with the scented version of the object (Durand et al. 2008). Interestingly, though, in this lat-
ter object exploration experiment, the scented and unscented objects did not appear to be sig-
nificantly differentiated immediately, but only after several minutes of manipulation and
mouthing. Thus, infants may need some exposure to an object before attending and reacting to
its odour. An alternative possibility is that the odour is perceived immediately, but that the reac-
tions to it are postponed by competing processes, such as the dominance of other sensory systems
mobilized by actions on the object. Thus when interacting with objects, infants’ attentional
resources may be captured to first process the properties that are most immediately meaningful
in specifying the objects’ physical nature and potential affordances. Odours, supposed to have less
predictive value, may thus be treated secondarily. Although odour cues always co-occur with
other object cues, they cannot be as directly ‘observed’ as visual, tactile, or sound cues by infants
and may be less relevant to learning about object function. Therefore, infants may be so engrossed with the visual and tactual properties of objects that their attention to the objects’ olfactory properties may, at first, be overridden. However, making odorants more salient by manipulating their
intensity can reverse this effect. When asked to rank by preference four bottles containing four
differently coloured flowers associated with four different scents, 3–5-year-old children relied on
colour when the scents were delivered at low intensities, but they relied on odour information at
higher odour concentrations (Fabes and Filsinger 1986).
In sum, the sensory system(s) that is (are) prevalent in the control of attitudes, decisions and
actions might change within the course of a behaviour sequence underlying object exploration.
This corroborates Ruff et al.’s (1992) proposal that children’s exploratory behaviour may be
organized as a succession of habituations in the different modalities involved to the different cues
emitted by an object. Thus, as suggested by Turkewitz (1994), an actogenetic sensory dominance,
i.e. the relative salience of sensory cues that unfolds during the realization of an action, may be
dissociated from the more established notion of ontogenetic sensory dominance.

2.5.2 Odours as background cues to multisensory events


Another set of studies has investigated whether odours diffused as background cues (i.e. as stimuli
that are not directly relevant to the learned stimuli or the contingency between them) can be moni-
tored as implicit cues to multisensory events. Numerous studies have demonstrated that adult
humans encode contextual odours, sometimes outside of awareness or of any explicit focus of
attention on them. For example, ambient odours can be encoded as cues to reinforcing events or
outcomes: When paired with a stressful task, they negatively affect subsequent cognitive perform-
ance in adults (Kirk-Smith et al. 1983; Zucco et al. 2009). Such integration of an undetected
contextual odour can even control actual behaviour. For example, when tested in a room suffused
with a citrus odorant that is evocative of cleaning, participants are more prone to respond to
cleaning-related words, to report having the intention to clean when at home, and to exhibit
effective cleaning gestures while eating (Holland et al. 2005). Finally, the presentation of odour
and visual cues engages the tracking of functional links in space, time or common affordances:
When asked to rate whether a set of odours matched with pictures of everyday settings, adults
assign a better fit between an undetected odour (e.g. coffee odour) and a picture that contains a
visual cue linked with that odour (e.g. a coffee cup; Degel and Köster 1999). These effects of back-
ground odours on cognitive orientation or actual behaviour in adult humans have often been
obtained under conditions of unawareness that an odour is being delivered.
There is some evidence that such encoding of unattended background odours also occurs in
early development. One experimental model, the mobile conjugate reinforcement paradigm, has
been developed by Rovee-Collier and her colleagues (e.g. Rovee-Collier and Cuevas 2009), and
has lately been applied to issues concerning the significance of the olfactory background in infant
learning and memory. This paradigm consists in teaching infants to kick a foot in order to move
a mobile suspended overhead and in testing at various subsequent time points the retention of the
action–outcome contingency. They first demonstrated that visual/auditory contextual informa-
tion available during the encoding phase facilitates later recall in 3-month-olds. Long-term retention or recall degrades considerably over a period of 5–7 days, and it is completely abolished when the visual/auditory properties of the encoding context are modified (Butler and Rovee-Collier 1989; Fagen et al. 1997). In summary, as the vividness of the infant’s memory for the learned association decreases, the context becomes more important as a reminder of the contingent response. This applies to the visual or auditory properties of the learning context, but the
olfactory background of learning has also been assessed in terms of its role in infants’ later mem-
ory performance.
When 3-month-olds learned the effect of their foot kicking activity on the movements of the
mobile in the presence of an odour during two 15-minute training sessions, and were tested for
retention 1, 3, and 5 days thereafter, their responses depended on the olfactory context and on the
nature of the odorant. Those re-exposed to the same odour remembered the contingency between
their kicking and the induced mobile movement well. By contrast, those exposed to no odour
(control) exhibited only partial recall, while those exposed to a novel odour displayed no signs of
recall (Rubin et al. 1998; Schroers et al. 2007). Thus, firstly, infants detect background odours
and appear to rely on them in their retrieval of the learned kicking–object-mobility contingency.
The differential effect of the three odour conditions on retrieval performance indicates the con-
tribution level of contextual olfactory cues to the representation of a situation dominated by
vision, action, and proprioception: the matched odour facilitated recall of the original context,
the no odour condition was followed by degraded recall, but the non-matched odour was clearly
detrimental to recall. Second, the persistence of the odour cue to the kicking–object mobility
contingency differed as a function of the odour quality. Whereas arbitrary floral/woody odours
(Lily of the Valley or Douglas fir) were not recalled five days after learning, fruity/nutty odours
(cherry and coconut odours) were. This was interpreted as resulting from the meaning of stimuli
for the infants as cues for food. However, one may doubt that 3-month-olds (whose feeding
status is unreported) can assign unprecedented odorant qualities to a functional category of
‘edible’ items; the effect may instead lie in some intrinsic properties of the compounds
(namely trigeminal or otherwise potentially aversive features related to their intensity or novelty).
Third, based on these and prior studies, Fagen and his associates (Rubin et al. 1998; Schroers et al.
2007) made the interesting suggestion that, at least in young infants, odour cues might not func-
tion in the same way as visual/auditory cues. These latter cues were interpreted as occasion setters,
i.e. signalling the occurrence of the learned contingency, whereas odour cues may become part of
a compound percept aggregating the mobile and the odour context. When the associated odour
is lacking, the mobile is no longer recognized as such. Thus, according to such an interpretation,
odour + visual + proprioceptive properties of the learning situation may become amalgamated
into a single multisensory representation, which may be acquired holistically by 3-month-old
infants.
This hypothetical process of blending inputs from different sensory modalities into a novel,
emergent representation reminds one of the issues raised above concerning the formation
of multisensory representations of conspecifics (or of the self): are contexts or people perceptually
constructed by the continuously updated additive encoding and storing of information from
different sensory modalities? Do these cues then become, to a certain extent, substitutable for
each other in eliciting attention, recognition, and attraction? The mother’s odour by itself can thus act on an infant’s behavioural state, at least at a certain age, for a certain amount of time, and in particular situations, in much the same way as the whole mother herself can.
Altering only her odour properties can disrupt the infant’s recognition of her and also induce
avoidance responses until other cues take control. Whether, when, and how odours can function
as cues to representations of people as whole gestalts in infants and children is an interesting area
open to future investigation.
In older children, as noted above in foetuses or newborns, background odours easily become
fused with the multisensory perceptual events that compose an emotionally arousing situation.
For example, five-year-olds were asked to solve a maze task for 5 min in the presence of a subliminal odour. The maze was in fact unsolvable, thereby producing a feeling of failure and stress, as inferred from the participants’ behaviour during the task (Epple and Herz 1999). After a distracting interlude, the children were taken to another room
for the completion of an ‘odd-one-out’ test. This room was suffused with either the same odour,
another odour, or no odour at all. The children exposed to the odour previously paired
with frustration obtained lower cross-out scores than those belonging to the no-odour or to the
different-odour groups. In another study (Chu 2008), children aged 11–13 years were selected
according to their academic achievements and only underachieving subjects were enrolled.
INTERSENSORY BINDING OF OLFACTION IN INFANCY AND EARLY CHILDHOOD 51

These children were first brought into a room suffused with a given scent, where they had to complete a cognitive task, the stated difficulty of which was exaggerated so as to produce a feeling of unexpected success in association with the odour. Two days later, when these children performed
as quickly as possible ‘odd-one-out’ and ‘same-as’ tests, those re-exposed to the same scent exhib-
ited better performance than those exposed to a different one.
In both of the above studies, the contingency between odour and emotional experience led the
children to mentally label the odour as either negative or positive. What is being associated with
the background smell is unclear. Epple and Herz (1999) suggest a mechanism of emotional
conditioning, whereby a negative (positive) experience induces an emotional reaction that incor-
porates the co-occurring odour; subsequent presentation of that same odour then evokes a simi-
lar emotional reaction and ensuing negative (positive) effects on performance. Chu (2008)
proposes, alternatively, that the intervention of higher-level evaluative processes such as an
increase in self-confidence or self-esteem plays a role. However, the findings could also be explained at a more elemental level: in the above experiments, odour stimuli were certainly associated with a set of sensory cues that were differentially attended to as a function of the manipulated affect.
Distinct affective experiences can be discriminated on the basis of the contrasting activation
patterns of somatic and visceral effectors (e.g. facial muscles, heart rate, blood pressure, respira-
tion rate, skin temperature, sweating, gut motility, endocrine release, etc.; see e.g. Levenson et al.
1990; Stockhorst et al. 1999). These patterns engender contrasting interoceptive states or feelings
that may function as ‘somatic markers’ (Damasio 1993). Such somatic markers of bodily state
may become paired with co-occurring exteroceptive cues (namely an odour), generating a multi-
sensory image of situations where the sentient self (Craig 2009) takes in the perception of external
events. Ultimately, the sentient self integrates interoceptive representations with environmental,
hedonic, motivational, social, and cognitive activities to produce a global emotional moment
(Craig 2009, p. 67). Such inclusive emotional experience may explain why children’s responses to
given odours can subtly differ as a function of their association with events concerning significant
others. For instance, in families where parents consume alcohol to reduce dysphoria, children
dislike alcohol-related odours more than children whose parents usually drink for convivial
entertainment (in five-year-olds: Mennella and Garcia 2000; in 5–8-year-olds: Mennella and
Forestell 2008). Similarly, children whose mothers smoke to alleviate their stress dislike the odour
of tobacco smoke more than children of mothers who report smoking for other reasons (Forestell
and Mennella 2005). Thus, seeing and empathically feeling beloved others’ emotional distress or
affliction in the presence of an odour changes the hedonic value of the odour. Therefore, our
odour perceptions are not only shaped in association with multisensory perceptions originating
in our own body, but also in reaction to someone else’s emotional state or behaviour. This might
be linked to the fact that seeing pain in others activates the same representations and overlapping
brain structures as when one is directly exposed to physical or social pain (Eisenberg 2006; Singer
et al. 2004).
To sum up, the few experiments that have examined odours in multisensory processes in infancy and childhood clearly show that odours are readily integrated into multisensory percepts.
Chemosensory percepts appear to form unitary perceptual events with inputs from other sensory
modalities, regardless of whether these originate from external stimuli or from internal sensations
or emotional states. This perceptual unification is well evidenced in naturalistic social contexts
where, once ‘glued’ to a multisensory representation, odours by themselves provide efficient
cues to evoke in vivid multisensory detail the whole scene in which they were encoded. This view
of fusion of odours into unified multisensory percepts is also valid for food objects. Not only
are olfaction and taste engaged in synaesthetic processes, owing to their unavoidable contingency in eating (cf. Section 2.2), but they also become combined with the other sensory cues in foods.
52 THE ROLE OF OLFACTION IN HUMAN MULTISENSORY DEVELOPMENT

For example, this perceptual blending is so tight that, in an identification task in which flavour and colour were manipulated, the odour/taste attributes of a drink were not easily separated from its colour properties before nine years of age (Oram et al. 1995; see also Chapter 3 by Spence).
A further point emerging from the current research on children is that, when a learned contingency is established between an odour and an affective state, the odour can subsequently reactivate the corresponding mood and influence cognitive performance, positively or negatively.

2.6 Conclusions
In this review, we have attempted to understand how olfaction interacts with other sensory systems
in early development, and at the same time to survey findings that can shed light on how intersen-
sory interactions involving olfaction can contribute to perceptual development. It is now clear that
smell is far from dormant in the early orchestration of action in the multisensory environment of
young organisms. In newborns, olfactory stimuli can control arousal states and accordingly tutor
attention in the other sensory modalities. By these regulatory functions, olfaction contributes early
on in life to opening the brain to the multisensory stream of information.
Several experiments show that arbitrary odours are easily bound with arbitrary stimuli or
sensations in other sensory modalities, suggesting that similar encoding processes may operate
for multisensory interactions going on in natural and species-typical situations. Thus, for exam-
ple, a mother’s odour cues, in the same way as the sound of her voice (Sai et al. 2005), may pro-
mote attentiveness to the sensory information pertaining to her face and body, and hence may
facilitate identity learning. More generally, odours may operate as sensory tags for other stimulus
attributes of objects, persons or contexts, including the subject’s own internal states. This tagging
function of olfaction is supported by persistent memory processes (Engen 1991). These odour
tags can then operate in two inclusive domains of adaptive responsiveness. First, they can track continuity in the otherwise constantly changing multisensory environment, providing the young organism with partially continuous or overlapping sensory cues across different niches. Second, such
odour tags may contribute to the assimilation of perceptual discontinuities, in that their prior
association with familiar contexts confers on them a dose of reassuring or incentive value that
enhances the sustained intake of information in novel situations. These odour tags may derive
from (postnatal or prenatal) learning processes or from predisposed processes that apparently develop independently of experience. Accordingly, olfaction should be considered a key sense in
the multisensory organization of adaptive responses in early developmental transitions.
At all developmental stages considered above, odour stimuli were shown to become part of
interoceptive experience related to emotional challenge, physical pain, or malaise. Once paired,
often after a single contingency opportunity, these odour or flavour cues become predictive of
similar states. In other words, the chemosensory cues become linked with the cues related to
the bodily or mental state of the organism. This ubiquitous phenomenon provides a useful
paradigm for developmental investigations of intersensory processes involving olfaction.
It invites us to assess whether the separate presentation of the conditioned odour later has the
potency to evoke similar response patterns controlled by the autonomic nervous system and to
retrieve the associated group of non-olfactory reminiscences, as suggested by studies on the long-
term consequences of elating or traumatic life events (e.g. Hinton et al. 2004; Vermetten and
Bremner 2003). How far can the odour stimulus function as a metonym (Van Toller and Kendall-
Reed 1995) of the original object or context to which it has been paired? Specifically, how far
and under what conditions can the individual odour of another person give rise to expectations
about her or even be taken as the person herself? Along which multisensory developmental
pathways and time-courses will olfaction be integrated when it is compensatorily over-invested,
REFERENCES 53

as in blindness (e.g. Cuevas et al. 2009; Wakefield et al. 2004), or more or less disinvested,
as in situations where it is uncoupled from the multisensory context of parental nurturance
(e.g. in case of mother–infant separation) or of eating (e.g. in case of prolonged early tube-feeding
or enteral nutrition by gastrostomy, e.g. Harding et al. 2010)?
Finally, a wealth of studies has now established how olfaction functions, considered unimodally, in laboratory tasks as well as in the various real-life adaptive challenges that individuals have to face through development. Given the exponential increase in psychological and neurobiological
investigations that integrate sensory systems into unified perceptual processes and intercon-
nected brain structures (e.g. Calvert et al. 2004), the time is ripe to more systematically assess the
development of olfaction in the context of co-occurring inputs from the other senses. Olfactory perception will reveal its full developmental significance only when we consider the stimulations, feelings, and knowledge that are co-encoded via complementary sensory channels.

Acknowledgements
The authors thank Drs. Roger Lécuyer, André Holley, and Alix Seigneuric, and the editors of
the present book for their significant comments on a previous draft of the manuscript. We also
express our thanks to André Holley for the line drawing of Fig. 2.1. Finally, the authors are also
especially thankful to Giovanna Lux-Jesse for her insightful requests for clarification and her linguistic skills. During the writing of this chapter, the authors were funded by the Centre National de la
Recherche Scientifique (CNRS), Paris; the Université de Bourgogne, Dijon, and the Conseil
Régional de Bourgogne, Dijon.

References
Abate, P., Pepino, M.Y., Dominguez, H.D., Spear, N.E. and Molina, J.C. (2000). Fetal associative learning
mediated through maternal alcohol intoxication. Alcoholism: Clinical and Experimental Research, 24,
39–47.
Aglioti, S.M., and Pazzaglia, M. (2010). Sounds and scents in (social) action. Trends in Cognitive Science, 15,
47–55.
Alberts, J.R. (1981). Ontogeny of olfaction: reciprocal roles of sensation and behavior in the development
of perception. In Development of Perception: psychobiological perspectives, Vol. 1 (eds. R.N. Aslin, J.R.
Alberts, and M.R. Petersen), pp. 321–57. Academic Press, New York.
Alberts, J.R. (1987). Early learning and ontogenetic adaptation. In Perinatal development. A psychobiological
perspective (eds. N.A. Krasnegor, E.M. Blass, M.A. Hofer, and W.P. Smotherman), pp. 12–38.
Academic Press, Orlando, FL.
Anseloni, V.C., Ren, K., Dubner, R., and Ennis, M. (2005). A brainstem substrate for analgesia elicited by
intraoral sucrose. Neuroscience, 133, 231–43.
Arditi, H., Feldman, R., and Eidelman, A.I. (2006). Effects of human contact and vagal regulation on pain
reactivity and visual attention in newborns. Developmental Psychobiology, 48, 561–73.
Astic, L., and Saucier, D. (1982). Ontogenesis of the functional activity of rat olfactory bulb:
autoradiographic study with the 2-deoxyglucose method. Developmental Brain Research, 2, 1243–56.
Auvray, M., and Spence, C. (2008). The multisensory perception of flavor. Consciousness and Cognition, 17,
1016–31.
Badalian, S.S., Chao, C.R., Fox, H.E., and Timor-Tritsch, I.E. (1993). Fetal breathing-related nasal fluid flow
velocity in uncomplicated pregnancies. American Journal of Obstetrics and Gynecology, 169, 563–67.
Bahrick, L.E., and Lickliter, R. (2002). Intersensory redundancy guides early perceptual and cognitive
development. Advances in Child Development and Behavior, 30, 153–87.
Batsell, W.R., Brown, A.S., Ansfield, M.E., and Paschall, G.Y. (2002). ‘You will eat all of that!’: a
retrospective analysis of forced consumption episodes. Appetite, 38, 211–19.
Beauchamp, G.K., and Mennella, J.A. (1998). Sensory periods in the development of human flavor
perception and preferences. Annales Nestlé, pp. 19–31. Raven Press, New York.
Beauchamp, G.K., Cowart, B.J. and Moran, M. (1986). Developmental changes in salt acceptability in
human infants. Developmental Psychobiology, 19, 17–25.
Beidler, L.M. (1961). Taste receptor stimulation. Progress in Biophysics and Biophysical Chemistry, 12, 107–32.
Bende, M., and Nordin, S. (1997). Perceptual learning in olfaction: professional wine tasters versus
controls. Physiology and Behavior, 62, 1065–70.
Bensafi, M., Rouby, C., Farget, V., Bertrand, B., Vigouroux, M., and Holley, A. (2002). Autonomic nervous
system responses to odours: the role of pleasantness and arousal. Chemical Senses, 27, 703–9.
Bensafi, M., Brown, W.M., Khan, R., Levenson, B., and Sobel, N. (2004) Sniffing human sex-steroid derived
compounds modulates mood, memory and autonomic nervous system function in specific behavioral
contexts. Behavioural Brain Research, 152, 11–22.
Bensafi, M., Pouliot, S., and Sobel, N. (2005). Odorant-specific patterns of sniffing during imagery
distinguish ‘bad’ and ‘good’ olfactory imagers. Chemical Senses, 30, 521–29.
Bernstein, I.L. (1978). Learned taste aversions in children receiving chemotherapy. Science, 200, 1302–13.
Bernstein, I.L. (1991). Flavor aversion. In Smell and taste in health and disease (eds. T.V. Getchell, R.L.
Doty, L.M. Bartoshuk, and J.B. Snow), pp. 417–28. Raven Press, New York.
Birch, L.L., McPhee, L., Steinberg, L., and Sullivan, S. (1990). Conditioned flavor preferences in young
children. Physiology and Behavior, 47, 501–507.
Blass, E.M. (1990). Suckling: determinants, changes, mechanisms, and lasting impressions. Developmental
Psychology, 26, 520–33.
Boisson, C. (1997). La dénomination des odeurs: variations et régularités linguistiques [The naming of
odours: variation and linguistic regularities]. Intellectica, 24, 29–49.
Bolhuis, J.J. (1996). Development of perceptual mechanisms in birds: predispositions and imprinting. In
Neuroethological studies of cognitive and perceptual processes (eds. C.F. Moss and S.J. Shettleworth),
pp. 158–84. Westview Press, Boulder, CO.
Booth, D. (1990). Learned roles of taste in eating motivation. In Taste, experience, and feeding (eds. E.D.
Capaldi and T.L. Powley), pp. 179–94. American Psychological Association, Washington, DC.
Brake, S.C. (1981). Suckling infant rats learn a preference for a novel olfactory stimulus paired with milk
delivery. Science, 211, 506–508.
Bremner, A.J., Holmes, N.P., and Spence, C. (2008). Infants lost in (peripersonal) space? Trends in
Cognitive Sciences, 12, 298–305.
Brunjes, P.C., and Alberts, J.R. (1979). Olfactory stimulation induces filial preferences for huddling in rat
pups. Journal of Comparative Physiology and Psychology, 93, 548–55.
Butler, J. and Rovee-Collier, C. (1989). Contextual gating of memory retrieval. Developmental
Psychobiology, 22, 533–52.
Cain, W.S. (1977). Differential sensitivity for smell: ‘noise’ at the nose. Science, 195, 796–98.
Cain, W.S., and Johnson F., Jr. (1978). Lability of odor pleasantness: influence of mere exposure.
Perception, 7, 459–65.
Calvert, G., Spence, C., and Stein, B.E. (2004). The handbook of multisensory processes. MIT Press, Cambridge, MA.
Carmichael, S.T., Clugnet, M.C., and Price J.L. (1994) Central olfactory connections in the Macaque
monkey. Journal of Comparative Neurology, 346, 403–34.
Cernoch, J.M. and Porter, R.H. (1985). Recognition of maternal axillary odors by infants. Child
Development, 56, 1593–98.
Chu, S. (2008). Olfactory conditioning of positive performance in humans. Chemical Senses, 33, 65–71.
Chu, S., and Downes, J.J. (2000). Long live Proust: the odour-cued autobiographical memory bump.
Cognition, 75, 41–50.
Chu, S., and Downes, J.J. (2002). Proust nose best: odours are better cues of autobiographical memory.
Memory and Cognition, 30, 511–518.
Coureaud, G., Moncomble, A.S., Montigny, D., Dewas, M., Perrier, G., and Schaal, B. (2006). A pheromone
that rapidly promotes learning in the newborn. Current Biology, 16, 1956–61.
Craig, A.D. (2009). How do you feel—now? The anterior insula and human awareness. Nature Reviews
Neuroscience, 10, 59–70.
Cuevas, I., Plaza, P., Rombaux, P., De Volder, A.G., and Renier L. (2009). Odour discrimination and
identification are improved in early blindness. Neuropsychologia, 47, 3079–83.
Damasio, A.R. (1993). Descartes’ error: emotion, reason and the human brain. Putnam, New York.
de Wijk, R.A., and Cain, W.S. (1994). Odor identification by name and by edibility: life-span development
and safety. Human Factors, 36, 182–87.
Degel, J., and Köster, E.P. (1999). Odours: implicit memory and performance effects. Chemical Senses, 24,
317–32.
Delaunay-El Allam, M., Marlier, L., and Schaal, B. (2006). Learning at the breast: preference formation for
an artificial scent and its attraction against the odor of maternal milk. Infant Behavior and Development,
29, 308–21.
Delaunay-El Allam, M., Soussignan, R., Patris, B., and Schaal, B. (2010). Long lasting memory for an odor
acquired at the mother’s breast. Developmental Science, 13, 849–63.
Delplanque, S., Grandjean, D., Chrea, C., et al. (2008). Emotional processing of odors: evidence for a nonlinear relation between pleasantness and familiarity evaluations. Chemical Senses, 33, 469–79.
Doty, R.L. (2003). Handbook of olfaction and gustation, 2nd edn. Marcel Dekker, New York.
Doty, R.L., and Cometto-Muniz J.E. (2003). Trigeminal sensation. In Handbook of olfaction and gustation,
2nd Edn (ed. R.L. Doty), pp. 981–99. Marcel Dekker, New York.
Doucet, S., Soussignan, R., Sagot, P., and Schaal, B. (2007). The ‘smellscape’ of the human mother’s breast:
effects of odour masking and selective unmasking on neonatal arousal, oral and visual responses.
Developmental Psychobiology, 49, 129–38.
Doucet, S., Soussignan, R., Sagot, P., and Schaal, B. (2009). The secretion of areolar (Montgomery’s) glands from lactating women elicits selective, unconditional responses in neonates. PLoS One, 4(10), e7579. doi:10.1371/journal.pone.0007579.
Durand, K., Baudon, G., Freydefont, L., and Schaal, B. (2008). Odorisation of a novel object can
influence infant’s exploratory behavior in unexpected ways. Infant Behavior and Development, 31,
629–36.
Edmond, K.M., Kirkwood, B.R., Amenga-Etego, S., Owusu-Agyei, S., and Hurt, L.S. (2007a). Effect of early
feeding practices on infection-specific neonatal mortality. An investigation of the causal links with
observational data from rural Ghana. American Journal of Clinical Nutrition, 86, 1126–31.
Edmond, K.M., Zandoh, C., Quigley, M.A., Amenga-Etego, S., Owusu-Agyei, S., and Kirkwood, B.R.
(2007b). Delayed breastfeeding initiation increases risk of neonatal mortality. Pediatrics, 117, 380–86.
Eisenberg, N.I. (2006). Identifying the neural correlates underlying social pain: implications for
developmental processes. Human Development, 49, 273–93.
Engen, T. (1987). Remembering odors and their names. American Scientist, 75, 497–503.
Engen, T. (1991). Odor sensations and memory. Praeger Press, New York.
Engen, T., and Ross, B.M. (1973). Long-term memory odors with and without verbal descriptions. Journal
of Experimental Psychology, 100, 221–27.
Engen, T., Kuisma, J.E., and Eimas P.D. (1973). Short-term memory for odors. Journal of Experimental
Psychology, 99, 222–25.
Epple, G., and Herz, R. (1999). Ambient odors associated to failure influence cognitive performance in
children. Developmental Psychobiology, 35, 103–107.
Faas, A.E., Sponton, E.D., Moya, P.R., and Molina, J.C. (2000). Differential responsiveness to alcohol odor
in human neonates. Effects of maternal consumption during gestation. Alcohol, 22, 7–17.
Fabes, R.A., and Filsinger, E.E. (1986). Olfaction and young children’s preferences: a comparison of odor
and visual cues. Perception and Psychophysics, 40, 171–76.
Fagen, J., Prigot, J., Carroll, M., Pioli, L., Stein, A., and Franco, A. (1997). Auditory context and memory
retrieval in young infants. Child Development, 68, 1057–66.
Ferdenzi, C., Coureaud, G., Camos, V., and Schaal, B. (2008). Human awareness and uses of odor cues in
everyday life: results from a questionnaire study in children. International Journal of Behavioral
Development, 32, 417–26.
Ferdenzi, C., Schaal, B., and Roberts, C.S. (2010). Family scents: developmental changes in body odor
perception of kin? Journal of Chemical Ecology, 36, 847–54.
Fernandez, M., and Bahrick, L. (1994). Infants’ sensitivity to arbitrary object-odour pairings. Infant
Behavior and Development, 17, 471–74.
Forestell, C.A., and Mennella, J.A. (2005). Children’s hedonic judgment of cigarette smoke odor: effect of
parental smoking and maternal mood. Psychology of Addiction and Behavior, 19, 423–32.
Gallace, A., and Spence, C. (2010). The science of interpersonal touch: an overview. Neuroscience and
Biobehavioral Reviews, 34, 246–59.
Ganchrow, J.R., and Mennella, J.A. (2003). The ontogeny of human flavor perception. In Handbook of olfaction and gustation, 2nd edn (ed. R.L. Doty), pp. 823–46. Marcel Dekker, New York.
Garb, J.L., and Stunkard, A.J. (1974). Taste aversion in man. American Journal of Psychiatry, 131,
1204–1207.
Gibson, E.J. (1969). Principles of perceptual learning and development. Academic Press, New York.
Goldman, W.P., and Seamon, J.G. (1992). Very long-term memory for odors: retention of odor–name associations. American Journal of Psychology, 105, 549–63.
Gottfried, J.A. (2006). Smell: central nervous processing. Advances in Oto-Rhino-Laryngology, 63, 44–69.
Gottfried, J.A., and Dolan, R.J. (2003). The nose smells what the eye sees: crossmodal visual facilitation of human olfactory perception. Neuron, 39, 375–86.
Gottlieb, G. (1971). Ontogenesis of sensory function in birds and mammals. In The biopsychology of
development (eds. E. Tobach, L. Aronson, E. Shaw), pp. 67–128. Academic Press, New York.
Goubet, N., Rattaz, C., Pierrat, V., Bullinger, A., and Lequien, P. (2003). Olfactory experience mediates
response to pain in preterm newborns. Development Psychobiology, 42, 171–80.
Goubet, N., Strasbaugh, K., and Chesney, J. (2007). Familiarity breeds content? Soothing effect of a familiar
odor on full-term newborns. Journal of Developmental and Behavioral Pediatrics, 28, 189–94.
Haller R., Rummel, C., Henneberg, S., Pollmer, U., and Köster, E.P. (1999). The effect of early experience
with vanillin on food preference later in life. Chemical Senses, 24, 465–67.
Harding, C., Faiman, A., and Wright J. (2010). Evaluation of an intensive desensitisation, oral tolerance
therapy and hunger provocation program for children who have had prolonged periods of tube feeds.
International Journal of Evidence-Based Healthcare, 8, 268–76.
Harlow, H.F., and Harlow, M.K. (1965). The affectional systems. In Behavior of nonhuman primates:
Modern research trends, Vol. 2 (eds. A.M. Schrier, H.F. Harlow and F. Stollnitz), pp. 287–334. Academic
Press, New York.
Heilmann, S., and Hummel, T. (2004). A new method for comparing orthonasal and retronasal olfaction.
Behavioral Neuroscience, 118, 412–19.
Hepper, P.G. (1988a). Adaptive foetal learning: prenatal exposure to garlic affects postnatal preferences.
Animal Behaviour, 36, 935–36.
Hepper, P.G. (1988b). Fetal ‘soap’ addiction. The Lancet, 1(8598), 1347–48.
Hepper, P.G. (1991a). Transient hypoxic episodes: a mechanism to support associative fetal learning,
Animal Behavior, 41, 477–80.
Hepper, P.G. (1991b). An examination of fetal learning before and after birth. Irish Journal of Psychology,
12, 95–107.
Hepper, P.G. (1993). In utero release from a single transient hypoxic episode: a positive reinforcer?
Physiology and Behavior, 53, 309–11.
Hepper, P.G. (1995). Human fetal ‘olfactory’ learning. International Journal of Prenatal and Perinatal
Psychology and Medicine, 7, 147–51.
Hermans, D., and Baeyens, F. (2002). Acquisition and activation of odor hedonics in everyday situations:
conditioning and priming studies. In Olfaction, taste and cognition (eds. C. Rouby, B. Schaal, D. Dubois, R. Gervais, and A. Holley), pp. 119–39. Cambridge University Press, New York.
Herz, R.S. (2004). A naturalistic analysis of autobiographical memories triggered by olfactory, visual and
auditory stimuli. Chemical Senses, 29, 217–24.
Herz, R.S., and Engen, T. (1990). Odor memory: review and analysis. Psychonomic Bulletin and Review, 3,
300–313.
Hinton, M.D., Pich, M.S., Dara Chean, B.A., et al. (2004). Olfactory-triggered panic attacks among
Cambodian refugees attending a psychiatric clinic. General Hospital Psychiatry, 26, 390–97.
Hogan, J.A. (1998). Cause and function in the development of behavior systems. In Handbook of behavioral
neurobiology, Developmental psychobiology and behavioral ecology, Vol. 9 (ed. E.M. Blass), pp. 63–106.
Plenum Press, New York.
Holland, R.W., Hendriks, M., and Aarts, H.A.G. (2005). Smells like clean spirit: nonconscious effects of
scent on cognition and behavior. Psychological Science, 16, 689–93.
Hvastja, L., and Zanuttini, L. (1989). Odour memory and odour hedonics in children. Perception, 18, 391–96.
Johnson, B.A., and Leon, M. (2002). Modular representations of odorants in the glomerular layer of the rat
olfactory bulb and the effects of stimulus concentration. Journal of Comparative Neurology, 422, 496–509.
Johnston, C.C., Filion, F., Snider, L., et al. (2002). Routine sucrose analgesia during the first week of life in
neonates younger than 31 weeks’ postconceptional age. Pediatrics, 110, 523–28.
Johnston, C.C., Filion, F., and Nuyt, A.M. (2007). Recorded maternal voice for preterm neonates
undergoing heel lance. Advances in Neonatal Care, 7, 258–66.
Johnson, S.L., McPhee, L., and Birch, L.L. (1991). Conditioned preferences: young children prefer flavors
associated with high dietary fat. Physiology and Behavior, 50, 1245–51.
Kaplan, J.N., and Russell, M. (1973). Olfactory recognition in the infant squirrel monkey. Developmental
Psychobiology, 7, 15–19.
Kaplan, J.N., Cubicciotti, D.D., and Redican, W.K. (1977). Olfactory and visual differentiation of
synthetically scented surrogates by infant squirrel monkeys. Developmental Psychobiology, 12, 1–10.
Kirk-Smith, M.D., Van Toller, C., and Dodd, G.H. (1983). Unconscious odour conditioning in human
subjects. Biological Psychology, 17, 221–31.
Kobal, G., Hummel, T., and Van Toller, S. (1992). Differences in human chemosensory evoked-potentials
to olfactory and somatosensory chemical stimuli presented to the left and right nostrils. Chemical
Senses, 17, 233–44.
Kobayakawa, K., Kobayakawa, R., Matsumoto, H., et al. (2007). Innate versus learned odor processing in
the mouse olfactory bulb. Nature, 450, 503–510.
Korner, A.F., and Thoman, E.B. (1970). Visual alertness in neonates as evoked by maternal care. Journal of
Experimental Child Psychology, 10, 67–78.
Köster, E.P., Degel, J. and Piper, D. (2002). Proactive and retroactive interference in implicit odor memory.
Chemical Senses, 27, 191–207.
Kroner, T. (1882). Über die Sinnesempfindungen des Neugeborenen [On the sensations of the newborn].
Bresslauer Ärztliche Zeitschrift, 4, 37–58.
Lawless, H., and Engen, T. (1977). Associations to odors: interference, mnemonics, and verbal labeling.
Journal of Experimental Psychology, 3, 52–59.
Lecanuet, J.P., and Schaal, B. (1996). Fetal sensory competences. European Journal of Obstetrics,
Gynaecology, and Reproductive Biology, 68, 1–23.
Leon, M., Coopersmith, R., Lee, S., Sullivan, R.M., Wilson, D.M., and Woo, C.C. (1987). Neural and
behavioral plasticity induced by early olfactory learning. In Perinatal development, A psychobiological
perspective (eds. N.A. Krasnegor, E.M. Blass, M.A. Hofer, and W.P. Smotherman), pp. 145–67.
Academic Press, Orlando, FL.
Levenson, R.W., Ekman, P., and Friesen, W.V. (1990). Voluntary facial action generates emotion-specific
effects on autonomic nervous system function in humans. Behavioral Neuroscience, 117, 1125–37.
Lewkowicz, D.J. (1988). Sensory dominance in infants: 2. Ten-month-old infants’ response to auditory-
visual compounds. Developmental Psychology, 24, 172–82.
Lewkowicz, D.J. (1991). Development of intersensory functions in human infancy: auditory/visual
interactions. In Newborn Attention: Biological Constraints and the Influence of Experience (eds.
M.J. Weiss and P.R. Zelazo), pp. 308–38. Ablex Publishing Corp, Norwood, NJ.
Lewkowicz, D.J., and Lickliter, R. (1994). The development of intersensory perception: comparative
perspectives. Lawrence Erlbaum Associates, Hillsdale, NJ.
Lewkowicz, D.J., and Turkewitz, G. (1980). Cross-modal equivalence in early infancy: auditory-visual
intensity matching. Developmental Psychology, 16, 597–607.
Li, W., Luxenberg, E., Parrish, T., and Gottfried, J.A. (2006). Learning to smell the roses: experience-
dependent neural plasticity in human piriform and orbitofrontal cortices. Neuron, 52, 1097–1108.
Li, W., Moallem, I., Paller, K.A., and Gottfried, J.A. (2007). Subliminal smells can guide social preferences.
Psychological Science, 18, 1044–49.
Li, W., Howard, J.D., Parrish, T., and Gottfried, J.A. (2008). Aversive learning enhances perceptual and
cortical discrimination of indiscriminable odor cues. Science, 319, 1842–45.
Logue, A.W., Ophir, I., and Strauss, K.E. (1981). The acquisition of taste aversions in humans. Behaviour Research and Therapy, 19, 319–33.
Lundstrom, J.N., and Olsson, M.J. (2005). Subthreshold amounts of social odorant affect mood, but not
behavior, in heterosexual women when tested by a male, but not a female, experimenter. Biological
Psychology, 70, 197–204.
Macfarlane, A.J. (1975). Olfaction in the development of social preferences in the human neonate. Ciba
Foundation Symposia, 33, 103–117.
Mallet, P., and Schaal, B. (1998). Rating and recognition of peers' personal odours in nine-year-old children: an exploratory study. Journal of General Psychology, 125, 47–64.
Maone, T.R., Mattes, R.D., Bernbaum, J.C., and Beauchamp, G.K. (1990). A new method for delivering a
taste without fluids to preterm and term infants. Developmental Psychobiology, 23, 179–81.
McBurney, D.H., Shoup, M.L., and Streeter, S.A. (2006). Olfactory comfort: smelling a partner's clothing during periods of separation. Journal of Applied Social Psychology, 36, 2325–35.
Mennella, J.A., and Beauchamp, G.K. (1998). Infants’ exploration of scented toys: effects of prior
experiences. Chemical Senses, 23, 11–17.
Mennella, J.A., and Forestell, C.A. (2008). Children’s hedonic responses to the odors of alcoholic beverages:
a window to emotions. Alcohol, 42, 249–60.
Mennella, J.A., and Garcia, P.L. (2000). Children's hedonic response to the smell of alcohol: effects of parental drinking habits. Alcoholism: Clinical and Experimental Research, 24, 1167–71.
Mennella, J.A., Johnson, A., and Beauchamp, G.K. (1995). Garlic ingestion by pregnant women alters the odor of amniotic fluid. Chemical Senses, 20, 207–209.
Mennella, J.A., Jagnow, C.P., and Beauchamp, G.K. (2001). Pre- and post-natal flavor learning by human
infants. Pediatrics, 107, 1–6.
Michael, G.A., Jacquot, L., and Brand, G. (2003). Ambient odors modulate visual attentional capture.
Neuroscience Letters, 352, 221–25.
Möller, P., Wulff, C., and Köster, E.P. (2004). Do age differences in odour memory depend on differences
in verbal memory? Neuroreport, 15, 915–917.
Mouélé, M. (1997). L'apprentissage des odeurs chez les Waanzi: notes de recherche [Learning odours in the Waanzi: research notes]. In L'odorat chez l'enfant: perspectives croisées [Smell in children: crossed perspectives] (ed. B. Schaal), pp. 209–22. Presses Universitaires de France, Paris.
Murphy, C., and Cain, W.S. (1980). Lability of odor pleasantness: influence vs interaction. Physiology and
Behavior, 24, 601–605.
Olsson, S.B., Barnard, J., and Turri, L. (2006). Olfaction and identification of unrelated individuals. Journal
of Chemical Ecology, 32, 1635–45.
REFERENCES 59

Oram, N., Laing, D.G., Hutchinson, I., et al. (1995). The influence of flavor and color on drink
identification by children and adults. Developmental Psychobiology, 28, 239–46.
Patrick, J., Campbell, K., Carmichael, L., Natale, R., and Richardson, B. (1980). Patterns of human fetal
breathing activity during the last 10 weeks of pregnancy. Obstetrics and Gynecology, 56, 24–28.
Patrick, J., Campbell, K., Carmichael, L., Natale, R., and Richardson, B. (1982). Patterns of gross fetal body
movements over 24-hour observation intervals during the last 10 weeks of pregnancy. American
Journal of Obstetrics and Gynecology, 142, 363–71.
Pedersen, P.A., and Blass, E.M. (1982). Prenatal and postnatal determinants of the 1st suckling episode in
albino rats. Developmental Psychobiology, 15, 349–55.
Pihet, S., Schaal, B., Bullinger, A., and Mellier, D. (1996). An investigation of olfactory responsiveness in
premature newborns. Infant Behavior and Development, ICIS Issue: 676.
Pihet, S., Mellier, D., Bullinger, A., and Schaal, B. (1997). Réponses comportementales aux odeurs chez le
nouveau-né prématuré: étude préliminaire (Behavioural responses to odours in preterm infants : a
preliminary study). In L’odorat chez l’enfant: perspectives croisées (ed. B. Schaal), pp. 33–46. Presses
Universitaires de France, Collection Enfance, Paris.
Pirogovsky, E., Murphy, C., and Gilbert, P.E. (2009). Developmental differences in memory for cross-modal associations. Developmental Science, 12, 1054–59.
Pomonis, J.D., Jewett, D.C., Kotz, J.E., et al. (2000). Sucrose consumption increases naloxone-induced
c-Fos immunoreactivity in limbic system. American Journal of Physiology, 364, R712–R719.
Poncelet, J., Rinck, F., Bourgeat, F., et al. (2010). The effect of early experience on odor perception in
humans: psychological and physiological correlates. Behavioural Brain Research, 208, 458–65.
Prechtl, H.F.R. (1958). The directed head turning response and allied movements of the human baby.
Behaviour, 13, 212–42.
Preyer, W. (1885). Die Seele des Kindes. [The soul of the child] (French translation). Editions Alcan, Paris.
Price, J.L. (1990). Olfactory system. In The Human Nervous System (ed. G. Paxinos), pp. 979–1001. Academic Press, San Diego.
Rabin, M.D. (1988). Experience facilitates olfactory quality discrimination. Perception and Psychophysics,
44, 532–40.
Rattaz, C., Goubet, N., and Bullinger, A. (2005). The calming effect of a familiar odor on full-term
newborns. Journal of Developmental and Behavioral Pediatrics, 26, 86–92.
Raybould, H.E. (1998). Does your gut taste? Sensory transduction in the gastrointestinal tract. News in Physiological Sciences, 13, 275–80.
Reardon, P., and Bushnell, E.W. (1988). Infants’ sensitivity to arbitrary pairings of color and taste. Infant
Behavior and Development, 11, 245–50.
Redican, W.K., and Kaplan, J.N. (1978). Effects of synthetic odors on filial attachment in infant squirrel
monkeys. Physiology and Behavior, 20, 79–85.
Rieser, J., Yonas, A., and Wilkner, K. (1976). Radial localization of odors by human newborns. Child
Development, 47, 856–59.
Robinson, S.R., and Méndez-Gallardo, V. (2010). Amniotic fluid as an extended milieu intérieur. In Handbook of developmental science, behavior, and genetics (eds. K.E. Hood, C.T. Halpern, G. Greenberg, and R.M. Lerner), pp. 234–84. Blackwell Publishing Ltd, Oxford, UK.
Rolls, E.T. (2005). Taste, olfactory, and food texture processing in the brain, and the control of food intake.
Physiology and Behavior, 85, 45–56.
Ronca, A.E., Abel, R.A., and Alberts, J.R. (1996). Perinatal stimulation and adaptation of the neonate. Acta
Paediatrica, Suppl. 416, 8–15.
Rosenblatt, J.S. (1983). Olfaction mediates developmental transitions in the altricial newborn of selected species of mammals. Developmental Psychobiology, 16, 347–75.
Rosenstein, D., and Oster, H. (1990). Differential facial responses to four basic tastes in newborns. Child
Development, 59, 1555–68.
Rovee-Collier, C., and Cuevas, K. (2009). Multiple memory systems are unnecessary to account for infant
memory development: an ecological model. Developmental Psychology, 45, 160–74.
Rozin, P. (1982). ‘Taste-smell confusions’ and the duality of the olfactory sense. Perception and
Psychophysics, 31, 397–401.
Rubin, G.B., Fagen, J.W., and Carroll, M.H. (1998). Olfactory context and memory retrieval in 3-month-
old infants. Infant Behavior and Development, 21, 641–58.
Rudy, J.W., and Cheatle, M.D. (1977). Odor-aversion learning in neonatal rats. Science, 198, 845–46.
Ruff, H.A., Saltarelli, L.M., Capozzoli, M., and Dubiner, K. (1992). The differentiation of activity in infants’
exploration of objects. Developmental Psychology, 28, 851–61.
Sai, F.Z. (2005). The role of the mother's voice in developing mother's face preference: evidence for intermodal perception at birth. Infant and Child Development, 14, 29–50.
Schaal, B. (1988). Olfaction in infants and children: developmental and functional perspectives. Chemical
Senses, 13, 145–90.
Schaal, B. (2005). From amnion to colostrum to milk: odor bridging in early developmental transitions. In
Prenatal development of postnatal functions (eds. B. Hopkins and S.P. Johnson), pp. 51–102. Praeger,
London.
Schaal, B. (2006). The development of flavor perception from infancy to adulthood. In Flavour in food
(ed. A. Voilley), pp. 401–36. Woodhead Publishing, Cambridge.
Schaal, B. (2012). Emerging chemosensory preferences: another playground for the innate–acquired dichotomy in human cognition. In Olfactory cognition (eds. G.H. Zucco, R. Herz, and B. Schaal), pp. 237–68. John Benjamins Publishing, Amsterdam.
Schaal, B., and Marlier, L. (1998). Maternal and paternal perception of individual odor signatures in
human amniotic fluid—potential role in early bonding? Biology of the Neonate, 74, 274–80.
Schaal, B., and Orgeur, P. (1992). Olfaction in utero: can the rodent model be generalized? Quarterly Journal of Experimental Psychology B: Comparative and Physiological Psychology, 44B, 245–78.
Schaal, B., Montagner, H., Hertling, E., Bolzoni, D., Moyse, R., and Quichon, R. (1980). Olfactory
stimulations in mother–infant relations. Reproduction, Nutrition, and Development, 20, 843–58.
Schaal, B., Orgeur, P., and Arnould, C. (1995a). Olfactory preferences in newborn lambs: possible influence
of prenatal experience. Behaviour, 132, 351–65.
Schaal, B., Orgeur, P., and Rognon, C. (1995b). Odor sensing in the human fetus: anatomical, functional and chemo-ecological bases. In Fetal development: A psychobiological perspective (eds. J.P. Lecanuet, W.P. Fifer, N.A. Krasnegor, and W.P. Smotherman), pp. 205–37. Lawrence Erlbaum Associates, Hillsdale, NJ.
Schaal, B., Marlier, L., and Soussignan, R. (1995c). Neonatal responsiveness to the odour of amniotic fluid.
Biology of the Neonate, 67, 397–406.
Schaal, B., Marlier, L., and Soussignan, R. (1998). Olfactory function in the human fetus: evidence from
selective neonatal responsiveness to the odor of amniotic fluid. Behavioral Neuroscience, 112, 1438–49.
Schaal, B., Marlier, L., and Soussignan, R. (2000). Human foetuses learn odours from their pregnant
mother’s diet. Chemical Senses, 25, 729–37.
Schaal, B., Coureaud, G., Langlois, D., Giniès, C., Sémon, E., and Perrier, G. (2003). Chemical and
behavioural characterization of the mammary pheromone of the rabbit. Nature, 424, 68–72.
Schaal, B., Hummel, T., and Soussignan, R. (2004). Olfaction in the fetal and premature infant: functional status and clinical implications. Clinics in Perinatology, 31, 261–85.
Schiffman, S.S. (1974). Physicochemical correlates of olfactory quality. Science, 185, 112–17.
Schleidt, M., and Genzel, C. (1990). The significance of mother’s perfume for infants in the first weeks of
their life. Ethology and Sociobiology, 11, 145–54.
Schmidt, H. (1990). Adult-like hedonic responses to odors in 9-month-old infants. Chemical Senses, 15, 634.
Schmidt, H., and Beauchamp, G.K. (1989). Sex differences in responsiveness to odors in 9-month-old
infants. Chemical Senses, 14, 744.
Schneirla, T.C. (1965). Aspects of stimulation and organization in approach/withdrawal processes underlying vertebrate behavioural development. In Advances in the Study of Behaviour, Vol. 1 (eds. D.S. Lehrman, R.A. Hinde, and E. Shaw), pp. 1–74. Academic Press, New York.
Schroers, M., Prigot, J., and Fagen, J. (2007). The effect of a salient odor context on memory retrieval in
young infants. Infant Behavior and Development, 30, 685–89.
Sclafani, A. (2007). Sweet taste signaling in the gut. Proceedings of the National Academy of Sciences of the
USA, 104, 14887–88.
Seigneuric, A., Durand, K., Jiang, T., Baudouin, J.Y., and Schaal, B. (2011). The nose tells it to the eyes:
crossmodal associations between olfaction and vision. Perception, 39, 1541–1554.
Shoup, M.L., Streeter, S.A., and McBurney, D.H. (2008). Olfactory comfort and attachment within
relationships. Journal of Applied Social Psychology, 38, 2954–63.
Simon, S.A., de Araujo, I.E., Gutierrez, R., and Nicolelis, A.L. (2006). The neural mechanisms of gustation:
a distributed processing code. Nature Reviews Neuroscience, 7, 890–901.
Singer, T., Seymour, B., O’Doherty, J., Kaube, H., Dolan, R.J., and Frith, C.D. (2004). Empathy for pain
involves the affective but not sensory components of pain. Science, 303, 1157–62.
Smotherman, W.P. (1982). Odor aversion learning by the rat fetus. Physiology and Behavior, 29, 769–71.
Smotherman, W.P., and Robinson, S.R. (1987). Psychobiology of fetal experience in the rat. In Perinatal development: A psychobiological perspective (eds. N.A. Krasnegor, E.M. Blass, M.A. Hofer, and W.P. Smotherman), pp. 39–60. Academic Press, Orlando, FL.
Smotherman, W.P., and Robinson, S.R. (1988). The uterus as environment: the ecology of fetal experience.
In Handbook of behavioral neurobiology: Developmental psychobiology and behavioral ecology, Vol. 9
(ed. E.M. Blass), pp. 149–96. Plenum Press, New York.
Smotherman, W.P., and Robinson, S.R. (1995). Tracing developmental trajectories into the prenatal period. In Fetal development: a psychobiological perspective (eds. J.P. Lecanuet, W.P. Fifer, N.A. Krasnegor and W.P. Smotherman), pp. 15–32. Lawrence Erlbaum Associates, Hillsdale, NJ.
Soussignan, R., Schaal, B., Marlier, L., and Jiang, T. (1997). Facial and autonomic responses to biological
and artificial olfactory stimuli in human neonates: re-examining early hedonic discrimination of odors.
Physiology and Behavior, 62, 745–58.
Spear, N.E., and Molina, J.C. (1987). The role of sensory modality in the ontogeny of stimulus selection.
In Perinatal development: a psychobiological perspective (eds. N. Krasnegor, E.M. Blass, M.A. Hofer and
W.P. Smotherman), pp. 83–110. Academic Press, Orlando, FL.
Spector, F., and Maurer, D. (2009). Synesthesia: a new approach to understanding the development of
perception. Developmental Psychology, 45, 175–89.
Steiner, J. (1979). Human facial expressions in response to taste and smell stimulations. In Advances in child development and behavior, Vol. 13 (eds. L.P. Lipsitt and H.W. Reese), pp. 257–95. Academic Press, New York.
Steiner, J.E., Glaser, D., Hawilo, M.E., and Berridge, K. (2001). Comparative evidence of hedonic impact:
affective reactions to taste by human infants and other primates. Neuroscience and Biobehavioral
Reviews, 25, 53–74.
Stevenson, R.J., and Boakes, R.A. (2004). Sweet and sour smells: learned synaesthesia between the senses of taste and smell. In The handbook of multisensory processes (eds. G. Calvert, C. Spence, and B.E. Stein), pp. 69–83. MIT Press, Cambridge, MA.
Stevenson, R. J., and Tomiczek, C. (2007). Olfactory-induced synesthesias: a review and model.
Psychological Bulletin, 133, 294–309.
Stickrod, G., Kimble, D.P., and Smotherman, W.P. (1982). In utero taste odor aversion conditioning of the
rat. Physiology and Behavior, 28, 5–7.
Stirnimann, F. (1936). Versuche über Geschmack und Geruch am ersten Lebenstag [Experiments on taste and smell on the first day of life]. Jahrbuch der Kinderheilkunde, 146, 211–27.
Stockhorst, U., Gritzmann, E., Klopp, K., et al. (1999). Classical conditioning of insulin effects in healthy
humans. Psychosomatic Medicine, 61, 424–35.
Sullivan, R.M., and Toubas, P. (1998). Clinical usefulness of maternal odor in newborns: soothing and
feeding preparatory responses. Biology of the Neonate, 74, 402–408.
Sullivan, R.M., Taborsky, S.B., Mendoza, R., et al. (1991). Olfactory classical conditioning in neonates. Pediatrics, 87, 511–517.
Tatzer, E., Schubert, M.T., Timischl, W., and Simbruner, G. (1985). Discrimination of taste and preference
for sweet in premature babies. Early Human Development, 12, 23–30.
Teerling, A., Köster, E.P., and van Nispen, V. (1994). Early childhood experiences and future preferences.
11th Congress of the European Chemoreception Research Organization. Blois, France.
Tees, R.C. (1994). Early stimulation history, the cortex, and intersensory functioning in infrahumans: space
and time. In The development of intersensory perception: comparative perspectives (eds. D.J. Lewkowicz
and R. Lickliter), pp. 107–31. Lawrence Erlbaum Associates, Hillsdale, NJ.
Turkewitz, G. (1979). The study of infancy. Canadian Journal of Psychology, 33, 408–412.
Turkewitz, G.A. (1994). Sources of order for intersensory functioning. In The development of intersensory
perception: comparative perspectives (eds. D.J. Lewkowicz and R. Lickliter), pp. 3–17. Lawrence Erlbaum
Associates, Hillsdale, NJ.
Turkewitz, G., and Kenny, P.A. (1985). The role of developmental limitations of sensory input on sensory/
perceptual organization. Journal of Developmental and Behavioral Pediatrics, 15, 357–68.
Turkewitz, G.A., and Mellon, R.C. (1989). Dynamic organization of intersensory function. Canadian
Journal of Psychology, 43, 286–301.
van Toller, S., and Kendal-Reed, M. (1995). A possible protocognitive role for odour in human infant
development. Brain and Cognition, 29, 275–93.
Varendi, H., and Porter, R.H. (2001). Breast odour as the only maternal stimulus elicits crawling towards
the odour source. Acta Paediatrica, 90, 372–75.
Varendi, H., Porter, R.H., and Winberg, J. (2002). The effect of labor on olfactory exposure learning within
the first postnatal hour. Behavioral Neuroscience, 116, 206–211.
Verhagen, J.V., and Engelen, L. (2006). The neurocognitive bases of human multimodal food perception:
sensory integration. Neuroscience and Biobehavioral Reviews, 30, 613–50.
Vermetten, E., and Bremner, J.D. (2003). Olfaction as a traumatic reminder in posttraumatic stress
disorders: case report and review. Journal of Clinical Psychiatry, 64, 202–207.
Wakefield, C.E., Homewood, J., and Taylor, A.J. (2004). Cognitive compensations for blindness in children: an investigation using odour naming. Perception, 33, 429–42.
Walker-Andrews, A.S. (1994). Taxonomy for intermodal relations. In The development of intersensory perception: comparative perspectives (eds. D.J. Lewkowicz and R. Lickliter), pp. 39–56. Lawrence Erlbaum Associates, Hillsdale, NJ.
Wang, H., Wysocki, C.J., and Gold, G. (1993). Induction of olfactory receptor sensitivity in mice. Science,
260, 998–1000.
Weisfeld, G.E., Czilli, T., Phillips, K.A., Gall, J.A., and Lichtman, C.M. (2003). Possible olfaction-based mechanisms in human kin recognition and inbreeding avoidance. Journal of Experimental Child Psychology, 85, 279–95.
Willander, J., and Larsson, M. (2007). Olfaction and emotion: the case of autobiographical memory.
Memory and Cognition, 35, 1659–63.
Wyart, C., Webster, W.W., Chen, J.H., et al. (2007). Smelling a single component of male sweat alters levels
of cortisol in women. Journal of Neuroscience, 27, 1261–65.
Yeomans, M.R. (2006). The role of learning in development of food preferences. In The psychology of food choice (eds. R. Shepherd and M. Raats), pp. 93–112. CABI, Wallingford, UK.
Yeshurun, Y., and Sobel, N. (2010). An odor is not worth a thousand words: from multidimensional odors
to unidimensional odor objects. Annual Review of Psychology, 61, 219–41.
Zucco, G. M., Paolini, M., and Schaal, B. (2009). Unconscious odour conditioning 25 years later: Revisiting
and extending ‘Kirk-Smith, Van Toller and Dodd’. Learning and Motivation, 40, 364–75.
Chapter 3

The development and decline of multisensory flavour perception

Assessing the role of visual (colour) cues on the perception of taste and flavour

Charles Spence

3.1 Introduction
Flavour perception is one of the most multisensory of our everyday experiences (Spence 2010b,
2012; Stillman 2002), involving as it does not only the taste and smell of a food or drink item, but
also its texture, the sound it makes, and even what it looks like (though see below).1 The pain
associated with eating certain foods (such as, for example, chilli) also contributes to the pleasure
of many foods. Usually, all of these unisensory cues are seamlessly integrated into our perception
of a particular flavour (located subjectively) in the mouth (Spence 2012; Stevenson 2009). Flavour
perception is, however, also one of the least well understood of our multisensory experiences.
This is especially true from a developmental perspective, where the majority of textbooks and
review papers tend not even to discuss the development of flavour perception (e.g. see Lewkowicz
and Lickliter 1994; Pick and Pick 1970). While the evidence is currently fairly sparse, and in many
cases inconsistent, I would argue that research on the topic of multisensory flavour perception is
nevertheless very important—both in terms of understanding why it is that young children do
not like certain foods (such as vegetables) and what can be done to improve the quality of the food
eaten by those at the other end of the age spectrum who may be suffering from a loss of olfactory
and, to a lesser extent, gustatory sensitivity (e.g. Schiffman 1997). Furthermore, the growing
global obesity epidemic has led to a recent increase in interest in multisensory flavour perception
(see Mennella and Beauchamp 2010, for a review).
Flavour perception is a difficult area to study, in part because researchers cannot agree on
a definition (see Auvray and Spence 2008; Spence et al. 2010; Stevenson and Tomiczek 2007).
Part of the problem here is that there is a great deal of uncertainty over whether or not flavour
should be conceptualized as a separate sensory modality (e.g. McBurney 1986; Stevenson 2009).
That said, the last few years have seen a growing number of cognitive neuroscientists successfully

1 Although the words ‘taste’ and ‘flavour’ are used interchangeably in everyday English, food scientists
typically give each term a very specific, and distinct, meaning. In particular, the word ‘taste’ is used to
describe only those sensations primarily associated with the stimulation of the taste-buds, namely sweet-
ness, sourness, bitterness, saltiness, and the savoury taste of umami. By contrast, the word ‘flavour’ is used
to refer to the experiences resulting from the simultaneous stimulation of the taste buds and the olfactory
receptors in the nasal epithelium. In order to avoid any confusion, this is also how the two terms will be
used in the present article.
applying their understanding of the mechanisms underlying multisensory integration borrowed
from investigations of audiovisual or visuotactile integration to the study of flavour perception
(e.g. Auvray and Spence 2008; Spence 2010b; Verhagen and Engelen 2006). Given that we now
understand more about the multisensory perception of flavour in adults, I would argue that we
are in a better position than ever before to examine how the senses converge to influence flavour
perception developmentally. Below, I review what is currently known about the development and
decline of multisensory flavour perception across the human lifespan (the focus will primarily be
on the role that visual cues play in modulating taste and flavour perception). One caveat at the
outset is that there is not as yet a great deal of evidence relevant to this question, at least not when
compared to other areas of developmental perception (see, for example, the other chapters in this
volume). What is more, many of the studies that have been published to date have generated
results that are either seemingly mutually inconsistent or else have provided only relatively weak
empirical evidence for the claims being made by the authors.

3.2 Which senses contribute to flavour perception?


Our enjoyment of food and drink comes not only from the unified oral sensation of taste and
smell (both orthonasal and retronasal),2 but also from the sound it makes, not to mention what
it looks like. The oral-somatosensory qualities of foods are also very important: texture, tempera-
ture, and even pain, as in the case of eating chilli peppers (see Green 2002), all contribute to the
overall multisensory flavour experience (or gestalt; Spence 2010b; Verhagen and Engelen 2006).
A number of reviews of multisensory flavour perception in adults have been published over
the last few years (e.g. see Spence 2012; Stevenson 2009; Verhagen and Engelen 2006, for some
representative examples). Therefore, given the developmental theme of this volume, I will not
discuss the adult data in any great detail here. I do, however, want to highlight the important
distinction between taste and flavour: the ‘basic’ tastes, which can be detected by receptors on the
human tongue (intriguingly, there also appear to be gustatory receptors in the gastrointestinal
tract; see Egan and Margolskee 2008), consist of sweet, sour, bitter, salty, umami, and metallic
(see Erickson 2008; Spence 2010b). By contrast, flavour perception involves the stimulation of
retronasal olfaction, gustation (i.e. taste), and on occasion oral irritation (transduced by the
trigeminal nerve). It is the combination of odours and tastes that gives rise to the perception of
fruit flavours, meaty flavours, etc. (see Spence et al. 2010). To put the relative contribution of
these two senses into some perspective, it is frequently stated that as much as 80% of our percep-
tion of flavour comes from the information provided by the nose (rather than from the tongue;
e.g. see Martin 2004; Murphy et al. 1977; although note that it is unclear whether this figure
should be taken to refer to the perception of intensity or to the identification of flavour).
According to the International Organization for Standardization (ISO 5492, 1992), flavour is a ‘complex
combination of the olfactory, gustatory and trigeminal sensations perceived during tasting.
The flavour may be influenced by tactile, thermal, painful and/or kinaesthetic effects.’ (see
Delwiche 2004, p. 137). Visual and auditory cues may modify a food’s flavour, but according to

2 Researchers now believe that there are two relatively distinct olfactory sensory systems (see Chapter 2 by
Schaal and Durand). One system (which is older in phylogenetic terms), associated with the inhalation of
external odours, is known as orthonasal olfaction. The other (newer system involving the posterior nares) is
associated with the detection of the olfactory stimuli emanating from the food we eat, as odours are peri-
odically forced out of the nasal cavity when we chew or swallow food, and is known as retronasal olfaction.
It is an interesting, although as yet unanswered, question as to whether orthonasal and retronasal olfaction,
in addition to their different phylogenetic origins, also have different developmental trajectories.
the ISO definition at least, they are not intrinsic to it. However, many other researchers disagree
with what they see as an overly restrictive definition and have argued that all five of the major
senses can and do contribute to the multisensory perception of flavour (e.g. see Auvray and
Spence 2008; Stevenson 2009). To make matters all the more complicated, visual cues, such as a
food’s colour, may modify the perception of a food’s flavour by influencing the gustatory qualities
of the food, by influencing the olfactory attributes of the food (as perceived orthonasally and/or
retronasally; Koza et al. 2005), by influencing the oral-somatosensory qualities of the food,
and/or by influencing the overall multisensory flavour percept (or gestalt; see Fig. 3.1). As yet, it
is not altogether clear at which stage(s) vision interacts with the other senses. Furthermore, vari-
ous top-down factors play a profoundly important role in modulating our responses to foods as
well (see Rozin and Fallon 1987; Yeomans et al. 2008).
Here, I would like to argue, as I have done elsewhere (see Spence et al. 2010), that the ISO
definition is overly restrictive, and that audition should be included in the definition of flavour,
whereas (contrary to the claims of a number of contemporary neuroscientists: Stevenson 2009;
Verhagen and Engelen 2006; see also Auvray and Spence 2008) vision should not, and hence its
influence should be considered as crossmodal. (By crossmodal, I mean that one sense influences
another without the two sensory inputs necessarily being integrated into a unified perceptual
whole, or gestalt.) The reason, I would like to argue, why audition should be included is that both
spatial and temporal coincidence play a critical role in terms of what we hear, influencing our

[Fig. 3.1 diagram: visual cues influence Olfaction, Gustation, and Oral-somatosensation, which combine into the Flavour percept]
Fig. 3.1 This figure highlights the multiple ways in which visual cues might influence flavour
perception. Visual cues (such as the colour of a beverage) may exert a crossmodal influence on
olfaction, gustation, and/or on oral-somatosensation. Such crossmodal effects, should they exist,
might then have a carry-over effect on the experienced multisensory flavour percept once the various
unisensory cues have been integrated. Alternatively, however, visual information might influence
flavour perception only once the olfactory, gustatory, and/or oral-somatosensory cues have been
integrated into a multisensory flavour percept. Unfortunately, as yet, there is no clear answer with
regard to the way(s) in which vision exerts its effect on multisensory flavour perception. (Reproduced
from Charles Spence, Does Food Color Influence Taste and Flavor Perception in Humans?,
Chemosensory Perception, 3 (1), pp. 68-84, © 2010, Springer Science + Business Media.)
perception of a food or drink’s flavour. Spatiotemporal coincidence also plays a critical role in
modulating any oral-somatosensory contribution to flavour perception. By contrast, the effect of
vision usually occurs despite the fact that visual cues are experienced in a different location and
time from the other flavour cues (e.g. in the mouth). The reason why vision should probably not
be included in the definition of flavour is that it may exert its effect on flavour perception by
setting up an expectation (just like a label or verbal description) about the likely identity/intensity
of what we are about to taste/consume. True, those expectations can modulate the experienced
taste and flavour of the food in the mouth, but the rules of spatial and temporal correspondence
are not as strict as is typical in other examples of multisensory integration.
In this review, I will focus on the effect of visual cues on taste and flavour perception. First,
however, I briefly review the evidence concerning our early flavour learning experiences.

3.3 Prenatal and perinatal flavour learning: gustation and olfaction
In terms of development, the senses of taste and smell (technically referred to as gustation and
olfaction, respectively) begin to develop after touch, which starts about 7 weeks after fertilization.
The foetus starts to breathe, inhaling and exhaling amniotic fluid around 9–10 weeks after con-
ception, and has a functioning olfactory epithelium by 11 weeks (Doty 1992). Specialized taste
cells appear around the seventh or eighth week of gestation in the human foetus, with structurally
mature taste buds emerging after 13–15 weeks (Bradley and Mistretta 1975; Bradley and Stern
1967; Cowart 1981; Mennella and Beauchamp 1994). During the later stages of gestation, the
foetus takes in considerable amounts of amniotic fluid, inhaling more than twice the volume it
swallows (Ganchrow and Mennella 2003). The development of the senses of taste and smell there-
fore occurs well before the development of fully functional auditory (about 6–7 months) or visual
receptors (see Gottlieb 1971).
While certain of our responses to basic tastes are present at birth, the majority (but by no means
all) of our responses to odours are learnt (Khan et al. 2007; see also Chapter 2 by Schaal and
Durand). We are all born liking sweet- and disliking sour-tasting foodstuffs, while being indiffer-
ent to bitter- and salty-tasting solutions (e.g. Birch 1999; Desor et al. 1973, 1975). Our liking for
salt appears to emerge after approximately 4–6 months (Desor et al. 1975), while a liking for bitter
substances emerges much later in life (see Mennella and Beauchamp 1994, for a review). We are
born liking certain of the flavours/smells of the foods that our mothers happen to have consumed
during pregnancy (e.g. Abate et al. 2008; Schaal et al. 2000; see also DeSnoo 1937; Ganchrow 2003).
It turns out that flavours from a mother’s diet are transmitted to the amniotic fluid and breast milk
and thus swallowed by the foetus and neonate during and after pregnancy (Blake 2004). In turn,
newborns tend to find the odour of their mother’s amniotic fluid attractive (e.g. Varendi et al.
1996). Given that our earliest flavour learning takes place in the womb and at our mother’s breast
(Galef and Sherry 1973; Hausner et al. 2008; Mennella 1995; Mennella and Beauchamp 1994;
Mennella et al. 2001), crossmodal olfactory–gustatory flavour learning presumably starts before
functional vision has come online (i.e. with the opening of a baby’s eyes at, or after, birth; see
Chapter 2 by Schaal and Durand, for further discussion of early flavour learning).

3.4 The later development of multisensory flavour perception: gustation and olfaction
Olfactory–gustatory flavour learning continues well into adulthood. The latest research from
Stevenson and his colleagues in Australia has demonstrated that novel food odours (i.e. odours
that do not themselves elicit any taste percept when initially presented in isolation) can come to
take on specific taste qualities for adults (e.g. Stevenson and Boakes 2004). So, for example, in the
West, adding the tasteless odour of strawberry or vanilla, say, to a drink will make it taste sweeter
(see also Frank and Byram 1988). Lavin and Lawless (1998) have demonstrated that both adults
(18–31 years old) and children (5–14 years old) show this crossmodal enhancement effect, rating
low-fat milk drinks as tasting sweeter when tasteless vanilla odour is added than when it is absent.
This form of crossmodal associative learning takes place very rapidly. Within a few trials of a
novel odorant being paired with a particular tastant (such as pairing the odour of water chestnut
with a sweet taste for a Western European participant), the odour comes to take on the properties
of the tastant (e.g. see Stevenson et al. 1995, 1998). In further experiments, Stevenson and his
colleagues have gone on to demonstrate that a given novel odorant can actually be associated with
a variety of different tastants. So, for example, it turns out that it is just as easy to pair the aroma
of water chestnut with a bitter taste, should a participant first experience it (i.e. the aroma)
together with a bitter taste (see Stevenson 2012, for a review). In the context of the present review,
it would be intriguing to determine whether there is any fall-off in this ability to learn novel
taste–odour associations in old age (cf. Nusbaum 1999), given the absence of evidence on this question
at present.
Taste and flavour preferences change across the lifespan: for example, children between 9 and
15 years of age appear to like sweet (e.g. sugar), salty (sodium chloride), and extremely sour tastes
more than adults (see Desor et al. 1975; Liem and Mennella 2003), but often avoid anything
remotely resembling a vegetable (Blake 2004; Horne et al. 2004, 2009). In part, this may reflect an
aversion to bitter tastes, a sensible evolutionary strategy for a young child given that bitter-tasting
foods are often poisonous in nature (Bartoshuk and Duffy 2005; Glendinning 1994). There is,
though, also some evidence to suggest that young children may be more sensitive to bitter tastes
than adults (see Cowart 1981; Harris 2008; Mennella and Beauchamp 2010, for reviews). In fact,
as adults, our liking for bitter foods emerges in many cases as the result of social conditioning
and/or the pairing of the unpleasant taste (e.g. of caffeine) with sugar (many people start drinking
sweetened coffee; cf. Zellner et al. 1983) or with the pleasant physiological consequences of other
pharmacologically-active bitter substances, such as the caffeine in coffee or the ethanol in alco-
holic beverages (see Blake 2004; Mennella and Beauchamp 2010).
Recent research from Houston-Price et al. (2009) has demonstrated that merely (‘visually’)
exposing 21–24-month-old toddlers to pictures of fruit and vegetables can influence their subse-
quent willingness to taste those fruits and vegetables (though see also Birch et al. 1987). When
such results are put together with earlier findings showing that adults like unfamiliar fruit juices
more, the more that they have tried (or been exposed to) those flavours previously (Pliner 1982),
the suggestion that emerges is that mere exposure effects (both in the womb, see above, and after
birth) also help to explain many of the changes in the liking for various tastes/flavours that occur
over the course of human development (see Capretta et al. 1975; Harris 2008).
The available evidence suggests that the different sensory attributes of a food (such as its aroma,
flavour, colour, texture, shape, and/or temperature) may play different roles in people’s food
preferences at different ages. So, for example, research comparing the preferences of children
(mean age of 11 years) with those of young adults (with a mean age of 20 years) has revealed that,
if anything, sweetness (e.g. in a soft drink) is more important to children than to adults, whereas
adults tend to rate visual appearance and odour as being more important sensory attributes than
do younger children (Tuorila-Ollikainen et al. 1984). Colour preferences in foodstuffs may also
differ as a function of age: for example, while Lowenberg (1934) reported that preschool children
preferred orange and yellow, Walsh et al. (1990) observed that both 5- and 9-year-old children
preferred red, green, orange, and yellow candies in that order (see also Marshall et al. 2006).
However, given the paucity of research in this area, and the large temporal separation between the
studies just reported, further (possibly longitudinal) research will clearly be needed before any
68 THE DEVELOPMENT AND DECLINE OF MULTISENSORY FLAVOUR PERCEPTION

firm conclusions can be drawn with regards to changes in preferred food colours over the course
of development. During stages of maximal growth, humans ought, if anything, to express an
increased liking for carbohydrates (e.g. sweet-tasting foods; Drewnowski 2000; Mennella and
Beauchamp 2010). Consequently, given the correlation in nature between ripeness and sweetness
(e.g. in fruits; Maga 1974), one might expect that red foods ought to be particularly appealing to
children during these periods of maximal growth (when their energy requirements are at their
peak).
While there has been a recent growth of interest in studying the nature of any oral-somatosensory
contributions to flavour perception (e.g. Bult et al. 2007), this research has so far been conducted
almost exclusively in adults (see Spence 2012, for a review; though see Blossfeld et al. 2007, for
a solitary study examining texture perception in 12-month-old infants). Similarly, the resurgence
of interest in auditory contributions to food texture and flavour perception over the last few years
has been restricted to studies conducted on adults (normally college-age students; see Spence and
Shankar 2010; Zampini and Spence 2004, 2010). Hence, there is not yet really a developmental
story to tell concerning any changes in the role of oral-somatosensory or auditory cues to multi-
sensory flavour perception across the life-span. Given that the majority of developmental studies
of multisensory flavour perception have tended to focus on assessing the (changing) crossmodal
influence of visual cues on taste and flavour perception, it is on these studies that we will focus in
the sections below.

3.5 How might the influence of vision in flavour perception be expected to change over the course of development?
At the outset, two main contrasting predictions can be put forward in terms of how the influence
of visual (specifically colour) cues might be expected to change over the course of early human
development:
1. According to the most commonly expressed view in the literature, we are all born with dis-
tinct sensory systems, and we learn to integrate the cues (or information) provided by each
sense over the course of development as a result of our experience of crossmodal correlations
between the patterns of stimulation presented in the different modalities in the environment
(see Lewkowicz and Lickliter 1994). According to this account (e.g. Christiansen 1985; Lavin
and Lawless 1998), the influence of the colour of a food or drink on our perception of flavour
ought to increase over the first few years of life as we come to learn the correlations that exist
in nature (or, for that matter, in the supermarket; see Shankar et al. 2010a) between colour and
flavour (Wheatley 1973). So, for example, we might come to learn that in many fruits there
is an association between redness (e.g. ripeness) and sweetness (Maga 1974). Similarly, in the
case of many processed foods and drinks in the supermarket, there tends to be a crossmodal
association between the intensity of the colour and the intensity of the taste/flavour. Conse-
quently, over time, certain colours may be expected to come to signify (or lead to the expecta-
tion) that they will be followed by certain tastes/flavours (this is sometimes described as ‘visual
flavour’).3 Such learning presumably starts very early. Indeed, the available evidence suggests
that by 4 months of age, female infants have already started learning crossmodal associations

between odours and colours/shapes (Fernandez and Bahrick 1994; see also Hurlbert and Ling
2007; Reardon and Bushnell 1988; Spence 2011).

3 However, the source of the crossmodal association is not always easy to figure out. Gilbert et al. (1996), for
example, have highlighted the existence of certain reliable colour–odour associations present in adults
where it is currently much harder to fathom where people might have come across these associations in
nature (see also Schifferstein and Tanudjaja 2004; Spector 2009).
2. However, according to an alternative view of the development of multisensory perception, we
are all born confusing our senses (the ‘blooming buzzing confusion’ mentioned by William
James, 1890), and we learn through experience (and possibly as a result of parcellation; see
Maurer 1997; Maurer and Mondloch 2005) to distinguish between the attributes that rightfully
belong to each of the senses over the course of early development (see Lewkowicz and Lickliter
1994). According to the latter view, one might expect the influence of visual cues on multisen-
sory flavour perception to decline over the course of development, as individuals become
increasingly competent at individuating their sensory experiences. The development of the
ability to direct one’s attention/cognitive resources to the inputs associated with a particular
sense may also aid this process of individuating sensory inputs (i.e. pulling apart the multisen-
sory flavour gestalt), and, once again, likely improves over the course of development. It would
seem plausible that younger children might find it harder to focus on the flavour of food and
hence might be more easily distracted than adults by any highly salient changes in the colour of
food. Indeed, there is some evidence that the highly-developed ability to focus one’s attention
solely on the taste (e.g. sweetness) of a foodstuff (such as a wine), and not be influenced by the
aroma—that is, the ability to treat flavours analytically—is something that can only be acquired
(typically in adulthood) as a result of extensive training (e.g. Prescott et al. 2004).
Of course, predictions regarding the changing role of vision in multisensory flavour perception
are also going to be complicated by the fact that acuity in each of the senses develops/declines at
different rates/times (see below; cf. Zampini et al. 2008). As we will see below, the weight of the
(admittedly limited) empirical evidence appears to support the view that the crossmodal effect of
visual cues on flavour perception (specifically flavour identification) declines over the course of
early development (i.e. up to adulthood). With regards to what happens at the other end of the
age spectrum (i.e. in old age), the extant evidence (although limited) currently supports the view
that the impact of visual (colour) cues on multisensory flavour perception increases slightly.
Before we come to evaluate the developmental data with regards to vision’s influence on mul-
tisensory flavour perception, however, it is worth noting that while there is good evidence that
visual cues (specifically relating to colour) play an important role in the perception of flavour
identity in adults (e.g. see DuBose et al. 1980; Stillman 1993; Zampini et al. 2007, 2008, for empir-
ical evidence), the literature with regard to the effects of colour intensity on taste and flavour
intensity is much more mixed (e.g. Lavin and Lawless 1998). Indeed, the literature on adults has
not yet delivered a particularly clear story in terms of the effects of changes in colour identity (i.e.
hue) or colour intensity (i.e. saturation) on either taste or flavour perception (see Spence et al.
2010, for a review). Thus, taken as a whole, the literature on adults currently supports the view
that colour has a much more reliable crossmodal effect on flavour judgments than on taste judg-
ments (see Spence et al. 2010, for a review).4 Below, we will see that the same appears to hold true
in the developmental data. (Remember here that flavour judgments include such attributes as
fruitiness, spiciness, etc. that involve the contribution of both olfaction and taste, whereas taste
judgments refer just to basic tastes: sweetness, sourness, bitterness, saltiness etc.).

4 Note here that the strongest crossmodal effects appear to be on qualitative judgments (e.g. of flavour
identity) rather than on quantitative judgments (e.g. of taste or flavour intensity). While it is possible that
colour might also have an effect on qualitative judgments of taste identity (e.g. sweet versus sour), no one
has conducted (or at least published) such a study to date (see Spence et al. 2010).

3.6 Developmental changes in the crossmodal influence of visual cues on flavour perception
Oram et al. (1995) conducted one of the only studies to have looked at the influence of visual cues
on multisensory flavour perception that specifically tested several different age groups of children
using exactly the same experimental method. In their study, over 300 visitors to a university open
day in Australia were presented with a tray of four drinks whose flavour they had to try and dis-
criminate. In total, 16 drinks were prepared for use in this study, resulting from the crossing of
four possible flavours (chocolate, orange, pineapple, and strawberry) and four possible colours
(brown, orange, yellow, and red). Four of the possible colour-flavour combinations in Oram
et al.’s study were congruent (as determined by the experimenters) while the remaining twelve
were deemed to be incongruent (though see Shankar et al. 2010a on the problematic notion of
congruency in this area of research). Each colour and flavour was represented once on the drinks
tray presented to each participant. The participants were not given any information about the
colours of the drinks and whether or not they might be meaningfully related to the flavours of the
drinks. After tasting each of the drinks, the participant had to try and discriminate whether it had
a chocolate, orange, pineapple, or strawberry flavour. The four choices were written on a card
next to the participant. Additionally, an actual chocolate bar, an orange, a pineapple, and a carton
of strawberries were each also placed next to the appropriate card.
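The design can be sketched in a few lines of Python (a toy illustration; the four congruent pairings shown are an assumption based on the natural colour of each flavour, since the congruent combinations were determined by the experimenters):

```python
# Toy sketch of the Oram et al. (1995) stimulus design: crossing four flavours
# with four colours yields 16 drinks. The congruent pairings below are assumed
# (the natural colour of each flavour); the remaining 12 combinations count as
# incongruent.
from itertools import product

flavours = ["chocolate", "orange", "pineapple", "strawberry"]
colours = ["brown", "orange", "yellow", "red"]
congruent = {("chocolate", "brown"), ("orange", "orange"),
             ("pineapple", "yellow"), ("strawberry", "red")}

stimuli = [(f, c, (f, c) in congruent) for f, c in product(flavours, colours)]
# 16 drinks in total, 4 of them congruent
```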
The results of Oram et al.’s (1995) study (see Fig. 3.2) highlighted a clear developmental trend
toward an increased ability to correctly report (i.e. discriminate) the actual flavour of the drinks,

[Fig. 3.2 appears here: a bar chart plotting, for each age group (2 to 7, 8 and 9, 10 and 11, 12 to 18 years, and adults), the percentage of responses that were colour-associated, flavour-associated, or associated with neither the flavour nor the colour of the drink.]
Fig. 3.2 Graph highlighting the percentage of trials in which the participants’ flavour discrimination
response matched the colour of the drink, the actual flavour of the drink, or matched neither the
colour or flavour of the drink as a function of the age of the participants in Oram et al.’s (1995)
study. (Adapted from Nicholas Oram, David G. Laing, Ian Hutchinson, Joanne Owen, Grenville Rose,
Melanie Freeman, and Graeme Newell, The influence of flavor and color on drink identification by
children and adults, Developmental Psychobiology, 28 (4), pp. 239–49, (c) 1995, John Wiley and
Sons, with permission.)

regardless of their colour. That is, the crossmodal modulation of flavour perception by vision
apparently decreases in a fairly orderly manner with increasing age. Although Fig. 3.2 collapses the
data across all 16 of the possible colour–flavour combinations tested in the study, similar results were
apparently observed for each colour and flavour when they were examined individually. The only
noticeable exceptions to this generalization were that the participants were somewhat more likely to
respond on the basis of flavour for the chocolate-flavoured drink, and more likely to respond on the
basis of colour (i.e. responding ‘strawberry’) when a drink was coloured red (consistent with previous
research showing that red appears to be a particularly powerful colour in terms of modulating flavour
perception; see Spence et al. 2010). That said, more than 80% of the participants in each age group
identified the flavour of the drinks correctly when they were coloured congruently.
Oram et al. (1995) suggested that the most likely explanation for the developmental trend high-
lighted by their data was that, with increasing age, children become better able to focus their
attention on the flavour of food and drink items. Hence their judgments become less and less
influenced by any expectations regarding the likely flavour of the drink that they may have
formed on the basis of its colour. Note here that younger children have sometimes been shown
to be more influenced than adults in their judgments of stimuli by the background within which
those stimuli happen to be presented (see, for example, Moskowitz 1985). Oram et al. preferred
the quantitative change account of sensory dominance to the alternative possibility that there may
be an age-dependent qualitative change (or switch) in the reliance on specific sensory cues (visual
versus flavour-based) that children may exhibit. (Of course, it is worth bearing in mind here,
given the uncertainty that surrounds the definition of flavour in adults, that it might not be
surprising if children’s understanding of the term were to change with age too.)
None of the participants in Oram et al.’s (1995) study was informed that the colours of the
drinks might be misleading with regard to their actual flavours. Hence, age-dependent changes in
the effects of task demands on participants’ performance cannot be ruled out as a potential factor
influencing Oram et al.’s results (cf. Zampini et al. 2007 for similar concerns with much of the
adult data in this area). That is, younger children may simply be more likely to assume (in the
context of an experimental setting) that the colour is likely to provide a meaningful indicator of a
drink’s flavour (that, or perhaps, children might simply include colour in their definition of fla-
vour at a younger age). By contrast, as adults we may all be more wary of the possibility of trickery
in the context of ‘a food experiment’. It could, however, be argued that the very fact that (for
whatever reason) children are more strongly influenced by colour than are adults when judging
flavour identity is, in itself, an interesting observation. Perhaps then the argument here can best
be framed in terms of there being uncertainty (on the basis of the study presented) about whether
this developmental change in the influence of vision should be thought of as reflecting an auto-
matic crossmodal effect on multisensory integration, or instead some more voluntary attentional
strategy that changes as a function of age.
Finally, when thinking about the results of Oram et al.’s (1995) study, it is worth noting that
even as adults, we tend to be particularly bad at identifying odours (e.g. Cain 1979; see Zellner
et al. 1991, for a review). It is therefore a shame that Oram and colleagues did not collect any data
concerning the baseline olfactory discrimination/identification abilities across the various age
groups that they tested. Without such data, it becomes difficult to rule out the possibility that any
developmental changes that they observed might, in part, simply have reflected the consequences
of age-related changes in olfactory and/or gustatory perception, rather than any age-related
changes in multisensory integration/perception per se (see Doty et al. 1984; Ganchrow and
Mennella 2003; see also Bryant 1974).
It is interesting to compare the slow and gradual change in sensory dominance observed
in Oram et al.’s (1995) study with the rather more sudden changes seen in Gori et al.’s (2008)
recent study of visual–haptic multisensory integration. There, children younger than 8 years of
age were found to use visual (rather than haptic) cues in order to judge the orientation of an
object (and haptic rather than visual cues in order to judge an object’s size). This total dominance
of one sense over the other had switched to a response strategy based on statistically optimal
multisensory integration (i.e. weighting each estimate according to its reliability according to
maximum likelihood estimation) by the time that children reached 8–10 years of age. The changes
in visual dominance observed by Oram et al. would appear to have been taking place much more
slowly.
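The statistically optimal weighting scheme mentioned above can be illustrated with a minimal numerical sketch (the estimates and variances below are invented, and this is not a model of either study):

```python
# Reliability-weighted (maximum likelihood) cue combination: each unbiased cue
# estimate is weighted by its reliability, i.e. the inverse of its variance.

def combine_estimates(estimates, variances):
    """Combine independent cue estimates by reliability weighting."""
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]
    combined = sum(w * e for w, e in zip(weights, estimates))
    combined_variance = 1.0 / total  # never worse than the best single cue
    return combined, combined_variance

# e.g. a reliable visual estimate (variance 1.0) and a noisier haptic one (4.0)
estimate, variance = combine_estimates([10.0, 12.0], [1.0, 4.0])
# the combined estimate lies closer to the more reliable (visual) cue
```

On this scheme, a young child's total visual dominance corresponds to assigning a weight of 1 to a single cue, whereas optimal integration distributes the weights in proportion to the reliability of each cue.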
More generally, though, one might ask whether the maximum likelihood estimation account
(or Bayesian decision theory) could be used to explain the developmental changes in multisen-
sory flavour perception data. One problem with applying this approach to flavour identification
is that most work on Bayesian decision theory has to date focused on situations in which people
have to make quantitative judgments (e.g. of relative size or position) rather than qualitative judg-
ments (such as what shape or speech sound is it). While it may be possible to model certain kinds
of qualitative (or categorical) judgments in terms of Bayesian decision theory (cf. Helbig and
Ernst 2008, pp. 13–14), it is, at present, by no means clear that it will be possible to do so for
categorical judgments such as those involved in flavour identification that are more difficult to
transcribe onto any kind of meaningful continuum.
It is also important to note here that the influence of visual cues on multisensory flavour
perception does not necessarily obey either the spatial or temporal rules (Shankar et al. 2010b):
that is, the sight of a drink on the table can still influence a person’s perception of flavour in
their mouth, despite the fact that the location of the cues is different. Similarly, colour cues are
normally available some time before the flavour of the food is actually experienced in the mouth.
This has led some researchers to argue that visual cues may be better conceptualized as influenc-
ing flavour perception by means of expectancy effects (Cardello 1994; Hutchings 2003; Spence
et al. 2010) rather than by multisensory integration based on the spatial and temporal rules
derived from single-cell neurophysiology (e.g. see Stein and Meredith 1993; see also Spence 2012;
Stevenson et al. 2000).
While a mechanistic explanation of expectancy effects in multisensory flavour perception is still
lacking (see Cardello 1994; Spence et al. 2010), one currently appealing way of thinking about
the integration/influence of colour on flavour is in terms of Bayesian priors (cf. Ernst and Bülthoff
2004; Shankar et al. 2010c; Spence 2011). That is, most likely through experience, we may
build up Bayesian priors concerning the fact that certain food colours normally co-occur with
certain flavours. Of course, it could be argued that Bayesian priors need not be learned, but could
perhaps reflect some bias in the way in which the brain happens to represent different kinds
of information neurally (see Scholl 2005; Spence 2011). Only further developmental research
will allow us to distinguish between these various possibilities, although, at present, it is probably
safe to say that the majority of researchers favour the experience-based learning account. Red,
for example, often co-occurs with sweetness, while the majority of green fruits are sour (Kostyla
1978; Maga 1974). Given the commercial opportunities associated with being able to model
and predict multisensory flavour perception, it seems likely that within a few years researchers
will have extended Bayesian decision theory to try and account for the contribution of visual
cues to the perception of flavour. Neuroimaging research may, of course, also help researchers
to understand the neural mechanisms underlying the influence of visual cues on multisensory
flavour perception (Österbauer et al. 2005; Skrandies and Reuther 2008; see also De Araujo et al.
2003).
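The idea of colour acting as a learned prior can be made concrete with a toy discrete Bayesian update (all of the probabilities below are invented purely for illustration):

```python
# A colour-conditioned prior over flavours combined with ambiguous taste/smell
# evidence via Bayes' rule: posterior proportional to prior times likelihood.

def posterior(prior, likelihood):
    """Discrete Bayes update over a common set of flavour hypotheses."""
    unnormalized = {f: prior[f] * likelihood[f] for f in prior}
    z = sum(unnormalized.values())
    return {f: p / z for f, p in unnormalized.items()}

# prior learned from experience: red drinks are usually strawberry-flavoured
prior_given_red = {"strawberry": 0.7, "orange": 0.2, "pineapple": 0.1}
# weakly informative gustatory/olfactory evidence
evidence = {"strawberry": 0.4, "orange": 0.35, "pineapple": 0.25}

p = posterior(prior_given_red, evidence)
# the colour-consistent flavour dominates the posterior
```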
One other influential study to have looked for developmental changes in terms of vision’s influ-
ence on multisensory flavour perception was reported by Lavin and Lawless (1998). They conducted
an experiment in which the influence of colour intensity on ratings of sweetness intensity (i.e. on
taste rather than flavour judgments) in North American children and adults was investigated. The
participants were given two pairs of strawberry-flavoured beverages to compare and to rate in
terms of their sweetness (using a nine-point scale). One pair consisted of light- and dark-red
drinks while the other pair consisted of light- versus dark-green drinks. All of the drinks actually
had the same physical sweetness, varying only in terms of their colour. Lavin and Lawless tested
three groups of children (5–7 years, 8–10 years, and 11–14 years) and a group of adults.
The results showed that the adults rated the dark-red and light-green samples as being sweeter
than the light-red and dark-green samples, respectively. By contrast, colour intensity did not
have a significant effect on the responses of the younger age groups (although, if anything, the
11–14 year olds showed a trend in the opposite direction to that of the adults). In contrast to
Oram et al.’s (1995) results, then, Lavin and Lawless’s (1998) results demonstrate that changes in
the level of food colouring appear to have more of an effect on sweetness judgments in adults than
in children. Meanwhile, in another North American study, Alley and Alley (1998) demonstrated
no effect of colour (red, blue, yellow, green, or colourless) on the perceived sweetness (rated on a
ten-point scale) of sugar solutions served in either liquid or solid (i.e. gelatin) form to a group of
11–13 year olds.
However, it is perhaps worth pointing out at this point that judging the degree of sweetness,
as in the studies of Lavin and Lawless (1998), and Alley and Alley (1998), is not the same thing as
trying to identify (or discriminate) the flavour, as in Oram et al.’s (1995) study. What is more,
children are more likely to be able to perform relatively easy categorical judgments (i.e. identify-
ing the flavour of a beverage), than quantitative intensity judgments (i.e. judging how sweet a
drink happens to be). These possibilities raise the suggestion that the developmental story with
regards to colour’s changing influence on flavour perception may not be a simple one. Perhaps,
just like for the literature on adults (see Spence et al. 2010), colour may exert a qualitatively
different effect on flavour (or taste) identification versus on taste/flavour intensity judgments (see
also Koza et al. 2005). At present, the strongest evidence regards colour’s influence (in particular,
the hue of the colour) on flavour identification judgments and the decline of this crossmodal
influence over early development. I would argue that further research is really needed here before
one can draw any firm conclusions regarding the existence of developmental changes in the effect
of colour intensity (or saturation) changes on the perception of taste/flavour intensity.

3.7 Changes in the influence of colour on taste/flavour in adulthood: the role of expertise
Barring accident, the sensitivities of the human senses do not change much over the course of
adulthood (e.g. between the ages of 18 and 50 years). There is some gradual decline, but the more
severe drops in sensory acuity have yet to occur (see Section 3.8). Hence, the only significant
changes (or development) that one is likely to see during this period relate to the altered sensory
perceptions of those who acquire an expertise in a particular domain of flavour perception. Of
such experts, the most widely studied have been wine tasters (e.g. see Lehrer 2009; Parr et al. 2002;
Spence 2010a). Researchers have, for example, demonstrated that when the students on a univer-
sity degree course in oenology in Bordeaux (i.e. in some sense experts) were given a glass of white
wine that had been artificially coloured red, they could be fooled into smelling the aromas that
they normally (and had previously) associated with a red wine (Morrot et al. 2001).
According to the suggestion put forward earlier, the presence of colour in food influences
flavour perception by means of the expectations that those colours set up in the mind of the
observer. What is more, the stronger the expectations, the stronger the crossmodal influence
of colour on flavour identification is likely to be (e.g. Hutchings 2003; Shankar et al. 2010b, c).5
If one accepts the logic of this argument then one would expect that wine
experts ought to be more strongly influenced by inappropriate coloration than less experienced
drinkers (see Spence 2010a). That, indeed, is what has now been demonstrated. So, for example,
in an early study, Pangborn et al. (1963) gave expert and non-expert wine drinkers a set of dry
white wines that had been coloured pink, yellow, brown, red, or purple to simulate a rosé (or
blush) wine, a Sauternes, a sherry, a claret, and a Burgundy, respectively. The experts judged the
pink wine as tasting sweeter than when no colouring had been added, while the non-experts’
sweetness judgments were unaffected by the addition of colour to the drinks.
Parr et al. (2003) conducted a follow-up to Morrot et al.’s (2001) study in New Zealand, but this
time they tested both experts (including professional wine tasters and wine makers) and ‘social’
drinkers. They demonstrated that the experts’ descriptions of the aroma of a Chardonnay when it
was coloured red were more accurate when it was served in an opaque glass than when it was
served in a clear glass. This colour-induced biasing of their olfactory flavour judgments occurred
despite the fact that the experts had been explicitly instructed to rate each wine irrespective of its
colour (thus suggesting that this crossmodal effect of vision is not under cognitive control; cf.
Stillman 1993; Zampini et al. 2007). When the same experiment was conducted in social drinkers,
however, it turned out that they were so bad at reliably identifying the aromas present in the wine
that it was difficult to discern any pattern in the data when an inappropriate wine colour was
added. Nevertheless, taken together, the evidence that has been published to date is consistent
with the view that expert wine drinkers differ from social wine drinkers (i.e. non-experts) in the
degree to which visual (colour) cues influence their orthonasal perception of flavour (Parr et al.
2003) and their perception of sweetness (Pangborn et al. 1963).
That said, not all food/flavour experts exhibit the same increased responsiveness to visual
colour cues when evaluating flavour or taste. For example, Shankar et al. (2010d) recently reported
that flavour experts (those working on a descriptive panel at an international flavour house,
and who all had more than three years of experience flavour profiling food and drink products)
exhibited an equivalent amount of visual capture over their orthonasal olfactory flavour judg-
ments as non-experts (i.e. normal people). All of the participants who were selected to take part
in the main study were shown, in pre-testing, to expect a purple-coloured drink to taste of grape
and an orange-coloured drink to taste of orange. These colours were found to bias both groups
of participants’ judgments on the critical experimental trials when the cranberry- or blueberry-
flavoured drinks were coloured purple and when the grapefruit- and lemon-flavoured drinks
were coloured orange. Thus, the conclusion from the research that has been published to date
on flavour experts would appear to be that while some experts (specifically those with an expertise
in wine—the same may also go for tea and coffee experts) show an enhanced susceptibility to
the crossmodal influence of colour on their judgments of food and drink items within their area
of expertise (Pangborn et al. 1963; Parr et al. 2003), this pattern of results does not necessarily
extend to other groups of flavour experts (Shankar et al. 2010d; see also Lelièvre et al. 2009;
Teerling 1992).

5 According to this account, it should not matter much whether the expectation happens to be set up by the
colour of a food or drink item, or simply by the name of the colour itself (if, say, the participant happened
to be blindfolded). While this has never been tested for the perception of flavour, the results of a study by
Davis (1981) demonstrated that simple colour word cues can be as effective as colour patches in modulating
a participant's odour identification responses. That said, it could also be argued that the assumption of
unity will be stronger when the colour comes from the food or drink itself, rather than from labelling/verbal
description, packaging colour, etc. (see Shankar et al. 2010b).

At present, it is not clear what explains these differences in the modulatory effect of visual
(colour) cues on the taste and flavour judgments of different groups of flavour experts. Several
possibilities spring to mind, including whether or not the participants in these various studies
were aware that the colour of a foodstuff that they were evaluating may have been misleading
(cf. Stillman 1993; Zampini et al. 2007, 2008). That is, experts may be more influenced by the
colour of a food or drink item if they believe that it is informative with regard to the taste, aroma,
and/or flavour, while at the same time being better able to discard the information before their
eyes (and adopt an analytic approach to tasting; Prescott et al. 2004) if they have reason to believe
that the colour may be misleading. It is perhaps also worth noting that while there is typically a
meaningful relationship (or correlation) between the colour of a ‘natural’ product such as wine
(and presumably also coffee and tea) and its taste/aroma/flavour properties (see Spence 2010a),
the same is not necessarily true of the (synthetic and/or processed) coloured foodstuffs that the
flavour experts studied by Shankar et al. (2010d) would normally have to evaluate (where the
relationship between colour and flavour is often manipulated artificially to obtain a particular
commercially-desirable outcome).
The one other way in which the influence of colour on flavour perception might be expected to
change during adulthood relates to cohort effects in terms of exposure to different foods (cf. Blake
2004; Lavin and Lawless 1998). In fact, some of the latest research from the Crossmodal Research
Laboratory here in Oxford has suggested that exposure to different products in the marketplace
(and hence presumably in one’s diet) can influence which flavours different individuals expect
different coloured foods and drinks to taste of (see Shankar et al. 2010a). The participants in
Shankar et al.’s study were simply asked to look at a set of six coloured liquids presented in trans-
parent plastic drinking cups (see Fig. 3.3) and to report what flavour they would expect a drink of
that colour to have. That is, the flavour expectations of the participants were based solely on the
colour of the beverage that they saw. The results showed that the bright clear blue-coloured drink
(see Fig. 3.3B) was associated with raspberry flavour in a group of young British participants but
with the flavour of mint in a group of young Taiwanese participants. While such ‘arbitrary’ cross-
modal associations might at first seem perplexing, Shankar and her colleagues suggested that the
young adult British participants in their study may have picked up the blue–raspberry association
from fruit drinks (such as Cool Blue Gatorade which has a raspberry taste), whereas in the absence
of such products in the marketplace, the Taiwanese participants may have associated the
blue colour with the minty taste of mouthwash instead. While such differences in colour–flavour
associations were demonstrated cross-culturally in Shankar et al.’s study, similar trends are likely
to be present as a function of age within a culture too, given the different patterns of consump-
tion. What is clear is that certain flavours are likely to be more familiar to those of more advanced
years (cf. Blake 2004). Think here only of Parma violets or the heavy scent of patchouli that
are indelibly linked to the 1960s for many of those of a certain age. Angel Delight™, introduced
into the UK marketplace in the 1960s, and incredibly popular for a couple of decades thereafter,
also established a link (for those who grew up at the time) between a particular shade of
pinkish-red and a synthetic strawberry flavour (one that was unusual in containing a hint of pineapple in
the flavour).

Fig. 3.3 The six coloured drinks used in Shankar et al.'s (2010a) study, in which the crossmodal
flavour expectations associated with particular beverage colours were assessed cross-culturally in
two groups of young adult participants, one from the UK and the other from Taiwan. Reprinted
from Consciousness and Cognition, 19 (1), Maya U. Shankar, Carmel A. Levitan, and Charles Spence,
Grape expectations: The role of cognitive influences in color–flavor interactions, pp. 380–90,
Copyright (2010), with permission from Elsevier. (Reproduced in colour in the colour plate section.)

3.8 The decline of multisensory flavour perception

All of the senses decline with age. That said, we have prostheses (glasses and hearing aids)
to correct for any loss of visual or auditory acuity. The situation is not so fortunate when it
comes to the other senses. Indeed, there is currently nothing that can be done to recover taste,
smell, and touch function once they have been lost. What is more, more and more people are now
living to a more advanced age, in particular, to an age where the loss of gustatory and olfactory
sensitivity is starting to have a markedly detrimental effect on their health and well-being (see
Doty et al. 1984; Schiffman 1997). This is especially problematic given that eating and drinking
constitute some of the most treasured pleasures for those in their later years. As Jean Anthelme
Brillat-Savarin (1835) put it in his classic volume, The Philosopher in the Kitchen: 'The pleasures of the
table belong to all times and all ages, to every country and to every day; they go hand in hand with
all our other pleasures, outlast them, and remain to console us for their loss’. But what of the
empirical data?
A little over half a century ago, Cooper et al. (1959) reported there to be little change in people’s
taste sensitivity up to 50 years of age, but that after people reached their mid-50s, there was a
sharp decline in sensitivity for the four basic tastes (see Schiffman 1977). Schiffman and Pasternak
(1979) demonstrated that the elderly (72–78 years of age) found it harder to discriminate between
food odours than younger participants (19–25 years of age in this study). In a more recent review
of the literature on the decline of smell and taste in the elderly, Schiffman (1997) reported that the
evidence supported the claim that the decline of taste and smell really starts in earnest once people
reach around 60 years of age, and becomes more severe in those who have reached 70 years or
more (see also Cowart 1981; Doty et al. 1984; Mojet et al. 2003). To put this decline into some
kind of perspective, research shows that thresholds for many tastes and odours can be as much as
12 times higher in the elderly as compared to younger people, especially if the participant happens
to be on medication, as the majority of elderly people apparently are (Schiffman 1997; Schiffman
and Warwick 1989). What is more, a number of studies have suggested that olfactory sensitivity
appears to decline more severely than gustatory sensitivity (Cowart 1989; see also Stevens et al.
1984). In addition to any sensory decline, there is also evidence that older participants may strug-
gle with the cognitive demands of chemosensory tasks (e.g. as when participants are required to
remember a taste or flavour and compare it to a subsequently presented stimulus; see Cowart
1981, 1989).
Given that (as pointed out earlier) as much as 80% of flavour perception may come from the
information provided by the nose (rather than from the tongue; see Martin 2004; Murphy et al.
1977), one would expect this decline in olfactory sensitivity to have a particularly severe effect
on multisensory flavour perception. Furthermore, given the much more severe decline of olfac-
tory as compared to gustatory abilities with increasing age, one might also predict that in old
age colour would increasingly come to influence flavour judgments as compared to (the relatively
less impaired) taste judgments. Several researchers have suggested that people’s perception of
food aroma and flavour intensity will increasingly be influenced by food colour as the chemical
senses start their inevitable decline (e.g. Christensen 1985; Clydesdale 1994). However, the evi-
dence that has been published on this topic to date is rather mixed, with several studies actually
failing to find any significant difference between adults and the elderly on the extent to which
colour influences flavour.
So, for example, Chan and Kane-Martinelli (1997) conducted one oft-cited study that exam-
ined the effect of the addition of colour on perceived flavour intensity or acceptability ratings of
chicken bouillon and chocolate pudding in both young and older adults (20–35 years old and
60–90 years old, respectively). Three levels of colour were added: no colour added, standard (the
commercially available colour), and high colour (twice the standard) for each foodstuff. Each
participant tasted and evaluated each of three samples of one food using a series of visual analog
scales. The results suggested that the young adults' judgments were affected more by the actual
level of food colouring added than were those of the older adults, although, as one might expect, there was
far more variation in response in the older group (hence making it harder to pick up significant
effects in this group). The younger group’s judgment of the overall flavour intensity of the chicken
bouillon was affected by the amount of colouring added. The younger group also showed signifi-
cant effects of amount of colour added when rating the acceptability of appearance of both the
chicken bouillon and the chocolate pudding.
Similarly, Christensen (1985) failed to find any evidence that elderly participants were affected
more by visual cues than were younger participants. She compared a group of young adults
(21–40 years of age) with a group of elderly participants (65–85 years of age) on their perception
of processed cheese and grape-flavoured jelly. Each participant in the study evaluated one of the
two foods, which was presented at one of three flavour intensities and one of three colour intensity
levels (low, medium, or high), giving rise to nine possible variations of each food (where the
medium flavouring and colour levels were considered as normal for these particular products).
The participants were then presented with 52 pairs of food samples and asked to discriminate
which sample had a more intense flavour or aroma (this involved the participants ‘tasting’
or sniffing the products, respectively). The participants also had to give a certainty judgment
concerning their response. The difficulty of the flavour discrimination task was manipulated by
pairing a high intensity flavour with either a low or medium intensity sample (making for easy
versus hard tasks, respectively). The colour of the two samples could either be the same, the
colour difference could reinforce the flavour difference (i.e. the higher intensity colour matched with
the more intense flavour), or else it could be incongruent (i.e. where the less strongly flavoured
food had the more intense colour).
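The congruency manipulation described above can be made concrete with a short sketch. Note that the level names, the classification helper, and the example pair are all illustrative reconstructions; Christensen's actual 52 pairings are not reproduced here.

```python
from itertools import product

flavour_levels = ["low", "medium", "high"]
colour_levels = ["low", "medium", "high"]

# The nine variants of one food in a 3 (flavour) x 3 (colour) design:
variants = list(product(flavour_levels, colour_levels))
assert len(variants) == 9

def colour_condition(pair):
    """Classify a test pair by how its colours relate to its flavours.
    Each item is a (flavour, colour) tuple; the more strongly
    flavoured item is listed first."""
    (f1, c1), (f2, c2) = pair
    if c1 == c2:
        return "same colour"
    stronger_has_stronger_colour = (
        colour_levels.index(c1) > colour_levels.index(c2)
    )
    return "congruent" if stronger_has_stronger_colour else "incongruent"

# Hard discrimination (high vs. medium flavour) with incongruent
# colouring: the weaker-flavoured sample carries the stronger colour.
pair = (("high", "low"), ("medium", "high"))
print(colour_condition(pair))  # prints 'incongruent'
```

On this scheme, congruent colouring provides a redundant cue that should help flavour discrimination, whereas incongruent colouring pits the visual cue against the chemosensory difference.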
Christensen (1985) observed no significant differences between performance of the older
and younger group when evaluating the grape jelly. However, it is worth noting that in the major-
ity of conditions, the elderly group performed less accurately than the young group (i.e. regardless
of the specific colour condition) when judging flavour. The lack of a significant difference between
the groups in this case then likely simply reflects a lack of statistical power given the small sample
sizes used (there were only 12 participants in each group). What is more, both groups of partici-
pants were at ceiling performance in most of the conditions in the aroma judgment task, hence
making it difficult to draw any firm conclusions from their data. When evaluating the cheese
sample, performance was at ceiling in the flavour-intensity judgment task, and the elderly par-
ticipants performed numerically somewhat better than the younger participants in the aroma-
intensity judgment task. For both foods, the participants performed less accurately in the
incongruently coloured condition than in the condition where the colours of the two samples
were matched (thus showing that the addition of colour did have some impact on flavour percep-
tion). As Christensen herself notes, however, participants may by and large have learned to ignore
the colour of the foods in the context of the experimental setting. Given these problems, it
is therefore difficult to draw any firm conclusions from Christensen’s results about whether or
not the relative importance of visual cues changes with age.
Elsewhere, Philipsen et al. (1995) reported a study in which they compared a young adult
population (18–22 years of age) to a group of older participants (60–75 years of age) when rating
various attributes (e.g. sweetness, flavour intensity, flavour quality, flavour identification etc.) of
15 samples of an artificially-flavoured cherry beverage varying in sucrose, flavour, and colour.
Interestingly, variations in colour intensity did not have a significant effect on sweetness ratings
in either group, but did impact on flavour intensity ratings in the older group, if not in the
younger group. Changes in the colour of the drinks also had a significant effect on flavour quality
and overall acceptability ratings in both age groups. Philipsen et al.’s results therefore support the
claim that older participants are more influenced by visual (colour) cues, likely because of their
reduced sensitivity to olfactory and gustatory flavour cues (see also Tepper 1993). Here, however,
one might also think that this increased reliance on vision could relate to older individuals
being, in some sense, more ‘expert’ than younger tasters (that is, they have certainly had far more
experience in terms of picking up the correlations that exist in nature between tastes/flavours
and colours; remember here also that wine experts tend to be swayed more by visual cues than less
experienced wine drinkers; Pangborn et al. 1963; Parr et al. 2003; Spence 2010a).
In summary, there is some evidence that older individuals may rely more heavily on colour
cues when perceiving and evaluating foods than do younger adults (e.g. Philipsen et al. 1995;
Tepper 1993). That said, it should be noted that the data are noisy and somewhat inconsistent,
with no difference between younger and older adults being reported by researchers using certain
tasks (see Chan and Kane-Martinelli 1997; Christensen 1985). It would therefore be ideal if
someone were to repeat Oram et al.’s (1995) flavour discrimination study (discussed earlier)
with older adults. I would predict that such a study, because it involves flavour identification (or
discrimination) rather than taste/flavour intensity ratings, would give rise to more profound
age-related changes in the modulatory role of visual cues (remember that, in adults, flavour-
discrimination responses have proven to be more affected by colour cues than have taste or
flavour-intensity responses; Spence et al. 2010). What is more, the relative simplicity of
the experimental design utilized by Oram et al. (involving the presentation of just four drinks to
each participant) is also less likely to give rise to a situation in which participants learn to ignore
the visual cues in the specific context of the experimental situation (cf. Christensen 1985; Spence
et al. 2010).

3.9 Conclusions
Despite the fact that flavour perception constitutes one of the most multisensory of our everyday
experiences (e.g. Auvray and Spence 2008; Spence 2010b; Stillman 2002), developmental research-
ers have seemingly not been overly interested in investigating any developmental changes affect-
ing the relative contribution of each of the senses to multisensory flavour perception. Indeed,
Piaget himself apparently never gave the development of flavour perception much thought (at
least not in print). William James (1890) only got as far as smell, but failed to mention taste in his
oft-cited quote: ‘The baby, assailed by eye, ear, nose, skin and entrails at once, feels it all as one
great blooming buzzing confusion’. Perhaps though this latter omission can be explained by
James’s further comment elsewhere in The Principles of Psychology that ‘Taste, smell, as well as
hunger, thirst, nausea and other so-called “common” sensations need not be touched on . . . as
almost nothing of psychological interest is known concerning them’. In fact, the situation had not
changed much by the time of Pick and Pick’s (1970) influential review of sensory and perceptual
development. More recently, Lewkowicz and Lickliter’s (1994) edited volume on infant develop-
ment also contained nothing on the development of taste, smell, and/or the flavour senses.
This lacuna in the developmental literature is all the more surprising given the fundamental
importance of food and food acquisition to brain development and survival. As Young (1968, p. 21)
puts it so eloquently: ‘No animal can live without food. Let us then pursue the corollary of this:
namely, food is about the most important influence in determining the organization of the brain
and the behavior that the brain organization dictates’. What is more, as Mennella and Beauchamp
(1994, p. 25) point out: ‘Anyone who has observed infants for any period of time can testify to the
intense activity occurring in and around their mouths – a primary site for learning in the first few
months of life. During feeding, or while mouthing objects such as their hands and toys, infants learn
to discriminate the varying features of their new world.’ In their latest review, Mennella and
Beauchamp (2010, p. 204) go on to highlight the fact that: ‘one of the most important decisions that
an animal makes’ is ‘whether to reject a foreign substance or take it into the body’.
Given the alarming rise in childhood obesity in recent years, much of the current interest in
the developmental aspects of multisensory flavour perception relates to the establishment and
modification of infant preferences for (and acceptance of) particular classes of typically 'healthier/
healthful' foodstuffs (see Harris 2008 and Mennella and Beauchamp 2010 for reviews). While the
situation is somewhat better at the other end of the age spectrum, research with elderly popula-
tions has nevertheless primarily been driven by concerns over the consequences of sensory decline
for healthy eating, rather than necessarily because of any particular curiosity about how multisen-
sory integration changes over the latter stages of the lifespan (see Schiffman 1997). Given the
paucity of empirical data concerning age-related changes in the multisensory integration of uni-
sensory flavour signals at either end of the age spectrum, this chapter has focused primarily on
those (slightly more common) studies that have investigated the relative contribution of visual
cues (specifically colour cues) to multisensory flavour perception.
In summary, the results, although somewhat messy, tend to support the claim that visual cues
have a greater influence on multisensory flavour perception in childhood and (to a lesser extent)
in old age than during adulthood. Just as for the adult literature, the clearest age-related changes
concern vision’s influence on flavour identification (Oram et al. 1995), whereas the data on fla-
vour or taste intensity are far more mixed (Alley and Alley 1998; Chan and Kane-Martinelli 1997;
Lavin and Lawless 1998; though see also Léon et al. 1999; Philipsen et al. 1995; Tepper 1993). That
said, the reasons behind this change in visual dominance may be somewhat different in the two
cases. It seems plausible that in children, the increased role of vision in flavour identification may
result from a strategy of relying (or tendency to rely) on a single source of sensory information
rather than integrating all of the available sensory cues. Indeed, as Ganchrow and Mennella (2003,
p. 839) note, it is also unclear when exactly during the course of infancy taste, retronasal olfaction,
and oral-irritation fuse into a single sensory gestalt, that of flavour (cf. Spence 2012; Verhagen
and Engelen 2006).
At the opposite end of the age spectrum, taste and smell (the two most important senses for
flavour perception) start their inevitable decline, with the more severe drops in sensitivity
occurring during the sixth/seventh decades of life. As a result of this loss of gustatory and olfactory acuity,
one might have expected that visual cues would come to play a much more dominant role in
influencing multisensory flavour perception in old age (cf. Christensen 1985). However, that said,
the extent of the increase in visual dominance over flavour perception in old age is perhaps not as
great as one might have predicted given the profound drops in olfactory and gustatory sensitivity
that have now been observed in numerous studies (and by comparison with the developmental
changes in visual dominance one sees at the other end of the age spectrum; though see Mojet et al.
2003). How, then, should the relatively modest changes in vision’s influence over flavour percep-
tion in old age be explained? Well, one possibility is that there may perhaps be some form of
compensatory behaviour in terms of enhanced multisensory integration in older participants (cf.
Chapter 11 by Laurienti and Hugenschmidt; Laurienti et al. 2006). Although there is little evi-
dence directly supporting it yet, the alternative possibility that there might be a more general
breakdown in multisensory integration in old age should also be borne in mind (see Nusbaum
1999, on this possibility). One might also consider the possibility that older individuals can be
thought of as being in some sense more ‘expert’ than younger people. Perhaps any expertise
attributable to age may then increase an individual’s analytic tasting abilities (i.e. their ability to
focus on the inputs from just a single sense; cf. Prescott et al. 2004). Such cognitive factors may
help older people to focus their attention on gustatory and/or olfactory inputs, hence potentially
reducing vision’s influence over flavour perception in this age group. On the other hand, exper-
tise (at least in the food and drink sector) is also associated with people having stronger expecta-
tions about the consequences of changes in the hue and/or intensity (or saturation) of a colour for
the perception of flavour and taste.

Given the recent success of Bayesian decision theory in accounting for and predicting the
patterns of sensory dominance seen in a range of different cue-conflict situations
(Ernst and Bülthoff 2004), it will be interesting in the coming years to see whether it can also be
used to account for visual influences on multisensory flavour and taste perception (see Shankar
et al. 2010c), and its change over the lifespan (cf. Gori et al. 2008). Of course, if one adopts the
view that vision’s role in modulating flavour perception occurs primarily through the setting-up
of expectations then it might turn out to be more appropriate to model vision’s influence on fla-
vour in terms of a coupling prior in Bayesian decision theory (cf. Shams and Beierholm 2010;
Spence 2011).
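A minimal sketch of the precision-weighted cue-combination rule at the heart of such Bayesian accounts may help to make this prediction concrete. All numbers below are hypothetical, chosen only to illustrate how a decline in olfactory reliability would shift weight towards vision; nothing here reproduces actual data from the studies discussed.

```python
import numpy as np

def combine_cues(means, sigmas):
    """Maximum-likelihood (precision-weighted) combination of
    independent Gaussian cue estimates (cf. Ernst and Bulthoff 2004).
    Each cue's weight is proportional to 1/sigma^2, and the combined
    estimate is more reliable than either cue alone."""
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    weights = precisions / precisions.sum()
    combined_mean = float(weights @ means)
    combined_sigma = float(np.sqrt(1.0 / precisions.sum()))
    return combined_mean, combined_sigma, weights

# Hypothetical flavour-intensity estimates (arbitrary 0-10 scale)
# from vision (colour) and from olfaction, in that order.
# Young adult: olfaction is the more reliable cue, so it dominates.
m_young, s_young, w_young = combine_cues(means=[7.0, 4.0], sigmas=[2.0, 1.0])
# Older adult: if olfactory noise doubles, the visual weight rises
# from 0.2 to 0.5, mimicking increased visual dominance in old age.
m_old, s_old, w_old = combine_cues(means=[7.0, 4.0], sigmas=[2.0, 2.0])
```

On this simple model, the prediction is quantitative: as olfactory (and gustatory) noise grows with age, the percept should drift towards the visually specified value in direct proportion to the relative precisions, which is one way of formalizing the claim that colour cues should weigh more heavily on flavour judgments in the elderly.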
As noted earlier, the majority of age-related studies of multisensory flavour perception have
tended to focus on the (changing) influence of visual cues on taste and flavour perception, and
hence it is on these studies that I have focused in this review. Nevertheless, it seems at least plau-
sible that what is now known about the changing contribution of visual cues to multisensory
flavour perception over the lifespan may provide a model for thinking about how other multisen-
sory interactions, such as the influence of auditory or oral-somatosensory cues to flavour percep-
tion (Blossfeld et al. 2007; Bult et al. 2007; Zampini and Spence 2004; see Spence and Shankar
2010; Zampini and Spence 2010, for reviews) may also develop. However, in the absence of any
empirical evidence on this question thus far, confirmation of such claims will clearly need to
await future research. Furthermore, it must also be borne in mind that unlike audition and
oral-somatosensation, vision is not integral to many researchers' definition of flavour (see Spence et al.
2010). Hence, it could also be argued that vision’s influence on flavour perception, in terms of
setting up expectations, is different in kind from the multisensory integration that results in the
binding of olfactory, gustatory, oral-somatosensory, and auditory cues into unified flavour
gestalts. (Here, the prediction is that a verbal description, such as hearing the phrase ‘this is a red
drink’, might have just as much of an influence on a blindfolded individual’s flavour responses,
as actually colouring a drink red has on the perception of a person who can see the drink.) If the
latter view turns out to be correct, then vision's influence on flavour perception may end up
having a rather different developmental trajectory than that of audition or oral somatosensation.
Given such uncertainty, more research, both theoretical and empirical, is clearly needed in order
for scientists to make progress in understanding the development and decline of multisensory
flavour perception.

References
Abate, P., Pueta, M., Spear, N.E., and Molina, J.C. (2008). Fetal learning about ethanol and later ethanol
responsiveness: evidence against ‘safe’ amounts of prenatal exposure. Experimental Biology and
Medicine, 233, 139–54.
Alley, R.L., and Alley, T.R. (1998). The influence of physical state and color on perceived sweetness. Journal
of Psychology: Interdisciplinary and Applied, 132, 561–68.
Auvray, M., and Spence, C. (2008). The multisensory perception of flavor. Consciousness and Cognition, 17,
1016–31.
Bartoshuk, L.M., and Duffy, V.B. (2005). Chemical senses: taste and smell. In The taste culture reader:
experiencing food and drink (ed. C. Korsmeyer), pp. 25–33. Berg, Oxford.
Birch, L.L. (1999). Development of food preferences. Annual Review of Nutrition, 19, 41–62.
Birch, L.L., McPhee, L., Shoba, B.C., Pirok, E., and Steinberg, L. (1987). What kind of exposure reduces
children’s food neophobia? Looking vs. tasting. Appetite, 9, 171–78.
Blake, A.A. (2004). Flavour perception and the learning of food preferences. In Flavor perception (ed. A.J.
Taylor and D.D. Roberts), pp. 172–202. Blackwell, Oxford.
Blossfeld, I., Collins, A., Kiely, M., and Delahunty, C. (2007). Texture preferences of 12-month-old infants
and the role of early experience. Food Quality and Preference, 18, 396–404.
Bradley, R.M., and Mistretta, C.M. (1975). Fetal sensory receptors. Physiological Reviews, 55, 352–82.
Bradley, R.M., and Stern, I.B. (1967). The development of the human taste bud during the foetal period.
Journal of Anatomy, 101, 743–52.
Brillat-Savarin, J.A. (1835). Physiologie du goût [The philosopher in the kitchen / The physiology of taste].
J.P. Meline, Bruxelles. (Translated by A. Lalauze (1884), A handbook of gastronomy. Nimmo and Bain,
London.)
Bryant, P. (1974). Perception and understanding in young children: an experimental approach. Methuen, London.
Bult, J.H.F., de Wijk, R.A., and Hummel, T. (2007). Investigations on multimodal sensory integration:
texture, taste, and ortho- and retronasal olfactory stimuli in concert. Neuroscience Letters, 411, 6–10.
Cain, W.S. (1979). To know with the nose: keys to odor identification. Science, 203, 467–70.
Capretta, P.J., Petersik, J.T., and Stewart, D.J. (1975). Acceptance of novel flavors is increased after early
experience of diverse tastes. Nature, 254, 689–91.
Cardello, A.V. (1994). Consumer expectations and their role in food acceptance. In Measurement of food
preferences (ed. H.J.H. MacFie and D.M.H. Thomson), pp. 253–97. Blackie Academic and Professional,
London.
Chan, M.M., and Kane-Martinelli, C. (1997). The effect of color on perceived flavor intensity and acceptance
of foods by young adults and elderly adults. Journal of the American Dietetic Association, 97, 657–59.
Christensen, C. (1985). Effect of color on judgments of food aroma and food intensity in young and elderly
adults. Perception, 14, 755–62.
Clydesdale, F.M. (1994). Changes in color and flavor and their effect on sensory perception in the elderly.
Nutrition Reviews, 52 (8 Pt 2), S19–20.
Cooper, R.M., Bilash, I., and Zubeck, J.P. (1959). The effect of age on taste sensitivity. Journal of
Gerontology, 14, 56–58.
Cowart, B.J. (1981). Development of taste perception in humans: sensitivity and preference throughout the
life span. Psychological Bulletin, 90, 43–73.
Cowart, B.J. (1989). Relationships between taste and smell across the adult life span. Annals of the New York
Academy of Sciences, 561, 39–55.
Davis, R.G. (1981). The role of nonolfactory context cues in odor identification. Perception and
Psychophysics, 30, 83–89.
De Araujo, I.E.T., Rolls, E.T., Kringelbach, M.L., McGlone, F., and Phillips, N. (2003). Taste-olfactory
convergence, and the representation of the pleasantness of flavour, in the human brain. European
Journal of Neuroscience, 18, 2059–68.
Delwiche, J. (2004). The impact of perceptual interactions on perceived flavor. Food Quality and Preference,
15, 137–46.
DeSnoo, K. (1937). Das trinkende Kind im Uterus [The drinking baby in the uterus]. Monatsschrift für
Geburtshilfe und Gynäkologie, 105, 88–97.
Desor, J.A., Maller, O., and Turner, R.E. (1973). Taste in acceptance of sugars by human infants. Journal of
Comparative and Physiological Psychology, 84, 496–501.
Desor, J.A., Greene, L.S., and Maller, O. (1975). Preferences for sweet and salty in 9- to 15-year-old and
adult humans. Science, 190, 686–87.
Desor, J.A., Maller, O., and Andrews, K. (1975). Ingestive responses of human newborns to salty, sour, and
bitter stimuli. Journal of Comparative and Physiological Psychology, 89, 966–70.
Doty, R.L. (1992). Olfactory function in neonates. In The human sense of smell (ed. D.G. Laing, R.L. Doty,
and W. Breipohl), pp. 155–65. Springer-Verlag, Heidelberg.
Doty, R.L., Shaman, P., Applebaum, S.L., Giberson, R., Siksorski, L., and Rosenberg, L. (1984). Smell
identification ability: changes with age. Science, 226, 1441–43.
Drewnowski, A. (2000). Sensory control of energy density at different life stages. Proceedings of the Nutrition
Society, 59, 239–44.
REFERENCES 83

DuBose, C.N., Cardello, A.V., and Maller O. (1980). Effects of colorants and flavorants on identification,
perceived flavor intensity, and hedonic quality of fruit-flavored beverages and cake. Journal of Food
Science, 45, 1393–99, 1415.
Egan, J.M., and Margolskee, R.F. (2008). Taste cells of the gut and gastrointestinal chemosensation.
Molecular Interventions, 8, 78–81.
Erikson, R.P. (2008). A study of the science of taste: on the origins and influence of the core ideas.
Behavioral and Brain Sciences, 31, 59–105.
Ernst, M.O., and Bülthoff, H.H. (2004). Merging the senses into a robust percept. Trends in Cognitive
Sciences, 8, 162–69.
Fernandez, M., and Bahrick, L.E. (1994). Infants’ sensitivity to arbitrary object-odor pairings. Infant
Behavior and Development, 17, 471–74.
Frank, R.A., and Byram, J. (1988). Taste-smell interactions are tastant and odorant dependent. Chemical
Senses, 13, 445–55.
Galef, B.G., and Sherry, D.F. (1973). Mother’s milk: a medium for transmission of cues reflecting the flavor
of mother’s diet. Journal of Comparative and Physiological Psychology, 83, 374–78.
Ganchrow, J.R., and Mennella, J.A. (2003). The ontogeny of human flavour perception. In Handbook of
olfaction and gustation (ed. R.L. Doty), pp. 823–46. Marcel Dekker, New York.
Gilbert, A.N., Martin, R., and Kemp, S.E. (1996). Cross-modal correspondence between vision and
olfaction: the color of smells. American Journal of Psychology, 109, 335–51.
Glendinning, J.I. (1994). Is the bitter rejection response always adaptive? Physiology and Behavior, 56,
1217–27.
Gori, M., Del Viva, M., Sandini, G., and Burr, D.C. (2008). Young children do not integrate visual and
haptic information. Current Biology, 18, 694–98.
Gottlieb, G. (1971). Ontogenesis of sensory function in birds and mammals. In The biopsychology of
development (ed. E. Tobach, L.R., Aronson, and E.F. Shaw), pp. 67–128. Academic Press, New York.
Green, B.G. (2002). Studying taste as a cutaneous sense. Food Quality and Preference, 14, 99–109.
Harris, G. (2008). Development of taste and food preferences in children. Current Opinion in Clinical
Nutrition and Metabolic Care, 11, 315–319.
Hausner, H., Bredie, W.L.P., Mølgaard, C., Petersen, M.A., and Møller, P. (2008). Differential transfer of
dietary flavour compounds into human breast milk. Physiology and Behavior, 95, 118–24.
Helbig, H.B., and Ernst, M.O. (2008). Visual-haptic cue weighting is independent of modality-specific
attention. Journal of Vision, 8, 1–16.
Hepper, P. (1995). Human fetal ‘olfactory’ learning. International Journal of Prenatal and Perinatal
Psychology, 2, 147–51.
Horne, P.J., Hardman, C.A., Lowe, C.F., Tapper, K., Le Noury, J., Madden, P., Patel, P., and Doody, M.
(2009). Increasing parental provision and children’s consumption of lunchbox fruit and vegetables in
Ireland: the Food Dudes intervention. European Journal of Clinical Nutrition, 63, 613–618.
Horne, P.J., Tapper, K., Lowe, C.F., Hardman, C. A., Jackson, M.C., and Woolner, J. (2004). Increasing
children’s fruit and vegetable consumption: a peer-modelling and rewards-based intervention.
European Journal of Clinical Nutrition, 58, 1–12.
Houston-Price, C., Butler, L., and Shiba, P. (2009). Visual exposure impacts on toddlers’ willingness to
taste fruit and vegetables. Appetite, 53, 450–53.
Hurlbert, A.C., and Ling, Y. (2007). Biological components of sex differences in color preference. Current
Biology, 17, R623–25.
Hutchings, J.B. (2003). Expectations and the food industry: the impact of color and appearance. Plenum
Publishers, New York.
ISO (1992). Standard 5492: Terms relating to sensory analysis. International Organization for Standardization, Geneva.
James, W. (1890). Principles of psychology, Vol. 1. Henry Holt, New York.
Khan, R.M., Luk, C.H., Flinker, A., et al. (2007). Predicting odor pleasantness from odorant structure:
pleasantness as a reflection of the physical world. Journal of Neuroscience, 27, 10015–23.
84 THE DEVELOPMENT AND DECLINE OF MULTISENSORY FLAVOUR PERCEPTION

Kostyla, A.S. (1978). The psychophysical relationships between color and flavor of some fruit flavored beverages.
Ph.D. thesis. University of Massachusetts, Amherst, Massachusetts.
Koza, B.J., Cilmi, A., Dolese, M., and Zellner, D.A. (2005). Color enhances orthonasal olfactory intensity
and reduces retronasal olfactory intensity. Chemical Senses, 30, 643–49.
Laurienti, P.J., Burdette, J.H., Maldjian, J.A., and Wallace, M.T. (2006). Enhanced multisensory integration
in older adults. Neurobiology of Aging, 27, 1155–63.
Lavin, J., and Lawless, H. (1998). Effects of color and odor on judgments of sweetness among children and
adults. Food Quality and Preference, 9, 283–89.
Lehrer, A. (2009). Wine and conversation (2nd Ed.). Oxford University Press, Oxford.
Lelièvre, M., Chollet, S., Abdi, H., and Valentin, D. (2009). Beer-trained and untrained assessors rely more
on vision than on taste when they categorize beers. Chemosensory Perception, 2, 143–53.
Léon, F., Couronne, T., Marcuz, M., and Köster, E. (1999). Measuring food liking in children: a
comparison of non-verbal methods. Food Quality and Preference, 10, 93–100.
Lewkowicz, D.J., and Lickliter, R. (Eds.). (1994). The development of intersensory perception: comparative
perspectives. Lawrence Erlbaum Associates, Hillsdale, NJ.
Liem, D.G., and Mennella, J.A. (2003). Heightened sour preferences during childhood. Chemical Senses, 28,
173–80.
Lowenberg, M.E. (1934). Food for the young child. Collegiate Press, Ames, IA.
Maga, J.A. (1974). Influence of color on taste thresholds. Chemical Senses and Flavor, 1, 115–119.
Marshall, D., Stuart, M., and Bell, R. (2006). Examining the relationship between product package colour
and product selection in preschoolers. Food Quality and Preference, 17, 615–21.
Martin, G.N. (2004). A neuroanatomy of flavour. Petits Propos Culinaires, 76, 58–82.
Maurer, D. (1997). Neonatal synaesthesia: implications for the processing of speech and faces.
In Synaesthesia: classic and contemporary readings (ed. S. Baron-Cohen, and J.E. Harrison), pp. 224–42.
Blackwell, Oxford.
Maurer, D., and Mondloch, C.J. (2005). Neonatal synaesthesia: a reevaluation. In Synaesthesia: perspectives
from cognitive neuroscience (ed. L.C. Robertson and N. Sagiv), pp. 193–213. Oxford University Press,
Oxford.
McBurney, D.H. (1986). Taste, smell, and flavor terminology: taking the confusion out of fusion.
In Clinical measurement of taste and smell (ed. H.L. Meiselman and R.S. Rivkin), pp. 117–25.
Macmillan, New York.
Mennella, J.A. (1995). Mother’s milk: a medium for early flavor experiences. Journal of Human Lactation,
11, 39–45.
Mennella, J.A., and Beauchamp, G.K. (1994). Early flavor experiences: when do they start? Nutrition Today,
29, 25–31.
Mennella, J.A., and Beauchamp, G.K. (2010). The role of early life experiences in flavor perception and
delight. In Obesity prevention: the role of brain and society on individual behavior (ed. L. Dubé,
A. Bechara, A. Dagher, A. Drewnowski, J. Lebel, P. James, R.Y. Yada, and M.-C. LaFlamme-Sanders),
pp. 203–217. Elsevier, Amsterdam.
Mennella, J.A., Jagnow, C.P., and Beauchamp, G.K. (2001). Prenatal and postnatal flavor learning by
human infants. Pediatrics, 107, e88, 1–6.
Mojet, J., Heidema, J., and Christ-Hazelhof, E. (2003). Taste perception with age: Generic or specific losses
in supra-threshold intensities of five taste qualities? Chemical Senses, 28, 397–413.
Morrot, G., Brochet, F., and Dubourdieu, D. (2001). The color of odors. Brain and Language, 79, 309–20.
Moskowitz, H.R. (1985). Product testing with children. In New directions for product testing and sensory
analysis of foods (ed. H.R. Moskowitz), pp. 147–64. Food and Nutrition Press, Westport, CT.
Murphy, C., Cain, W.S., and Bartoshuk, L.M. (1977). Mutual action of taste and olfaction. Sensory
Processes, 1, 204–211.
Nusbaum, N.J. (1999). Aging and sensory senescence. Southern Medical Journal, 92, 267–75.
Oram, N., Laing, D.G., Hutchinson, I., et al. (1995). The influence of flavor and color on drink
identification by children and adults. Developmental Psychobiology, 28, 239–46.
Österbauer, R.A., Matthews, P.M., Jenkinson, M., Beckmann, C.F., Hansen, P.C., and Calvert, G.A. (2005).
Color of scents: chromatic stimuli modulate odor responses in the human brain. Journal of
Neurophysiology, 93, 3434–41.
Pangborn, R.M., Berg, H.W. and Hansen, B. (1963). The influence of color on discrimination of sweetness
in dry table-wine. American Journal of Psychology, 76, 492–95.
Parr, W.V., Heatherbell, D., and White, K.G. (2002). Demystifying wine expertise: Olfactory threshold,
perceptual skill and semantic memory in expert and novice wine judges. Chemical Senses, 27, 747–55.
Parr, W.V., White, K.G., and Heatherbell, D. (2003). The nose knows: influence of colour on perception of
wine aroma. Journal of Wine Research, 14, 79–101.
Philipsen, D.H., Clydesdale, F.M., Griffin, R.W., and Stern, P. (1995). Consumer age affects response to
sensory characteristics of a cherry flavored beverage. Journal of Food Science, 60, 364–68.
Pick, H.L., and Pick, A.D. (1970). Sensory and perceptual development. In Carmichael’s manual of child
psychology (Vol. 1) (ed. P. H. Mussen). Wiley, New York.
Pliner, P. (1982). The effects of mere exposure on liking for edible substances. Appetite, 3, 283–90.
Prescott, J., Johnstone, V., and Francis, J. (2004). Odor-taste interactions: effects of attentional strategies
during exposure. Chemical Senses, 29, 331–40.
Reardon, P., and Bushnell, E.W. (1988). Infants’ sensitivity to arbitrary pairings of color and taste. Infant
Behavior and Development, 11, 245–50.
Rozin, P., and Fallon, A.E. (1987). A perspective on disgust. Psychological Review, 94, 23–41.
Schaal, B., Marlier, L., and Soussignan, R. (2000). Human foetuses learn odours from their pregnant
mother’s diet. Chemical Senses, 25, 729–37.
Schifferstein, H.N.J., and Tanudjaja, I. (2004). Visualizing fragrances through colors: the mediating role of
emotions. Perception, 33, 1249–66.
Schiffman, S.S. (1977). Food recognition in the elderly. Journal of Gerontology, 32, 586–92.
Schiffman, S.S. (1997). Taste and smell losses in normal aging and disease. Journal of the American Medical
Association, 278, 1357–62.
Schiffman, S., and Pasternak, M. (1979). Decreased discrimination of food odors in the elderly. Journal of
Gerontology, 34, 73–79.
Schiffman, S.S., and Warwick, Z.S. (1989). Use of flavor-amplified foods to improve nutritional status in
elderly persons. Annals of the New York Academy of Sciences, 561, 267–76.
Scholl, B.J. (2005). Innateness and (Bayesian) visual perception: reconciling nativism and development. In
The innate mind: structure and contents (ed. P. Carruthers, S. Laurence, and S. Stich), pp. 34–52.
Oxford University Press, Oxford.
Shams, L., and Beierholm, U.R. (2010). Causal inference in perception. Trends in Cognitive Sciences, 14, 425–32.
Shankar, M.U., Levitan, C.A., Prescott, J., and Spence, C. (2009). The influence of color and label
information on flavor perception. Chemosensory Perception, 2, 53–58.
Shankar, M.U., Levitan, C., and Spence, C. (2010a). Grape expectations: the role of cognitive influences in
color-flavor interactions. Consciousness and Cognition, 19, 380–90.
Shankar, M., Simons, C., Levitan, C., Shiv, B., McClure, S., and Spence, C. (2010b). An expectations-based
approach to explaining the crossmodal influence of color on odor identification: the influence of
temporal and spatial factors. Journal of Sensory Studies, 25, 791–803.
Shankar, M., Simons, C., Shiv, B., Levitan, C., McClure, S., and Spence, C. (2010c). An expectations-based
approach to explaining the influence of color on odor identification: the influence of degree of
discrepancy. Attention, Perception, and Psychophysics, 72, 1981–93.
Shankar, M., Simons, C., Shiv, B., McClure, S., and Spence, C. (2010d). An expectation-based approach to
explaining the crossmodal influence of color on odor identification: the influence of expertise.
Chemosensory Perception, 3, 167–73.
Skrandies, W., and Reuther, N. (2008). Match and mismatch of taste, odor, and color is reflected by
electrical activity in the human brain. Journal of Psychophysiology, 22, 175–84.
Spector, F. (2009). The colour and shapes of the world: testing predictions from synesthesia about the development
of sensory associations. Unpublished PhD thesis. McMaster University, Hamilton, Ontario, Canada.
Spence, C. (2010a). The color of wine—Part 1. The World of Fine Wine, 28, 122–29.
Spence, C. (2010b). The multisensory perception of flavour. The Psychologist, 23, 720–23.
Spence, C. (2011). Crossmodal correspondences: a tutorial review. Attention, Perception, and Psychophysics,
73, 971–95.
Spence, C. (2012). Multisensory integration and the psychophysics of flavour perception. In Food oral
processing—fundamentals of eating and sensory perception (ed. J. Chen and L. Engelen) pp. 203–219.
Blackwell Publishing, Oxford.
Spence, C., and Shankar, M.U. (2010). The influence of auditory cues on the perception of, and responses
to, food and drink. Journal of Sensory Studies, 25, 406–30.
Spence, C., Levitan, C., Shankar, M.U., and Zampini, M. (2010). Does food color influence taste and flavor
perception in humans? Chemosensory Perception, 3, 68–84.
Stein, B.E., and Meredith, M.A. (1993). The merging of the senses. MIT Press, Cambridge, MA.
Stevens, J.C., Bartoshuk, L.M., and Cain, W.S. (1984). Chemical senses and aging: taste versus smell.
Chemical Senses, 9, 167–79.
Stevenson, R.J. (2009). The psychology of flavour. Oxford University Press, Oxford.
Stevenson, R.J. (2012). Multisensory interactions in flavor perception. In The new handbook of multisensory
processes (ed. B.E. Stein) pp. 283–99. MIT Press, Cambridge, MA.
Stevenson, R.J., and Boakes, R.A. (2004). Sweet and sour smells: learned synaesthesia between the senses of
taste and smell. In The handbook of multisensory processing (ed. G.A. Calvert, C. Spence, and B.E.
Stein), pp. 69–83. MIT Press, Cambridge, MA.
Stevenson, R.J., and Tomiczek, C. (2007). Olfactory-induced synesthesias: a review and model. Psychological
Bulletin, 133, 294–309.
Stevenson, R.J., Prescott, J., and Boakes, R.A. (1995). The acquisition of taste properties by odors. Learning
and Motivation, 26, 433–55.
Stevenson, R.J., Boakes, R.A., and Prescott, J. (1998). Changes in odor sweetness resulting from implicit
learning of a simultaneous odor-sweetness association: an example of learned synaesthesia. Learning
and Motivation, 29, 113–32.
Stevenson, R.J., Boakes, R.A., and Wilson, J.P. (2000). Counter-conditioning following human odor-taste
and color-taste learning. Learning and Motivation, 31, 114–27.
Stillman, J. (1993). Color influences flavor identification in fruit-flavored beverages. Journal of Food Science,
58, 810–812.
Stillman, J.A. (2002). Gustation: intersensory experience par excellence. Perception, 31, 1491–1500.
Teerling, A. (1992). The colour of taste. Chemical Senses, 17, 886.
Tepper, B.J. (1993). Effects of a slight color variation on consumer acceptance of orange juice. Journal of
Sensory Studies, 8, 145–54.
Tuorila-Ollikainen, H., Mahlamäki-Kultanen, S., and Kurkela, R. (1984). Relative importance of color, fruity
flavor and sweetness in the overall liking of soft drinks. Journal of Food Science, 49, 1598–1600, 1603.
Varendi, H., Porter, R.H., and Winberg, J. (1996). Attractiveness for amniotic fluid odor: evidence for
prenatal learning? Acta Paediatrica, 85, 1223–27.
Verhagen, J.V., and Engelen, L. (2006). The neurocognitive bases of human multimodal food perception:
sensory integration. Neuroscience and Biobehavioral Reviews, 30, 613–50.
Walsh, L.M., Toma, R.B., Tuveson, R.V., and Sondhi, L. (1990). Color preference and food choice among
children. Journal of Psychology, 124, 645–53.
Wheatley, J. (1973). Putting colour into marketing. Marketing, 67, 24–29.
Yeomans, M., Chambers, L., Blumenthal, H., and Blake, A. (2008). The role of expectancy in sensory and
hedonic evaluation: the case of smoked salmon ice-cream. Food Quality and Preference, 19, 565–73.
Young, J.Z. (1968). Influence of the mouth on the evolution of the brain. In Biology of the mouth: a
symposium presented at the Washington meeting of the American Association for the Advancement of
Science, 29–30 December 1966 (ed. P. Person), pp. 21–35. American Association for the Advancement
of Science, Washington, D.C.
Zampini, M., and Spence, C. (2004). The role of auditory cues in modulating the perceived crispness and
staleness of potato chips. Journal of Sensory Studies, 19, 347–63.
Zampini, M., and Spence, C. (2010). Assessing the role of sound in the perception of food and drink.
Chemosensory Perception, 3, 57–67.
Zampini, M., Sanabria, D., Phillips, N., and Spence, C. (2007). The multisensory perception of flavor:
assessing the influence of color cues on flavor discrimination responses. Food Quality and Preference,
18, 975–84.
Zampini, M., Wantling, E., Phillips, N., and Spence, C. (2008). Multisensory flavor perception: assessing
the influence of fruit acids and color cues on the perception of fruit-flavored beverages. Food Quality
and Preference, 19, 335–43.
Zellner, D.A., Rozin, P., Aron, M., and Kulish, C. (1983). Conditioned enhancement of humans’ liking for
flavour by pairing with sweetness. Learning and Motivation, 14, 338–50.
Zellner, D.A., Bartoli, A.M., and Eckard, R. (1991). Influence of color on odor identification and liking
ratings. American Journal of Psychology, 104, 547–61.
Chapter 4

Crossmodal interactions in the human newborn: new answers to Molyneux's question

Arlette Streri

4.1 Introduction
The environment is characterized by inputs to more than one sensory system: each modality
provides original and unique information about an event. For example, only the visual system can
encode the colour of an event, whereas the haptic sense is needed to perceive the hardness, weight,
and temperature of an object. Only the ears (and to some extent the somatosensory system) are
sensitive to sound, whereas the taste and smell of something can only be coded by the chemical
senses, taste and olfaction. As adults, we integrate the multiple inputs arriving through these
sense organs into unified functional representations because the effective control of actions in the
environment depends on the inputs from a single event being linked and possibly integrated.
Multisensory perception has often been shown to be more precise than unimodal perception (e.g.
Ernst and Banks 2002), bestowing functional advantages such as economy of learning (Hatwell
2003) or intersensory substitution and thus crossmodal plasticity in people born blind or deaf
(Cohen et al. 1997; Röder and Rösler 2004; see also Chapter 13 by Röder). This spontaneous,
effortless integration is especially striking, given the wealth of research in this area that has pro-
vided evidence that integration depends on a number of different complex processes (see Calvert
et al. 2004; Spence and Driver 2004). The organism has to combine information from different
senses to enhance or complete perception, but this combination also poses many challenges for
the nervous system, due to the substantial differences between each sensory system. In the same
way, several decades of study have revealed various functional complexities and developmental
changes in intersensory functioning during the first year after birth (see Lewkowicz and Lickliter
1994) and in adult learning tasks (Shams and Seitz 2008).
The variety of stimulation (distal versus proximal) and, even more, the variety of receptors make
the problem of integrating information across the senses complex and challenging. This chapter
focuses first on the intersensory interactions observed in human newborns involving the chemical
senses, audition, and vision. Then the relations between the visual and the tactile modalities in new-
borns are examined in detail. These specific visual–tactile relations shed light on an old philosophical
question: Molyneux’s famous question of 7 July 1688 concerning the ‘visual-haptic mapping and the
origin of crossmodal identity’ (Held 2009). They therefore constitute the focus of this chapter.

4.2 The diversity of interactions between the senses in human newborns
Adults possess a variety of perceptual mechanisms that enable them to deal with multisensory
inputs. Such performance may be due to a long period of learning to combine various inputs (this
is known as the empiricist hypothesis). Newborns enter the world largely naïve, so how can they
make sense of the wealth of stable or moving things, events, and people that they encounter
through audition, vision, touch, and olfaction? Plausibly, they should possess some means that
allow them to have a coherent, if incomplete, representation of these objects, events, and people
(Slater and Kirby 1998). E.J. Gibson (1969) proposed that spatial dimensions as well as temporal
dimensions are amodal in nature, i.e. they are available to all sensory modalities right from birth.
Amodal perception can occur whenever two or more senses provide equivalent information. It is
quite likely that the ability to detect amodal relations is innately given to the infant. For instance,
Wertheimer (1961) reported consistent eye movements in the direction of an auditory stimulus
positioned close to either the left or right ear of a 10-minute-old infant. He interpreted these
ipsilateral eye movements as providing evidence for an innate mechanism subserving the integra-
tion of visual and auditory information. After several failures to replicate this first observation
with brief sounds (Butterworth and Castillo 1976; McGurk et al. 1977), Crassini and Broerse
(1980) finally identified the auditory stimulus parameters needed to trigger a
newborn’s eye movements towards the sound source. They observed eye orientations toward a
sound when its duration was sufficiently long, and when the auditory information had a complex
spectral composition (such as speech). In the same way, von Hofsten (1982) provided evidence
that newborns orient their arm and hand to reach for a gently moving object while they look at it.
Thus, newborns’ behaviour is directed to a common auditory, tactile, and visual space (see
Chapter 5 by Bremner et al. for discussion of developments in multisensory representations of
space beyond the first months).
Some preferential orientations towards stimuli can stem from the newborn's prenatal life.
Chemosensory responses provide a good illustration of multisensory interaction because
several senses are involved during the foetal period. Flavour, the most salient feature of foods and
beverages, is defined as the combination of at least three anatomically distinct chemical senses:
taste, smell, and chemosensory irritation (Beauchamp and Mennella 2009). Gustatory stimuli are
detected by receptor cells located in the tongue and palate. The smell component is composed of
volatile compounds detected by receptors in the upper regions of the nose. Chemosensory
irritation is detected by receptors in the skin all over the head, but for food the relevant receptors
are located mainly in the mouth and nose. In adults, these chemical senses (taste, smell, and
chemical irritation) work in concert to determine food choices (see Beauchamp and Mennella
2009). In newborns, responses to the volatile components of flavour, detected by the olfactory
system, are strongly influenced by early exposure in utero. Chemical molecules dissolved in the
amniotic fluid continuously bathe the nose, lips, and tongue of the foetus, which can detect and store the
unique chemosensory information available in the prenatal environment. At birth, when exposed
to paired-choice tests contrasting the odours of familiar or non-familiar amniotic fluids, infants
orient preferentially and selectively to the odour of familiar amniotic fluid (Schaal et al. 1998; see
also Chapter 2 by Schaal and Durand). Thus, from the volatile fragrant information alone, the
newborn is able to recognize the composite chemical information of the fluid learned
in utero. Odour cues from lactating women affect the newborn’s behaviours in multiple ways. The
odour of the lactating breast reduces arousal in active newborns and increases it in somnolent
newborns. Such cues also elicit head- (and nose-) turning towards the odour source and increase oral activity (cf.
Chapter 2 by Schaal and Durand; Doucet et al. 2007). Auditory stimulation can also modify and
influence the infant’s state of arousal and preference for visual stimuli. Lewkowicz and Turkewitz
(1981) demonstrated that newborns exposed first to light spots of different intensities
preferred looking at the light of intermediate intensity. In contrast, newborns who were first
exposed to a sound (white noise) and then to various light spots preferred the light of lowest
intensity (see also Gardner et al. 1986; Garner and Karmel 1983; Turkewitz et al. 1984 for similar
results). Lewkowicz and Turkewitz (1981) concluded from these results that newborns attend to
quantitative variations in stimulation. They also concluded that newborns ignore qualitative
attributes of stimulation in favour of quantitative ones.
However, the interactions between auditory and visual information can occur and are
perceived at different levels. Perceiving the equivalent nature of the visible and audible aspects
of an event testifies to an ability to integrate the multimodal properties of temporal events into
unified experiences. Lewkowicz (2000; see also Chapter 7 by Lewkowicz) has proposed that
the four basic features of multisensory temporal experience—temporal synchrony, duration,
rate, and rhythm—emerge in a sequential and hierarchical fashion during the first year after
birth. From birth, synchrony appears to be the fundamental dimension for the perception
of intersensory unity. Many multisensory events give both amodal and arbitrary auditory–visual
information. Infants’ learning about auditory–visual intersensory relations, both amodal
and arbitrary, has been investigated in detail in studies by Bahrick (1987, 1988; Bahrick
and Pickens 1994; see also Chapter 8 by Bahrick and Lickliter). Morrongiello et al. (1998) have
demonstrated that newborns can associate objects and sounds on the basis of the combined
cues of collocation and synchrony. They are also capable of learning arbitrary auditory–visual
associations (e.g. between an oriented coloured line and a syllable), but only in the condition
where the visual and auditory information were presented synchronously (Slater et al. 1997,
1999). All these results suggest that, thanks to the temporal synchrony available in the informa-
tion, newborns already have the perceptual mechanisms for later learning the meaning of lexical
words.
Right from birth, infants see many faces speaking to them. The synchrony of voice and mouth
provides amodal information whereas the pairing of the face and the sound of the voice is
arbitrary. Using familiarization and preferential looking times with alternating presentations of
familiar and new stimuli (Spelke 1976), Coulon et al. (2011) recently showed that unfamiliar mov-
ing and talking faces in videos are salient and can enhance face recognition (see also Guellai et al.
2011; Guellai and Streri 2011). Moreover, the face of a previously unfamiliar woman was recog-
nized more efficiently when seen talking than when silent. This result supports the idea that audio-
visual integration is fundamental to efficient face processing and learning immediately after birth.
Although temporal synchrony is a fundamental dimension for linking visual and auditory
information about an event or object, it is not always necessary when newborns have to abstract
information that is not directly available in the stimulus layout, such as large numerosities. Izard et al. (2009) recently
revealed that newborn infants spontaneously associate slowly and smoothly moving visual spatial
arrays of 4–18 objects with rapid auditory sequences of events on the basis of number (see
Fig. 4.1). In these experiments, infants were familiarized with sequences of either 4 or 12 sounds
(6 or 18 sounds) accompanied by visual arrays of either 4 or 12 objects (6 or 18 objects). In all the
familiarization conditions, newborn infants looked longer at the visual image with the matching
number of objects. Despite the absence of synchrony between the sounds and objects, newborns
responded to abstract numerical quantities across different modalities and formats (i.e. sequential
versus simultaneous).
In short, newborns have to make sense of a complex environment that provides a variety of
changing inputs. They possess the intermodal mechanisms necessary to perceive this environment
in an organized manner, allowing immediate adaptation and improving their chances of survival.
All these studies shed light on the different means by which newborns integrate these multiple
inputs. These mechanisms can involve cognitive processes at different levels. For instance, turning
the eyes towards a sound source does not require the same competencies (or have the same
ecological value) as orienting toward or detecting the mother's milk. The modification of a
newborn's arousal by a milk odour or a sound makes fewer cognitive demands than the
integration of arbitrary auditory–visual inputs aided by temporal and spatial synchrony.

Fig. 4.1 (A) Auditory and visual displays for an experiment on number representation in newborn
infants, and (B) infants' looking times to the visual arrays that corresponded or differed in number
from the accompanying auditory sequences (after Izard et al. 2009).

Finally, abstracting and discriminating large numerosities across vision and audition without the aid of synchrony are
abilities that are also present in newborns. In other words, newborns have at their disposal various
fundamental competencies on which more complex abilities can be built. It is plausible that these
competencies may be insufficient, vague, or incomplete, and that learning will be necessary to
extend this knowledge further.
4.3 Crossmodal relations between touch and vision


In recent decades, there have been considerable advances in our understanding of the maturational
development of vision in the first year after birth. The newborn's visual system is very immature,
in both neural and behavioural terms, with weak acuity and contrast sensitivity (Allen et al. 1996),
poor fixation ability, and uncoordinated saccadic and other ocular movements (Ricci et al. 2008).
Nevertheless, a series of studies has demonstrated the successful perception of both faces (see
Pascalis and Slater 2003) and objects (Slater 1998). Between the ages of 6 and 9 months, visual
acuity reaches near-mature levels. In contrast, few studies have focused on the tactile system in
this period of development. A recent study using magneto-encephalography (MEG) has provided
evidence concerning the cortical maturation of tactile processing in human subjects from birth
onward (Pihko et al. 2009). Maturation of short-latency cortical responses to tactile stimulation
is largely complete two years after birth. A contrast worth noting here is that while the somes-
thetic system, the first to function in utero, develops slowly even after birth, the visual system, the
last to really begin to work, develops extremely quickly after birth (Granier-Deferre et al. 2004;
Lecanuet et al. 1995).
In the first part of the following section, the complexities of the tactile system in adults and
its development in infancy are described, invoking some important methodological considera-
tions along the way. In the second part, experimental evidence concerning interconnections
between the tactile and visual modalities in newborns is presented in an attempt to evaluate pos-
sible explanations for both the successes and the failures of these interconnections. These findings
will be brought to bear on current theoretical views of the development of relations between
modalities.

4.3.1 Characteristics of the tactile system in adults


The mouth and the hands are the principal organs for perceiving and coming to know the properties
of the environment through the tactile modality. In many mammals, the mouth and the vibrissal system1
are the most sensitive body parts and the main tools of perceptual exploration. For non-human
primates, the hands are also used for feeding, running, stroking, active manipulation, etc.,
allowing animals to vary their activity in situationally appropriate ways (Tomasello 1998). This
specialization culminates in human beings, whose hands and arms have become a principal tool
of investigation, manipulation and transformation of the world.
It is widely recognized that one characteristic of touch stems from the fact that the hands
are both perceptual systems, able to explore the environment, and motor organs, able to modify
it (Hatwell et al. 2003; Loomis and Lederman 1986). These two functions, perceptual and motor,
give the haptic system a property that is unique among the senses. Although it is necessary
to distinguish between ‘action for perception’ (exploratory action) and ‘perception for action’
(perception subserving or preparing action; Lederman and Klatzky 1996), perception and action
are closely linked in haptic functioning in all cases.
Knowing an object by touch depends on contact with surfaces and on the quality of proximal
reception. The tactual perceptual field is limited to the zone of contact with an object and has the
exact dimensions of the surface of the skin in contact with the stimulus. However, to obtain a
good representation of the whole object and identify it, voluntary movements must be made in
order to compensate for the smallness of the tactile perceptual field. The kinesthetic perceptions
resulting from these movements are necessarily linked to the purely cutaneous perceptions
generated by skin contact, and they form a whole called ‘haptic’ (or tactilo-kinesthetic, or active

touch) perception. As a result, object perception in the haptic mode is highly sequential. This
property increases the load on working memory and requires a mental reconstruction and
synthesis in order to obtain a unified representation of a given object (Revesz 1950).

1 Vibrissae (or whiskers) are hairs usually employed for tactile sensation by mammals.
In adults, the hands alone are able to gather information about the different properties
of objects, but these properties do not have the same salience under haptic exploration with or
without vision (Klatzky et al. 1987). In a free-sorting task, participants had to sort objects with
different properties by similarity under one of three instruction conditions: without vision, with
visual imagery, or with visual inspection of the objects. Hardness and texture were highly salient
in the haptic-alone group. For the haptic-plus-imagery group, shape was especially salient. The
haptic-plus-visual group showed salience to be well distributed over all the dimensions. Haptic
identification of different properties of objects is possible, each being detected by means of
specific exploratory procedures (EPs). EPs are voluntary hand movements tailored to detecting a
particular property of an object. For example, enclosure is a necessary procedure for detecting
global shape; contour following allows good detection of precise shape; and lateral motion across
the object's surface is sufficient to detect texture. However, not all of these exploratory
procedures are exhibited by infants. It is therefore difficult to know which properties infants can
detect. Perhaps they detect only one or two properties when holding an object in their hands, but
they are unable to detect all of an object's properties with precision because they do not yet
possess the specific EPs that have been described in adults.

4.3.2 Characteristics of the tactile system in infants


In recent years, there has been an increased interest in the manual perception of objects in human
newborns. Earlier, newborns had typically been described as displaying mainly reflex reactions
and clumsy arm movements. Newborns’ hands, in particular, have often been described as closed,
or exhibiting either grasping or avoiding reflexes, which are inappropriate behaviours for holding
an object and gathering and processing information about it (Twitchell 1965). As a consequence,
while a wealth of research studying infants and very young children has been devoted to the
description and development of manual action skills (displacement and manipulation of
objects)—skills which, as mentioned above, reflect the haptic system’s distinctive coupling of
motor and perceptual abilities—relatively few studies have focused on the use of the hands
for perceptual information-gathering. Another reason for the lack of emphasis on manual explo-
ration and perception in young infants is that they spontaneously display marked and varied
behaviours with their mouth, but not with their hands. Numerous studies of the neonatal imita-
tion of tongue protrusion or mouth opening (Field et al. 1982; Fontaine 1984; Maratos 1973;
Meltzoff and Moore 1977; Vinter 1986) and coordination of hand–mouth actions (Butterworth
and Hopkins 1988) reveal the early-developing perceptual and motor abilities of young babies.
Thus, the mouth has been considered by researchers as an exploratory tool and an important part
of the haptic system, playing a major role for the infant when gathering information about
objects. Because of these factors, studies on the specifically manual abilities of newborns have
unfortunately been rather neglected, almost to the point of being forgotten altogether. Thus,
numerous researchers have held that manual exploration is very limited or absent in infancy
(Gibson and Walker 1984; Meltzoff and Borton 1979; Rochat 1989). It is very likely that, from
birth, mouthing and oral exploration provide fundamental means for infants to interact with
objects. However, the hands often aid in these interactions.
While perceiving with the hands is intensively practiced by the young infant (see Streri 1993), it
is at about 5–6 months of age that infants start to maintain a sitting position. They then transport
a held object to their mouths or their eyes to bring it into view and often, if they see an object in
their peripersonal space (i.e. near them), attempt to catch or grasp it. As a consequence, the
two modalities interact in the course of exploration. In this simultaneous bimodal exploration,
infants have to judge as equivalent items of information gathered simultaneously by the haptic
and visual systems in order to obtain a complete and high-quality representation of the object.
A series of studies has offered a better understanding of the efficiency of each modality (visual,
oral, manual) in babies’ play activities (Gottfried et al. 1978; Ruff 1984; Ruff et al. 1992; Jovanovic
et al. 2008). However, in multisensory exploration, it is difficult to isolate and evaluate the effects
of the manual system’s activity and its consequences for other sensory systems. Our research has
focused mainly on the role of the hands as perceptual systems considered in isolation (i.e. without
visual inspection) and their interaction with vision in crossmodal tasks.

4.3.3 Methodological considerations when assessing tactile perception in infants

In newborns and young infants, grasping and avoiding reflexes were once regarded as the domi-
nant behaviours mediating interactions with the environment. At first glance, the grasping reflex
in neonates seems to be similar to the adult’s enclosure procedure. However, according to Katz
(1925) and Roland and Mortensen (1987), newborns’ tendency to strongly close their fingers on
an object or an adult’s finger makes it impossible for them to perceive the fine details of an object,
or even recognize an object with a single grasp. In adults, tactile information about object shape
is sampled sequentially by several fingertips sweeping over the object's surface in different
directions and at different velocities. Under such conditions, a single grasp by a newborn may be
insufficient to detect fine features. However, a series of several grasps might be adequate to
perceive global shape, rough texture, etc., the grasp serving as an involuntary EP for detecting
global shape.
For this reason, our studies have used habituation with an infant-control procedure, as opposed to
fixed-duration familiarization. The fixed-duration procedure was used in Rose's (1994) well-known
work with 6- to 24-month-olds; it makes it possible to determine the minimum duration of tactile
familiarization required to obtain visual recognition in crossmodal transfer tasks.2 The babies feel
the object with one hand, without visual control, for a fixed duration in a single trial, and then
see the familiar and the novel objects. One disadvantage of this method is that it does not
necessarily allow the infant to explore the object with more than one grasp, and thus to extract a
fuller representation of its shape. The habituation procedure, usually used in
the visual modality, was applied for the first time in the haptic modality with 5-month-old infants
(Streri and Pêcheux 1986a), and can easily be adapted to newborns (see Fig. 4.2).
The haptic habituation/dishabituation procedure presents several advantages:
◆ The decrease in holding times seems to reveal the infants’ abilities to perceive and memorize
the shape and recognize it. Moreover, it is now well established that habituation and memory

2 The haptic habituation phase included a series of trials in which the infants received a small object in one
hand. A trial began when the infant held the object and ended when the infant dropped it or after a maxi-
mum duration defined by the experimenter. This procedure was repeated several times. As a consequence,
the habituation process entailed several grasps of different durations. Usually, the minimum duration for a
simple grasp was 1 sec of holding and the maximum was 60 sec. The inter-trial intervals were short (between
4 and 10 sec). Trials were continued until the habituation criterion was met. The infant was judged to have
been habituated when the duration of holding on any two consecutive trials, from the third onwards,
totalled a third (or a quarter according to the infant’s age) or less of the total duration of the first two trials.
The total holding time was taken as an indicator of familiarization duration, exclusive of inter-trial interval.
The mean number of trials to reach habituation was between four and twelve, and often varied according
to the complexity of shapes. Then, in the dishabituation phase, a novel object was put in the infant’s hand.
If an increase in holding time of the novel object was observed, one can infer that the baby was reacting to
novelty, and thus had noticed the difference between the novel and the familiar objects.
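Purely as an illustration, the infant-control criterion described in this footnote can be expressed algorithmically. The sketch below is a hypothetical implementation (the function name, the clipping of holding times to the 1–60-sec bounds, and the default one-third criterion are assumptions drawn from the description above, not part of any published scoring software):

```python
# Sketch of the infant-control habituation criterion described in footnote 2.
# Holding durations are in seconds; the criterion ratio is 1/3 (or 1/4,
# depending on the infant's age).

def habituation_trial(durations, ratio=1/3, min_trial=1.0, max_trial=60.0):
    """Return the 1-based trial number at which the habituation criterion is
    met, or None if it is never met.

    Criterion: from the third trial onwards, the summed holding time of two
    consecutive trials is <= `ratio` times the summed holding time of the
    first two trials.
    """
    # Clip each trial to the minimum (1 sec) and maximum (60 sec) durations.
    clipped = [min(max(d, min_trial), max_trial) for d in durations]
    if len(clipped) < 4:
        return None
    baseline = clipped[0] + clipped[1]  # total of the first two trials
    for i in range(2, len(clipped) - 1):  # consecutive pairs from trial 3 on
        if clipped[i] + clipped[i + 1] <= ratio * baseline:
            return i + 2  # 1-based number of the second trial in the pair
    return None
```

For example, with holding times of 40, 35, 30, 20, 12, and 10 sec, the baseline is 75 sec, and the criterion is first met at trial 6 (12 + 10 = 22 sec, which is below one third of 75 sec).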
Fig. 4.2 Habituation and discrimination of object shapes by newborn infants in both the left and
the right hand (after Streri et al. 2000). Mean duration of holding (in seconds) is plotted for the
right hand (upper panel) and the left hand (lower panel) across the final haptic habituation trials
(N−4 to N−1), the post-criterion trials (N+1, N+2), and the haptic test trials (T1, T2). The
superimposed shapes indicate the stimuli which were presented to newborns haptically (the shape
to which newborns were habituated was counterbalanced between participants). 'PC trials' are
post-criterion trials in which newborns' responses to the habituated stimulus were measured.
Adapted from Arlette Streri, Myriam Lhote, and Sophie Dutilleul, Haptic perception in newborns,
Developmental Psychobiology, 3 (3), pp. 319–327, © 2000, John Wiley and Sons, with permission.
processes are closely linked, revealing a form of mental representation of stimuli (cf. Pascalis
and De Haan 2003; Rovee-Collier and Barr 2001).
◆ It gives the infant more than one haptic sample.
◆ This procedure, controlled by the infant, effectively reveals the early perceptual capacities of
young babies (cf. Streri 1993).
◆ Using the same procedure for both visual and haptic tasks makes it possible to compare
performance in these two modalities at the same age. If performance is identical in both
modalities, one can conclude that the processes for taking in information are comparable and
the exchange of information between the senses is possible.

4.3.4 Experimental evidence of perceptual manual abilities in newborns

Streri et al. (2000) demonstrated that newborns were able to detect differences in the contours
of two small objects (a smoothly curved cylinder versus a sharply angled prism) with both the right
and left hands. Habituation and test phases were performed with the same hand without visual
inspection. After habituation with one of the two objects placed in the right or left hand, confirmed
by two more trials (this is known as the partial-lag design; see Bertenthal et al. 1983), a novelty
reaction was observed when a new object (the prism or the cylinder) was put in the same hand. This
experiment provided the first evidence of habituation and reaction to novelty observed with the
left as well as the right hand in human newborns (cf. Fig. 4.2). Recently, manual habituation and
discrimination of shapes in preterm infants from 33 to 34 weeks post-conceptional age have
been demonstrated in the same conditions as in full-term newborns, regardless of hand side (Lejeune
et al. 2010). Pre-term and full-term newborns are able to discriminate between curvilinear and
rectilinear contours in small objects. As a consequence, it may be concluded that haptic memory in
its manual mode is also present from birth. Within the first 6 months after birth, the young baby is
also capable of reasonably good performance in the detection of other shape distinctions.
Nevertheless, this behaviour does not show that the baby has a clear representation of what is
being held in the hand. Newborns do not actively explore objects held in their hands; only some
squeezing and releasing of the objects is observed. Because young infants are unable to perform
the integration and synthesis of information in working memory that haptic exploration requires,
shape perception is probably partial, or limited to the detection of features such as points, curves,
and edges. The information gathered comes from enclosure of the object, which seems to be an
effective exploratory procedure for these limited purposes.

4.4 The links between the haptic and visual modalities: an old–new philosophical debate

If the visual and haptic systems characterize perceptual objects according to some common
property or feature, and if information for that property flows between the two systems, the pre-
requisites to reveal links between modalities in crossmodal transfer tasks are present. The ques-
tion of whether there are links between sensory modalities in newborns is crucial because although
both haptic and auditory information is available in the fetal period, visual information is not. We
may thus ask whether human newborns can perceive shape equivalence between vision and
touch.
For several centuries, the question of crossmodal integration between the senses has been
addressed by philosophical answers to Molyneux’s famous question: will a person born blind
who recovers sight as an adult immediately be able to distinguish visually between a cube and a
sphere? (Bruno and Mandelbaum 2010; Gallagher 2006; Proust 1997). But Diderot (1749–1792),
the first philosopher to compare the blind person to a neonate, contended that there is no doubt
‘that vision must be very imperfect in an infant that opens his/her eyes for the first time, or in a
blind person just after his/her operation'. Diderot was right when he said that a newborn's vision is
very imperfect. However, many experiments have since provided evidence that neonates perceive
several elementary properties of the world, such as colour, shape, movement, objects, and faces
(cf. Kellman and Arterberry 1998; Pascalis and Slater 2003). Moreover, Molyneux’s question
precisely describes the crossmodal task from hands to eyes in infancy. Although newborns are not
directly comparable to congenitally blind individuals (see Gallagher 2006, for a discussion), if
they are able to form a perceptual representation of the shape of objects from the hand and to
recognize this shape visually, this would suggest an affirmative answer to Molyneux’s question.
This would mean that a link between the hands and the eyes exists before infants have had the
opportunity to learn from the pairings of visual and tactual experiences.

4.4.1 Methodological considerations when assessing crossmodal transfer in infants

Crossmodal transfer tasks have been used to study Molyneux’s question. They involve two
successive phases: a habituation phase in one modality, followed by a recognition phase in a
second modality. In the ‘tactile-to-visual modality’ task, newborns undergo tactile habituation
to an object that is placed in their right hand. Then, in the second phase, the familiar and
the novel objects are visually presented in alternation during four trials in a counterbalanced
order between participants. The ‘tactile-to-visual modality’ task is very similar to the situation
described in Molyneux’s question. It makes it possible to uncover not only links between
the senses but also, in older babies, the capability of using the hands to go beyond directly
perceived information about the environment to abstract some information, as in the case
of perception of partly occluded objects presented in the visual mode (Kellman and
Spelke 1983)—adapted to the haptic modality (Streri and Spelke 1988, 1989)—or in the case of
the perception of number of held objects (Féron et al. 2006). The opposite transfer is also possi-
ble. In the ‘visual-to-tactile modality’ task, newborns are first visually habituated to an object and
then, in the second phase, the familiar and the novel objects are tactually presented to their right
hands in alternation during four trials. In both tasks a reaction to novelty is expected. Its occur-
rence means that the newborns recognize the familiar object and explore the novel object for a
longer time.
This paradigm, widely used in infancy studies, involves several cognitive resources. In the first
phase, the baby has to collect a piece of information on an object in one modality, memorize this
information, and then, in the second phase, choose which object is the familiar one in another
modality.

4.4.2 Experimental evidence of crossmodal transfer of information


Several lines of investigation have approached but not resolved Molyneux’s question. Crossmodal
transfer of texture (smooth versus granular) and substance (hard versus soft) from the oral
modality to vision has been shown in 1-month-old infants (Gibson and Walker 1984; Meltzoff
and Borton 1979). Two-month-old infants were able to visually recognize the shape of an object
that they had previously manipulated with their right hand (Streri 1987). However, several
questions regarding the origins of intermodal transfer remain. First, research on the visual
abilities of newborns indicates a clear possibility that babies might have learned links between the
modalities over 1 or 2 months. Second, data on oral-to-visual transfer of texture were sceptically
received, and the results have not always been reproduced (cf. Maurer et al. 1999). Third, mouth-
ing behaviour involves different exploratory procedures from handling, which is more complex
and varied. These discrepancies suggest that information processing by the two modes of explora-
tion might not develop in a similar manner or have the same function. Finally, according to
Klatzky and Lederman’s (1993) classification, texture and substance are material properties and
not geometric properties. Molyneux's question, in contrast, concerns the perception of object
shape. Shape is a geometric property under Klatzky and Lederman's (1993) classification, and
thus requires different exploratory procedures from those available orally.
Streri and Gentaz (2003) conducted an experiment on crossmodal recognition of shape
from the right hand to the eyes in human newborns. They used an intersensory paired-preference
procedure that included two phases: a haptic familiarization phase in which newborns were given
an object to explore manually without seeing it, followed by a visual test phase in which infants
were shown the familiar object paired with a novel one for 60 seconds. The participants consisted
of 24 newborns (mean age: 62 hours). Tactile objects were a small cylinder (10 mm in diameter)
and a small prism (10-mm triangle base). Because the vision of newborns is immature and their
visual acuity is weak, visual objects were the same 3D shapes, but much larger (45-mm triangle
base and 100 mm in length for the prism and 30 mm in diameter and 100 mm in length for the
cylinder). An experimental group (12 newborns) received the two phases successively (haptic
then visual) whereas a baseline group (12 newborns) received only the visual test phase with the
same objects as the experimental group but without the haptic familiarization phase.
The comparison of looking times between the two groups allowed us to provide evidence of
crossmodal recognition of shape from hand to eyes in the experimental group. The results revealed
that the newborns in the experimental group looked at the novel object for longer than the famil-
iar one. In contrast, the newborns in the baseline group looked equally at both objects. Moreover,
in the experimental group, infants made more gaze shifts toward the novel object than the famil-
iar object. In the baseline group this was not the case. Thus, this recognition in the experimental
group stems from the haptic habituation phase. These results suggest that newborns recognized
the familiar object through a visual comparison process as well as a comparison between the haptic
and visual modes. Moreover, the discrepancy between the size of visual and tactile objects was
apparently not relevant for crossmodal recognition. Shape alone seems to have been considered
by newborns.
In conclusion, newborns are able to transfer shape information from touch to vision before they
have had the opportunity to learn the pairing of visual and tactile experiences. These results
challenge the empiricist philosophical view, as well as modern connectionist models (Elman et al.
1996; McClelland et al. 1986; see also Chapter 15 by Mareschal et al.), which argue that the
sensory modalities cannot communicate in newborns. The results reveal an early-developing ability, largely
independent of experience, to detect abstract, amodal higher-order properties of objects. This
ability may be a necessary prerequisite to the development of knowledge in infancy. At birth,
various perceptions of objects are unified to make the world stable. How should these results be
interpreted?
Recently, Held (2009) proposed another interpretation of these results. After several attempts
to obtain an affirmative answer to Molyneux’s question (cf. Jeannerod 1975; Pascual-Leone and
Hamilton 2001), Held’s experiments on congenitally blind adults whose vision was restored
have revealed that they learned very quickly to map a felt object with a seen object by a process
called ‘capture’. This capture consists, for adults, of seeing their hands engaged in normal
activities; this is sufficient to recalibrate the two senses and serves as a basis for crossmodal
matching abilities.
In a second experiment (Streri and Gentaz 2004), newborns were 54 hours old. Referring to
this experiment, Held proposed that newborns had the opportunity, within these 54 hours, to see
their hands and, as a consequence, to learn to map the felt shape onto the seen object through
‘capture’. A third, more recent experiment by Sann and Streri (2007), nevertheless cast doubt
on this interpretation. In Sann and Streri’s experiment, the youngest newborn was 12 hours old.
It seems less likely that during this short period, the ‘capture’ and recalibration effects could have
occurred.

The contrast between the early-emerging perceptual capacities of newborn infants, and the
slower-to-adapt capacities of blind adults whose sight is restored, suggests that the fundamental
question concerning the origins of perceptual knowledge may be better addressed through studies
of infants than through studies of adults. Although both newborn infants and newly operated
blind adults are similarly devoid of visual experience, they differ greatly in other ways. In particu-
lar, blind adults have a lifetime of experience using touch, audition, and other senses to guide
their actions. They also have a nervous system that is fully mature. For both reasons, adults may
adapt to newly incoming visual information, but they are likely to achieve this much more slowly
than infants.
Other accounts of these early perceptual abilities have also been proposed in the literature:
1. Synaesthesia hypothesis: Maurer (1993; for a more recent version see Chapter 10 by Maurer et al.;
Spector and Maurer 2009) suggested that newborns cannot yet distinguish among the various
senses ‘ . . .The newborn’s senses are not well differentiated but are instead intermingled in a synes-
thetic confusion . . . Energy from the different senses, including the proprioceptive sense of his
or her own movement, is largely if not wholly undifferentiated: the newborn perceives changes
over space and time in the quantity of energy, not the sense through which it arose’ (Maurer 1993,
pp. 111–112). On this view, early crossmodal transfer is based on a lack of distinction between
the senses rather than on crossmodal recognition of objects, and the concept of synaesthesia
would be more appropriate for understanding the relations between sensory modalities. In
the adult literature, synaesthesia has a different meaning. It has been described as a ‘mixing’ of
the senses (Rich et al. 2004), or as occurring when a person experiences sensations in one
modality when a second modality is stimulated, even though no physical stimulation is presented
in the first modality (Marks 1975; Ramachandran and Hubbard 2001). In other words,
synaesthesia is a clearly defined sensation which is perceived in addition to the percepts of the
objects that are physically present and that induce the synaesthetic sensation. While the most
common type of intramodal synaesthesia is coloured numbers, coloured hearing is a frequent
form of intermodal synaesthesia, wherein speech sounds or music produce coloured visual images
as well as auditory perceptions.
2. Active intermodal matching hypothesis: Meltzoff (1993) proposed the active intermodal
matching hypothesis (AIM), according to which 1-month-old infants match what they see
and what they explore orally (with a preference for familiar texture) in experiments on cross-
modal transfer (Meltzoff and Borton 1979). On this view, young infants use a supramodal code
that allows them to unify information derived from different sensory modalities in a common
framework. The same mechanism is invoked to explain neonatal imitations, such as tongue
protrusion or mouth opening. According to Meltzoff’s hypothesis, imitation taps perception,
crossmodal coordination, and motor control. Early imitation is mediated by AIM: to perform it,
young infants use the crossmodal equivalence between the dynamics of the visual stimulus
and their own motor signals.
3. Abstract amodal information hypothesis: E.J. Gibson (1969) suggested that infants are able to
abstract amodal or common information. Abstraction of common information is based on
the equivalence between modalities with regard to amodal properties such as location, texture,
movement, shape, and duration. Since Gibson’s suggestion, the concept of amodal perception
has often been used to describe the link between modalities. According to this view, a common
percept, independent from any one sense, is created.
In the remainder of this section I will bring the findings observed in Streri and Gentaz’s experi-
ments to bear on the three hypotheses discussed above. Questions we have addressed, and which
are relevant to distinguishing these hypotheses, concern whether crossmodal transfer is a general
property of the newborn human, or whether it is specific to certain parts of the body. We have
also examined whether crossmodal transfer occurs only in a single direction or not. In concrete
terms, the issues we have addressed are:
◆ Considering that habituation and discrimination have been demonstrated in both the right
and left hands, is crossmodal transfer realized regardless of which hand is stimulated?
◆ If newborns cannot distinguish between senses then, in crossmodal transfer tasks, whatever
modality is stimulated first, recognition should be found in the second modality. The same
argument can be taken up for testing the concept of amodal perception: a bidirectional transfer
should be evidenced.
◆ If the AIM hypothesis is correct, then a preference for the familiar object should be shown in
the recognition test phase—infants matching the felt (or seen) object with the seen (or felt)
object. This is because the AIM proposes that infants should actively attempt to match the
senses. One could thus argue that active matching would result in a familiarity preference
following crossmodal transfer.
Streri and Gentaz (2004) tested for crossmodal transfer from the left hand to vision, as well as
transfer from the right hand to vision, but with a different methodology. In the visual test phase,
newborns looked at only one object per trial and the objects were presented successively and not
simultaneously as in previous experiments (Streri and Gentaz 2003). Twenty-four newborns
received the tactile habituation phase with the left hand, 24 other newborns received it with the
right hand, and 12 newborns received no tactile habituation phase. In the visual test phase,
newborns in the experimental groups looked at the familiar object and at the novel object over
four sequential trials presented in alternation (one object per trial). In the control group, new-
borns looked at the cylinder and the prism over four trials in the same manner as the experimen-
tal groups. Thus, all newborns looked twice at both the prism and the cylinder. The order of
presentation of the objects was counterbalanced across participants.
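The structure of this test phase (two objects presented in alternation over four trials, one object per trial, with the starting object counterbalanced across infants) can be sketched as follows. This is an illustrative reconstruction of the design, not the authors' actual materials; the function and object names are hypothetical:

```python
from itertools import cycle, islice

def alternating_trials(first_object: str, second_object: str, n_trials: int = 4):
    """Alternate two test objects over n_trials, one object per trial."""
    return list(islice(cycle([first_object, second_object]), n_trials))

# Counterbalancing: half the newborns see the novel object first,
# the other half see the familiar object first.
group_a = alternating_trials("novel", "familiar")   # novel first
group_b = alternating_trials("familiar", "novel")   # familiar first

# Every infant sees each object exactly twice, as in the original design.
assert group_a.count("novel") == group_a.count("familiar") == 2
assert group_b.count("novel") == group_b.count("familiar") == 2
```

Alternating presentation with a counterbalanced starting object ensures that any looking-time difference between the novel and familiar object cannot be attributed to trial position.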
The results were as follows. Crossmodal recognition was again evidenced from the right hand: newborns looked longer at the novel object. No visual recognition was evidenced from infants in the left-hand group: the haptic habituation phase did not influence the visual test phase. In the
control group, newborns looked equally at the two objects. The interpretation of laterality effects
at birth is very difficult (Streri and Gentaz 2004). Some authors have suggested that they might be
determined by fetal position (Michel and Goodwin 1979). Two-thirds of infants are born in a
left vertex presentation, which is probably also their position in the womb 3–4 weeks prior to
birth. It has been suggested that, due to the fetal position, the left arm is stuck to the mother’s
back and consequently receives a lesser flow of arterial blood than the right arm. Other studies in
connected fields have also observed asymmetries favouring the right side before and after birth
(Streri 1993). Newborns present a spontaneous preference for lying with their head turned
toward the right while supine (Coryell and Michel 1978; Gesell 1938; Turkewitz et al. 1965).
Laterality effects in crossmodal transfer from hand to eyes might have the same origin.
Sann and Streri (2007, Exps. 1 and 2) tested transfer from eyes to hand and from hand to eyes
in order to ascertain whether a complete primitive ‘unity of the senses’ would be demonstrated.
After haptic habituation to an object (cylinder or prism), the infants saw the familiar and the
novel shape in alternation. After visual habituation with either the cylinder or the prism, the
familiar and the novel shape were put in the infant’s right hand. The tactile objects were presented
sequentially in an alternating manner. Following haptic habituation, visual recognition was again
observed, whereas no haptic recognition was found following visual habituation. These results
have implications for the three models described earlier.
1. Our data do not support an account that relies on a confusion of the senses as suggested by
Maurer (1993). Haptic inputs are transferred to the visual modality, but the reverse is not true.
That is, the senses are not confused. Some research suggests that synaesthetic correspondences
between sensory modalities, namely between the auditory and visual modalities, exist in chil-
dren (Mondloch and Maurer 2004; although see Spence 2011) and 3- to 4-month-old infants
(Walker et al. 2010), but currently there is no evidence that newborns’ perception is synaes-
thetic. Our results cannot stem from synaesthesia for several reasons. Some of these correspon-
dences can be attributed to intensity matching, but shape is a structural property that cannot
be defined in terms of ‘intensity of stimulation.’ According to Ramachandran and Hubbard
(2001), synaesthesia is a sensory effect rather than a cognitive one based on memory associa-
tions. Our crossmodal transfer tasks between touch and vision with successive presentations
require several cognitive processes such as memory, comparison between senses, and choice
between the objects. Such processes are not involved in an intersensory paired-preference pro-
cedure, as used in Mondloch and Maurer’s (2004) experiments, where the visual and auditory
stimulation were presented simultaneously. Moreover, in visual and auditory transfer tasks,
the senses are both distal, while the tactile sense deals with proximal stimuli. As a consequence,
several forms of intersensory integration are possible at birth, because intersensory integration
is not a unitary process (see Lewkowicz 2000, for a review).
2. Intermodal matching is also ruled out because the transfer is not reciprocal. A felt object is
matched with a seen object, but a seen object is not matched with a felt object. A common
framework may exist, but the matching process is not reciprocal with regard to object proper-
ties, at least in the case of crossmodal transfer of shape information. Moreover, in our experi-
ments, newborns looked more at the novel object than the familiar object, unlike in Meltzoff
and Borton’s (1979) experiments.
3. Detection of amodal information seems to be the best-supported hypothesis because habitua-
tion and dishabituation are observed in both modalities. Newborns detect invariants whatever
the modality that receives information. But this hypothesis must be made more precise. How
exactly is amodal information perceptually linked, given that transfer is not reciprocal?

4.4.3 Similarity of processes versus similarity of representations


Our results suggest that similarity of processes does not necessarily lead to similarity of represen-
tations. Recall that newborns are able to habituate and discriminate between various features in
vision and touch. They detect the same invariants in the course of habituation and react to novelty in the test phase when they perceive a novel feature. However, according to Slater et al.
(1991), information processed by vision is global and complete from birth. Although the ‘global’
character of early visual perception is debated (Cohen and Younger 1984), vision, with its large
perceptual field and numerous receptors, is relatively ‘high-bandwidth’ and parallel. Information
gathering by touch, in contrast, is relatively ‘low-bandwidth’ and sequential: even though, thanks
to the grasp, babies are capable of taking in information about the contours of the object, this
information is limited at any one moment to the stimulation of a small patch of skin receptors.
Moreover, successively gathered information may not be synthesized. This discrepancy in the
manner of information-gathering could be the cause of non-reciprocal transfer between vision
and touch. A seen object can be processed as a structured whole, but the newborn cannot manually
recognize this totality from a felt feature. In contrast, an infant can visually recognize a feature of
an object that he or she has previously felt. Amodal perception would be obtained from the sim-
plest processed information; that is, information gathered by the haptic modality. The amodal
percept is only partial because, as discussed above, the newborn is not able to establish a complete
percept of the object using the hand alone. Consequently, crossmodal transfer of shape informa-
tion between touch and vision is not a general property of the organism, since it is found only in
one direction, from touch to vision, and only from the right hand, not from the left.
An important question is how haptic input could be translated into a visual format, given that the sensory impressions are so different. To date, in adults, there is substantial neuroimaging evidence showing that vision and touch are intimately connected, even though different views have been proposed (see Amedi et al. 2005; Sathian 2005, for reviews). Cerebral cortical areas, specifically
the lateral occipital complex (LOC), previously considered exclusively visual, become
activated during haptic perception of shape (Lacey et al. 2007). Perhaps this mechanism is already
present in newborns’ brains.

4.4.4 Shape versus texture


Shape and texture are two amodal properties processed by both vision and touch. Studying these
two properties could allow us to test the hypothesis of amodal perception in newborns anew,
as well as shedding light on the processes involved in gathering information by both sensory
modalities. However, shape is best processed by vision, whereas texture is thought to be best
detected by touch (see Bushnell and Boudreau 1998; Klatzky et al. 1987). According to Guest and
Spence (2003), texture is ‘more ecologically suited’ to touch than vision. In many studies con-
cerning shape (a macrogeometric property), transfer from haptics to vision has been found to be
easier than transfer from vision to haptics in both children and adults (Connolly and Jones 1970;
Jones and Connolly 1970; Juurmaa and Lehtinen-Railo 1988; Newham and MacKenzie 1993;
cf. Hatwell 1994). In contrast, when the transfer concerns texture (a microgeometric property),
for which touch is as efficient as (if not better than) vision, this asymmetry does not appear.
Neuroimaging data obtained in human adults has suggested that there exists a functional
separation in the cortical processing of micro- and macrogeometric cues (Roland et al. 1988). In
this study, adults had to discriminate the length, shape, and roughness of objects with their right
hand. Discrimination of object roughness activated the lateral parietal opercular cortex signifi-
cantly more than did length or shape discrimination. Shape and length discrimination activated
the anterior part of the intraparietal sulcus more than did roughness discrimination. More
recently, Merabet et al. (2004) confirmed the existence of a functional separation and suggested
that the occipital (visual) cortex is functionally involved in tactile tasks requiring fine spatial
judgments in normally sighted individuals. More specifically, a transient disruption of visual
cortical areas using repetitive transcranial magnetic stimulation did not hinder texture judgments
but impaired subjects’ ability to judge the spacing between raised dot patterns.
Conversely, transient disruption of somatosensory cortex impaired texture judgments, while
interdot distance judgments remained intact. In short, shape and texture require different exploratory procedures to be detected, and are processed in two different pathways in the adult brain. In our behavioural research, we attempted to determine whether shape and texture are
processed similarly in human newborns or if the different processing modes develop with age
through learning and maturation.
To the best of our knowledge, no study has examined whether crossmodal transfer between
vision and touch occurs in newborns for a property other than shape—for example texture,
although some results about texture processing have been presented. Molina and Jouen (1998)
demonstrated that newborns are able to modulate their manual activity with objects that vary
only according to their texture. Their study was based on the recording of the pressure activity
level exerted on objects. To measure haptic perception, they used hand pressure frequency (HPF),
which consists of recording the successive contacts between the skin and an object. The results of this study showed that newborns exerted continuous high hand pressure when holding smooth objects, and discontinuous low pressure when holding granular objects. Elsewhere,
Molina and Jouen (2001) conducted an experiment to investigate intersensory functioning
between vision and touch for texture information in neonates. Using HPF, they compared man-
ual activity recorded on objects with variable texture in the presence or absence of visual informa-
tion. The visual object had either the same (matching condition) or different (mismatching
condition) texture information as the held object. In the matching condition, holding times
increased but HPF remained unchanged. In the mismatched condition, holding times remained
unchanged, but HPF systematically changed over the test period. Taken together, these results
revealed that manual activity is modulated by vision. This experiment revealed an ability to detect
a common texture across vision and touch in newborns. Nevertheless, these simultaneous match-
ing tasks do not involve memory; they do not tell us anything about the infants’ abilities to trans-
fer information between vision and touch in successive phases.
Sann and Streri (2007) undertook a comparison between shape and texture in bidirectional crossmodal transfer tasks. They sought to reveal how information is gathered and processed by the visual and tactile modalities and, as a consequence, to shed light on the perceptual mechanisms of
newborns. If the perceptual mechanisms implicated in gathering information on object properties
are equivalent in both modalities at birth, we would expect to observe a reversible crossmodal transfer. By contrast, if the perceptual mechanisms differ in the two modalities, a non-reversible transfer
should be found. Thirty-two newborns participated in two experiments (16 in crossmodal transfer
from vision to touch, and 16 in the reverse transfer). The material was one smooth cylinder versus
one granular cylinder (a cylinder with pearls stuck on it). The results revealed crossmodal recogni-
tion of texture between modalities in both directions. These findings suggest that for the property
of texture, exchanges between the sensory modalities are bidirectional. Why then is a reverse cross-
modal transfer observed for texture and not for shape (see Fig. 4.3)?
All of our findings suggest that newborns are not able to obtain a good representation of object
shape while they are manipulating it. Two experimental arguments run in favour of this hypoth-
esis: inter-manual transfer of shape versus texture in newborns and bidirectional transfer of shape
in 2-month-olds, under certain conditions.
The study of inter-manual transfer of information avoids the decoding difficulties inherent in crossmodal tasks. The inter-manual transfer of tactile information can
be defined as the ability to recognize that an object experienced in one hand is the same or
equivalent to an object experienced in the opposite hand. In adults, neuroimaging studies have
revealed that the transfer of tactile information from one hand to the other rests on the func-
tional integrity of the corpus callosum (Fabri et al. 2001, 2005). The corpus callosum is the
major neural pathway that connects homologous or equivalent regions of the two hemispheres
in mammals and plays a central role in interhemispheric integration and information transfer
(Bloom and Hynd 2005). Although its development begins during fetal life, the corpus callosum
is not fully formed at birth and matures very slowly (Cernacek and Podivinski 1971). However,
neuroimaging data obtained using diffusion tensor imaging (DTI) indicate that, despite incomplete myelination, the main fiber bundles are already in place at birth (Dubois et al. 2006).
Thus the organization and maturation of white matter bundles seem to be present in infants
from the beginning of life. As a consequence, questions arise as to whether, at birth, the
two hemispheres function independently, and whether inter-manual transfer of information is
possible in the newborn infant.
Fig. 4.3 (top) Unidirectionality of crossmodal transfer of shape (vision and touch): visual and tactual objects used in the Streri and Gentaz (2003, 2004) and Sann and Streri (2007) experiments on crossmodal transfer of shape. (bottom) Bidirectionality of crossmodal transfer of texture (vision and touch): visual and tactual objects used in the Sann and Streri (2007) experiments on crossmodal transfer of texture. The arrows indicate the direction of transfer.

Numerous behavioural studies have shown that human neonates are able to tactually discriminate between two objects varying in texture (smooth/granular) or rigidity (soft/hard) by modulating their manual activity (e.g. Molina and Jouen 1998; Rochat 1987). We reported that
newborns are able to process and encode some information about shape and discriminate two
different shapes (cylinder/prism) with either the right or the left hand (Streri et al. 2000). The
latter result revealed that neonates possess haptic memory. It should be possible then to observe
an inter-manual transfer at birth.
Two experiments, using a habituation/reaction-to-novelty procedure, were performed to assess
human neonates’ ability to process and exchange information about texture or shape between
their hands, without visual control (Sann and Streri 2008). Forty-eight newborn infants (24 per
experiment) received haptic habituation either with their right or left hand, followed by a haptic
discrimination test in the opposite hand. In the test phase, the novel and the familiar object were
each presented twice, in alternation. In both conditions, a reaction to the novel object in the
opposite hand was expected. The results revealed two patterns of behaviour, depending on the
object property to be processed. After tactile habituation to a texture in one hand, newborns held
the novel texture longer in the other hand than the familiar one. In contrast, after tactile habitu-
ation to a shape in one hand, the familiar shape was held longer in the opposite hand than the
novel one. These findings suggest that inter-manual transfer is possible at birth despite the relative immaturity of the corpus callosum. Although, in both cases, an inter-manual transfer occurred,
the discrepancies in results between object properties (preference for the novel texture and pref-
erence for the familiar shape) reveal once again newborns’ difficulties in obtaining a sufficient
representation of shape in haptic mode, as well as the possibility of a different haptic processing
of texture and shape in the newborn brain. This view is supported by a developmental inter-manual transfer study in which recognition of the felt object, and a preference for the novel shape in the test phase, emerged only at 6 months of age (Streri et al. 2008).
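Holding time is the dependent measure in these haptic test phases, and results of this kind are commonly summarized as a novelty-preference score: the proportion of total holding time devoted to the novel object. A minimal sketch with hypothetical holding times (the score is a standard summary of such data, not the authors' reported analysis):

```python
def novelty_preference(novel_times, familiar_times):
    """Proportion of total holding time spent on the novel object.

    A score above 0.5 indicates a novelty preference; a score
    below 0.5 indicates a familiarity preference.
    """
    novel, familiar = sum(novel_times), sum(familiar_times)
    return novel / (novel + familiar)

# Hypothetical holding times (seconds) over the two presentations
# of each test object, mirroring the pattern of results described:
texture_score = novelty_preference([30, 25], [12, 10])  # novel texture held longer
shape_score = novelty_preference([8, 6], [20, 18])      # familiar shape held longer

assert texture_score > 0.5  # novelty preference, as found for texture
assert shape_score < 0.5    # familiarity preference, as found for shape
```

Framing both patterns on the same scale makes the dissociation explicit: transfer occurred in both conditions, but the direction of the preference differed by object property.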
The difference may be related to the manner of gathering information involved in each case.
Previous studies have shown that adults perceive texture most precisely when they are permitted
to move their hands in a lateral gliding motion across the surface of stimulus objects (Lederman
and Klatzky 1987, 1990). Newborns do not display this exploratory procedure. The grasping
reflex is a rough exploratory procedure, but it seems to be sufficient to differentiate between
granular and smooth objects. Visual scanning of the surface is also sufficient to differentiate
a granular from a smooth object. As a consequence, our results suggest that the manner of
gathering and exchanging information about texture across modalities is equivalent in touch
and vision from birth. However, this exchange fails if the object is flat and not volumetric,
because newborns do not exhibit the exploratory procedure of ‘lateral motion’ needed to detect
the different textures of flat objects (Sann and Streri 2008). Grasping is an inefficient ‘enclosure
procedure’ to detect differences between textured flat objects. Texture is a material or microgeo-
metric property better processed by touch than by vision (see Hatwell et al. 2003). Our results
reveal that newborns process this property using the hands as well as the eyes. Shape, a geometri-
cal or macrogeometric property, is better processed by vision than by touch. Our results confirm
this as well.
The second argument is built on previous research performed in 2-month-old infants and
using a bidirectional crossmodal shape transfer task (Streri 1987). The findings revealed that
2-month-old infants visually recognize an object they have previously held, but do not manifest
tactile recognition of an already-seen object. A plausible explanation of these results on crossmo-
dal transfer is that, as in newborns, the levels of representation reached by each modality are not
sufficiently equivalent to exchange information between sensory modalities. This hypothesis
seems to be validated by the fact that, if a 2-month-old baby is presented with degraded visual
stimulation (a two-dimensional sketch of a 3D object) in which volumetric and textural aspects are
missing, leading to a blurred percept, tactile recognition is possible, which is not the case with a
visual volumetric object (Streri and Molina 1993). Although the methodology is quite different,
our results are supported by Ernst and Banks’s (2002) model of human adults’ performance on
an intersensory integration task. In this study, adults compared the size of objects between touch
and vision. Visual capture was observed when the visual stimulus was clear (‘noise-free’), and
haptic capture was observed when the visual stimulus was noisy. In other words, the sensory
modality that dominates performance is the one with the lowest level of variance. A similar result
was subsequently observed in children, with integration between systems not appearing before
8 years of age (Gori et al. 2008).
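Ernst and Banks's (2002) model combines the two cues by weighting each estimate in inverse proportion to its variance, so the less noisy modality dominates. A minimal sketch of this maximum-likelihood weighting, using illustrative numbers rather than their data:

```python
def combine(visual_est, visual_var, haptic_est, haptic_var):
    """Maximum-likelihood cue combination: weights are inverse variances."""
    w_visual = (1 / visual_var) / (1 / visual_var + 1 / haptic_var)
    w_haptic = 1 - w_visual
    estimate = w_visual * visual_est + w_haptic * haptic_est
    # The combined variance is lower than either single-cue variance.
    combined_var = (visual_var * haptic_var) / (visual_var + haptic_var)
    return estimate, combined_var

# Clear ('noise-free') visual stimulus: low visual variance, so the
# combined estimate sits near the visual one ('visual capture').
est, var = combine(visual_est=10.0, visual_var=0.1,
                   haptic_est=12.0, haptic_var=1.0)
# est ~ 10.18 (close to vision); var ~ 0.09 (below both 0.1 and 1.0).
```

Swapping the variances (noisy vision, reliable touch) pulls the estimate toward the haptic value instead, which is the haptic capture the text describes.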

4.5 Conclusions
We recognize, understand, and interact with objects through both vision and touch. In infancy,
despite the various discrepancies between the haptic and visual modalities, such as asynchrony in
the maturation and development of the different senses, distal versus proximal inputs, or parallel
processing in vision versus sequential processing in the haptic modality, from birth onward both
systems detect regularities and irregularities when they are in contact with different objects.
Conceivably, these two sensory systems may encode object properties such as shape and texture
in similar ways. This ability may facilitate crossmodal communication. Our investigations in
crossmodal transfer tasks have revealed some links between the haptic and visual modes at birth.
Newborns are able to visually recognize a held object (Streri and Gentaz 2003). This is the newborn's answer to Molyneux's question, and it supports Diderot's position. This neonatal ability is independent of learning or the influence of the environment.
The concept of ‘amodal perception’ (Gibson 1969), understood as the creation of a new percept
independent of the sensory modality, fits with our first results on the intermodal transfer from
touch to vision in infancy. However, by means of bidirectional crossmodal transfer tasks, Streri
and colleagues have provided evidence about the perceptual mechanisms present at birth that
constrain or limit the exchange of information between the sensory modalities. Newborns visu-
ally recognize the shape of a felt object, but are unable to recognize the shape of a seen object with
their hands (Sann and Streri 2007). The link is obtained from the simplest information gathered.
Moreover, it is observed only with the newborn’s right hand and not with the left (Streri and
Gentaz 2004). A third striking result is that crossmodal transfer depends on object properties, as
it is bidirectional with texture but not with shape (Sann and Streri 2007)—though this finding
holds if, and only if, the felt textured object is volumetric, and not if it is flat (Sann and Streri
2008). For shape, just as for texture, the newborn’s exploratory procedures are limited to the
grasping reflex, which makes effective exploration of object properties impossible. All these findings suggest that at birth, the links between the senses are specific to individual modalities and are not yet, or not entirely, a general property of the brain.
A plausible explanation for these failures to obtain an amodal perception could be due to the
situation itself. Recall that crossmodal transfer tasks require several cognitive processes, such as
encoding, memorization, decoding, the comparison of objects, etc. This combination of resources
may be too challenging for the newborn. In a simultaneous bimodal situation of haptic and visual exploration of an object, which is possible only from 5 months of age onwards, infants may obtain a better representation of perceived objects. In infancy, the hands are used as instruments for transporting objects to the eyes or mouth, and the acquisition of this new ability comes at the expense of the perceptual function of the hands.
Several studies have also revealed that over the course of development the links between the
haptic and the visual modes are fragile, often not bidirectional, and representation of objects is
never complete: this holds not only in infancy (Rose and Orlian 1991; Sann and Streri 2007; Streri
and Pêcheux 1986b), but also in children (Gori et al. 2008) and adults (Kawashima et al. 2002).
For example, in a behavioural and PET study on human adults, Kawashima et al. found that the
human brain mechanisms underlying crossmodal discrimination of object size have two different
pathways depending on the temporal order in which the stimuli are presented. Crossmodal infor-
mation transfer was found to be less accurate with visual-to-tactile (VT) transfer compared with
tactile-to-visual (TV) transfer. In addition, more brain areas were activated during VT than during
TV. Crossmodal transfer of information is rarely reversible, and is generally asymmetrical even
when it is bidirectional.
All of these results contrast with other studies that have provided evidence for amodal perception in infancy. For example, Féron, Gentaz, and Streri (2006) have demonstrated that 5-month-old infants are able to use their hands, without visual reinforcement, to discriminate a set of two
objects from a set of three in a crossmodal transfer task. This result means that discrimination of
number is amodal, since it is revealed also in the visual and auditory modes (Izard et al. 2009;
Kobayashi et al. 2005; Starkey et al. 1990). Four-month-old infants are able, thanks to bimanual
haptic exploration, to complete their information about a large partially felt object, and to track
its unity in the same way as in the visual mode (Kellman and Spelke 1983; Streri and Spelke
1988). How should one interpret these results?

In short, all these studies on the links between the haptic and visual modalities suggest two
levels of amodal perception or amodal representation. In the conditions where the exchanges
concern object properties, directly perceived by means of the senses, amodal perception is never
complete, or is limited to the less efficient modality, i.e. the haptic modality. But if infants have to abstract properties that are not directly available from their environment in order to better understand it, then amodal perception and representation occur, and operate independently of the
individual sensory modalities.

Acknowledgements
These studies were financed by Paris Descartes University, the CNRS, and a grant from the
Institut Universitaire de France. The author thanks Andrew Bremner, Paul Reeves, Elizabeth
Spelke, and Charles Spence for their remarks, stylistic corrections and suggestions on earlier ver-
sions of this manuscript.

References
Allen, D., Tyler, C.W., and Norcia, A.M. (1996). Development of grating acuity and contrast sensitivity in
the central and peripheral visual field of the human infant. Vision Research, 13, 1945–53.
Amedi, A., Kriegstein, K. von, Atteveldt, N.M. van, Beauchamp, M.S., and Naumer, M.J. (2005). Functional
imaging of human crossmodal identification and object recognition. Experimental Brain Research, 166,
559–71.
Bahrick, L.E. (1987). Infants’ intermodal perception of two levels of temporal structure in natural events.
Infant Behavior and Development, 10, 387–416.
Bahrick, L.E. (1988). Intermodal learning in infancy: learning on the basis of two kinds of invariant
relations in audible and visible events. Child Development, 59, 197–209.
Bahrick, L.E., and Pickens, J.N. (1994). Amodal relations: the basis for intermodal perception and learning
in infancy. In The development of intersensory perception: comparative perspectives (eds. D.J. Lewkowicz,
and R. Lickliter), pp. 205–33. Lawrence Erlbaum Associates, Hillsdale, NJ.
Beauchamp, G.K., and Mennela, J.A. (2009). Early flavor learning and its impact on later feeding behavior.
Journal of Pediatric Gastroenterology and Nutrition, 48, 25–30.
Berthental, B., Haith, M., and Campos, J. (1983). The partial-lag design: a method for controlling
spontaneous regression in the infant-control habituation paradigm. Infant Behavior and Development,
6, 331–38.
Bloom, J.S., and Hynd, G.W. (2005). The role of the corpus callosum in interhemispheric transfer of
information: excitation or inhibition? Neuropsychology Review, 15, 59–71.
Bruno, M., and Mandelbaum, E. (2010). Locke’s answer to Molyneux’s thought experiment. History of
Philosophy Quarterly, 27, 165–80.
Bushnell, E.W., and Boudreau, J.P. (1998). Exploring and exploiting objects with the hands during infancy.
In The psychobiology of the hand (ed. K.J. Connolly), pp. 144–61. Mac Keith Press, London.
Butterworth, G.E., and Castillo, M. (1976). Coordination of auditory and visual space in newborn human
infants. Perception, 5, 155–60.
Butterworth, G.E., and Hopkins, B. (1988). Hand-mouth coordination in the newborn human infant.
British Journal of Developmental Psychology, 6, 303–314.
Calvert, G., Spence, C., and Stein, B.E. (eds.) (2004). The handbook of multisensory processes. MIT Press,
Cambridge, MA.
Cernacek, K.J., and Podivinski, F. (1971). Ontogenesis of handedness and somatosensory cortical response.
Neuropsychologia, 9, 219–32.
Cohen, L., Celnik, P., Pascual-Leone, A., et al. (1997). Functional relevance of crossmodal plasticity in blind
humans. Nature, 389, 180–83.
Cohen, L.B., and Younger, B.A. (1984). Infant perception of angular relations. Infant Behavior and Development, 7, 37–47.
Connolly, K., and Jones, B. (1970). A developmental study of afferent-reafferent integration. British Journal
of Psychology, 61, 259–66.
Coryell, J.F., and Michel, G.F. (1978). How supine postural preferences of infants can contribute toward
the development of handedness. Infant Behavior and Development, 1, 245–57.
Coulon, M., Guellai, B., and Streri, A. (2011). Recognition of unfamiliar talking faces at birth. International
Journal of Behavioral Development, 35, 282–87.
Crassini, B., and Broerse, J. (1980). Auditory-visual integration in neonates: a signal detection analysis.
Journal of Experimental Child Psychology, 29, 144–55.
Diderot, D. (1749/1972). La lettre sur les aveugles à l’usage de ceux qui voient (The letter on blind people to
sighted people). Garnier-Flammarion, Paris.
Doucet, S., Soussignan, R., Sagot, P., and Schaal, B. (2007). The ‘smellscape’ of mother’s breast: effects of
odor, masking and selective unmasking on neonatal arousal, oral and visual responses. Developmental
Psychobiology, 49, 129–38.
Dubois, J., Hertz-Pannier, L., Dehaene-Lambertz, G., Cointepas, Y., and Le Bihan, D. (2006). Assessment of
the early organization and maturation of infants’ cerebral white matter fiber bundles: a feasibility study
using quantitative diffusion tensor imaging and tractography. NeuroImage, 30, 1121–32.
Elman, J.L., Bates, E., Johnson, M.H., Karmiloff-Smith, A., Parisi, D., and Plunkett, K. (1996). Rethinking
innateness: a connectionist perspective on development. MIT Press, Cambridge, MA.
Ernst, M.O., and Banks, M.S. (2002). Humans integrate visual and haptic information in a statistically
optimal fashion. Nature, 415, 429–33.
Fabri, M., Polonara, G., Del Pesce, M., Quattrini, A., Salvolini, U., and Manzoni, T. (2001). Posterior corpus
callosum and interhemispheric transfer of somatosensory information: an fMRI and neuropsychological
study of a partially callosotomized patient. Journal of Cognitive Neuroscience, 13, 1071–79.
Fabri, M., Del Pesce, M., Paggi, A., et al. (2005). Contribution of posterior corpus callosum to the
interhemispheric transfer of tactile information. Cognitive Brain Research, 24, 73–80.
Féron, J., Gentaz, E., and Streri, A. (2006). Evidence of amodal representation of small numbers across
visuo-tactile modalities in 5-month-old infants. Cognitive Development, 21, 81–92.
Field, T.M., Woodson, R., Greenberg, R., and Cohen, D. (1982). Discrimination and imitation of facial
expressions by neonates. Science, 218, 179–81.
Fontaine, R. (1984). Imitative skills between birth and six months. Infant Behavior and Development, 7, 323–33.
Gallagher, S. (2006). Neurons and neonates: reflections on the Molyneux problem. In How the body shapes
the mind (ed. S. Gallagher), pp. 45–58. Oxford University Press, Oxford.
Gardner, J.M., and Karmel, B.Z. (1983). Attention and arousal in preterm and full-term neonates. In Infants
born at risk: physiological, perceptual, and cognitive processes (eds. T. Field, and A. Sostek), pp. 69–98.
Grune and Stratton, New York.
Gardner, J.M., Lewkowicz, D.J., Rose, S.A., and Karmel, B.Z. (1986). Effects of visual and auditory stimulation
on subsequent visual preferences in neonates. International Journal of Behavioral Development, 9, 251–63.
Gesell, A. (1938). The tonic neck reflex in the human infant. Journal of Pediatrics, 13, 455–64.
Gibson, E.J. (1969). Principles of perceptual learning and development. Academic Press, New York.
Gibson, E.J., and Walker, A. (1984). Development of knowledge of visual-tactual affordances of substance.
Child Development, 55, 453–60.
Gogate, L.J., and Bahrick, L.E. (1998). Intersensory redundancy facilitates learning of arbitrary relations
between vowel sounds and objects in seven-month-old infants. Journal of Experimental Child
Psychology, 69, 133–49.
Gori, M., Del Viva, M., Sandini, G., and Burr, D.C. (2008). Young children do not integrate visual and
haptic form information. Current Biology, 18, 694–98.
Gottfried, A.W., Rose, S.A., and Bridger, W.H. (1978). Effects of visual, haptic and manipulatory
experiences on infants’ visual recognition memory of objects. Developmental Psychology, 14, 305–312.
REFERENCES 109

Granier-Deferre, C., Schaal, B., and DeCasper, A.J. (2004). Les prémices foetales de la cognition (the fœtal
beginnings of cognition). In Le développement du nourrisson (ed. R. Lecuyer), pp. 101–38. Dunod, Paris.
Guellai, B., Coulon, M., and Streri, A. (2011). The role of motion and speech in face recognition at birth.
Visual Cognition, 19, 1212–33.
Guellai, B., and Streri, A. (2011). Cues for early social skills: direct gaze modulates newborns’ recognition of
talking faces. PLoS ONE, doi:10.1371/journal.pone.0018610.
Guest, S., and Spence, C. (2003). Tactile dominance in speeded discrimination of textures. Experimental
Brain Research, 150, 201–207.
Hatwell, Y. (1994). Transferts intermodaux et intégration intermodale (crossmodal transfers and
intersensory integration). In Traité de psychologie expérimentale, Vol. 1 (eds. M. Richelle, J. Requin, and
M. Robert) pp. 543–84. PUF, Paris.
Hatwell, Y. (2003). Intermodal coordinations in children and adults. In Touching for knowing: cognitive
psychology of tactile manual perception (eds. Y. Hatwell, A. Streri, and E. Gentaz), pp. 191–206. John
Benjamins Publishing Company, Amsterdam.
Hatwell, Y., Streri, A., and Gentaz, E. (2003). Touching for knowing: cognitive psychology of tactile manual
perception. John Benjamins Publishing Company, Amsterdam.
Held, R. (2009). Visual-haptic mapping and the origin of crossmodal identity. Optometry and Vision
Science, 86, 595–98.
Izard, V., Sann, C., Spelke, E.S., and Streri, A. (2009). Newborn infants perceive abstract numbers.
Proceedings of the National Academy of Sciences USA, 106, 10382–85.
Jeannerod, M. (1975). Déficit visuel persistant chez les aveugles-nés opérés. Données cliniques et
expérimentales (Persistent visual deficit in congenitally blind people after surgery: clinical and
experimental data). L’Année Psychologique, 75, 169–95.
Jones, B., and Connolly, K. (1970). Memory effects in crossmodal matching. British Journal of Psychology,
61, 267–70.
Jovanovic, B., Duemmler, T., and Schwarzer, G. (2008). Infant development of configural object processing
in visual and visual-haptic contexts. Acta Psychologica, 129, 376–86.
Juurmaa, J., and Lehtinen-Railo, S. (1988). Visual experience and access to spatial knowledge. Journal of
Visual Impairment and Blindness, 88, 157–70.
Katz, D. (1925/1989). The world of touch (translated by L.E. Krueger 1989). Lawrence Erlbaum Associates,
Hillsdale, NJ.
Kawashima, R., Watanabe, J., Kato, T., et al. (2002). Direction of crossmodal information transfer affects
human brain activation: a PET study. European Journal of Neuroscience, 16, 137–44.
Kellman, P.J., and Arterberry, M.E. (1998). The cradle of knowledge. MIT Press, Cambridge, MA.
Klatzky, R.L., and Lederman, S.J. (1993). Toward a computational model of constraint-driven exploration
and haptic object identification. Perception, 22, 597–621.
Kellman, P., and Spelke, E.S. (1983). Perception of partly occluded objects in infancy. Cognitive Psychology,
15, 483–524.
Klatzky, R.L., Lederman, S.J., and Reed, C. (1987). There’s more to touch than meets the eye: the salience of
object attributes for haptics with and without vision. Journal of Experimental Psychology: General, 116,
356–69.
Kobayashi, T., Hiraki, K., and Hasegawa, T. (2005). Auditory-visual intermodal matching of small
numerosities in 6-month-old infants. Developmental Science, 8, 409–419.
Lacey, S., Campbell, C., and Sathian, K. (2007). Vision and touch: multiple or multisensory representations
of objects? Perception, 36, 1513–21.
Lecanuet, J.-P., Granier-Deferre, C., and Busnel, M.-C. (1995). Human fetal auditory perception. In Fetal
development: a psychobiological perspective (eds. J.-P. Lecanuet, W.P. Fifer, N.A. Krasnegor, and W.P.
Smotherman), pp. 239–62. Lawrence Erlbaum Associates, Hillsdale, NJ.
Lederman, S.J., and Klatzky, R.L. (1987). Hand movements: a window into haptic object recognition.
Cognitive Psychology, 19, 342–68.
Lederman, S.J., and Klatzky, R.L. (1990). Haptic classification of common objects: knowledge-driven
exploration. Cognitive Psychology, 22, 421–59.
Lederman, S.J., and Klatzky, R.L. (1996). Haptic object identification II. Purposive exploration. In Somesthesis
and the neurobiology of the somatosensory cortex (ed. O. Franzen), pp. 153–61. Birkhauser Verlag, Basel.
Lejeune, F., Audeoud, F., Marcus, L., et al. (2010). The manual perception of shapes in preterm human
infants from 33 to 34 + 6 post-conceptional age. PLoS ONE, 5(2): e9108.
Lewkowicz, D.J. (2000). The development of intersensory temporal perception: an epigenetic systems/
limitations view. Psychological Bulletin, 126, 281–308.
Lewkowicz, D.J., and Lickliter, R. (1994). The development of intersensory perception. Comparative
perspectives. Lawrence Erlbaum Associates, Hillsdale, NJ.
Lewkowicz, D.J., and Turkewitz, G. (1981). Intersensory interaction in newborns: modification of visual
preferences following exposure to sound. Child Development, 52, 827–32.
Loomis, J.M., and Lederman, S.J. (1986). Tactual perception. In Handbook of perception and human performance.
Vol. II: Cognitive processes and performance (ed. K.R. Boff), pp. 1–41. John Wiley and Sons, New York.
Maratos, O. (1973). The origin and development of imitation in the first six months of life. PhD thesis.
University of Geneva, Switzerland.
Marks, L.E. (1975). On coloured-hearing synaesthesia: crossmodal translations of sensory dimensions.
Psychological Bulletin, 82, 303–31.
Maurer, D. (1993). Neonatal synaesthesia: implications for the processing of speech and faces. In
Developmental neurocognition: speech and face processing in the first year of life (eds. B. de Boysson-
Bardies, and S. de Schönen), pp. 109–24. Kluwer Academic Publishers, Netherlands.
Maurer, D., Stager, C., and Mondloch, C. (1999). Crossmodal transfer of shape is difficult to demonstrate
in one-month-olds. Child Development, 70, 1047–57.
McClelland, J., Rumelhart, J., and the PDP research group. (1986). Parallel distributed processing:
explanations in the microstructure of cognition, Vol. 2. MIT Press, Cambridge, MA.
McGurk, H., Turnure, C., and Creighton, S.J. (1977). Auditory-visual coordination in neonates. Child
Development, 48, 138–43.
Meltzoff, A.N. (1993). Molyneux’ s babies: crossmodal perception, imitation and the mind of the preverbal
infant. In Spatial representation: problems in philosophy and psychology (eds. N. Eilan, R. McCarthy, and
B. Brewer), pp. 219–35. Oxford University Press, Oxford.
Meltzoff, A.N., and Borton, R.W. (1979). Intermodal matching by human neonates. Nature, 282, 403–404.
Meltzoff, A.N., and Moore, M.K. (1977). Imitation of facial and manual gestures by human neonates.
Science, 198, 75–78.
Merabet, L., Thut, G., Murray, B., Andrews, J., Hsiao, S., and Pascual-Leone, A. (2004). Feeling by sight or
seeing by touch? Neuron, 42, 173–79.
Michel, G.F., and Goodwin, R. (1979). Intrauterine birth position predicts newborn supine head
preferences. Infant Behavior and Development, 2, 29–38.
Molina, M., and Jouen, F. (1998). Modulation of the palmar grasp behaviour in neonates according to
texture property. Infant Behavior and Development, 21, 659–66.
Molina, M., and Jouen, F. (2001). Modulation of manual activity by vision in human newborns.
Developmental Psychobiology, 38, 123–32.
Mondloch, C.J., and Maurer, D. (2004). Do small white balls squeak? Pitch-object correspondences in
young children. Cognitive, Affective, and Behavioral Neuroscience, 4, 133–36.
Morrongiello, B.A., Fenwick, K.D., and Chance, G. (1998). Crossmodal learning in newborn infants:
inferences about properties of auditory–visual events. Infant Behavior and Development, 21, 543–54.
Newnham, C., and McKenzie, B.E. (1993). Crossmodal transfer of sequential and haptic information by
clumsy children. Perception, 22, 1061–73.
Pascalis, O., and de Haan, M. (2003). Recognition memory and novelty preference: what model? In Progress
in infancy research, Vol. 3 (eds. H. Hayne, and J. Fagen), pp. 95–119. Lawrence Erlbaum Associates,
Mahwah, NJ.
Pascalis, O., and Slater, A. (2003). The development of face processing in infancy and early childhood: current
perspectives. Nova Science Publishers, Inc., New York.
Pascual-Leone, A., and Hamilton, R. (2001). The metamodal organization of the brain. In Progress in Brain
Research, Vol. 134 (eds. C. Casanova, and M. Petito), pp. 427–45. Elsevier Science B.V., Amsterdam.
Pihko, E., Nevalainen, P., Stephen, J., Okada, Y., and Lauronen, L. (2009). Maturation of somatosensory
cortical processing from birth to adulthood revealed by magnetoencephalography. Clinical
Neurophysiology, 120, 1552–61.
Proust, J. (1997). Perception et Intermodalité (Perception and intersensory relations). Presses Universitaires
de France, Paris.
Ramachandran, V.S., and Hubbard, E.M. (2001). Synaesthesia—a window into perception, thought and
language. Journal of Consciousness Studies, 8, 3–34.
Revesz, G. (1950). Psychology and art of the blind. Longmans Green, London.
Ricci, D., Cesarini, L., Groppo, M., et al. (2008). Early assessment of visual function in full term newborns.
Early Human Development, 84, 107–113.
Rich, A.N., Bradshaw, J.L., and Mattingley, J.B. (2004). A systematic, large-scale study of synaesthesia:
implications for the role of early experience in lexical-colour associations. Cognition, 98, 53–84.
Rochat, P. (1987). Mouthing and grasping in neonates: evidence for the early detection of what hard or soft
substances afford for action. Infant Behavior and Development, 10, 435–49.
Rochat, P. (1989). Object manipulation and exploration in 2- to 5-month-old infants. Developmental
Psychology, 25, 871–84.
Röder, B., and Rösler, F. (2004). Compensatory plasticity as a consequence of sensory loss. In Handbook of
multisensory processing (eds. G. Calvert, C. Spence, and B.E. Stein), pp. 719–47. MIT Press, Cambridge, MA.
Roland, P.E., and Mortensen, E. (1987). Somatosensory detection of microgeometry, macrogeometry and
kinaesthesia in man. Brain Research Reviews, 12, 1–42.
Roland, P.E., O’Sullivan, B., and Kawashima, R. (1998). Shape and roughness activate different somatosensory
areas in the human brain. Proceedings of the National Academy of Sciences USA, 95, 3295–300.
Rose, S.A. (1994). From hand to eye: findings and issues in infant crossmodal transfer. In The development
of intersensory perception: comparative perspectives (eds. D.J. Lewkowicz, and R. Lickliter), pp. 265–84.
Lawrence Erlbaum Associates, Hillsdale, NJ.
Rose, S.A., and Orlian, E.K. (1991). Asymmetries in infant crossmodal transfer. Child Development,
62, 706–18.
Rovee-Collier, C., and Barr, R. (2001). Infant learning and memory. In Handbook of infant development
(eds. G. Bremner and A. Fogel), pp. 139–68. Blackwell Publishing, Malden, MA.
Ruff, H.A. (1984). Infants’ manipulative exploration of objects: effects of age and object characteristics.
Developmental Psychology, 20, 9–20.
Ruff, H.A., Saltarelli, L.M., Cappozoli, M., and Dubiner, K. (1992). The differentiation of activity in infants’
exploration of objects. Developmental Psychology, 28, 851–61.
Sann, C., and Streri, A. (2007). Perception of object shape and texture in human newborns: evidence from
crossmodal transfer tasks. Developmental Science, 10, 398–409.
Sann, C., and Streri, A. (2008). The limits of newborn’s grasping to detect texture in a crossmodal transfer
task. Infant Behavior and Development, 31, 523–31.
Sathian, K. (2005). Visual cortical activity during tactile perception in the sighted and the visually deprived.
Development Psychobiology, 46, 279–86.
Schaal, B., Marlier, L., and Soussignan, R. (1998). Olfactory function in the human fetus: evidence from
selective neonatal responsiveness to the odor of amniotic fluid. Behavioral Neuroscience, 112, 1438–49.
Shams, L., and Seitz, A.R. (2008). Benefits of multisensory learning. Trends in Cognitive Sciences, 12, 411–417.
Slater, A. (1998). Perceptual development: visual, auditory and speech perception in infancy. Psychology Press,
Hove.
Slater, A., and Kirby, R. (1998). Innate and learned perceptual abilities in the newborn infant. Experimental
Brain Research, 123, 90–94.
Slater, A., Brown, E., and Badenoch, M. (1997). Intermodal perception at birth: newborn infants’ memory
for arbitrary auditory–visual pairings. Early Development and Parenting, 6, 99–104.
Slater, A., Mattoch, A., Brown, E., and Bremner, J.G. (1991). Form perception at birth: Cohen and Younger
(1984) revisited. Journal of Experimental Child Psychology, 51, 395–406.
Slater, A., Quinn, P.C., Brown, E., and Hayes R. (1999). Intermodal perception at birth: intersensory redundancy
guides newborn infants’ learning of arbitrary auditory–visual pairings. Developmental Science, 2, 333–38.
Spector, F., and Maurer, D. (2009). Synaesthesia: a new approach to understanding the development of
perception. Developmental Psychology, 45, 175–89.
Spelke, E.S. (1976). Infants’ intermodal perception of events. Cognitive Psychology, 8, 553–60.
Spence, C. (2011). Crossmodal correspondences: a tutorial review. Attention, Perception, and Psychophysics,
73, 971–95.
Spence, C., and Driver, J. (2004). Crossmodal space and crossmodal attention. Oxford University Press, Oxford.
Starkey, P., Spelke, E.S., and Gelman, R. (1990). Numerical abstraction by human infants. Cognition, 36, 97–127.
Streri, A. (1987). Tactile discrimination of shape and intermodal transfer in 2- to 3- month-old infants.
British Journal of Developmental Psychology, 5, 213–20.
Streri, A. (1993). Seeing, reaching, touching: the relations between vision and touch in infancy. Harvester
Wheatsheaf, London.
Streri, A., and Gentaz, E. (2003). Crossmodal recognition of shape from hand to eyes in human newborns.
Somatosensory and Motor Research, 20, 11–16.
Streri, A., and Gentaz, E. (2004). Crossmodal recognition of shape from hand to eyes and handedness in
human newborns. Neuropsychologia, 42, 1365–69.
Streri, A., and Molina, M. (1993). Visual-tactual and tactual-visual transfer between objects and pictures in
2-month-old infants. Perception, 22, 1299–318.
Streri, A., and Pêcheux, M.G. (1986a). Tactual habituation and discrimination of form in infancy: a
comparison with vision. Child Development, 57, 100–104.
Streri, A., and Pêcheux, M.G. (1986b). Vision to touch and touch to vision transfer of form in 5-month-old
infants. British Journal of Developmental Psychology, 4, 161–67.
Streri, A., and Spelke, E.S. (1988). Haptic perception of objects. Cognitive Psychology, 20, 1–23.
Streri, A., and Spelke, E.S. (1989). Effects of motion and figural goodness on haptic object perception in
infancy. Child Development, 60, 1111–25.
Streri, A., Lhote, M., and Dutilleul, S. (2000). Haptic perception in newborns. Developmental Science, 3, 319–27.
Streri, A., Lemoine, C., and Devouche, E. (2008). Development of inter-manual transfer information in
infancy. Developmental Psychobiology, 50, 70–76.
Tomasello, M. (1998). Uniquely primate, uniquely human. Developmental Science, 1, 1–16.
Turkewitz, G., Gardner, J., and Lewkowicz, D.J. (1984). Sensory/perceptual functioning during early
infancy: the implications of a quantitative basis for responding. In Behavioral evolution and integrative
levels (eds. G. Greenberg and E. Tobach), pp. 167–95. Lawrence Erlbaum Associates, Hillsdale, NJ.
Turkewitz, G., Gordon, E.W., and Birch, H.G. (1965). Head turning in the human neonate: spontaneous
pattern. Journal of Genetic Psychology, 107, 143–58.
Twitchell, T.E. (1965). The automatic grasping responses of infants. Neuropsychologia, 3, 247–59.
Vinter, A. (1986). The role of movement in eliciting early imitations. Child Development, 57, 66–71.
von Hofsten, C. (1982). Eye-hand coordination in the newborn. Developmental Psychology, 18, 450–61.
Walker, P., Bremner, J.G., Mason, U., et al. (2010). Preverbal infants’ sensitivity to synaesthetic
crossmodality correspondences. Psychological Science, 21, 21–25.
Wertheimer, M. (1961). Psychomotor coordination of auditory and visual space at birth. Science, 134, 1692.
Chapter 5

The development of multisensory representations of the body and of the space around the body

Andrew J. Bremner, Nicholas P. Holmes, and Charles Spence

5.1 Introduction
Developing mature multisensory representations of the spatial disposition of our body and limbs
is vital if we are to move around the environment in a physically competent manner. As adults,
we efficiently integrate information arriving from multiple sensory modalities pertaining to
the spatial layout of the body and limbs. Indeed, multisensory body representations underpin
a fundamental aspect of all our mental processes, by providing ‘embodied’ representations
that give a point of reference to objects in the external world (Bermudez et al. 1995; Gallagher
2005; Varela et al. 1991). Such embodied representations provide the basis for action in both the
nearby (peripersonal) and distant (extrapersonal) environments (see Previc 1998; Rizzolatti et al.
1997). Despite the fundamental role of such representations, there is clear scope for significant
development in them; any early ability to represent the layout of one’s body would need to be
continually re-tuned throughout development in order to cope with physical changes in the sizes,
disposition, and movement capabilities of the limbs (see A.J. Bremner et al. 2008a; Gori et al.
2010; King 2004).
A casual observation of young infants’ behaviour with respect to their own bodies indicates
significant limitations in their perception and understanding of their limbs. As suggested by the
following anecdotal observation of a 4-month-old infant, a naïve appreciation of the relation
between tactile and visual spatial frames of reference could lead to a number of problems in
controlling and executing appropriate actions:
Sometimes the hand would be stared at steadily, perhaps with growing intensity, until interest reached
such a pitch that a grasping movement followed as if the infant tried by an automatic action of the
motor hand to grasp the visual hand, and it was switched out of the centre of vision and lost as if it had
magically vanished. (Hall 1898; p. 351).

Even with children well into their second year of life, DeLoache et al. (2004) have observed
striking ‘scale errors’ indicating that, despite being able to arrange their limbs competently
with respect to the actions they select (see also Glover 2004), children misunderstand or fail to take
into account the scale relationship between their body and an object when selecting appropriate
actions.
In this chapter, we discuss the research that has investigated the development, through infancy
and early childhood, of multisensory representations of the body and of the space around
114 DEVELOPMENT OF MULTISENSORY REPRESENTATIONS OF THE BODY

the body. We will limit our focus to the emergence of representations that subserve action in rela-
tion to the nearby (peripersonal) environment (for a discussion of representations subserving
action with respect to the more distant environment—in particular, balance and locomotion—
see Chapter 6 by Nardini and Cowie).

5.2 The multisensory nature of body representations
The sensory information that provides the most obvious contribution to body representations is
that arising from the senses that are only stimulated by the body or by objects touching it. Two
inputs that have particular importance in this respect are those of touch (i.e. cutaneous somato-
sensory inputs) and proprioception (i.e. deep or non-cutaneous somatosensory inputs). However,
this is not to say that it is this sensory information alone which provides embodied experience. On
the contrary, behavioural and physiological evidence from adult humans and monkeys points to
the conclusion that the neural systems subserving representations of the body and the embodied
environment (including peripersonal space) integrate information from the receptors relating to
the body and its surface (touch and proprioception) with that arriving from the receptors that
provide information about the near and distant environment (i.e. vision and audition). For
instance, as early as Brodmann’s area 5 in the higher-order somatosensory cortex, tactile and
proprioceptive information about the arm and hand is integrated with visual inputs, generating
multisensory representations of body-part location (Graziano et al. 2000). Similarly, parts of the
ventral and dorsal premotor cortex in adult monkeys and humans integrate tactile, propriocep-
tive, visual, and even auditory inputs in the representation of limbs, of limb position and of
nearby objects in body-part-centred representations of space (Ehrsson et al. 2004; Graziano et al.
1997; Holmes and Spence 2004; Makin et al. 2008; Maravita et al. 2003). Interestingly, research
also suggests that auditory-tactile integration may be of particular relevance to representations of
the space behind the head (e.g. Kitagawa et al. 2005).

5.3 The development of visual–proprioceptive and visual–tactile correspondence: early body representations?
The question of whether infants and young children are able to perceive spatial correspondences
between sensory inputs relating to the distant environment (in particular, visual inputs) and
direct inputs concerning the body is one that has fascinated philosophers and psychologists since
the time of Molyneux and Locke (Locke 1690; see also Chapter 4 by Streri). The first empirical
approaches attempted to discern whether children, and later infants, could recognise, in one
sensory modality, an object that had previously been presented only in another (‘crossmodal
transfer tasks’). Birch and Lefford (1963), the pioneers of this technique in children, observed that
the accuracy of children’s crossmodal matching of stimuli between vision and touch increases
between 5 and 11 years of age. However, their conclusion that the crossmodal integration
of these senses undergoes extended development across childhood has since received substantial
criticism. In particular, Bryant and his colleagues (Bryant 1974; Bryant et al. 1972; Bryant and Raz
1975; Hulme et al. 1983) refuted Birch and Lefford’s (1963) finding of age-related developments
in crossmodal matching, by noting that it could be explained by corresponding developments in
unimodal perceptual matching.
Bryant et al. (1972) argued, contrary to Birch and Lefford’s (1963) account, that the crossmodal
integration of touch and vision is in fact present in early infancy, and backed this up with evidence
that 6–12-month-old infants are able to visually identify shapes that they had previously
experienced only through touch. Rose and her colleagues confirmed in a series of studies that
THE DEVELOPMENT OF VISUAL–PROPRIOCEPTIVE AND VISUAL–TACTILE CORRESPONDENCE 115

bi-directional crossmodal transfer between touch and vision is available by 6 months of age (Rose
et al. 1981a), but they also demonstrated that between 6 and 12 months infants become progres-
sively more efficient at encoding information into memory, allowing them to achieve crossmodal
recognition in a more robust way (Gottfried et al. 1977, 1978; Rose et al. 1981b; for a review see
Rose 1994). More recently, Streri and colleagues have shown that an ability to perceive
commonalities between touch and vision is present at birth (although see also Maurer et al. 1999; Meltzoff
and Borton 1979 for controversy surrounding the early emergence of crossmodal transfer),
by demonstrating that newborn infants are capable of some visual–tactile crossmodal transfer
of shape recognition (see Chapter 4 by Streri; also Sann and Streri 2007; Streri and Gentaz 2003,
2004).
The early development of spatial correspondence between vision and proprioception has
also been investigated in infants using a crossmodal matching method, which measures infants’
preference for looking toward visual information that is either congruent or incongruent with
proprioception. Following the development of this method by Bahrick and Watson (1985), a
series of studies conducted by several different research groups have examined whether infants
from as young as 3 months of age were able to recognize crossmodal visual–proprioceptive (VP)
correspondences arising from the movement of their own limbs (e.g. Bahrick and Watson 1985;
Rochat and Morgan 1995; Schmuckler and Fairhall 2001; Schmuckler and Jewell 2007). In these
studies, infants were presented with a visual display showing their own legs moving in synchrony
with their own movements, and compared this with the visual presentation of another child’s leg
movements. The preferences infants showed for the asynchronous display across a number of
different conditions of stimulus presentation provided strong evidence that young infants are
indeed able to detect VP spatiotemporal correspondences.
Thus, the evidence seems fairly conclusive that, soon after birth, humans are able to register
correspondences across the direct and distance receptors (at least for visual–tactile (VT) and VP
correspondences). However, in considering the development of multisensory representations of
the body and limbs, it is important to examine whether these VT and VP spatial correspondences
are perceived with respect to parts of the body (e.g. head-, eye-, body-, limb-centred spatial coor-
dinates). The crossmodal matching and crossmodal transfer studies we have discussed in this
section suffer from a limitation in this respect because although they permit investigation of
infants’ ability to perceive a multisensory spatial correspondence (either between touch and
vision or between proprioception and vision), they do not provide an indication of the spatial
frame of reference in which this correspondence is encoded. Given young infants’ skill with rep-
resenting external (allocentric) spatial frames of reference (A.J. Bremner et al. 2007; Kaufman and
Needham 1999), it is quite plausible that crossmodal transfer is achieved with respect to a frame
of reference unrelated to a representation of the body or indeed any intrinsic spatial framework
such as the eye or the hand (A.J. Bremner et al. 2008a).
In their studies of VP matching (using the crossmodal matching paradigm described above),
Rochat and Morgan (1995), and Schmuckler (1996) attempted to determine the underlying
frame of reference governing the infants’ behaviour. They did this by manipulating the
left–right and up–down spatial matches between the visual and proprioceptive stimuli. The
authors of these two papers found, independently, that infants only appear to detect VP corre-
spondence when the visual inputs move in temporal synchrony with the limb, and when they
correspond spatially in the left–right dimension of movement (Rochat and Morgan 1995;
Schmuckler 1996). The infants did not demonstrate any differential looking with respect to
changes in crossmodal correspondence in the up–down (vertical) dimension of movement.
Arguing that the left–right dimension of movement is specific to an egocentric frame of reference
(cf. Driver 1999), Rochat (1998) concluded that the infants’ crossmodal matching is based on a
representation of movements of their own body coded relative to an intrinsic frame of reference.
Rochat has further asserted that these multisensory egocentric representations are made possible
by an innate human capacity to represent one’s own body (Rochat 1998, 2010).
However, there are a number of problems with concluding from these data that young infants
perceive visual and proprioceptive cues with respect to a body-, or limb-centred spatial frame of
reference. In these VP matching studies, the visual cues concerning the body were presented on a
video display outside bodily and peripersonal space (i.e. beyond reach, in extrapersonal space),
and so it is difficult to rule out the possibility that infants were responding on the basis of some
other correspondence between these sensory inputs, such as their correlated movements
(cf. Parise, Spence, and Ernst, 2012). Indeed, it is important to highlight that infants’ detection of
VP correspondence in this paradigm need not necessarily be registered with respect to limb- or
body-centred spatial frames of reference. For instance, VP contiguity occurs when the move-
ments of an infant’s arm cause objects in his or her environment to move (such as a blanket or a
teddy bear). This kind of experience may lead young infants to expect VP spatial correspondence,
without any recourse to a multisensory spatial representation of their own body. Thus, it is pos-
sible that the findings from these studies may tell us more about infants’ perceptions of contin-
gencies between their own movements and the movements of objects or people in their
extrapersonal environment than about representations of their own bodies.1 Recall that the
4-month-old in Hall’s (1898) anecdote mentioned earlier treated their visual hand as if it were an
extrapersonal object.
Indeed, the limitations in the matches that the infants made suggest that this extrapersonal
interpretation of VP correspondence might be the most appropriate. Recall that while infants
detected left–right VP congruency, they failed to notice such crossmodal correspondence in the
up–down (vertical) dimension (Rochat 1998; Schmuckler 1996; see also Snijders et al. 2007).
There seems to be little reason to expect such a problem with up–down VP correspondence with
reference to body- or limb-centred frames of reference because infants will have had crossmodal
experience of their arms moving with respect to both horizontal and vertical axes. Conversely,
experience with VP correspondence arising from the movements of extrapersonal objects is likely
to be much more restricted to the left–right dimension, for the simple reason that the arrangements of objects and surfaces with respect to gravity in our environment allow for less variation
within the vertical plane (at least up until infants start picking up and dropping objects towards
the end of the first year). Thus, young infants’ abilities to detect left–right, but not up–down spa-
tial correspondence between their proprioceptively perceived leg movements and their visually
perceived movements on the video screen may well be based on representations of, and arise from
the experience of, crossmodal correspondences between the proprioceptive body and visual
extrapersonal objects.
Finally, even if young infants do register VP commonalities according to a representation of
their own bodies (as argued by Rochat 1998), we have little information regarding the spatial
precision of such representations. It is unclear to what extent the crossmodal perceptual abilities
of young infants could provide the basis for useful sensorimotor coordination in the immediate
personal and peripersonal environment. It is our assertion (see later, and also A.J. Bremner et al.
2008a) that, in order to glean unambiguous information concerning the development of spatial
representations of the body, spatial-orienting paradigms are needed. Spatial-orienting responses
have the advantage of requiring coordination between the stimulus location and the intrinsic spatial framework within which the response is made. We will shortly move on to a description of the findings of a study of infants' orienting responses to tactile stimuli presented to their hands. However, first we will consider the computational demands of orienting toward the body or stimuli in the nearby (peripersonal) environment. As will become clear, the task of orienting to a stimulus with respect to the body is by no means a simple one.

1 Indeed, this problem is also present in a more recent study on infants' detection of visual–tactile synchrony of stimulation to the legs (Zmyj et al. 2011).

5.4 Computational challenges of forming multisensory representations of the body
The existence of many separate multisensory representations of space and the body, across indi-
viduals, brain areas, and species, belies the computational complexity of constructing such repre-
sentations. Not only do the senses convey information about the environment in different neural
codes and reference frames, but the relationship between sensory modalities frequently changes,
for example with changes in body posture, such as when the eyes move in their sockets or the
arms move. Take, for instance, the task of retrieving a nearby seen object: first, the brain must
represent the object’s location visually and then it must translate this retinocentric location into
the limb-centred coordinates necessary for initiating a reaching movement. In other words, the
brain calculates the location of the retrieving limb relative to the object. The necessary multisen-
sory information specifying the layout of our bodies with respect to the world is typically pro-
vided by touch, proprioception, vision, and occasionally audition. The neural representations of
limb position and nearby stimuli dynamically remap the ways in which information is integrated
across the senses in response to different circumstances (e.g. changes in posture, or the use of
tools; Groh and Sparks 1996a,b; Graziano et al. 2004; Spence et al. 2008). This dynamic integra-
tion of multisensory spatial cues occurs automatically in human adults (e.g. Kennett et al. 2002),
but we cannot necessarily infer that the same is the case for earlier stages in development.
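The chain of coordinate transformations just described can be illustrated with a toy sketch. This is not a model drawn from the literature reviewed here, merely a minimal illustration under strong simplifying assumptions: locations are 2-D vectors in a common metric, and eye, head, and hand offsets combine by pure translation (real sensorimotor transformations also involve rotations and gain modulation). All function and variable names are invented for illustration.

```python
import numpy as np

def retino_to_limb_centred(target_retinal, eye_in_head, head_on_body, hand_on_body):
    """Translate a retinocentric target location into hand-centred
    coordinates by chaining postural offsets (all 2-D vectors,
    arbitrary units, purely additive by assumption)."""
    # retinal location plus eye and head directions gives a body-centred location
    target_body = (np.asarray(target_retinal)
                   + np.asarray(eye_in_head)
                   + np.asarray(head_on_body))
    # subtracting the hand's position gives the required reach vector
    return target_body - np.asarray(hand_on_body)

# A fixated object (retinal location [0, 0]), with the eyes turned
# 10 units to the left and the hand resting 15 units to the right:
movement_vector = retino_to_limb_centred([0, 0], [-10, 0], [0, 0], [15, 0])
# The required reach is 25 units leftward in hand-centred coordinates.
```

The point of the sketch is that the very same retinal input demands a different reach vector whenever eye, head, or hand posture changes, which is why posture must be continually taken into account.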
However, the challenge to multisensory integration posed by changes in the relationships
between the limbs and body is even greater when considered across development. Firstly, the
number and variety of postural changes that a child can readily make increase substantially, par-
ticularly in the first years of life (Bayley 1969; Carlier et al. 2006; Morange and Bloch 1996; Provine
and Westerman 1979; Van Hof et al. 2002). Indeed, such developmental changes in posture
are known to have a significant impact on infants’ and children’s abilities to act on and navigate
around their environment (see Chapter 6 by Nardini and Cowie; see also work by Adolph, e.g.
Adolph 2008). In addition to changes in the postural variations available to children, the spatial
distribution of the limbs and the body also vary profoundly right up to adulthood. Figure 5.1
shows how the relative sizes, shapes, and distributions of the limbs, body, and head change significantly across development. These changes in body size and proportions necessitate continuous adaptation of sensorimotor integration across early life (Adolph and Avolio 2000; see also Chapter 6 by Nardini and Cowie).

5.5 Two mechanisms of multisensory integration underlying peripersonal spatial representations and their development
As has been seen, when we orient toward locations in peripersonal space, and indeed when we
orient toward locations on the body, our brains must integrate and align the spatial frames of
reference used by vision, audition, and by the body senses. In a recent review of the literature on
peripersonal spatial representations (A.J. Bremner et al. 2008a), we highlighted two mechanisms
of multisensory integration that adult humans and primates (typically) use in order to achieve
unified, consistent representations of the body and peripersonal space. In A.J. Bremner et al.

[Figure 5.1 appears here. (a) Body outlines at birth, 2 years, 5 years, 15 years, and adulthood. (b) Line graph of sitting height as a percentage of stature (y-axis, 50–70%) against age in years (x-axis, 0–17), plotted separately for boys and girls.]
Fig. 5.1 The distributions of limbs and the body change continually across infancy and childhood.
(a) An approximation (after Gilbert 2000) of body proportions in a human girl at birth, 2 years, 5 years,
15 years, and as an adult. Note that the head's size relative to the body and limbs decreases, and the length of the limbs relative to body stature increases significantly across development. (b) Changes
in sitting height as a percentage of stature across 0–17 years of age. Decreases in this statistic are
directly proportional to increases in leg length as a percentage of stature (With kind permission
from Springer Science+Business Media: Manual of Physical Status and Performance in Childhood:
Volume One Parts A and B; Physical Status, 1983, Roche, Alex F., Malina, Robert M.).

(2008a), we labelled these visual spatial reliance and postural remapping. Following informative conversations with colleagues, we present this framework here in a modified form, relabelling visual spatial reliance as canonical multisensory body representations.

5.5.1 Canonical multisensory body representations


The available research shows that we rely heavily on statistical information about the sensory
stimulation we receive in order to make more precise estimates about, and actions on, objects and

our spatial environment (e.g. Ernst and Bülthoff 2004; Kersten et al. 2004; Körding and Wolpert
2004). For instance, a number of studies have demonstrated that our reliance on information
from a given sensory modality depends on the variability of information in that modality in the
context of a particular task (Alais and Burr 2004; Ernst and Banks 2002; Ernst and Bülthoff 2004;
Van Beers et al. 2002). Statistical priors can also play a role in telling us how our body is arrayed:
by relying on the fact that the limbs usually occupy particular locations with respect to the body
(i.e. that there is a high prior probability associated with seeing, feeling, or even hearing a limb in
a particular location), one can approximate limb position on the basis of such priors (e.g. when
one feels a tactile sensation on the left hand, the object that produced it will more often than not be in the left visual field).
This approach predicts that the influence of canonical body representations should be observ-
able when participants are asked to undertake tasks in which their limbs are placed into positions
that are at odds with the canonical representation of the body derived from its typical layout. One
particularly striking example of this can be seen in those tasks in which adults make temporal
order judgments (TOJs) concerning tactile stimuli presented first on one hand and then on the
other, in quick succession; performance is much less accurate in an unusual crossed-hands pos-
ture than in the more typical (canonical) uncrossed-hands posture (Schicke and Röder 2006;
Shore et al. 2002; Yamamoto and Kitazawa 2001). Interestingly, these crossed-hands deficits do
not occur, or occur much less strongly, in congenitally blind participants, indicating that, in this
case at least, visual input has an important role in setting up such canonical representations of the
body (Röder et al. 2004; see also Chapter 13 by Röder, and Pagel et al. 2009).
But, from what information might such a canonical representation of the body be derived? As
we described above, Röder et al. (2004) demonstrated that visual experience appears to mediate the
development of canonical representations of the body’s layout. Indeed, vision often provides the
most reliable information about where the hands are in space (cf. Alais and Burr 2004; Van Beers
et al. 2002). It is possible to observe our reliance on vision when vision becomes misleading in the
context of striking bodily illusions, such as the ‘rubber hand’ and ‘mirror’ illusions (Botvinick and
Cohen 1998; Holmes et al. 2004). However, vision is not always the most reliable sensory input
about a given stimulus or in the context of a given task. For instance, Van Beers et al. (2002) have
demonstrated that proprioceptive and visual information about the hand’s location are weighted
relatively more strongly in the contexts in which they are most reliable: proprioception is more reliable than vision when determining location in depth (away from the body), whereas vision is more reliable than proprioception in the left–right dimension (azimuth). Adults thus place more weight
on the most reliable modality. Indeed, research suggests that we integrate the senses optimally
by weighting them in accordance with their task-dependent reliabilities (Ernst and Banks 2002).
We thus suggest that canonical representations of the body are multisensory.
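Van Beers et al.'s (2002) direction-specific weighting pattern follows naturally from minimum-variance cue combination of the kind formalized by Ernst and Banks (2002). The sketch below is a generic implementation of inverse-variance weighting, not the authors' own model; the particular variances are invented for illustration.

```python
import numpy as np

def fuse_cues(visual, visual_var, proprio, proprio_var):
    """Minimum-variance fusion of two 2-D position estimates, applied
    per dimension (index 0 = azimuth, index 1 = depth). Each cue's
    weight in a dimension is inversely proportional to its variance
    in that dimension."""
    visual, proprio = np.asarray(visual, float), np.asarray(proprio, float)
    w_vis = 1.0 / np.asarray(visual_var, float)
    w_pro = 1.0 / np.asarray(proprio_var, float)
    return (w_vis * visual + w_pro * proprio) / (w_vis + w_pro)

# Vision is precise in azimuth (var 1) but poor in depth (var 9);
# proprioception shows the reverse pattern:
estimate = fuse_cues(visual=[0.0, 0.0], visual_var=[1.0, 9.0],
                     proprio=[10.0, 10.0], proprio_var=[9.0, 1.0])
# Azimuth stays near the visual estimate; depth near the proprioceptive one.
```

The fused estimate is drawn toward whichever cue is more reliable in each dimension, reproducing in miniature the task-dependent weighting described above.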

5.5.2 Postural remapping


Of course, canonical multisensory body representations should lead to problems when the limbs
are not in their usual positions. By taking account of postural changes (either passively through
visual and proprioceptive cues, or actively through ‘efferent copies’ of the movement plans used
to change posture), the spatial correspondence between external stimuli and the limbs can be
‘remapped’. Research with adult humans has shown that the multisensory interactions that direct
attention take account of postural changes such as when a hand crosses the body midline (Azañón
and Soto-Faraco 2008; Spence et al. 2008). Additionally, brain areas identified as sites of multi-
sensory integration have been implicated in processes of postural remapping: neurons that remap
sensory correspondences across changes in posture have been reported in the monkey superior
colliculus (SC; Groh and Sparks, 1996a,b) and ventral premotor cortex (PMv; Graziano 1999;
Graziano et al. 1997).

Such remapping mechanisms may operate very early in the neural processing between stimulus
and response, at least in adults. Azañón and Soto-Faraco (2008) used an exogenous cuing experi-
ment to investigate the remapping of tactile stimulus location when the hands were crossed
over the body midline. They examined whether adults’ speeded responses to a visual stimulus on
the hand (a light) were influenced by the spatial proximity of a tactile cue that appeared prior to

(a) Saccades at 0 ms latency

(b) Saccades at 600–1000 ms latency

Fig. 5.2 Saccades made by an adult human to tactile stimuli on their right hand in a crossed-hands
posture (with the right hand in the left visual field). (a) Saccades are made to the tactile stimulus without
any intervening delay. (b) Saccades are made to the tactile stimulus with a delay of 600–1000 ms
between stimulus and response. Gradations indicate 10° of visual angle. Note that in (a), several saccades
begin by heading in the direction in which the tactile stimulus would normally lie and then a later
corrective process takes account of current hand posture, and shifts the saccade direction. (Reproduced
from Journal of Neurophysiology, 75 (1), J. M. Groh and D. L. Sparks, Saccades to somatosensory targets.
I. Behavioral characteristics, pp. 412–427, © 1996, The American Physiological Society, with permission.)

the visual target. They found that exogenous crossmodal spatial cuing effects to visual
stimuli appearing within the first 100 ms following a tactile stimulus were strongest in an
anatomical frame of reference—that is, they were strongest when the visual target was in the
position where the tactile cue would typically occur if the hands were uncrossed. From 100 ms after stimulus onset, cuing effects were remapped into an external frame of reference,
taking account of the unusual crossed-hands posture. A similar time-locked process of postural
remapping can be observed in adults’ saccades to tactile stimuli. If saccades are made immediately
following a tactile stimulus applied to a hand crossed over the midline, a proportion of saccades
are initially directed toward the visual hemispace where the tactile stimulus would normally be.
However, if saccadic orienting responses to tactile stimuli are delayed by 600–1000 ms, then
they are directed correctly to the actual location of the stimulus in space, even when the tactile
stimulus is in the visual hemifield opposite that to where the hand would normally be (Groh and
Sparks 1996c; see Fig. 5.2). Thus, it would seem as though an integrative mechanism that is sensi-
tive to posture is required in order to make correct gaze-orienting responses to the hands
when they are placed in atypical locations. Interestingly, it appears that adults are only
conscious of such tactile sensations once they have been remapped (Azañón and Soto-Faraco
2008; Kitazawa 2002).
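The contrast between the two mechanisms can be caricatured in a few lines of code. This is a deliberately minimal sketch of the logic, not a process model: 'remapping' here is reduced to looking up which side of external space a hand currently occupies, while the canonical mechanism simply assumes the uncrossed layout. All names are invented.

```python
def remap_touch(touched_hand, posture):
    """Map an anatomically coded touch ('left' or 'right' hand) onto a
    side of external (visual) space, given the current posture.
    posture maps each hand to the side of space it occupies."""
    return posture[touched_hand]

def canonical_touch(touched_hand):
    """Default mapping that ignores current posture: assumes the
    canonical, uncrossed layout (left hand in left space)."""
    return remap_touch(touched_hand, {"left": "left", "right": "right"})

crossed = {"left": "right", "right": "left"}  # hands crossed over the midline
# With crossed hands, a touch on the right hand is actually in left space:
remapped = remap_touch("right", crossed)   # correct external localization
default = canonical_touch("right")         # canonical, posture-blind answer
```

On this caricature, the canonical mechanism gives correct answers in the familiar uncrossed posture and errs with crossed hands, whereas remapping succeeds in both; the developmental question is when each mechanism comes online.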

5.6 Canonical multisensory body representations and postural remapping in infants' orienting responses to tactile stimuli on the hands
A.J. Bremner et al. (2008b) recently traced the development of these mechanisms of multisensory
integration in 6.5- and 10-month-old infants, by measuring their spontaneous manual orienting
responses to vibrotactile sensations presented to their hands when placed both in the uncrossed-,
and crossed-hands postures (see Fig. 5.3a). The 6.5-month-old infants demonstrated a bias to
respond to the side of the body where the hand would typically rest (i.e. to the left of space for a
vibration in the left hand), regardless of the posture (uncrossed or crossed) of the hands. This
indicates a reliance on the typical location of the tactile stimulus in visual space. Later, at
10 months of age, a greater proportion of manual responses were made appropriately in both
postures, suggesting the development of an ability to take account of posture in remapping
correspondences between visual and tactile stimulation.
These developmental findings converge with the results of neuroscientific and behavioural
research in suggesting that representations of peripersonal space arise from two distinct mecha-
nisms of sensory integration, which follow separate developmental trajectories. The first mecha-
nism, canonical multisensory body representations, integrates bodily and visual sensory information
but relies substantially on the probable location of the hand, derived primarily from prior multi-
sensory experience about the canonical layout of the body. This mechanism is present early in the
first 6 months of life. The second mechanism, postural remapping, updates these multisensory
spatial correspondences by dynamically incorporating information about the current posture of
the hand and body. This mechanism develops after 6.5 months of age. We are not suggesting that
the early mechanism of canonical body representations is wholly replaced by that of postural
remapping, but that they continue to work together, as has been observed in adults (see above,
and Fig. 5.3b).
Thus we have put forward a dual-mechanism framework for understanding the development
during the first years of life of representations of the body and limbs and the stimuli that impinge
upon them. However, as we stated at the beginning of this chapter, the functional purpose of
such body representations is to provide a basis for action. Whilst orienting responses can them-
selves be considered as exploratory actions with respect to the body and nearby environment,
[Figure 5.3 appears here. (a) An infant receiving tactile stimulation to the right hand, shown in the uncrossed-hands and crossed-hands postures. (b) Schematic of the spatial information available for orienting: tactile location on the body (R hand); locations of hand targets in the visual field; typical location of the R hand in the visual field; and posture of the R hand (proprioceptive and visual). (c) 6.5-month-olds' responses (out of 5) and (d) 10-month-olds' responses (out of 10): bar charts of the numbers of contralateral and ipsilateral responses in the uncrossed- and crossed-hands postures.]
Fig. 5.3 Orienting to touch in familiar and unfamiliar postures. In the uncrossed-hands posture both
the visual information about the hand (circle) and a tactile stimulus on that hand (zig-zag pattern) arrive
at the contralateral hemisphere (tactile stimuli were always delivered when the infants were fixating
centrally). But with crossed hands, these signals initially arrive in opposite hemispheres (a). (b) shows
the sources of information available to be integrated into a representation of stimulus location. Our
framework suggests that all sources of information are available to 10-month-olds, and all but current
postural information is available to 6.5-month-olds. (c) 6.5-month-old infants’ manual responses to
tactile stimuli (d) 10-month-old infants’ manual responses to tactile stimuli. The infants’ first responses
on each trial were coded (from video-recordings) in terms of their direction in visual space with respect
to the hemisphere receiving the tactile signal. Thus, contralateral responses are appropriate in the
uncrossed-hands posture and ipsilateral responses in the crossed-hands posture. The 6.5-month-olds’
manual responses (n = 13; c) showed an overall contralateral bias, as predicted by a hypothesized
reliance on the typical layout of their body relative to vision. The 10-month-olds (n = 14; d) were able
to respond manually in the appropriate direction in either posture, suggesting, in agreement with the
proposed framework, that this age group are able to use information about current posture to remap
their orienting responses (Figure adapted from Bremner et al. 2008a). Asterisks represent significant
comparisons. Solid arrows represent a strong contribution of a particular source of information to
behaviour. Dotted arrows represent a weak contribution of that particular source of information.
Error bars show standard error of the mean. Reprinted from Trends in Cognitive Sciences, 12 (8),
Andrew J. Bremner, Nicholas P. Holmes, and Charles Spence, Infants lost in (peripersonal) space?,
pp. 298–305, Copyright (2008), with permission from Elsevier.

it is important to investigate whether our framework is also applicable to overt actions within
peripersonal space.

5.7 Peripersonal spatial representations underlying action in early infancy
Of the measurable behaviours in early infancy, perhaps the most relevant ways to observe
the development of action in peripersonal space are reaches and grasps made toward nearby
objects. A key question raised by the framework outlined here concerns whether infants’ reaching
movements at any given stage of development are based on multisensory systems that take account
of current limb posture (postural remapping), or whether instead successful reaches are
based on canonical representations of the limbs in their familiar locations derived from multi-
sensory experience (canonical multisensory body representations; also see Körding and Wolpert
2004).
While newborn infants do not often manually contact objects, their reaches are more often
directed toward an object if they are looking at it (Von Hofsten 1982, 2004). Newborns have also
been shown to change the position of their hand, and initiate decelerations of movement in order
to bring it into sight under the illumination of a spotlight that alternated between two locations
near their body (Van der Meer 1997). Thus, at birth, there is at least some spatial integration
between the information coming from nearby visible objects, and that coming from the body
parts with which responses are made.
An important question is whether early reaching is guided by visual feedback concerning the
relative locations of hand and object. Newborns demonstrate a deceleration of arm movements
in anticipation of their hand’s appearance in a spotlight (Van der Meer 1997), and this is sugges-
tive of the coordination of visual, proprioceptive, and kinaesthetic information (purely visual
guidance cannot explain the anticipatory adjustments, since the hand was invisible when outside
of the spotlight). However, it is difficult to determine whether this indicates early crossmodal
spatial correspondence between proprioceptive and visual space, or rather operant conditioning
of particular arm movements, contingent upon the reward of seeing one’s own hand.
The coordination of proprioceptive and visual space in the guidance of reaching has been
investigated more fully by comparing infants’ early successful reaches for distal targets in the light
against those in the dark (i.e. toward sounding or glowing targets without visual cues to the loca-
tion of their hand; Clifton et al. 1993, 1994; Robin et al. 1996). These studies have shown that
successful reaching in the dark develops at the same age as in the light, indicating that the first
reaches (at around 3–4 months of age) can be based on proprioceptive guidance of hand position
toward a sighted visual target.
Given that infants’ first successful reaches toward visual targets can occur without any visual
input about limb position, these actions are clearly generated by peripersonal spatial representa-
tions that integrate visual and proprioceptive cues to the target and the hand, respectively.
Nonetheless, it remains possible that their reaches in the dark are not guided by current proprio-
ceptive information, but rather by a multisensory representation of limb position that is strongly
weighted toward the canonical location that the limb would normally occupy. Because studies of
infants’ reaching in the dark (Clifton et al. 1993, 1994; Robin et al. 1996) have not systematically
varied limb posture prior to reaching, it is difficult to disentangle these interpretations (cf. Holmes
et al. 2004). However, within the framework put forward here (and by A.J. Bremner et al. 2008a),
the prediction is that, if posture were varied, young infants' early reaches would be error-prone, but that in the second 6 months of life infants would become better able to take account of the current position of the limbs in order to reach accurately from a variety of starting postures.

From 4 months of age, an infant’s reaching movements gradually become more ‘goal-directed’
in nature. Grasps which anticipate the orientation of an object begin to emerge at around 5
months (Lockman et al. 1984; Von Hofsten and Fazel-Zandy 1984). By 8 months of age, re-
orienting of the hand in anticipation of the orientation of a visual target also occurs independ-
ently of vision of the hand (McCarty et al. 2001), indicating that postural guidance is achieved
proprioceptively at this age. Grasps that anticipate the size of an object are first observed from
9 months of age (Von Hofsten and Rönnqvist 1988).
Improvements in the ability to use postural information to maintain spatial alignment between
different sensory inputs arising from peripersonal space can also explain the later development of
an infant’s ability to produce more fine-grained (‘goal-directed’) postural adjustments (especially those made without sight of the hand; McCarty et al. 2001). These behaviours clearly require
postural calibration, and feed-forward prediction in actions made toward objects.

5.8 Neural construction of bodily and peripersonal spatial representations in the first year of life
We have argued that two mechanisms of multisensory integration underlying peripersonal space
(canonical multisensory body representations and postural remapping) develop independently
during the first year of life. The sensory interactions subserving the early canonical multisensory
body representations mechanism could be governed both by subcortical (e.g. SC or putamen)
and cortical loci for multisensory integration. The strongest evidence for neural systems underly-
ing the dynamic updating of peripersonal space across changes in posture (postural remapping)
has been obtained from single-unit recordings made in the macaque PMv by Graziano and col-
leagues (Graziano et al. 1997; Graziano 1999). Thus, the more protracted development of mecha-
nisms subserving postural remapping could be explained by a developmental shift from
subcortical to cortical processing of multisensory stimuli in early infancy (cf. Wallace and Stein
1997). However, a number of factors speak against cortical maturation as the sole explanation for
these developments.
First, there have been a number of demonstrations of the effect of experience on multisensory
integration. In one such study (Nissen et al. 1951), a newborn chimpanzee’s multisensory and
motor experience with his own hands and feet was severely restricted during the first 30 months
of life by fixing restricting cylinders over these limbs. This chimpanzee later demonstrated almost
no ability to learn a conditioned crossmodal orienting response between two tactile cued loca-
tions on the index finger of either hand. Consistent with this finding, neurophysiological evidence
has demonstrated that multisensory neurons in the SC of dark-reared cats fail to demonstrate
the distinct responses to multisensory and unimodal stimuli seen in normally reared animals
(Wallace et al. 2004).
More recently, Röder et al. (2004; see also Chapter 13 by Röder) have shown that early visual
experience may play a key role in establishing how tactile stimuli are related to visual spatial coor-
dinates, and the typical (visual) posture of the limbs. Indeed, visual experience has also been
shown to play a key role in the developmental integration of auditory and tactile spatial represen-
tations (Collignon et al. 2009). As we shall describe later, there are a number of indications that
changes in patterns of sensory weighting in spatial tasks involving the body may continue well
beyond infancy and into late childhood (Gori et al. 2008; Nardini et al. 2008; Pagel et al. 2009;
Renshaw 1930; Smothergill 1973; Warren and Pick 1970).
Second, the more protracted development of postural remapping in infancy may depend
largely on changes in the kinds of active experience that infants have of their environment. The
developments in postural remapping observed between 6.5 and 10 months of age coincide with

the emergence (at about 6 months) of spontaneous reaching toward and across the body midline
for visually-presented objects (Morange and Bloch 1996; Provine and Westerman 1979; Van Hof
et al. 2002). The multisensory experience associated with this behaviour is well-suited for driving
the development of mechanisms of postural remapping.
Roles for experience in the development of representations of the body and peripersonal space
are consistent with ‘interactive specialization’ frameworks for neural systems development
(Johnson 2011; Mareschal et al. 2007) in that some degree of specialization of earlier developed
brain regions (such as the SC) for multisensory orienting responses may lay down the behavioural
foundations required for experientially driven development of more specialized networks under-
lying representations of the body and peripersonal space. The provision of a default canonical
representation of the body underpinned by patterns of relative weighting of the senses may pro-
vide a basis upon which (later developing) experience-dependent dynamic networks can be effi-
ciently deployed, when changes in the posture of the body make this necessary for successful
orienting. This is not to say that brain networks underlying a default canonical representation
would be unaffected by experience. Changes in the body across development would require such
networks to be flexible, and indeed the evidence suggests that sensory experience is necessary for
their normal development (cf. Röder et al. 2004). Rather, it seems more reasonable to suggest that
the general function of such networks in establishing a unitary (if vague) default representation
of the body may be well specified prior to birth.2

5.9 Developmental changes in body representations beyond infancy
As explained above, it is likely that the neural mechanisms underlying representations of the body
and peripersonal space continue to undergo development beyond infancy. The abilities of young
infants to represent the layout of the body would need to be tuned and re-tuned throughout
development in order to cope with physical changes in the disposition, sizes, and movements of
the limbs that continue even beyond adolescence (see Fig. 5.1).
Although we know of no published studies that have directly examined the development
of visual influences on the spatial localization of the limbs beyond infancy, some classic and more
recent studies have indicated that the multisensory processes involved in representations of the
limbs may change substantially in early childhood. For instance, a number of researchers have
asserted that across early childhood to adolescence, children come to rely more on vision (Renshaw
and Wherry 1931; Renshaw et al. 1930; Warren and Pick 1970). Evidence from spatial orienting
tasks has been used to support this view. For instance, Renshaw (1930) asked children and adults
to localize punctate tactile stimuli on their right arm and hand. The task involved pointing, while
blindfolded, to the stimulated locations using the left hand. Interestingly, pre-adolescent children
performed better than adults on this task, and Renshaw thus suggested that adults rely much more
on vision for directing proprioceptive responses with respect to an external frame of reference.
A similar argument has been put forward more recently by Pagel et al. (2009), who examined
the spatial coordinate systems children use when attributing tactile stimuli to their hands. Using
a tactile TOJ task, they demonstrated developments in the ability to detect the temporal sequence

2 While we consider it likely that such a representation of the canonical layout of the body would be formed
via prenatal multisensory experience, we do not rule out the possibility that such a representation could
exist independently of sensory experience, as has been argued in the case of the newborn’s representations
of the visual appearance of the human face (see Johnson et al. 1991; Morton and Johnson 1991).
126 DEVELOPMENT OF MULTISENSORY REPRESENTATIONS OF THE BODY

of tactile stimuli presented across the hands in a familiar uncrossed posture. However, these
developmental improvements with uncrossed hands were not matched when the children per-
formed the same task with their hands crossed. Pagel et al. explained this crossed-hands deficit in
the older children (the same deficit has previously been documented in adults: Shore et al. 2002;
Yamamoto and Kitazawa 2001) by suggesting that they adopt an extrapersonal frame of reference
for locating tactile stimuli on the hands: when the hands are crossed, extrapersonal and anatomi-
cal frames of reference conflict, explaining the difficulty with this posture. Given that this crossed-
hands deficit is not observed in congenitally blind adults (Röder et al. 2004), it seems likely that
the developmental changes observed by Pagel et al. (2009) in encoding tactile stimuli are due to
visual experience.
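The conflict that Pagel et al. invoke can be made concrete with a toy sketch (the function and labels are ours, purely illustrative): in the uncrossed posture a touch coded anatomically ('right hand') and externally ('right hemispace') agree, whereas crossing the hands dissociates the two frames.

```python
def external_side(anatomical_side, posture):
    """Toy model: which side of external space a touched hand occupies.

    Uncrossed: the anatomical and external frames of reference agree.
    Crossed: they conflict, the situation thought to underlie the
    crossed-hands deficit in tactile temporal order judgements.
    """
    if posture == "uncrossed":
        return anatomical_side
    return {"left": "right", "right": "left"}[anatomical_side]
```

A touch on the right hand is 'right' in both frames when the hands are uncrossed, but anatomically 'right' and externally 'left' when they are crossed; an observer who habitually adopts an external frame must remap, and errs when remapping fails.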
While these findings suggest that spatial representations of the body and limbs become increas-
ingly visual in nature, this need not necessarily be the case. Certainly, it does not inevitably follow
that multisensory spatial representations of the body undergo the same developments as spatial
representations of stimuli impinging on the body (the tactile stimuli used in Pagel et al.’s 2009
study can to some extent be considered as extrapersonal). A number of authors have suggested
that adults may perceive stimuli on the body with respect to different spatial frames of reference
(internal and/or external) depending on the task (Martin 1995; Spence et al. 2001), and it is quite
plausible that such internal and external frames of reference emerge according to different devel-
opmental time courses (cf. A.J. Bremner et al. 2008b). Thus, it may be that even very young chil-
dren optimally integrate the senses (vision and proprioception) when representing the position of
their body and limbs, but undergo development in the ways that they integrate multisensory
spatial information regarding objects with respect to the body.

5.10 Using illusions to explore the development of visual spatial reliance in young children’s limb representations
In order to investigate developmental changes in visual influences on limb position during
early childhood, we recently conducted an experiment in which we utilized Holmes et al.’s (2004)
‘mirror illusion’ task as a means of comparing the extent of visual influence on limb position
as measured by subsequent reaching behaviours in 5- to 7-year-old children (A.J. Bremner
et al. submitted). In this illusion, participants view one of their hands on both the left and right
of their midline (via a mirror placed at the midline facing one arm and obscuring the other; see
Fig. 5.4). When the position of the hidden right hand (perceived non-visually) is put into spatial
conflict with the illusory visual image, participants’ perceptions of the location of the hidden
hand and their subsequent reaching movements are typically biased by the illusory visual infor-
mation about their hand position (Holmes et al. 2004; Holmes and Spence 2005; Snijders et al.
2007). We measured this visual bias in our developmental groups by examining the extent
to which children’s reaches were affected by illusory visual cues concerning the location of
the hand.
The appeal of investigating the multisensory weighting of limb position with this visual bodily
illusion does not simply lie in the fact that it provides an elegant measure of visual spatial reliance
in young children, but also in the fact that this particular paradigm places little demand on executive
resources: bear in mind here that memory limitations are known to constrain children’s
performance on a range of tasks, even into adolescence (Anderson 2003; Hitch 2002). Because in
this task children do not have to maintain sensory information in mind before making a response,
it represents an ideal way of investigating sensory weighting in a relatively pure way. Finally, it is
easy to motivate children to complete large numbers of trials in illusory tasks like this one because
the perceptual phenomenon being measured captures participants’ interest.
[Fig. 5.4 appears here: (a) a schematic of the mirror apparatus, with the starting locations and the target marked along a centimetre scale relative to the mirror; (b) a bar chart of visual capture of reaching (mm/mm of visual conflict, 0–0.9) plotted by age group (56–64 mths, 65–73 mths, 74–82 mths, 83–91 mths, and adults).]
Fig. 5.4 (a) The mirror apparatus from the point of view of the experimenter. The scale below the
diagram indicates distance to the participant’s right with respect to the mirror, measured in
centimetres. Participants viewed their left hand on both the left and right of their midline (by means
of a mirror placed at the midline facing the left arm, and obscuring the right arm). The left hand
was placed at 12 cm from the mirror. The participant’s hidden right hand was either congruent
with the visual image (12 cm right of the mirror, with respect to the participant), or was put into
spatial conflict in the azimuthal dimension with the illusory visual image (at 7 cm or 17 cm to the
right of the mirror). Participants reached toward the target (12 cm to the right of the mirror, and in
front of the starting positions—indicated by the visible arrow above it) and the lateral terminal
errors were measured. Errors to the participant’s right (left) with respect to the target were scored
as positive (negative). Participants’ use of illusory visual information is indicated by
reaching errors under conditions of crossmodal conflict (i.e. when visual information about the
location of the hand conflicts with veridical information arriving from proprioception). A reliance on
visual information is thus indicated by negative errors (away from the target) at 7 cm, errors around
zero at 12 cm, and positive errors at 17 cm. (b) A comparison of visual influence in the mirror
illusion across age groups (56–64 months, n = 12; 65–73 months, n = 10; 74–82 months, n = 13;
83–91 months, n = 10; adults, n = 15). The measure of visual influence given here is a difference
score computed from the gradients of reach error against starting position in the Mirror
and No Mirror conditions (error gradient in the Mirror condition − error gradient in the No Mirror
condition). In a comparison of visual influence on reaching across the four age groups of children,
a main effect of age group was observed (p < 0.05). The error bars represent the SE of the means,
and asterisks indicate statistically significant post-hoc comparisons (p ≤ 0.05).
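The difference-score measure described in the figure caption can be made concrete with a minimal sketch (all data values and function names here are ours and purely hypothetical, not those of the study): fit the gradient of lateral reach error against hand starting position separately for the Mirror and No Mirror conditions, then subtract.

```python
def gradient(xs, ys):
    """Least-squares slope of ys against xs (simple linear regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs)

# Hidden-hand starting positions (mm to the right of the mirror, cf. Fig. 5.4a).
starts = [70, 120, 170]

# Hypothetical mean lateral reaching errors (mm) at each starting position.
# Mirror condition: errors track the illusory visual hand position
# (negative at 70 mm, near zero at 120 mm, positive at 170 mm).
mirror_errors = [-35.0, 2.0, 41.0]
# No Mirror condition: no systematic trend with starting position.
no_mirror_errors = [3.0, -1.0, 4.0]

# Visual-capture index: difference between the two error gradients,
# in mm of reach error per mm of visual conflict.
capture = gradient(starts, mirror_errors) - gradient(starts, no_mirror_errors)
```

With these made-up numbers the index comes out at 0.75, i.e. reaches are displaced by three-quarters of the visual conflict, a value within the range plotted in Fig. 5.4b.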

We tested four age-groups of children on the task. As Fig. 5.4 shows, all of the age-groups tested
showed evidence of being influenced by the illusory visual information. Thus, from age 5 years (at
the latest), children, like adults, demonstrate some reliance on vision for locating their hands in
the azimuthal plane (Holmes et al. 2004; Van Beers et al. 2002). Despite this continuous role of
vision, our data also indicate some developmental changes in its influence on behaviour through
early childhood. In order to compare the magnitude of visual influence in the mirror illusion
across age-groups, we obtained a measure of that influence (see Fig. 5.4) by comparing
reaching errors in the ‘Mirror’ and ‘No Mirror’ conditions. This age-group comparison
showed a significant non-monotonic development of visual weighting in limb position as
we observed a sharp increase in visual influence between the 56–64-month-old (approx. 5 years)
age group and the 65–73-month-old (approx. 5¾ years) age group followed by a subsequent
reduction in visual influence between 65–73 months (5¾ years) and 74–82 months (6½ years).
In the next section, we attempt to explain this finding in the context of the relevant literature.

5.11 Functions and developmental processes underlying changes in visual spatial reliance beyond infancy
One view concerning the functional purpose of an increased dominance of vision in orienting
tasks (as observed in the findings reported above, and by Renshaw 1930) was put forward by both
Renshaw (1930) and Warren and Pick (1970), who argued that children come to be dominated by
visual cues because they derive from the most spatially accurate sensory organ. A more recent
refinement of this argument has been proposed in response to the ‘optimal integration’ account
of multisensory integration in adults (e.g. Ernst and Banks 2002). Remember that this account
proposes that all relevant senses contribute to spatial representations, but that the contributions
of the individual senses are weighted in proportion to their relative task-specific reliabilities, thus
yielding statistically optimal responses. Both Gori et al. (2008) and Nardini and colleagues
(Chapter 6 by Nardini and Cowie; Nardini et al. 2008) have proposed that multisensory develop-
ment converges in middle to late childhood onto an optimal sensory weighting. The evidence for
this claim rests on recent demonstrations in which children develop from a state of either relying
entirely on one sensory modality (which sensory modality this is depends on the task; Gori et al.
2008), or alternating between reliance on one modality or the other (Nardini et al. 2008), to a
situation in which they weight the senses in a statistically optimal fashion.
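The optimal-weighting rule at issue here (Ernst and Banks 2002) can be stated in a few lines. The following is a minimal sketch with illustrative numbers of our own choosing, not an implementation from any of the studies discussed: each cue is weighted by its inverse variance, normalized across cues, and the combined estimate is more reliable than either cue alone.

```python
def optimal_integration(estimates, variances):
    """Reliability-weighted (maximum-likelihood) cue combination.

    Each cue's weight is its inverse variance normalized across cues;
    the variance of the combined estimate is the reciprocal of the
    summed reliabilities, so it is lower than any single-cue variance.
    """
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]
    estimate = sum(w * s for w, s in zip(weights, estimates))
    return estimate, 1.0 / total

# Illustrative values: vision places the hand at 120 mm (variance 4 mm^2),
# proprioception at 100 mm (variance 16 mm^2). Vision is four times more
# reliable, so it receives four times the weight (0.8 vs. 0.2).
position, variance = optimal_integration([120.0, 100.0], [4.0, 16.0])
```

On this account, the child data of Gori et al. (2008) and Nardini et al. (2008) depart from the rule by giving one cue all (or alternately all) of the weight; development consists in converging on the normalized weights computed above.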
This account can provide a framework for understanding developmental shifts in middle to late
childhood toward an increased visual influence observed across a number of spatial-orienting
behaviours (Pagel et al. 2009; Renshaw 1930; Renshaw et al. 1930; Smothergill 1973; Warren and
Pick 1970). In addition, however, the optimal integration account can also explain developmental
decreases in visual influence as being due to convergence upon an optimal weighting from a prior
state in which vision was weighted disproportionately to its reliability. We speculate that this
account can also explain the non-monotonic developmental changes in visual influence on reach-
ing observed in the mirror illusion (reported in Section 5.10), as follows. The initial shift toward
a greater influence of vision at 65–73 months of age could be due to convergence upon a multi-
sensory weighting in which vision is weighted more in the context of locating the hand in the
azimuthal dimension, just as it is in adults (see Van Beers et al. 2002). We also suggest that the
later shift away from visual influence in the 74–82-month-old group represents a developmental
process of optimization in which children reclaim some degree of reliance on the less reliable
modality (in this case proprioception) following an overemphasis on vision at an earlier stage
(65–73 months).
Thus, we argue that developmental change in visual spatial reliance in limb representations in
early childhood represents a process of convergence upon an optimal weighting of the senses. But
what developmental processes underlie this extended process of development? Warren and Pick
(1970), following on from Birch and Lefford (1967), posited that increased visual reliance is made
possible by a progressive general linking of the senses across childhood. However, this account
has difficulty explaining evidence indicating that much younger children (infants in their
first year of life) are sensitive to multisensory correspondences, both through their visual and
auditory senses (e.g. for correspondences between vision and audition, see Lewkowicz 2000, and
Chapter 7 by Lewkowicz) and their bodily senses (see above).
Thus, it seems more likely that the changes in visual influence observed in young children,
described above, occur as part of a developmental process of multisensory fine-tuning (rather
than a registering of multisensory correspondence per se) in which the specific weightings of the
senses are modified in order to improve the efficiency of sensorimotor performance. Gori et al.
(2008) propose an interesting account for why the development of optimal multisensory integra-
tion may take so long; they suggest that the optimal weighting of the senses is delayed so as to
provide a point of reference against which to calibrate the senses in multisensory tasks. Because
the body and limbs continue to grow over the course of childhood and adolescence, this recalibra-
tion is constantly required, and so, according to Gori et al. (2008), optimal weighting does
not take place until the body has stopped growing substantially (see also King 2004 for a similar
argument in the ferret). It remains unclear, however, how such a process of delay in, or inhibition
of, integration might come about.
Some clues to the developmental processes involved in changes in visual spatial reliance in body
representations can be gleaned from different patterns of development observed across different
crossmodal tasks. For instance, whereas Gori et al. (2008) and Nardini et al. (2008) identify
developmental shifts in crossmodal weighting between 5 and 8–10 years of age, and 7–8 years and
adulthood, respectively, A.J. Bremner et al.’s (submitted) findings suggest that children settle on
a weighting similar to that used by adults by 6½ years of age. There are several possible explana-
tions for this discrepancy. For one, it is likely that the different kinds of multisensory integration
(e.g. different sensory modalities and weightings) required in these tasks would have different
developmental time-courses. Such discrepancies are certainly evident in the emergence of multi-
sensory integration in infancy (e.g. Lewkowicz 2002). Another explanation we consider particu-
larly likely rests not on the kinds of multisensory integration required but on the different
executive demands required by our task and others used with young children. For example, the
tasks used by Gori et al. and Nardini et al. both involve short-term maintenance of cues prior
to a response being made. In contrast, the mirror illusion task reported in this chapter requires
children to integrate proprioception and vision concurrently and make an immediate reaching
response. An important question for future research will therefore be whether the
development of optimal multisensory weighting of body representations in early childhood is
constrained by the amelioration of executive limitations, which continues across childhood and
into early adolescence (Anderson 2003; Hitch 2002) and likely affects the maintenance and
manipulation of multisensory inputs (Gallace and Spence 2009).

5.12 Summary
A significant challenge for development is to form detailed, accurate multisensory representa-
tions of the body and peripersonal space. The task of integrating the senses to form a unified
percept of the body and its relation to the environment is not simple. Movement of the sense
organs with respect to one another when the body changes posture means that the developing
infant must learn dynamically to alter the way in which he or she integrates the senses (in particu-
lar, the spatial relationships between visual, auditory, and bodily receptors). Here, we have
described a framework that argues for the independent development (at least over the first year of
life) of two integrative mechanisms that give rise to multisensory representations of peripersonal
space: canonical multisensory body representations and postural remapping. We have argued, on
the basis of evidence from tactile spatial orienting behaviours in infants, that a mechanism of
canonical multisensory body representations provides an approximate default form of multisen-
sory integration, upon which more dynamic systems of integration can later be built. The later
development during infancy of more dynamic integrative systems may arise in response to
changes in the demands of multisensory and sensorimotor interactions in peripersonal space,
commensurate with the emergence of certain kinds of postural changes related to exploratory
behaviours.
Beyond infancy, and into early childhood, there are developmental changes in the relative
weightings given to vision and proprioception in locating an arm in the azimuthal plane with
respect to the body. We identified continued developmental changes in the weightings given to
vision, characterized by an increase in visual weighting at 65–73 months, followed by a later shift
away from visual weighting at 74–82 months and beyond into adulthood. Such developmental
changes likely play a role in the fine tuning of both canonical multisensory body representations
and postural remapping such that children converge on increasingly optimal weightings of
current and prior sensory inputs regarding limb position.
Multisensory spatial representations of the body not only underpin our ability to act on our
nearby environment, but also provide an embodied point of reference to the external
world (Bermudez et al. 1995; Gallagher 2005; Varela et al. 1991). As such, an understanding of
the development of these representations has important implications for children’s developing
conceptions of the world they inhabit. Indeed, the surprising prevalence of sensorimotor difficul-
ties of one kind or another, across a wide range of developmental disorders, also hints that
body representations may have an important role in atypical development (see Chapter 12 by
Hill et al.). Developmental research is only just beginning to scratch the surface of the rich set
of multisensory interactions that underlie representations of the body and its surroundings.
Despite the emphasis on tactile–proprioceptive–visual interactions in this chapter, it is important
to note that other sensory modalities may also play important roles in the development of body
representations (see e.g. Kitagawa et al. 2005). In the final section of this chapter we will
discuss how the developmental course of embodied representations may help explain some
of the puzzling dissociations infants and even children demonstrate during their early cognitive
development.

5.13 Epilogue: lessons for early cognitive development
Despite a relatively limited repertoire of sensorimotor spatial behaviours in early infancy, research
using measures of the time spent looking at arrays of objects has shown that, within a few months
of birth, human infants are able to form sophisticated spatial representations of objects in their
environment (e.g. Slater 1995) and their interactions (Baillargeon 2004; Scheier et al. 2003; Spelke
et al. 1992). By 3–4 months of age, infants form the perceptual categories of ‘above’ and ‘below’
(Quinn 1994), encode the spatial distance between objects (Newcombe et al. 2005), and can
represent the continued existence and extent of a hidden object in a particular spatial location
(Baillargeon 2004). Strikingly, young infants can recognise the locations of objects and features in
relation to external frames of reference across changes in their orientation (A.J. Bremner et al.
2007; Kaufman and Needham 1999).
However, while looking-duration measures have tended to indicate precocious abilities to
represent objects in the extrapersonal environment, spatial-orienting tasks that require infants
to locate objects in the environment relative to themselves (with manual responses, or visual-
orienting behaviour) have provided a mixed picture of early spatial abilities. Typically, before
8 months of age, infants do not even attempt to uncover hidden objects within reach (Piaget
1954). When orienting to targets, young infants seem to code their responses with respect to their
own body and ignore changes in the position of the target or their own body (Acredolo 1978;
J.G. Bremner 1978). When either is moved before the orienting response occurs, young infants
make ‘egocentric’ errors, and it is only in their second year that they correctly update their
responses.
One way of resolving these conflicting findings is to consider the kinds of spatial representation
required in these two different types of task. In looking-duration tasks, it is possible to identify
changes in the location of objects by reference to purely visual, extrapersonal, spatial coordinates.
By contrast, spatial-orienting tasks require infants to represent the location of objects identified
via the ‘distance’ receptors (typically vision) and, in order to translate this target location into the
intrinsic body-centred coordinates required to orient toward it, they require multisensory embod-
ied representations of the target. The extended development of such representations as discussed
in this chapter may help explain a well-known paradox of cognitive development in the first
year of life: that infants’ early competence as demonstrated in looking-duration measures
(e.g. Baillargeon 2004; Spelke et al. 1992) is not matched by their ability to act manually on that
information until much later and, in many cases, beyond infancy (e.g. J.G. Bremner 2000; Hood
et al. 2000; Mareschal 2000; Munakata et al. 1997).

Acknowledgements
AJB is supported by European Research Council Grant No. 241242 (European Commission
Framework Programme 7). We would also like to thank some of our colleagues for helpful discus-
sions that have fed into the development of this chapter; particularly JJ Begum, Dorothy Cowie,
Fran Knight, David Lewkowicz, and Silvia Rigato.

References
Acredolo, L.P. (1978). Development of spatial orientation in infancy. Developmental Psychology, 14, 224–34.
Adolph, K.E. (2008). Learning to move. Current Directions in Psychological Science, 17, 213–18.
Adolph, K.E., and Avolio, A.M. (2000). Walking infants adapt locomotion to changing body dimensions.
Journal of Experimental Psychology: Human Perception and Performance, 26, 1148–66.
Alais, D., and Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration.
Current Biology, 14, 257–62.
Anderson, P. (2003). Assessment and development of executive function (EF) during childhood. Child
Neuropsychology (Neuropsychology, Development and Cognition: Section C), 8, 71–82.
Azañón, E., and Soto-Faraco, S. (2008). Changing reference frames during the encoding of tactile events.
Current Biology, 18, 1044–49.
Bahrick, L.E., and Watson, J.S. (1985). Detection of intermodal proprioceptive-visual contingency as a
potential basis of self-perception in infancy. Developmental Psychology, 21, 963–73.
Baillargeon, R. (2004). Infants’ reasoning about hidden objects: evidence for event-general and event-
specific expectations. Developmental Science, 7, 391–414.
Bayley, N. (1969). Manual for the Bayley scales of infant development. Psychological Corp., New York.
Bermudez, J.L., Marcel, A.J., and Eilan, N. (1995). The body and the self. MIT Press, Cambridge, MA.
Birch, H.G., and Lefford, A. (1963). Intersensory development in children. Monographs of the Society for
Research in Child Development, 28, 1–48.
Birch, H.G., and Lefford, A. (1967). Visual differentiation, intersensory integration, and voluntary motor
control. Monographs of the Society for Research in Child Development, 32, 1–87.
Botvinick, M., and Cohen, J. (1998). Rubber hands ‘feel’ touch that eyes see. Nature, 391, 756.
Bremner, A.J., Bryant, P.E., Mareschal, D., and Volein, Á. (2007). Recognition of complex object-centred
spatial configurations in early infancy. Visual Cognition, 15, 1–31.
Bremner, A.J., Holmes, N.P., and Spence, C. (2008a). Infants lost in (peripersonal) space? Trends in
Cognitive Sciences, 12, 298–305.
Bremner, A.J., Mareschal, D., Fox, S., and Spence, C. (2008b). Spatial localization of touch in the first year
of life: early influence of a visual code, and the development of remapping across changes in limb
position. Journal of Experimental Psychology: General, 137, 149–62.
Bremner, A.J., Hill, E.L., Pratt, M., Rigato, S., and Spence, C. (submitted). Bodily illusions in young
children: developmental change in visual and proprioceptive contributions to perceived hand
position.
Bremner, J.G. (1978). Egocentric versus allocentric coding in nine-month-old infants: factors influencing
the choice of code. Developmental Psychology, 14, 346–55.
Bremner, J.G. (2000). Developmental relationships between perception and action in infancy. Infant
Behavior and Development, 23, 567–82.
Bryant, P.E. (1974). Perception and understanding in young children. Methuen, London, UK.
Bryant, P.E., and Raz, I. (1975). Visual and tactual perception of shape by young children. Developmental
Psychology, 11, 525–26.
Bryant, P.E., Jones, P., Claxton, V., and Perkins, G.M. (1972). Recognition of shapes across modalities by
infants. Nature, 240, 303–304.
Carlier, M., Doyen, A.-L., and Lamard, C. (2006). Midline crossing: Developmental trend from 3 to 10
years of age in a preferential card-reaching task. Brain and Cognition, 61, 255–61.
Clifton, R.K., Muir, D.W., Ashmead, D.H., and Clarkson, M.G. (1993). Is visually guided reaching in early
infancy a myth? Child Development, 64, 1099–1110.
Clifton, R.K., Rochat, P., Robin, D.J., and Berthier, N.E. (1994). Multimodal perception in the control of
infant reaching. Journal of Experimental Psychology: Human Perception and Performance, 20, 876–86.
Collignon, O., Charbonneau, G., Lassonde, M., and Lepore, F. (2009). Early visual deprivation alters
multisensory processing in peripersonal space. Neuropsychologia, 47, 3236–43.
DeLoache, J.S., Uttal, D.H., and Rosengren, K.S. (2004). Scale errors offer evidence for a perception-action
dissociation early in life. Science, 304, 1027–29.
Driver, J. (1999). Egocentric and object-based visual neglect. In The hippocampal and parietal foundations of
spatial cognition (eds. N. Burgess, K.J. Jeffrey, and J. O’Keefe), pp. 67–89. Oxford University Press, Oxford.
Ehrsson, H.H., Spence, C., and Passingham, R.E. (2004). ‘That’s my hand!’ Activity in the premotor cortex
reflects feeling of ownership of a limb. Science, 305, 875–77.
Ernst, M.O., and Banks, M.S. (2002). Humans integrate visual and haptic information in a statistically
optimal fashion. Nature, 415, 429–33.
Ernst, M.O., and Bülthoff, H.H. (2004). Merging the senses into a robust percept. Trends in Cognitive
Sciences, 8, 162–69.
Gallace, A., and Spence, C. (2009). The cognitive limitations and neural correlates of tactile memory.
Psychological Bulletin, 135, 380–406.
Gallagher, S. (2005). How the body shapes the mind. Oxford University Press, Oxford.
Gilbert, F.S. (2000). Developmental biology, 6th Ed. Sinauer Associates, Inc., Sunderland, MA.
Glover, S. (2004). What causes scale errors in children? Trends in Cognitive Sciences, 8, 440–42.
Gori, M., Del Viva, M.M., Sandini, G., and Burr, D.C. (2008). Young children do not integrate visual and
haptic form information. Current Biology, 18, 694–98.
Gori, M., Sandini, G., Martinoli, C., and Burr, D. (2010). Poor haptic orientation discrimination in
nonsighted children may reflect disruption of cross-sensory calibration. Current Biology, 3, 223–225.
Gottfried, A.W., Rose, S.A., and Bridger, W.H. (1977). Cross-modal transfer in human infants. Child
Development, 48, 118–23.
Gottfried, A.W., Rose, S.A., and Bridger, W.H. (1978). Effects of visual, haptic, and manipulatory
experiences on infants’ visual recognition memory of objects. Developmental Psychology, 14, 305–12.
Graziano, M.S.A. (1999). Where is my arm? The relative role of vision and proprioception in the neuronal
representation of limb position. Proceedings of the National Academy of Sciences, USA, 96, 10418–21.
Graziano, M.S.A., Hu, X.T., and Gross, C.G. (1997). Visuospatial properties of ventral premotor cortex.
Journal of Neurophysiology, 77, 2268–92.
Graziano, M.S.A., Cooke, D.F., and Taylor, C.S.R. (2000). Coding the location of the arm by sight. Science,
290, 1782–86.
Graziano, M.S.A., Gross, C. G., Taylor, C.S.R., and Moore, T. (2004). A system of multimodal areas in the
primate brain. In Crossmodal space and crossmodal attention (eds. C. Spence and J. Driver), pp. 51–67.
Oxford University Press, Oxford.
Groh, J.M., and Sparks, D.L. (1996a). Saccades to somatosensory targets: 2. Motor convergence in primate
superior colliculus. Journal of Neurophysiology, 75, 428–38.
Groh, J.M., and Sparks, D.L. (1996b). Saccades to somatosensory targets: 3. Eye-position dependent
somatosensory activity in primate superior colliculus. Journal of Neurophysiology, 75, 439–53.
Groh, J.M., and Sparks, D.L. (1996c). Saccades to somatosensory targets: 1. Behavioral characteristics.
Journal of Neurophysiology, 75, 412–27.
Hall, G.S. (1898). Some aspects of the early sense of self. American Journal of Psychology, 9, 351–95.
Hitch, G.J. (2002). Developmental changes in working memory: a multicomponent view. In Lifespan
development of human memory (eds. P. Graf and N. Ohta), pp. 15–37. MIT Press, Cambridge, MA.
Holmes, N.P., and Spence, C. (2004). The body schema and multisensory representation(s) of peripersonal
space. Cognitive Processing, 5, 94–105.
Holmes, N.P., and Spence, C. (2005). Visual bias of unseen hand position with a mirror: spatial and
temporal factors. Experimental Brain Research, 166, 489–97.
Holmes, N.P., Crozier, G., and Spence, C. (2004). When mirrors lie: ‘Visual capture’ of arm position
impairs reaching performance. Cognitive, Affective, and Behavioral Neuroscience, 4, 193–200.
Hood, B., Carey, S., and Prasada, S. (2000). Predicting the outcomes of physical events: two-year-olds fail
to reveal knowledge of solidity and support. Child Development, 71, 1540–54.
Hulme, C., Smart, A., Moran, G., and Raine, A. (1983). Visual kinaesthetic and cross-modal development:
relationships to motor skill development. Perception, 12, 477–83.
Johnson, M.H. (2011). Interactive specialization: a domain-general framework for human functional brain
development? Developmental Cognitive Neuroscience, 1, 7–21.
Johnson, M.H., and De Haan, M. (2011). Developmental cognitive neuroscience, 3rd edn. Blackwell-Wiley, Oxford.
Johnson, M.H., Dziurawiec, S., Ellis, H., and Morton, J. (1991). Newborns’ preferential tracking of face-like
stimuli and its subsequent decline. Cognition, 40, 1–19.
Kaufman, J., and Needham, A. (1999). Objective spatial coding by 6.5-month-old infants in a visual
dishabituation task. Developmental Science, 2, 432–41.
Kennett, S., Spence, C., and Driver, J. (2002). Visuo-tactile links in covert exogenous spatial attention
remap across changes in unseen hand posture. Perception and Psychophysics, 64, 1083–94.
Kersten, D., Mamassian, P., and Yuille, A. (2004). Object perception as bayesian inference. Annual Review
of Psychology, 55, 271–304.
King, A. (2004). Development of multisensory spatial integration. In Crossmodal space and crossmodal
attention (eds. C. Spence and J. Driver), pp. 1–24. Oxford University Press, Oxford.
Kitagawa, N., Zampini, M., and Spence, C. (2005). Audiotactile interactions in near and far space. Experimental
Brain Research, 166, 528–37.
Kitazawa, S. (2002). Where conscious sensation takes place. Consciousness and Cognition, 3, 475–77.
Körding, K.P., and Wolpert, D. (2004). Bayesian integration in sensorimotor learning. Nature, 427, 244–47.
Lewkowicz, D.J. (2000). The development of intersensory temporal perception: an epigenetic systems/
limitations view. Psychological Bulletin, 126, 281–308.
Lewkowicz, D.J. (2002). Heterogeneity and heterochrony in the development of intersensory perception.
Cognitive Brain Research, 14, 41–63.
Locke, J. (1690). An essay concerning human understanding. Oxford University Press, Oxford, UK.
Lockman, J.J., Ashmead, D.H., and Bushnell, E.W. (1984). The development of anticipatory hand
orientation during infancy. Journal of Experimental Child Psychology, 37, 176–86.
Makin, T.R., Holmes, N.P., and Ehrsson, H.H. (2008). On the other hand: dummy hands and peripersonal
space. Behavioural Brain Research, 191, 1–10.
Maravita, A., Spence, C., and Driver, J. (2003). Multisensory integration and the body schema: close to
hand and within reach. Current Biology, 13, R531–39.
Mareschal, D. (2000). Infant object knowledge: current trends and controversies. Trends in Cognitive
Sciences, 4, 408–416.
Mareschal, D., Johnson, M.H., Sirois, S., Spratling, M., Thomas, M., and Westermann, G. (2007).
Neuroconstructivism, Vol. I: How the brain constructs cognition. Oxford University Press, Oxford.
Martin, M.G.F. (1995). Bodily awareness: a sense of ownership. In The body and the self (eds. J.L. Bermudez,
A. Marcel, and N. Eilan), pp. 267–89. MIT Press, Cambridge, MA.
Maurer, D., Stager, C., and Mondloch, C. (1999). Cross-modal transfer of shape is difficult to demonstrate
in 1-month-olds. Child Development, 70, 1047–57.
McCarty, M.E., Clifton, R.K., Ashmead, D.H., Lee, P., and Goubet, N. (2001). How infants use vision for
grasping objects. Child Development, 72, 973–87.
Meltzoff, A.N., and Borton, R.W. (1979). Intermodal matching by human neonates. Nature, 282, 403–404.
Morange, F., and Bloch, H. (1996). Lateralization of the approach movement and the prehension
movement in infants from 4 to 7 months. Early Development and Parenting, 5, 81–92.
Morton, J., and Johnson, M.H. (1991). CONSPEC and CONLERN: a two-process theory of infant face
recognition. Psychological Review, 98, 164–81.
Munakata, Y., McClelland, J.L., Johnson, M.H., and Siegler, R.S. (1997). Rethinking infant knowledge:
toward an adaptive process account of successes and failures in object permanence tasks. Psychological
Review, 104, 686–713.
Nardini, M., Jones, P., Bedford, R., and Braddick, O. (2008). Development of cue integration in human
navigation. Current Biology, 18, 689–93.
Newcombe, N.S., Sluzenski, J., and Huttenlocher, J. (2005). Pre-existing knowledge versus on-line learning:
what do young infants really know about spatial location? Psychological Science, 16, 222–27.
Nissen, H.W., Chow, K.L., and Semmes, J. (1951). Effects of restricted opportunity for tactual, kinesthetic, and
manipulative experience on the behavior of a chimpanzee. American Journal of Psychology, 64, 485–507.
Orlov, T., Makin, T.R., and Zohary, E. (2010). Topographic representation of the human body in the
occipitotemporal cortex. Neuron, 68, 586–600.
Pagel, B., Heed, T., and Röder, B. (2009). Change of reference frame for tactile localization during child
development. Developmental Science, 12, 929–37.
Parise, C.V., Spence, C., and Ernst, M.O. (2012). Multisensory integration: when correlation implies
causation. Current Biology, 22, 46–49.
Piaget, J. (1954). The construction of reality in the child. Routledge and Kegan-Paul, London.
Previc, F.H. (1998). The neuropsychology of 3-D space. Psychological Bulletin, 124, 123–64.
Provine, R.R., and Westerman, J.A. (1979). Crossing the midline: limits of early eye-hand behavior. Child
Development, 50, 437–41.
Quinn, P.C. (1994). The categorisation of above and below spatial relations by young infants. Child
Development, 65, 58–69.
Renshaw, S. (1930). The errors of cutaneous localization and the effect of practice on the localizing
movement in children and adults. Journal of Genetic Psychology, 28, 223–38.
Renshaw, S., and Wherry, R.J. (1931). Studies on cutaneous localization III: the age of onset of ocular
dominance. Journal of Genetic Psychology, 29, 493–96.
Renshaw, S., Wherry, R.J., and Newlin, J.C. (1930). Cutaneous localization in congenitally blind versus
seeing children and adults. Journal of Genetic Psychology, 28, 239–48.
Rizzolatti, G., Fadiga, L., Fogassi, L., and Gallese, V. (1997). The space around us. Science, 277, 190–91.
Robin, D.J., Berthier, N.E., and Clifton, R.K. (1996). Infants’ predictive reaching for moving objects in the
dark. Developmental Psychology, 32, 824–35.
Rochat, P. (1998). Self-perception and action in infancy. Experimental Brain Research, 123, 102–109.
Rochat, P. (2010). The innate sense of the body develops to become a public affair by 2–3 years.
Neuropsychologia, 48, 738–45.
Rochat, P., and Morgan, R. (1995). Spatial determinants in the perception of self-produced leg movements
by 3- to 5-month-old infants. Developmental Psychology, 31, 626–36.
Roche, A.F., and Malina, R.M. (1983). Manual of physical status and performance in childhood, Volume 1,
Physical status. Plenum, New York.
Röder, B., Rösler, F., and Spence, C. (2004). Early vision impairs tactile perception in the blind. Current
Biology, 14, 121–24.
Rose, S.A. (1994). From hand to eye: findings and issues in infant cross-modal transfer. In The development
of intersensory perception: comparative perspectives (eds. D.J. Lewkowicz, and R. Lickliter), pp. 265–84.
Lawrence Erlbaum Associates, Hillsdale, NJ.
Rose, S.A., Gottfried, A.W., and Bridger, W.H. (1981a). Cross-modal transfer and information processing
by the sense of touch in infancy. Developmental Psychology, 17, 90–98.
Rose, S.A., Gottfried, A.W., and Bridger, W.H. (1981b). Cross-modal transfer in 6-month-old infants.
Developmental Psychology, 17, 661–69.
Sann, C., and Streri, A. (2007). Perception of object shape and texture in human newborns: evidence from
cross-modal transfer tasks. Developmental Science, 10, 399–410.
Scheier, C., Lewkowicz, D.J., and Shimojo, S. (2003). Sound induces perceptual reorganization of an ambiguous
motion display in human infants. Developmental Science, 6, 233–44.
Schicke, T., and Röder, B. (2006). Spatial remapping of touch: confusion of perceived stimulus order across
hand and foot. Proceedings of the National Academy of Sciences, USA, 103, 11808–13.
Schmuckler, M. (1996). Visual-proprioceptive intermodal perception in infancy. Infant Behavior and
Development, 19, 221–32.
Schmuckler, M.A., and Fairhall, J.L. (2001). Infants’ visual-proprioceptive intermodal perception of
point-light information. Child Development, 72, 949–62.
Schmuckler, M.A., and Jewell, D.T. (2007). Infants’ visual-proprioceptive intermodal perception with
imperfect contingency information. Developmental Psychobiology, 48, 387–98.
Shore, D.I., Spry, E., and Spence, C. (2002). Confusing the mind by crossing the hands. Cognitive Brain
Research, 14, 153–63.
Slater, A.M. (1995). Visual perception and memory at birth. In Advances in Infancy Research, 9 (eds. C.
Rovee-Collier and L.P. Lipsitt), pp. 107–62. Ablex, Norwood, NJ.
Smothergill, D.W. (1973). Accuracy and variability in the localization of spatial targets at three age levels.
Developmental Psychology, 8, 62–66.
Snijders, H.J., Holmes, N.P., and Spence, C. (2007). Direction-dependent integration of vision and
proprioception in reaching under the influence of the mirror illusion. Neuropsychologia, 45, 496–505.
Spelke, E.S., Breinlinger, K., Macomber, J., and Jacobson, K. (1992). Origins of knowledge. Psychological
Review, 99, 605–32.
Spence, C., Nicholls, M.E.R., and Driver, J. (2001). The cost of expecting events in the wrong sensory
modality. Perception and Psychophysics, 63, 330–36.
Spence, C., Pavani, F., Maravita, A., and Holmes, N.P. (2008). Multisensory interactions. In Haptic rendering:
foundations, algorithms, and applications (eds. M.C. Lin and M.A. Otaduy), pp. 21–52. AK Peters,
Wellesley, MA.
Streri, A., and Gentaz, E. (2003). Cross-modal recognition of shape from hand to eyes in human newborns.
Somatosensory and Motor Research, 20, 13–18.
Streri, A., and Gentaz, E. (2004). Cross-modal recognition of shape from hand to eyes and handedness in
human newborns. Neuropsychologia, 42, 1365–69.
van Beers, R.J., Wolpert, D.M., and Haggard, P. (2002). When feeling is more important than seeing in
sensorimotor adaptation. Current Biology, 12, 834–37.
van der Meer, A.L. (1997). Keeping the arm in the limelight: advanced visual control of arm movements in
neonates. European Journal of Paediatric Neurology, 1, 103–108.
van Hof, P., van der Kamp, J., and Savelsbergh, G.J.P. (2002). The relation of unimanual and bimanual
reaching to crossing the midline. Child Development, 73, 1353–62.
Varela, F.J., Thompson, E., and Rosch, E. (1991). The embodied mind: cognitive science and human
experience. MIT Press, Cambridge, MA.
von Hofsten, C. (1982). Eye-hand coordination in the newborn. Developmental Psychology, 18, 450–61.
von Hofsten, C. (2004). An action perspective on motor development. Trends in Cognitive Sciences, 8, 266–72.
von Hofsten, C., and Fazel-Zandy, S. (1984). Development of visually guided hand orientation in reaching.
Journal of Experimental Child Psychology, 38, 208–219.
von Hofsten, C., and Rönnqvist, L. (1988). Preparation for grasping an object: a developmental study.
Journal of Experimental Psychology: Human Perception and Performance, 14, 610–21.
Wallace, M.T., and Stein, B.E. (1997). Development of multisensory neurons and multisensory integration
in cat superior colliculus. Journal of Neuroscience, 17, 2429–44.
Wallace, M.T., Perrault, T.J., Jr., Hairston, W.D., and Stein, B.E. (2004). Visual experience is necessary for
the development of multisensory integration. Journal of Neuroscience, 24, 9580–84.
Warren, D.H., and Pick, H.L., Jr. (1970). Intermodality relations in blind and sighted people. Perception
and Psychophysics, 8, 430–32.
Yamamoto, S., and Kitazawa, S. (2001). Reversal of subjective temporal order due to arm crossing. Nature
Neuroscience, 4, 759–65.
Zmyj, N., Jank, J., Schütz-Bosbach, S., and Daum, M.M. (2011). Detection of visual-tactile contingency in
the first year after birth. Cognition, 120, 82–89.
Chapter 6

The development of multisensory balance, locomotion, orientation, and navigation

Marko Nardini and Dorothy Cowie

6.1 Introduction
This chapter describes the development of the sensory processes underlying the movement of our
bodies in space. In order to move around the world in an adaptive manner infants and children
must overcome a range of sensory and motor challenges. Since balance is a prerequisite for whole-
body movement and locomotion, a primary challenge is using sensory inputs to maintain balance.
The ultimate goal of movement is to reach (or avoid) specific objects or locations in space. Thus
a further prerequisite for adaptive spatial behaviour is the ability to represent the locations of
significant objects and places. Information about such objects, and about one’s own movement
(‘self-motion’) comes from many different senses. A crucial challenge for infants and children (not
to mention adults) is to select or integrate these correctly to perform spatial tasks.
We will first describe the development of balance and locomotion. We will then go on to
describe the development of spatial orientation and navigation. Basic spatial orienting (e.g. turn-
ing the head to localize multisensory stimuli), which does not require balance or locomotion, is
also described in this section. These early-developing orienting behaviours are building blocks for
more sophisticated navigation and spatial recall.
We will describe a number of situations in which spatial behaviour in response to multiple
sensory inputs undergoes marked changes in childhood before reaching the adult state. Bayesian
integration of estimates is a theoretical framework that may accommodate some of these findings.
In the concluding section of this chapter (Section 6.5), we will describe this framework and
highlight its potential for explaining some of the existing developmental findings. However, we
will argue that most of the experimental work needed to evaluate the Bayesian explanation still
remains to be done.

6.2 Development of multisensory balance and locomotion


Human balance mechanisms allow us to maintain a positional equilibrium by coordinating
internal and external forces on the body. Static balance refers to the maintenance of equilibrium
during quiet stance (standing still), whereas dynamic balance refers to the maintenance of equilibrium during movement, for example during walking. In line with the focus of the available
research, we will concentrate on static balance. Maintaining balance is vitally important for devel-
oping infants and children because it provides a base on which to build other skills. For example,
being able to sit upright allows for reaching, and being able to stand without falling over allows for
walking. These motor skills in turn permit explorations of the surrounding space and objects.
Useful information for balance and locomotion comes from multiple senses. However, develop-
ment poses a difficult context in which to integrate these multisensory inputs for balance and
locomotion because the child’s sensory and motor capabilities are still developing, while its body
is also changing in shape, size, and mass.
In this review we will separate the sensory inputs used to maintain balance into three functional
groupings: visual inputs, vestibular inputs, and information from muscle and joint mechanore-
ceptors, which we term ‘proprioception’. The key visual information for balance is that which
signals relative motion between an observer and their environment. Such movement produces
characteristic patterns of change (‘optic flow’) in the visual environment (Gibson 1979). For
example, as an observer sways forwards toward an object, the image of that object expands. This
characteristic ‘expansion’ pattern therefore signals that the observer is moving forwards, and a
corrective, backwards sway response may be made. The medial superior temporal area (MST)
plays a prominent role in processing these signals, as do subcortical structures (Billington et al.
2010; Wall and Smith 2008). In the vestibular system, the semicircular canals provide informa-
tion about the rotation of the head whereas the otolith organs signal linear accelerations (Day and
Fitzpatrick 2005). This information is processed in the vestibular nuclei and subsequently in
higher structures including the cerebellum and cortex (see Angelaki et al. 2009). Finally what we
term proprioception includes information arising from muscle spindles sensing muscle stretch
and similar mechanoreceptors in the joints.
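The ‘expansion’ pattern described above can be made concrete with a short sketch. The function and values below are illustrative, not taken from the chapter’s sources: under a simple pinhole-camera model, pure forward translation at speed t_z toward a frontoparallel surface at depth Z produces image motion (t_z/Z)·(x, y) at image point (x, y), i.e. radial outflow from the image centre.

```python
import numpy as np

def expansion_flow(xs, ys, tz_over_z):
    """Optic flow field for forward translation toward a frontoparallel
    surface: each image point (x, y) moves radially outward from the
    focus of expansion at the image centre, at a rate t_z / Z."""
    return tz_over_z * xs, tz_over_z * ys

# Sample the flow on a small image grid (arbitrary image units).
xs, ys = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5))
u, v = expansion_flow(xs, ys, 0.5)  # observer sways forward: t_z/Z = 0.5 per s
# Every flow vector points away from the centre, and its length grows with
# eccentricity -- the signature pattern that can drive a corrective backward sway.
```

A contracting pattern (negative t_z/Z) would correspondingly signal backward self-motion and invite a corrective forward sway.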

6.2.1 Multisensory balance development


Many studies suggest that balance control is strongly coupled to visual information very early in
infancy. Furthermore, this coupling seems to require little experience of standing or even sitting
upright. Children as young as three days old make head movements in response to expanding
optic flow stimuli, and head movement increases linearly as a function of flow velocity (Jouen
et al. 2000). Preferential looking also shows that 2-month-olds can discriminate global radial
expansion patterns from random patterns. Sensitivity to this motion increases during the first
year of life (Brosseau-Lachaine et al. 2008). The earliest responses may have a subcortical basis,
while later responses are likely to recruit cortical visual areas sensitive to global flow patterns, such
as MT (middle temporal cortex) and MST (medial superior temporal cortex) (Wall et al. 2008;
Wattam-Bell et al. 2010).
The ‘swinging room’ technique (Lishman and Lee 1973) has been widely used to investigate the
balance responses of infants and children (Fig. 6.1). Participants stand on a platform inside what
appears to be a small room. The walls and ceiling of the room move back and forth independently


Fig. 6.1 The swinging room technique. (a) The participant stands in a room with a stationary floor
but moving walls. (b) An optic flow expansion pattern created by walls moving toward the
participant. The same pattern would be produced by the observer swaying forward.
of the floor, which is fixed. This causes adult participants to sway with the motion of the room.
When the room moves towards the participant, creating an expansion pattern (Fig. 6.1b), the participant takes this to signal forward self-motion and corrects their posture by swaying backwards, with the motion of the room. Thus participants become ‘hooked like puppets’ to
the motion of the room. The development of this phenomenon in childhood can be characterized
by two processes: first, a decrease in the gain of the sway response (older children sway less in
proportion to the sway of the room) and second, an increase in the temporal coupling between
room motion and body sway.
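These two developmental measures can be illustrated with a toy computation; the signals and function name below are hypothetical, not taken from any of the studies reviewed. Gain is taken as the amplitude of body sway relative to room motion, and temporal coupling as the peak normalized cross-correlation between the two signals.

```python
import numpy as np

def sway_gain_and_coupling(room, sway):
    """Gain: sway amplitude relative to room amplitude.
    Coupling: peak normalized cross-correlation (near 1.0 when body
    sway tracks room motion closely at some fixed lag)."""
    room = room - room.mean()
    sway = sway - sway.mean()
    gain = sway.std() / room.std()
    xcorr = np.correlate(sway, room, mode='full')
    coupling = xcorr.max() / (len(room) * room.std() * sway.std())
    return gain, coupling

# Hypothetical 'young child' response: the room oscillates at 0.2 Hz and
# the body sways with it at 1.5x the amplitude, with a small phase lag.
t = np.linspace(0, 30, 3000)
room = np.sin(2 * np.pi * 0.2 * t)
body = 1.5 * np.sin(2 * np.pi * 0.2 * t - 0.6)
gain, coupling = sway_gain_and_coupling(room, body)
```

On these measures, the developmental pattern described above would appear as gain falling with age while coupling rises toward its ceiling.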
Sway responses are present very early: sitting 5-month-olds sway in synchrony with the room
(Bertenthal et al. 1997). Strong sway responses are maintained across the transition from crawling
to walking (Lee and Aronson 1974). Indeed, responses to the moving room are produced to a
greater degree for infants with crawling or locomotor experience than for less experienced infants
of the same age (Higgins et al. 1996). Many new walkers respond so strongly to the stimulus that
they stumble or fall over. Thus, visual information contributes more strongly to balance at this age than it does in adults. This is still true of 3- to 4-year-old children (Wann et al. 1998), although the
falling responses have disappeared by this age. The extent of visually driven sway decreases rather
sharply between 4 and 6 years (Godoi and Barela 2008), indicating a transition away from visual
dominance of balance responses, and this transition continues through childhood. To under-
stand the exact developmental trajectory of sway gain, the literature would benefit from studies
that used the same paradigm over a very broad age range, including older children and teenagers.
However, available data demonstrate that for children aged 7–14 years, the gain of sway to a
swinging room is still higher than in adults (Sparto et al. 2006; Godoi and Barela 2008).
Alongside the very gradual weakening of responses to visual information, there is a gradual
increase in the temporal coupling of room movement and body sway through childhood.
Developments in coupling can be seen as early as 5–13 months (Bertenthal et al. 1997) and
continue through mid-childhood. Coherence between room and body sway reaches adult levels
by 10 years (Godoi and Barela 2008; Rinaldi et al. 2009). This may indicate improvements in
muscular control, but could also reflect the refinement of visuomotor mechanisms (Godoi and
Barela 2008; Rinaldi et al. 2009). This increased coupling may be a general feature of perceptual
development, since it is also found using a ‘haptic moving room’ (Barela et al. 2003) where the
relevant sensory information is haptic rather than visual.
While the swinging room technique allows for the systematic manipulation of visual input,
work using platform perturbations has investigated the contributions of visual, proprioceptive,
and vestibular inputs within one task (Shumway-Cook and Woollacott 1985; Woollacott et al.
1987). In conditions in which children stand with eyes closed on a platform which rotates under-
foot to dorsiflex the ankle joint, visual signals are not present and vestibular inputs do not initially
signal sway. Only proprioceptive information from the ankle joint indicates that a balance
response is necessary. Children aged 15–31 months do not sway significantly in this situation
(Woollacott et al. 1987). This result accords with those from the swinging room studies in show-
ing that, for very young children, visual and vestibular inputs are much more effective than pro-
prioceptive inputs in driving balance responses. Similar platform rotations do evoke significant
sway responses in children older than 4 years. At this age, where responses to moving visual scenes
are also decreasing, proprioceptive information alone becomes sufficient to induce sway
(Shumway-Cook and Woollacott 1985).
Further conditions in this study measured the relative contributions of vision and propriocep-
tion to balance, asking the children to stand still in four sensory conditions (Fig. 6.2). There was
no sudden platform perturbation, but different conditions removed reliable visual information,
reliable proprioceptive information, neither, or both. Visual information is simply removed by
[Fig. 6.2, panels (a)–(d): the four combinations of reliable and unreliable visual and proprioceptive information; reliable vestibular information is present in every condition.]

Fig. 6.2 Measuring multisensory contributions to balance. While participants attempt to stand still on a platform, visual information can be removed by closing the eyes. Ankle joint proprioception can be made unreliable by rotating the platform to maintain the ankle at a fixed angle. This creates four sensory conditions (a)–(d) during which sway was measured. (Reproduced from Shumway-Cook, A., and Woollacott, M.H., The growth of stability: postural control from a developmental perspective, Journal of Motor Behavior, 17, 131–147, © 1985, with permission of Taylor & Francis, http://www.tandfonline.com.)

having children close their eyes. Reliable proprioceptive information is removed by making the
platform ‘sway-referenced’; that is, the participant’s sway is monitored and the platform is rotated
accordingly, so that the ankle joint always remains at 90°. In this situation proprioceptive infor-
mation remains present, but is incongruent with other inputs. Reliable vestibular information
was available in all conditions.
Removing visual information caused greater increases in sway for 4- to 6-year-olds than for
older children or adults. However, removing reliable proprioceptive information caused even
greater increases in sway than removing vision. Removing both visual and proprioceptive infor-
mation caused dangerous amounts of sway, with the majority of 4- to 6-year-olds falling over.
These results suggest that, as for adults, both these sources of information are very important in
maintaining balance for children of this age. However, 4- to 6-year-olds are more destabilized
than adults under conditions of sensory conflict, for example when the proprioception system
signals no movement while the vision and vestibular systems signal movement.
Initial studies found that 7- to 10-year-olds’ balance during platform translations was
not greatly affected by removing vision (Shumway-Cook and Woollacott 1985). In these respects,
7- to 10-year-olds’ performance was very similar to that of adults tested on the same tasks. This
indicated weaker visual, and stronger proprioceptive contributions to balance than at younger ages.
However, recent work has suggested that fully mature balance responses may not occur until 12–15
years (Hirabayashi and Iwasaki 1995; Peterka and Black 1990; Peterson, Christou, and Rosengren
2006). Again, more detailed studies of balance in the late childhood and early teenage years would
be valuable in developing a fuller picture of multisensory balance control towards adulthood.
Together, the swinging room and platform perturbation experiments suggest an early reliance
on visual information. The end of this period and transition away from reliance on vision emerges
at around 5 years. Sway responses also achieve a tighter temporal coupling to sensory inputs dur-
ing mid-childhood. However, the longer developmental trajectories of these processes are rela-
tively under-researched.

6.2.2 Multisensory locomotor development


There are several distinct roles for sensory inputs to locomotion. First, as we have seen,
the processing of visual, proprioceptive, and vestibular inputs is crucial for balance, and
balance in turn underpins locomotion. Indeed, an important achievement for early walkers is
learning to differentiate visual information for walking from visual information for balance.
For example, balance can be challenged during walking by moving the room walls. In this
situation, novice walkers are highly influenced by the nature of the walking task (presence/absence of obstacles) as well as by the visual information present for balance. In contrast, more experienced walkers react to balance demands equally well in both navigation conditions (Schmuckler and Gibson 1989).
A further role of sensory inputs in locomotion is allowing the walker to judge properties of the
environment, and the position of the limbs. This may be done in the planning phase of a move-
ment, or ‘online’, during the movement. The development of visual planning in locomotion has
been tested using judgment tasks, where children are verbally questioned about whether they
think an obstacle is passable or not, or where they are asked to choose a preferred path through
an environment. These visually guided ‘passability’ judgments depend on the perceived
skill required to cross an obstacle and the perceived size of the obstacle relative to one’s own
body dimensions (Adolph 1995; Kingsnorth and Schmuckler 2000; Schmuckler 1996). Visual
information is particularly useful in determining obstacle size, and by mid-childhood this body-
referenced size information is very tightly coupled to passability judgments (Heinrichs 1994;
Pufall and Dunbar 1992). These judgments become more refined during early childhood: for
example, older toddlers judge better than younger toddlers whether a barrier can be stepped over
(Schmuckler 1996) or a gap crossed (Zwart et al. 2005).
Haptic information can also contribute to locomotor judgments. For example, Adolph (1995)
compared children’s perceptions of which slopes they could walk on with their actual abilities.
Fourteen-month-olds made fewer attempts to descend slopes as they became steeper, but nonetheless
overestimated their abilities. Crucially, these infants explored the slopes before they descended.
The exploration was structured, consisting of a short look at the slope followed, if necessary,
by a long look, then exploring the slope by touch, and then trying alternative methods of descent
such as sliding down (Adolph 1997). Exploration was used to inform locomotor choices, with
most exploration at those slopes that were neither obviously easy to descend nor obviously
impossible, but required careful consideration. Thus haptic information was added to visual
information in locomotor decision-making at this young age. Sensory exploration persists at
older ages. In one task, children of 4.5 years were asked to judge whether they could stand upright
on a ramp (Klevberg and Anderson 2002). Their judgments were compared with their actual
competence. They could explore the ramp using either vision or touch (for touch, they felt the
ramp with a wooden pole). Children overestimated their ability to stand on the slopes. Like
adults, they judged more accurately in the visual exploration condition than in the haptic explo-
ration condition. Thus both visual and haptic information can be used for exploration of the
locomotor environment, though further work is needed to explore the balance of these inputs in
more detail.
Online feedback can help to correct and refine movements that are already underway. In loco-
motion, a wide range of sensory inputs can potentially provide feedback, including visual, prop-
rioceptive, vestibular, and tactile inputs. Using feedback to guide actions may be particularly
important in childhood. This is apparent in a stair-descent task. With vision available during the
whole stepping movement, three-year-olds are skilled at using visual information about stair size
to scale their movements to the size of step they are descending. However, this ability is signifi-
cantly impaired when visual feedback is removed during the step down (Cowie et al. 2010). This
suggests a reliance on visual feedback at 3 years which disappears by 4 years of age. In a more
complicated obstacle-avoidance task that required careful foot placement as well as movement
scaling, degrading vision impaired performance even for 7-year-old children (Berard and Vallis
2006). Thus, some existing data suggests that visual feedback may be crucial for complex aspects
of walking throughout childhood. A recent eye-tracking study with young children (Franchak
and Adolph 2010) confirms that during obstacle navigation children make more visual fixations
than adults. More work is needed to establish the dynamics of sensory inputs and motor outputs
during locomotion. Furthermore, it is clear that the extent of reliance on visual feedback depends
on task complexity, and one of the challenges for locomotor research is to formulate general
developmental principles regarding the use of sensory information in locomotor tasks.

6.2.3 Multisensory development of balance and locomotion: conclusions

Although balance and locomotion have here been reviewed separately, in fact these two skills are
highly interdependent. While balance skills allow the child to walk safely, locomotion also refines
the control of balance. Infants who can sit upright respond better to balance perturbations than
those who cannot (Woollacott et al. 1987), and infants with crawling or locomotor experience
sway less than those without (Barela et al. 1999). Locomotor experience can also alter the sensory
control of balance: for example, experienced walkers are more responsive to specific aspects of
sensory stimuli, such as peripheral optic flow (Higgins et al. 1996).
We have reviewed a range of evidence which suggests important transitions from an early reli-
ance on vision for balance and locomotion, to a later pattern where information from across the
visual, proprioceptive, and vestibular systems is responded to in a more even way. What precisely
do we mean by ‘reliance’, and what are the mechanisms underlying such change? One model
of multisensory processing leading to action is that, during development, estimates from one
predominant sense are simply acted upon in disregard of estimates from the other senses.
It is easy to imagine infants’ responses to the visual swinging-room stimulus in terms of this sort
of mechanism. This can be contrasted with a second type of model in which a central integrator
collates and weights information from the independent senses to produce a final multisensory
estimate, which can then be acted upon (Clark and Yuille 1990; Ernst 2005; Körding and Wolpert
2006). In the concluding section of this chapter (Section 6.5), we will discuss models of informa-
tion integration in more detail, argue that they provide a useful framework for future lines of
investigation, and highlight the experimental manipulations that still need to be carried out in
order to evaluate them.
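The second, ‘central integrator’ model can be sketched in a few lines. The code below is a minimal illustration of reliability-weighted (maximum-likelihood) integration of independent Gaussian estimates, in the spirit of Ernst (2005); the function name and numerical values are ours, for illustration only.

```python
import numpy as np

def integrate_cues(estimates, sigmas):
    """Fuse independent sensory estimates of the same quantity.
    Each cue is weighted by its reliability (inverse variance), the
    maximum-likelihood solution for independent Gaussian noise."""
    estimates = np.asarray(estimates, dtype=float)
    reliability = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    weights = reliability / reliability.sum()
    fused = float(np.sum(weights * estimates))
    # The fused estimate is never less reliable than the best single cue:
    fused_sigma = float(np.sqrt(1.0 / reliability.sum()))
    return fused, fused_sigma

# Illustrative values: vision says the body swayed 2.0 cm (sigma 1 cm),
# proprioception says 0.5 cm (sigma 2 cm); vision gets 4x the weight.
estimate, sigma = integrate_cues([2.0, 0.5], [1.0, 2.0])
```

On this model, the early visual dominance reviewed above could correspond either to winner-take-all selection of the visual estimate or to integration with an unusually high visual weight; distinguishing these possibilities requires measuring each cue’s reliability on its own.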

6.3 Development of multisensory orientation and navigation


Balance and locomotion provide a basis for spatial behaviour. To be adaptive, spatial behaviour
must be directed towards (or away from) significant objects and locations in the environment.
In this section we consider the development of the ability to orient and navigate to significant
objects and locations. Information about objects and locations commonly comes from multiple
senses (e.g. vision, audition, touch), which must be combined appropriately. Information about
one’s own movement in space also comes from multiple senses (e.g. vision, audition, propriocep-
tion, and the vestibular system). Many spatial tasks, such as picking up an object or crossing a
road, require an immediate response to environmental stimuli but no lasting record of them.
Other spatial tasks, such as finding the keys or finding the way home, also require memory. In
these tasks, multisensory spatial information must be stored in a form that is useful for later
retrieval.

6.3.1 Multisensory orienting


Orienting the eyes and head towards salient nearby stimuli is a basic spatial behaviour evident
from birth. Newborns orient to visual patterns (especially faces), but also to auditory and tactile
stimuli (Clifton et al. 1981; Fantz 1963; Moreau et al. 1978; Tan and Tan 1999; Wertheimer 1961).
Early visual orienting is driven by the retinotectal pathway to the superior colliculus in the
midbrain, which transmits sensory information for eye and head movements (Bronson 1974).
This subcortical system enables orienting to salient single targets, but not fine discrimination
or target selection. Cortical processing of visual stimuli, enabling increasingly fine discrimina-
tions (e.g. of orientation, motion, or binocular disparity), develops in the first months of
life (Atkinson 2002). Selective attention, enabling flexible selection of targets (for example, disen-
gagement from a central target in order to fixate a peripheral target) develops at 3–4 months
(Atkinson et al. 1992) and represents further cortical control over orienting behaviour (Braddick
et al. 1992).
Orienting responses alert infants to potentially interesting or hazardous objects, and enable
them to collect additional sensory information about them. Since the same objects can be sig-
nalled by information from multiple modalities (e.g. vision, audition, touch), orienting responses
need to be driven by multiple sensory inputs. In addition, as multisensory inputs are unlikely to
come from a similar spatial location at the same time purely by chance (see Körding et al. 2007),
such stimuli are likely to represent significant objects in the environment. The superior colliculus
supports multisensory orienting by integrating spatially localized visual, auditory, and somato-
sensory inputs within a common reference frame (see Chapter 11 by Laurienti and Hugenschmidt
and Chapter 14 by Wallace et al.). A proportion of multisensory neurons in the cat superior col-
liculus have ‘superadditive’ properties: they fire to auditory-only and to visual-only stimuli localized
at the same point in space, but their firing rate to combined visual-and-auditory stimulation is greater
than the sum of their responses to the two single stimuli (Meredith and Stein 1983, 1986; Wallace
et al. 1996). This property indicates that these neurons have a dedicated role in the processing of
stimuli providing both auditory and visual information at the same time. This multisensory neu-
ral organization is not present in newborn cats or monkeys (Wallace and Stein 1997, 2001) but
develops in a manner dependent on sensory experience (Chapter 14 by Wallace et al.).
Spatial co-location of auditory and visual events is detected by infants at least by 6 months of
age (Lawson 1980). To investigate the early development of multisensory integration for spatial
orienting, Neil et al. (2006) measured the latencies of 1- to 10-month-olds’ head and eye move-
ments towards auditory-only, visual-only, or auditory and visual targets located left or right of the
midline. In theory, having two stimuli available at once can enable a purely ‘statistical facilitation’
of reaction times, since there are two parallel opportunities to respond (Miller 1982; Raab 1962).
Given parallel processing but no interaction between cues, observers responding to whichever cue
is processed first on any given trial will show a predictable decrease in mean reaction time relative
to single cues. This purely ‘statistical’ improvement is described by the ‘race model’: (Miller 1982;
Raab 1962; see also Otto and Mamassian 2010), in which the two cues being processed in parallel
are in a ‘race’ to initiate a response. To show evidence for multisensory interaction, it is therefore
necessary to show speed advantages for two cues versus one that are greater than those predicted
by the race model. Neil and colleagues (2006) found latency gains given two cues rather than one
at all ages; however, it was only at 8 to 10 months that these exceeded the statistical facilitation
predicted by the race model. This indicates that mechanisms for multisensory facilitation
of audiovisual orienting emerge late in the first year of life in humans, consistent with the
experience-dependent postnatal development that has been reported in the superior colliculus in
monkeys and cats (Wallace and Stein 1997, 2001; see Chapter 14 by Wallace et al.).
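The race-model benchmark is easy to make concrete. In the following illustrative sketch (the reaction-time distributions and parameters are invented, and this is not the analysis pipeline used in the studies cited), simply responding to whichever of two independent cues finishes first lowers the mean latency, with no interaction between the cues:

```python
import random
from statistics import mean

random.seed(1)

def simulate_mean_rt(n_trials=50000):
    """Mean reaction times for single cues vs. a 'race' between two cues.

    Each cue's processing time is an independent noisy sample; under the
    race model the response is triggered by whichever cue finishes first,
    with no interaction between the cues.
    """
    auditory = [random.gauss(300, 50) for _ in range(n_trials)]  # ms
    visual = [random.gauss(320, 50) for _ in range(n_trials)]    # ms
    race = [min(a, v) for a, v in zip(auditory, visual)]
    return mean(auditory), mean(visual), mean(race)

a_rt, v_rt, race_rt = simulate_mean_rt()
# The race is faster on average than either cue alone, even though the
# cues never interact: this purely statistical gain is the baseline that
# genuine multisensory facilitation must exceed.
assert race_rt < min(a_rt, v_rt)
```

Tests for genuine multisensory interaction, such as Miller’s (1982) race-model inequality, accordingly compare the full distribution of bimodal reaction times against this no-interaction baseline, not merely the means.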
Interestingly, when participants did not use eye or head movements to localize audiovisual
stimuli, but responded to them by pressing a button as quickly as possible (a ‘detection’ task),
adult-like multisensory improvements in latency were not evident until after 7 years (Barutchu
et al. 2009, 2010). It may be that these tasks do not tap into the early-developing
subcortically driven reflexive orienting system, but instead require cortical evidence integration and
selection for action (Romo et al. 2004), processes that may develop later. To investigate this possibility,
the latest research has started to compare manual and eye-movement responses to the same
audiovisual stimuli directly (Nardini et al. 2011).

6.3.2 Orienting and reaching while taking own movement into account


To localize targets that are perceptually available at the time of test, an egocentric coordinate
system is sufficient, and no memory is needed. However, significant objects in the environment
are not continually perceptually available, and may become hidden from view or silent to
the observer. This is particularly the case with mobile observers, who can change their perspective
on spatial layouts from one moment to the next. For infants beginning to crawl and walk
independently, a crucial challenge is to integrate multisensory information into spatial represen-
tations that are useful for relocating objects after a change of position. Infants who are passively
carried must likewise take their movements into account in order to keep track of nearby objects’
locations.1
After movement, a static object initially localized on the observer’s left can take up a new posi-
tion in egocentric space, such as behind or to the right. The original egocentric representation of
the object’s location is then no longer useful for retrieving it. This problem can be overcome in
two ways: by updating objects’ positions in egocentric coordinates while moving (‘spatial updat-
ing’) or by using external landmarks such as the walls of the room to encode and retrieve loca-
tions. For spatial updating, the observer’s own movement (‘self-motion’) needs to be taken into
account. In human adults, both self-motion and landmarks play major roles in spatial orientation
and navigation (for reviews see Burgess 2006; Wang and Spelke 2002). Within the broad category
of ‘landmarks’, key questions concern the extent to which these help to organize space into a geo-
centric ‘cognitive map’, and the extent to which they are used for simpler strategies such as recog-
nition of familiar views (Burgess 2006; Wang and Spelke 2002). For humans, useful landmarks
are overwhelmingly visual.2 Information about self-motion, however, comes from many sensory
sources: from vision, via optic flow (Gibson 1979; see Section 6.2 above), as well as from vestibu-
lar, kinesthetic, and proprioceptive inputs (Howard and Templeton 1966; MacNeilage et al. 2007;
Zupan et al. 2002).
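The updating computation itself is simple to sketch. Assuming the observer has estimates of their own translation and rotation (however these are derived from optic flow, vestibular, kinesthetic, or efference signals), a stored egocentric object location can be remapped as follows; this minimal 2D sketch uses illustrative coordinate conventions:

```python
import math

def update_egocentric(obj_xy, translation_xy, rotation_deg):
    """Remap an object's egocentric position after the observer moves.

    obj_xy: object position in the observer's old frame (x = right, y = ahead).
    translation_xy: the observer's movement, expressed in the old frame.
    rotation_deg: the observer's rotation (counterclockwise, in degrees).
    """
    # Shift into a frame centred on the observer's new position...
    dx = obj_xy[0] - translation_xy[0]
    dy = obj_xy[1] - translation_xy[1]
    # ...then counter-rotate to account for the observer's turn.
    th = math.radians(-rotation_deg)
    return (dx * math.cos(th) - dy * math.sin(th),
            dx * math.sin(th) + dy * math.cos(th))

# An object 1 m to the observer's left; after a 180-degree turn in place
# it should now lie 1 m to the observer's right.
x, y = update_egocentric((-1.0, 0.0), (0.0, 0.0), 180.0)
assert abs(x - 1.0) < 1e-6 and abs(y) < 1e-6
```

The developmental question is of course not this geometry, which is trivial, but where the self-motion estimates come from and how accurately and automatically infants apply them.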
Until recently, little was known about the neural mechanisms underlying integration of multi-
ple self-motion inputs for spatial behaviour. Important advances have been made in showing how
macaque MST neurons integrate visual and vestibular information to compute the animal’s
direction of movement (Gu et al. 2008; Morgan et al. 2008). Self-motion information must fur-
ther be integrated with visual landmarks. A range of cell types in the mammalian medial temporal
lobe (‘head direction cells’, ‘grid cells’, and hippocampal ‘place cells’) encode both self-motion
and landmark information and are likely to be the basis for their integration (for reviews see
Burgess 2008; McNaughton et al. 2006; Moser et al. 2008; Taube 2007).
The earliest situations in which human infants might take their own motion into account
to localize objects are before independent movement, when carried passively. Infants’ spatial
orienting in these situations has been investigated in studies using ‘peekaboo’ tasks. In a study
by Acredolo (1978), 6- to 16-month-olds sat in a parent’s lap at a table inside a square enclosure

1 A related problem is keeping track of own limb positions in order to enable their accurate localization, and
to support accurate reaching towards objects (Chapter 5 by A.J. Bremner et al. and Chapter 13 by Röder
et al.; A.J. Bremner et al. 2008a; 2008b; Pagel et al. 2009).
2 If landmarks do play a role in building up a ‘cognitive map’ of space, an interesting question is how and
whether blind humans do this (Klatzky et al. 1990; Loomis et al. 1993).
with windows to the left and right. In a training phase, infants learned that whenever a buzzer
sounded, an experimenter appeared at one of the windows. This was always the same window for
each child. Once the infant had learnt to orient to the correct window when the buzzer sounded,
the testing stage began. The parent and infant moved to the opposite side of the table, a manipu-
lation that swapped the egocentric positions of the ‘left’ and ‘right’ windows. When the buzzer
sounded, the experimenters recorded whether the infant looked to the correct window, showing
that they had correctly processed their change of position. With no distinctive landmarks distin-
guishing the windows, solving the task depended on the use of self-motion information, e.g. from
optic flow and the vestibular system. Infants up to 11 months old failed to use such information
and oriented to the incorrect window.
Other studies have confirmed the poor abilities of infants in the first year to update their
direction of orienting using only self-motion information (Acredolo and Evans 1980; Keating
et al. 1986; McKenzie et al. 1984; Rieser 1979). However, when direct landmarks are added to
distinguish between locations, infants in the first year orient correctly (e.g. at 6 months in Rieser
1979, and at 8–11 months in Acredolo and Evans 1980; Keating et al. 1986, Lew et al. 2000 and
McKenzie et al. 1984). Similarly, when searching for objects hidden in the left or right container
on a table-top, infants in the first year moved to the opposite side of the table tend not to take
their own movement into account unless they are given useful landmarks in the form of colour
cues (J.G. Bremner 1978; J.G. Bremner and Bryant 1977).
These results suggest that, in order to keep track of the locations of nearby objects following their
own movement, infants in the first year find added visual landmarks significantly more useful
than self-motion information alone. As stated above, self-motion information comes from a
number of sensory inputs. These may mature at different rates, and interactions between them to
guide self-motion perception may also mature unevenly, as is the case with balance (see Section
6.2, above). In ‘peekaboo’ and reaching tasks, infants have optic flow and vestibular cues to
changes of position. Older infants and children carrying out the movement themselves can also
use kinesthetic and motor efference information about the movement of their body. Children
perform better after moving actively than after being carried (Acredolo et al. 1984), and better
when walking around to the opposite viewpoint than when the opposite viewpoint is presented
by rotating the spatial layout (J.G. Bremner 1978; J.G. Bremner and Bryant 1977; Schmuckler and
Tsang 1997). This indicates that kinesthetic and motor-efference information is useful.
Performance is also better in the light than in the dark, consistent with use of optic flow
(Schmuckler and Tsang-Tong 2000). There is, however, as yet little direct evidence for whether,
when, and how infants integrate these multiple potential information sources to orient them-
selves in space. J.G. Bremner et al. (2011) separated the contributions of optic flow and vestibular
cues to 6- to 12-month-olds’ abilities to orient correctly to a target after a change of position.
Infants were seated in a rotatable cylindrical surround, which allowed manipulation of visual and
vestibular information separately and in conflicting combinations. The results showed a U-shaped
developmental pattern in which 6-month-olds, 12-month-olds, and young adults responded
on the basis of optic flow information, whereas 9-month-olds responded predominantly on the
basis of vestibular information. This indicates that children’s reliance on visual versus vestibular
information for locating targets undergoes marked changes in the first year.

6.3.3 Navigation and spatial recall


The natural extension of early orienting and reaching responses towards nearby objects is to flexible
navigation and spatial recall in extended environments. This depends on observers being able to
encode locations using frames of reference that will be useful for their later retrieval, and to bind
individual objects (‘what’) to their locations (‘where’). As described above, mature navigation
and recall depend critically both on visual information about landmarks and multisensory infor-
mation about self-motion.
Spatial studies with children of walking age have typically used search tasks in which the child
sees a toy hidden, and must then find it given a specific set of cues or after a specific manipulation.
Evidence for the individual development of landmark-based and self-motion-based spatial recall
has come from tasks that separate these information sources completely. Following spatial tasks
devised for rodents by Cheng and Gallistel (Cheng 1986; Gallistel 1990), Hermer and Spelke
(1994, 1996) used ‘disorientation’ to eliminate self-motion information and so test spatial recall
based only on visual landmarks. In their task, 18- to 24-month-olds saw a toy hidden in one of
four identical containers in the four corners of a rectangular room. Useful landmarks were the
‘geometry’ of the room (i.e. whether the target box had a longer wall on its left or its right), and
in some conditions, different wall colours. If the child was simply turned to face a random direc-
tion and then allowed to search, they could relocate the toy using not only these visual landmarks
but also an egocentric representation. Such a representation might be an initial vector (e.g.
‘on my left’) that had been updated with self-motion to take the child’s recent turn into account.
To eliminate this information source, children were disoriented by repeated turning with eyes
closed before being allowed to search. The resulting searches therefore reflected only the accuracy
of visual landmarks, and not of egocentric representations updated with self-motion.
A large body of research using this technique has demonstrated that children as young as 18 to
24 months can recall locations using only indirect visual landmarks, although their abilities to use
some kinds of indirect landmark (e.g. ‘geometry’, or the shape of the enclosure layout) can be
markedly better than others (Huttenlocher and Lourenco 2007; Learmonth et al. 2002, 2008; Lee
et al. 2006; Lee and Spelke 2008; Nardini et al. 2009; Newcombe et al. 2010; for reviews see Cheng
and Newcombe 2005; Twyman and Newcombe 2010).
The converse manipulation used to test self-motion alone, without use of visual landmarks, is
blindfolding participants. Strictly, this is a test for use of non-visual (e.g. vestibular) self-motion
cues only, as removing vision also removes the optic flow cue to self-motion.3 A typical non-
visual ‘self-motion-only’ spatial task is one in which the participant is walked along the two lines
of an ‘L’ and then asked to return directly to the origin, i.e. to complete the triangle (Foo et al.
2005; Loomis et al. 1993). Such a task depends on keeping track of one’s own directions and
angles of movement since leaving the origin. There have been relatively few studies quantifying
development of self-motion-only navigation. Rider and Rieser (1988) found that 2-year-olds
could localize targets after blindfolded walking along routes with two 90° turns, as could 4-year-
olds, who were more accurate. In another study, 4-year-olds’ errors, like adults’, increased
with numbers of turns and numbers of targets (Rieser and Rider 1991). While 4-year-olds were
inaccurate compared with adults, they still performed better than chance given up to three turns
and five targets. These results suggest that like the landmark-only based localization of targets,
self-motion-only based localization emerges quite early in development. However, developmen-
tal trajectories for different self-motion inputs (e.g. vestibular versus kinesthetic), or their inte-
gration, have not yet been studied.
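Computationally, triangle completion amounts to summing the outbound legs as vectors and inverting the result. The sketch below is illustrative only (real path integration operates continuously on noisy heading and distance estimates, not on clean leg descriptions):

```python
import math

def homing_vector(legs):
    """Given outbound path legs as (heading_deg, distance) pairs,
    return the straight-line distance and heading back to the origin.

    Headings are absolute (0 deg = 'north'), as if read from an
    internal compass that is updated at each turn.
    """
    x = y = 0.0
    for heading_deg, dist in legs:
        th = math.radians(heading_deg)
        x += dist * math.sin(th)
        y += dist * math.cos(th)
    home_dist = math.hypot(x, y)
    home_heading = math.degrees(math.atan2(-x, -y)) % 360
    return home_dist, home_heading

# An 'L'-shaped path: 3 m north, then 4 m east.
dist, heading = homing_vector([(0, 3.0), (90, 4.0)])
assert abs(dist - 5.0) < 1e-6  # the 3-4-5 triangle's hypotenuse
```

Errors accumulate with each noisy leg and turn, which is one reason performance in such tasks degrades as the number of turns and targets increases.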
The usual case, of course, is one in which both landmarks and self-motion are available. For
example, in order to relocate an object that we have recently put down elsewhere in the room we
can use visual landmarks, as well as a self-motion-updated egocentric estimate of where the object

3 In a study with adults, Riecke et al. (2002) separated the dual roles of vision in providing visual landmarks
and optic flow by having subjects navigate in virtual reality through a field of visual ‘blobs’ that provided
texture for optic flow but could not be used as landmarks.
now is relative to us. A key question is how these systems interact to guide spatial behaviour. For
mammals, including humans, landmarks are usually more useful and exert stronger control
over behaviour (Etienne et al. 1996; Foo et al. 2005). Landmark and self-motion systems can be
seen as complementary, in the sense that self-motion can help when landmarks are not available.
On this model, self-motion is a ‘back-up’ system for landmarks. An alternative model is one
in which self-motion and landmarks are integrated to improve accuracy. That is, neither system
provides a fully reliable guide to location, but by integrating them estimates of location can be
made more reliable. This second view is in line with Bayesian models of multisensory and spatial
behaviour (Cheng et al. 2007; Clark and Yuille 1990; Ernst 2005; Körding and Wolpert 2006; see
below and Section 6.5).
A number of studies have examined the development of spatial recall in situations allowing use
of both self-motion and landmarks. In studies by Huttenlocher, Newcombe, and colleagues
(Huttenlocher et al. 1994; Newcombe et al. 1998), children searched for toys after seeing them
buried in a long sandbox. The 1994 study established that 16- to 24-month-olds used the edges
and overall shape of the sandbox as visual ‘landmarks’ when retrieving toys while remaining on
the same side of the box. In the 1998 study, 16- to 36-month-olds walked around the box to
retrieve toys from the opposite side. From the age of 22 months, having additional distant land-
marks in the room visible improved the accuracy of retrieval from the opposite side. The role of
these cues may have been to provide distant landmarks situating the box and locations in a wider
reference frame, and to improve spatial updating with self-motion by highlighting the change in
perspective brought about by walking to the opposite side. In either case, these results are consist-
ent with use of both visual landmarks and multisensory self-motion inputs to encode spatial
locations in the first years of life.
Simons and Wang (1998; Wang and Simons 1999) showed that human adults’ recall for spatial
layouts from new viewpoints is more accurate when they walk to the new viewpoint than when
they stay in one place while the layout is rotated. With rotation of the layout, use of visual land-
marks alone allows adult observers to understand the change of perspective and to relocate
objects with a certain level of accuracy. However, this accuracy is enhanced when subjects walk
to the new location, meaning that self-motion information is also available (see also Burgess et al.
2004).
To study the development of coding based on both visual landmarks and self-motion,
Nardini et al. (2006) tested 3- to 6-year-olds’ recall for the location of a toy hidden under an
array of cups surrounded by small local landmarks. Vision was always available. However,
on some trials, recall took place from the same viewpoint, while on others recall was from a dif-
ferent viewpoint. Viewpoint changes were produced either by the child walking (so gaining self-
motion information about the viewpoint change) or by rotation of the array (gaining no
self-motion information). At all ages recall was most accurate without any viewpoint change.
With viewpoint changes produced by walking, the youngest children (3-year-olds) were above
chance and quite accurate at relocating objects. However, with viewpoint changes produced by
array rotation, only 5- and 6-year-olds retrieved objects at rates above chance. This indicates that
self-motion information is a major component of early spatial competence, and can dominate
over visual landmarks: without useful self-motion information, and so using only visual land-
marks, 3- and 4-year-olds could not correctly process the viewpoint change. Note that when diso-
riented and so unable to use self-motion, children from 18–24 months are competent at using
some kinds of visual landmarks to recall locations (Hermer and Spelke 1994, 1996). The present
case is more complex in that instead of being eliminated as by disorientation, when the array was
rotated self-motion information remained available, but indicated the wrong location. This study
therefore set up a conflict between visual landmarks and self-motion, and found that before
5 years of age children preferred to rely on self-motion, even when it was inappropriate for
the task.
In a subsequent study, Nardini et al. (2008) used a different method to measure how self-motion
and landmarks interact for spatial recall. The cues were not placed in conflict, as by array rotation
manipulations, but were separated completely in ‘landmark-only’ and ‘self-motion-only’ condi-
tions. This enabled a more formal test for their integration and comparison with a Bayesian
‘ideal observer’ model (Cheng et al. 2007; Clark and Yuille 1990; Ernst 2005; Körding and Wolpert
2006). As in earlier studies, ‘landmark-only’ performance was measured using disorientation,
and ‘self-motion-only’ performance was measured using walking in darkness. However, these
manipulations were now carried out within the same spatial task, and were compared with a
combined ‘self-motion and landmarks’ condition within the same participants.
Four- to eight-year-olds were tested in a dark room with distant glowing landmarks, and a set
of glowing objects on the floor (Fig. 6.3A). The task was to pick up the objects on the floor in
a sequence, and then to return the first object directly to its original place after a short delay. In
‘visual landmarks only’ and ‘(non-visual) self-motion only’ conditions, participants had to return
the object after disorientation and in darkness respectively. A ‘self-motion-plus-landmarks’
condition tested the normal case in which participants were not disoriented and landmarks
remained available. Bayesian integration of cues predicts greater accuracy (reduced variance)
given both cues together than given either single cue alone. In a ‘conflict’ condition the landmarks
were rotated by a small amount before participants responded. They did not detect this conflict,
but it provided information about the degree to which participants followed one or the other of
the two information sources.
Adults were more accurate in their spatial estimates given both self-motion and landmarks, in
line with optimal (ideal observer model) performance. Given two cues rather than one, adults’
searches were on average closer to the target (Fig. 6.3B) and less variable (Fig. 6.3C).4 By contrast,
4- to 5-year-olds and 7- to 8-year-olds did not perform better given two cues together than the
best single cue (Fig. 6.3B, C). Modelling of responses on the ‘conflict’ condition indicated that
while children can use both self-motion and landmarks to navigate, they did not integrate these
but followed one or the other on any trial. Thus, while even very young children use both self-
motion and landmarks for spatial coding, they seem not to integrate these to improve accuracy
until after 8 years of age. Proposed neural substrates for integration of self-motion and landmarks
for spatial tasks are place cells, head direction cells, and grid cells in the hippocampus and medial
temporal lobe (for reviews, see Burgess 2008; McNaughton et al. 2006; Moser et al. 2008; Taube
2007). Late development of spatial integration may be reflected in the development of these medial
temporal networks. Recent studies with rodents have found these cell types to be functional very
early in life (Langston et al. 2010; Wills et al. 2010), although the development of their abilities to
integrate multisensory information is still to be investigated.
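The distinction between integrating cues and following one cue per trial has a clear statistical signature, which a small simulation can illustrate (the noise levels here are invented, not values estimated by Nardini et al. 2008):

```python
import random

random.seed(2)

def response_sds(sd_sm=10.0, sd_lm=15.0, n=50000):
    """Compare response variability under cue switching vs. integration.

    Each trial yields two noisy location estimates (self-motion and
    landmarks). A 'switcher' follows one or the other at random; an
    'integrator' averages them with reliability-proportional weights.
    """
    w_sm = (1 / sd_sm**2) / (1 / sd_sm**2 + 1 / sd_lm**2)
    switch, integrate = [], []
    for _ in range(n):
        sm = random.gauss(0, sd_sm)
        lm = random.gauss(0, sd_lm)
        switch.append(sm if random.random() < 0.5 else lm)
        integrate.append(w_sm * sm + (1 - w_sm) * lm)
    def sd(xs):
        m = sum(xs) / len(xs)
        return (sum((v - m) ** 2 for v in xs) / len(xs)) ** 0.5
    return sd(switch), sd(integrate)

sd_switch, sd_int = response_sds()
# Switching can never beat the better single cue; integration does.
assert sd_int < 10.0 < sd_switch
```

Switching produces a mixture distribution whose variability lies between the single-cue levels, whereas reliability-weighted integration is less variable than the better single cue; this is the pattern that distinguishes the two accounts in the data.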
The late development of multisensory integration for reducing uncertainty in spatial tasks
parallels that recently reported for visual–tactile judgments of size and shape (Gori et al. 2008)
and for use of multiple visual depth cues (Nardini et al. 2010). It is possible that while mature
sensory systems are optimized for reducing uncertainty, developing sensory systems are opti-
mized for other goals. One benefit of not integrating multisensory cues could be that keeping cues
separate makes it possible to detect conflicts between them. These conflicts may provide the error
signals needed to learn correspondences between cues, and to recalibrate them while the body is
growing (Gori et al. 2010).

4 The major predicted benefit for integration of cues is a reduction in the variance of responses—see
Section 6.5.
[Fig. 6.3, panels B and C: bar charts of mean RMSE (cm) and mean SD (cm) of responses, under SM (self-motion), LM (landmarks), and SM + LM conditions, for the 4–5 yr., 7–8 yr., and adult groups.]
Fig. 6.3 (A) Layout for the Nardini et al. (2008) spatial task. (B) Mean root mean square error
(RMSE) of responses under conditions providing SM (self-motion), LM (landmarks), or SM + LM
(both kinds of information). This measure reflects distances between each search and the correct
location. (C) Standard deviation (SD) of responses. This measure reflects the variability (dispersion)
of searches. Bayesian integration of cues predicts reduced variance (and hence also reduced SD,
which is the square root of the variance), given multiple cues. Reprinted from Current Biology,
18 (9), Marko Nardini, Peter Jones, Rachael Bedford, and Oliver Braddick, Development of Cue
Integration in Human Navigation, pp. 689–693, Copyright (2008), with permission from Elsevier.
Reproduced in colour in the colour plate section.
6.4 Summary
We have reviewed the development of sensory integration for the skills underpinning human
spatial behaviour: balance, locomotion, spatial orientation, and navigation. Even very young
children are capable of using sensory information in order to control balance and locomotion.
Visual information seems particularly heavily weighted in infancy and early childhood. Around
5–6 years a period of change begins, where visual information is gradually down-weighted in
favour of somatosensory and vestibular inputs. However, on the available evidence, balance and
locomotion are not coupled in an adult-like fashion to either somatosensory or visual information
until adolescence.
Early spatial orienting behaviours show multisensory facilitation late in the first year, consistent
with experience-dependent postnatal development of the superior colliculus. Sensory integration
for more complex orientation and navigation tasks, including localizing objects again after own
movement, shows a much more extended developmental trajectory. Both self-motion informa-
tion and landmarks are used for spatial orientation and navigation from an early age, but they
may not be integrated in an adult-like fashion until after 8 years of age. Neural bases for these
tasks are increasingly becoming understood in animal models, and mathematical models are
increasingly allowing sensory integration during development to be precisely quantified. A major
challenge for the future is to account for developmental change by linking these mathematical
and neural levels of analysis.

6.5 Bayesian models


We have proposed at several points in this chapter that Bayesian models of information integra-
tion can provide a useful framework for understanding developmental phenomena in balance, locomotion,
and navigation. Here we briefly describe these models and the prospects for further tests of
their application to multisensory developmental research in the future.
Bayesian models address the problem that individual sensory estimates are inevitably variable
(uncertain) because of random noise. Such noise is obvious in, for example, a loud auditory
environment or a murky visual scene. However, even in ideal environmental condi-
tions, all sensory systems contain some variability (uncertainty) due to their own limited resolution
and noisy transmission of information by neurons. Bayesian models provide a rule by which
the multiple, noisy, sensory estimates we usually have available in parallel can be combined to
provide a single estimate that minimizes noise or uncertainty (Cheng et al. 2007; Clark and Yuille
1990; Ernst 2005; Körding and Wolpert 2006).
The rule for combining estimates to reduce uncertainty is simple: if multiple sensory estimates
are available (e.g. visual and haptic cues to object size) and the noise in each of these is independ-
ent, then taking a weighted average of the estimates can produce a final estimate that is less
variable (more reliable) than the component estimates. The optimal weighting is one in which the
different estimates are weighted in proportion to their reliabilities. A cue that is three times more
variable than another is therefore given three times less weight. Recent multisensory studies with
adults have shown optimal multisensory integration in many situations (e.g. vision and touch,
Ernst and Banks 2002; vision and audition, Alais and Burr 2004; visual and vestibular cues to
direction of heading, Fetsch et al. 2009; reviews, Ernst 2005; Körding and Wolpert 2006).
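The rule itself can be written in a few lines. This sketch follows the standard two-cue formulation (e.g. Ernst 2005); the visual and haptic numbers are purely illustrative:

```python
def combine_estimates(est_a, var_a, est_b, var_b):
    """Reliability-weighted combination of two noisy estimates.

    Reliability is inverse variance; each cue's weight is its share of
    the total reliability, so a cue three times more variable receives
    three times less weight.
    """
    rel_a, rel_b = 1.0 / var_a, 1.0 / var_b
    w_a = rel_a / (rel_a + rel_b)
    combined = w_a * est_a + (1.0 - w_a) * est_b
    combined_var = 1.0 / (rel_a + rel_b)
    return combined, combined_var

# Vision says 10.0 cm (variance 1), touch says 13.0 cm (variance 3):
size, var = combine_estimates(10.0, 1.0, 13.0, 3.0)
assert abs(size - 10.75) < 1e-6  # vision gets three times the weight of touch
assert var < 1.0                 # the combined estimate beats either cue alone
```

With equal variances the weights reduce to a simple average; as one cue becomes noisier, its weight, and hence its influence on the combined estimate, falls in proportion.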
These findings provide an important new perspective on human multisensory processing
in demonstrating that, in adults at least, multisensory integration is very flexible. Adults are
excellent at dynamically ‘reweighting’ their reliance on different information sources when these
change in reliability. Thus, although vision may usually dominate for a particular task, if visual
information becomes uncertain (e.g. at night or in fog) it will be given less weighting relative to
the other available information (Alais and Burr 2004). The conclusion from this and other
studies is that phenomena such as ‘visual dominance’ may be explained simply by the differing
reliabilities of the senses for particular tasks (Ernst 2005; Körding and Wolpert 2006).
Task-specific explanations for cue weighting (e.g. that vision dominates over audition for spatial
localization; Pick et al. 1969) can then be subsumed under a single, more general explanation: vision,
audition, and other sensory inputs are weighted according to their reliabilities (Alais and
Burr 2004).
We have described a range of developmental phenomena in which visual dominance changes
with age. For example, balance is much more influenced by vision in young children than it is in
older children or adults (Godoi and Barela 2008; Shumway-Cook and Woollacott 1985; Wann
et al. 1998; Woollacott et al. 1987). We can state two possible kinds of Bayesian explanation for
this developmental pattern.
Possibility 1: children and adults integrate multisensory information in the same Bayesian/‘optimal’
manner. In this scenario, developmental changes in ‘dominance’ might simply reflect develop-
mental changes in the reliabilities of the underlying estimates. If the development of different
senses were uneven (e.g. if visual information for balance was accurate at an earlier age than
vestibular information) then different ages would show different weightings even though they
were following the same integration rules. A prediction from this account is that adults whose
cue reliabilities were manipulated (e.g. by selectively adding noise) to be the same as children’s
would show the same behaviour as children.
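This prediction can be made concrete with the same weighting formula (a hypothetical sketch; the variance values are arbitrary). An integrator following a fixed reliability-weighting rule will appear more 'visually dominated' simply because its non-visual inputs are noisier:

```python
def visual_weight(visual_var, other_var):
    """Weight given to vision under reliability-proportional weighting."""
    return (1.0 / visual_var) / (1.0 / visual_var + 1.0 / other_var)

# Adult-like inputs: vision and the combined vestibular/proprioceptive
# estimate are equally reliable, so each gets half the weight.
w_adult = visual_weight(visual_var=2.0, other_var=2.0)      # 0.5

# Same rule, but with child-like (or experimentally degraded) noise on
# the non-visual estimate: reliance on vision rises to 0.8 without any
# change to the integration rule itself.
w_childlike = visual_weight(visual_var=2.0, other_var=8.0)  # 0.8
```

Under Possibility 1, children's stronger 'visual dominance' for balance is just this formula evaluated at different input variances.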
The issue of dynamic sensory reweighting in children’s balance control was first raised by
Forssberg and Nashner (1982), and several studies since have investigated the swinging room
experiment from this perspective, changing the distance between the swinging walls and the
observer to try to detect sensory reweighting as the far walls become a less reliable source of
information about relative motion (Dijkstra et al. 1994; Godoi and Barela 2008). Studies have also
recorded developmental changes in weighting of multisensory information for posture control in
response to differing amplitudes of visual versus somatosensory perturbations (Bair et al. 2007).
However, to be able to test whether patterns of reweighting in either children or adults are in line
with Bayesian predictions, studies will need to measure the sensory reliabilities of visual and other
cues to balance, which remains challenging (see below).
The alternative is Possibility 2: children and adults do not integrate multisensory information in
the same Bayesian/‘optimal’ manner. For example, adults might take a weighted average of estimates
in proportion to their reliabilities, while children might do something else. Children could
rely on a single estimate even when other reliable estimates are available, could integrate
estimates using a sub-optimal weighting, or could be less effective than adults at excluding
estimates that are highly discrepant with the others from the integration process.5
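The idea of excluding highly discrepant estimates ('robust cue integration'; see footnote 5) can be sketched as follows. This is a toy simplification of my own (the median-based outlier test and threshold are not a model from the cited papers):

```python
import math

def weighted_mean(estimates, variances):
    """Reliability-weighted mean and its variance."""
    rels = [1.0 / v for v in variances]
    total = sum(rels)
    mean = sum(r * e for r, e in zip(rels, estimates)) / total
    return mean, 1.0 / total

def robust_combine(estimates, variances, z_threshold=3.0):
    """Combine cues after discarding any estimate implausibly far,
    relative to its own noise, from a crude reference point (the median)."""
    reference = sorted(estimates)[len(estimates) // 2]
    kept = [(e, v) for e, v in zip(estimates, variances)
            if abs(e - reference) <= z_threshold * math.sqrt(v)]
    es, vs = zip(*kept)
    return weighted_mean(list(es), list(vs))

# Precise ('adult-like') estimates: the discrepant cue is detectable and
# excluded, so the combined estimate stays near 5.
est_precise, _ = robust_combine([5.0, 5.3, 20.0], [1.0, 1.0, 1.0])

# The same estimate values with high ('child-like') variances: the
# discrepancy lies within the expected noise, so the outlier is integrated.
est_noisy, _ = robust_combine([5.0, 5.3, 20.0], [36.0, 36.0, 36.0])
```

With identical estimate values, the same rule excludes the outlier when noise is low but integrates it when noise is high, which is exactly why child-adult comparisons must control for estimate variability (see footnote 5).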

5 The simple rule to integrate all available estimates is inappropriate for situations in which one estimate dif-
fers drastically from the others. ‘Robust cue integration’ is an extension to the Bayesian integration frame-
work stating that since large outliers are highly unlikely to be due to normal sensory noise, they should be
presumed to be erroneous and excluded from integration with other estimates (Knill 2007; Körding et al.
2007; Landy et al. 1995). ‘Robust cue integration’ would give observers in the swinging room a basis for
disregarding visual information when it is highly discrepant with vestibular and proprioceptive informa-
tion. The crucial point to note in comparing children and adults is that whether or not estimates can be
detected as ‘highly discrepant’ depends not only on the average difference between them, but also on their
variabilities. Thus, given children and adults subjected to the same experimental manipulation, if adults’
estimates of own posture (via all senses) are much less variable than children’s, then adults might have a
basis for detecting that the visual estimate is discrepant compared with the others, whereas children might
152 THE DEVELOPMENT OF MULTISENSORY BALANCE, LOCOMOTION, ORIENTATION, AND NAVIGATION

Most of the studies that have been reviewed cannot separate these two accounts, as changes
with age in relative reliance on different cues are consistent with both. Possibility 1—that children
and adults process information in the same way—provides the simplest account in that it needs
to posit fewer kinds of developmental changes. This account therefore should not be ruled out for
specific tasks until the relevant experimental manipulations have been tested. Interestingly, in the
small number of studies that have examined the development of Bayesian integration explicitly
(Gori et al. 2008; Nardini et al. 2008; Nardini et al. 2010), children were shown not to integrate
estimates in an adult-like way until after 8 years—i.e. Possibility 1 could be rejected. These studies
used visual, manual, and locomotor tasks involving spatial or psychophysical decisions. Whether
the same would be true for more elementary sensory-motor behaviours such as balance is a major
question for future studies. A challenge for developmental studies is the time-consuming nature
of testing Bayesian models of multisensory integration, since they require measurement of
response variability in both single-cue and combined-cue conditions. One way to address this is
to use psychophysical staircase procedures (e.g. Kaernbach 1991; Kontsevich and Tyler 1999) to
obtain more rapid estimates of variability (Nardini et al. 2010). A second challenge is measuring
perceptual variability for elementary sensory-motor behaviours such as balance that lead to pos-
tural changes but do not lend themselves to explicit judgments. For this, new approaches need to
be developed for inferring variability from spontaneous behaviours. This is challenging but pos-
sible, as a recent study has shown by measuring integration of depth cues using only spontaneous
eye movements (Wismeijer et al. 2010).
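For illustration, here is a bare-bones version of Kaernbach's (1991) weighted up-down rule (a sketch under simplifying assumptions; `trial_fn` stands in for a real psychophysical trial):

```python
def weighted_up_down(trial_fn, start, step_down, target=0.75, n_trials=40):
    """Weighted up-down staircase (after Kaernbach 1991).

    trial_fn(level) -> True for a correct response at that stimulus level.
    With step sizes in the ratio step_up/step_down = target/(1 - target),
    the track converges on the level yielding `target` proportion correct.
    """
    step_up = step_down * target / (1.0 - target)
    level = start
    levels = []
    for _ in range(n_trials):
        levels.append(level)
        if trial_fn(level):
            level -= step_down   # make the task harder after a correct response
        else:
            level += step_up     # make it easier (by a larger step) after an error
    return levels

# Simulated deterministic observer with threshold 5: the track quickly
# descends from 10 and then oscillates around the threshold.
track = weighted_up_down(lambda lvl: lvl >= 5.0, start=10.0, step_down=1.0)
```

Because each trial adapts to the observer's responses, staircases concentrate measurements near the informative region of the psychometric function; Kontsevich and Tyler's (1999) Bayesian adaptive method goes further by estimating the slope (i.e. the variability) directly.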
To sum up, many developmental changes in use of multiple cues for balance, locomotion, and
navigation in childhood could be explained within a Bayesian framework. However, to assess
whether developmental changes reflect adult-like integration operating on inputs with different
cue reliabilities, or non-adult-like integration (or a lack of integration), more
detailed studies need to be carried out quantifying the variability of the underlying estimates. The
few studies so far that have tested this framework with children suggest that they do not integrate
estimates in an adult-like manner until after 8 years, either across senses for spatial localization or object
perception (Gori et al. 2008; Nardini et al. 2008) or within vision for three-dimensional shape
discrimination (Nardini et al. 2010).

6.5.1 Other models


We have reviewed the development of the sensory processes underlying adaptive spatial behaviour.
While children’s sensory and motor systems are immature, they are also in some senses optimized
for the ecology of their environments. For example, young infants do not need high visual acuity
or accurate spatial localization. Bayesian models stress optimization for accuracy, in terms of the
reduction of sensory uncertainty. However, maximal accuracy is not the most important goal for
many spatial tasks that infants and children face. For developing sensory-motor systems there
may be other goals more important than maximizing accuracy, such as responding rapidly
(Barutchu et al. 2009; Neil et al. 2006) or detecting inter-sensory conflicts (Gori et al. 2010;
Nardini et al. 2010). Therefore many other kinds of model remain to be developed, in which
‘optimality’ is quantified based on different goals or properties. It may be that these models will

not. In this situation, if both age groups were following ‘robust cue integration’ with the same integration
rules, adults would reject vision but children would not. To show that adults and children really differ in
integration behaviour it is necessary to exclude this possibility—for example, by showing that even when
noise is added to adults’ estimates in order to bring them to children’s level, they still behave differently.

provide the best description of how developing perceptual and motor systems are optimized for
spatial behaviour.

Acknowledgements
MN would like to acknowledge support from the James S. McDonnell Foundation 21st Century
Science Scholar in Understanding Human Cognition Program, the National Institute of Health
Research BMRC for Ophthalmology, and United Kingdom Economic and Social Research
Council Grant RES-062–23-0819. DC would like to acknowledge support from the European
Research Council under the European Community’s Seventh Framework Programme (FP7/2007–
2013) / ERC Grant agreement no. 241242.

References
Acredolo, L.P. (1978). Development of spatial orientation in infancy. Developmental Psychology, 14, 224–34.
Acredolo, L.P., and Evans, D. (1980). Developmental changes in the effects of landmarks on infant spatial
behavior. Developmental Psychology, 16, 312–318.
Acredolo, L.P., Adams, A., and Goodwyn, S.W. (1984). The role of self-produced movement and visual
tracking in infant spatial orientation. Journal of Experimental Child Psychology, 38, 312–27.
Adolph, K.E. (1995). A psychophysical assessment of toddlers’ ability to cope with slopes. Journal of
Experimental Psychology: Human Perception and Performance, 21, 734–50.
Adolph, K.E. (1997). Learning in the development of infant locomotion. Monographs of the Society for
Research in Child Development, 62, I–158.
Alais, D., and Burr, D. (2004). The ventriloquist effect results from near optimal crossmodal integration.
Current Biology, 14, 257–62.
Angelaki, D.E., Klier, E.M., and Snyder, L.H. (2009). A vestibular sensation: probabilistic approaches to
spatial perception. Neuron, 64, 448–61.
Atkinson, J. (2002). The Developing Visual Brain. Oxford University Press, Oxford.
Atkinson, J., Hood, B., Wattam-Bell, J., and Braddick, O. (1992). Changes in infants’ ability to switch visual
attention in the first three months of life. Perception, 21, 643–53.
Bair, W.N., Kiemel, T., Jeka, J.J., and Clark, J.E. (2007). Development of multisensory reweighting for
posture control in children. Experimental Brain Research, 183, 435–46.
Barela, J.A., Jeka, J.J., and Clark, J.E. (1999). The use of somatosensory information during the acquisition
of independent upright stance. Infant Behavior and Development, 22, 87–102.
Barela, J.A., Jeka, J.J., and Clark, J.E. (2003). Postural control in children. Coupling to dynamic
somatosensory information. Experimental Brain Research, 150, 434–42.
Barutchu, A., Crewther, D.P., and Crewther, S.G. (2009). The race that precedes coactivation: development
of multisensory facilitation in children. Developmental Science, 12, 464–73.
Barutchu, A., Danaher, J., Crewther, S.G., Innes-Brown, H., Shivdasani, M.N., and Paolini, A.G. (2010).
Audiovisual integration in noise by children and adults. Journal of Experimental Child Psychology,
105, 38–50.
Berard, J.R., and Vallis, L.A. (2006). Characteristics of single and double obstacle avoidance strategies: a
comparison between adults and children. Experimental Brain Research, 175, 21–31.
Bertenthal, B.I., Rose, J.L., and Bai, D.L. (1997). Perception-action coupling in the development of visual
control of posture. Journal of Experimental Psychology: Human Perception and Performance, 23, 1631–43.
Billington, J., Field, D.T., Wilkie, R.M., and Wann, J.P. (2010). An fMRI study of parietal cortex
involvement in the visual guidance of locomotion. Journal of Experimental Psychology: Human
Perception and Performance, 36, 1495–1507.
Braddick, O., Atkinson, J., Hood, B., Harkness, W., Jackson, G., and Vargha-Khadem, F. (1992). Possible
blindsight in infants lacking one cerebral hemisphere. Nature, 360, 461–63.

Bremner, A.J., Holmes, N.P., and Spence, C. (2008). Infants lost in (peripersonal) space? Trends in
Cognitive Sciences, 12, 298–305.
Bremner, A.J., Mareschal, D., Lloyd-Fox, S., and Spence, C. (2008). Spatial localization of touch in the first
year of life: early influence of a visual spatial code and the development of remapping across changes in
limb position. Journal of Experimental Psychology: General, 137, 149–62.
Bremner, J.G. (1978). Egocentric versus allocentric spatial coding in 9-month-old infants: factors
influencing the choice of code. Developmental Psychology, 14, 346–55.
Bremner, J.G., and Bryant, P. (1977). Place versus response as the basis of spatial errors made by young
infants. Journal of Experimental Child Psychology, 23, 162–71.
Bremner, J.G., Hatton, F., Foster, K.A., and Mason, U. (2011). The contribution of visual and vestibular
information to spatial orientation by 6- to 14-month-old infants and adults. Developmental Science, 14,
1033–45.
Bronson, G. (1974). The postnatal growth of visual capacity. Child Development, 45, 873–90.
Brosseau-Lachaine, O., Casanova, C., and Faubert, J. (2008). Infant sensitivity to radial optic flow fields
during the first months of life. Journal of Vision, 8, 1–14.
Burgess, N. (2006). Spatial memory: how egocentric and allocentric combine. Trends in Cognitive Sciences,
10, 551–57.
Burgess, N. (2008). Spatial cognition and the brain. Annals of the New York Academy of Sciences, 1124,
77–97.
Burgess, N., Spiers, H.J., and Paleologou, E. (2004). Orientational manoeuvres in the dark: dissociating
allocentric and egocentric influences on spatial memory. Cognition, 94, 149–66.
Cheng, K. (1986). A purely geometric module in the rat’s spatial representation. Cognition, 23, 149–78.
Cheng, K., and Newcombe, N.S. (2005). Is there a geometric module for spatial orientation? Squaring
theory and evidence. Psychonomic Bulletin and Review, 12, 1–23.
Cheng, K., Shettleworth, S.J., Huttenlocher, J., and Rieser, J.J. (2007). Bayesian integration of spatial
information. Psychological Bulletin, 133, 625–37.
Clark, J.J., and Yuille, A.L. (1990). Data Fusion for Sensory Information Systems. Kluwer Academic, Boston, MA.
Clifton, R.K., Morrongiello, B.A., Kulig, J.W., and Dowd, J.M. (1981). Newborns’ orientation toward
sound: possible implications for cortical development. Child Development, 52, 833–38.
Cowie, D., Atkinson, J., and Braddick, O. (2010). Development of visual control in stepping down.
Experimental Brain Research, 202, 181–88.
Day, B.L., and Fitzpatrick, R.C. (2005). The vestibular system. Current Biology, 15, R583–86.
Dijkstra, T.M., Schoner, G., and Gielen, C.C. (1994). Temporal stability of the action-perception cycle for
postural control in a moving visual environment. Experimental Brain Research, 97, 477–86.
Ernst, M.O. (2005). A Bayesian view on multimodal cue integration. In Human body perception from the
inside out (eds. G. Knoblich, I.M. Thornton, M. Grosjean, and M. Shiffrar), pp. 105–34. Oxford
University Press, New York.
Ernst, M.O., and Banks, M.S. (2002). Humans integrate visual and haptic information in a statistically
optimal fashion. Nature, 415, 429–33.
Etienne, A.S., Maurer, R., and Seguinot, V. (1996). Path integration in mammals and its interaction with
visual landmarks. The Journal of Experimental Biology, 199, 201–209.
Fantz, R.L. (1963). Pattern vision in newborn infants. Science, 140, 296–97.
Fetsch, C.R., Turner, A.H., Deangelis, G.C., and Angelaki, D.E. (2009). Dynamic reweighting of visual and
vestibular cues during self-motion perception. The Journal of Neuroscience, 29, 15601–15612.
Foo, P., Warren, W.H., Duchon, A., and Tarr, M.J. (2005). Do humans integrate routes into a cognitive
map? Map- versus landmark-based navigation of novel shortcuts. Journal of Experimental Psychology:
Learning, Memory and Cognition, 31, 195–215.
Forssberg, H., and Nashner, L.M. (1982). Ontogenetic development of postural control in man: adaptation
to altered support and visual conditions during stance. Journal of Neuroscience, 2, 545–52.

Franchak, J.M., and Adolph, K.E. (2010). Visually guided navigation: head-mounted eye-tracking of
natural locomotion in children and adults. Vision Research, 50, 2766–74.
Gallistel, C.R. (1990). The organization of learning. MIT Press, Cambridge, MA.
Gibson, J.J. (1979). The ecological approach to visual perception. Houghton Mifflin, Boston, MA.
Godoi, D., and Barela, J.A. (2008). Body sway and sensory motor coupling adaptation in children: effects of
distance manipulation. Developmental Psychobiology, 50, 77–87.
Gori, M., Del Viva, M., Sandini, G., and Burr, D. (2008). Young children do not integrate visual and haptic
form information. Current Biology, 18, 694–98.
Gori, M., Sandini, G., Martinoli, C., and Burr, D. (2010). Poor haptic orientation discrimination in
nonsighted children may reflect disruption of cross-sensory calibration. Current Biology, 20, 223–25.
Gu, Y., Angelaki, D.E., and Deangelis, G.C. (2008). Neural correlates of multisensory cue integration in
macaque MSTd. Nature Neuroscience, 11, 1201–1210.
Heinrichs, M.R. (1994). An analysis of critical transition points of barrier crossing actions. Unpublished
doctoral thesis, University of Minnesota.
Hermer, L., and Spelke, E. (1994). A geometric process for spatial reorientation in young children. Nature,
370, 57–59.
Hermer, L., and Spelke, E. (1996). Modularity and development: the case of spatial reorientation.
Cognition, 61, 195–232.
Higgins, C., Campos, J.J., and Kermoian, R. (1996). Effect of self-produced locomotion on infant postural
compensation to optic flow. Developmental Psychobiology, 32, 836–41.
Hirabayashi, S., and Iwasaki, Y. (1995). Developmental perspective of sensory organization on postural
control. Brain and Development, 17, 111–113.
Howard, I.P., and Templeton, W.B. (1966). Human spatial orientation. John Wiley and Sons, London.
Huttenlocher, J., and Lourenco, S.F. (2007). Coding location in enclosed spaces: is geometry the principle?
Developmental Science, 10, 741–46.
Huttenlocher, J., Newcombe, N., and Sandberg, E. (1994). The coding of spatial location in young children.
Cognitive Psychology, 27, 115–47.
Jouen, F., Lepecq, J., Gapenne, O., and Bertenthal, B. (2000). Optic flow sensitivity in neonates. Infant
Behavior and Development, 23, 271–84.
Kaernbach, C. (1991). Simple adaptive testing with the weighted up-down method. Perception and
Psychophysics, 49, 227–29.
Keating, M.B., McKenzie, B.E., and Day, R.H. (1986). Spatial localization in infancy: position constancy in
a square and circular room with and without a landmark. Child Development, 57, 115–24.
Kingsnorth, S., and Schmuckler, M.A. (2000). Walking skill vs walking experience as a predictor of barrier
crossing in toddlers. Infant Behavior and Development, 23, 331–50.
Klatzky, R.L., Loomis, J.M., Golledge, R.G., Cicinelli, J.G., Doherty, S., and Pellegrino, J.W. (1990).
Acquisition of route and survey knowledge in the absence of vision. Journal of Motor Behavior, 22, 19–43.
Klevberg, G.L., and Anderson, D.I. (2002). Visual and haptic perception of postural affordances in children
and adults. Human Movement Science, 21, 169–86.
Knill, D.C. (2007). Robust cue integration: a Bayesian model and evidence from cue-conflict studies with
stereoscopic and figure cues to slant. Journal of Vision, 7, 5–24.
Kontsevich, L.L., and Tyler, C.W. (1999). Bayesian adaptive estimation of psychometric slope and
threshold. Vision Research, 39, 2729–37.
Körding, K.P., and Wolpert, D.M. (2006). Bayesian decision theory in sensorimotor control. Trends in
Cognitive Sciences, 10, 319–26.
Körding, K.P., Beierholm, U., Ma, W.J., Quartz, S., Tenenbaum, J.B., and Shams, L. (2007). Causal
inference in multisensory perception. PLoS One, 2, e943.
Landy, M.S., Maloney, L.T., Johnston, E.B., and Young, M. (1995). Measurement and modeling of depth
cue combination: in defense of weak fusion. Vision Research, 35, 389–412.

Langston, R.F., Ainge, J.A., Couey, J.J., et al. (2010). Development of the spatial representation system in
the rat. Science, 328, 1576–80.
Lawson, K.R. (1980). Spatial and temporal congruity and auditory–visual integration in infants.
Developmental Psychology, 16, 185–92.
Learmonth, A.E., Nadel, L., and Newcombe, N.S. (2002). Children’s use of landmarks: implications for
modularity theory. Psychological Science, 13, 337–41.
Lee, D.N., and Aronson, E. (1974). Visual proprioceptive control of standing in human infants. Perception
and Psychophysics, 15, 529–32.
Lee, S.A., and Spelke, E.S. (2008). Children’s use of geometry for reorientation. Developmental Science, 11,
743–49.
Lee, S.A., Shusterman, A., and Spelke, E.S. (2006). Reorientation and landmark-guided search by young
children: evidence for two systems. Psychological Science, 17, 577–82.
Lew, A.R., Bremner, J.G., and Lefkovitch, L.P. (2000). The development of relational landmark use in 6–12
month old infants in a spatial orientation task. Child Development, 71, 1179–90.
Lishman, J.R., and Lee, D.N. (1973). The autonomy of visual kinaesthesis. Perception, 2, 287–94.
Loomis, J.M., Klatzky, R.L., Golledge, R.G., Cicinelli, J.G., Pellegrino, J.W., and Fry, P.A. (1993). Nonvisual
navigation by blind and sighted: assessment of path integration ability. Journal of Experimental
Psychology: General, 122, 73–91.
MacNeilage, P.R., Banks, M.S., Berger, D.R., and Bulthoff, H.H. (2007). A Bayesian model of the
disambiguation of gravitoinertial force by visual cues. Experimental Brain Research, 179, 263–90.
McKenzie, B.E., Day, R.H., and Ihsen, E. (1984). Localization of events in space: young infants are not
always egocentric. British Journal of Developmental Psychology, 2, 1–9.
McNaughton, B.L., Battaglia, F.P., Jensen, O., Moser, E.I., and Moser, M.B. (2006). Path integration and
the neural basis of the ‘cognitive map’. Nature Reviews Neuroscience, 7, 663–78.
Meredith, M.A., and Stein, B.E. (1983). Interactions among converging sensory inputs in the superior
colliculus. Science, 221, 389–91.
Meredith, M.A., and Stein, B.E. (1986). Visual, auditory, and somatosensory convergence on cells in
superior colliculus results in multisensory integration. Journal of Neurophysiology, 56, 640–62.
Miller, J. (1982). Divided attention: evidence for coactivation with redundant signals. Cognitive Psychology,
14, 247–79.
Moreau, T., Helfgott, E., Weinstein, P., and Milner, P. (1978). Lateral differences in habituation of
ipsilateral head-turning to repeated tactile stimulation in the human newborn. Perceptual and Motor
Skills, 46, 427–36.
Morgan, M.L., Deangelis, G.C., and Angelaki, D.E. (2008). Multisensory integration in macaque visual
cortex depends on cue reliability. Neuron, 59, 662–73.
Moser, E.I., Kropff, E., and Moser, M.B. (2008). Place cells, grid cells, and the brain’s spatial representation
system. Annual Review of Neuroscience, 31, 69–89.
Nardini, M., Atkinson, J., Braddick, O., and Burgess, N. (2006). The development of body, environment,
and object-based frames of reference in spatial memory in normal and atypical populations. Cognitive
Processing, 7, 68–69.
Nardini, M., Jones, P., Bedford, R., and Braddick, O. (2008). Development of cue integration in human
navigation. Current Biology, 18, 689–93.
Nardini, M., Thomas, R.L., Knowland, V.C., Braddick, O.J., and Atkinson, J. (2009). A viewpoint-independent
process for spatial reorientation. Cognition, 112, 241–48.
Nardini, M., Bedford, R., and Mareschal, D. (2010). Fusion of visual cues is not mandatory in children.
Proceedings of the National Academy of Sciences of the United States of America, 107, 17041–46.
Nardini, M., Bales, J., Zughni, S., and Mareschal, D. (2011). Differential development of audio-visual
integration for saccadic eye movements and manual responses. Journal of Vision, Vision Sciences Society
Meeting, 2011, 11, Article 775.

Neil, P.A., Chee-Ruiter, C., Scheier, C., Lewkowicz, D.J., and Shimojo, S. (2006). Development of
multisensory spatial integration and perception in humans. Developmental Science, 9, 454–64.
Newcombe, N., Huttenlocher, J., Bullock Drummey, A., and Wiley, J.G. (1998). The development of spatial
location coding: place learning and dead reckoning in the second and third years. Cognitive
Development, 13, 185–200.
Newcombe, N.S., Ratliff, K.R., Shallcross, W.L., and Twyman, A.D. (2010). Young children’s use of features
to reorient is more than just associative: further evidence against a modular view of spatial processing.
Developmental Science, 13, 213–20.
Otto, T.U., and Mamassian, P. (2010). Divided attention and sensory integration: the return of the race
model. Journal of Vision, 10, 863.
Pagel, B., Heed, T., and Röder, B. (2009). Change of reference frame for tactile localization during child
development. Developmental Science, 12, 929–37.
Peterka, R.J., and Black, F.O. (1990). Age-related changes in human posture control: sensory organization
tests. Journal of Vestibular Research, 1, 73–85.
Peterson, M.L., Christou, E., and Rosengren, K.S. (2006). Children achieve adult-like sensory integration
during stance at 12-years-old. Gait and Posture, 23, 455–63.
Pick, H.L., Warren, D.H., and Hay, J.C. (1969). Sensory conflict in judgments of spatial direction.
Perception and Psychophysics, 6, 203–205.
Pufall, P.B., and Dunbar, C. (1992). Perceiving whether or not the world affords stepping onto and over: a
developmental study. Ecological Psychology, 4, 17–38.
Raab, D.H. (1962). Statistical facilitation of simple reaction times. Transactions of the New York Academy of
Sciences, 24, 574–90.
Rider, E.A. and Rieser, J.J. (1988). Pointing at objects in other rooms: young children’s sensitivity to
perspective after walking with and without vision. Child Development, 59, 480–94.
Riecke, B.E., van Veen, H.A., and Bulthoff, H.H. (2002). Visual homing is possible without landmarks: a
path integration study in virtual reality. Presence: Teleoperators and Virtual Environments, 11, 443–73.
Rieser, J.J. (1979). Spatial orientation of six-month-old infants. Child Development, 50, 1078–87.
Rieser, J.J., and Rider, E.A. (1991). Young children’s spatial orientation with respect to multiple targets
when walking without vision. Developmental Psychology, 27, 97–107.
Rinaldi, N.M., Polastri, P.F., and Barela, J.A. (2009). Age-related changes in postural control sensory
reweighting. Neuroscience Letters, 467, 225–29.
Romo, R., Hernandez, A., and Zainos, A. (2004). Neuronal correlates of a perceptual decision in ventral
premotor cortex. Neuron, 41, 165–73.
Schmuckler, M.A. (1996). The development of visually-guided locomotion: barrier crossing by toddlers.
Ecological Psychology, 8, 209–36.
Schmuckler, M.A., and Gibson, E.J. (1989). The effect of imposed optical flow on guided locomotion in
young walkers. British Journal of Developmental Psychology, 7, 193–206.
Schmuckler, M.A., and Tsang, H.Y. (1997). Visual-movement interaction in infant search. In Studies in
Perception and Action IV (eds. M.A. Schmuckler and J.M. Kennedy), pp. 233–36. Lawrence Erlbaum,
Mahwah, NJ.
Schmuckler, M.A., and Tsang-Tong, H.Y. (2000). The role of visual and body movement information in
infant search. Developmental Psychology, 36, 499–510.
Shumway-Cook, A., and Woollacott, M.H. (1985). The growth of stability: postural control from a
development perspective. Journal of Motor Behavior, 17, 131–47.
Simons, D.J., and Wang, R.F. (1998). Perceiving real-world viewpoint changes. Psychological Science,
9, 315–20.
Sparto, P.J., Redfern, M.S., Jasko, J.G., Casselbrant, M.L., Mandel, E.M., and Furman, J.M. (2006). The
influence of dynamic visual cues for postural control in children aged 7–12 years. Experimental Brain
Research, 168, 505–16.

Tan, U., and Tan, M. (1999). Incidences of asymmetries for the palmar grasp reflex in neonates and hand
preference in adults. Neuroreport, 10, 3253–56.
Taube, J.S. (2007). The head direction signal: origins and sensory-motor integration. Annual Review of
Neuroscience, 30, 181–207.
Twyman, A.D., and Newcombe, N.S. (2010). Five reasons to doubt the existence of a geometric module.
Cognitive Science, 34, 1315–56.
Wall, M.B., and Smith, A.T. (2008). The representation of egomotion in the human brain. Current Biology,
18, 191–94.
Wall, M.B., Lingnau, A., Ashida, H., and Smith, A.T. (2008). Selective visual responses to expansion and
rotation in the human MT complex revealed by functional magnetic resonance imaging adaptation.
European Journal of Neuroscience, 27, 2747–57.
Wallace, M.T., and Stein, B.E. (1997). Development of multisensory neurons and multisensory integration
in cat superior colliculus. Journal of Neuroscience, 17, 2429–44.
Wallace, M.T., and Stein, B.E. (2001). Sensory and multisensory responses in the newborn monkey
superior colliculus. Journal of Neuroscience, 21, 8886–94.
Wallace, M.T., Wilkinson, L.K., and Stein, B.E. (1996). Representation and integration of multiple sensory
inputs in primate superior colliculus. Journal of Neurophysiology, 76, 1246–66.
Wang, R.F., and Simons, D.J. (1999). Active and passive scene recognition. Cognition, 70, 191–210.
Wang, R.F., and Spelke, E.S. (2002). Human spatial representation: insights from animals. Trends in
Cognitive Sciences, 6, 376–82.
Wann, J.P., Mon-Williams, M., and Rushton, K. (1998). Postural control and co-ordination disorders: the
swinging room revisited. Human Movement Science, 17, 491–513.
Wattam-Bell, J., Birtles, D., Nystrom, P., et al. (2010). Reorganization of global form and motion
processing during human visual development. Current Biology, 20, 411–415.
Wertheimer, M. (1961). Psychomotor coordination of auditory and visual space at birth. Science, 134, 1692.
Wills, T.J., Cacucci, F., Burgess, N., and O’Keefe, J. (2010). Development of the hippocampal cognitive map
in preweanling rats. Science, 328, 1573–76.
Wismeijer, D.A., Erkelens, C.J., van Ee, R., and Wexler, M. (2010). Depth cue combination in spontaneous
eye movements. Journal of Vision, 10, 25.
Woollacott, M., Debu, B., and Mowatt, M. (1987). Neuromuscular control of posture in the infant and
child: is vision dominant? Journal of Motor Behavior, 19, 167–86.
Zupan, L.H., Merfeld, D.M., and Darlot, C. (2002). Using sensory weighting to model the influence of canal,
otolith and visual cues on spatial orientation and eye movements. Biological Cybernetics, 86, 209–30.
Zwart, R., Lebedt, A., Fong, B., deVries, H., and Savelsbergh, G. (2005). The affordance of gap crossing in
toddlers. Infant Behavior and Development, 28, 145–54.
Chapter 7

The unexpected effects of experience


on the development of multisensory
perception in primates
David J. Lewkowicz

7.1 Introduction
Whenever we encounter objects and the events that they participate in, we can see, hear, feel,
smell, and/or taste them. In addition, during each of these encounters, our bodies provide a con-
stant flow of proprioceptive, kinesthetic, and vestibular information that often bears some rela-
tionship to our actions directed toward the objects. Thus, even though the specific mélange of
sensory inputs that we experience at any moment in time may differ across different situations,
our normal sensory experiences are always multisensory in nature (J.J. Gibson 1966; Marks 1978;
Stein and Meredith 1993; Welch and Warren 1980; Werner 1973).
The rich and varied combination of everyday multisensory experiences presents a considerable
challenge to developing infants. Unless they can integrate the diverse types of sensory information
in the different modalities, they might end up in William James’ classic state of ‘blooming, buzz-
ing confusion’. Fortunately, most of the objects and events in our life are specified by highly
redundant multisensory perceptual cues (J.J. Gibson 1966, 1979) and, as a result, infants have the
opportunity to perceive a world of coherent objects and events rather than one specified by a col-
lection of disparate sensations (E.J. Gibson 1969). Having the opportunity to experience multi-
sensory coherence and having the ability to do so are two different issues. Indeed, it is theoretically
reasonable to posit a priori that infants may not be able to take advantage of the multisensory and
redundant cues that are all around them because they come into the world with a highly imma-
ture nervous system and are perceptually inexperienced. In other words, it is reasonable to expect
that infants’ ability to perceive multisensory coherence is highly immature, that it takes time for
it to develop, and that experience is likely to contribute to its emergence.
In this chapter I review what is currently known about the development of multisensory per-
ception in infancy and show that the ability to perceive multisensory coherence does, indeed, take
time to emerge. In addition, I review some of our recent data showing for the first time that expe-
rience contributes in an unexpected but critical way to the emergence of multisensory perceptual
skills in human infants and developing monkeys. These data demonstrate that young infants are
broadly tuned to the multisensory inputs and that as a result they treat audiovisual inputs as
coherent regardless of whether they represent a native or a non-native species or a native or non-
native language. Furthermore, these data indicate that multisensory perceptual tuning narrows
during the first year of life and that, as it does, the ability to perceive multisensory coherence of
non-native signals declines. Finally, these findings suggest that multisensory perceptual narrow-
ing may be a recent evolutionary phenomenon because young vervet monkey infants do not
exhibit narrowing and thus continue to perceive the coherence of the faces and vocalizations of
another species.

7.2 Dealing with a multisensory world


Multisensory perceptual inputs provide two kinds of cues: modality-specific and amodal.
Modality-specific cues are unique to a particular modality and include such perceptual attributes
as colour, pitch, aroma, taste, pressure, and temperature. In contrast, amodal cues provide equiv-
alent information about particular objects or events regardless of the modality in which they are
detected and include such perceptual attributes as intensity, duration, tempo, rhythmic pattern,
shape, and texture. Importantly, amodal attributes can be detected in any single sensory modality.
For example, we can perceive the intensity, duration, or rhythm of a stimulus event either by just
listening to it or by seeing it. In addition, regardless of whether we listen to it or see it, we can tell
that it is the same thing. In contrast, perception of the multisensory coherence of modality-
specific attributes requires concurrent access to such attributes in the different modalities and an
active process of association that relies on their temporal and/or spatial contiguity.
Evolution has taken advantage of the multisensory redundancy that amodal and temporally
and spatially contiguous modality-specific cues provide, resulting in the emergence of nervous
systems that possess sophisticated mechanisms for the detection of multisensory coherence (Foxe
and Schroeder 2005; Ghazanfar and Schroeder 2006; J.J. Gibson 1979; Maier and Schneirla 1964;
Marks 1978; Stein and Meredith 1993; Stein and Stanford 2008). Two sets of findings illustrate
especially well the pervasive nature of multisensory interaction and its fundamental role in behav-
ioral functioning. One is a set of findings on multisensory redundancy effects in various species
and across the lifespan that shows that detection, discrimination, and learning of multisensory as
opposed to unisensory information is more robust (Bahrick et al. 2004; Lewkowicz and Kraebel
2004; Partan and Marler 1999; Rowe 1999). The other set of findings comes from studies of vari-
ous multisensory illusions. These also demonstrate the pervasive nature of multisensory interac-
tion in that in those cases when inputs in different modalities conflict, the nervous system attempts
to resolve the conflict and, in the process, produces surprising perceptual experiences.
Multisensory redundancy effects have been reported in adults of many different species as well
as in human infants. For example, it has been found that human adults exhibit an 11% increase
in speech intelligibility when they can see the face while listening to someone’s voice (Macleod
and Summerfield 1987) and that adults’ ability to understand difficult speech content is enhanced
when the face is visible (Arnold and Hill 2001). Similarly, studies have shown that adult animals
exhibit better learning and discrimination when they have access to multisensory as opposed to
unisensory inputs (Partan and Marler 1999; Rowe 1999; Shams and Seitz 2008). At the neural
level, studies have shown that some proportion of multisensory neurons in the superior colliculus
of the cat and monkey increase their firing rate dramatically in response to multisensory as
opposed to unisensory inputs (Meredith and Stein 1983) and that this ‘superadditivity’ is reflected
in faster behavioral orienting to concurrent and co-located auditory and visual localization cues
than to visual-only cues (Stein et al. 1989).
Like adults, human infants take advantage of multisensory redundancy in that they exhibit bet-
ter detection, learning, and discrimination of multisensory as opposed to unisensory events
(Bahrick et al. 2004; Lewkowicz and Kraebel 2004). For example, as early as two months of age
infants exhibit generally faster responsiveness to co-located auditory and visual cues than to each
of these cues alone and by eight months they exhibit adult-like audiovisual facilitation effects
(Neil et al. 2006). Similarly, infants exhibit greater responsiveness to temporally varying static
visual stimuli presented together with temporally varying sounds than to the same visual stimuli
in the absence of the sounds (Lewkowicz 1988a, 1988b). Infants also exhibit greater responsive-
ness to moving objects presented together with their impact sounds than to such objects pre-
sented without impact sounds (Bahrick and Lickliter 2000; Bahrick et al. 2002; Lewkowicz 1992b,
2004). Finally, infants respond more to talking and singing faces than to silently talking and sing-
ing faces (Lewkowicz 1996a, 1998). Importantly, however, no tests have been done so far to
determine when temporally based redundancy effects first become adult-like in infancy.
As far as multisensory illusions are concerned, the McGurk and the spatial ventriloquism
effects are the best known and demonstrate the power of the visual modality to affect responsive-
ness to auditory input. In the McGurk effect (McGurk and Macdonald 1976), adults’ perception
of an audible syllable changes dramatically in the presence of a conflicting visual syllable. For
example, when a visual /va/ is dubbed onto an audible /ba/, adults tend to hear a /va/ rather than
a /ba/. Infants also exhibit evidence of the McGurk effect but not until five months of age
(Rosenblum et al. 1997). In the ventriloquism effect (Bertelson and Radeau 1981; Pick et al. 1969;
Welch and Warren 1980), the perceived location of a sound can be changed dramatically by pre-
senting a concurrent dynamic visual stimulus at another location. For example, when adults hear
a speech sound or a beep at one location and see a talking face or see a flashing light at another
location, they mis-localize the sound and perceive it to be closer to the visual stimulus. Interestingly,
no studies to date have investigated the ventriloquism effect in infancy although it is not likely to
operate during the early months of life for a number of reasons. First, infants' auditory spatial
acuity thresholds are significantly higher than adults' (Morrongiello 1988; Morrongiello et al. 1990),
making it difficult for infants to localize sounds precisely. Second, the extent of the visual field is initially quite
small and increases markedly during the early months of life (Delaney et al. 2000). This makes it
difficult for infants to localize visual stimuli during the early months of life. Compounding the
problem is the fact that spatial resolution, saccadic localization, smooth pursuit, and the ability to
localize moving objects are all initially poor and change markedly during the early months of life
(Colombo 2001). Together, these auditory and visual system immaturities make it unlikely that
young infants experience the kind of visual capture that is typical of the spatial ventriloquism
effect in adults because these effects depend on relatively small audio-visual (A-V) disparities
(Recanzone 2009). That is, in order for the visual stimulus to ‘capture’ the location of the auditory
stimulus, subjects must have the ability to precisely localize visual stimuli. Infants certainly do not
have this ability and, if anything, their auditory spatial discrimination ability is probably better than their ability to localize visual stimuli,
especially given that audition begins to function during the last trimester of pregnancy while
vision does not begin to function until birth. Even so, auditory spatial detection abilities are still
relatively poor in infancy as well. As a result, the multisensory spatial contiguity window, like the
multisensory temporal contiguity window (Lewkowicz 1996b), is probably much larger in infancy
than later in life. If so, this would suggest that infants might actually integrate spatially discrepant
auditory and visual stimuli over greater spatial discrepancies than adults and thus, paradoxically,
may experience the ventriloquism effect most of the time. So far, this prediction has not been
tested experimentally.
Whereas the McGurk and the ventriloquism effects illustrate the power of the visual modality
to markedly affect auditory perception, the ‘bounce’ and the ‘flash-beep’ illusions illustrate the
power of the auditory modality to radically change our perception of visual input. In the bounce
illusion (Sekuler et al. 1997), adults who watch two identical objects moving in silence along the
same path toward and then through each other report that the objects pass through each other in
nearly 80% of the trials (i.e. they report that the objects bounce in approximately 20% of the tri-
als). In contrast, when they watch the same objects but this time hear a brief sound each time the
objects overlap, they report that the objects now bounce against one another in approximately
65% of the trials. Studies investigating the developmental emergence of this spatiotemporal
multisensory illusion have found that it emerges in infancy. They have found that 4-month-old
infants do not exhibit it but that 6- and 8-month-old infants do (Scheier et al. 2003). Finally,
in the flash-beep illusion, when adults see a single flash of light, they incorrectly report seeing
multiple flashes when they hear multiple auditory beeps (Shams et al. 2000). Developmental studies (Innes-Brown et al. 2011) have shown that 8–17-year-old children also exhibit this illusion but
that their responsiveness is immature relative to that found in adults.
In sum, the findings of multisensory redundancy effects and the various multisensory illusions
make it clear that multisensory interaction is the norm rather than the exception in human
behavior. The fact that some evidence of these types of multisensory interaction has also been
found in infancy is further testament to the multisensory character of perception and the behav-
iors that it mediates and supports. In the subsequent sections of this chapter I first consider the
theoretical issues confronting those interested in the development of multisensory perception,
then discuss the theoretical frameworks proposed in the past to deal with these issues, and end
by reviewing some recent findings from our laboratory suggesting the operation of a hitherto
unacknowledged process (multisensory narrowing), and propose a reformulation of the extant
theoretical frameworks.

7.3 The developmental challenge and underlying processes


The fundamental importance of multisensory perception raises interesting questions about its
developmental emergence. The principal question is this: given that the infant nervous system is
immature and given that infants are perceptually and cognitively inexperienced, to what extent
might infants be able to perceive the coherence of multisensory inputs and what process(es) might
underlie this ability? This central question was clearly recognized by two classic theoretical views.
The first, the developmental integration view, held that newborns do not perceive multisensory
coherence and that this ability only emerges gradually as a result of the child’s active exploration of
the world (Birch and Lefford 1963, 1967; Piaget 1952). In contrast, the second, the developmental
differentiation view, did not consider the early developmental limitations to be as much of a barrier
to multisensory processing and, thus, proposed that some multisensory perceptual abilities are
present at birth—presumably because of an inborn sensitivity to the invariant structure of the per-
ceptual array—and that other more sophisticated abilities emerge later as a result of continuous
perceptual learning and differentiation (E.J. Gibson 1969, 1984). Both of these theoretical views
were proposed at a time when no empirical evidence was available. This obviously made it difficult
to assess the validity of either view. Since then, however, a large body of empirical evidence has
accumulated and it has made it clear that the neural and behavioural limitations and the relative lack
of experience play a central role in the development of multisensory processing but that, at the same
time, infants do come into the world with some rudimentary multisensory processing abilities
(Lewkowicz 1994, 2000a, 2002; Lickliter and Bahrick 2000; Walker-Andrews 1997). Moreover, it is
now clear that it is not necessary to posit innate sensitivity to multisensory correspondences because,
as discussed next, organisms have lots of opportunities for experiencing concurrent multisensory
inputs during prenatal life (Turkewitz 1994). Thus, the most reasonable view that has emerged from
the body of empirical evidence to date is that developmental integration and developmental differ-
entiation processes operate side-by-side during early development (Lewkowicz 2002).
It is important to note that the developmental integration and developmental differentiation
views focused mostly on the period after birth. The fact is, however, that multisensory perception
begins prior to birth in all mammalian species and prior to hatching in all avian species because
of the prenatal onset of sensory function (Bradley and Mistretta 1975; Gottlieb 1971; LeCanuet
and Schaal 1996). In addition, in all avian and mammalian species, all but the visual modality
develop and have their functional onset according to an invariant sequential order (Gottlieb
1971). Specifically, the tactile modality begins to function first as early as the first trimester of
gestation, followed by the vestibular, the chemosensory, and finally the auditory modality. The
fact that all but the visual modality have their functional onset prior to birth/hatching—even if
their structural and functional properties are immature at first—means that the fetus begins to
receive sensory inputs. This, in turn, means that the developing organism is subject to a barrage
of multisensory inputs during prenatal life and must begin to process the relations among them.
Although it might be reasonable to assume that this barrage of prenatal sensory stimulation
may be confusing and disruptive to an immature and inexperienced fetus, Turkewitz and Kenny
(Turkewitz 1994; Turkewitz and Kenny 1982, 1985) have proposed that this is not necessarily
the case. They have suggested that the sequential emergence of each sensory modality actually
facilitates the development of each modality because it is subject to reduced competitive influ-
ences from the other as yet non-functional modalities. In other words, the sequential emergence
of the different sensory modalities is assumed to have beneficial effects on the ultimate organiza-
tion of multisensory function. Indeed, empirical evidence from animal studies that have tested
this theoretical proposal directly has provided support for it (Honeycutt and Lickliter 2003).
Although the sequential onset of sensory function prior to birth ensures that the developing
fetus is not completely overwhelmed by multisensory inputs, there is little doubt that the fetus is
subject to a great deal of concurrent and redundant multisensory stimulation. As a result, there is
every reason to believe that this kind of prenatal experience has important effects on the organiza-
tion and emergence of multisensory perceptual abilities after birth. For example, it is likely that
the months of concurrent prenatal sensations in the different sensory modalities contribute in an
indirect way (at least in the human case) to the appearance of responsiveness to A-V temporal
synchrony relations at birth (Lewkowicz et al. 2010) and perhaps even to visual responsiveness to
affective vocal stimulation at birth (Mastropieri and Turkewitz 1999). That is, experience with
concurrent multisensory inputs in modalities other than vision may set up neural circuits that are
generally responsive to temporal coherence, regardless of the modality of stimulation. Studies
with birds provide evidence in support of these predictions by showing that prenatal multisensory
stimulation has marked effects on the postnatal emergence of unisensory and multisensory
responsiveness (Lickliter 1993, 2005). For example, Jaime and Lickliter (2006) found that embryos
exposed to temporally synchronous and spatially contiguous auditory and visual stimuli prior to
hatching prefer spatially contiguous audiovisual maternal cues after hatching but that embryos
that did not receive exposure to such stimuli did not exhibit a preference for the contiguous hen
and her call over either the hen alone or the call alone. Similarly, Lickliter et al. (2002, 2004) have
demonstrated that bobwhite quail who are exposed to temporally concurrent auditory and visual
stimulation prior to hatching exhibit more robust learning and memory for multisensory stimu-
lation following hatching.
Although there is now little doubt that a good deal of multisensory perceptual development occurs
prior to birth and that this is advantageous from the newborn’s point of view, the advent of visual
function at birth creates a potentially serious new problem for infants. Now, they must begin to dis-
cover how visual input—poor as it may be due to the highly immature state of the visual system at
birth—corresponds to the diverse sensory inputs in the other modalities. Added to this problem is
the fact that the prenatal sensory experiences of the other modalities occurred in an aquatic environment and that these modalities are still structurally and functionally immature at birth, though obviously more mature than vision. Interestingly, however, this may actually be a happy circumstance from a developmental-limitations point of view, because vision can begin to link up with the other sensory
modalities on the basis of some very simple perceptual cues (e.g., intensity, temporal synchrony) and
this can then further bootstrap the development of multisensory perception.
7.4 The development of multisensory perception


The general developmental picture seems to be that infants start out life being able to perceive
certain types of low-level multisensory relations (e.g. intensity, temporal synchrony) and that as
they grow and gain increasing perceptual experience they gradually acquire the ability to respond
to higher-level types of multisensory relations (e.g. rhythm, affect, gender). This general develop-
mental picture is illustrated by several findings. In terms of low-level relations, it has been found
that young infants can perceive A-V intensity relations (Lewkowicz and Turkewitz 1980), A-V
temporal synchrony relations (Bahrick 1983; Lewkowicz 1986, 1992a, 1992b, 1996b, 2000b, 2003,
2010; Lewkowicz et al. 2010), and isolated A-V phonemic relations (Kuhl and Meltzoff 1982;
Patterson and Werker 1999, 2003; Walton and Bower 1993). Importantly even though younger
infants can detect low-level multisensory relations, it has been found that their ability to detect
them is limited. For example, 4-, 6-, and 8-month-old infants can only detect a desynchronization
of an audiovisual event when the event is relatively simple (i.e. not rhythmically patterned)
whereas 10-month-old infants can detect the desynchronization regardless of whether the event
is complex or not (Lewkowicz 2003). One finding that at first blush seems to challenge this gen-
eral developmental picture is that newborn infants choose to look more at their mother’s face
than a stranger’s face if they were first exposed to their mother’s voice (Sai 2005). This remarkable
finding might be interpreted as reflecting newborns’ detection of some higher-level multisensory
relations. Unfortunately, however, during the test trials in this study infants only saw silent faces.
As a result, they did not have to perform any relational processing and could simply have based
their choice on an association that they formed between their mother’s face and voice prior to the
test. In terms of higher-level relations, it has been found that only 7–8-month-old infants can
perceive amodal affect (Walker-Andrews 1986) and gender (Patterson and Werker 2002) and that
only infants above six months of age can perceive spatiotemporal synchrony (Scheier et al. 2003)
and crossmodal duration (Lewkowicz 1986). Moreover, only 6.5–7-month-old infants can
bind modality-specific attributes such as colour and taste (Reardon and Bushnell 1988) and
8-month-old but not younger infants can perceive the spatial coincidence of auditory and visual
localization cues in an adult-like manner (Neil et al. 2006).
The general pattern of improvement in human infants’ ability to detect various types of multi-
sensory relations has clear parallels in findings from animal studies related to the neural and
behavioral mechanisms underlying multisensory processing early in life and the effects of neural
plasticity. For example, studies in neonatal cats and monkeys have found that the multisensory
cells in the superior colliculus do not integrate the way they do in adult animals and that they only
begin to do so gradually over the first weeks of life (Wallace and Stein 1997, 2001; see also Chapter
14 by Wallace et al.). Similarly, studies in rats have found that appropriate alignment of auditory
and visual maps in the superior colliculus depends on their usual spatial co-registration (Wallace
et al. 2006). This is consistent with the findings from earlier studies in ferrets and barn owls
showing that the spatial tuning and calibration of the neural map of auditory space depends on
concurrent visual input (King et al. 1988; Knudsen and Brainard 1991). In a similar vein, studies
examining the effects of early experience on behavioral responsiveness in birds (i.e. the bobwhite
quail) have demonstrated that the ability to perceive the audible and visible attributes of the
maternal hen as belonging together depends on pre- and post-hatching experience with concur-
rent auditory, visual, and tactile stimulation arising from the embryo’s own vocalizations as well
as from its broodmates and its mother (Lickliter and Banker 1994; Lickliter et al. 1996). Finally,
studies in which visual input has been re-routed to the auditory cortex in newborn ferrets via the
medial geniculate body of the thalamus have shown that the neurons in the primary auditory
cortex become responsive to visual input and become organized into the kinds of orientation
modules found in visual cortex. Moreover, the re-routed animals exhibit visually appropriate
behavioral responsiveness (Sharma et al. 2000; von Melchner et al. 2000). In sum, the data from
human and animal studies indicate that multisensory perceptual abilities improve with develop-
ment enabling infants to perceive increasingly more complex types of multisensory relations.
They also indicate that experience and neural plasticity contribute critically to this overall process.
In the rest of the chapter I discuss the different ways in which experience can contribute to this
process and show that experience not only facilitates the emergence of multisensory responsive-
ness but that it also contributes to its narrowing and the formation of multisensory perceptual
expertise.

7.5 Developmental broadening


The overall pattern of findings on the emergence of multisensory perceptual abilities is consistent
with the general conventional theoretical view about the course of development and with the
specific conventional theoretical view about the course of perceptual development (E.J. Gibson
1969; Gottlieb 1996; Piaget 1952; Werner 1973). According to the general view of development,
relatively primitive functional capacities, that greatly limit the organism’s adaptive responsive-
ness, appear first and are then followed by a gradual increase in the organism’s structural com-
plexity and functional capacity. Likewise, according to the specific view of perceptual development,
infants gradually learn to differentiate new stimulus properties, patterns, and distinctive features
as they develop (E.J. Gibson 1969; Werner 1973). In essence, both of these conventional views
hold that development is a progressive process that leads to a general broadening of functional
capacity.
With specific regard to perceptual development, the developmental broadening view is sup-
ported by a wealth of empirical evidence. For example, newborn infants have very poor visual
acuity and spatial resolution skills, cannot discriminate between different affective expressions
that the human face can convey, do not understand that objects are bounded and that they have
an independent existence, cannot segment speech into its meaningful components, cannot
understand the meanings inherent in the temporal structure of events, do not perceive depth, do
not have a fear of heights, and cannot self-locomote. Of course, all of these various skills emerge
and continually improve with development. A good case in point is the development of face per-
ception (Simion et al. 2007). At birth, infants exhibit weak and relatively unstable preferences for
faces (Morton and Johnson 1991; Nelson 2001), by 2 months they exhibit more stable prefer-
ences, and by 6–7 months they begin to respond to facial affect (Ludemann and Nelson 1988) and
begin to categorize faces on the basis of gender (Cohen and Strauss 1979). Similarly, although
newborn infants possess rudimentary auditory perceptual abilities that make it possible for them
to recognize their mother's voice (DeCasper and Fifer 1980) and to distinguish different languages
on the basis of their overall rhythmic attributes (Nazzi et al. 1998), they do not have a functional
lexicon. By 1 month of age infants begin to exhibit basic speech perception abilities that enable
them to perceive sounds in a language-relevant way (Eimas et al. 1971), and over the next months
they begin to acquire various phonetic discrimination abilities that permit them to segment
speech into linguistically meaningful components (Jusczyk 1997). Importantly, the ability to seg-
ment speech does not mean that infants are fully capable of detecting complex aspects of sequences
when they are young. For example, even though 2-month-old infants can learn the statistical rela-
tions linking the adjacent members of a sequence of meaningless visual stimuli (Kirkham et al.
2002), 4-month-old infants do not perceive invariant ordinal relations in sequences composed of
abstract moving objects and their impact sounds even though they can perceive the statistical
relations inherent in such multisensory (Lewkowicz and Berent 2009) as well as in visual-only
sequences (Marcovitch and Lewkowicz 2009). Indeed, it is not until 7–8 months of age that
infants begin to extract the statistical properties of auditory speech (Saffran et al. 1996) and can
learn simple syntactic rules linking auditory speech syllables (Marcus et al. 1999).

7.6 Developmental narrowing


Even though the principle of developmental broadening is supported by a wealth of evidence,
there is now a good deal of evidence that developmental narrowing and a concurrent decline in
plasticity play an important role in development too. The earliest acknowledgement of the impor-
tance of narrowing in behavioural development can be found in the concept of behavioural
canalization first introduced into the psychological literature by Holt (1931) to account for the
emergence of organized motor activity patterns during foetal development. According to Holt,
the initially diffuse sensorimotor activity observed in early foetal life was canalized into organized
motor patterns through the process of behavioural conditioning. Kuo (1976) expanded Holt’s
limited concept of canalization by proposing that narrowing of behavioural potential was not
merely the result of the individual’s history of reinforcement but of the individual’s entire devel-
opmental history, context, and experience. Subsequently, Gottlieb (1991) provided especially
convincing evidence in support of Kuo’s concept of canalization in his studies of the development
of species identification in ducks. In a remarkable set of experiments, Gottlieb demonstrated that
mallard hatchlings' socially affiliative responses toward their conspecifics are determined not only
by exposure to the vocalizations of their siblings and their mother but also by exposure to their own
vocalizations. He found that, as embryos vocalize prior to hatching, they learn the critical features of their
species-specific call and learn not to respond to the social signals of other species. In this way, they
narrow their perceptual repertoire.

7.7 Perceptual narrowing


A growing body of evidence from studies of human infants has yielded evidence consistent with
the concept of canalization/narrowing. Overall, this body of evidence shows that perceptual
responsiveness is relatively broad during the first months of life and that, as a result, young infants
respond to native as well as non-native perceptual inputs across different domains (e.g. face,
speech, and music perception). The evidence also shows that as infants grow and acquire experi-
ence with their native ecology, their responsiveness to non-native inputs declines while their
responsiveness to native inputs improves.

7.7.1 Narrowing in speech perception


The best-known and earliest evidence of perceptual narrowing effects in humans comes from
studies of speech perception. It shows that young infants can perceive native and non-native
phonetic contrasts but that older infants can only perceive native contrasts. Streeter (1976)
provided the earliest evidence of broad perceptual tuning by showing that 2-month-old Kikuyu
infants learning the Kikuyu language could discriminate a phonologically relevant voice-onset-
time contrast (e.g. the contrast between a /ba/ and a /pa/) despite the fact that this contrast is
not relevant in their native language. Similarly, Aslin et al. (1981) showed that English-learning
6–12-month-old infants can discriminate phonologically relevant and phonologically irrelevant
voice-onset-time contrasts. Werker and Tees (1984) provided the first direct evidence of per-
ceptual narrowing in the speech domain by showing that 6–8-month-old but not 10–12-month-
old English-learning infants can discriminate non-native consonants such as the Hindi retroflex
/Da/ and the dental /da/ and the Thompson glottalized velar /k’i/ versus the uvular /q’i/.
PERCEPTUAL NARROWING 167

Subsequent cross-linguistic studies have provided similar behavioural (Best et al. 1995) and
neural evidence of narrowing in infant response to native and non-native consonants (Rivera-
Gaxiola et al. 2005) as well as vowels (Cheour et al. 1998; Kuhl et al. 1992).

7.7.2 Narrowing of face perception


Following initial work showing that human and monkey adults are better at recognizing faces from
their own species than faces from other species (Dufour et al. 2006; Pascalis and Bachevalier 1998),
Pascalis and colleagues (Pascalis et al. 2002, 2005) demonstrated that this specialization emerges
during infancy. They found that 6-month-old infants can discriminate human faces and monkey
faces but that 9-month-old infants can only discriminate human faces. Similar evidence of narrow-
ing comes from investigations of the ‘other race effect’ (ORE). The ORE reflects adults’ greater
difficulty in distinguishing the faces of people from other races than the faces of people from their
own race (Chiroro and Valentine 1995). The assumption is that this is due to greater exposure to
individuals from one’s own race than to individuals from a different race. Sangrigoli and de Schonen
(2004) demonstrated that the developmental roots of this specialization can be found in infancy by
showing that as early as 3 months of age, Caucasian infants can discriminate different Caucasian
faces but not different Asian faces. Importantly, when infants were given more extensive experience
with the other-race faces by first being familiarized with three different faces within each race, they
did discriminate different Caucasian faces and different Asian faces. Kelly et al. (2005) replicated
the ORE in 3-month-old infants and, in addition, showed that the ORE is absent in newborn
infants. This suggests that selective exposure to the faces from one’s own ethnic group during the
first months of life leads to specialization for such faces. In subsequent studies, Kelly et al. (2007)
have shown that the ORE develops slowly between 3 and 9 months of age and Kelly et al. (2009)
demonstrated that the ORE is also present in non-Caucasian (i.e. Chinese) infants. Finally, evidence
indicates that perceptual specialization for certain types of faces early in life also includes specializa-
tion for speech-related utterances that can be seen whenever a face is talking. This evidence
comes from a study showing that 4- and 6-month-old infants can distinguish between two faces
silently uttering native and non-native speech sounds but that 8-month-old infants no longer make
such discriminations unless they are raised in a bilingual environment (Weikum et al. 2007).

7.7.3 Narrowing in music perception


As in the speech- and face-processing domains, investigators have reported evidence of narrow-
ing in the music-perception domain. For example, Hannon and Trehub (2005a) tested adults’
and infants’ discrimination of folk melodies that had either simple or complex metrical struc-
tures. Simple metrical structure, defined by simple duration ratios of inter-onset intervals of
sounds (e.g. 2:1), predominates in North American music whereas complex metrical structure,
defined by complex duration ratios (e.g. 3:2), predominates in many other musical cultures (e.g.
in the Balkans). Hannon and Trehub found that North American adults successfully detected
differences in melodies based on alterations of simple meters but did not when the differences
were based on alterations of complex meters characteristic of Balkan music. In contrast, adults of
Bulgarian or Macedonian origin detected differences in melodies when they differed in terms of
simple and complex metrical structure. Similar to the adults of Bulgarian or Macedonian origin,
6-month-old infants detected melody differences regardless of whether the differences were based
on simple or complex metrical structure. Subsequently, Hannon and Trehub (2005b) showed
that the simple metrical bias exhibited by North American adults is based on selective experience,
in that 12-month-old infants who grew up in a Western musical environment responded to musical
rhythms in the same way that Western adults did.
168 UNEXPECTED EFFECTS OF EXPERIENCE ON THE DEVELOPMENT OF PRIMATES

7.7.4 The role of experience in perceptual narrowing


Specific experience during an organism’s sensitive period—the period of time during early devel-
opment when the organism is most open to experience (Gottlieb 1992; Michel and Tyler 2005;
Turkewitz and Kenny 1982)—can have one of four effects (Aslin and Pisoni 1980; Gottlieb 1992,
1976, 1981). When specific experience is available it can (1) induce the development of a new
perceptual ability, (2) facilitate the development of an incompletely developed ability, (3) main-
tain the ability at its optimum level into later stages of development, and when a particular expe-
rience is missing this can (4) lead to the decline or loss of the ability.
Given these four possible effects, how might experience contribute to perceptual narrowing
effects? It is relatively well known that the ubiquity of native perceptual inputs induces, maintains,
and facilitates responsiveness to such inputs. What is less clear is whether the absence of non-
native inputs leads to a permanent loss of the ability to respond to the latter types of inputs or just
to a decline in responsiveness. Current opinion is that it is the latter and that the uneven experi-
ence that infants have with native as opposed to non-native inputs leads to perceptual re-organi-
zation rather than a permanent loss in responsiveness to non-native inputs (Lewkowicz and
Ghazanfar 2009; Werker and Tees 2005).
If perceptual re-organization results from the absence of specific perceptual input then extra
experience with non-native input should help maintain responsiveness to it. Indeed, English-
learning infants who are exposed to natural Mandarin Chinese during 12 play sessions between
9 and 10 months of age can discriminate a Mandarin Chinese phonetic contrast that does not
occur in English better than control infants who are equivalently exposed to books and toys but
only hear English (Kuhl et al. 2003). In addition, the usually observed decline in responsiveness
to non-native phonetic contrasts was accompanied by an improvement in response to native
phonetic contrasts during the first year of life (Kuhl et al. 2006). Similar effects of experience
have been found in infant response to non-native faces. For example, Pascalis and colleagues
(Pascalis et al. 2005) demonstrated that the decline in the infant ability to discriminate non-
native faces can be prevented with continued experience with such faces. Infants who were
exposed to monkey faces at home during the 3-month period between 6 and 9 months of age
continued to discriminate monkey faces at 9 months of age. Furthermore, Scott and Monesson
(2009) found that infants who are exposed to individual monkey faces together with unique
names for each face maintain their ability to discriminate monkey faces at 9 months of age
whereas infants who are only exposed to the monkey faces along with a category name (i.e. mon-
key) do not. Although no studies to date have directly manipulated exposure to the faces of other
races (by providing extra exposure at older ages), studies with older children suggest that such
experience can prevent the decline in responsiveness to those kinds of faces as well (Sangrigoli
et al. 2005). What is currently not known is whether, like in infant perception of native speech,
infants’ ability to respond to native faces improves while their ability to respond to non-native
faces declines.
Finally, consistent with the findings of the positive effects of extra experience with non-
native inputs on responsiveness to non-native speech and faces, Hannon and Trehub (2005b)
found that 12-month-old infants could discriminate complex Balkan meters following 2 weeks’
exposure to them. Particularly interesting was the finding that the plasticity observed in the
12-month-old infants is not retained into adulthood because the same 2-week exposure to
Balkan music in North American adults was not effective in restoring successful discrimina-
tion. Thus, as is evident in adults’ ability to learn complex non-native complex rhythms through
formal musical instruction, some plasticity is retained into adulthood but not at the level found
in infancy.

7.7.5 Multisensory perceptual narrowing


It is reasonable to hypothesize that perceptual narrowing may be a general developmental process
and thus that it also exerts its effects on the development of multisensory perception. A first indi-
rect hint that this might be the case came from a study by Lewkowicz and Turkewitz (1980). The
findings from this study indicated that 3-week-old infants as well as adults can match auditory
and visual stimuli on the basis of their intensity. Curiously, however, the adult participants
reported that they found the matching task ‘bizarre’. At the time it was not clear how to interpret
the adults’ description of the task but the subsequent reports of unisensory perceptual narrowing
raised the possibility that the adults’ response reflected the fact that they normally perceive the
coherence of multisensory inputs on the basis of higher-level perceptual attributes and thus that
responsiveness to low-level attributes (i.e. intensity) declines during development.
To test the theoretical possibility that narrowing extends to infant response to multisensory
inputs, my colleagues and I recently embarked on a series of studies. In these studies, we asked
whether human infants and non-human primates exhibit multisensory perceptual narrowing
(MPN) as might be expected from the indirect results of Lewkowicz and Turkewitz (1980) and
from the body of unisensory narrowing effects reviewed earlier.

7.7.6 Narrowing of cross-species multisensory perception in infancy


We began our studies by hypothesizing that the broad perceptual tuning that is known to exist in
unisensory responsiveness should enable young infants to match non-native faces and vocaliza-
tions and that this type of unexpected multisensory responsiveness should decline in the later
months of infancy. We also began our studies knowing that infants can match human faces and
vocalizations starting shortly after birth (Kuhl and Meltzoff 1982, 1984; Patterson and Werker
1999, 2002, 2003) and, critically, that rather than narrow with age, this ability improves in that
older infants can match faces and vocalizations on the basis of higher-level amodal cues such as
gender and affect (Poulin-Dubois et al. 1994, 1998; Walker-Andrews 1986, 1991).
In our initial study (Lewkowicz and Ghazanfar 2006), we tested 4-, 6-, 8-, and 10-month-old
infants who saw pairs of movies in which the same macaque monkey’s face could be seen on side-
by-side monitors mouthing a ‘coo’ call on one monitor and a ‘grunt’ call on the other, first in silence
and then in the presence of one of the two calls. Crucially, the onset of the audible call was
synchronized with the onset of both visible calls but its offset was synchronized only with the
matching visible call. This means that the corresponding facial gestures and vocalizations were
related on the basis of their synchronous onsets and offsets as well as in terms of their durations.
To determine whether infants made A-V matches, we compared the proportion of time infants
looked at a given visible call in the presence of the matching audible call (in-sound condition)
versus the proportion of time they looked at the same call in the absence of the audible call (silent
condition). We found that the 4- and 6-month-old infants looked significantly longer at the
matching visible call in the in-sound condition than in the silent condition but that the 8- and
10-month-old infants did not (Fig. 7.1). These findings were consistent with our predictions and
provided the first evidence of MPN.
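For concreteness, the matching measure used in these studies can be sketched in a few lines of code. This is an illustrative sketch only: the looking times and all function and variable names are hypothetical, not the authors’ analysis code.

```python
def matching_proportion(look_match, look_nonmatch):
    """Proportion of total looking time directed at the matching visible call."""
    total = look_match + look_nonmatch
    return look_match / total if total > 0 else 0.0

# Hypothetical looking times (in seconds) for one infant
silent = matching_proportion(look_match=4.2, look_nonmatch=4.0)    # silent baseline trials
in_sound = matching_proportion(look_match=6.1, look_nonmatch=3.4)  # with the audible call

# Evidence of intersensory matching: a larger matching proportion
# in the in-sound condition than in the silent condition
shows_matching = in_sound > silent
```

In the actual study this comparison was, of course, made across infants with inferential statistics; the sketch only illustrates the dependent measure.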

7.7.7 Mechanisms of cross-species multisensory perception


Because the matching visible and audible calls in the Lewkowicz and Ghazanfar (2006) study were
related in terms of their synchronous onsets and offsets and in terms of their durations, in follow-
up studies we investigated which cues were critical for successful matching (Lewkowicz et al. 2008).
To do so, we repeated the Lewkowicz and Ghazanfar (2006) study, but this time we presented the

Fig. 7.1 Cross-species intersensory matching in 4–10-month-old infants. The figure shows the mean
proportion of looking time directed at the matching visible call out of the total amount of looking
time directed at both calls in the silent and the in-sound conditions. Error bars indicate standard
errors of the mean.

audible call 666 ms prior to the onset of the two visible calls. In this way, the matching visible and
audible calls were no longer synchronized but still corresponded in terms of their duration.
In contrast, the non-corresponding visible and audible calls did not correspond at all. This time,
we found that neither the younger nor the older infants performed multisensory matching. This
indicated that A-V temporal synchrony mediates multisensory matching in the younger infants,
that it is no longer sufficient for matching at the older ages, and that duration did not play a role in
matching. If duration alone had mediated matching then infants would have made matches even
if the audible and visible calls were out of synchrony because the matching audible and visible calls
were still related in terms of their durations. Interestingly, the infants’ failure to match on the
basis of duration is consistent with prior findings showing that infants do not make intersensory
matches on the basis of duration when the stimuli are presented asynchronously but do when they are presented synchronously
(Lewkowicz 1986).
In addition to questions about the role of synchrony cues, the Lewkowicz and Ghazanfar (2006)
study raised two other questions. First, might the older infants’ failure to make matches be due to
unisensory processing deficits? This is likely in the visual modality because, as indicated earlier,
responsiveness to non-native faces declines during infancy. It is also likely in the auditory modality
although no studies to date have reported whether responsiveness to non-native vocalizations per
se declines. Second, does the decline in cross-species multisensory perception reflect the waning of
the early dominance of multisensory perceptual mechanisms that rely primarily on the detection of
multisensory temporal synchrony relations, together with the concurrent, cumulative effects of
perceptual experience that gradually enable infants to discover higher-level multisensory
attributes? To answer these two questions, Lewkowicz et al. (2008) examined the data from the
silent paired-preference trials and found that both the 4–6-month-old and the 8–10-month-old
infants exhibited differential looking at the two silent visible monkey calls, indicating that the
decline in multisensory responsiveness could not have been due to a decline in visual responsiveness.
In addition, Lewkowicz et al. (2008) conducted a discrimination study in which 8–10-month-old
infants were habituated to one of the audible monkey calls and then tested with the other one.
Results indicated that infants discriminated between the two audible calls. Thus, the failure of the
older infants to match the monkey faces and vocalizations was not due to a unisensory processing
deficit in either sensory modality. Finally, Lewkowicz et al. (2008) examined whether 12- and

18-month-old infants might be able to match the monkey faces and vocalizations that were
presented in the original Lewkowicz and Ghazanfar (2006) study and found that they did not
match. This finding suggests that the ability to perform cross-species multisensory matching does
not return once it has declined and is consistent with the findings reviewed earlier on infant
response to non-native speech, faces, and music.

7.7.8 Cross-species multisensory perception at birth


As indicated earlier, perceptual tuning is likely to be broad starting at birth. This is because sen-
sory/perceptual functions are so crude at birth that newborns are not able to detect the higher-
level attributes of multisensory inputs and, in addition, are perceptually inexperienced. As a
result, it is probable that sensitivity to cross-species multisensory relations is present at birth. This
prediction is made all the more likely by three facts:
1. Multisensory temporal synchrony mediates cross-species perception in young infants (Lewko-
wicz et al. 2008).
2. Detection of A-V synchrony only requires perception of energy onsets and offsets in the two
modalities (Lewkowicz 2010).
3. The neural mechanisms of multisensory temporal synchrony detection are probably operational
at birth because, even though such mechanisms are widely distributed in the brain, some of them
are subcortical (Bushara et al. 2001) and thus likely to be functional very early in development.
To determine whether newborns can make cross-species multisensory matches, Lewkowicz
et al. (2010) repeated the Lewkowicz and Ghazanfar (2006) study, but this time by testing new-
born infants. In the first experiment findings showed that newborns did indeed match monkey
audible and visible calls (Fig. 7.2) indicating that infants are broadly tuned to multisensory signals
from birth onwards. The findings of successful matching at birth then raised the obvious question
about the underlying mechanism. Given that matching in older infants was found to be driven by
temporal synchrony cues (Lewkowicz et al. 2008), we hypothesized that newborns also rely on
synchrony to make the multisensory matches and that they probably do so by relying on nothing
more than energy onsets and offsets. To test this hypothesis, in the second experiment we substi-
tuted a complex tone for the natural audible call but, critically, preserved the temporal features
of the audible call by ensuring that the tone had the same duration as the natural call and that
its onsets and offsets were synchronous with the matching visible call. Once again, findings
indicated that newborns performed multisensory matching in that they looked longer at the vis-
ible monkey call in the presence of the synchronous tone than in its absence (Fig. 7.2). What was
most remarkable is that newborns matched despite the fact that the audible stimulus lacked any
identity information and that the correlation between the dynamic variations in the facial ges-
tures and the amplitude and formant structure inherent in the natural audible call was now
absent. This finding clearly shows that the broad multisensory perceptual tuning found early in
life is based on responsiveness to low-level perceptual cues.
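The construction of such a control stimulus can be sketched as follows. The fundamental frequencies, duration, and synthesis details below are illustrative assumptions; the study reports only that each tone’s fundamental was the average of the two calls’ fundamentals and that its duration and on/offsets matched the visible call.

```python
import math

def complex_tone(f0_hz, duration_s, sample_rate=44100, n_harmonics=3):
    """Synthesize a harmonic complex (f0 plus its first few harmonics) of a
    given duration, so that only its duration and its onset/offset timing
    carry information about the matching visible call."""
    n_samples = int(round(duration_s * sample_rate))
    return [
        sum(math.sin(2 * math.pi * f0_hz * (h + 1) * i / sample_rate)
            for h in range(n_harmonics)) / n_harmonics
        for i in range(n_samples)
    ]

# Hypothetical fundamental frequencies (Hz) of the two natural calls
coo_f0, grunt_f0 = 260.0, 180.0
tone = complex_tone(f0_hz=(coo_f0 + grunt_f0) / 2, duration_s=0.45)
```

Because the tone carries no spectral identity information about either call, any matching it supports must rest on the preserved temporal cues.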

7.7.9 Narrowing of multisensory perception of audiovisual speech


The multisensory narrowing findings reviewed so far suggest that perceptual narrowing is a gen-
eral pan-sensory process. If so, then cross-language multisensory responsiveness might also nar-
row during infancy (see also Chapter 9). To test this prediction, Pons et al. (2009) examined
Spanish-learning and English-learning infants’ ability to match visible and audible versions of a
/ba/ and a /va/. These two particular phonemes were chosen because the /v/ sound does not exist
in the Spanish language (orthographic ‘v’ is pronounced /b/). Thus, the phonetic distinction
Fig. 7.2 Cross-species intersensory matching in newborn infants. The figure shows the mean
proportion of looking-time directed at the matching silent visible call out of the total amount of
looking-time directed at both calls in the silent and the in-sound conditions. (a) Looking at monkey
visible vocalizations in the presence of naturalistic monkey vocalizations; (b) looking at monkey
visible vocalizations in the presence of complex tones that were synchronized with their respective
visible vocalizations and, thus, also matched them in terms of their durations. The fundamental
frequency of each tone was the average fundamental frequency of the two naturalistic
vocalizations. Error bars indicate standard errors of the mean.

between a /ba/ and /va/ is not discriminable to a Spanish speaker. This makes the Spanish lan-
guage the ideal test bed for investigating cross-language MPN. That is, if phonetic tuning is broad
at first and then narrows, then young Spanish-learning infants should be able to match a visible /
ba/ with an audible /ba/ and a visible /va/ with an audible /va/ whereas older Spanish-learning
infants should not. In contrast, English-learning infants, for whom /ba/ and /va/ are both phone-
mically relevant, should be able to make the matches at both ages. Pons et al. tested 6-month-old
and 11-month-old Spanish-learning infants and compared their performance to 6- and 11-month-
old English-learning infants.
The experiment consisted of six trials. The first two were silent paired-preference trials during
which infants saw side-by-side faces of the same person mouthing a /ba/ on one side and a /va/ on the
other side in silence. The next four trials alternated between familiarization trials during which
infants heard one of the two audible syllables in the absence of visual stimulation and the same silent
paired-preference trials as administered before. As predicted, and as depicted in Fig. 7.3, the 6-month-
old but not the 11-month-old Spanish-learning infants looked significantly longer at the matching
visible syllable following auditory familiarization than prior to familiarization. In contrast, and as
expected, the English-learning infants exhibited intersensory matching at both ages.
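The dependent measure in this procedure, a preferential-looking difference score (test-trial percentage minus silent-baseline percentage), can be sketched as follows; the data values are hypothetical, for illustration only.

```python
def percent_to_match(match_s, nonmatch_s):
    """Percentage of total looking time directed at the matching face."""
    return 100.0 * match_s / (match_s + nonmatch_s)

def difference_score(test_match, test_nonmatch, silent_match, silent_nonmatch):
    """Test-trial percentage minus silent-baseline percentage; positive scores
    indicate looking longer at the matching face after hearing the syllable."""
    return (percent_to_match(test_match, test_nonmatch)
            - percent_to_match(silent_match, silent_nonmatch))

# Hypothetical looking times (seconds) for one infant
score = difference_score(test_match=7.0, test_nonmatch=3.0,
                         silent_match=5.0, silent_nonmatch=5.0)
# score -> 20.0 (70% during test minus 50% at baseline)
```

Subtracting the silent baseline controls for any spontaneous preference an infant might have for one of the two faces irrespective of the auditory familiarization.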
To determine whether the narrowing persists into adulthood, we also tested Spanish- and English-
speaking adults in a forced-choice multisensory matching task. We presented them with a total of
16 trials during which they first heard one of the two possible syllables (/va/ or /ba/) repeated twice
and, one second later, saw the two silent visible syllables presented side-by-side twice. As expected,
the Spanish-speaking subjects’ choices of matching face were at chance (51.5%) whereas the
English-speaking adults’ choices were correct on 92% of the trials. Together, the infant and adult findings
Fig. 7.3 Looking-time difference scores to the matching face (percentage of total time infants
looked at the matching face during the test trials minus percentage of total time they looked at it
during the silent trials). Open circles represent each infant’s difference score and black circles with
error bars represent the mean difference score and the standard error of the mean for each group
(Reproduced in colour in the colour plate section).

from this study provided the first evidence of MPN in infant response to non-native speech and
suggest that narrowing is a domain-general as well as a pan-sensory developmental process.
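As a rough intuition for what ‘at chance’ versus 92% correct means over 16 two-alternative trials, an exact binomial calculation can be sketched. This is purely illustrative and is not an analysis reported in the study.

```python
from math import comb

def p_at_least(k, n, p=0.5):
    """Exact binomial probability of observing k or more correct responses
    out of n two-alternative trials when guessing with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# With 16 trials, 12 or more correct is improbable under pure guessing
p_high = p_at_least(12, 16)   # about 0.038
# whereas scores near 50% (e.g. 8 of 16) are exactly what chance predicts
p_chance = p_at_least(8, 16)  # about 0.60
```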
Findings from one of our more recent studies (Lewkowicz and Hansen-Tift 2012) provide addi-
tional support for MPN in the speech domain. They indicate that selective attention to the mouth
as opposed to the eyes of a talker is differentially affected by native versus non-native speech at 12
months of age—a point in development when narrowing is complete. Infants exposed to non-
native speech looked longer at the mouth than at the eyes of the talker whereas infants exposed to
native speech did not.

7.7.10 Cross-species multisensory matching in non-human primates


The final question that we asked with respect to MPN was whether multisensory perceptual
narrowing is an evolutionarily old or new process. To answer it, we (Zangenehpour et al. 2009) tested
young vervet monkeys with the same monkey faces and vocalizations that were presented by
Lewkowicz and Ghazanfar (2006) and using a similar procedure. Prior findings have shown that
macaque monkeys (Macaca mulatta, Macaca fuscata), capuchins (Cebus apella) and chimpanzees
(Pan troglodytes) all are able to perceive the relationship between vocalizing faces and corresponding
sounds (Adachi et al. 2006; Ghazanfar and Logothetis 2003; Ghazanfar et al. 2007; Izumi and Kojima
2004; Jordan et al. 2005; Parr 2004). Thus, we tested two age groups (23–38 weeks of age and 39–65
weeks) of vervet monkeys and asked whether they could match the faces and vocalizations of
macaque monkeys or whether they, like human infants, exhibit narrowing effects. The specific ages
were chosen to ensure that the vervets were well past the point at which their ability to make cross-
species matches should have declined. Two alternative predictions were tested. If MPN is an evolu-
tionarily conserved characteristic of early development, and if it occurs regardless of the rate of
Fig. 7.4 Cross-species intersensory matching in vervet monkeys. The figure shows the mean
duration of looking-time directed at the matching and non-matching visible rhesus calls.
(a) Looking durations at the matching and non-matching calls in the presence of the naturalistic
rhesus vocalizations; (b) looking durations at the matching and non-matching calls in the presence
of complex tones that were synchronized with their respective visible vocalizations and thus also
matched them in terms of their durations. The fundamental frequency of each tone was the
average fundamental frequency of the two naturalistic vocalizations. Error bars indicate standard
errors of the mean.

neural development, then vervets should not match the rhesus faces and vocalizations because they
are past the age of narrowing. If, however, MPN reflects the lengthening of the period of neural and
behavioural development—as is the case in humans relative to non-human primates—then
vervets may not exhibit narrowing because neural development in monkeys occurs four times faster
than in humans. In other words, the much faster course of neural development in monkeys may
make their nervous system less open to the effects of early perceptual experience.
The results indicated that the vervets matched the rhesus audible and visible calls and, thus, were
consistent with the second prediction. The matching data were, however, somewhat surprising in
that the vervets exhibited greater looking at the non-matching visual call in the presence of the audi-
ble call than at the matching one (Fig. 7.4a). Additional experiments revealed that this effect was due
to the fear-inducing effects of the audible call. This was supported by the finding that when the
spectral information that is normally available in the natural call was replaced by a complex tone
stimulus whose duration matched that of the natural call and whose onset and offset corresponded to
the matching visible call, the vervets switched their preference to the matching visible call (Fig. 7.4b).
The fear hypothesis was further confirmed by measurements of pupil size. When the vervets listened
to the natural call they exhibited significantly larger pupil size when looking at the matching visible
call in the presence of the corresponding natural call than when looking at the non-matching visible
call. In contrast, when the vervets listened to the tone, they showed no difference in pupil size.
Because pupil size reflects arousal, the pupil data suggested that the greater looking at the non-
matching visible call in the presence of the natural call reflected intersensory matching. Together,
the findings from the vervet monkeys suggest that MPN is a recent evolutionary innovation and that
it reflects the effects of neoteny—the lengthening of the early developmental period in humans as
opposed to vervets. This lengthening presumably makes humans more open to the effects of early

experience. If so, this suggests that other kinds of perceptual and cognitive skills that differentiate
humans from non-human primates may have developmental roots (Gould 1977).

7.8 Conclusions
There is no longer any doubt that infants come into the world with some rudimentary multisensory
perceptual skills and that as they grow these skills improve and broaden in scope. The processes
that contribute to this broadening are neural growth and differentiation and the perceptual
experience that infants acquire as they grow. Paradoxically, however, experience also contributes
to the narrowing of multisensory responsiveness through the highly selective exposure that
infants have to native as opposed to non-native multisensory inputs. Furthermore, as our vervet
monkey study shows, it appears that MPN may be a recent evolutionary innovation and that it
is the direct result of the neotenous nature of humans as opposed to other primate species. If this
is the case then it is possible that the great apes, whose developmental period is more similar
in length to humans, might also undergo perceptual narrowing in both unisensory and multisen-
sory responsiveness.
One of the questions that the narrowing work raises concerns the role that the neural substrate
plays in perceptual narrowing. Some investigators have proposed that a ‘selectionist’ or regressive
mechanism of neural development partly accounts for the regressive nature of perceptual nar-
rowing (Nelson 2001; Scott et al. 2007). This view is based on theories that propose that neuronal
networks are initially diffuse and that subsequently more modularized networks emerge as expe-
rience ‘prunes’ exuberant connections. The problem with this view is that, except in some relatively
isolated cases, the nervous system increases greatly in size and the number of synapses increases
enormously during development. This suggests that the selectionist view is
incorrect (Purves et al. 1996). Therefore, a more reasonable view is that narrowing is mediated by
a selective elaboration of synapses whose relevance is determined by postnatal experience and not
the selective pruning of irrelevant synapses (Lewkowicz and Ghazanfar 2009). Regardless of what
specific neural mechanism(s) underlie the MPN phenomenon, there is no doubt that MPN is
clear evidence of the intricate interplay between neural growth and differentiation, on the one
hand, and perceptual experience, on the other. Needless to say, it is the interaction between these
two processes that contributes to the ultimate emergence of multisensory perceptual expertise.

Acknowledgements
The writing of this chapter was supported by a grant from the National Science Foundation
(BCS-0751888).

References
Adachi, I., Kuwahata, H., Fujita, K., Tomonaga, M., and Matsuzawa, T. (2006). Japanese macaques form a
cross-modal representation of their own species in their first year of life. Primates, 47, 350–54.
Arnold, P., and Hill, F. (2001). Bisensory augmentation: A speech-reading advantage when speech is clearly
audible and intact. British Journal of Psychology, 92, 339–55.
Aslin, R.N., and Pisoni, D.B. (1980). Some developmental processes in speech perception. In Child
phonology: perception (eds. G. Yeni-Komshian, J.F. Kavanagh, and C.A. Ferguson), pp. 67–96. Academic
Press, New York.
Aslin, R.N., Pisoni, D.B., Hennessy, B.L., and Perey, A.J. (1981). Discrimination of voice onset time by
human infants: New findings and implications for the effects of early experience. Child Development,
52, 1135–45.
176 UNEXPECTED EFFECTS OF EXPERIENCE ON THE DEVELOPMENT OF PRIMATES

Bahrick, L.E. (1983). Infants’ perception of substance and temporal synchrony in multimodal events. Infant
Behavior and Development, 6, 429–51.
Bahrick, L.E., and Lickliter, R. (2000). Intersensory redundancy guides attentional selectivity and perceptual
learning in infancy. Developmental Psychology, 36, 190–201.
Bahrick, L.E., Flom, R., and Lickliter, R. (2002). Intersensory redundancy facilitates discrimination of
tempo in 3-month-old infants. Developmental Psychobiology, 41, 352–63.
Bahrick, L.E., Lickliter, R., and Flom, R. (2004). Intersensory redundancy guides the development of
selective attention, perception, and cognition in infancy. Current Directions in Psychological Science,
13, 99–102.
Bertelson, P., and Radeau, M. (1981). Cross-modal bias and perceptual fusion with auditory-visual spatial
discordance. Perception and Psychophysics, 29, 578–84.
Best, C.T., McRoberts, G.W., LaFleur, R., and Silver-Isenstadt, J. (1995). Divergent developmental patterns
for infants’ perception of two nonnative consonant contrasts. Infant Behavior and Development, 18,
339–50.
Birch, H.G., and Lefford, A. (1963). Intersensory development in children. Monographs of the Society for
Research in Child Development, 25, 1–47.
Birch, H.G., and Lefford, A. (1967). Visual differentiation, intersensory integration, and voluntary motor
control. Monographs of the Society for Research in Child Development, 32, 1–87.
Bradley, R., and Mistretta, C. (1975). Fetal sensory receptors. Physiological Reviews, 55, 352–82.
Bushara, K.O., Grafman, J., and Hallett, M. (2001). Neural correlates of auditory-visual stimulus onset
asynchrony detection. Journal of Neuroscience, 21, 300–304.
Cheour, M., Ceponiene, R., Lehtokoski, A., et al. (1998). Development of language-specific phoneme
representations in the infant brain. Nature Neuroscience, 1, 351–53.
Chiroro, P., and Valentine, T. (1995). An investigation of the contact hypothesis of the own-race bias in
face recognition. The Quarterly Journal of Experimental Psychology A: Human Experimental Psychology,
48A, 879–94.
Cohen, L.B., and Strauss, M.S. (1979). Concept acquisition in the human infant. Child Development, 50,
419–24.
Colombo, J. (2001). The development of visual attention in infancy. Annual Review of Psychology, 52,
337–67.
DeCasper, A.J., and Fifer, W.P. (1980). Of human bonding: newborns prefer their mothers’ voices. Science,
208, 1174–76.
Delaney, S.M., Dobson, V., Harvey, E.M., Mohan, K.M., Weidenbacher, H.J., and Leber, N.R. (2000).
Stimulus motion increases measured visual field extent in children 3.5 to 30 months of age. Optometry
and Vision Science, 77, 82–89.
Dufour, V., Pascalis, O., and Petit, O. (2006). Face processing limitation to own species in primates: A
comparative study in brown capuchins, Tonkean macaques and humans. Behavioural Processes, 73,
107–113.
Eimas, P.D., Siqueland, E.R., Jusczyk, P., and Vigorito, J. (1971). Speech perception in infants. Science, 171,
303–306.
Foxe, J.J., and Schroeder, C.E. (2005). The case for feedforward multisensory convergence during early
cortical processing. Neuroreport, 16, 419.
Ghazanfar, A.A., and Logothetis, N.K. (2003). Facial expressions linked to monkey calls. Nature, 423, 937–38.
Ghazanfar, A.A., and Schroeder, C.E. (2006). Is neocortex essentially multisensory? Trends in Cognitive
Sciences, 10, 278–85.
Ghazanfar, A., Turesson, H., Maier, J., Van Dinther, R., Patterson, R., and Logothetis, N. (2007). Vocal-tract
resonances as indexical cues in rhesus monkeys. Current Biology, 17, 425–30.
Gibson, E.J. (1969). Principles of perceptual learning and development. Appleton-Century-Crofts, New York.

Gibson, E.J. (1984). Perceptual development from the ecological approach. In Advances in developmental
psychology, Vol. 3 (eds. M.E. Lamb, A.L. Brown, and B. Rogoff), pp. 243–86. Lawrence Erlbaum
Associates, Hillsdale, NJ.
Gibson, J.J. (1966). The senses considered as perceptual systems. Houghton-Mifflin, Boston, MA.
Gibson, J.J. (1979). An ecological approach to perception. Houghton-Mifflin, Boston, MA.
Gottlieb, G. (1971). Ontogenesis of sensory function in birds and mammals. In The biopsychology of
development (eds. E. Tobach, L.R. Aronson, and E. Shaw), pp. 67–128. Academic Press, New York.
Gottlieb, G. (1976). The roles of experience in the development of behavior and the nervous system.
In Development of neural and behavioral specificity (ed. G. Gottlieb), pp. 25–54. Academic Press,
New York.
Gottlieb, G. (1981). Roles of early experience in species-specific perceptual development. In Development of
perception: Psychobiological perspectives (eds. R.N. Aslin, J.R. Alberts, and M.R. Petersen), pp. 5–44.
Academic Press, New York.
Gottlieb, G. (1991). Experiential canalization of behavioral development: results. Developmental Psychology,
27, 35–39.
Gottlieb, G. (1992). Individual development and evolution: The genesis of novel behavior. Oxford University
Press, New York.
Gottlieb, G. (1996). Developmental psychobiological theory. In Developmental science. Cambridge studies in
social and emotional development (eds. R.B. Cairns, and G.H. Elder, Jr.), pp. 63–78. Cambridge
University Press, New York.
Gould, S. (1977). Ontogeny and phylogeny. Belknap Press, Cambridge, MA.
Hannon, E.E., and Trehub, S.E. (2005a). Metrical categories in infancy and adulthood. Psychological
Science, 16, 48–55.
Hannon, E.E., and Trehub, S.E. (2005b). Tuning in to musical rhythms: infants learn more readily than
adults. Proceedings of the National Academy of Sciences USA, 102, 12639–43.
Holt, E.B. (1931). Animal drive and the learning process. Holt, New York.
Honeycutt, H., and Lickliter, R. (2003). The influence of prenatal tactile and vestibular stimulation on
auditory and visual responsiveness in bobwhite quail: A matter of timing. Developmental Psychobiology,
43, 71–81.
Innes-Brown, H., Barutchu, A., Shivdasani, M.N., Crewther, D.P., Grayden, D.B., and Paolini, A.G. (2011).
Susceptibility to the flash-beep illusion is increased in children compared to adults. Developmental
Science, 14(5), 1089–1099. doi:10.1111/j.1467-7687.2011.01059.
Izumi, A., and Kojima, S. (2004). Matching vocalizations to vocalizing faces in a chimpanzee (Pan
troglodytes). Animal Cognition, 7, 179–84.
Jaime, M., and Lickliter, R. (2006). Prenatal exposure to temporal and spatial stimulus properties affects
postnatal responsiveness to spatial contiguity in bobwhite quail chicks. Developmental Psychobiology,
48, 233.
Jordan, K., Brannon, E., Logothetis, N., and Ghazanfar, A. (2005). Monkeys match the number of voices
they hear to the number of faces they see. Current Biology, 15, 1034–38.
Jusczyk, P.W. (1997). The discovery of spoken language. MIT Press, Cambridge, MA.
Kelly, D.J., Quinn, P.C., Slater, A.M., et al. (2005). Three-month-olds, but not newborns, prefer own-race
faces. Developmental Science, 8, F31–36.
Kelly, D.J., Quinn, P.C., Slater, A., Lee, K., Ge, L., and Pascalis, O. (2007). The other-race effect develops
during infancy: Evidence of perceptual narrowing. Psychological Science, 18, 1084–89.
Kelly, D.J., Liu, S., Lee, K., et al. (2009). Development of the other-race effect during infancy: evidence
toward universality? Journal of Experimental Child Psychology, 104, 105–114.
King, A.J., Hutchings, M.E., Moore, D.R., and Blakemore, C. (1988). Developmental plasticity in the visual
and auditory representations in the mammalian superior colliculus. Nature, 332, 73–76.

Kirkham, N.Z., Slemmer, J.A., and Johnson, S.P. (2002). Visual statistical learning in infancy: Evidence for
a domain general learning mechanism. Cognition, 83, B35–42.
Knudsen, E.I., and Brainard, M.S. (1991). Visual instruction of the neural map of auditory space in the
developing optic tectum. Science, 253, 85–87.
Kuhl, P.K., and Meltzoff, A.N. (1982). The bimodal perception of speech in infancy. Science, 218, 1138–41.
Kuhl, P.K., and Meltzoff, A.N. (1984). The intermodal representation of speech in infants. Infant Behavior
and Development, 7, 361–81.
Kuhl, P.K., Williams, K.A., Lacerda, F., Stevens, K.N., and Lindblom, B. (1992). Linguistic experience alters
phonetic perception in infants by 6 months of age. Science, 255, 606–608.
Kuhl, P.K., Tsao, F.M., and Liu, H.M. (2003). Foreign-language experience in infancy: effects of short-term
exposure and social interaction on phonetic learning. Proceedings of the National Academy of Sciences
USA, 100, 9096–9101.
Kuhl, P.K., Stevens, E., Hayashi, A., et al. (2006). Infants show a facilitation effect for native language
phonetic perception between 6 and 12 months. Developmental Science, 9, F13–21.
Kuo, Z.Y. (1976). The dynamics of behavior development: an epigenetic view. Plenum, New York.
LeCanuet, J., and Schaal, B. (1996). Fetal sensory competencies. European Journal of Obstetrics and
Gynecology, 68, 1–23.
Lewkowicz, D.J. (1986). Developmental changes in infants’ bisensory response to synchronous durations.
Infant Behavior and Development, 9, 335–53.
Lewkowicz, D.J. (1988a). Sensory dominance in infants: I. Six-month-old infants’ response to auditory-
visual compounds. Developmental Psychology, 24, 155–71.
Lewkowicz, D.J. (1988b). Sensory dominance in infants: II. Ten-month-old infants’ response to auditory-
visual compounds. Developmental Psychology, 24, 172–82.
Lewkowicz, D.J. (1992a). Infants’ response to temporally based intersensory equivalence: the effect of
synchronous sounds on visual preferences for moving stimuli. Infant Behavior and Development, 15,
297–324.
Lewkowicz, D.J. (1992b). Infants’ responsiveness to the auditory and visual attributes of a sounding/
moving stimulus. Perception and Psychophysics, 52, 519–28.
Lewkowicz, D.J. (1994). Development of intersensory perception in human infants. In The development of
intersensory perception: comparative perspectives (eds. D.J. Lewkowicz, and R. Lickliter), pp. 165–203.
Lawrence Erlbaum Associates, Hillsdale, NJ.
Lewkowicz, D.J. (1996a). Infants’ response to the audible and visible properties of the human face. I: Role
of lexical-syntactic content, temporal synchrony, gender, and manner of speech. Developmental
Psychology, 32, 347–66.
Lewkowicz, D.J. (1996b). Perception of auditory-visual temporal synchrony in human infants. Journal of
Experimental Psychology: Human Perception and Performance, 22, 1094–1106.
Lewkowicz, D.J. (1998). Infants’ response to the audible and visible properties of the human face: II.
Discrimination of differences between singing and adult-directed speech. Developmental Psychobiology,
32, 261–74.
Lewkowicz, D.J. (2000a). The development of intersensory temporal perception: an epigenetic systems/
limitations view. Psychological Bulletin, 126, 281–308.
Lewkowicz, D.J. (2000b). Infants’ perception of the audible, visible and bimodal attributes of multimodal
syllables. Child Development, 71, 1241–57.
Lewkowicz, D.J. (2002). Heterogeneity and heterochrony in the development of intersensory perception.
Cognitive Brain Research, 14, 41–63.
Lewkowicz, D.J. (2003). Learning and discrimination of audiovisual events in human infants: the
hierarchical relation between intersensory temporal synchrony and rhythmic pattern cues.
Developmental Psychology, 39, 795–804.
Lewkowicz, D.J. (2004). Perception of serial order in infants. Developmental Science, 7, 175–84.

Lewkowicz, D.J. (2010). Infant perception of audio-visual speech synchrony. Developmental Psychology, 46,
66–77.
Lewkowicz, D.J., and Berent, I. (2009). Sequence learning in 4 month-old infants: do infants represent
ordinal information? Child Development, 80, 1811–23.
Lewkowicz, D.J., and Ghazanfar, A.A. (2006). The decline of cross-species intersensory perception in
human infants. Proceedings of the National Academy of Sciences USA, 103, 6771–74.
Lewkowicz, D.J., and Ghazanfar, A.A. (2009). The emergence of multisensory systems through perceptual
narrowing. Trends in Cognitive Sciences, 13, 470–78.
Lewkowicz, D.J., and Hansen-Tift, A.M. (2012). Infants deploy selective attention to the mouth of a talking
face when learning speech. Proceedings of the National Academy of Sciences, 109(5), 1431–1436.
doi:10.1073/pnas.1114783109
Lewkowicz, D.J., and Kraebel, K. (2004). The value of multimodal redundancy in the development of
intersensory perception. In Handbook of multisensory processing (eds. G. Calvert, C. Spence, and B.E.
Stein), pp. 655–78. MIT Press, Cambridge, MA.
Lewkowicz, D.J., and Turkewitz, G. (1980). Cross-modal equivalence in early infancy: auditory-visual
intensity matching. Developmental Psychology, 16, 597–607.
Lewkowicz, D.J., Sowinski, R., and Place, S. (2008). The decline of cross-species intersensory perception in
human infants: underlying mechanisms and its developmental persistence. Brain Research, 1242, 291–302.
Lewkowicz, D.J., Leo, I., and Simion, F. (2010). Intersensory perception at birth: newborns match non-
human primate faces and voices. Infancy, 15, 46–60.
Lickliter, R. (1993). Timing and the development of perinatal perceptual organization. In Developmental
time and timing (eds. G. Turkewitz, and D.A. Devenny), pp. 105–24. Lawrence Erlbaum Associates,
Hillsdale, NJ.
Lickliter, R. (2005). Prenatal sensory ecology and experience: implications for perceptual and behavioral
development in precocial birds. Advances in the Study of Behavior, 35, 235–74.
Lickliter, R., and Bahrick, L.E. (2000). The development of infant intersensory perception: advantages of a
comparative convergent-operations approach. Psychological Bulletin, 126, 260–80.
Lickliter, R., and Banker, H. (1994). Prenatal components of intersensory development in precocial birds.
In The development of intersensory perception: comparative perspectives (eds. D.J. Lewkowicz and R.
Lickliter), pp. 59–80. Lawrence Erlbaum Associates, Hillsdale, NJ.
Lickliter, R., Lewkowicz, D.J., and Columbus, R.F. (1996). Intersensory experience and early perceptual
development: the role of spatial contiguity in bobwhite quail chicks’ responsiveness to multimodal
maternal cues. Developmental Psychobiology, 29, 403–416.
Lickliter, R., Bahrick, L.E., and Honeycutt, H. (2002). Intersensory redundancy facilitates prenatal perceptual
learning in bobwhite quail (Colinus virginianus) embryos. Developmental Psychology, 38, 15–23.
Lickliter, R., Bahrick, L.E., and Honeycutt, H. (2004). Intersensory redundancy enhances memory in
bobwhite quail embryos. Infancy, 5, 253–69.
Ludemann, P.M., and Nelson, C.A. (1988). Categorical representation of facial expressions by 7-month-old
infants. Developmental Psychology, 24, 492–501.
Macleod, A., and Summerfield, Q. (1987). Quantifying the contribution of vision to speech perception in
noise. British Journal of Audiology, 21, 131–41.
Maier, N.R.F., and Schneirla, T.C. (1964). Principles of animal psychology. Dover Publications, New York.
Marcovitch, S., and Lewkowicz, D.J. (2009). Sequence learning in infancy: the independent contributions
of conditional probability and pair frequency information. Developmental Science, 12, 1020–25.
Marcus, G.F., Vijayan, S., Rao, S., and Vishton, P. (1999). Rule learning by seven-month-old infants.
Science, 283, 77–80.
Marks, L. (1978). The unity of the senses. Academic Press, New York.
Mastropieri, D., and Turkewitz, G. (1999). Prenatal experience and neonatal responsiveness to vocal
expressions of emotion. Developmental Psychobiology, 35, 204–214.

McGurk, H., and MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–48.
Meredith, M.A., and Stein, B.E. (1983). Interactions among converging sensory inputs in the superior
colliculus. Science, 221, 389–91.
Michel, G.F., and Tyler, A.N. (2005). Critical period: a history of the transition from questions of when, to
what, to how. Developmental Psychobiology. Special Issue: Critical Periods Re-examined: Evidence from
Human Sensory Development, 46, 156–62.
Morrongiello, B.A. (1988). Infants’ localization of sounds in the horizontal plane: estimates of minimum
audible angle. Developmental Psychology, 24, 8–13.
Morrongiello, B.A., Fenwick, K.D., and Chance, G. (1990). Sound localization acuity in very young infants:
An observer-based testing procedure. Developmental Psychology, 26, 75–84.
Morton, J., and Johnson, M.H. (1991). CONSPEC and CONLERN: A two-process theory of infant face
recognition. Psychological Review, 98, 164–81.
Nazzi, T., Bertoncini, J., and Mehler, J. (1998). Language discrimination by newborns: toward an
understanding of the role of rhythm. Journal of Experimental Psychology: Human Perception and
Performance, 24, 756–66.
Neil, P.A., Chee-Ruiter, C., Scheier, C., Lewkowicz, D.J., and Shimojo, S. (2006). Development of
multisensory spatial integration and perception in humans. Developmental Science, 9, 454–64.
Nelson, C.A. (2001). The development and neural bases of face recognition. Infant and Child Development,
10, 3–18.
Parr, L.A. (2004). Perceptual biases for multimodal cues in chimpanzee (Pan troglodytes) affect recognition.
Animal Cognition, 7, 171–78.
Partan, S., and Marler, P. (1999). Communication goes multimodal. Science, 283, 1272–73.
Pascalis, O., and Bachevalier, J. (1998). Face recognition in primates: a cross-species study. Behavioural
Processes, 43, 87–96.
Pascalis, O., de Haan, M., and Nelson, C.A. (2002). Is face processing species-specific during the first year of
life? Science, 296, 1321–23.
Pascalis, O., Scott, L.S., Kelly, D.J., et al. (2005). Plasticity of face processing in infancy. Proceedings of the
National Academy of Sciences USA, 102, 5297–300.
Patterson, M.L., and Werker, J.F. (1999). Matching phonetic information in lips and voice is robust in
4.5-month-old infants. Infant Behavior and Development, 22, 237–47.
Patterson, M.L., and Werker, J.F. (2002). Infants’ ability to match dynamic phonetic and gender
information in the face and voice. Journal of Experimental Child Psychology, 81, 93–115.
Patterson, M.L., and Werker, J.F. (2003). Two-month-old infants match phonetic information in lips and
voice. Developmental Science, 6, 191–96.
Piaget, J. (1952). The origins of intelligence in children. International Universities Press, New York.
Pick, H.L., Warren, D.H., and Hay, J.C. (1969). Sensory conflict in judgments of spatial direction.
Perception and Psychophysics, 6, 203–205.
Pons, F., Lewkowicz, D.J., Soto-Faraco, S., and Sebastián-Gallés, N. (2009). Narrowing of intersensory
speech perception in infancy. Proceedings of the National Academy of Sciences USA, 106, 10598–602.
Poulin-Dubois, D., Serbin, L.A., Kenyon, B., and Derbyshire, A. (1994). Infants’ intermodal knowledge
about gender. Developmental Psychology, 30, 436–42.
Poulin-Dubois, D., Serbin, L.A., and Derbyshire, A. (1998). Toddlers’ intermodal and verbal knowledge
about gender. Merrill-Palmer Quarterly, 44, 338–54.
Purves, D., White, L.E., and Riddle, D.R. (1996). Is neural development Darwinian? Trends in
Neurosciences, 19, 460–64.
Reardon, P., and Bushnell, E.W. (1988). Infants’ sensitivity to arbitrary pairings of color and taste. Infant
Behavior and Development, 11, 245–50.
Recanzone, G.H. (2009). Interactions of auditory and visual stimuli in space and time. Hearing Research,
258, 89–99.

Rivera-Gaxiola, M., Silva-Pereyra, J., and Kuhl, P. (2005). Brain potentials to native and non-native speech
contrasts in 7- and 11-month-old American infants. Developmental Science, 8, 162–72.
Rosenblum, L.D., Schmuckler, M.A., and Johnson, J.A. (1997). The McGurk effect in infants. Perception
and Psychophysics, 59, 347–57.
Rowe, C. (1999). Receiver psychology and the evolution of multicomponent signals. Animal Behaviour, 58,
921–31.
Saffran, J.R., Aslin, R.N., and Newport, E.L. (1996). Statistical learning by 8-month-old infants. Science,
274, 1926–28.
Sai, F.Z. (2005). The role of the mother’s voice in developing mother’s face preference: evidence for
intermodal perception at birth. Infant and Child Development, 14, 29–50.
Sangrigoli, S., and De Schonen, S. (2004). Recognition of own-race and other-race faces by three-month-old
infants. Journal of Child Psychology and Psychiatry, 45, 1219–27.
Sangrigoli, S., Pallier, C., Argenti, A.M., Ventureyra, V.A.G., and De Schonen, S. (2005). Reversibility of the
other-race effect in face recognition during childhood. Psychological Science, 16, 440–44.
Scheier, C., Lewkowicz, D.J., and Shimojo, S. (2003). Sound induces perceptual reorganization of an
ambiguous motion display in human infants. Developmental Science, 6, 233–44.
Scott, L.S., and Monesson, A. (2009). The origin of biases in face perception. Psychological Science, 20,
676–80.
Scott, L.S., Pascalis, O., and Nelson, C.A. (2007). A domain general theory of the development of perceptual
discrimination. Current Directions in Psychological Science, 16, 197–201.
Sekuler, R., Sekuler, A.B., and Lau, R. (1997). Sound alters visual motion perception. Nature, 385, 308.
Shams, L., and Seitz, A. (2008). Benefits of multisensory learning. Trends in Cognitive Sciences, 12, 411–417.
Shams, L., Kamitani, Y., and Shimojo, S. (2000). What you see is what you hear. Nature, 408, 788.
Sharma, J., Angelucci, A., and Sur, M. (2000). Induction of visual orientation modules in auditory cortex.
Nature, 404, 841–47.
Simion, F., Leo, I., Turati, C., Valenza, E., and Dalla Barba, B. (2007). How face specialization emerges in
the first months of life. Progress in Brain Research, 164, 169–86.
Stein, B.E., and Meredith, M.A. (1993). The merging of the senses. MIT Press, Cambridge, MA.
Stein, B.E., and Stanford, T.R. (2008). Multisensory integration: current issues from the perspective of the
single neuron. Nature Reviews Neuroscience, 9, 255–66.
Stein, B.E., Meredith, M.A., Huneycutt, W.S., and McDade, L. (1989). Behavioral indices of multisensory
integration: orientation to visual cues is affected by auditory stimuli. Journal of Cognitive Neuroscience,
1, 12–24.
Streeter, L.A. (1976). Language perception of 2-month-old infants shows effects of both innate mechanisms
and experience. Nature, 259, 39–41.
Turkewitz, G. (1994). Sources of order for intersensory functioning. In The development of intersensory
perception: comparative perspectives (eds. D.J. Lewkowicz and R. Lickliter), pp. 3–18. Lawrence Erlbaum
Associates, Hillsdale, NJ.
Turkewitz, G., and Kenny, P.A. (1982). Limitations on input as a basis for neural organization and
perceptual development: a preliminary theoretical statement. Developmental Psychobiology, 15,
357–68.
Turkewitz, G., and Kenny, P.A. (1985). The role of developmental limitations of sensory input on sensory/
perceptual organization. Journal of Developmental and Behavioral Pediatrics, 6, 302–306.
von Melchner, L., Pallas, S.L., and Sur, M. (2000). Visual behaviour mediated by retinal projections
directed to the auditory pathway. Nature, 404, 871–76.
Walker-Andrews, A.S. (1986). Intermodal perception of expressive behaviors: relation of eye and voice?
Developmental Psychology, 22, 373–77.
Walker-Andrews, A.S. (1997). Infants’ perception of expressive behaviors: differentiation of multimodal
information. Psychological Bulletin, 121, 437–56.

Walker-Andrews, A.S., Bahrick, L.E., Raglioni, S.S., and Diaz, I. (1991). Infants’ bimodal perception of
gender. Ecological Psychology, 3, 55–75.
Wallace, M.T., and Stein, B.E. (1997). Development of multisensory neurons and multisensory integration
in cat superior colliculus. Journal of Neuroscience, 17, 2429–44.
Wallace, M.T., and Stein, B.E. (2001). Sensory and multisensory responses in the newborn monkey
superior colliculus. Journal of Neuroscience, 21, 8886–94.
Wallace, M.T., Stein, B.E., and Ramachandran, R. (2006). Early experience determines how the senses will
interact: a revised view of sensory cortical parcellation. Journal of Neurophysiology, 101, 2167–72.
Walton, G.E., and Bower, T.G. (1993). Amodal representations of speech in infants. Infant Behavior and
Development, 16, 233–43.
Weikum, W.M., Vouloumanos, A., Navarra, J., Soto-Faraco, S., Sebastián-Gallés, N., and Werker, J.F.
(2007). Visual language discrimination in infancy. Science, 316, 1159.
Welch, R.B., and Warren, D.H. (1980). Immediate perceptual response to intersensory discrepancy.
Psychological Bulletin, 88, 638–67.
Werker, J.F., and Tees, R.C. (1984). Cross-language speech perception: Evidence for perceptual
reorganization during the first year of life. Infant Behavior and Development, 7, 49–63.
Werker, J.F., and Tees, R.C. (2005). Speech perception as a window for understanding plasticity and
commitment in language systems of the brain. Developmental Psychobiology. Special Issue: Critical
Periods Re-examined: Evidence from Human Sensory Development, 46, 233–34.
Werner, H. (1973). Comparative psychology of mental development. International Universities Press,
New York.
Zangenehpour, S., Ghazanfar, A.A., Lewkowicz, D.J., and Zatorre, R.J. (2009). Heterochrony and cross-species
intersensory matching by infant vervet monkeys. PLoS ONE, 4, e4302.
Chapter 8

The role of intersensory redundancy in early perceptual, cognitive, and social development

Lorraine E. Bahrick and Robert Lickliter

In the simplest of terms, attention refers to a selectivity of response. Man or animal is
continuously responding to some events in the environment and not to others that could be
responded to (or noticed) just as well.
(Donald Hebb 1949, p. 4)

8.1 Introduction
The natural environment provides a flux of concurrent stimulation to all our senses, far more
than can be attended to at any given moment in time. Adults are exquisitely skilled at selectively
attending to specific features or aspects of objects and events, picking out information that is
relevant to their needs, goals, and interests, and ignoring irrelevant stimulation. For example, we
easily pick out a friend in a crowd, follow the flow of action in a ball game, and attend to the voice
of the speaker at a cocktail party in the context of competing conversations. We long ago learned
to pick out human speech from non-speech sounds and parse continuous speech into meaningful
words by ignoring variations across speakers, accents, and intonation. Similarly, we have learned
to parse the visual array into coherent objects and surfaces despite variation due to lighting and
shadow, and interruption of surfaces due to occlusion. These remarkable skills, easily taken for
granted by experienced perceivers, develop rapidly across infancy as a result of ongoing experi-
ence with objects and events (Kellman and Arterberry 1998; Lewkowicz and Lickliter 1994). This
rapid perceptual development entails improving attentional allocation and economy of informa-
tion pick-up for relevant aspects of the environment by attending to meaningful variability while
ignoring meaningless variability (E.J. Gibson 1969, 1988; E.J. Gibson and Pick 2000; Ruff and
Rothbart 1996).
A great deal of research and theory has been devoted to understanding how perception and
learning develop across the first years of life. In contrast, little developmental research has focused
on the processes that guide selective attention to relevant aspects and levels of stimulation in
the first place (see Driver 2001; Pashler 1998; Spence and Driver 2004, for useful reviews of
adult-based research). Like perceptual development, the progressive honing of selective attention
is also the result of ongoing experience with objects and events and provides the basis for further
perceptual learning and exploratory activity. In contrast with later development, the early
development of attentional selectivity is thought to be more influenced by the infant’s sensitivity
to salient properties of stimulation such as contrast, movement, and intensity (e.g. Kellman and
Arterberry 1998; Lewkowicz and Turkewitz 1980), and intersensory redundancy (overlapping
information across auditory, visual, tactile, and/or proprioceptive stimulation for properties
of objects and events; see Bahrick 2010; Bahrick and Lickliter 2002). In this chapter, we explore
the powerful role of intersensory redundancy in guiding and shaping early selective attention and,
in turn, perception and learning. We review recent empirical and theoretical efforts to better
understand what guides the allocation of selective attention during early development and we
briefly discuss the implications of early selective attention for perceptual, cognitive, and social
development.

8.2 Perceiving unitary multisensory events in infancy: a basic bootstrapping problem
The newborn infant faces a significant developmental challenge following birth: how to become
increasingly economical and efficient at attending to multisensory stimulation that is unitary
(coherent across the senses and originating from a single event) and relevant to its needs and
actions, while ignoring stimulation that is less relevant. This is a particularly challenging task, as
the environment provides far more stimulation from multiple objects and events than can be
attended to at any given time, each providing stimulation to multiple sense modalities concur-
rently. The infant must attend to variations in incoming stimulation that are meaningful, rele-
vant, and coherent (e.g. coordinated changes in the face and voice of a single speaker amidst
unrelated changes in other objects, people, and events nearby; goal-directed human actions
amidst irrelevant movements of people, objects, and events) and ignore other variations that are
relatively meaningless (differences in lighting and shadow across cohesive objects, variations in
speaker voice or intonation across the same phoneme). What factors might determine which
information is selected and attended to by young infants and which information is typically
ignored during early development?
Evidence accumulated over several decades of infancy research suggests that selective attention
is more stimulus-driven during early postnatal development and with experience becomes
increasingly endogenous and modulated by top-down processes, including the individual’s goals,
plans, and expectations (see Colombo 2001; Haith 1980; Johnson et al. 1991; Ruff and Rothbart
1996). Thus, for experienced perceivers, prior knowledge, categories, goals, plans, and expecta-
tions typically guide information pick-up (e.g. Bartlett 1932; Chase and Simon 1973; Neisser
1976; Schank and Abelson 1977). What we know and what we expect to happen influence
where we allocate our attention and what information we pick up in the present as well as in
future encounters. What guides this process in young infants, who have little prior knowledge or
experience to rely on in the first months of postnatal life?

8.3 The salience of amodal information during early development
Amodal information is information that is not specific to a particular sense modality. Rather, it is
information that can be conveyed redundantly across multiple senses, including fundamental
aspects of stimulation such as time, space, and intensity. A large body of research has indicated
that the detection of amodal information such as temporal synchrony, rhythm, tempo, and inten-
sity is a cornerstone of early perceptual development (see Bahrick 2004, in press; Bahrick and
Lickliter 2002; Lewkowicz 2000; Lewkowicz and Lickliter 1994). The finding that infants are adept
at perceiving amodal information is consistent with J.J. Gibson’s (1966, 1979) ecological view of
perception, which proposed that the different forms of stimulation available to the senses are not
a problem for perception, but rather provide an important basis for perceiving unitary objects
and events, such as a person speaking or a ball bouncing. Gibson proposed that our senses work
together as a unified perceptual system. On this view, because perceivers can detect amodal
information directly, there is no need to learn to integrate stimulation across the senses in
order to perceive unified objects and events, as constructivist accounts of early perceptual
and cognitive development proposed (e.g. Piaget 1952, 1954). Perceiving amodal relations, combined with
an increasing sensitivity to the statistical regularities of the environment, effectively ensures
that young inexperienced perceivers preferentially attend to unified multimodal events, such as
people speaking, dogs barking, or keys jingling.
Temporal synchrony is the most fundamental type of amodal information. Temporal synchrony
refers to the simultaneous co-occurrence of stimulation across the senses (e.g. audiovisual) with
respect to onset, offset, and duration of sensory patterning. It is a higher-order, global amodal
property, in that it can be detected only by abstracting information across different sense modali-
ties (e.g. audible and visual changes) over time. Thus, it is inherently relational and abstract.
Furthermore, it facilitates the detection of nested amodal properties such as rhythm, tempo, and
duration across the senses (Bahrick 1992, 1994, 2001; E.J. Gibson 1969). Temporal synchrony has
been proposed as the ‘glue’ that effectively binds stimulation across the senses (see Bahrick and
Lickliter 2002; Bahrick and Pickens 1994; Lewkowicz 2000). For example, by attending to audio-
visual synchrony, the sounds and sights of a single person speaking will be perceived together as a
unified event. Detecting this synchronous information can prevent the accidental association of
unrelated but concurrent sensory stimulation, such as nearby conversations. The ‘ventriloquism
effect’ (Alais and Burr 2004; Radeau and Bertelson 1977; Warren et al. 1981) illustrates the
powerful role of synchronous amodal information in guiding perception. By moving the dummy’s
mouth and body in synchrony with his or her own speech sounds, the ventriloquist creates
amodal information, which promotes the illusion that the dummy is speaking even though the
sound actually emanates from the ventriloquist’s mouth. Amodal information (audiovisual
temporal synchrony, rhythm, tempo, and intensity changes common to the dummy’s movements
and the sounds of speech) promotes the perception of a unitary event—the dummy speaking—
and effectively overrides information about the source or location of the sound. Young infants
show similar sensitivity to synchronous amodal information, even in the first months of life (e.g.
Bahrick 1988; Lewkowicz 1996; Morrongiello et al. 1998). Importantly, once infant attention is
focused on a ‘unitary’ audiovisual event, further perceptual differentiation of the unitary event can
then be promoted. This sequence sets the stage for coherent perceptual processing and in turn
provides a foundation for early cognitive and social development.

8.4 Selective attention: the foundation for perception, learning, and memory
Attention entails exploratory behaviour such as orienting, eye movements, and active interaction
with the environment (e.g. reaching, head turning). These behaviours provide continuous
and contingent feedback to our multiple senses. An obvious but nonetheless important insight is
that selective attention to stimulation generated from exploratory activity provides the basis for
what is perceived, learned, and remembered. In turn, what is perceived, learned, and
remembered influences what is attended to in subsequent bouts of exploration, in continuous cycles of
186 ROLE OF INTERSENSORY REDUNDANCY IN EARLY PERCEPTUAL, COGNITIVE, AND SOCIAL DEVELOPMENT

attention → perception → learning → memory → attention, and so on. Figure 8.1
illustrates this dynamic system of influences and the fundamental role of selective attention for
perception, learning, and memory. Moreover, action is tightly coupled with these processes,
as exploratory activity provides new stimulation for attention, perception, learning, and memory
across continuous feedback loops (see Fig. 8.1; Adolph and Berger 2005; E.J. Gibson 1988; E.J.
Gibson and Pick 2000; von Hofsten 1983, 1993). This cycle can be characterized as a system of
dynamic, interactive influences that evolve over time, with concurrent changes in neurodevelop-
ment that go hand-in-hand with perception and action (see Adolph and Berger 2006; E.J. Gibson
1988; Thelen and Smith 1994, for discussion of such systems).
Surprisingly little scientific effort has been devoted to the study of attentional selectivity in
infancy (see Colombo 2001; Ruff and Rothbart 1996 for overviews), despite its obvious impor-
tance for perceptual, cognitive, social, and linguistic development. However, in recent years
investigators working at the neural, physiological, and behavioural levels of analysis have begun
to provide new insights into the nature of attentional allocation to unimodal and multimodal
stimulation during early development, and into the processes that guide it (e.g. Hollich et al.
2005; Hyde et al. 2009; Reynolds et al. 2010; Richards et al. 2010).

[Fig. 8.1 diagram: ‘Stimulation available for exploration: environmental and self-stimulation’
feeds ‘Action/exploratory activity’, which is linked with ‘Attention’, ‘Perception’, ‘Learning’,
and ‘Memory’ in feedback loops.]

Fig. 8.1 The critical role of selective attention in the development of perception, learning and
memory is depicted in two interrelated, concurrent feedback loops: (a) the attention–perception–
learning–memory system, and (b) the attention–perception–action system. The arrows represent the
primary direction of the flow of information. Selective attention to stimulation that results from
exploratory activity provides the basis for what is perceived, what is perceived provides the basis
for what is learned, and in turn what is remembered. This sequence in turn affects what is attended
to next and in subsequent encounters with similar stimulation. Perception is tightly coupled with
action via selective attention to the stimulation generated from exploratory activity in a
continuous feedback loop. (Reproduced from G. Bremner and A. Slater (eds.), Theories of Infant
Development, The development of perception in a multimodal environment, Bahrick, L.E., pp. 90–120
© 2004, John Wiley & Sons, Ltd., with permission.)

This work emphasizes the salience of multimodal
stimulation for early attention allocation. It is clear that infants quickly establish efficient patterns
for selectively attending to relevant and coherent aspects of the environment, and these patterns
become increasingly efficient with experience, eventually evolving into the expert patterns of
adult selective attention. A central issue for developmental science is to uncover what principles
govern this process. We have proposed and provided support for the intersensory redundancy
hypothesis (IRH), a framework describing four general principles that we think guide this devel-
opmental process (Bahrick 2010; Bahrick and Lickliter 2000, 2002; Bahrick et al. 2004a; Lickliter
and Bahrick 2004). These principles all reflect infants’ sensitivity to intersensory redundancy
and its influence on attentional allocation, perceptual processing, learning, and memory during
the first months of postnatal life. A large body of research indicates that intersensory redundancy
promotes attention and perceptual processing of some properties of stimulation at the expense of
others, particularly in early development when attentional resources are most limited. We have
argued that this has a profound effect on the nature and trajectory of early development (Bahrick
and Lickliter 2002).

8.5 The intersensory redundancy hypothesis: a framework for early perceptual development
Intersensory redundancy is provided by an event when the same amodal information (rhythm,
tempo, intensity changes) is simultaneously available and temporally synchronized across two or
more sense modalities. For example, when the rhythm and tempo of speech can be perceived by
looking and by listening, the rhythm and tempo are redundantly specified. Most naturalistic,
multimodal events provide intersensory redundancy for multiple properties (e.g. tempo, rhythm,
duration, intensity). By definition, only amodal properties (as opposed to modality-specific
properties) can be redundantly specified across the senses. Typically, a given event (such as a
person speaking) also provides non-redundant modality-specific information, such as the appear-
ance of the face, the colour of clothing, and the specific acoustic qualities of the voice. What
guides selective attention to these various properties of events during bouts of exploration?
Infant-based research consistently indicates that redundancy across the senses (both global and
nested amodal properties) promotes attention to redundantly specified properties of objects and
events at the expense of other (non-redundantly specified) stimulus properties, particularly in
early development when attentional resources are most limited (e.g. Bahrick and Lickliter 2000,
2002; Bahrick et al. 2010; Lewkowicz 2000; Lickliter and Bahrick 2004). Later in development,
attention is extended to less salient, non-redundantly specified properties. Factors such as com-
plexity, familiarity, the length of exploratory time, and the level of expertise of the perceiver can
affect the speed of progression through this salience hierarchy.
The IRH (Bahrick and Lickliter 2000, 2002) is a model of how selective attention guides early
perceptual development. It provides a framework for understanding how and under what condi-
tions attention is allocated to amodal versus modality-specific aspects of stimulation. The IRH
addresses how young infants, with no prior knowledge of the world, rapidly come to perceive
unitary events and attend to stimulation that is relevant to their needs and actions. Although the
IRH is primarily a framework for describing the early development of attention and intermodal
perception, the principles can also apply across the lifespan, particularly when attentional
resources are limited (for example, for difficult tasks or conditions of high cognitive load).
The IRH consists of four specific predictions. Two predictions address the nature of selective
attention to different properties of objects and events. The remaining two are developmental
predictions that address implications of the IRH across the lifespan. The first prediction describes
the salience of redundantly specified amodal properties in multimodal synchronous stimulation
(intersensory facilitation). The second describes the salience of non-redundantly specified,
modality-specific properties in unimodal stimulation (unimodal facilitation). The third predic-
tion holds that across development infants become more efficient and flexible processors, leading
to detection of both redundantly and non-redundantly specified properties in unimodal and
multimodal stimulation. The fourth prediction holds that intersensory and unimodal facilitation
are most pronounced for tasks of relatively high difficulty in relation to the expertise of the
perceiver, and thus are likely to be apparent under some conditions across the lifespan.
These four predictions have been supported by empirical studies with both human and
non-human animal infants and across non-social and social domains. Below we describe the four
predictions of the IRH in more detail and briefly review the research findings that support each
prediction. Table 8.1 provides a summary of the convergent research findings from our labs that
have supported each of the four predictions to date.

8.5.1 Prediction 1: intersensory facilitation


Redundantly specified, amodal properties are highly salient and detected more easily in bimodal
synchronous stimulation than are the same amodal properties in unimodal stimulation.

According to the first and most fundamental prediction of the IRH, intersensory redundancy (the
synchronous alignment of stimulation across two or more senses) recruits infant attention to
redundantly-specified properties of events (amodal properties such as tempo, rhythm, duration,
and intensity), effectively causing them to become ‘foreground’ and other stimulus properties to
become ‘background’ in a given bout of exploration. For example, intersensory redundancy has
been shown to be so salient that it allows young infants to selectively attend to one of two super-
imposed events while ignoring the other. When the soundtrack to one film (e.g. a hand striking
the keys of a toy xylophone) is presented to 4-month-old infants, they can selectively follow the
flow of action even when the film is superimposed on another film (e.g. a toy slinky or a hand-
clapping game). In other words, the sound-synchronized film appears to ‘pop out’ and become
foreground while the silent film becomes background, indicated by the fact that infants respond
to the background film as novel when given a novelty preference test (Bahrick et al. 1981).
Research from our respective labs (see Table 8.1) has demonstrated that intersensory redun-
dancy promotes enhanced attention and perceptual discrimination and learning in human and
non-human animal infants (Bahrick and Lickliter 2000, 2002; Bahrick et al. 2002a; Flom and
Bahrick 2007, 2010; Lickliter et al. 2002, 2004). For example, in the domain of non-social events,
human infants detect the rhythm and tempo of a toy hammer tapping when they experience the
synchronous sights and sounds together (providing intersensory redundancy), but not when they
experience the rhythm or tempo in one sense modality alone or when the sights and sounds are
presented out of synchrony (providing no intersensory redundancy, see Bahrick and Lickliter
2000; Bahrick et al. 2002a). In particular, our habituation studies demonstrate that infants show
visual recovery to a change in tempo or a change in rhythm only in the context of redundant
stimulation (i.e. when they can see and hear the hammer tapping in synchrony). In contrast, they
show no visual recovery to this change in tempo or rhythm in the context of non-redundant
stimulation (i.e. unimodal visual or asynchronous audiovisual hammer tapping). Our data also
provide evidence of intersensory facilitation in adults’ detection of tempo changes (Bahrick
et al. 2009a). Research from other laboratories has also provided support for intersensory
facilitation. For example, four-month-old infants have been shown to detect the serial order
of events in synchronous audiovisual but not unimodal auditory or unimodal visual stimulation
(Lewkowicz 2004). Seven-month-old infants can detect numerical information in audiovisual
Table 8.1 Studies providing support for the intersensory redundancy hypothesis. Most studies
listed under ‘Developmental change’ also include specific age groups which provide support for
‘Intersensory facilitation’ or ‘Unimodal facilitation’ as well; these studies are therefore entered
again under the respective categories. Under findings, type of stimulation is defined as: A,
unimodal auditory (soundtrack accompanied by static image); V, unimodal visual (dynamic visual, no
soundtrack); AV, bimodal dynamic audiovisual; Sync, synchronous (soundtrack temporally aligned with
video); Async, asynchronous (sound unsystematically out of phase with video). All infant studies
were conducted using infant-control habituation procedures, and some also included two-choice
preference phases. Discrimination is inferred from visual recovery to a change in infant-control
habituation procedures, as well as from preference data. All studies testing bobwhite quail were
conducted using two-choice preference procedures. Studies testing children and adults were
conducted using forced-choice judgments. Recognition memory is inferred from two-choice novelty
preference data and forced-choice judgments.

Columns: Study | Event property | Domains assessed | Stimuli | Subjects | Findings

Prediction 1: Intersensory facilitation
Bahrick and Lickliter (2000) | Rhythm | Discrimination | Non-social (hammer tapping) | Infants: 5 months | Discriminate rhythm change in AV-sync but not V, A, or AV-async
Bahrick et al. (2002a) | Tempo | Discrimination | Non-social (hammer tapping fast vs slow) | Infants: 3 months | Discriminate tempo change in AV-sync but not V or A
Lickliter et al. (2002) | Multiple temporal properties | Discrimination, recognition memory | Social (maternal call A vs B) | Quail embryos and chicks | Learn and remember call in AV-sync but not A or AV-async across prenatal to postnatal life
Lickliter et al. (2004) | Multiple temporal properties | Discrimination, long-term recognition memory | Social (maternal call A vs B) | Quail embryos and chicks | Learn and remember AV-sync but not A across 4 days
Lickliter et al. (2006) | Multiple temporal properties | Discrimination, recognition memory, educating attention | Social (maternal call A vs B) | Quail embryos and chicks | Discriminate a maternal call with AV-sync→A exposure, but not with A→AV-sync exposure or AV-async→A exposure
Castellanos et al. (2006) | Tempo | Discrimination, educating attention | Non-social (hammer tapping fast vs slow) | Infants: 2 months | Discriminate tempo in V habituation following AV-sync but not V pre-exposure
Flom and Bahrick (2007) | Affect | Discrimination | Social (women speaking: happy, sad, angry) | Infants: 3, 4, 5, 7 months | 3 months, no discrimination; 4 months, discriminate AV-sync; 5 months, discriminate AV-sync, A; 7 months, discriminate AV-sync, A, V
Bahrick, Todd et al. (2009) | Tempo | Discrimination, recognition memory | Non-social (hammer tapping, four tempos) | Adults | More correct responses (same/different judgments) for AV-sync than V; main effect of task difficulty
Jaime et al. (2010) | Synchrony | Discrimination, educating attention | Social (maternal call A vs B) | Quail embryos and chicks | Onset AV-sync sufficient to enhance learning of maternal call
Bahrick et al. (submitted) | Prosody of speech | Discrimination; categorization | Social (women speaking: approval vs prohibition) | Infants: 4 months | Discriminate approval vs prohibition in AV-sync but not A or AV-async
Vaillant-Molina and Bahrick (submitted) | Object-affect relations | Discrimination, recognition memory | Social (contingent robot vs pony) | Infants: 5.5 months | Social referencing: discriminate affective expressions and the object to which they refer in AV-sync but not V; prefer to touch objects paired with happy expression
Castellanos and Bahrick (2008) | Prosody of speech | Discrimination, educating attention | Social (women speaking: approval vs prohibition) | Infants: 3 months | Discriminate prosody in A habituation following AV-sync but not A or AV-async pre-exposure

Prediction 2: Unimodal facilitation
Bahrick et al. (2005) | Pitch of voice | Discrimination | Social (women speaking) | Infants: 3, 4 months | 3 months, discriminate voice in A but not AV-sync; 4 months, discriminate voice in A and AV-sync
Bahrick et al. (2006) | Orientation | Discrimination | Non-social (hammer tapping up vs down) | Infants: 3, 5, 8 months | 3 and 5 months, discriminate orientation in V but not AV-sync; 8 months, discriminate in V and AV-sync; 3 months, controls discriminate in AV-async
Vaillant et al. (2009) | Pitch | Discrimination | Social (maternal call pitch A vs B) | Quail embryos and chicks | Detect a change in pitch following A but not AV-sync exposure
Bahrick, Krogh-Jesperson, et al. (submitted) | Facial configuration | Discrimination, recognition memory | Social (women speaking) | Children: 4 years | Face discrimination and memory in V and AV-async but not AV-sync
Flom and Bahrick (2010) | Orientation | Discrimination, long-term memory | Non-social (hammer tapping up vs down) | Infants: 3, 5, 9 months | Memory across a 1-month delay for V but not AV-sync at 5 months

Prediction 3: Developmental change
Bahrick and Lickliter (2004) | Rhythm and tempo | Discrimination | Non-social (hammer tapping) | Infants: 3, 5, 8 months | Older infants discriminate tempo and rhythm change in AV-sync and V; younger infants discriminate in AV-sync but not V
Bahrick et al. (2005) | Pitch of voice | Discrimination | Social (women speaking) | Infants: 3, 4 months | 3 months, discriminate voice in A but not AV-sync; 4 months, discriminate voice in A and AV-sync
Bahrick et al. (2006) | Orientation | Discrimination | Non-social (hammer tapping up vs down) | Infants: 3, 5, 8 months | 3 and 5 months, discriminate V but not AV-sync; 8 months, discriminate V and AV-sync; 3 months, controls discriminate AV-async
Flom and Bahrick (2007) | Affect | Discrimination | Social (women speaking: happy, sad, angry) | Infants: 3, 4, 5, 7 months | 3 months, no discrimination; 4 months, discriminate AV-sync; 5 months, discriminate AV-sync, A; 7 months, discriminate AV-sync, A, V
Flom and Bahrick (2010) | Orientation | Discrimination, long-term memory | Non-social (hammer tapping up vs down) | Infants: 3, 5, 9 months | Memory across a 1-month delay emerged by 5 months for V (but not AV-sync) and by 9 months for V and AV-sync; memory was expressed as shifting preference (novelty-null-familiarity) across retention time

Prediction 4: Task difficulty and expertise
Bahrick et al. (2010) | Tempo | Discrimination | Non-social (hammer tapping fast vs slow) | Infants: 5 months | For difficult tasks, older infants discriminate tempo in AV-sync but not V; for easy tasks older infants discriminate tempo in AV-sync and V
Bahrick, Todd et al. (2009) | Tempo | Discrimination, recognition memory | Non-social (hammer tapping, 4 tempos) | Adults | More correct responses (same/different judgments) for AV-sync than V; main effect of task difficulty
sequences of faces and voices developmentally earlier than in auditory or visual sequences alone
(Jordan et al. 2008).
Similar effects are also found for perception of social events. Detection of emotion and prosody
of speech is primarily supported by amodal information, including changes in tempo, temporal
patterning, and intensity of facial and vocal stimulation. Four-month-old infants can detect
a change in prosody (from approval to prohibition, or vice versa) in bimodally synchronous
audiovisual speech, but not in unimodal auditory or asynchronous audiovisual speech (Bahrick,
Castellanos, et al. submitted). Similarly, four-month-old infants can detect a change in the affect
of a woman speaking (happy, sad, angry) in synchronous audiovisual speech, but not in unimo-
dal visual or asynchronous audiovisual speech (Flom and Bahrick 2007). Moreover, social refer-
encing appears to emerge in the context of intersensory redundancy. Infants of 5.5 months can
detect the relation between a woman’s emotional expression (happy versus fearful) and the object
to which it refers when presented with audiovisual redundancy (synchronous audiovisual speech),
but not in the absence of redundancy (unimodal visual speech). They also preferentially
touch the three-dimensional object previously paired with the happy expression (Vaillant-Molina
and Bahrick, 2012). Taken together, these findings demonstrate the powerful role of intersensory
redundancy in directing infant selective attention and perceptual processing to amodal properties
in both social and non-social events. These parallel findings across social and non-social events
indicate that attention and perceptual processing in both the social and non-social domains are
governed by the same domain-general processes, arguing against the view that perception of social
events is a function of domain-specific mechanisms (see Bahrick and Todd, in press).
Studies of non-human animals have also found support for intersensory facilitation, even dur-
ing the prenatal period of development. Following redundant audiovisual prenatal stimulation
(where a synchronized light and a maternal call were presented to embryos), quail embryos
learned an individual maternal call four times faster and remembered the individual call four
times longer into postnatal development than when they heard the maternal call alone or when
the call and light were presented out of synchrony (Lickliter et al. 2002, 2004). As can be seen from
Table 8.1, Prediction 1 of the IRH (intersensory facilitation) has received empirical support across
diverse participants (human infants, children, and adults; animal embryos and neonates) and
stimulus properties, including tempo, rhythm, affect, prosody, and temporal patterning in both
social and non-social events.
Intersensory redundancy has also been shown to ‘educate attention’ to amodal properties of
events, much like transfer of training effects. Once intersensory redundancy directs attention to
amodal properties in multimodal stimulation, infants appear able to detect these same amodal
properties in subsequent unimodal stimulation, at younger ages and under exposure conditions
that would otherwise not support the detection of amodal properties in unimodal stimulation.
Studies of bobwhite quail embryos and chicks illustrate this effect. Lickliter et al. (2006) found
that quail chicks showed no preference for a familiarized maternal call when they had received
relatively brief prenatal unimodal auditory familiarization. In contrast, when embryos were first
exposed to the redundant audiovisual presentation of the maternal call (call synchronized with
flashing light) followed by a unimodal auditory presentation (bimodal → unimodal), chicks showed a
significant preference for the familiar auditory maternal call two days after hatching. Embryos
who received the reverse sequence of exposure to the maternal call (unimodal → bimodal)
showed no preference for the familiarized maternal call in postnatal testing. Intersensory
redundancy (in bimodal stimulation) apparently highlighted the temporal features of the call
and then ‘educated attention’ to these temporal features in subsequent unimodal stimulation.
This education of attention to redundant temporal properties was effective even after delays of
2 or 4 hours between initial bimodal stimulation and subsequent unimodal stimulation (Lickliter
et al. 2006).
A recent study investigated the role of educating attention in perceptual learning at a more fine-
grained level of analysis. Jaime et al. (2010) found that audiovisual temporal synchrony occurring
only at the onset (first note) of the five-note bobwhite maternal call was sufficient to facilitate
enhanced perceptual learning of the call when compared to unimodal exposure. Indeed, onset
synchrony (the visual stimulus is present only with the first note and not subsequent notes in the
call burst) was just as effective as full synchrony across all five notes of the call in facilitating pre-
natal learning in quail embryos. Apparently, intersensory redundancy created by a single flash
synchronized with the first note of the maternal call effectively educated attention to the temporal
patterning of the entire call that followed. In contrast, embryos that received alternating auditory
and visual stimulation (rather than synchronous exposure) showed no perceptual learning fol-
lowing hatching (Jaime et al. 2010).
Studies of human infants have shown parallel findings. Four-month-old infants detect a change
in the tempo of the toy hammer tapping in unimodal visual stimulation only if they have received
a brief pre-exposure to redundant (synchronous audiovisual), but not to non-redundant (uni-
modal visual or asynchronous audiovisual) stimulation from the hammer tapping (Castellanos
et al. 2006). Similar results were found for infant detection of prosody in speech (Castellanos and
Bahrick 2008). Thus, attention can be ‘educated’ to amodal properties of events in unimodal
stimulation by pre-exposing infants to salient intersensory redundancy, which serves to highlight
amodal properties. It appears that infants continue to detect those same amodal properties in
familiarized events, even when redundancy is eliminated. Educating attention and shifting explo-
ration from multimodal to unimodal and vice versa is a fundamental and practical process,
important for attention allocation in the natural environment. For example, as the mother speaks,
her synchronous face and voice may be visible to her infant, but when she turns away while speak-
ing, only her voice is audible. This process of ‘educating attention’ likely serves as a central means
by which infants extend their detection of amodal properties from multimodal to unimodal
events.
Taken together, our results from human and animal infants indicate that the intersensory
redundancy available in bimodal stimulation plays a key role in organizing early selective atten-
tion, and in turn in directing early perception, learning, and memory. In particular, the evidence
indicates that redundancy can facilitate attention to amodal properties such as the rhythm,
tempo, and temporal patterning of audible and visible stimulation when compared with the same
properties experienced in only one sense modality. Moreover, the finding that intersensory
facilitation is observed in bimodal synchronous but not bimodal asynchronous conditions (where
the overall amount and type of stimulation are equated) effectively rules out alternative hypoth-
eses: that increased levels of overall arousal or simply receiving stimulation in two different
modalities could be the basis for demonstrations of intersensory facilitation.
It is important to emphasize that our findings supporting intersensory facilitation do not sug-
gest that intersensory redundancy is always better for perception or learning than unimodal
stimulation nor that, as some studies suggest (e.g. Shams and Seitz 2008), it is superior for percep-
tion of all stimulus properties. Rather, intersensory redundancy promotes attention to certain
properties of stimulation (amodal) at the expense of other properties (modality specific). Given
that the environment provides far too much stimulation to attend to at any given time and that
intersensory redundancy is high on the infant’s salience hierarchy, it can play a powerful role in
regulating and constraining which aspects of stimulation are attended to, particularly early in
development when attentional resources are most limited. However, in bouts of exploration of the
natural environment, intersensory redundancy is not always available. Prediction 2 of the IRH
describes which properties of events are attended to when intersensory redundancy does not
compete for attention—in unimodal stimulation.

8.5.2 Prediction 2: unimodal facilitation


Non-redundantly specified, modality-specific properties are more salient and detected more easily in
unimodal stimulation than are the same properties in bimodal, synchronous stimulation (where
redundantly specified amodal properties compete for attention).

According to the second prediction of the intersensory redundancy hypothesis, in conditions of
unimodal stimulation attention is selectively directed to non-redundantly specified properties
such as colour, pattern, timbre, or pitch to a greater extent than in multimodal stimulation. This
‘unimodal facilitation’ occurs in part because there is no competition for attention from salient
intersensory redundancy. Particularly in early development, a given event typically provides sig-
nificantly more stimulation than can be attended to at any one time, and thus redundantly and
non-redundantly specified properties within the same event compete for an infant’s attention.
Because redundantly specified properties are more salient, they typically capture attention at the
expense of modality-specific properties. For example, a young infant exploring a person speaking
might selectively attend to amodal properties such as the prosody of speech (comprised of rhythm,
tempo, and intensity patterns) at the expense of modality-specific properties such as the appear-
ance of the person, the colour of their clothing, or the specific nature of their voice. In contrast,
when salient redundancy is unavailable, as when the person is silent, attention is free to focus on
non-redundant, modality-specific properties available in unimodal visual stimulation. Under
these conditions we would expect to observe unimodal facilitation and enhanced attention to the
appearance of the individual.
Consistent with this second prediction of the IRH, research has shown that in early develop-
ment unimodal stimulation selectively recruits attention and promotes the perceptual processing
of non-redundantly specified, modality-specific properties more effectively than does redundant,
audiovisual stimulation. The findings of our studies to date supporting this prediction are
summarized in Table 8.1 (Prediction 2). For example, Bahrick et al. (2006) found that 3- and
5-month-olds could discriminate a change in the orientation of a toy hammer tapping against a
surface (upward versus downward) when they saw the hammer tapping (unimodal visual) but not
when they saw and heard the natural synchronous audiovisual stimulation. This latter condition
provided intersensory redundancy, which presumably attracted attention to redundantly
specified amodal properties such as rhythm and tempo and interfered with attention to visual
information such as the direction of motion or orientation of the hammer. An asynchronous
control condition eliminated intersensory redundancy but equated overall amount and type of
stimulation with the bimodal synchronous condition. Instead of impairing perception of orienta-
tion, asynchronous bimodal stimulation enhanced infant perception of orientation when com-
pared with synchronous bimodal stimulation (Bahrick et al. 2006). Consistent with the predictions
of the IRH, asynchronous sights and sounds resulted in heightened discrimination on a par with
that of unimodal visual stimulation. Studies assessing infant discrimination of faces and voices
have found similar results (see Table 8.1; Bahrick et al. 2004b; Bahrick et al. 2005). Young infants
discriminated between faces in unimodal visual stimulation and voices in unimodal auditory
stimulation, but did not discriminate these stimuli in synchronous audiovisual stimulation.
These findings of unimodal facilitation have also been extended to memory for faces in 4-year-old
children (Bahrick, Krogh-Jesperson, et al. submitted). Unimodal facilitation is thus seen in
human infants and children and for both social and non-social events.
THE INTERSENSORY REDUNDANCY HYPOTHESIS 195

Parallel studies with bobwhite quail have also demonstrated unimodal facilitation for detection
of modality-specific properties of stimulation (Vaillant et al. 2009). Quail embryos were exposed
to an individual maternal call either unimodally (auditory only) or bimodally (redundant light
and call) on the day prior to hatching. After hatching, chicks were given a choice between the
familiar version of the maternal call that had been presented prenatally and the same maternal
call with a pitch alteration (raised by one and a half notes, with all other acoustic features held
constant).
Results revealed that only chicks that had received unimodal exposure to the maternal call as
embryos preferred the familiar call over the acoustically modified call following hatching. Chicks
that received redundant audiovisual exposure as embryos failed to discriminate the change
in pitch, showing no preference between the normal and modified versions of the maternal call
during postnatal testing. These results provide evidence that unimodal facilitation for modality-
specific properties of stimulation is promoted when salient intersensory redundancy is eliminated
and attention is free to focus on the information conveyed by a single sense modality. As sum-
marized in Table 8.1, unimodal facilitation appears to be a general principle of early attention and
perceptual processing, applicable across species (human and avian), domains of stimulation
(social and non-social), and developmental periods (prenatal, infancy, and childhood).

8.5.3 Prediction 3: developmental improvement in selective attention

Across development, infants’ increasing perceptual differentiation, efficiency of processing, and flexi-
bility of attention lead to the detection of both redundantly and non-redundantly specified properties
in unimodal, non-redundant and bimodal, redundant stimulation.

As infants become older and more experienced, their processing speed increases, perceptual dif-
ferentiation progresses, and attention becomes more efficient and flexible (see E.J. Gibson 1969,
1988; Ruff and Rothbart 1996). For example, older infants habituate more quickly to stimuli,
produce shorter looks, shift more frequently between targets, and discriminate changes in objects
and events with shorter processing times (e.g. Colombo 2001, 2002; Colombo and Mitchell 1990;
Colombo et al. 1991; Frick et al. 1999; Hale 1990; Hunter and Ames 1988; Rose et al. 2001). These
changes, along with the accumulation of experience differentiating amodal and modality-specific
properties in the environment, promote older infants’ ability to detect both redundantly and
non-redundantly specified properties in unimodal and bimodal stimulation within an episode of
exploration. Furthermore, attention typically progresses from the most salient to increasingly less
salient properties across exploratory time (see Bahrick 2010; Bahrick et al. 2002a; Craik and
Lockhart 1972). Therefore, as perceptual learning and economy of information pick-up improve,
more attentional resources are available for detecting information from multiple levels of the sali-
ence hierarchy, progressing from more to less salient. This pattern is consistent with the principle
of increasing specificity, originally proposed by E.J. Gibson (1969) as a cornerstone of perceptual
development. The principle of increasing specificity proposes that differentiation progresses from
abstract and global information to increasingly more specific information across development.
For example, infants show shifts from detection of global (general) to more local (detailed)
information (Frick et al. 2000), from detection of actions and information about object function
to more specific information about the appearance of objects (Bahrick et al. 2002b; Bahrick and
Newell 2008; Oakes and Madole 2008; Xu et al. 2004), and from the detection of more global
(general) amodal audiovisual relations to more specific amodal audiovisual relations across
exploratory time. Given that patterns of attentional selectivity across exploratory time provide the
foundation for infant perceptual and cognitive development, parallel shifts from detection of
global to specific information are evident across exploratory time and across developmental time
(see Bahrick 1992, 1994, 2001; Morrongiello et al. 1998).
Studies testing predictions of the IRH have provided findings consistent with this progression
for both social and non-social events. These studies are summarized in Table 8.1 (Prediction 3).
For example, with only a few months’ additional experience, infants viewing the toy hammer
events described earlier detect redundantly specified properties such as rhythm and tempo
(Bahrick et al. 2006) and non-redundantly specified properties such as orientation (Bahrick and
Lickliter 2004) presented via both unimodal visual and bimodal synchronous stimulation.
Moreover, evidence indicates that the developmental progression for unimodal facilitation
extends to the domain of memory. For example, we found that at 5 months of age, infants could
detect and remember the orientation of the toy hammer in unimodal visual presentations (but
not in bimodal audiovisual presentations) following a 1-month retention interval. By the age of
9 months, infants could detect and remember the orientation after a 1-month retention interval
in both unimodal visual and bimodal audiovisual presentations (Flom and Bahrick 2010). Thus,
attention becomes more flexible across development and, consistent with our view that selective
attention provides a foundation for perception, learning, and memory (see Fig. 8.1), effects
of unimodal facilitation of attention extend across the domains of perception, learning, and
memory and persist across considerable retention intervals (at least 1 month in young infants).
Parallel developmental progressions have been found in the social domain. Although 4-month-
old infants can detect affect only in synchronous audiovisual speech, by 5 months of age they
detect affect in synchronous audiovisual speech as well as unimodal auditory speech. By the age
of 7 months, infants can detect affect in synchronous audiovisual speech, unimodal auditory, and
unimodal visual speech (Flom and Bahrick 2007). Thus, patterns of intersensory facilitation and
unimodal facilitation (described by Prediction 1 and Prediction 2 of the IRH) that are apparent
in early development become less apparent in later development as infants accumulate additional
experience with objects and events and their attention becomes more flexible and efficient. As
discussed earlier, research with both human and animal infants suggests that one avenue for this
developmental improvement is the ‘education of attention’ (see E.J. Gibson 1969; Zukow-
Goldring 1997, for further discussion of this concept).
The third prediction of the IRH proposes that patterns of intersensory facilitation and uni-
modal facilitation become less evident across development as discrimination capabilities
improve and events become more familiar. However, if tasks are made sufficiently difficult to
challenge older perceivers, then the patterns of intersensory facilitation and unimodal facilita-
tion predicted by the IRH should also be apparent. In other words, if task difficulty or cognitive
load is increased, we predict that the patterns of facilitation described by the IRH would not
disappear across age. Instead, facilitation effects would simply become less evident as individu-
als become more efficient and skilled at perceiving objects and events with experience.
Prediction 4 of the IRH, the most recent prediction, describes the rationale and conditions
under which intersensory facilitation and unimodal facilitation should be most evident in later
stages of development.

8.5.4 Prediction 4: facilitation across development: task difficulty and expertise

Intersensory and unimodal facilitation are most pronounced for tasks of relatively high difficulty in
relation to the expertise of the perceiver, and thus should be apparent across the lifespan.

As discussed earlier, we have proposed that continued exposure to an event promotes perceptual
differentiation, likely in order of salience, such that more salient properties are differentiated first
and the differentiation of less salient properties requires longer processing time. Furthermore,
perceptual differentiation of event properties may, in turn, enhance efficiency and flexibility of
attention by fostering more rapid detection of previously differentiated properties in subsequent
encounters and more flexible attentional shifting among familiar properties (see Ruff and
Rothbart 1996). Thus, the degree of intersensory and/or unimodal facilitation observed for an
individual should in large part be a function of exposure/familiarization time (which promotes
perceptual learning) and task difficulty (defined in measurable units such as differences in inten-
sity, size, or temporal parameters) in relation to the expertise (i.e. cognitive and perceptual skills
developed through cumulative experience) of the perceiver. (Note that in early development,
expertise roughly covaries with infant age, particularly for general perceptual skills resulting from
attention to ordinary events.) Early development is a period during which task demands are typi-
cally high. Infants are relatively naïve perceivers of events, and therefore the perceptual processing
of most events is likely rather difficult and effortful. Consequently, the effects of intersensory
redundancy should be most pronounced in early development. However, because perceptual
learning and differentiation occur across the lifespan, intersensory facilitation should also be
evident in later development when the task demands are high. Children and adults continue to
develop expertise, acquiring new information and learning to perceive finer distinctions such as
learning a new language, playing a new musical instrument, or becoming skilled at identifying
birds, dinosaurs, or aeroplanes. In early stages of learning, expertise is low in relation to task dif-
ficulty, and consequently task demands are high. The IRH predicts that when task demands are
high, and attention therefore progresses more slowly along the salience hierarchy, children and
even adults should experience intersensory facilitation and unimodal facilitation. Thus, when
learning new material that challenges their skill level, intersensory and unimodal facilitation
should be observed. Similarly, when cognitive load is high and attentional resources are taxed,
such as under conditions of divided attention (‘multi-tasking’), conditions that require greater
self-regulation, executive function, or higher effort, unimodal and intersensory facilitation should
also be apparent in older perceivers.
Research findings, including studies of adult perception (e.g. Kaplan and Berman 2010; Lavie
1995, 2005), are consistent with this view. Studies with infants and children across a variety of
domains, including motor and cognitive development, indicate that under conditions of higher
task difficulty and cognitive load, performance often reverts to that of earlier stages of develop-
ment (e.g. Adolph and Berger 2005; Berger 2004; Corbetta and Bojczyk 2002). For example,
Berger (2004) found that on a locomotor A-not-B task, 13-month-olds regressed when cognitive
load was increased, demonstrating perseverative behaviours characteristic of younger infants.
Research generated from predictions of the IRH has directly tested this hypothesis (see Table 8.1,
Prediction 4, for a summary). We found that by the age of 5 months, infants no longer show
intersensory facilitation for discrimination of simple tempo changes, as their performance was
apparently at a ceiling for both unimodal and bimodal audiovisual presentations (Bahrick and
Lickliter 2004). However, by increasing task difficulty (requiring finer tempo discriminations),
we were able to reinstate intersensory facilitation. Specifically, in the more difficult tempo dis-
crimination task, 5-month-olds showed intersensory facilitation comparable to that shown by
3-month-olds in the simpler discrimination task (Bahrick et al. 2010). Data collection with adults
is currently underway on this topic and findings thus far indicate intersensory facilitation in adult
perceivers under conditions of high task difficulty (Bahrick et al. 2009a). Similar research with
adults has also demonstrated that bimodal cues capture spatial attention more effectively than
unimodal cues under conditions of perceptual load (Santangelo et al. 2008; Santangelo and
Spence 2007; Spence 2010), again suggesting that multimodal information plays a key role in
directing attention in demanding events or situations.
If findings of intersensory facilitation and unimodal facilitation hold up across studies
of adults, this would suggest important applications to educational theory, particularly when
children or adults are attending to new or difficult information or when cognitive load is high.

8.6 Mechanisms of perceptual development: attentional biases, salience hierarchies, and developmental change

Taken together, the findings reviewed in this chapter reveal an attentional trade-off during early
development such that under conditions of multimodal stimulation amodal properties are more
salient and modality-specific properties less so, whereas in unimodal stimulation, modality-spe-
cific properties are more salient and amodal properties are less salient. Because most events are
multimodal and because intersensory redundancy is highly salient to young infants, there is a
general processing advantage for amodal over modality-specific properties in early development.
This processing priority for amodal over modality-specific properties likely holds both within a
given bout of exploration and across development.
We have proposed and provided evidence for the notion of attentional salience hierarchies.
Selective attention to various properties of stimulation is allocated in order of attentional salience.
The most salient properties are attended to first, and as exploration continues, attention is then
allocated to increasingly less salient properties (Bahrick 2010; Bahrick et al. 2002a; Bahrick and
Lickliter 2002; Bahrick and Newell 2008). Thus, in the context of a given episode of multimodal
exploration, selective attention is likely first allocated to the most salient amodal properties of
stimulation, followed later by less salient modality-specific properties. Given limited attentional
resources in early infancy, exploration may often be interrupted before attention progresses to
less salient properties of stimulation. As processing becomes more efficient with age and experi-
ence, attention progresses down the hierarchy more rapidly, and less salient properties can be
attended to with increasing frequency and duration. This results in an attentional salience hierar-
chy across development that parallels the hierarchy observed across individual episodes of explo-
ration. On balance, in early development, attention to salient amodal properties will have a
history of greater frequency, duration, and earlier processing across episodes of exploration, as
compared with attention to less salient modality-specific properties. Research documenting ear-
lier detection of amodal than modality-specific properties across age, and detection of amodal but
not modality-specific properties within age, supports these developmental predictions (Bahrick
1988, 1992, 1994, 2001; Gogate and Bahrick 1998; Hernandez-Reif and Bahrick 2001). However,
research has yet to document perceptual processing sequences illustrating attentional salience for
amodal followed by modality-specific properties within an episode of exploration.
This salience hierarchy, where amodal properties are detected prior to modality-specific prop-
erties, fosters coordinated perception by allowing infants to process visual, auditory, and tactile
stimulation from unitary events. Moreover, this salience hierarchy serves to guide and constrain
perceptual development in order of increasing specificity. Perception of amodal relations (syn-
chrony, rhythm, tempo, duration) constrains attention to unitary multimodal events and in turn
promotes further processing of modality-specific detail, providing a coherent event context for
organizing detail. For example, in a crowded room, by first processing synchronous, amodal
stimulation, the infant is able to differentiate and perceive the sights and sounds of a single indi-
vidual speaking while ignoring the voices and movements of other objects and individuals nearby.
Once attention is focused on multimodal stimulation from a single individual, the infant can then
meaningfully process other aspects of the individual such as the quality of her voice, the configu-
ration of her face, and the colour and arrangement of her clothing (while ignoring other voices,
faces, and objects). Such selective attention thereby promotes efficient learning about a single
individual, in order of increasing specificity. Thus, sensitivity to amodal properties promotes
attention to unified events and guides subsequent knowledge acquisition by highlighting general
perceptual information and constraining the acquisition of specific detail. Without such con-
straints effectively guiding attention to multimodal events during early development, processing
would often be piecemeal and unintegrated. For example, sounds of one individual might be dif-
ferentiated along with the movements or colours of other individuals or objects nearby, and in
turn this would promote further processing of unrelated patterns of stimulation. Thus perceptual
salience and processing priority for redundant amodal properties would seem critical for promot-
ing optimal perceptual development and typical developmental outcomes. Furthermore, early
salience hierarchies likely have a cascading effect on cognition, language, and social development
(which emerge in multimodal learning contexts) by establishing initial conditions which favour
processing unitary, multimodal events early in perceptual processing sequences (e.g. Bahrick and
Lickliter 2002; Gogate and Bahrick 1998; Lickliter and Bahrick 2000). Importantly, a disturbance
of this salience hierarchy, where modality-specific details are acquired prior to or without the
context of unitary multimodal events, could potentially contribute to the piecemeal processing
and social-orienting impairments observed in children with autism (Dawson et al. 1998, 2004;
Mundy and Burnette 2005; see Bahrick 2010, for further discussion).

8.7 Summary and directions for future research


The studies reviewed in this chapter were generated from a convergent-operations research pro-
gram designed to address parallel questions across human and non-human animal participants.
Our convergent animal-human work has identified several principles of early perceptual develop-
ment that appear to be common across human and animal infants, including the salience of
intersensory redundancy, the importance of amodal information in guiding early perceptual
development, and the benefits of educating attention for facilitating learning and memory. Our
findings (summarized in Table 8.1) were obtained across species, developmental periods, stimu-
lus event types (social and non-social), and methods of inquiry. As such, they provide evidence
for principles of perceptual development that should be generalizable across a wide range of con-
texts and developmental systems. Our converging results consistently emphasize the importance
of selective attention in guiding and constraining development across the interrelated domains of
perception, learning, and memory. The rich bidirectional traffic between these interconnected
processes (depicted in Figure 8.1) highlights the need for more integrative theories applicable to
real world, multimodal settings.
As the various chapters of this volume make clear, there is a growing appreciation of the impor-
tance of intersensory perception to social, emotional, cognitive, and language development.
Recent findings from neural, physiological, and behavioural levels of analysis have provided
evidence that intersensory redundancy plays a key role in guiding and constraining the course
of early perceptual responsiveness. This insight has provided a framework for advancing our
understanding of the emergence and maintenance of a number of perceptual and cognitive skills
observed during infancy, including affect discrimination (Flom and Bahrick 2007), face discrimi-
nation (Bahrick and Newell 2008), rhythm and tempo discrimination (Bahrick et al. 2010),
numerical discrimination (Farzin et al. 2009; Jordan et al. 2008), sequence detection (Lewkowicz
2004), abstract rule learning (Frank et al. 2009), and word comprehension and segmentation
(Gogate and Bahrick 2001; Hollich et al. 2005). A number of important questions remain to be
explored. For example, what accounts for the initial salience of intersensory redundancy and its
effects on perceptual processing? Does intersensory redundancy foster longer attentional engage-
ment, deeper processing, or both, and under what conditions?
Work in embodied and developmental robotics (e.g. Arsenio and Fitzpatrick 2005; Fitzpatrick
et al. 2008; Lungarella et al. 2003; Weng 2004) is advancing our understanding of how sensitivity
to intersensory redundancy (particularly audiovisual synchrony) can guide and constrain percep-
tual, motor, cognitive, and social skills during early development. Developmental robots ‘develop’
their perceptual, cognitive, and behavioural skills incrementally through real-time explorations
and interactions with their environment (Weng 2004). For example, Lungarella and colleagues
(2003) note that repetitive actions similar to those often displayed by human infants provide
developmental robots with a large amount of synchronous multimodal information, which can be
used to bootstrap early motor and cognitive processes. In a similar vein, Fitzpatrick and col-
leagues (2006) have demonstrated the importance of repetition and redundancy to robot percep-
tion and recognition of multimodal events. They have provided examples of how binding vision,
audition, and proprioception can highlight temporal information and thereby enhance the devel-
opment of robot perception and recognition of objects and events. Advances in developmental
robotics (based in large part on our increasing knowledge of infant development) will likely make
a significant contribution to more integrative theories of human perceptual as well as cognitive and
social development (see Smith and Breazeal 2007).
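The temporal-binding strategy underlying this robotics work can be illustrated with a toy
sketch: given the amplitude envelope of an audio stream and the motion signals of several
candidate visual sources, select the source whose motion correlates most strongly with the sound.
The code below is purely illustrative (synthetic signals, simple Pearson correlation, and
function names of our own choosing); it is not drawn from any of the systems cited above.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def pick_synchronous_source(audio_envelope, motion_signals):
    """Index of the visual motion signal most correlated with the audio envelope."""
    scores = [pearson(audio_envelope, m) for m in motion_signals]
    return max(range(len(scores)), key=scores.__getitem__)

# Synthetic demo: speaker A's mouth motion tracks the audio envelope;
# speaker B's motion is unrelated noise.
random.seed(0)
t = [i / 50.0 for i in range(200)]
audio = [abs(math.sin(3 * x)) for x in t]
mouth_a = [a + random.gauss(0, 0.05) for a in audio]   # synchronous source
mouth_b = [random.random() for _ in t]                 # unrelated source

print(pick_synchronous_source(audio, [mouth_a, mouth_b]))  # → 0
```

Real systems refine this idea considerably (windowed correlation, time lags, richer visual
features), but the core operation, binding sound to the temporally redundant visual source, is
the same.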
Studies of atypical development, including autism and autism spectrum disorders (ASD), are
also providing evidence that points to the critical role of multimodal stimulation and intersensory
processing in promoting typical perceptual, cognitive, and social development (see Bahrick and
Todd in press, for a review). Research indicates intersensory processing impairments in ASD,
including deficits in intersensory binding, audiovisual speech processing, and matching synchro-
nous sights and sounds, particularly for social events. We have proposed that social events pro-
vide exaggerated amounts of intersensory redundancy relative to non-social events, and that
infant sensitivity to intersensory redundancy underlies the typical emergence of preferential
attention to social events. In ASD, there is a disturbance of intersensory processing, contributing
to social-orienting impairments and, in turn, affecting language development and contributing to
the repetitive behaviours characteristic of ASD (see Bahrick 2010; Bahrick and Todd in press).
Making sense of developmental impairments in intersensory processing will undoubtedly con-
tribute to a deeper understanding of the typical development of intersensory processing.
Studies of the neural underpinnings of intersensory processing during early development will
also help address important questions about the mechanism underlying the salience of intersen-
sory redundancy. There is a growing body of research on the neural architecture and neural
processes involved in intersensory functioning (for overviews see Chapter 14 by Wallace et al.;
Calvert et al. 2004; Stein and Meredith 1993), but little of this research has had a developmental
focus (but see Brainard and Knudsen 1993; King and Carlile 1993; Wallace and Stein 2007) and
only a few studies have assessed the effects of redundant versus non-redundant stimulation on
patterns of neural responsiveness to bimodal events during infancy (e.g. Hyde et al. 2009;
Reynolds et al. 2010). Additional work in developmental cognitive neuroscience focusing on the
links between selective attention and intersensory perception is needed to advance our under-
standing of the neural mechanisms that contribute to how young infants become economical
and efficient at attending to multimodal stimulation that is unitary and relevant to their needs
and actions.

Acknowledgements
The research and theory development reported in this chapter were supported by NICHD grants
RO1 HD053776, RO3 HD052602, and K02HD064943 awarded to LEB, NICHD grant RO1
HD048423 and NSF grant BCS 1957898 awarded to RL.

References
Adolph, K.E., and Berger, S.E. (2005). Physical and motor development. In Developmental science: an
advanced textbook, 5th edn. (eds. M.H. Bornstein, and M.E. Lamb), pp. 223–81. Lawrence Erlbaum
Associates, Hillsdale, NJ.
Adolph, K.E., and Berger, S.E. (2006). Motor development. In Handbook of Child Psychology, Vol. 2:
Cognition, Perception, and Language, 6th edn. (eds. W. Damon, R. Lerner, D. Kuhn, and R.S. Siegler),
pp. 161–213. John Wiley, New York.
Alais, D., and Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration.
Current Biology, 14, 257–62.
Arsenio, A., and Fitzpatrick, P. (2005). Exploiting amodal cues for robot perception. International Journal
of Humanoid Robotics, 2, 125–43.
Bahrick, L.E. (1988). Intermodal learning in infancy: learning on the basis of two kinds of invariant
relations in audible and visible events. Child Development, 59, 197–209.
Bahrick, L.E. (1992). Infants’ perceptual differentiation of amodal and modality-specific audio-visual
relations. Journal of Experimental Child Psychology, 53, 180–99.
Bahrick, L.E. (1994). The development of infants’ sensitivity to arbitrary intermodal relations. Ecological
Psychology, 6, 111–23.
Bahrick, L.E. (2001). Increasing specificity in perceptual development: infants’ detection of nested levels of
multimodal stimulation. Journal of Experimental Child Psychology, 79, 253–70.
Bahrick, L.E. (2004). The development of perception in a multimodal environment. In Theories of Infant
Development (eds. G. Bremner, and A. Slater), pp. 90–120. Blackwell Publishing, Oxford.
Bahrick, L.E. (2010). Intermodal perception and selective attention to intersensory redundancy:
implications for typical social development and autism. In Blackwell Handbook of Infant Development,
2nd edn. (eds. G. Bremner, and T.D. Wachs). Blackwell Publishing, Oxford.
Bahrick, L.E., and Lickliter, R. (2000). Intersensory redundancy guides attentional selectivity and perceptual
learning in infancy. Developmental Psychology, 36, 190–201.
Bahrick, L.E., and Lickliter, R. (2002). Intersensory redundancy guides early perceptual and cognitive
development. In Advances in Child Development and Behavior: Vol. 30 (ed. R. Kail), pp. 153–87.
Academic Press, New York.
Bahrick, L.E., and Lickliter, R. (2004). Infants’ perception of rhythm and tempo in unimodal and
multimodal stimulation: a developmental test of the intersensory redundancy hypothesis. Cognitive,
Affective and Behavioral Neuroscience, 4, 137–47.
Bahrick, L.E., and Newell, L.C. (2008). Infant discrimination of faces in naturalistic events: Actions are
more salient than faces. Developmental Psychology, 44, 983–96.
Bahrick, L.E., and Pickens, J.N. (1994). Amodal relations: The basis for intermodal perception and learning.
In The development of intersensory perception: comparative perspectives (eds. D. Lewkowicz, and
R. Lickliter), pp. 205–33. Lawrence Erlbaum Associates, Hillsdale, NJ.
Bahrick, L. E., and Todd, J.T. (in press). Multisensory processing in autism spectrum disorders:
Intersensory processing disturbance as a basis for atypical development. In The New Handbook of
Multisensory Processes (eds. B. Stein, and M. Wallace), MIT Press, Cambridge, MA.
Bahrick, L.E., Walker, A.S., and Neisser, U. (1981). Selective looking by infants. Cognitive Psychology,
13, 377–90.
Bahrick, L.E., Flom, R., and Lickliter, R. (2002a). Intersensory redundancy facilitates discrimination of
tempo in 3-month-old infants. Developmental Psychobiology, 41, 352–63.
Bahrick, L.E., Gogate, L.J., and Ruiz, I. (2002b). Attention and memory for faces and actions in infancy: the
salience of actions over faces in dynamic events. Child Development, 73, 1629–43.
Bahrick, L.E., Lickliter, R., and Flom, R. (2004a). Intersensory redundancy guides infants’ selective
attention, perceptual and cognitive development. Current Directions in Psychological Science, 13,
99–102.
202 ROLE OF INTERSENSORY REDUNDANCY IN EARLY PERCEPTUAL, COGNITIVE, AND SOCIAL DEVELOPMENT

Bahrick, L.E., Lickliter, R., Vaillant, M., Shuman, M., and Castellanos, I. (2004b, June). The development of
face perception in dynamic, multimodal events: Predictions from the intersensory redundancy
hypothesis. Poster presented at the International Multisensory Research Forum, Barcelona, Spain.
Bahrick, L.E., Lickliter, R., Shuman, M., Batista, L.C., Castellanos, I., and Newell, L.C. (2005, November).
The development of infant voice discrimination: from unimodal auditory to bimodal audiovisual
presentation. Poster presented at the International Society for Developmental Psychobiology,
Washington, DC.
Bahrick, L.E., Lickliter, R., and Flom, R. (2006). Up versus down: the role of intersensory redundancy in the
development of infants’ sensitivity to the orientation of moving objects. Infancy, 9, 73–96.
Bahrick, L.E., Todd, J.T., Argumosa, M., Grossman, R., Castellanos, I., and Sorondo, B.M. (2009a, July).
Intersensory facilitation across the lifespan: adults show enhanced discrimination of tempo in
bimodal vs. unimodal stimulation. Poster presented at the International Multisensory Research Forum,
New York, NY.
Bahrick, L.E., Krogh-Jespersen, S., Argumosa, M., and Lopez, H. (submitted). Intersensory redundancy
hinders face discrimination in preschool children: Evidence for visual facilitation.
Bahrick, L.E., Lickliter, R., Castellanos, I., and Vaillant-Molina, M. (2010). Intersensory redundancy and
tempo discrimination in infancy: the roles of task difficulty and expertise. Developmental Science, 13,
731–37.
Bartlett, F. C. (1932). Remembering: a study in experimental and social psychology. Cambridge University
Press, Cambridge.
Berger, S. E. (2004). Demands on finite cognitive capacity cause infants’ perseverative errors. Infancy, 5,
217–38.
Brainard, M.S., and Knudsen, E.I. (1993). Experience-dependent plasticity in the inferior colliculus: a site
for visual calibration of the neural representation of auditory space in the barn owl. Journal of
Neuroscience, 13, 4589–4608.
Calvert, G.A., Spence, C., and Stein, B.E. (eds.) (2004). The handbook of multisensory processes. MIT Press,
Cambridge, MA.
Castellanos, I., and Bahrick, L.E. (2008, November). Educating infants’ attention to amodal properties of
speech: the role of intersensory redundancy. Poster presented at the International Society for
Developmental Psychobiology, Washington, DC.
Castellanos, I., Vaillant-Molina, M., Lickliter, R., and Bahrick, L.E. (2006, October). Intersensory
redundancy educates infants’ attention to amodal information in unimodal stimulation. Poster
presented at the International Society for Developmental Psychobiology, Atlanta, GA.
Chase, W.G., and Simon, H.A. (1973). Perception in chess. Cognitive Psychology, 4, 55–81.
Colombo, J. (2001). The development of visual attention in infancy. Annual Review of Psychology, 52,
337–67.
Colombo, J. (2002). Infant attention grows up: The emergence of a developmental cognitive neuroscience
perspective. Current Directions in Psychological Science, 11, 196–99.
Colombo, J., and Mitchell, D.W. (1990). Individual and developmental differences in infant visual attention:
Fixation time and information processing. In Individual differences in infancy: reliability, stability, and
prediction (eds. J. Colombo, and J.W. Fagen), pp. 193–227. Lawrence Erlbaum Associates, Hillsdale, NJ.
Colombo, J., Mitchell, D.W., Coldren, J.T., and Freeseman, L.J. (1991). Individual differences in infant visual
attention: are short lookers faster processors or feature processors? Child Development, 62, 1247–57.
Corbetta, D., and Bojczyk, K.E. (2002). Infants return to two-handed reaching when they are learning to
walk. Journal of Motor Behavior, 34, 83–95.
Cornell, E. (1974). Infants’ discrimination of faces following redundant presentations. Journal of
Experimental Child Psychology, 18, 98–106.
Craik, F.I.M., and Lockhart, R.S. (1972). Levels of processing: a framework for memory research. Journal of
Verbal Learning and Verbal Behavior, 11, 671–84.
REFERENCES 203

Dawson, G., Meltzoff, A.N., Osterling, J., Rinaldi, J., and Brown, E. (1998). Children with autism fail
to orient to naturally occurring social stimuli. Journal of Autism and Developmental Disorders, 28,
479–85.
Dawson, G., Toth, K., Abbott, R., et al. (2004). Early social attention impairments in autism: social
orienting, joint attention, and attention to distress. Developmental Psychology, 40, 271–83.
Driver, J. (2001). A selective review of selective attention research from the past century. British Journal of
Psychology, 92, 53–78.
Farzin, F., Charles, E., and Rivera, S. (2009). Development of multimodal processing in infancy. Infancy,
14, 563–78.
Fitzpatrick, P., Arsenio, A., and Torres-Jara, E.R. (2006). Reinforcing robot perception of multi-modal events
through repetition and redundancy. Interaction Studies, 7, 171–96.
Fitzpatrick, P., Needham, A., Natale, L., and Metta, G. (2008). Shared challenges in object perception for
robots and infants. Infant and Child Development, 17, 7–24.
Flom, R., and Bahrick, L.E. (2007). The development of infant discrimination of affect in multimodal and
unimodal stimulation: the role of intersensory redundancy. Developmental Psychology, 43, 238–52.
Flom, R., and Bahrick, L.E. (2010). The effects of intersensory redundancy on attention and memory:
Infants’ long-term memory for orientation in audiovisual events. Developmental Psychology, 46,
428–36.
Frank, M.C., Slemmer, J., Marcus, G., and Johnson, S.P. (2009). Information from multiple modalities
helps 5-month-olds learn abstract rules. Developmental Science, 12, 504–509.
Frick, J.E., Colombo, J., and Allen, J.R. (2000). Temporal sequence of global-local processing in 3-month-
old infants. Infancy, 1, 375–86.
Frick, J.E., Colombo, J., and Saxon, T.F. (1999). Individual and developmental differences in
disengagement of fixation in early infancy. Child Development, 70, 537–48.
Gibson, E.J. (1969). Principles of perceptual learning and development. Appleton-Century-Crofts, East
Norwalk, CT.
Gibson, E.J. (1988). Exploratory behavior in the development of perceiving, acting, and the acquiring of
knowledge. Annual Review of Psychology, 39, 1–41.
Gibson, E.J., and Pick, A.D. (2000). An ecological approach to perceptual learning and development. Oxford
University Press, New York.
Gibson, J.J. (1966). The senses considered as perceptual systems. Houghton-Mifflin, Boston.
Gibson, J.J. (1979). The ecological approach to visual perception. Houghton-Mifflin, Boston.
Gogate, L.J., and Bahrick, L.E. (1998). Intersensory redundancy facilitates learning of arbitrary relations
between vowel sounds and objects in seven-month-old infants. Journal of Experimental Child
Psychology, 69, 1–17.
Gogate, L.J., and Bahrick, L.E. (2001). Intersensory redundancy and seven-month-old infants’ memory for
arbitrary syllable-object relations. Infancy, 2, 219–31.
Haith, M.M. (1980). Rules that babies look by: the organization of newborn visual activity. Lawrence Erlbaum
Associates, Potomac, MD.
Hale, S. (1990). A global developmental trend in cognitive processing speed. Child Development, 61, 653–63.
Hebb, D.O. (1949). The organization of behavior: a neuropsychological theory. John Wiley, New York.
Hernandez-Reif, M., and Bahrick, L.E. (2001). The development of visual-tactile perception of objects:
Amodal relations provide the basis for learning arbitrary relations. Infancy, 2, 51–72.
Hollich, G., Newman, R.S., and Jusczyk, P.W. (2005). Infant’s use of synchronized visual information to
separate streams of speech. Child Development, 76, 598–613.
Hunter, M.A., and Ames, E. W. (1988). A multifactor model of infant preferences for novel and familiar
stimuli. In Advances in Infancy Research, Vol. 5 (eds. C. Rovee-Collier, and L.P. Lipsitt), pp. 69–95.
Ablex, Norwood, NJ.
Hyde, D.C., Jones, B.L., Porter, C.L., and Flom, R. (2009). Visual stimulation enhances auditory processing
in 3-month-old infants. Developmental Psychobiology, 52, 181–89.
Jaime, M., Bahrick, L.E., and Lickliter, R. (2010). The critical role of temporal synchrony in the salience of
intersensory redundancy during prenatal development. Infancy, 15, 61–82.
Johnson, M.H., Posner, M.I., and Rothbart, M.K. (1991). Components of visual orienting in early infancy:
contingency learning, anticipatory looking, and disengaging. Journal of Cognitive Neuroscience, 3, 335–44.
Jordan, K.E., Suanda, S.H., and Brannon, E. M. (2008). Intersensory redundancy accelerates preverbal
numerical competence. Cognition, 108, 210–21.
Kaplan, S., and Berman, M.G. (2010). Directed attention as a common resource for executive functioning
and self-regulation. Perspectives on Psychological Science, 5, 43–57.
Kellman, P.J., and Arterberry, M.E. (1998). The cradle of knowledge: the development of perception in infancy.
MIT Press, Cambridge, MA.
King, A.J., and Carlile, S. (1993). Changes induced in the representation of auditory space in the superior
colliculus by rearing ferrets with binocular eyelid suture. Experimental Brain Research, 94, 444–55.
Lavie, N. (1995). Perceptual load as a necessary condition for selective attention. Journal of Experimental
Psychology: Human Perception and Performance, 21, 451–68.
Lavie, N. (2005). Distracted and confused? Selective attention under load. Trends in Cognitive Sciences, 9,
75–82.
Lewis, R., and Noppeney, U. (2010). Audiovisual synchrony improves motion discrimination via enhanced
connectivity between early visual and auditory areas. Journal of Neuroscience, 30, 12329–39.
Lewkowicz, D.J. (1996). Infants’ response to the audible and visible properties of the human face: I. Role of
lexical syntactic content, temporal synchrony, gender, and manner of speech. Developmental
Psychology, 32, 347–66.
Lewkowicz, D.J. (2000). The development of intersensory temporal perception: an epigenetic systems/
limitations view. Psychological Bulletin, 126, 281–308.
Lewkowicz, D.J. (2004). Perception of serial order in infants. Developmental Science, 7, 175–84.
Lewkowicz, D.J., and Lickliter, R. (eds.) (1994). The development of intersensory perception: comparative
perspectives. Lawrence Erlbaum Associates, Hillsdale, NJ.
Lewkowicz, D.J., and Turkewitz, G. (1980). Cross-modal equivalence in early infancy: auditory-visual
intensity matching. Developmental Psychology, 16, 597–607.
Lickliter, R., and Bahrick, L.E. (2000). The development of infant intersensory perception: Advantages of a
comparative convergent-operations approach. Psychological Bulletin, 126, 260–80.
Lickliter, R., and Bahrick, L.E. (2004). Perceptual development and the origins of multisensory
responsiveness. In Handbook of multisensory processes (eds. G. Calvert, C. Spence, and B.E. Stein),
pp. 643–54. MIT Press, Cambridge, MA.
Lickliter, R., Bahrick, L.E., and Honeycutt, H. (2002). Intersensory redundancy facilitates prenatal perceptual
learning in bobwhite quail (Colinus virginianus) embryos. Developmental Psychology, 38, 15–23.
Lickliter, R., Bahrick, L.E., and Honeycutt, H. (2004). Intersensory redundancy enhances memory in
bobwhite quail embryos. Infancy, 5, 253–69.
Lickliter, R., Bahrick, L.E., and Markham, R. G. (2006). Intersensory redundancy educates selective
attention in bobwhite quail embryos. Developmental Science, 9, 605–616.
Lungarella, M., Metta, G., Pfeifer, R., and Sandini, G. (2003). Developmental robotics: a survey. Connection
Science, 15, 151–90.
Morrongiello, B.A., Fenwick, K.D., and Nutley, T. (1998). Developmental changes in associations between
auditory-visual events. Infant Behavior and Development, 21, 613–26.
Mundy, P., and Burnette, C. (2005). Joint attention and neurodevelopment. In Handbook of autism and
pervasive developmental disorders: Vol. 3 (eds. F. Volkmar, A. Klin, and R. Paul), pp. 650–81.
John Wiley, Hoboken, NJ.
Neisser, U. (1976). Cognitive Psychology. Prentice Hall, Englewood Cliffs, NJ.
Oakes, L.M., and Madole, K.L. (2008). Function revisited: how infants construe functional features in
their representation of objects. In Advances in child development and behavior, Vol. 36 (ed. R. Kail),
pp. 135–85. Academic Press, New York.
Pashler, H. (1998). The psychology of attention. MIT Press, Cambridge, MA.
Piaget, J. (1952). The origins of intelligence in children. International Universities Press, New York.
Piaget, J. (1954). The construction of reality in the child. Basic Books, New York.
Radeau, M., and Bertelson, P. (1977). Adaptation to auditory-visual discordance and ventriloquism in
semi-realistic situations. Perception and Psychophysics, 22, 137–46.
Reynolds, G., Bahrick, L.E., Lickliter, R., and Riggs, M. (2010, March). Intersensory redundancy and
infant event-related potentials. Poster presented at the International Conference on Infant Studies,
Baltimore, MD.
Reynolds, G.D., Courage, M., and Richards, J.E. (2010). Infant attention and visual preferences: converging
evidence from behavior, event-related potentials, and cortical source localization. Developmental
Psychology, 46, 886–904.
Richards, J.E., Reynolds, G.D., and Courage, M. (2010). The neural bases of infant attention. Current
Directions in Psychological Science, 19, 41–46.
Rose, S.A., Feldman, J.F., and Jankowski, J.J. (2001). Attention and recognition memory in the 1st year of
life: A longitudinal study of preterm and full-term infants. Developmental Psychology, 37, 135–51.
Ruff, H.A., and Rothbart, M.K. (1996). Attention in early development: themes and variations. Oxford
University Press, New York.
Santangelo, V., Ho, C., and Spence, C. (2008). Capturing spatial attention with multisensory cues.
Psychonomic Bulletin and Review, 15, 398–403.
Santangelo, V., and Spence, C. (2007). Multisensory cues capture spatial attention regardless of perceptual
load. Journal of Experimental Psychology: Human Perception and Performance, 33, 1311–21.
Schank, R., and Abelson, R. (1977). Scripts, plans, goals, and understanding. Lawrence Erlbaum Associates,
Hillsdale, NJ.
Schmuckler, M.J. (1996). Visual-proprioceptive intermodal perception in infancy. Infant Behavior and
Development, 19, 221–32.
Shams, L., and Seitz, A.R. (2008). Benefits of multisensory learning. Trends in Cognitive Sciences, 12, 411–417.
Smith, L.B., and Breazeal, C. (2007). The dynamic lift of developmental process. Developmental Science, 10,
61–68.
Spence, C. (2010). Crossmodal spatial attention. Annals of the New York Academy of Science, 1191, 182–200.
Spence, C., and Driver, J. (2004). Crossmodal space and crossmodal attention. Oxford University Press,
Oxford.
Stein, B.E., and Meredith, M.A. (1993). The merging of the senses. MIT Press, Cambridge, MA.
Thelen, E., and Smith, L.B. (1994). A dynamic systems approach to the development of cognition and action.
MIT Press, Cambridge, MA.
Vaillant-Molina, M., and Bahrick, L.E. (2012). The role of intersensory redundancy in the emergence of
social referencing in 5.5-month-old infants. Developmental Psychology, 48, 1–9.
Vaillant, J., Bahrick, L. E., and Lickliter, R. (2009). Detection of modality specific stimulus properties are
enhanced by unimodal exposure during prenatal development. Poster presented at the Society for
Research in Child Development, Denver, CO.
Vaillant-Molina, M., Newell, L., Castellanos, I., Bahrick, L.E., and Lickliter, R. (2006). Intersensory
redundancy impairs face perception in early development. Poster presented at the International
Conference on Infant Studies, Kyoto, Japan.
Von Hofsten, C. (1983). Catching skills in infancy. Journal of Experimental Psychology: Human Perception
and Performance, 9, 75–85.
Von Hofsten, C. (1993). Prospective control: A basic aspect of action development. Human Development,
36, 253–70.
Wallace, M.T., and Stein, B.E. (2007). Early experience determines how the senses will interact. Journal of
Neurophysiology, 97, 921–26.
Warren, D., Welch, R., and McCarthy, T. (1981). The role of visual-auditory ‘compellingness’ in the
ventriloquism effect: implications for transitivity among the spatial senses. Perception and
Psychophysics, 30, 557–64.
Weng, J. (2004). Developmental robotics: theory and experiments. International Journal of Humanoid
Robotics, 1, 199–236.
Xu, F., Carey, S., and Quint, N. (2004). The emergence of kind-based object individuation in infancy.
Cognitive Psychology, 49, 155–90.
Zukow-Goldring, P. (1997). A social ecological realist approach to the emergence of the lexicon: educating
attention to amodal invariants in gesture and speech. In Evolving explanations of development
(eds. C. Dent-Read, and P. Zukow-Goldring), pp. 199–249. American Psychological Association,
Washington, DC.
Chapter 9

The development of audiovisual speech perception

Salvador Soto-Faraco, Marco Calabresi, Jordi Navarra,
Janet F. Werker, and David J. Lewkowicz

9.1 Introduction
The seemingly effortless and incidental way in which infants acquire and perfect the ability to use
spoken language(s) is quite remarkable considering the complexity of human linguistic systems.
Parsing the spoken signal successfully is in fact anything but trivial, as attested by the challenge
posed by learning a new language later in life, let alone attaining native-like proficiency. It is now
widely accepted that (adult) listeners tend to make use of as many cues as are available to
them in order to decode the spoken signal effectively (e.g. Cutler 1997; Soto-Faraco et al. 2001).
Different sources of information not only span various linguistic levels (acoustics, phonology,
the lexicon, morphology, syntax, semantics, pragmatics, etc.), but also encompass different sensory
modalities such as vision and audition (Campbell et al. 1998) and even touch (see Gick and
Derrick 2009). Thus, like most everyday perceptual experiences (Gibson 1966; Lewkowicz 2000a;
Stein and Meredith 1993), spoken communication involves multisensory inputs. In the particular
case of speech, the receiver in a typical face-to-face conversation has access to the sounds as well
as to the corresponding visible articulatory gestures made by the speaker. The integration of heard
and seen speech information has received a good deal of attention in the literature, and its conse-
quences have been repeatedly documented at the behavioral (e.g. Ma et al. 2009; McGurk and
MacDonald 1976; Ross et al. 2007; Sumby and Pollack 1954) as well as the physiological (e.g.
Calvert et al. 2000) level. The focus of the present chapter is on the developmental course of mul-
tisensory processing mechanisms that facilitate language acquisition, and on the contribution of
multisensory information to the development of speech perception. We argue that the plastic
mechanisms leading to the acquisition of a language, whether in infancy or later on, are sensitive
to the correlated and often complementary nature of multiple crossmodal sources of linguistic
information. These crossmodal correspondences are used to decode and represent the speech
signal from the first months of life.

9.2 Early multisensory capacities in the infant
From a developmental perspective, the sensitivity to multisensory redundancy and coherence is
essential for the emergence of adaptive perceptual, cognitive, and social functioning (Bahrick
et al. 2004; Gibson 1969; Lewkowicz and Kraebel 2004; Spelke 1976; Thelen and Smith 1994).
This includes the ability to extract meaning from speech, arguably the most important commu-
nicative signal in an infant’s life. The multisensory coherence of audiovisual speech is determined
by the overlapping nature of the visual and auditory streams of information emanating from the
face and vocal tract of the speaker, reflected in a variety of amodal stimulus attributes (such as
intensity, duration, tempo, and rhythm) and temporally and spatially contiguous modality-
specific attributes (such as the color of the face and the pitch and timbre of the voice). A priori, it
is reasonable to assume that an immature and inexperienced infant should not be able to extract
the intersensory relations offered by all these cues. This assumption is consistent with the classic
developmental integration view, according to which, initially in life, infants are not capable of
perceiving crossmodal correspondences and only acquire this ability gradually as a result of expe-
rience (Birch and Lefford 1963, 1967; Piaget 1952). However, this idea stands in contrast to the
other classic developmental framework, namely the developmental differentiation view (Gibson
1969, 1984), according to which a number of basic intersensory perceptual abilities are already
present at birth (although which specific ones are present has never been specified). Furthermore,
like the integration view, the differentiation view assumes that as infants discover increasingly
more complex unisensory features of their world through the process of perceptual learning and
differentiation they can also discover increasingly more complex types of intersensory relations.
A great deal of empirical work on the development of intersensory perception has accumulated
since these two opposite theoretical frameworks were first proposed and it has now become abun-
dantly clear that both developmental integration and differentiation processes contribute to the
emergence of intersensory processing skills in infancy (Lewkowicz 2002). On the one hand, it has
been found that infants exhibit some relatively primitive intersensory perceptual abilities, such as
the detection of synchrony and intensity relations early in life (e.g. Lewkowicz 2010; Lewkowicz
et al. 2010; Lewkowicz and Turkewitz 1980). On the other, it has been found that infants gradu-
ally acquire the ability to perceive more complex types of intersensory relations as they acquire
perceptual experience (Lewkowicz 2000a, 2002; Lewkowicz and Ghazanfar 2009; Lewkowicz and
Lickliter 1994; Lickliter and Bahrick 2000; Walker-Andrews 1986, 1997). Finally, it is worth
pointing out that, just like in other perceptual domains (e.g. Werker and Tees 1984), the way
in which intersensory development progresses is not always incremental (a trend that will be
discussed in the following sections). For example, recent studies have shown that young infants
can match other-species’ faces and vocalizations (Lewkowicz et al. 2010; Lewkowicz and Ghazanfar
2006) or non-native visible and audible speech events (Pons et al. 2009; see Section 9.5), but
when they are older, infants no longer make intersensory matches of non-native auditory and
visual inputs (i.e. those from other species or from a non-native language).
The balance between the early intersensory capabilities on the one side and the limitations due
to the immaturity of the nervous system and relative perceptual inexperience on the other, raises
several questions regarding infants’ initial perception of audiovisual speech.
◆ At what point in early development does the ability to perceive the multisensory coherence of
audiovisual speech emerge?
◆ When this ability emerges, what specific attributes permit its perception as a coherent event?
◆ Do certain perceptual attributes associated with audiovisual speech have developmental
priority over others?
The answers to these questions depend, first and foremost, on an analysis of the audiovisual
speech signal. In essence, audiovisual speech is represented by a hierarchy of audiovisual relations
(Munhall and Vatikiotis-Bateson 2004). These consist of the temporally synchronous and spa-
tially contiguous onsets and offsets of facial gestures and vocalizations, the correlation between
the dynamics of vocal-tract motion and the dynamics of accompanying vocalizations (that are
usually specified by duration, tempo, and rhythmical pattern information), and finally by various
amodal categorical attributes such as the talker’s gender, affect, and identity. Whether infants can
perceive some or all of these different intersensory coherence cues is currently an open question,
especially during the early months of life.
So far, a number of studies are consistent with the idea that infants can perceive coherence
between auditory and visual phonemes from as young as 2 months of age (Kuhl and Meltzoff
1982, 1984; Patterson and Werker 1999, 2003; Pons et al. 2009; though see Walton and Bower
1993, for potential evidence of matching at even younger ages). Moreover, some researchers have
attempted to extrapolate audiovisual matching abilities to longer speech events (e.g. Dodd 1979,
1987; Dodd and Burnham 1988). However, it is still unclear what particular sensory cues infants
respond to in these studies. More recently, it has been reported that by around 4 months of age
infants exhibit the McGurk effect (Burnham and Dodd 2004; Desjardins and Werker 2004;
Rosenblum et al. 1997; see Section 9.3), suggesting that infants, like adults, can integrate incon-
gruous audible and visible speech cues. Finally, it has been reported that infant discrimination of
their mother’s face from that of a stranger is facilitated by the concurrent presentation of the
voice, indicating the ability to use crossmodal redundancy (e.g. in 4–5-month-olds; Burnham
1993, and even newborns; Sai 2005).
The findings regarding infants’ abilities to match visible and audible phonemes are particularly
interesting because at first glance they seem to challenge the notion that early intersensory per-
ception is based on low-level cues such as intensity and temporal synchrony. Indeed, with specific
regards to temporal synchrony, some of the studies to date have demonstrated phonetic matching
across auditory and visual stimuli presented sequentially rather than simultaneously (therefore,
in the absence of crossmodal temporal synchrony cues; e.g. Pons et al. 2009). As a result, the most
reasonable conclusion to draw from these studies is that infants must have extracted some higher-
level, abstract, intersensory relational cues in order to perform the crossmodal matches (i.e. the
correlation between the dynamics of vocal-tract motion and the dynamics of accompanying
vocalizations; see MacKain et al. 1983, for a similar argument). As indicated earlier, audiovisual
speech is a dynamic and multidimensional event that can be specified by synchronous onsets of
acoustic and gestural energy as well as by correlated dynamic patterns across audible and visible
articulations (Munhall and Vatikiotis-Bateson 2004; Yehia et al. 1998). Thus, the studies showing
that infants can match faces and voices across a temporal delay show that they can take advantage
of one perceptual cue (e.g. correlated dynamic patterns) in the absence of the other (e.g. synchro-
nous onsets). This is further complemented by findings that infants can also perceive audiovisual
synchrony cues in the absence of higher-level intersensory correlations (Lewkowicz 2010;
Lewkowicz et al. 2010). Therefore, when considered together, current findings suggest that
infants are sensitive to both correlated dynamic patterns of acoustic and gestural energy as
well as to synchronous onsets, and that they can respond to each of these intersensory cues
independently.

9.3 Early perceptual sensitivity to heard and seen speech
Previous research has revealed that, from the very first moments in life, human infants exhibit
sensitivity to the perceptual attributes of speech, both acoustic and visual. For example, human
neonates suck preferentially when listening to speech versus complex non-speech stimuli
(Vouloumanos and Werker 2007). As mentioned in the previous section, this sensitivity can be
fairly broad in the newborn (e.g. extending both to human speech and to non-human vocaliza-
tions), but within the first few months of life it becomes quite specific to human speech
(Vouloumanos et al. 2010). Moreover, human neonates can acoustically discriminate languages
from different rhythmical classes (Byers-Heinlein et al. 2010; Mehler et al. 1988; Nazzi et al. 1998)
and show categorical perception of phonetic contrasts (Dehaene-Lambertz and Dehaene 1994).
At birth, infants can discriminate both native and unfamiliar phonetic contrasts, suggesting that
the auditory/linguistic perceptual system is broadly sensitive to the properties of any language.
Then, over the first several months after birth, phonetic discrimination becomes reorganized
resulting in an increase in sensitivity to the speech contrasts in the infant’s native language (Kuhl
et al. 2003, 2006; Narayan et al. 2010) and a concomitant decrease in their sensitivity to those
phonetic contrasts that are not used to distinguish meaning in the infant’s native language
(Werker and Tees 1984; see Saffran et al. 2006 for a review).
The perceptual sensitivity to the attributes of human speech early in life is paralleled by special-
ized neural systems for the processing of both the phonetic (e.g. Dehaene-Lambertz and Baillet
1998; Dehaene-Lambertz and Dehaene 1994) and rhythmical (e.g. Dehaene et al. 2006; Peña et al.
2003) aspects of spoken language. Indeed, some studies indicate that newborn infants are sensi-
tive to distinct patterns of syllable repetition (i.e. they can discriminate novel speech items follow-
ing an ABB syllabic structure—e.g. wefofo, gubibi, etc.—from those following an ABC structure).
This discriminative ability seems to implicate left hemisphere brain areas analogous to those
involved in language processing in the adult brain (Gervain et al. 2008) and dissociable from
areas implicated in tone sequence discrimination (Gervain et al. 2009). This differential pattern
of neural activation suggests that infants’ sensitivity to speech-like stimuli may involve early
specialization.
Although visual language discrimination has not been studied in the newborn (in the same way
that auditory language has), there is evidence that infants as young as 4 months are already able
to discriminate languages just by watching silent talking faces (Weikum et al. 2007). In the study
by Weikum and colleagues, infants were presented with videos of silent visual speech in English
and French in a habituation paradigm (see Fig. 9.1). Monolingual infants of 4 and 6 months
revealed sensitivity to a language switch (in the form of increased looking times), but appeared to
lose this sensitivity at 8 months of age. In contrast, 8-month-olds raised in a bilingual
environment maintained the sensitivity to switches between their two languages. Thus, the sensitivity
to visual speech cues begins to decline by 8 months of age, unless the infant is being raised in a
bilingual environment. It would be interesting to address whether this early visual sensitivity to
distinct languages exploits rhythmical differences (as seems to be the case for auditory discrimi-
nation), dynamic visual phonetic (i.e. visegmetic1) information (which differs between English
and French), or both.
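The looking-time logic of this habituation paradigm (see Fig. 9.1) — presenting trials until looking time drops by 60% of its initial level — can be sketched computationally. The snippet below is only an illustration: the 60% criterion is taken from the paradigm, but the three-trial averaging window, the function name, and the example looking times are assumptions made for the sketch.

```python
def habituated(looking_times, window=3, criterion=0.6):
    """Return True once mean looking time over the most recent
    `window` trials has fallen by `criterion` (e.g. 60%) relative
    to the mean of the first `window` trials.

    The three-trial window is an illustrative assumption, not the
    procedure reported by Weikum et al. (2007).
    """
    if len(looking_times) < 2 * window:
        return False  # too few trials to compare start vs. present
    initial = sum(looking_times[:window]) / window
    recent = sum(looking_times[-window:]) / window
    return recent <= (1 - criterion) * initial

# Hypothetical looking times (in seconds) across habituation trials
trials = [9.2, 8.7, 8.9, 6.1, 4.3, 3.0, 2.9]
print(habituated(trials))  # True: recent mean (~3.4 s) < 40% of initial (~8.9 s)
```

Once this criterion is met, the test trials (same versus new language) follow, with renewed looking on a language switch taken as evidence of discrimination.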
Young infants are not only sensitive to the auditory and visual concomitants of speech in isola-
tion, they are also sensitive, from a very early age, to the percepts that arise from the joint influ-
ence of both auditory and visual information. For example, as mentioned in the previous section,
the presence of the maternal voice has been found to facilitate recognition of the maternal face in
newborn infants (Sai 2005) as well as in 4–5-month-old infants (Burnham 1993; Dodd and
Burnham 1988). These findings suggest that sensitivity to speech, evident early in life, can support
and facilitate learning about specific faces. Perhaps even more remarkable is the extent to which
young infants are sensitive to matching information between heard and seen speech (Baier et al.
2007; Kuhl and Meltzoff 1982, 1984; MacKain et al. 1983; Patterson and Werker 1999, 2002;
Yeung and Werker, in preparation). For example, in Kuhl and Meltzoff’s study, 2- and 4-month-
old infants looked preferentially at the visual match when shown side-by-side video displays of a
woman producing either an ‘ee’ or an ‘ah’ vowel together with the soundtrack of one of the two

1 The term ‘visegmetic’ is used here to refer to the dynamically changing arrangement of features on the face
(mouth, chin, cheeks, head etc.). In essence, it refers to anything visible on the face and head that accom-
panies the production of segmental phonetic information. Visegmetic segments are therefore meant to
capture all facets of motion that the face and head make as language sounds are produced, and include the
time-varying qualities that are roughly equivalent to auditory phonetic segments.

[Fig. 9.1 image. (a) Habituation phase: sentences in one language (e.g. English) presented
until looking time drops by 60%; test phase: sentences in the same (e.g. English) or a new
(e.g. French) language, with looking times measured. (b) Looking time (s) across the final
habituation and test trials, for switch versus no-switch monolingual groups at 4, 6, and
8 months (top row) and for switch monolingual versus switch bilingual groups (bottom row).]
Fig. 9.1 (a) Schematic representation of the habituation paradigm used in Weikum et al.’s (2007) visual
language discrimination experiment. Infants (4, 6, and 8 months old) were exposed to varied speakers
(only one shown in the figure) across different trials, all of whom could speak either English or French.
Example snapshots of the set-up are presented in the top illustrations. Infants sat in their mother’s lap;
she wore darkened glasses to prevent her from seeing the video-clips presented to the infant.
(b) Results (average looking times) are presented separately for the different age groups (columns)
and linguistic backgrounds (rows) tested in the experiment. In each graph, the looking times for the
final phase of the habituation period and the test phase are presented for the language-switch and
no-switch (control) trials. Error bars represent the SEM. From Science, 316 (5828), Visual Language
Discrimination in Infancy, Whitney M. Weikum, Athena Vouloumanos, Jordi Navarra, Salvador
Soto-Faraco, Núria Sebastián-Gallés, Janet F. Werker, p. 1159, © 2007, AAAS. Reprinted with
permission from AAAS. Reproduced in colour in the colour plate section.

vowels. Some authors have reported that, even at birth, human infants seem to show an ability to
match human facial gestures with their corresponding heard vocalizations (Aldridge et al. 1999).
Yet, although it appears that there is substantial organization from an early age that supports
intersensory matching, this ability is amenable to experience-related changes. For example,
following cochlear implantation in infants born deaf, the propensity for intersensory matching of
speech emerges within weeks to months (Barker and Tomblin 2004; see Section 9.7).
Furthermore, during the first year of life, intersensory matching becomes increasingly attuned to
the native input of the infant (Pons et al. 2009).
The intersensory matching findings discussed so far highlight the remarkable ability of infants to
link phonetic characteristics across the auditory and visual modalities. However, natural speech
processing requires the real-time integration of concurrent heard and seen aspects of speech. One
of the most interesting illustrations of real-time audiovisual integration in adult speech processing
is that visible speech can change the auditory speech percept in the case of crossmodal conflict
(McGurk and MacDonald 1976).2 Several studies have indicated that this illusion, known as the
McGurk effect, is present as early as 4 months of age, thus implying that infants are able to integrate
seen and heard speech (e.g. Burnham and Dodd 2004; Desjardins and Werker 2004; Rosenblum
et al. 1997). For example, in Burnham and Dodd’s study, infants were habituated to either a con-
gruent auditory /ba/ + visual [ba] or an incongruent auditory /ba/ + visual [ga]; the latter leads to
a ‘da’ or ‘tha’ percept in adults. After habituation, the infants’ looking times to each of three differ-
ent auditory-only syllables, /ba/, /da/, and /tha/, all of which were paired with a still face, were
measured. Habituation to the audiovisually congruent syllable /ba/ + [ba] made infants look
longer to /da/ or /tha/ than to /ba/, indicating that infants perceived /ba/ as different from the other
two syllables. Furthermore, and consistent with integration, after habituation to the incongruent
audiovisual stimulus (auditory /ba/ + visual [ga]) infants looked longer to /ba/, implying that they
perceived this syllable as a novel stimulus. Other results confirm the finding of integration from an
early age using different stimuli, albeit not for every combination that would be predicted from
adult performance (Desjardins and Werker 2004; Rosenblum et al. 1997; see Kushnerenko et al.
2008, discussed in Section 9.6, for electrophysiological evidence).
Beyond illusions arising from crossmodal conflict, pairing congruent visual information and
speech sounds can facilitate phonetic discrimination in infancy (Teinonen et al. 2008; see Navarra
and Soto-Faraco 2007 for evidence in adults). In Teinonen et al.’s study, 6-month-old infants
were familiarized with a set of acoustic stimuli selected from a restricted range in the
middle of the /ba/–/da/ continuum (and thus somewhat acoustically ambiguous), paired with a
visual display of a model articulating either a canonical [ba] or a canonical [da]. In the consistent
condition, auditory tokens from the /ba/ side of the boundary were paired with visual [ba] and
all audio /da/ tokens were paired with visual [da]. In the inconsistent condition, audio and
visual stimuli were randomly paired. Only infants in the consistent pairing condition were able to

2 Referred to as the McGurk effect or McGurk illusion. When adults are presented with certain mismatching
combinations of acoustic and visual speech tokens (such as an auditory /ba/ in synchrony with a seen [ga]),
the resulting percept is fusion between what is heard and seen (i.e. in this example it would result in a
perceived ‘da’ or ‘tha’, an entirely illusory outcome). Intersensory conflict in speech can also result in
combinations rather than fusions (for example, when presented with an auditory /ga/ and a visual [ba],
adults often report perceiving ‘bga’) and in visual capture (for example, when presented with a heard /ba/
and a seen [va], participants often report hearing ‘va’). Even when aware of the dubbing procedure and the
audiovisual mismatch, adults cannot inhibit the illusion, though some limitations due to attention (Alsius
et al. 2005, 2007; Navarra et al. 2010, for a review) and experience (Sekiyama and Burnham 2008; Sekiyama
and Tohkura 1991, 1993, see also Section 9.7) have been reported (some have even argued that it is a
learned phenomenon; e.g. Massaro 1984). The existence of this illusory percept arising from conflict
between seen and heard speech in infancy provides much stronger evidence of integration than bimodal
matching.

discriminate /ba/ versus /da/ stimuli from either side of the boundary in a subsequent auditory
discrimination test. Teinonen et al.’s results provide support for the notion that visual and
auditory speech involve similar representations, and that their co-occurrence in the world might
support learning of native category boundaries. However, this phenomenon might extend to
other types of correlations.
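The logic of Teinonen et al.’s consistent-pairing advantage can be illustrated with a toy simulation. This is not a model of infant learning — the token values, group sizes, and the mean-separation measure are all assumptions of the sketch — but it shows why consistent visual labelling, and not random pairing, splits an ambiguous acoustic continuum into two separable groups.

```python
import random

random.seed(0)  # deterministic toy data

# Hypothetical acoustic tokens from the middle of a /ba/-/da/
# continuum (positions on an abstract acoustic dimension).
ba_side = [random.gauss(0.40, 0.03) for _ in range(20)]
da_side = [random.gauss(0.60, 0.03) for _ in range(20)]

def group_separation(tokens_a, tokens_b):
    """Distance between the mean tokens of two visually labelled
    groups; a large separation supports a category boundary."""
    mean_a = sum(tokens_a) / len(tokens_a)
    mean_b = sum(tokens_b) / len(tokens_b)
    return abs(mean_a - mean_b)

# Consistent condition: visual [ba] always accompanies /ba/-side
# tokens and visual [da] always accompanies /da/-side tokens.
consistent = group_separation(ba_side, da_side)

# Inconsistent condition: random audiovisual pairing mixes the two
# sides, so the labelled groups have nearly identical means and
# provide no basis for a boundary.
mixed = ba_side + da_side
random.shuffle(mixed)
inconsistent = group_separation(mixed[:20], mixed[20:])

print(consistent > inconsistent)  # only consistent labelling separates the sides
```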
In a recent study, Yeung and Werker (2009) found that simply pairing auditory speech stimuli
with two different (non-speech) visual objects also facilitates discrimination. Yeung and Werker
tested 9-month-old infants of English-speaking families using a retroflex /Da/ versus a dental
/da/, a Hindi contrast, which does not distinguish meaning in English. Prior studies have already
shown that English infants are sensitive to this contrast at 6–8 months of age, but not at 10–12
months (Werker and Lalonde 1998; Werker and Tees 1984). Yeung and Werker demonstrated,
however, that exposure to consistent pairings of /da/ with one visual object and /Da/ with another
during the familiarization phase facilitated later discrimination of this non-native contrast,
whereas following an inconsistent pairing, the 9-month-old English infants could not
discriminate. Yeung and Werker’s results raise the possibility that it was not the matching of
specific audiovisual linguistic information that enhanced performance in Teinonen et al.’s study,
but rather simply the contingent pairing between phonemic categories and visual objects.

9.4 Developmental changes in audiovisual speech processing from childhood to adulthood
Despite the remarkable abilities of infants to extract and use audiovisual correlations (in speech
and other domains) from a very early age (e.g. Burnham 1993; Kuhl and Meltzoff 1982, 1984;
Lewkowicz 2010; Lewkowicz et al. 2010; Lewkowicz and Turkewitz 1980; MacKain et al. 1983;
Patterson and Werker 1999, 2003; Sai 2005; Walton and Bower 1993), it also becomes clear from
the literature reviewed above that there are very important experience-related changes in the
development of audiovisual speech processing (e.g. Barker and Tomblin 2004; Lewkowicz 2000a,
2002; Lewkowicz and Ghazanfar 2006, 2009; Lewkowicz and Lickliter 1994; Lewkowicz, et al.
2010; Lickliter and Bahrick 2000; Pons et al. 2009; Teinonen et al. 2008; Walker-Andrews 1997).
These changes, however, do not stop at infancy but continue throughout childhood and beyond.
A good illustration of this point is provided in the paper where the now famous McGurk effect
was first reported (McGurk and MacDonald 1976). McGurk and MacDonald’s study was in fact
a cross-sectional investigation including children of different ages (3–5 and 7–8 year-olds) as well
as adults (18–40 years old), and the results pointed out a clear discontinuity between middle
childhood and adulthood. In McGurk and MacDonald’s study, children of both age groups were
less amenable to the audiovisual illusion than adults, a consistent pattern that has been replicated
in subsequent studies (e.g. Desjardins et al. 1997; Hockley and Polka 1994; Massaro 1984; Massaro
et al. 1986; Sekiyama and Burnham 2008). This developmental trend is usually accounted for by
the reduced amount of experience with (visual) speech in children as well as, according to some
authors, by the reduced degree of sensorimotor experience resulting from producing speech (e.g.
Desjardins et al. 1997). Massaro et al. pointed out that rather than reflecting a different strategy of
audiovisual integration in children, the reduced visual influence in this age range was better
accounted for by a reduction in the amount of visual information extracted from the stimulus.
Consistent with this interpretation, children seem to be poorer lip-readers than adults (Massaro
et al. 1986). The fact that young infants exhibit the McGurk illusion at all (see Section 9.3) and
that they are able to extract sufficient visual information to discriminate between languages
(Weikum et al. 2007) seems to indicate, however, that visual experience may not be the only fac-
tor at play in the development of the integrative mechanisms, or at the least that a minimal
amount of experience is sufficient to set up the rudiments of audiovisual integration (albeit not
up to adult-like levels).
This general trend of increasing visual influence in speech perception toward the end of
childhood seems to be modulated by environmental factors, such as the language background
and/or cultural aspects. This comes across clearly in the light of the remarkable differences in
audiovisual speech perception between different linguistic groups in adulthood. For example,
Japanese speakers are in general less susceptible to the McGurk illusion and to audiovisual speech
integration than Westerners (e.g. Sekiyama and Burnham 2008; Sekiyama and Tohkura 1991,
1993; see also Sekiyama 1997 for Chinese). The accounts of this cross-linguistic difference
in susceptibility to audiovisual effects have varied. One initial explanation pointed to cultural
differences in how much people tend to look at each other’s faces while speaking. In
particular, direct gaze is less frequent in Japanese society (e.g. Sekiyama and Tohkura 1993),
leading to a smaller amount of visual experience with speech and, as a consequence, reducing its
contribution in audiovisual speech perception. Nevertheless, not all cross-linguistic differences
in audiovisual speech processing may be so easily explained along such cultural lines (e.g. Aloufy
et al. 1996 for reduced visual influence in Hebrew versus English speakers). Other, more recent,
explanations for differences in audiovisual speech integration across languages focus on how the
phonological space of different languages may determine the relative reliance on the auditory
versus visual inputs by their speakers. Because the phonemic repertoire of Japanese is less
crowded than that of English, Japanese speakers would require less visual aid in everyday spoken
communication, thereby learning to rely less strongly on the visual aspects of speech (e.g. Sekiyama
and Burnham 2008).
Another possibly important source of change in audiovisual speech integration during adult-
hood relates to variations in sensory acuity throughout life. Along these lines, it has been claimed
that multisensory integration in general becomes more important in old age, because the general
decline in unisensory processing should yield greater benefits from integrating different sensory
sources, as per the rule of inverse effectiveness (Laurienti et al. 2006; also see Chapter 11).
However, several previous studies testing older adults have failed to confirm the prediction of
stronger audiovisual speech integration as compared to younger adults (Behne et al.
2007; Cienkowski and Carney 2002; Middleweerd and Plomp 1987; Sommers et al. 2005; Walden
et al. 1993). In fact, some of these studies reported superior audiovisual integration in adults of
younger age when differences in hearing thresholds are controlled for (e.g. Sommers et al. 2005).
In a similar vein, Ross et al. (2007) reported a negative correlation between age and audiovisual
performance levels across a sample ranging from 18 to 59 years old. The finding of reduced
audiovisual integration in older versus younger adults could be caused by a decline in unisensory
abilities (e.g. visual speech-reading) in the elderly (e.g. Arlinger 1991; Gordon and Allen 2009). At present, the
conflict between studies showing increased audiovisual influence with ageing, and those report-
ing no change or even decline is unresolved.
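The prediction tested in these ageing studies follows from the standard multisensory enhancement index, which expresses the multisensory response as a percentage gain over the best unisensory response; inverse effectiveness then means that the same absolute audiovisual benefit yields a larger proportional gain when unisensory performance is weak. The index itself is standard (Meredith and Stein style), but the scores below are purely illustrative.

```python
def multisensory_enhancement(auditory, visual, audiovisual):
    """Percentage gain of the audiovisual response over the best
    unisensory response: 100 * (AV - max(A, V)) / max(A, V)."""
    best_unisensory = max(auditory, visual)
    return 100 * (audiovisual - best_unisensory) / best_unisensory

# Illustrative recognition scores (proportion of words identified).
# The same absolute AV benefit (+0.10) produces a larger relative
# gain for the weaker unisensory baseline: inverse effectiveness.
print(round(multisensory_enhancement(0.80, 0.60, 0.90), 1))  # 12.5
print(round(multisensory_enhancement(0.30, 0.20, 0.40), 1))  # 33.3
```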

9.5 Development of audiovisual speech processing in multilingual environments
An often overlooked fact is that, for a large part of the world’s population, language acquisition
takes place in a multilingual environment. However, the role of audiovisual processes in the con-
text of multiple input languages has received comparatively little attention from researchers (see
Burnham 1998 for a review). One of the initial challenges posed for infants born in a multilingual
environment is undoubtedly the necessity to discriminate between the languages spoken around
them (Werker and Byers-Heinlein 2008). From a purely auditory perspective, current evidence
highlights the key role played by prosody in language discrimination. For example, the ability to
differentiate between languages belonging to different rhythmic classes (e.g. stress-timed lan-
guages, such as English, versus syllable-timed languages, such as Spanish; see Mehler et al. 1988)
is present at birth whereas the ability to discriminate between languages belonging to the same
rhythmic class emerges by 4–5 months of age (Bosch and Sebastian-Galles 2001; Nazzi and
Ramus 2003). Moreover, infants as young as 4 months of age can discriminate languages on the
basis of visual information (Weikum et al. 2007, described earlier). This last finding raises the
interesting possibility, mentioned earlier, that, just as in infant auditory language discrimination,
rhythmic attributes play an important role in infant visual language discrimination. This is made
all the more likely by the fact that rhythm is an amodal stimulus property to which infants are
sensitive visually as well as acoustically from a very early age (Lewkowicz 2003; Lewkowicz and
Marcovitch 2006) and by the fact that rhythmic properties seem to play an important role in
visual language discrimination in adults (Soto-Faraco et al. 2007). For instance, in Soto-Faraco
et al. (2007), one of the few variables that correlated with (Spanish–Catalan) visual language dis-
crimination ability in adults was vowel-to-consonant ratio, a property that is related to rhythm.
One should nevertheless also consider the role of visegmetic (visual phonetic) information as a
cue to visual language discrimination (Weikum and Werker 2008).
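As a rough illustration of the vowel-to-consonant measure mentioned above, the snippet below computes a letter-count approximation over orthography. Real rhythm-class analyses work over phonetically transcribed intervals (often vowel durations rather than counts), so the orthographic vowel set and the example sentence are simplifying assumptions of the sketch.

```python
VOWEL_LETTERS = set("aeiou")  # crude orthographic stand-in for vowel segments

def vowel_consonant_ratio(text):
    """Ratio of vowel letters to consonant letters in `text`.
    A letter-based sketch of the vowel-to-consonant measure;
    phonetic analyses would segment actual speech instead."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    vowels = sum(ch in VOWEL_LETTERS for ch in letters)
    consonants = len(letters) - vowels
    if consonants == 0:
        raise ValueError("no consonant letters in input")
    return vowels / consonants

print(round(vowel_consonant_ratio("the cat sat on the mat"), 2))  # 0.55
```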
What about matching languages across sensory modalities? One of the few studies addressing
audiovisual matching cross-linguistically was conducted by Dodd and Burnham (1988). The
authors measured orienting behaviour in infants raised in monolingual English families when
presented with two faces side by side, one silently articulating a passage in Greek and the
other a passage in English, accompanied acoustically by either the appropriate Greek or the
appropriate English passage. Infants of 2½ months did not show significant audiovisual
language matching in either condition whilst 4½-month-olds significantly preferred to look at
the English-speaking face while hearing the English soundtrack. Unfortunately, given that there
was an overall preference for the native-speaking faces, it is unclear whether these results reflect
true intersensory matching or a general preference for the native language.
Phoneme perception and discrimination behaviour offers a further opportunity to study
mono- and multilingual aspects of audiovisual speech development. A pioneering study by
Werker and Tees (1984) using acoustic stimuli showed that the initial broad sensitivity to native
and non-native phonemic contrasts by 6 months of age narrowed down to only those contrasts
that are relevant to the native language by 12 months of age. This type of phenomenon, referred
to as perceptual narrowing, has recently been shown to occur in visual development (i.e. in the
discrimination of other-species faces; Pascalis et al. 2005), in matching of other-species faces and
vocalizations (Lewkowicz and Ghazanfar 2006, 2009), and, interestingly, in matching auditory
phonemes and visual speech gestures from a non-native language (Pons et al. 2009). In particular,
the study by Pons et al. revealed that narrowing of speech perception to the relevant contrasts in
the native language is not limited to one modality (acoustic or visual), but that it is indeed a
pansensory3 and domain-general phenomenon. Using an intersensory matching procedure, Pons
and colleagues familiarized infants from Spanish and English families with auditory /ba/ or /va/,
an English phonemic contrast which is allophonic in Spanish (see Fig. 9.2). The subsequent test
consisted of side-by-side silent video clips of the speaker producing each of these syllables.
Whereas 6-month-old Spanish-learning infants looked longer at the silent videos matching the
pre-exposed syllable, 11-month-old Spanish-learning infants did not, indicating that their ability
to match the auditory and visual representation of non-native sounds declines during the first

3 ‘Pansensory’ is used here in the sense that it applies to all sensory modalities.

[Fig. 9.2 image. (a) Schematic of the preferential looking procedure. (b) Preferential looking
to the matching syllable (%) for Spanish- and English-learning infants at 6 and 11 months.
(c) Correct syllable matching (%) for English- and Spanish-speaking adults.]
Fig. 9.2 (a) Schematic representation of the preferential looking paradigm used in Pons et al.’s
(2009) audiovisual matching experiment. Infants’ preferential looking to each silent vocalization
was measured before (baseline) and after (test) a purely acoustic familiarization period in which
infants were exposed to one of the two syllables. During the familiarization phase, a rotating circle
was shown in the centre of the screen. (b) Results (relative preferential looking times) are
presented separately for the different infant age groups and linguistic backgrounds tested in the
experiment. (c) Results (proportion correct) from monolingual English- and Spanish-speaking
adults in an adapted version of the audiovisual matching experiment. Reproduced from
Ferran Pons, David J. Lewkowicz, Salvador Soto-Faraco, and Núria Sebastián-Gallés, Narrowing of
intersensory speech perception in infancy, Proceedings of the National Academy of Sciences, 106
(26), © 30 June 2009, PNAS, with permission.

year of life. In contrast, English-learning infants, whose phonological environment includes both
/ba/ and /va/, exhibited intersensory matching at both ages. Interestingly, once the decline occurs
it seems to persist into adulthood, given that Spanish-speaking adults (tested on a modified inter-
sensory matching task) exhibited no evidence of intersensory matching for /va/ and /ba/, whereas
English-speaking adults exhibited matching accuracy above 90%.

Evidence from the cross-linguistic studies discussed earlier (Section 9.4) suggests that adult
audiovisual speech processing can be highly dependent on the language-specific attributes
experienced throughout development (e.g. Sekiyama 1997; Sekiyama and Tohkura 1991, 1993).
To track the developmental course of these differences, Sekiyama and Burnham (2008) tested 6-,
8- and 11-year-old English and Japanese children as well as adults in a speeded classification task
for unimodal and crossmodal (congruent and incongruent) /ba/, /da/, and /ga/ syllables. Despite
the equivalent degree of visual interference from audiovisually incongruent syllables of the two
language groups at 6 years of age, from 8 years onward the English participants were more prone
to audiovisual interference than the Japanese participants. The authors of this study speculated
that the reason for this difference may lie in the complexity of English phonology, which affords
a larger benefit from redundant and complementary visual information than the comparatively
less crowded Japanese phonological space. Sekiyama and Burnham proposed that this difference
would be exacerbated by 8 years of age, as children come into contact with greater talker
variability. In a related study, Chen and Hazan (2007) tested the degree of visual influence on
auditory speech in children aged 8 to 9 years and in adults, comparing two monolingual groups
(native English and Mandarin speakers) and one early bilingual group (children only: Mandarin
natives who were learning English). The stimuli were syllables presented in auditory, visual,
crossmodal congruent, and crossmodal incongruent (McGurk-type) conditions. Although the data
revealed a developmental increase in visual influence (in accord with much of the previous litera-
ture), in contrast to Sekiyama and Burnham (2008) no differences in the size of visual influence
as a function of language background or bilingual experience were found.
Finally, recent work by Kuhl et al. (2006) provides some further insights into the limits and
plasticity of crossmodal speech stimulation for perceiving non-native phonological contrasts.
Kuhl et al. (2006) reported two studies in which 9-month-old infants raised in an American
English language environment were exposed to Mandarin Chinese for twelve 25-minute sessions
over a month, either in a social (face-to-face) situation or through a pre-recorded presentation
played back on a TV set. The post-exposure discrimination tests showed that only those infants in
the face-to-face sessions were able to perceive Mandarin phonetic contrasts at a similar level to
native Mandarin Chinese infants. These infants clearly outperformed a control group who under-
went similar face-to-face exposure sessions but in English, their native language. This finding was
interpreted as indicating that social context is necessary for maintaining or re-gaining sensitivity
to non-native phonetic contrasts by the end of the first year of life. However, the actual role of
increased attention associated with personal interactions, as compared to watching a pre-recorded
video-clip, should also be considered. Indeed, by 9 months of age children still retain some
residual flexibility for non-native contrast discrimination, which can be maintained simply by
increasing passive exposure time (from 2 min at 6–8 months, Maye et al. 2002, to 4 min at
10 months, Yoshida et al. 2010). Thus, although social interaction is effective in facilitating
the relearning of non-native distinctions after 9 months of age, it may be so not because it is social
per se, but because it increases infant attention by providing a richly referential context in which
objects are likely being labelled when the infant is paying attention.

9.6 Neural underpinnings of the development of audiovisual speech processing
Only a handful of studies have dealt with the neural underpinnings of audiovisual speech percep-
tion in infants. One behavioural study worth mentioning is that conducted by MacKain
et al. (1983), who first suggested the lateralization of audiovisual matching of speech to the left
cerebral hemisphere in infancy. In this study, 5–6-month-old infants were presented with an
auditory sequence of disyllabic pseudo-words (e.g. /vava/, /zuzu/. . .) together with two side-by-
side video-clips of a speaking face. Both speaking faces uttered two syllable pseudo-words, tem-
porally synchronized with the syllables presented acoustically, but only one of the visual streams
was congruent in segmental content with the auditory stream. Infants preferred to look selectively
at the video-clip which was linguistically congruent with the soundtrack, but only if it was shown
on the right side of the infant’s field of view. This finding resonates well with the existence of an
early capacity for audiovisual matching and integration within the first few months of life revealed
by other studies (Baier et al. 2007; Burnham and Dodd 2004; Desjardins and Werker 2004; Kuhl
and Meltzoff 1982, 1984; Rosenblum et al. 1997). In MacKain et al.’s study, owing to the
lateralization of gaze behaviour to the contralateral hemispace in tasks engaging one hemisphere
more than the other, the authors inferred left-hemisphere specialization for audiovisual speech matching.
This would suggest an early functional organization of the neural substrates subserving audio-
visual speech processing.
There are only a few studies that have directly related physiological measurements in infants to
crossmodal speech perception. Kushnerenko and colleagues (2008) recorded electrical activity
from the brains of 5-month-old infants while they listened to and watched the syllables /ba/ and
/ga/. For each stimulus, the modalities could be crossmodally matched or mismatched in content.
In one of the mismatched conditions where auditory /ba/ was dubbed onto visual [ga], the
expected perceptual fusion (‘da’, as per adult and infant results, see Burnham and Dodd 2004)
does not violate phonotactics in the infant’s native language (i.e. English). In this case, the elec-
trophysiological trace was no different from the crossmodally matching stimuli (/ba/ + [ba], or
/ga/ + [ga]). Crucially, in the other mismatched condition auditory /ga/ was dubbed onto visual
[ba], a stimulus perceived as ‘bga’ by adults and therefore in violation of English phonotactics.
In this case, the trace evoked by the crossmodal mismatch was significantly different from the
matching crossmodal stimuli (as well as from the other crossmodal mismatch leading to phono-
tactically legal ‘da’). The timing and scalp distribution of the differences in evoked potentials
suggested that the effects of audiovisual integration could be reflected in the auditory cortex. The
expression of audiovisual integration at early sensory stages coincides with what is found in adults
(e.g. Sams et al. 1991), and therefore Kushnerenko et al.’s results suggest that the organization of
adult-like audiovisual integration networks starts early on in life.
The results of Bristow and colleagues (2008) with 10-week-old infants further converge on the
conclusion that the brain networks supporting audiovisual speech processing are already present,
albeit at an immature stage, soon after birth. In their experiment, infants’ mismatch responses in
evoked potentials were measured to acoustically presented vowels (e.g. /a/) following a brief
habituation period to the same or a different vowel (e.g. /i/). Mismatch responses were very similar
regardless of the sensory modality in which the vowels had been presented during the habituation
period (auditory or visual). This led Bristow et al. to conclude that infants at this early age have
crossmodal or amodal phonemic representations, in line with the results of behavioural studies
revealing an ability to match speech intersensorially (e.g. Kuhl and Meltzoff 1982, 1984; Patterson
and Werker 2003). Interestingly, the scalp distribution and the modelling of the brain sources of
the mismatch responses revealed a left lateralized cortical network responsive to phonetic match-
ing that is dissociable from areas sensitive to other types of crossmodal matching (such as
face–voice gender matching). Remarkably, at only 10 weeks of age, this network already has substantial overlap
with audiovisual integration areas typically seen in adults, including frontal (left inferior frontal
cortex) as well as temporal regions (left superior and left inferior temporal gyrus).
Recent work from Dick and colleagues (Dick et al. 2010) sheds some light on the brain corre-
lates of audiovisual speech beyond infancy (∼9 years old). This is an interesting age range because
it straddles the transition period between the reduced audiovisual integration levels in infancy
and adult-like audiovisual processing, discussed in Section 9.4 (Desjardins et al. 1997; Hockley
and Polka 1994; Massaro 1984; Massaro et al. 1986; McGurk and MacDonald 1976; Sekiyama and
Burnham 2008). Dick et al. demonstrated that by nine years of age children rely on the same net-
work of brain regions used by adults for audiovisual speech processing. However, the study of the
information flow with structural equation modelling demonstrated differences between adults
and children (the posterior inferior frontal gyrus/ventral premotor cortex modulations on supra-
marginal gyrus differed across age groups) and these differences were attributed to less efficient
processing in children. The authors interpreted these findings as the result of maturational proc-
esses modulated by language experience.
A clear message emerges from the physiological findings discussed so far: the basic
brain mechanisms that will support audiovisual integration in adulthood begin to emerge
quite early in life, albeit in an immature form. Thus these mechanisms and their supporting
neural substrates seem to undergo a slow developmental process from early infancy up until
the early stages of adulthood. This fits well with previous claims that the integration mecha-
nisms in infants and adults are not fundamentally different, and that it is the changes in the
amount of information being extracted from each input that determine the output of
integration throughout development (Massaro et al. 1986). In addition, the seemingly long period
of time that it takes before adult-like levels of audiovisual processing are achieved suggests
that there is ample opportunity for experience to contribute continually to the development
of these circuits.

9.7 Development of audiovisual speech after sensory deprivation
As many as 90–95% of deaf children are born to hearing parents and thus they grow up in normal-
hearing linguistic environments where they experience the visual correlates of spoken language
(see Campbell and MacSweeney 2004). What is more, in many cases visual speech becomes the
core of speech comprehension. A critical question is thus whether speech-reading, or lip-reading,
can be improved with training and, related, whether deaf people are better speech-readers than
hearing individuals by way of their increased practice with visual speech. On the one hand, several
studies have shown that a percentage of individuals born deaf have better lip-reading abilities than
normally-hearing individuals (see Auer and Bernstein 2007; Bernstein et al. 2000; Mohammed
et al. 2005), although others have failed to find that deaf people are superior lip readers (see
Mogford 1987). When looking at the effectiveness of specific training in lip-reading, success is very
limited, at least with short training periods (e.g. 5 weeks), both in hearing individuals and in
less accurate deaf lip-readers (Bernstein et al. 2001), suggesting that longer-term experience may be
necessary for the development of lip-reading skills. On the other hand, the differences between
congenitally deaf and developmentally deaf individuals reveal that one key factor affecting the
development of effective speech-reading abilities is early experience with the correspondence between
auditory and visual speech (see Auer and Bernstein 2007).
The widespread availability of cochlear implants (a technology based on the electric stimula-
tion of the auditory nerve) to palliate deafness has allowed researchers to study audiovisual
integration abilities in persons who have not had prior access to (normal) acoustic input. It has
been found that if an implant is introduced after infancy, deaf individuals retain a marked
visual dominance during audiovisual speech perception even after years of implantation
(Rouger et al. 2008). For example, Rouger et al. found that late-implanted individuals display
a higher incidence of visually dominant responses to the typical McGurk stimulus (that is,
reporting ‘ga’ when [ga] is presented visually and /ba/ auditorily) than normally hearing participants
(whose typical fused response is ‘da’; McGurk and MacDonald 1976; and also Massaro 1998).
In contrast, when the implant is introduced before 30 months of age, cochlear-implanted par-
ticipants experience the McGurk effect like normally-hearing individuals (Barker and Tomblin
2004; Schorr et al. 2005). Thus, findings from cochlear implanted individuals support the idea
that, quite early on, the deaf brain tends to specialize in decoding speech from vision, and that
this strong bias toward visual speech information seems to remain unaffected even after a dra-
matic improvement in auditory perception (i.e. after 6 months of implantation; see Rouger
et al. 2007). However, sufficient experience at early stages of development leads to a normal
(that is, not visually biased) audiovisual integration of speech. Moreover, individuals who
receive cochlear implants seem to show a greater capacity than normally-hearing controls to
integrate visual information from lip-reading with the new acoustic input (see
Rouger et al. 2007).
At a neural level, the primary auditory cortex of deaf individuals undergoes similar structural
development (e.g. in terms of size) as in hearing individuals (see Emmorey et al. 2003; Penhune
et al. 2003), thus suggesting that this area can be recruited by other modalities in the absence of
any auditory input. In fact, the available evidence suggests a stronger participation of auditory
association and multisensory areas (such as the superior temporal gyrus and sulcus) in deaf sub-
jects than in normal-hearing individuals during the perception of visual speech (Lee et al. 2007)
and non-speech events (Finney et al. 2001, 2003). This functional reorganization seems to occur
quickly after the onset of deafness and it has been proposed that it could result from a relatively
fast adaptation process based on pre-existing multisensory neural circuitry (see Lee et al. 2007).
The lack of increased lateral superior temporal activity, as measured through fMRI, during visual
speech perception in congenitally deaf individuals (with no prior multisensory experience;
Campbell et al. 2004; MacSweeney et al. 2001, 2002) reinforces the idea that some pre-existing
working circuitry specialized in integrating audiovisual speech information is needed for the
visual recruitment of audiovisual multisensory areas.
Blindness, like deafness, can provide critical clues regarding the contribution of visual experi-
ence to the development of the audiovisual integration of speech. At first glance, congenitally
blind individuals who have never experienced the visual correlates of speech seem perfectly able
to use (understand and produce) spoken language, a fact that qualifies the importance of audio-
visual integration in the development of speech perception. Some studies addressing speech pro-
duction have reported slight differences between blind and sighted individuals in the pronunciation
of sounds involving visually distinctive articulation, such as /p/, but these are very minor (e.g.
Göllesz 1972; Mills 1987; Miner 1963; Ménard et al. 2009). Very few studies have assessed speech
perception in blind individuals. In general, the results of these studies are consistent with the idea
that due to compensatory mechanisms, acoustic perception in general (and as a consequence,
auditory speech perception) is superior in the blind than in sighted people. In line with this idea,
Lucas (1984) reported data supporting a superior ability of blind children at spotting mispronunciations
in orally presented stories, and Muchnik et al. (1991) showed that the blind seem to be less affected than
sighted individuals by noise in speech. If confirmed, the superiority of blind individuals in speech
perception may signal a non-mandatory (or at most, minor) role of visual speech during acquisi-
tion of language. Yet, the putative superiority of speech perception in the blind remains to be
conclusively demonstrated (Elstner 1955; Lux 1933, cited in Mills 1988, for conflicting findings)
and, even if it were so, one would still need to ascertain whether it is speech-specific or a conse-
quence of general auditory superiority. A detailed analysis of particular phonological contrasts in
which vision may play an important developmental role will be necessary (see Ménard et al. 2009
for perception of French vowel contrasts in the blind). In addition, a recent study (Lewkowicz and
Hansen-Tift 2012) has found that when infants reach the second half of the first year of life—a
time when they begin learning how to talk—they shift their attention to the mouth of a talker.
This suggests that the early acquisition of speech and language depends on a sustained period of
attention to audiovisual speech.

9.8 Conclusions and future directions


On the one hand, we have learned a great deal about the development of multisensory speech
perception over the last twenty years and, on the other, the extant findings have raised a number of
new questions. By way of summary, it is now known that early emerging intersensory matching
and integration abilities, some of which are based on a perceptual sensitivity to low-level intersen-
sory relational cues (Bahrick 1983, 1988; Lewkowicz 1992a, 1992b, 1996, 2000b, 2010; Lewkowicz
et al. 2010; Lewkowicz and Turkewitz 1980), can provide one basis for the initial parsing of audio-
visual speech input (Kuhl and Meltzoff 1982, 1984; Patterson and Werker 1999, 2003; Pons et al.
2009; Walton and Bower 1993). For example, early sensitivity to temporal relations could underlie
sensitivity to rhythm and thus can facilitate language discrimination in infancy at the acoustic
(Byers-Heinlein et al. 2010; Mehler et al. 1988; Nazzi et al. 1998) and visual (Weikum et al. 2007)
levels. It is not yet known, however, whether only low-level, domain-general mechanisms underlie
the early acquisition of audiovisual speech processing skills or whether instead domain-specific
mechanisms and the infant’s preference for speech also play a role. Evidence for integration of
heard and seen aspects of speech in the first few months of life, albeit attenuated in its strength and
in its degree of automaticity as compared to that of adults, points to a sophisticated multisensory
processing of linguistic input at an early age. Converging on this conclusion, electrophysiological
studies provide evidence for neural signatures of audiovisual speech integration in infancy at early
stages of information processing (e.g. Bristow et al. 2008; Kushnerenko et al. 2008). Furthermore,
the results of these studies suggest continuity in the development of brain mechanisms supporting
audiovisual speech processing from early infancy to adulthood.
It has also become clear that, above and beyond any initial crossmodal capacities, experience plays
a crucial role in shaping adult-like audiovisual speech-processing mechanisms. A finding that
illustrates this tendency is that the development of audiovisual speech processing, as in the unisensory
case, is non-linear. That is, rather than being purely incremental (i.e. characterized by a general
broadening and improvement of initially poor perceptual abilities), recent findings (Pons et al. 2009)
suggest that early audiovisual speech perception abilities are initially broad and then, with experience,
narrow around the linguistic categories of the native language. This, in turn, results in the emergence
of specialized perceptual systems that are selectively tuned to native perceptual inputs. This result
provides evidence that the reorganization around native linguistic input occurs not only at a
modality-specific level, but also at a pansensory level. It will be important in the future to discern the possible
inter-dependencies between unisensory and intersensory perceptual narrowing.
This development of audiovisual speech processing is also strongly driven by experience-
related factors after infancy, during childhood, and into adulthood. These changes can be trig-
gered by a number of environmental variables, such as the specific characteristics of the (native)
linguistic system the speaker is exposed to (Sekiyama and Burnham 2008; Sekiyama and Tohkura
1991, 1993; Sekiyama 1997), or alterations in sensory acuity as we grow older. Indeed, when it
comes to late adulthood, there seem to be some differences in audiovisual speech processing in
older versus younger adults (e.g. Laurienti et al. 2006), but evidence of audiovisual integration as
a compensatory mechanism for acoustic loss in old age is inconsistent at present (Cienkowski and
Carney 2002; Middleweerd and Plomp 1987; Sommers et al. 2005; Walden et al. 1993).
The sensory deprivation literature has provided some important findings regarding the plastic-
ity of audiovisual speech-processing mechanisms. First, speech-reading abilities in the deaf have
been found to be superior to those of hearing subjects only in limited cases (Auer and Bernstein
2007; Bernstein et al. 2000; Mohammed et al. 2005), but prior acoustic experience with speech
(and hence, with audiovisual correlations) is a strong predictor for successful speech-reading in
the deaf. Second, plastic changes after cochlear implantation are strongly dependent on age of
implantation. Congenitally deaf individuals who have undergone cochlear implantation within the first
30 months of life achieve audiovisual integration capacities that are equivalent or even superior to
those of hearing individuals (Rouger et al. 2007, 2008). When implantation occurs at an older age,
a strong tendency to use visual speech information remains even after extensive experience with
the newly acquired acoustic input. Visual deprivation studies are scarce and unsystematic to date,
and their outcomes are mixed between mild superiority in perception (Lucas 1984; Muchnik
et al. 1991) and mild or no impairment in production (Mills 1987). This does not, however,
necessarily mean that visual information is not important in the perception of speech for sighted
individuals who come to rely on vision in their daily life through massive visual experience from
early in infancy (Lewkowicz and Hansen-Tift 2012).

Acknowledgements
SS-F and MC are funded by Grants PSI2010–15426 and CSD2007–00012 from Ministerio de Ciencia
e Innovación and SRG2009–092 from Comissionat per a Universitats i Recerca, Generalitat de
Catalunya and, the European Research Council (StG-2010 263145). JN is funded by Grant PSI2009–
12859 from Ministerio de Ciencia e Innovación. JFW is funded by NSERC (Natural Sciences and
Engineering Research Council of Canada), SSHRC (Social Sciences and Humanities Research Council
of Canada), and the McDonnell Foundation. DJL is funded by NSF grant BCS-0751888.

References
Aldridge, M.A., Braga, E.S., Walton, G.E., and Bower, T.G.R. (1999). The intermodal representation of
speech in newborns. Developmental Science, 2, 42–46.
Aloufy, S., Lapidot, M., and Myslobodsky, M. (1996). Differences in susceptibility to the ‘blending illusion’
among native Hebrew and English speakers. Brain and Language, 53, 51–57.
Alsius, A., Navarra, J., Campbell, R., and Soto-Faraco, S. (2005). Audiovisual speech integration falters
under high attention demands. Current Biology, 15, 839–43.
Alsius, A., Navarra, J., and Soto-Faraco, S. (2007). Attention to touch weakens audiovisual speech
integration. Experimental Brain Research, 183, 399–404.
Arlinger, S. (1991). Results of visual information processing tests in elderly people with presbycusis.
Acta Oto-Laryngologica, 111, 143–48.
Auer, E.T., Jr., and Bernstein, L.E. (2007). Enhanced visual speech perception in individuals with early-onset
hearing impairment. Journal of Speech, Language and Hearing Research, 50, 1157–65.
Bahrick, L.E. (1983). Infants’ perception of substance and temporal synchrony in multimodal events. Infant
Behavior and Development, 6, 429–51.
Bahrick, L.E. (1988). Intermodal learning in infancy: Learning on the basis of two kinds of invariant
relations in audible and visible events. Child Development, 59, 197–209.
Bahrick, L.E., Lickliter, R., and Flom, R. (2004). Intersensory redundancy guides the development of
selective attention, perception, and cognition in infancy. Current Directions in Psychological Science,
13, 99–102.
Baier, R., Idsardi, W., and Lidz, J. (2007). Two-month-olds are sensitive to lip rounding in dynamic and
static speech events. Paper presented in the International Conference on Auditory-Visual Speech Processing
(AVSP2007), Kasteel Groenendaal, Hilvarenbeek, The Netherlands, 31 August–3 September 2007.
Barker, B.A., and Tomblin, J.B. (2004). Bimodal speech perception in infant hearing aid and cochlear
implant users. Archives of Otolaryngology—Head and Neck Surgery, 130, 582–86.
Behne, D., Wang, Y., Alm, M., Arntsen, I., Eg, R., and Vals, A. (2007). Changes in audio-visual speech
perception during adulthood. Paper presented in the International Conference on Auditory-Visual
Speech Processing (AVSP2007), Kasteel Groenendaal, Hilvarenbeek, The Netherlands, 31 August–3
September 2007.
Bernstein, L.E., Auer, E.T., Jr., and Tucker, P.E. (2001). Enhanced speechreading in deaf adults: can short-
term training/practice close the gap for hearing adults? Journal of Speech, Language and Hearing
Research, 44, 5–18.
Bernstein, L.E., Demorest, M.E., and Tucker, P.E. (2000). Speech perception without hearing. Perception
and Psychophysics, 62, 233–52.
Birch, H.G., and Lefford, A. (1963). Intersensory development in children. Monographs of the Society for
Research in Child Development, 25, 1–48.
Birch, H.G., and Lefford, A. (1967). Visual differentiation, intersensory integration, and voluntary motor
control. Monographs of the Society for Research in Child Development, 32, 1–87.
Bosch, L., and Sebastian-Galles, N. (2001). Early language differentiation in bilingual infants. In Trends in
bilingual acquisition (ed. J. Cenoz and F. Genesee), pp. 71–93. John Benjamins Publishing Company,
Amsterdam.
Bristow, D., Dehaene-Lambertz, G., Mattout, J., et al. (2008). Hearing faces: How the infant brain matches
the face it sees with the speech it hears. Journal of Cognitive Neuroscience, 21, 905–21.
Burnham, D. (1993). Visual recognition of mother by young infants: facilitation by speech. Perception, 22,
1133–53.
Burnham, D. (1998). Language specificity in the development of auditory-visual speech perception. In
Hearing by eye II: Advances in the psychology of speechreading and auditory-visual speech (ed. R.
Campbell, B. Dodd and D. Burnham), pp. 27–60. Psychology Press, Hove, East Sussex.
Burnham, D., and Dodd, B. (2004). Auditory-visual speech integration by pre-linguistic infants: Perception
of an emergent consonant in the McGurk effect. Developmental Psychobiology, 44, 204–20.
Byers-Heinlein, K., Burns, T.C., and Werker, J.F. (2010). The roots of bilingualism in newborns.
Psychological Science, 21, 343–48.
Calvert, G., Campbell, R., and Brammer, M. (2000). Evidence from functional magnetic resonance imaging
of crossmodal binding in the human heteromodal cortex. Current Biology, 10, 649–57.
Campbell, R., and MacSweeney, M. (2004). Neuroimaging studies of crossmodal plasticity and language
processing in deaf people. In The handbook of multisensory processes (ed. G.A. Calvert, C. Spence, and
B.A. Stein), pp. 773–78. MIT Press, Cambridge, MA.
Campbell, R., Dodd, B., and Burnham, D. (1998). Hearing by eye II: Advances in the psychology of speech
reading and auditory-visual speech. Psychology Press, Hove, East Sussex.
Chen, Y., and Hazan, V. (2007). Language effects on the degree of visual influence in audiovisual speech
perception. In Proceedings of the 16th International Congress of Phonetic Sciences, pp. 6–10. Saarbrueken,
Germany.
Cienkowski, K.M., and Carney, A.E. (2002). Auditory–visual speech perception and aging. Ear and Hearing,
23, 439–49.
Cutler, A. (1997). The comparative perspective on spoken-language processing. Speech Communication, 31,
3–15.
Dehaene-Lambertz, G., and Baillet, S. (1998). A phonological representation in the infant brain.
Neuroreport, 9, 1885–88.
Dehaene-Lambertz, G., and Dehaene, S. (1994). Speed and cerebral correlates of syllable discrimination in
infants. Nature, 28, 293–94.
Dehaene-Lambertz, G., Hertz-Pannier, L., Dubois, J., Meriaux, A.R., Sigman, M., and Dehaene, S. (2006).
Functional organization of perisylvian activation during presentation of sentences in preverbal infants.
Proceedings of the National Academy of Sciences USA, 103, 14240–45.
Desjardins, R.N., and Werker, J.F. (2004). Is the integration of heard and seen speech mandatory for
infants? Developmental Psychobiology, 45, 187–203.
Desjardins, R.N., Rogers, J., and Werker, J.F. (1997). An exploration of why preschoolers perform
differently than do adults in audiovisual speech perception tasks. Journal of Experimental Child
Psychology, 66, 85–110.
Dick, A.S., Solodkin, A., and Small, S.L. (2010). Neural development of networks for audiovisual speech
comprehension. Brain and Language, 114, 101–114.
Dodd, B., and Burnham, D. (1988). Processing speechread information. Volta-Review, 90, 45–60.
Dodd, B. (1979). Lip-reading in infants: attention to speech presented in- and out-of-synchrony. Cognitive
Psychology, 11, 478–84.
Dodd, B. (1987). The acquisition of lip-reading skills by normally hearing children. In Hearing by eye: the
psychology of lip-reading (ed. B. Dodd and R. Campbell), pp. 163–75. Lawrence Erlbaum Associates,
London.
Elstner, W. (1955). Erfahrungen in der Behandlung sprachgestörter blinder Kinder [Experiences in the
treatment of blind children with speech impairments]. Bericht über die Blindenlehrerfortbildungstagung
in Innsbruck 1955, pp. 26–34. Verlag des BundesBlindenerziehungsinstitutes, Wien.
Emmorey, K., Allen, J.S., Bruss, J., Schenker, N., and Damasio, H. (2003). A morphometric analysis of
auditory brain regions in congenitally deaf adults. Proceedings of the National Academy of Sciences USA,
100, 10049–54.
Finney, E.M., Fine, I., and Dobkins, K.R. (2001). Visual stimuli activate auditory cortex in the deaf. Nature
Neuroscience, 4, 1171–73.
Finney, E.M., Clementz, B.A., Hickok, G., and Dobkins, K.R. (2003). Visual stimuli activate auditory cortex
in deaf subjects: evidence from MEG. Neuroreport, 14, 1425–27.
Gervain, J., Berent, I., and Werker, J.F. (2009). The encoding of identity and sequential position in
newborns: an optical imaging study. Paper presented at the 34th Annual Boston University Conference
on Child Language, Boston, USA, 6 November–8 November 2009.
Gervain, J., Macagno, F., Cogoi, S., Peña, M., and Mehler, J. (2008). The neonate brain detects speech
structure. Proceedings of the National Academy of Sciences USA, 105, 14222–27.
Gibson, E.J. (1969). Principles of perceptual learning and development. Appleton, New York.
Gibson, E.J. (1984). Perceptual development from the ecological approach. In Advances in developmental
psychology (ed. M. E. Lamb, A. L. Brown and B. Rogoff), pp. 243–86. Erlbaum, Hillsdale New Jersey.
Gibson, J.J. (1966). The senses considered as perceptual systems. Houghton-Mifflin, Boston.
Gick, B., and Derrick, D. (2009). Aero-tactile integration in speech perception. Nature, 462, 502–504.
Göllesz, V. (1972). Über die Lippenartikulation der von Geburt an Blinden [About the lip articulation in
the congenitally blind]. Papers in Interdisciplinary Speech Research. Proceedings of the Speech
Symposium, Szeged, 1971, pp. 85–91. Akadémiai Kaidó, Budapest.
Gordon, M., and Allen, S. (2009). Audiovisual speech in older and younger adults: integrating a distorted
visual signal with speech in noise. Experimental Aging Research, 35, 202–219.
Hockley, N.S., and Polka, L. (1994). A developmental study of audiovisual speech perception using the
McGurk paradigm. Journal of the Acoustical Society of America, 96, 3309.
Kuhl, P.K., and Meltzoff, A.N. (1982). The bimodal perception of speech in infancy. Science, 218, 1138–41.
Kuhl, P.K., and Meltzoff, A.N. (1984). The intermodal representation of speech in infants. Infant Behavior
and Development, 7, 361–81.
Kuhl, P.K., Stevens, E., Hayashi, A., Deguchi, T., Kiritani, S., and Iverson, P. (2006). Infants show a
facilitation effect for native language phonetic perception between 6 and 12 months. Developmental
Science, 9, F13–21.
Kuhl, P.K., Tsao, F.M., and Liu, H.M. (2003). Foreign-language experience in infancy: effects of short-term
exposure and social interaction on phonetic learning. Proceedings of the National Academy of Sciences
USA, 100, 9096–9101.
Kushnerenko, E., Teinonen, T., Volein, A., and Csibra, G. (2008). Electrophysiological evidence of illusory
audiovisual speech percept in human infants. Proceedings of the National Academy of Sciences USA, 105,
11442–45.
Laurienti, P., Burdette, J., Maldjian, J., and Wallace M. (2006). Enhanced multisensory integration in older
adults. Neurobiology of Aging, 27, 1155–63.
Lee, H.J., Truy, E., Mamou, G., Sappey-Marinier, D., and Giraud, A.L. (2007). Visual speech circuits in
profound acquired deafness: a possible role for latent multimodal connectivity. Brain, 130, 2929–41.
Lewkowicz, D.J. (1992a). Infants’ response to temporally based intersensory equivalence: the effect of
synchronous sounds on visual preferences for moving stimuli. Infant Behavior and Development, 15,
297–324.
Lewkowicz, D.J. (1992b). Infants’ responsiveness to the auditory and visual attributes of a sounding/
moving stimulus. Perception and Psychophysics, 52, 519–28.
Lewkowicz, D.J. (1996). Perception of auditory-visual temporal synchrony in human infants. Journal of
Experimental Psychology: Human Perception and Performance, 22, 1094–1106.
Lewkowicz, D.J. (2000a). The development of intersensory temporal perception: an epigenetic systems/
limitations view. Psychological Bulletin, 126, 281–308.
Lewkowicz, D.J. (2000b). Infants’ perception of the audible, visible and bimodal attributes of multimodal
syllables. Child Development, 71, 1241–57.
Lewkowicz, D.J. (2002). Heterogeneity and heterochrony in the development of intersensory perception.
Cognitive Brain Research, 14, 41–63.
Lewkowicz, D.J. (2003). Learning and discrimination of audiovisual events in human infants: the
hierarchical relation between intersensory temporal synchrony and rhythmic pattern cues.
Developmental Psychology, 39, 795–804.
Lewkowicz, D.J. (2010). Infant perception of audio-visual speech synchrony. Developmental Psychology, 46,
66–77.
Lewkowicz, D.J., and Ghazanfar, A.A. (2006). The decline of cross-species intersensory perception in
human infants. Proceedings of the National Academy of Sciences USA, 103, 6771–74.
Lewkowicz, D.J., and Ghazanfar, A.A. (2009). The emergence of multisensory systems through perceptual
narrowing. Trends in Cognive Sciences, 13, 470–78.
Lewkowicz, D.J., and Hansen-Tift, A.M. (2012). Infants deploy selective attention to the mouth of a talking
face when learning speech. Proceedings of the National Academy of Sciences USA, 109, 1431–36.
Lewkowicz, D.J., and Kraebel, K. (2004). The value of multimodal redundancy in the development of
intersensory perception. In Handbook of multisensory processing (ed. G. Calvert, C. Spence and B.
Stein), pp. 655–79. MIT Press, Cambridge, Massachusetts.
Lewkowicz, D.J., and Lickliter, R. (1994). The development of intersensory perception: comparative
perspectives. Erlbaum, Hillsdale New Jersey.
Lewkowicz, D.J., and Marcovitch, S. (2006). Perception of audiovisual rhythm and its invariance in 4- to
10-month-old infants. Developmental Psychobiology, 48, 288–300.
Lewkowicz, D.J., and Turkewitz, G. (1980). Cross-modal equivalence in early infancy: Auditory-visual
intensity matching. Developmental Psychology, 16, 597–607.
Lewkowicz, D.J., Leo, I., and Simion, F. (2010). Intersensory perception at birth: newborns match non-
human primate faces and voices. Infancy, 15, 46–60.
Lickliter, R., and Bahrick, L.E. (2000). The development of infant intersensory perception: advantages of a
comparative convergent-operations approach. Psychological Bulletin, 126, 260–80.
Lucas, S.A. (1984). Auditory discrimination and speech production in the blind child. International Journal
of Rehabilitation Research, 7, 74–76.
Lux, G. (1933). Eine Untersuchung über die nachteilige Wirkung des Ausfalls der optischen Perzeption auf
die Sprache des Blinden [An investigation about the detrimental effects of the failure of optical
perception in the language of blind persons]. Der Blindenfreund, 53, 166–70.
Ma, W.J., Zhou, X., Ross, L.A., Foxe, J.J., and Parra, L.C. (2009). Lip-reading aids word recognition
most in moderate noise: a Bayesian explanation using high-dimensional feature space. PLoS ONE,
4: e4638.
MacKain, K., Studdert-Kennedy, M., Spierker, S., and Stern, D. (1983). Infant intermodal speech
perception is a left hemisphere function. Science, 219, 1347–49.
MacSweeney, M., Calvert, G.A., Campbell, R., et al. (2002). Speechreading circuits in people born deaf.
Neuropsychologia, 40, 801–807.
MacSweeney, M., Campbell, R., Calvert, G.A., et al. (2001). Dispersed activation in the left temporal cortex
for speech-reading in congenitally deaf people. Proceedings of the Royal Society B: Biological Sciences,
268, 451–57.
Massaro, D.W. (1984). Children’s perception of visual and auditory speech. Child Development, 55, 1777–88.
Massaro, D.W. (1998). Perceiving talking faces: from speech perception to a behavioral principle. MIT
Press, Cambridge, MA.
Chapter 10

Infant synaesthesia
New insights into the development of
multisensory perception
Daphne Maurer, Laura C. Gibson,
and Ferrinne Spector

10.1 Synaesthesia
Synaesthesia is a neurological phenomenon in which a stimulus evokes, in addition to its normal
percept, an extra crossmodal or cross-dimensional percept that has no corresponding environmental
stimulus. Stimulation of one sense, such as hearing, triggers the normal perception of a specific
sound, but also an additional perception in another
sense, such as a specific colour. Most of the 63 forms of synaesthesia identified to date (Day 2011)
are multisensory (e.g. in ‘coloured hearing’, stimulation of one sense evokes an additional percep-
tion in a different sense). However, one common form, called ‘coloured-grapheme synaesthesia’,
is unisensory and involves perceiving black letters and digits in colour. These extra perceptions
are highly specific within each individual synaesthete, but vary to a large extent across synaes-
thetes. For example, among synaesthetes with coloured hearing, middle C played on the piano
may induce fire-engine red for one synaesthete and chartreuse for another, while D sharp above
middle C induces lavender for the first and vermilion for the second. Individuals often report
more than one form of synaesthesia and, regardless of its type, the synaesthetic perceptions are
consistent over time. These unusual crossmodal perceptions are automatic and involuntary, and
synaesthetes report having had them ‘all their lives’. In ‘projector’ synaesthetes, the synaesthetic
perceptions are experienced as superimposed onto real-world stimuli, while in ‘associator’ synaes-
thetes, synaesthetic perceptions are experienced as internal, unavoidable associations in the
‘mind’s eye’ (Dixon et al. 2004).
In this chapter, we will summarize evidence that synaesthesia is a remnant of a normal devel-
opmental process involving an initial proliferation of synaptic connections, including connec-
tions linking cortical areas that will later become specialized for unisensory processing. Postnatal
experience refines those connections by strengthening those that match the child’s environment
and largely eliminating the rest. In synaesthesia, more of these initial connections appear to
remain, with the ability to influence conscious perception. Therefore, we will argue that crossmo-
dal and cross-dimensional associations commonly manifested in synaesthetic adults provide
clues about cortical connections in early childhood that may influence perception in the typical
non-synaesthetic child. We will provide behavioural evidence from children supporting this
viewpoint. We will also argue that remnants of the original connections are present even in
non-synaesthetic adults, in whom their influence is manifested not in conscious perception, but
in implicit crossmodal associations in perception.
In the first section of this chapter, we will review evidence on the nature of synaesthetic percep-
tion and its neural basis. In the middle section, we will summarize the evidence on the developmental
origins of synaesthesia. In the final section, we will derive hypotheses from research on synaesthe-
sia for understanding the perception of typical non-synaesthetic children and present evidence
verifying some of those hypotheses. This perspective provides new insights into the development
of perception and even language.

10.2 Perceptual manifestations


Synaesthetic experiences can occur across different senses (e.g. ‘coloured hearing’) or within the
same sense (e.g. ‘coloured graphemes’). It is difficult to estimate the prevalence of synaesthesia, as
some individuals with the condition are not aware that their perceptual experiences are atypical,
while others keep their synaesthetic experiences private to avoid being stigmatized (Rich et al.
2005). Earlier estimates of the prevalence of synaesthesia ranged from 1 in about 200
(Ramachandran and Hubbard 2001) to 1 in 25,000–100,000 (Cytowic 1997). However, these
estimates are based on the frequency with which adults refer themselves to researchers studying
synaesthesia, and, as such, likely underestimate the true prevalence. Studies of the general popula-
tion of university students indicate a much higher prevalence. Two experiments investigated the
prevalence of sequence-form synaesthesia, in which ordinal sequences like numbers, days of the
week, or months of the year are perceived in specific spatial locations (Galton 1881), in samples
of 50 (Mann et al. 2009) and 500 university students (Sagiv et al. 2006b). Approximately one-
quarter of the students reported at least one manifestation of sequence-form synaesthesia, and
those who reported these associations were faster at behavioural tasks requiring mental manipu-
lations of the sequence (e.g. name every second month in reverse chronological order; Mann et al.
2009). Another study investigated the prevalence of other forms of synaesthesia (excluding
sequence-form synaesthesia) in a large sample of 500 university students. The students were
asked to give details of any synaesthetic experiences corresponding to a list of possible inducers
on an original test and a surprise retest 6 months later (Simner et al. 2006). Their consistency was
compared to that of adults reporting no synaesthesia, who also made crossmodal associations to
the same items during two sessions just 2 weeks apart. Among the 500 participants, 4.4% reported
at least one type of synaesthesia, which was verified by its greater consistency in the synaesthetic
group than that shown by the control group. Most forms involved colour as the extra elicited
perception, with coloured-grapheme synaesthesia reported by 45% of those indicating at least
one form of synaesthesia. A similar prevalence for coloured-grapheme synaesthesia (1.1% of the
general population) was found in a survey of 1190 individuals recruited at a science centre (Simner
et al. 2006) and in a study of 615 children aged 6–8 years old (1.3%; Simner et al. 2009a).
The possibility of a female bias in synaesthesia has been raised repeatedly in the literature, with
estimates of a female:male ratio as high as 6:1 (Baron-Cohen et al. 1996; Rich et al. 2005). However,
this purported female bias likely reflects a greater reluctance of men than women with synaesthe-
sia to identify themselves by contacting researchers. The two aforementioned studies performed
with university students and science-centre patrons both suggest that all forms of synaesthesia
occur equally often in men and women, including the two most common types of the condition
(sequence-form and coloured-grapheme synaesthesia; Sagiv et al. 2006a; Simner et al. 2006).
Galton (1881) was the first to describe the possibility of a familial basis for synaesthesia when
he observed that many of the sequence-form synaesthetes he encountered had close relatives with
similar perceptual experiences. Subsequent investigations have revealed that 33–44% of synaes-
thetes recruited by self-referral report at least one other family member with synaesthesia (Barnett
et al. 2008; Baron-Cohen et al. 1996; Ward and Simner 2005), although often not of the same type
and never with exactly the same synaesthetic perceptions. Family-tree analyses have revealed
no case of father–son transmission, but two cases of monozygotic twins (one pair of each sex)
in which one twin is synaesthetic and the other is not (Smilek et al. 2002; Smilek et al. 2005).
These discordant twins indicate that synaesthesia is not determined by a single dominant gene alone. Thus
the familial patterns do suggest a genetic influence, but the nature of that influence is not yet
clear.
Although synaesthesia is described as an atypical neurological condition, those who experi-
ence synaesthesia do not regard it as a perceptual or cognitive deficit. On the contrary, many
individuals with synaesthesia consider the condition to be a gift, and cannot imagine a life lack-
ing in their atypical perceptual experiences. In fact, synaesthetes outperform controls on some
perceptual tasks related to the sensory modality in which they experience synaesthesia. For
example, those with coloured-grapheme synaesthesia have better colour vision, colour memory,
and word memory than controls (Banissy et al. 2009; Yaro and Ward 2007), and those with
sequence-form synaesthesia have better memory for the timing of world events and are better on
tasks requiring the manipulation of units of time or space (Mann et al. 2009; Simner et al. 2009b;
but see Ward et al. 2009, for slower mathematical operations in this form of synaesthesia).
In addition, synaesthesia may facilitate certain types of creativity and artistic pursuit (Rich et al.
2005; Ward et al. 2008).

10.3 Perceptual reality


For non-synaesthetes, the idea of a synaesthetic experience is difficult to grasp. As such, the
condition has long been viewed as a curiosity that might arise from overly active imagination
(Hubbard and Ramachandran 2005), and the last century has seen intense debate concerning the
perceptual reality of synaesthesia. Recently, a variety of behavioural and neuroimaging methods
have verified the validity of the synaesthetic experience.

10.3.1 Behavioural evidence


One type of behavioural evidence establishing the perceptual reality of synaesthesia is the consist-
ency of reported percepts over time. For example, in one study 26 adults with coloured-hearing
synaesthesia and 23 controls matched specific sounds to their choice of 238 calibrated colour
swatches. When retested at least one month later, the synaesthetes’ consistency was more than
twice as high as that of the controls, who had been instructed to remember their choices until
their retest one week later (Asher et al. 2005). Such high consistency has been found in other stud-
ies of coloured-hearing synaesthesia, as well as in studies of sequence-form synaesthesia, word-
taste synaesthesia, and coloured-grapheme synaesthesia (e.g. Baron-Cohen et al. 1993; Dixon
et al. 2000; Eagleman et al. 2007; Edquist et al. 2006; Smilek et al. 2007b; Ward et al. 2006; Ward
and Simner 2003). Typically, even the most consistent control subject does not approach the
range of consistency shown by the synaesthetic group. In fact, the consistency of reported synaes-
thetic experiences over time has come to be known as the behavioural gold standard for assessing
the reality of the condition (Cytowic 1989).
Stroop-like interference paradigms have demonstrated the automatic and obligatory nature of
synaesthetic percepts. In a traditional Stroop task, participants are asked to name the ink colour
of colour words printed in a consistent (the word ‘red’ printed in red ink) or inconsistent (the word ‘red’ printed in
green ink) colour (Stroop 1935). Participants’ longer response times and more frequent errors on
inconsistent trials indicate that the automatic decoding of word meaning interferes with naming
the colour of the ink. Similar interference occurs when synaesthetes with coloured hearing are
asked to name the colour of a visual patch while ignoring irrelevant tones that induce consistent
colours (a green-inducing tone at the same time as a green visual patch) or inconsistent colours
(a red-inducing tone with the green patch). Individuals with coloured hearing are significantly
slower on inconsistent trials than on consistent trials, a pattern suggesting that the synaesthetic
colour elicited by the tone occurs automatically and interferes with, or slows down, the percep-
tion and naming of the actual ink colour (Ward et al. 2006). Similar Stroop-like interference
occurs in synaesthetes with coloured graphemes when they are shown letters or digits printed in
colours consistent or inconsistent with their synaesthesia and are then asked to name the colour
of ink while ignoring the letter (Dixon et al. 2000; Mills et al. 1999; Mattingley et al. 2001;
Mattingley et al. 2006; Nikolić et al. 2007; Rich and Mattingley 2003; Ward et al. 2006).
Synaesthetic crossmodal percepts also cue visual attention in a manner that suggests that they
are perceptually valid. For example, when synaesthetes with coloured hearing are exposed to a
tone while observing an array of colour patches, their attention is cued toward the visual patch
matching the colour induced by the sound, as evidenced by their faster detection of targets in that
location (Ward et al. 2006). Similarly, when individuals with sequence-form synaesthesia are
asked to detect targets appearing in various locations in visual space, each preceded by an irrele-
vant digit, they are faster if the digit is one that is perceived synaesthetically in the same location
as the target than if it is perceived as located elsewhere (Jarick et al. 2009). Akin to data on consist-
ency over time, the behavioural findings on Stroop interference and attentional cueing suggest
that synaesthetic percepts behave like typical, non-synaesthetic percepts, further establishing the
perceptual reality of the phenomenon.

10.3.2 Neuroimaging evidence


Neuroimaging evidence has also played a role in establishing synaesthesia as a genuine perceptual
phenomenon. Studies using functional magnetic resonance imaging (fMRI) have demonstrated
that stimuli reported to induce specific synaesthetic percepts elicit brain activity in regions known
to be involved in the processing of those perceptual characteristics. For example, an fMRI study
with coloured-hearing synaesthetes and non-synaesthetic controls showed that spoken words
elicited activations in areas V4 and V8 of the visual cortex (human colour area) in synaesthetes,
but not controls (Nunn et al. 2002; see also Paulesu et al. 1995, for similar results from positron
emission tomography). Subsequent fMRI studies have replicated this V4/V8 activation in response
to words, as well as, in some studies, other parts of the visual pathway and areas of the parietal
cortex known to be involved in the binding of colour to shape in typical perception (Gray et al.
2006; Hubbard et al. 2005; Stevens et al. 2006; Witthoft and Winawer 2006). In other words, for
synaesthetes with coloured hearing, hearing words activates not only the auditory and lexical
pathways, but also pathways normally involved in the processing of colour and its linking to
shape. In a similar study of a synaesthete who tastes words (‘Philip’ evokes the taste of ‘oranges
not quite ripe’), listening to words caused the typical activation in lexical decoding areas, but,
crucially, also caused activation in the primary gustatory cortex that is active during the experi-
ence of taste in the typical brain (Ward and Simner 2003). Similar conclusions come from fMRI
studies of coloured-grapheme synaesthetes and non-synaesthetic controls who have been pre-
sented with arrays of achromatic digits. All participants show activation in the grapheme area of
the cortex; however, synaesthetes also show activity in visual colour area V4, with some reports of
activity in lower visual areas and in a number of higher cortical areas, including the intraparietal
sulcus (Hubbard et al. 2005; Rouw and Scholte 2007, 2010; Sperling et al. 2006; but see Rich et al.
2006). A role for the parietal cortex in the induction of this form of synaesthesia was confirmed
in two studies using transcranial magnetic stimulation (TMS): Temporarily interrupting parietal
cortex activity reduced the Stroop-like interference between real-world and synaesthetic colour
(Esterman et al. 2006; Muggleton et al. 2007). Combined, the fMRI and TMS results provide
strong evidence that the reported synaesthetic percepts are mediated by neural pathways active
during real-world perception of the same stimuli.

10.4 Hypothesized developmental origins


There are two main hypotheses about the developmental origins of synaesthesia, both of which
attribute it to cross-activation of cortical areas, arising from either:
◆ less-than-normal pruning of the exuberant connections that are prevalent during early
development
◆ less-than-normal development of inhibition of such connections.
There is evidence to support each hypothesis, and they may well both contribute to the manifesta-
tion of synaesthesia. What is important for this chapter is that both hypotheses represent
an atypical playing out of a normal developmental process. Hence, both hypotheses imply that
the neural substrate for unusual connections among dimensions and modalities exists in the
typically developing child and may persist in muted form in typical, non-synaesthetic adults.
In this section, we review the evidence for each hypothesis and its relationship to the normal
developmental process.

10.4.1 Pruning
In typical adults, each sensory cortical area is specialized for the processing of information from
one sensory system. However, that is the end point of a developmental process that begins with
an exuberant production of synapses, including ones between sensory cortical areas. During
development, these exuberant synapses undergo experience-dependent pruning, such that stimu-
lated synapses are strengthened and refined, while inactive synapses are pruned. As a result of this
process, each sensory cortex becomes tuned to a single modality because it receives the strongest
and most coherent input from that sense. The temporary connections are most likely to form
between contiguous cortical areas. Whilst exuberance and pruning appear to be ubiquitous proc-
esses, their timetables differ across species and sensory modalities (e.g. Bourgeois and Rakic 1993;
Dehay et al. 1984; Huttenlocher 1984). In the cat, such exuberant connections form during
infancy among primary visual, auditory, and somatosensory cortices, but during development
these synapses are pruned so that these areas become unisensory areas (Dehay et al. 1988). In the
monkey, such ubiquitous exuberance has not been observed, but there is evidence of the forma-
tion of connections during infancy from primary auditory cortex to visual area V4, the area active
during coloured-hearing and coloured-grapheme synaesthesia (see Section 10.3.2), connections
that are no longer observed in the brain of the adult monkey (Kennedy et al. 1997).
In humans, there is indirect evidence for a similar developmental process of over-production
of synapses and experience-dependent pruning. Anatomical tracing has revealed a process of
over-expression of dendritic spines early in life followed by a reduction to adult levels in every
cortical area, although the timing of the peak over-production and subsequent pruning varies
across cortical areas, and even between layers within each area (e.g. Huttenlocher 1984). A similar
pattern emerges in measurements of the levels of glucose utilization, which reflect the metabolic
demands of neural activity; for example, there is an increase in glucose utilization in sensory cor-
tical areas over the first 3–4 years of life, which is followed by a plateau and a subsequent decrease
beginning at about age 9 (Chugani 1994; Chugani and Phelps 1986; Chugani et al. 1987).
Comparisons of cortical specialization in normal, blind, and deaf adults indicate that this
pruning is experience-dependent. When normal input is missing, for example because of
congenital blindness or deafness, the normal development of unimodal specialization of the
visual and auditory cortices fails to occur. In the congenitally blind, the adult visual cortex
responds to auditory, tactile, and language input, and disrupting it with TMS degrades performance on such tasks
(reviewed in Spector and Maurer 2009b). Studies of adult cats that were blinded at birth (by
removing the eyes) indicate that the visual cortex failed to develop its normal specialization;
instead, the visual cortex contains neurons that give well-tuned responses to auditory stimuli
(inputs from other modalities have not been tested) (Yaka et al. 1999). Similarly, in humans with
congenital deafness, the adult auditory cortex responds to both visual and tactile stimulation
(reviewed in Spector and Maurer 2009b).
Together, these studies suggest that unisensory cortical areas become specialized as a result of
dominant input from a single sense, leading to strengthening and refinement of direct sensory
connections and the pruning away of the initially prevalent connections from other senses. When
the dominant input is missing because of deprived input, such as deafness or blindness, no such
specialization occurs, and the area remains multisensory in adulthood.
There is indirect evidence that yet-to-be-pruned crossmodal connections are functional in
early childhood. For example, in young infants, human speech elicits event-related potentials
over the auditory cortex, as it does in adults. However, unlike in adults, speech also elicits strong
responses over the visual cortex, which gradually diminish over the first three years of life (Neville
1995). In newborns, tactile stimulation of the wrist elicits an evoked response over the somato-
sensory cortex, as it does in adults. However, unlike in adults, the response is larger when white
auditory noise accompanies the tactile stimulation (Wolff et al. 1974). Similarly, PET (positron
emission tomography) data reveal that even the response to human faces is not as localized early
in life as it is in adults. For example, similar to adults, 2-month-olds who looked at faces and a
control visual stimulus of Christmas tree lights exhibited more activity in response to faces in the
right inferior gyrus, close to the classic fusiform face area of adults. Unlike adults, however, there
was also more activity in the left auditory cortex and left Broca’s area, areas that later become
specialized for hearing and language (Tzourio-Mazoyer et al. 2002). Together with the evidence
for congenital blindness and deafness, these findings suggest that the human cortex is not initially
subdivided into specialized unisensory areas, but instead that unisensory cortical areas develop
postnatally in response to the patterns of environmental input received.
There is also indirect evidence for early functional interconnection among visual pathways that
will later become segregated and specialized. For example, in adults, the initial processing of
motion and colour occur via distinct, parallel pathways, such that the motion system appears to
be largely ‘colour-blind’. One demonstration of this is that adults can readily report the direction
of moving stripes if they are formed by contrasting luminance (e.g. light and dark stripes), but not
if the stripes are formed by equiluminant contrasting colours (e.g. red and green stripes). Young
infants 2–4 months old, however, are equally proficient at seeing moving stripes that are defined
by luminance or by colour, suggesting that colour pathways initially have connections to the
motion pathway that are later retracted (Dobkins and Anderson 2002). Thus, in the infant brain,
colour processing appears not yet to be confined to specialized networks.
Combined, these studies suggest that initially there are connections among and within sensory
cortical areas that are pruned during development in an experience-dependent manner. These
extra connections are most prevalent during early childhood, and exactly when they diminish to
adult levels likely varies with respect to the cortical areas they connect. The initial exuberant con-
nections are likely to be fairly local, such as those connecting contiguous brain areas. Indirect
evidence on timing comes from studies of glucose metabolism and fibre tracking. The increased
levels of local connectivity, revealed by glucose metabolism to be present in early childhood, do
not begin to decline toward adult levels until around age 9 years (Chugani et al. 1987). A similar
conclusion comes from recent studies tracing interconnected areas based on similar fluctuations
in resting state metabolism, confirmed by fibre tracking with DTI (diffusion tensor imaging);
both methods indicated that as late as age 7–9 years, local connectivity is stronger in children than
in adults (Fair et al. 2009; Supekar et al. 2009).
The ‘pruning’ hypothesis of synaesthesia explains the phenomenon as the result of less-than-
normal pruning of the exuberant connections seen in early development; some of the connec-
tions are posited to remain functional and influence conscious perception (e.g. Maurer and
Maurer 1988; Ramachandran and Hubbard 2001). As would be expected from the normal devel-
opmental process, manifestations of synaesthesia often involve inductions between contiguous
brain areas; colour may often be induced because areas V4 and V8, which are involved in its mediation, lie amidst many other brain areas. Also, as atypical synaptic pruning would likely not be
restricted to one cortical area, many synaesthetes have more than one form of synaesthesia.
However, there is no explanation of what might cause less-than-normal pruning to occur or,
further, why it affects some connections between contiguous areas and not others.
The pruning hypothesis predicts that synaesthetes will exhibit more-than-normal neural con-
nections between cortical regions involved in their atypical perceptions. There is evidence from
MRI (magnetic resonance imaging) and DTI indicating increased grey and white matter, respec-
tively, in the expected areas. For example, a study of synaesthetes with coloured hearing observed
increased cortical thickness, surface area, and volume throughout the ventral visual pathway
where colour is processed, as would be expected if there were increased connectivity because of
reduced pruning (Jäncke et al. 2009). Importantly, a related experiment uncovered a possible
difference between associator and projector synaesthetes, in that both groups had increased grey
matter volume in the left parietal cortex, which is implicated in the binding of colour to shape,
but only projectors showed the increases in the visual cortex, along with increases in auditory,
motor, and superior frontal areas (Rouw and Scholte 2010). Associators, on the other hand,
showed increased grey matter volume in the hippocampus. An MRI study of 18 individuals with
coloured-grapheme synaesthesia and 18 matched controls also found evidence consistent with
increased volume from decreased pruning (Weiss and Fink 2009). The synaesthetes had increased
density of grey matter in the right fusiform gyrus, near the V4 colour area and visual word form
area, and correlated increases in the left intraparietal sulcus, near hIP3, an area known to be
involved in the multisensory processing that links colour and form. A DTI study of individuals
with coloured-grapheme synaesthesia revealed evidence of increased volume of white matter fibre
tracts, compared to matched controls, in three areas:
◆ in a visual word form area in the right inferior temporal cortex next to an area in the fusiform
gyrus implicated in colour processing
◆ in areas in the left parietal cortex implicated in the binding of colour to form
◆ in the superior frontal cortex (Rouw and Scholte 2007).
Interestingly, the volume differences in the visual word form area were greater for projector
than associator synaesthetes. Similar patterns (increased fibre tracts and grey and white matter
volume) were found in auditory, gustatory, and visual areas in a synaesthete with both coloured
hearing and an unusual form of synaesthesia in which specific tone intervals induce specific tastes
(e.g. a minor second tastes sour; Hänggi et al. 2008).
The DTI and MRI findings converge nicely with fMRI evidence of activity in area V4 and in the
fusiform gyrus in individuals with coloured hearing and coloured-grapheme synaesthesia (e.g.
Hubbard et al. 2005; Rouw and Scholte 2010; Stevens et al. 2006) and its suppression when TMS
is applied to the parietal cortex, at least on the right (Esterman et al. 2006; Muggleton et al. 2007;
see Section 10.3.2). It is in those areas that there is consistent evidence across studies of increased
cortical thickness and white matter tracts. The DTI and MRI evidence for increased fibres/
thickness in the superior frontal cortex was unexpected, and might reflect the effort needed to
constantly distinguish between synaesthetic and veridical perceptions. Also unexpected was the
evidence for changes in hippocampal memory networks in associators.

10.4.2 Inhibition
The second hypothesis about the origins of synaesthesia proposes that the condition arises from
altered feedback from higher cortical areas onto lower sensory cortical areas (Grossenbacher and
Lovelace 2001). In the typical adult, this feedback strengthens the firing of neurons responding
coherently to the perceptual properties of a stimulus and inhibits the firing of more isolated neu-
rons. For example, this process will facilitate the continued firing of neurons in the primary visual
cortex tuned to diagonal orientations when viewing the letter X, while simultaneously inhibiting
the firing of neurons tuned to horizontal and vertical orientations. In synaesthesia, according
to this account, some of this inhibitory feedback is reduced or absent (disinhibition), resulting in
extra firing, including firing carried by any remnants of the exuberant connections of early
childhood among dimensions (e.g. colour and motion) or modalities (e.g. vision and hearing).
Note that this account assumes that not all of the exuberant connections are eliminated by
pruning, but rather that some connections remain that are normally inhibited. Indeed, in adult
monkeys, neurons in the primary visual cortex with receptive fields in the visual periphery receive
direct input from the primary auditory cortex (Falchier et al. 2002), and responses of neurons in
monkey auditory cortex are modulated by simultaneous visual input (Bulkin and Groh 2006).
Indirect evidence for such connections in the typical human adult comes from studies of blind-
folded adults. For example, after 5 days of blindfolding to remove the normal, dominant visual
input, the visual cortex became active during tactile and auditory discriminations, and interfer-
ence with visual cortex activity by TMS (transcranial magnetic stimulation) disrupted tactile
discrimination (auditory discrimination was not tested; Pascual-Leone and Hamilton 2001).
These findings suggest that some direct crossmodal connections between sensory cortical areas
remain in the typical adult brain, but that these connections are normally sufficiently inhibited to
have no effect on conscious perception. They nevertheless can have indirect effects on adults’
perception (see Section 10.5). Note that early in normal development, such crossmodal connections are more prevalent and less likely to be inhibited (e.g. Burkhalter 1993).
By this account, synaesthesia arises not from a lack of pruning, but from a failure to inhibit
remaining crossmodal and cross-dimensional connections. As with pruning, there is no explana-
tion of what causes synaesthetes to develop less-than-normal inhibition. The ‘inhibition’ explana-
tion of synaesthesia predicts that it should be possible to induce synaesthesia in non-synaesthetes
by reducing inhibitory feedback. Indeed, it is possible to elicit reports of synaesthesia in typical,
non-synaesthetic adults through hypnosis (Cohen Kadosh et al. 2009) and ingestion of LSD
(Aghajanian and Marek 1999), although the mechanism has not been established, nor whether
what is induced has all the hallmarks of synaesthesia, including consistency, automaticity, and
modulation by TMS.

10.4.3 Role of experience


Synaesthetes report that they have had extra synaesthetic percepts ‘all their lives’. Nevertheless,
many forms of synaesthesia must have emerged postnatally, as they are elicited by culturally
learned stimuli such as the letters of the alphabet, days of the week, or specific words. One possi-
bility is that before learning such stimuli, synaesthesia was manifested in a different, but related,
form—as a connection, for example, between basic shapes and colour (a prelude to coloured
graphemes), between sequences of daily events and spatial locations (a prelude to sequence-form
synaesthesia), or between specific combinations of sounds and taste (a prelude to lexical-
gustatory synaesthesia). This possibility is supported by the early drawings of words made by one
coloured-grapheme synaesthete before she learned to read (Duffy 2001): She drew coloured
patches resembling a primitive Mondrian. Because of reduced pruning and/or inhibition, indi-
viduals predisposed to synaesthesia may experience extra percepts automatically as they learn the
alphabet, learn to read, taste new foods, hear new musical notes, etc. From the synaesthete’s per-
spective, the additional percepts have been there ‘always’; that is, always when the individual was
exposed to the inducing stimulus. The specific synaesthetic perceptions may be influenced by
couplings encountered in the child’s environment, such as the colour of shapes (e.g. letters on
refrigerator magnets). Because the child has less than typical pruning or inhibition, those cou-
plings may be retained and become perceptual in nature, unlike the fleeting effect they have on
typical children (Witthoft and Winawer 2006). However, for most synaesthetes with coloured
graphemes, no environmental source for the specific idiosyncratic mappings can be found,
despite perusal of children’s books and the child’s toy box in search of coloured letters or digits to
which they were or might have been exposed (Rich et al. 2005). Indeed, in an ongoing study of
three preschool children with coloured-grapheme synaesthesia, we have found no correspond-
ence between the colours they report perceiving for each letter of the alphabet and anything in
their home environment (Spector and Maurer unpublished data).
When a stimulus induces a synaesthetic experience early in development during the period of
experience-dependent pruning, rather than being encountered for the first time later in develop-
ment, the underlying neural connections should be strengthened by their co-activation and lead
to the establishment of a stronger synaesthetic percept. For the same reason, inducers encoun-
tered frequently during development should lead to stronger percepts than those encountered
infrequently. With the reinforcement of these neural connections by their co-activation, one
would expect the elicited synaesthetic percepts to become stronger, more consistent, and more
easily transferred to similar forms or sounds or associated stimuli (Mroczko et al. 2009). Indeed,
in young children showing signs of coloured-grapheme synaesthesia, the consistency of the
reported associations increases between age 3.5 and 5.5 years (Spector and Maurer 2009a) and
after 1 year of school (Simner et al. 2009a). Data from adults with coloured-grapheme synaes-
thesia are also consistent with these predictions: low numbers (1, 2, 3—the numbers that chil-
dren learn first) and high-frequency letters elicit brighter synaesthetic percepts (Beeli et al. 2007;
Cohen Kadosh et al. 2009; Simner 2007; Smilek et al. 2007a; Smith and Sera 1992). In contrast,
when a new category of inducer is encountered in adulthood (e.g. a new alphabet), new synaes-
thetic percepts may arise, but they are usually less strong than existing ones (Mills et al. 2002;
Ward and Simner 2003).
Efforts to induce synaesthesia by training in typical adults have proven unsuccessful. Non-
synaesthetic adults have learned to associate letters or digits with specific arbitrary colours after
many hours of training in the laboratory and after studying consistent patterns linking numbers
to colours in cross-stitch needlepoint (Elias et al. 2003; Meier and Rothen 2009). After training,
these associations cause a congruency effect on Stroop tasks, much like that seen in genuine synaesthetes; but,
unlike synaesthetes, trained non-synaesthetes do not report experiencing synaesthetic percepts,
the letters do not evoke responses conditioned to the colours, and extra-striate visual areas are not
activated when doing mental manipulations with the inducers (e.g. mental arithmetic with
trained digits; Elias et al. 2003; Meier and Rothen 2009). Thus, synaesthesia cannot be induced by
training in typical, non-synaesthetic adults. Nevertheless, in those with synaesthesia, the indi-
vidual’s developmental history of encountering inducers and specific associations can affect the
specificity and strength of their synaesthetic percepts.

10.4.4 Which correspondences?


It is not clear why each individual synaesthete develops an idiosyncratic set of specific corre-
spondences, either a specific form of synaesthesia (e.g. coloured letters or coloured tones) or
the specific set of correspondences within that form (e.g. middle C as red or pink). However,
common patterns across synaesthetic perceptions in adults with synaesthesia and the crossmodal
associations of adults without synaesthesia indicate that some specific correspondences are more
likely than others. These similarities may be related to the systematic organization within each
sensory cortical area, such that neurons with similar tuning preferences lie closest to each other
and farthest from neurons with the orthogonal tuning. Hence any mapping between contiguous
cortical areas is likely to link neurons with one set of preferences in one domain (e.g. high audi-
tory frequency) to neurons with a particular set of preferences in another domain (e.g. jagged
shape). As discussed in Section 10.4.3, idiosyncratic exposure to crossmodal or cross-dimensional
associations in the environment (e.g. coloured letters on fridge magnets) may also influence the
specific mappings acquired.

10.5 Implications for development


The hypothesis that incomplete pruning and/or inhibition lead to synaesthesia has three implica-
tions for the development of the typical non-synaesthetic child.
1. Before the completion of experience-dependent pruning within and between sensory cortical
areas, the child should show synaesthetic-like crossmodal and cross-dimension associations.
2. Even in adulthood, there may be remnants of these earlier associations because the pruning
and/or inhibition are not complete.
3. There will be similarities in the cross-dimensional and crossmodal percepts of synaesthetic
adults and the associations of typically developing children and non-synaesthetic adults.
Recent studies have documented many instances supporting the third prediction by showing
similarities between synaesthetic and non-synaesthetic adults (for a review see Spector and
Maurer 2009b). In this section, we concentrate on the associations that have also been shown in
young children. Some of these associations may be based on early learning because they match
environmental statistics (high pitch = small size; e.g. mice squeak and lions roar; children have
higher pitched voices than adults) or appear only later, after the child has had many years to
learn crossmodal and cross-dimensional correspondences in the environment and begun to
acquire culturally-specific knowledge, like the spelling of words (e.g. the letter ‘b’ goes with
blue). Other correspondences, however, have no ready learning explanation and are already
present in infancy or toddlerhood (e.g. high pitch = sharp and bright). Rather than learning,
these associations appear to reflect natural associations favoured by the intrinsic wiring of the
nervous system.

10.5.1 Magnitude mapping: loudness—lightness


Newborn infants appear to match louder sounds to brighter lights and softer sounds to darker
lights. Thus, after habituation to a bright patch of light, newborns show less heart rate response to
an intense sound; after habituation to a dark patch, they respond less to a soft sound—as if they
translated the magnitude of the light stimulation into auditory magnitude and generalized habit-
uation across domains (Lewkowicz and Turkewitz 1980). Such magnitude mapping is common
in adults, in whom it is manifest in situations as diverse as judging the duration of notes (longer
if accompanied by a larger gesture), the loudness of white noise (louder if accompanied by light),
the sweetness of sugar water (sweeter if redder), or the intensity of odour solutions (more intense
if coloured, even if the colour is wrong, e.g. green strawberry) (Johnson and Clydesdale 1982;
Odgaard et al. 2004; Schutz and Lipscomb 2007; Stein et al. 1996; Zellner and Kautz 1990; reviewed
in Spector and Maurer 2009b). Although the common code for magnitude may be learned (e.g.
larger objects do tend to make louder sounds), the evidence for magnitude matching at birth sug-
gests it may instead, or in addition, be based on a natural bias in the associations between sensory
modalities that is independent of environmental learning. By natural bias, we mean that the asso-
ciation may develop from general developmental processes rather than the learning of statistical
regularities in associations present in the environment.

10.5.2 Visual associations to pitch


Pitch is an interesting dimension for the investigation of these questions because it is metathetic:
unlike brightness and loudness, it has no obvious end that is of greater magnitude—adults do not
agree on whether treble tones are more or less than bass tones; indeed, our labelling of treble tones
as higher pitch is an arbitrary convention of our language. There are a number of common associations with pitch:
A) Sharpness. Adults with coloured hearing report that lower pitches evoke rounder visual
images and higher pitches evoke more pointed images (Marks 1974), an association that
affects the associations of non-synaesthetic adults as well. For example, non-synaesthetic
adults pick a tone of a lower pitch as the most suitable match for a rounder shape (O’Boyle
and Tarte 1980), and they are faster and more accurate to report that a visual form is round
rather than pointed if it is accompanied by an irrelevant tone of lower rather than higher pitch
(Marks 1987). In fact, when asked to judge whether a shape or sound was presented first, the
accuracy of non-synaesthetic adults decreases when a higher pitched sound occurs just before
an angular, rather than rounded shape, as if the congruent mapping fuses the sound and
shape together so that the participant can no longer judge accurately which one came first
(Parise and Spence 2009). The same association has been demonstrated in infants just
3–4 months old. Infants looked longer at a visual display when a shape morphing from an
amoeboid to a pointed version was accompanied by a whistle sound increasing in pitch than
at a display with the reverse mapping (Walker et al. 2010). These babies had 3–4 months of
postnatal experience during which they would have been exposed to many auditory-visual
combinations, some spurious coincidences (e.g. mother enters room as telephone rings) and
some representing environmental correspondences (e.g. mother’s moving lips and mother’s
voice). Nevertheless, there is no obvious environmental link between rounder objects and
lower pitches or pointed objects and higher pitches (e.g. faces, one of the stimuli infants
encounter most often, are round and come with voices covering a large range of pitches).
Thus, it seems likely that the correspondence between pitch and sharpness is a natural bias
arising from intrinsic connections between visual and auditory brain areas.
B) Height. Adults also associate lower pitches with lower positions in space and higher pitches
with higher positions in space: they judge a lower pitched sound to be coming from a lower
position in space (Roffler and Butler 1968) and make judgements about both location and pitch
faster if the other dimension is congruent (Melara and O’Brien 1987). Infants 3–4 months
of age appear to make the same association: they look longer at a visual display when an
orange ball moves up the screen as the pitch of the accompanying whistle tone increases than
at the visual display with the reverse mapping (Walker et al. 2010). These associations are not
likely to have been learned from the infants’ environment over the first 3–4 months.
For example, babies do not yet know that we refer to treble tones as higher pitched and that
taller people (e.g. Dad) have voices with lower pitch than shorter people (e.g. mother, older
sister), which is just the opposite mapping. Instead, the results suggest that there are natural
associations between pitch and visual characteristics that arise from the intrinsic wiring of the
nervous system.
C) Lightness. Synaesthetic adults experience brighter percepts in response to sounds of higher
pitch (e.g. a higher-pitched C elicits a brighter red than the duller red evoked by a lower-
pitched C) (Marks 1974; Ward et al. 2006). Likewise, non-synaesthetic adults match tones
of higher pitch to lighter colours and judge both pitch and lightness more accurately if a dis-
tracter on the other dimension is congruent (e.g. darker light with lower pitched distracting
noise) (Marks 1974). Toddlers (2.5–3 years of age) demonstrate the same pitch–lightness
correspondence as adults. This was shown in a study in which toddlers observed two simulta-
neously bouncing balls, one light and one dark, accompanied by a lower-pitched or higher-
pitched sound. When asked which ball was making the noise, toddlers consistently matched
the lower-pitched sound to the darker ball and the higher-pitched sound to the lighter ball
(Mondloch and Maurer 2004). This correspondence between pitch and lightness is unlikely
to arise from experience with the association in the environment, as lighter objects do not
consistently make higher pitched sounds in the world (e.g. a brown mouse squeaks at the
same pitch as a white mouse). Thus, pitch and lightness seem to connect crossmodally in
a way that could be naturally-biased by cortical connectivity among neighbouring sensory
cortical areas.
D) Size. Toddlers connect higher pitch to smaller visual stimuli (Mondloch and Maurer 2004),
just like the perceptual matches of non-synaesthetic adults and the induced percepts of adults
with synaesthesia (Marks 1974). This correspondence also influences adults’ judgments of the
temporal order and spatial contiguity of a sound and visual stimulus: when a higher-pitched
sound occurs on the same trial as a smaller visual stimulus, adults are more likely to misjudge
which one came first and which one is displaced to the right, as if the correspondence melded
the pitch and shape together (Parise and Spence 2009). This could be an additional example
of a natural bias, or it might arise from experience, as larger organisms tend to make lower
pitched sounds (e.g. mice squeak and lions roar and children have higher pitched voices than
adults). Alternatively, it could result from a dynamic interplay of natural biases and learning.
For example, there may be an initial natural bias to associate high pitch with lightness
and smallness, which would help the developing child to understand the statistics of the
environment. Learning these statistics would reinforce the strength of this association as the
child gains experience in a world where smaller organisms tend to make higher pitched
sounds. Indeed, studies of older children indicate gradually increasing understanding of this
correspondence throughout middle childhood (Marks et al. 1987).
In sum, young children associate higher and lower pitch to polar opposites along the visual
dimensions of sharpness, height in field, surface lightness, and size. We hypothesize that the
matching for sharpness, height in field, and surface lightness represent examples of crossmodal
mappings that cannot readily be explained by learning because the correspondence is not preva-
lent in the environment. The matching for size might be acquired early from the observation of
statistical regularities in the environment or it, too, might start out as a natural bias that can be
modified by learning.

10.5.3 Colour—letters
Coloured-grapheme synaesthesia is one of the most common forms of synaesthesia (see Section 10.1).
If young children have more synaesthetic-like perception before experience-dependent pruning
and the development of inhibitory feedback, then it might be possible to observe perceptual
manifestations of synaesthesia early in development. Dobkins (2011) tested this hypothesis
by investigating whether young infants behave as if circular and diamond shapes have distinct
colours. To do so, she showed infants a field of circles (or diamonds) on a background, half of
which was red and half of which was green. She reasoned that for any given baby, these two shapes
might induce different colours, so that circles would be easier to see against one colour of back-
ground and diamonds against the other. Indeed, she found non-random choices across trials
consistent with synaesthetic perception at 2 months of age for red/green and at a slightly older age
for yellow/blue, as would be expected given that it is known that the yellow/blue channels develop
more slowly than the red/green channels. These results provide the first direct evidence that
stimuli that act as inducers for adult synaesthetes may not only elicit crossmodal associations
of the same type in infants, but actually elicit synaesthetic percepts that are superimposed on the
world.
Some of these early colour-shape associations appear to persist in typical adults as well as
in adults with synaesthesia. While each individual coloured-grapheme synaesthete has a unique
coloured alphabet, there are some letters of the alphabet that tend to be associated to the same
colours at above chance levels across synaesthetes (e.g. ~40% of synaesthetes report that A is red;
Day 2004; Rich et al. 2005; Simner et al. 2005). Likewise, non-synaesthetic adults do not typically
associate letters to colours, but when asked to do so, they tend to agree on the colour of some let-
ters of the alphabet, in large part the same ones for which synaesthetes with coloured graphemes
show consistency (Rich et al. 2005; Simner et al. 2005). Some of these consistent letter–colour
associations appear to be based upon literacy. For example, English-speaking participants
commonly associate G to green. However, some of the consistent letter–colour associations
cannot be readily explained by literacy. For example, at levels far exceeding chance, both synaes-
thetic and non-synaesthetic English-speaking adults associate X and Z with black; O and I with
white, and C with yellow (Barnett et al. 2008; Day 2004; Rich et al. 2005; Simner et al. 2005).
Despite the early colour–shape connections observed in infants, one might expect differences
in colour–shape connections between infants, children, and adults for whom those shapes have
become letters of the alphabet they use in reading. Consistent with this prediction, adults and
children 7 to 9 years of age who have begun to read, but not toddlers, consistently associate letters
with colours when the adult mappings have an apparent basis in literacy (A/red and
G/green, B/blue and Y/yellow; Spector and Maurer 2008, 2011). However, in some cases where
the adult associations do not have a ready literacy explanation, toddlers made the same colour–
shape associations as are common in synaesthetic percepts and typical adult crossmodal associa-
tions: they expected X and Z to be hidden in a black box and I and O to be hidden in a white box.
Further, the consistent matching for I, Z, X and O was based upon the shape and not the sound
of the letter, and appeared to be related to the angularity of the shape: toddlers searched for jagged
shapes in the black box and rounded shapes in the white box. In addition to the associations of
smooth shapes to white and jagged shapes to black, toddlers, like synaesthetic and non-synaes-
thetic adults (Day 2004; Rich et al. 2005; Simner et al. 2005), associated C with yellow (Spector
and Maurer 2011). As suggested earlier, these early associations may result from naturally biased
associations between shape and colour that reflect intrinsic cortical connectivity among neigh-
bouring sensory cortical areas. These connections seem to persevere into adulthood, as shown by
the persistence into adulthood of the associations not readily explained by literacy (O and I/white;
X and Z/black; C/yellow) (Day 2004; Maurer and Spector 2011; Rich et al. 2005; Simner et al.
2005) and their commonness in the actual synaesthetic percepts of adults with coloured graph-
eme synaesthesia (Day 2004; Rich et al. 2005; Simner et al. 2005). Furthermore, while sensory
cortical organization may initially bind colour to shape, the development of literacy can induce
additional associations, as shown by the emergence around age 7 in English-speaking children of
the association of A to red, B to blue, Y to yellow, and G to green, perhaps as a result of differential
recruitment of higher order networks as letters take on meaning (Spector and Maurer 2011).

10.5.4 Sound—shape (sound symbolism)


Our final example examines whether intrinsic crossmodal associations may influence the devel-
opment not only of perception but of language as well. It originated from evidence that typical
adults have biases to associate specific sounds to specific shapes (Ramachandran and Hubbard
2001; Kohler 1929; Lindauer 1990; Marks 1996). For example, when asked to match the nonsense
words ‘takete’ and ‘maluma’ to rounded and jagged shapes, most people answer that the jagged
shape is ‘takete’ and the rounded shape is ‘maluma’ (Kohler 1929; Lindauer 1990). This effect has
been replicated with modified shapes and words (e.g. kiki and bouba) in experimental studies
with English-speaking adults and with 8–14-year-old children who spoke Swahili and the Bantu
dialect of Kitongwe, but not English (Davis 1961; Holland and Westheimer 1964; Ramachandran
and Hubbard 2001). Researchers speculate that these phenomena arise from connections among
contiguous cortical areas mediating decoding of the visual percept of the nonsense shape (round
or angular), the appearance of the speaker’s lips (open and round or wide and narrow), and the
feeling of saying the same words oneself (e.g. Sapir 1929; Ramachandran and Hubbard 2001).
They argue that these connections lead to natural mappings between sound and shape that some-
times result in synaesthesia but which are present in some form in everyone. These naturally-
biased associations may also influence the language development of an individual child by
contributing to the ease with which the child learns semantic mappings. The relationship between
the natural mappings and the semantics of the language will be one of mutual influence: as the
child acquires the vocabulary of the language some of the natural correspondences between shape
and sound will be reinforced and others will be altered because they are not common in the
child’s language (see Nuckolls 1999; Smith and Sera 1992). The natural mappings may also have
influenced the evolution of languages, and may explain why adults are able to guess the meanings
of foreign words from remotely related languages at rates exceeding chance (Berlin 1994), with
even higher accuracy if the words match in meaning (Nygaard et al. 2009) or are chosen based on
sound symbolism (Imai et al. 2008).
The maluma/takete mapping has usually been attributed to the contrast between rounded and non-rounded vowels. A vowel is considered rounded or non-rounded based upon the shape of
the mouth and the lips when pronouncing it. For example, for the phoneme [o], as in ‘code’, the
mouth is wide open and the lips are rounded and slightly protruded, whereas for the phoneme [i],
as in the word ‘feet’, the mouth is partly closed and the corners of the lips are drawn back
(Dale 1976). In two recent studies, we assessed whether language-learning toddlers associate non-
sense words with rounded vowels to unfamiliar rounded shapes and nonsense words with non-
rounded vowels to unfamiliar angular shapes. This was tested using a game in which
English-speaking toddlers were presented with four pairs of nonsense words, in which one word
contained rounded vowels and the other word contained non-rounded vowels. Children were
asked to choose which of two unfamiliar shapes (one round and one angular) corresponded to
one of the words in each pair. The contrasting shapes were ones known to be optimal for selec-
tively stimulating neurons in cortical area V4, the area active during forms of synaesthesia
that involve sound (see Section 10.3.2). As predicted, toddlers, like the non-synaesthetic control
adults, associated the nonsense words that contained non-rounded vowels (e.g. tee-tay; tuh-kee-
tee, gee gee, dee-dee) to the jagged shapes and the nonsense words with rounded vowels
(e.g. go-gaa, maa-boo-maa, go-go, do-do) to the rounded shapes (Maurer et al. 2006; Spector and
Maurer, in preparation). In toddlers, there was no influence of contrasting stop versus approximant consonant sounds (e.g. bibi versus yiyi) on matching to jagged versus rounded shapes
(Spector and Maurer, in preparation). The unifying influence might well be pitch: pitch tends to
be higher when pronouncing non-rounded vowels compared to rounded vowels, and infants as
young as 3 months of age associate rising pitch to more pointed shapes (see Section 10.5.2). These
results suggest that sound/shape matching can influence vocabulary acquisition in toddlers.
Indeed, in the only study to investigate this possibility directly, Japanese-learning 3-year-olds
were better at generalizing the meaning of newly acquired verbs if they used sound symbolic
mimetics than if they did not (Imai et al. 2008).
We cannot rule out an experiential explanation for sound-shape mappings in toddlers, as tod-
dlers have had enough experience with words and the objects that they represent to pick up sta-
tistical regularities in English semantics. It is possible that words that have non-rounded vowels
tend to represent objects that are jagged (e.g. spiky) and that words that have rounded vowels
tend to represent objects with curved contours (e.g. round, amoeboid). Indeed, there are consist-
encies across languages in using words containing the non-rounded vowel i (as in feet) to name
objects or attributes that are smaller, brighter, closer, and/or associated with higher pitch (e.g.
tiny, mini) and in using words containing the rounded vowel o to name objects or attributes
that are larger, darker, farther away, and/or associated with lower pitch (e.g. whopping; e.g.
Day 2004; Nuckolls 1999; Tanz 1971). However, it is also possible that this effect represents a
naturally-biased association between the shape and sound of the phoneme, between shape and
the sight of the shape of the mouth when producing the sound, and/or between shape and the
feeling (amount of oral constriction) needed to produce the same sound oneself. These natural
associations may have influenced the evolution of language itself, as described above
(Ramachandran and Hubbard 2001), and in turn they may influence the language learning of the
individual child. Any or all of these associations could influence the language-learning child:
young infants look preferentially toward a face with the lip movements matching the sound they
are hearing or have just heard, initially even for foreign language contrasts (Pons et al. 2009;
see Chapter 9 by Soto-Faraco et al.) and monkey calls (Lewkowicz and Ghazanfar 2006; see
Chapter 7 by Lewkowicz and Chapter 16 by Ghazanfar), an effect that is modulated by the move-
ments they are making with their own mouth (Yeung and Werker 2010; see Ito et al. 2009 for
similar evidence from adults).

10.6 Summary
In sum, development involves a proliferation of connections among and within sensory cortical
areas followed by experience-dependent pruning that leads to more specialized sensory networks
that are further enhanced by re-entrant feedback. When the pruning or inhibitory feedback is less than normal, some of the early crossmodal and cross-dimensional connections remain to
influence conscious perception in the form of synaesthesia. In typical adults, remnants of these
connections persist that directly influence associations and indirectly affect behaviour. During
early childhood, however, these connections may have a stronger influence on the modalities and
dimensions the child perceives as connected and on the words they find easy to learn. They will
interact with the experience-dependent pruning driven by the child’s individual environment to
make some veridical associations easier to learn because the associations in the environment
match the intrinsic connections in the young brain.
Thus the young infant appears to begin development with the neural substrate for both cross-
modal and cross-dimensional synaesthetic associations, including both: (1) an equation of mag-
nitude (e.g. louder = brighter, brighter = longer), and (2) a systematic linking between seemingly
arbitrary attributes, the natural biases we referred to throughout this chapter. With development,
some of these correspondences will be reinforced by experience and, when they are, this reinforcement will lead to consolidation and refinement of the initial connections. These natural biases will influence the
ease of the child’s initial perceptual and language learning. At the same time, the child will learn
other correspondences from regularities in the environment (e.g. this voice timbre goes with
mother’s face; mice make squeaky sounds; roses have a particular smell), likely in part by the
interlinking of schemas as described by Piaget (1962). Natural biases that are not reinforced by
correspondences in the child’s environment will be largely pruned away, with the remainder
likely inhibited by re-entrant feedback. Nevertheless, remnants of the natural biases will remain
as unconscious influences on typical adult perception.
In those with synaesthesia, development takes a different course. Because of less-than-normal pruning and/or less-than-normal inhibition, more of the exuberant connections remain—
especially among contiguous sensory cortical areas—as well as within the parietal cortex in areas
binding stimulus properties together and within the frontal cortex. These connections include
ones underlying the natural biases observed in early development and ones that are idiosyncratic,
varying from individual to individual, but highly consistent over time. These connections are suf-
ficiently strong and widespread to induce conscious synaesthetic percepts and accompanying
cortical brain activity in the same areas activated by real world stimuli with the same attributes.
Something similar may happen in the typical infant before experience-dependent pruning and
the development of inhibition. Nevertheless, because many of the synaesthetic percepts link back
to the initial natural biases, typical adults demonstrate crossmodal influences of similar type. For
that reason, future research on synaesthesia has the potential to reveal additional natural
biases influencing the perception of all of us.

Acknowledgements
This work was supported by a grant from the Natural Sciences and Engineering Research Council
(9797) to Daphne Maurer.

References
Aghajanian, G.K., and Marek, G.J. (1999). Serotonin and hallucinogens. Neuropsychopharmacology, 21, 16s–23s.
Asher, J.E., Aitken, M.R.F., Farooqi, N., Kurmani, S., and Baron-Cohen, S. (2005). Diagnosing and
phenotyping visual synaesthesia: a preliminary evaluation of the revised test of genuineness (TOG-R).
Cortex, 42, 137–46.
Banissy, M., Walsh, V., and Ward, J. (2009). Enhanced sensory perception in synaesthesia. Experimental
Brain Research, 196, 565–71.
Barnett, K.J., Finucane, C., Asher, J.E., et al. (2008). Familial patterns and the origins of individual
differences in synaesthesia. Cognition, 106, 871–93.
Baron-Cohen, S., Harrison, J., Goldstein, L.H., and Wyke, M. (1993). Coloured speech perception: Is
synaesthesia what happens when modularity breaks down? Perception, 22, 419–26.
Baron-Cohen, S., Burt, L., Smith-Laittan, F., Harrison, J., and Bolton, P. (1996). Synaesthesia: prevalence
and familiality. Perception, 25, 1073–79.
Beeli, G., Esslen, M., and Jäncke, L. (2007). Frequency correlates in grapheme-color synaesthesia.
Psychological Science, 18, 788–92.
Berlin, B. (1994). Evidence for pervasive synesthetic sound symbolism in ethnozoological nomenclature. In
Sound symbolism (eds. L. Hinton, J. Nichols, and J. Ohala), pp. 76–93. Cambridge University Press,
New York.
Bourgeois, J.P., and Rakic, P. (1993). Changes of synaptic density in the primary visual cortex of the
macaque monkey from fetal to adult stage. The Journal of Neuroscience, 13, 2801–20.
Brang, D., Edwards, L., Ramachandran, V.S., and Coulson, S. (2008). Is the sky 2? Contextual priming in
grapheme-color synaesthesia. Psychological Science, 19, 421–28.
Bulkin, D.A., and Groh, J.M. (2006). Seeing sounds: visual and auditory interactions in the brain. Current
Opinion in Neurobiology, 16, 415–419.
Burkhalter, A. (1993). Development of forward and feedback connections between areas V1 and V2 of
human visual cortex. Cerebral Cortex, 3, 476–87.
Chugani, H.T. (1994). Development of regional brain glucose metabolism in relation to behavior and
plasticity. In Human behavior and the developing brain (eds. G. Dawson, and K. W. Fischer),
pp. 153–75. Guildford Press, New York.
Chugani, H.T., and Phelps, M.E. (1986). Maturational changes in cerebral function in infants determined
by 18FDG positron emission tomography. Science, 231, 840–43.
Chugani, H.T., Phelps, M.E., and Mazziotta, J.C. (1987). Positron emission tomography study of human
brain functional development. Annals of Neurology, 22, 487–97.
Cohen Kadosh, R., Henik, A., and Walsh, V. (2009). Synaesthesia: learned or lost? Developmental Science,
12, 484–91.
Cohen Kadosh, R., Henik, A., Catena, A., Walsh, V., and Fuentes, L.J. (2009). Induced cross-modal
synaesthetic experience without abnormal neuronal connections. Psychological Science, 20, 258–65.
Cytowic, R.E. (1989). Synesthesia: a union of the senses. Springer, New York.
Cytowic, R.E. (1997). Synaesthesia: Phenomenology and neuropsychology. A review of current knowledge.
In Synaesthesia: classic and contemporary readings (eds. S. Baron-Cohen, and J.E. Harrison), pp. 17–39.
Basil Blackwell, Oxford.
Dale, P.S. (1976). Language development, 2nd Ed. Holt Rinehart and Winston, New York.
Davis, R. (1961). The fitness of names to drawings: a crosscultural study in Tanganyika. British Journal of
Psychology, 52, 259–68.
Dehay, C., Bullier, J., and Kennedy, H. (1984). Transient projections from the fronto-parietal and temporal
cortex to areas 17, 18 and 19 in the kitten. Experimental Brain Research, 57, 208–212.
Dehay, C., Kennedy, H., and Bullier, J. (1988). Characterization of transient cortical projections from
auditory, somatosensory, and motor cortices to visual areas 17, 18, and 19, in the kitten. Journal of
Comparative Neurology, 272, 68–89.
Dixon, M.J., Smilek, D., Cudahy, C., and Merikle, P.M. (2000). Five plus two equals yellow. Nature, 406, 365.
Dixon, M.J., Smilek, D., and Merikle, P.M. (2004). Not all synaesthetes are created equal: projector versus
associator synaesthetes. Cognitive, Affective, and Behavioral Neuroscience, 4, 335–343.
Dobkins, K.R. (2011). Perceptual tuning across domains: Mechanisms and functions. Paper presented at
the Annual Meeting of the Society for Research in Child Development, April 2011, Montreal, Canada.
Dobkins, K.R., and Anderson, C.M. (2002). Color-based motion processing is stronger in infants than in
adults. Psychological Science, 13, 76–80.
Duffy, P.L. (2001). Blue cats and chartreuse kittens: how synesthetes color their worlds. Times Books, New York.
Eagleman, D.M., Kagan, A.D., Nelson, S.S., Sagaram, D., and Sarma, A.K. (2007). A standardized test
battery for the study of synesthesia. Journal of Neuroscience Methods, 159, 139–45.
Edquist, J., Rich, A.N., Brinkman, C., and Mattingley, J.B. (2006). Do synaesthetic colours act as unique
features in visual search? Cortex, 42, 222–31.
Elias, L.J., Saucier, D.M., Hardie, C., and Sarty, G.E. (2003). Dissociating semantic and perceptual
components of synaesthesia: behavioural and functional neuroanatomical investigations. Cognitive
Brain Research, 16, 232–37.
Esterman, M., Verstynen, T., Ivry, R.B., and Robertson, L.C. (2006). Coming unbound: disrupting
automatic integration of synesthetic color and graphemes by transcranial magnetic stimulation of the
right parietal lobe. Journal of Cognitive Neuroscience, 18, 1570–76.
Fair, D.A., Cohen, A.L., Power, J.D., et al. (2009). Functional brain networks develop from a ‘local to
distributed’ organization. PLoS Computational Biology, 5, e1000381.
Falchier, A., Clavagnier, S., Barone, P., and Kennedy, H. (2002). Anatomical evidence of multimodal
integration in primate striate cortex. The Journal of Neuroscience, 22, 5749–59.
Galton, F. (1881). Visualized numerals. The Journal of the Anthropological Institute of Great Britain and
Ireland, 10, 85–102.
Gebuis, T., Nijboer, T.C., and van der Smagt, M.J. (2009). Of colored numbers and numbered colors:
Interactive processes in grapheme-color synesthesia. Experimental Psychology, 56, 180–87.
Gibson, J.J. (1972). A theory of direct visual perception. In The psychology of knowing (eds. J.R. Royce and
W.W. Rozeboom), pp. 215–40. Gordon and Breach, New York.
Gray, J.A., Parslow, D.M., Brammer, M.J., et al. (2006). Evidence against functionalism from neuroimaging
of the alien colour effect in synaesthesia. Cortex, 42, 309–318.
Grossenbacher, P., and Lovelace, C.T. (2001). Mechanisms of synaesthesia: cognitive and physiological
constraints. Trends in Cognitive Sciences, 5, 36–41.
Hansen, T., Pracejus, L., and Gegenfurtner, K.R. (2009). Color perception in the intermediate periphery of
the visual field. Journal of Vision, 9, 1–12.
Hänggi, J., Beeli, G., Oechslin, M.S., and Jäncke, L. (2008). The multiple synaesthete E.S.: neuroanatomical
basis of interval-taste and tone-colour synaesthesia. Neuroimage, 43, 192–203.
Holland, M., and Wertheimer, M. (1964). Some physiognomic aspects of naming, or maluma and takete
revisited. Perceptual and Motor Skills, 19, 111–117.
Hubbard, E.M., Arman, A.C., Ramachandran, V.S., and Boynton, G.M. (2005). Individual differences
among grapheme-color synesthetes: brain-behavior correlations. Neuron, 45, 975–85.
Hubbard, E.M., and Ramachandran, V.S. (2005). Neurocognitive mechanisms of synesthesia. Neuron, 48,
509–20.
Hubbard, E.M., Manohar, S., and Ramachandran, V.S. (2006). Contrast affects the strength of synesthetic
colors. Cortex, 42, 184–94.
Huttenlocher, P. (1984). Synapse elimination and plasticity in developing human cerebral cortex. American
Journal of Mental Deficiency, 88, 488–96.
Imai, M., Kita, S., Nagumo, M., and Okada, H. (2008). Sound symbolism facilitates early verb learning.
Cognition, 109, 54–65.
Ito, T., Tiede, M., and Ostry, D.J. (2009). Somatosensory function in speech perception. Proceedings of the
National Academy of Sciences of the USA, 106, 1245–48.
Jäncke, L., Beeli, G., Eulig, C., and Hänggi, J. (2009). The neuroanatomy of grapheme-color synesthesia.
The European Journal of Neuroscience, 29, 1287–93.
Jarick, M., Dixon, M.J., Maxwell, E.C., Nicholls, M.E.R., and Smilek, D. (2009). The ups and downs (and
lefts and rights) of synaesthetic number forms: Validation from spatial cueing and SNARC-type tasks.
Cortex, 45, 1190–99.
Johnson, J., and Clydesdale, F.M. (1982). Perceived sweetness and redness in colored sucrose solutions.
Journal of Food Science, 47, 747–52.
Kennedy, H., Batardiere, A., Dehay, C., and Barone, P. (1997). Synaesthesia: implications for
developmental neurobiology. In Synaesthesia: classic and contemporary readings (eds. S Baron-Cohen,
and J. Harrison), pp. 243–56. Blackwell Publishers, Oxford.
Kim, C-Y., Blake, R., and Palmeri, T.J. (2006). Perceptual interaction between real and synesthetic colors.
Cortex, 42, 195–203.
Kohler, W. (1929). Gestalt psychology. Liveright, New York.
Laeng, B. (2009). Searching through synaesthetic colors. Attention, Perception, and Psychophysics, 71,
1461–1467.
Laeng, B., Svartdal, F., and Oelmann, H. (2004). Does color synesthesia pose a paradox for early-selection
theories of attention? Psychological Science, 15, 277–81.
Lewkowicz, D.J., and Ghazanfar, A.A. (2006). The decline of cross-species intersensory perception in
human infants. Proceedings of the National Academy of Sciences of the USA, 103, 6771–74.
Lewkowicz, D.J., and Turkewitz, G. (1980). Cross-modal equivalence in early infancy: auditory visual
intensity matching. Developmental Psychology, 16, 597–607.
Lindauer, M. (1990). The effects of the physiognomic stimuli takete and maluma on the meanings of
neutral stimuli. Bulletin of the Psychonomic Society, 28, 151–54.
Mann, H., Korzenko, J., Carriere, J.S., and Dixon, M.J. (2009). Time-space synaesthesia—a cognitive
advantage? Consciousness and Cognition, 18, 619–27.
Marks, L.E. (1974). On associations of light and sound: the mediation of brightness, pitch, and loudness.
American Journal of Psychology, 87, 173–88.
Marks, L.E. (1987). On cross-modal similarity: auditory-visual interactions in speeded discrimination.
Journal of Experimental Psychology: Human Perception and Performance, 13, 384–94.
Marks, L.E. (1996). On perceptual metaphors. Metaphor and Symbolic Activity, 11, 39–66.
Mattingley, J.B., Rich, A.N., Yelland, G., and Bradshaw, J.L. (2001). Unconscious priming eliminates
automatic binding of colour and alphanumeric form in synaesthesia. Nature, 410, 580–82.
Mattingley, J.B., Payne, J.M., and Rich, A.N. (2006). Attentional load attenuates synaesthetic priming
effects in grapheme-colour synaesthesia. Cortex, 42, 213–21.
Maurer, D., and Maurer, C. (1988). The world of the newborn. Basic Books, New York.
Maurer, D., Pathman, T., and Mondloch, C. (2006). The shape of boubas: Sound-shape correspondences in
toddlers and adults. Developmental Science, 9, 316–22.
Meier, B., and Rothen, N. (2007). When conditioned responses ‘fire back’: bidirectional cross-activation
creates learning opportunities in synesthesia. Neuroscience, 147, 569–72.
Meier, B., and Rothen, N. (2009). Training grapheme-colour associations produces a synaesthetic stroop
effect, but not a conditioned synaesthetic response. Neuropsychologia, 47, 1208–1211.
Melara, R.D., and O’Brien, T.P. (1987). Interaction between synesthetically corresponding dimensions.
Journal of Experimental Psychology, 116, 323–36.
Mills, C.B., Boteler, E.H., and Oliver, G.K. (1999). Digit synaesthesia: a case study using a Stroop-type test.
Cognitive Neuropsychology, 16, 181–91.
Mills, C.B., Viguers, M.L., Edelson, S.K., Thomas, A.T., Simon-Dack, S.L., and Innis, J.A. (2002). The color
of two alphabets for a multilingual synesthete. Perception, 31, 1371–94.
Mondloch, C., and Maurer, D. (2004). Do small white balls squeak? Pitch-object correspondences in young
children. Cognitive, Affective, and Behavioral Neuroscience, 4, 133–36.
Mroczko, A., Metzinger, T., Singer, W., and Nikolić, D. (2009). Immediate transfer of synesthesia to a novel
inducer. Journal of Vision, 9, 12–25.
Muggleton, N., Tsakanikos, E., Walsh, V., and Ward, J. (2007). Disruption of synaesthesia following TMS
of the right posterior parietal cortex. Neuropsychologia, 45, 1582–85.
Navon, D., and Miller, J. (1987). Role of outcome conflict in dual-task interference. Journal of Experimental
Psychology: Human Perception and Performance, 13, 435–48.
Neville, H. (1995). Developmental specificity in neurocognitive development in humans. In The Cognitive
Neurosciences (ed. M. Gazzaniga), pp. 219–31. Bradford, Cambridge, MA.
Nikolić, D., Lichti, P., and Singer, W. (2007). Color opponency in synaesthetic experiences. Psychological
Science, 18, 481–86.
Nuckolls, J. (1999). The case for sound symbolism. Annual Reviews of Anthropology, 28, 225–52.
Nunn, J.A., Gregory, L.J., Brammer, M., et al. (2002). Functional magnetic resonance imaging of
synesthesia: Activation of V4/V8 by spoken words. Nature Neuroscience, 5, 371–75.
Nygaard, L.C., Cook, A.E., and Namy, L.L. (2009). Sound to meaning correspondences facilitate word
learning. Cognition, 112, 181–86.
O’Boyle, M.W., and Tarte, R.D. (1980). Implications for phonetic symbolism: the relationship between
pure tones and geometric figures. Journal of Psycholinguistic Research, 9, 535–44.
Odgaard, E.C., Arieh, Y., and Marks, L.E. (2004). Brighter noise: sensory enhancement of perceived
loudness by concurrent visual stimulation. Cognitive, Affective, and Behavioral Neuroscience, 4, 127–32.
Palmeri, T.J., Blake, R., Marois, R., Flanery, M.A., and Whetsell Jr., W. (2002). The perceptual reality of
synesthetic colors. Proceedings of the National Academy of Sciences of the USA, 99, 4127–31.
Parise, C.V., and Spence, C. (2009). ‘When birds of a feather flock together’: synesthetic correspondences
modulate audiovisual integration in non-synesthetes. PLoS ONE, 4(5), e5664.
Pascual-Leone, A., and Hamilton, R. (2001). The metamodal organization of the brain. Progress in Brain
Research, 134, 1–19.
Paulesu, E., Harrison, J., Baron-Cohen, S., et al. (1995). The physiology of colour hearing: A PET activation
study of colour-word synaesthesia. Brain, 118, 661–76.
Piaget, J. (1962). The stages of the intellectual development of the child. Bulletin of the Menninger Clinic, 26,
120–8.
Pons, F., Lewkowicz, D.J., Soto-Faraco, S., and Sebastián-Gallés, N. (2009). Narrowing of intersensory speech
perception in infancy. Proceedings of the National Academy of Sciences of the USA, 106, 10598–10602.
Ramachandran, V.S., and Azoulai, A. (2006). Synesthetically induced colors evoke apparent-motion
perception. Perception, 35, 1557–60.
Ramachandran, V.S., and Hubbard, E.M. (2001). Synaesthesia: A window into perception, thought and
language. Journal of Consciousness Studies, 8, 3–34.
Rich, A.N., and Mattingley, J.B. (2003). The effects of stimulus competition and voluntary attention on
colour-graphemic synaesthesia. Neuroreport, 14, 1793–98.
Rich, A.N., Bradshaw, J.L., and Mattingley, J.B. (2005). A systematic, large-scale study of
synaesthesia: Implications for the role of early experience in lexical-colour associations. Cognition, 98,
53–84.
Rich, A.N., Williams, M.A., Puce, A., et al. (2006). Neural correlates of imagined and synaesthetic colours.
Neuropsychologia, 44, 2918–25.
Roffler, S.K., and Butler, R.A. (1968). Localization of tonal stimuli in the vertical plane. The Journal of the
Acoustical Society of America, 43, 1260–66.
Rothen, N., and Meier, B. (2009). Do synesthetes have a general advantage in visual search and episodic
memory? A case for group studies. PLoS ONE, 4, e5037.
Rothen, N., and Meier, B. (2010). Grapheme-colour synaesthesia yields an ordinary rather than
extraordinary memory advantage: Evidence from a group study. Memory, 18, 258–64.
Rouw, R., and Scholte, H.S. (2007). Increased structural connectivity in grapheme-color synesthesia.
Nature Neuroscience, 10, 792–97.
Rouw, R., and Scholte, S. (2010). Neural basis of individual differences in synesthetic experiences. The
Journal of Neuroscience, 30, 6205–6213.
Sagiv, N., Heer, J., and Robertson, L. (2006). Does binding of synesthetic color to the evoking grapheme
require attention? Cortex, 42, 232–42.
Sagiv, N., Simner, J., Collins, J., Butterworth, B., and Ward, J. (2006). What is the relationship between
synaesthesia and visuo-spatial number forms? Cognition, 101, 114–28.
Sapir, E. (1929). A study in phonetic symbolism. Journal of Experimental Psychology, 12, 225–39.
Schutz, M., and Lipscomb, S. (2007). Hearing gestures, seeing music: vision influences perceived tone
duration. Perception, 36, 888–97.
Simner, J. (2007). Beyond perception: synaesthesia as a psycholinguistic phenomenon. Trends in Cognitive
Sciences, 11, 23–29.
Simner, J., Ward, J., Lanz, M., et al. (2005). Non-random associations of graphemes to colours in
synaesthetic and non-synaesthetic populations. Cognitive Neuropsychology, 22, 1069–85.
Simner, J., Mulvenna, C., Sagiv, N., et al. (2006). Synaesthesia: the prevalence of atypical cross-modal
experiences. Perception, 35, 1024–33.
Simner, J., Harrold, J., Creed, H., Monro, L., and Foulkes, L. (2009a). Early detection of markers for
synaesthesia in childhood populations. Brain, 132, 57–64.
Simner, J., Mayo, N., and Spiller, M-J. (2009b). A foundation for savantism? Visuo-spatial synaesthetes
present with cognitive benefits. Cortex, 45, 1246–60.
Smilek, D., Dixon, M.J., Cudahy, C., and Merikle, P.M. (2001). Synaesthetic photisms influence visual
perception. Journal of Cognitive Neuroscience, 13, 930–36.
Smilek, D., Moffatt, B.A., Pasternak, J., et al. (2002). Synaesthesia: A case study of discordant monozygotic
twins. Neurocase, 8, 338–42.
Smilek, D., Dixon, M.J., and Merikle, P.M. (2003). Synaesthetic photisms guide attention. Brain and
Cognition, 53, 364–67.
Smilek, D., Dixon, M.J., and Merikle, P.M. (2005). Synaesthesia: discordant male monozygotic twins.
Neurocase, 11, 363–70.
Smilek, D., Callejas, A., Dixon, M.J., and Merikle, P.M. (2007a). Ovals of time: time-space associations in
synaesthesia. Consciousness and Cognition, 16, 507–519.
Smilek, D., Carriere, J.S., Dixon, M.J., and Merikle, P.M. (2007b). Grapheme frequency and color
luminance in grapheme-color synaesthesia. Psychological Science, 18, 793–95.
Smith, L.B., and Sera, M.D. (1992). A developmental analysis of the polar structure of dimensions.
Cognitive Psychology, 24, 99–142.
Spector, F., and Maurer, D. (2008). The colour of Os: Naturally-biased associations between shape and
colour. Perception, 37, 841–47.
Spector, F., and Maurer, D. (2009a). The development of colour-grapheme synaesthesia. Poster presented
at the Annual Meeting of the Society for Research in Child Development, April 2009, Denver, US.
Spector, F., and Maurer, D. (2009b). Synesthesia: a new approach to understanding the development of
perception. Developmental Psychology, 45, 175–89.
Spector, F., and Maurer, D. (2011). The colors of the alphabet: naturally-biased associations between
shape and color. Journal of Experimental Psychology: Human Perception and Performance,
37, 484–95.
Spector, F., and Maurer, D. (in preparation). Early sound symbolism for vowel but not consonant sounds.
Sperling, J.M., Prvulovic, D., Linden, D.E.J., Singer, W., and Stirn, A. (2006). Neuronal correlates of
colour-graphemic synaesthesia: a fMRI study. Cortex, 42, 295–303.
Stein, B.E., London, N., Wilkinson, L.K., and Price, D.D. (1996). Enhancement of perceived visual intensity
by auditory stimuli: a psychophysical analysis. Journal of Cognitive Neuroscience, 8, 497–506.
Stevens, M.S., Hansen, P.C., and Blakemore, C. (2006). Activation of color-selective areas of the visual
cortex in a blind synesthete. Cortex, 42, 304–308.
Stroop, J.R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18,
643–62.
Supekar, K., Musen, M., and Menon, V. (2009). Development of large-scale functional brain networks in
children. PLoS Biology, 7, e1000157.
Tanz, C. (1971). Sound symbolism in words relating to proximity and distance. Language and Speech, 14,
266–76.
Tzourio-Mazoyer, N., De Schonen, S., Crivello, F., et al. (2002). Neural correlates of woman face processing by
2-month-old infants. Neuroimage, 15, 454–61.
Walker, P., Bremner, J.G., Mason, U., et al. (2010). Preverbal infants’ sensitivity to synaesthetic cross-modality
correspondences. Psychological Science, 21, 21–25.
Ward, J., Jonas, C., Dienes, Z., and Seth, A. (2010). Grapheme-colour synaesthesia improves detection of
embedded shapes, but without pre-attentive ‘pop-out’ of synaesthetic colour. Proceedings of the Royal
Society B: Biological Sciences, 277, 1021–26.
Ward, J., and Simner, J. (2003). Lexical-gustatory synaesthesia: linguistic and conceptual factors. Cognition,
89, 237–61.
Ward, J., and Simner, J. (2005). Is synaesthesia an x-linked dominant trait with lethality in males?
Perception, 34, 611–23.
Ward, J., Huckstep, B., and Tsakanikos, E. (2006). Sound-colour synaesthesia: To what extent does it use
cross-modal mechanisms common to us all? Cortex, 42, 264–80.
Ward, J., Thompson-Lake, D., Ely, R., and Kaminski, F. (2008). Synaesthesia, creativity, and art: what is the
link? British Journal of Psychology, 99, 127–41.
Ward, J., Sagiv, N., and Butterworth, B. (2009). The impact of visuo-spatial number forms on simple
arithmetic. Cortex, 45, 1261–65.
Weiss, P.H., and Fink, G.R. (2009). Grapheme-colour synaesthetes show increased grey matter volumes of
parietal and fusiform cortex. Brain, 132, 65–70.
Witthoft, N., and Winawer, J. (2006). Synesthetic colors determined by having colored refrigerator magnets
in childhood. Cortex, 42, 175–83.
Wolff, P.H., Matsumiya, Y., Abroms, I.F., van Velzer, C., and Lombroso, C.T. (1974). The effect of white
noise on the somatosensory evoked response in sleeping newborn infants. Electroencephalography and
Clinical Neurophysiology, 37, 269–74.
Yaka, R., Yinon, U., and Wollberg, Z. (1999). Auditory activation of cortical visual areas in cats after early
visual deprivation. The European Journal of Neuroscience, 11, 1301–1312.
Yaro, C., and Ward, J. (2007). Searching for Shereshevskii: what is superior about the memory of
synaesthetes? Quarterly Journal of Experimental Psychology, 60, 681–95.
Yeung, H., and Werker, J. (2010). Sensorimotor influences in infant perception: how lip rounding affects
4-month-olds’ perception of talking faces. Poster presented at the International Conference on Infant
Studies, March 2010, Baltimore, US.
Zellner, D.A., and Kautz, M.A. (1990). Color affects perceived odor intensity. Journal of Experimental
Psychology: Human Perception and Performance, 16, 391–97.
Chapter 11

Multisensory processes in old age


Paul J. Laurienti and Christina E. Hugenschmidt

11.1 Introduction
Typically when we think of development, we consider changes that begin at conception and
culminate in adulthood. However, development is an ongoing process that continues until death, and
non-pathological changes occur even in old age. Neuroscience dogma used to hold that remodelling
in the adult and aging brain was due solely to loss of neurons and connections. It is now well
established that older brains do retain plastic capacity. This plasticity not only results in typical
age-related variation, but allows for the use of interventions to direct neural changes.
Interactions between the senses are governed by the physical aspects of the stimuli such as tem-
poral, spatial, and intensity properties. Higher-order cognitive processes also modulate the inter-
actions between sensory modalities. Examples include the attentional state of the brain when the
stimuli are presented, or the congruency of the stimuli. Sensory perception and higher-order
processes are not static experiences in the brain; both evolve over the lifespan of the individual.
It is known that even healthy aging people experience diminished sensory acuity in all five senses
and have subtle alterations in cognitive processes like attention and memory. Given what we
know about multisensory interactions, these common age-related changes have the potential to
alter the probability and/or magnitude of integration in aging adults. This chapter will focus on
the interplay between multisensory processing and the age-related cognitive and sensory changes
that commonly occur in late development.

11.2 Evidence for enhanced multisensory integration in the elderly
The growing body of literature evaluating the integration of information in older adults indicates
that the gain associated with multisensory integration is increased in older adults (Laurienti et al.
2006b; Peiffer et al. 2007; Strupp et al. 1999; although see Stephen et al. 2010). The effects of aging on
the likelihood of integrating sensory cues are only just beginning to be explored. As detailed below,
the evidence suggests that while older adults gain more from multisensory integration, they are
actually less likely to integrate sensory cues from different modalities.

11.2.1 Aging increases the magnitude of multisensory integration


One of the first examples of age altering the interactions between the senses was reported by
Strupp and colleagues (Strupp et al. 1999). Previous research had shown that stimulation of neck
muscles can cause the illusion of head movement, which in turn causes the perception of lateral
motion of visual stimuli. In Strupp et al.’s experiment, participants were seated in the dark and
used a laser to indicate the location they perceived as straight ahead while they experienced either
neck muscle vibration or no neck muscle vibration. Participants ranged in age from 20 to 81 years
252 MULTISENSORY PROCESSES IN OLD AGE

and significantly larger visual displacements due to neck muscle vibration were observed with
advancing age. This outcome supports enhanced integration in older adults, but the possibility of
an altered bias, with a greater reliance on vibrotactile perception, could not be ruled out.
Studies attempting to isolate multisensory integration effects from other cognitive processes
often use what has become known as the race model (see Chapter 14 by Wallace et al.) to analyze
data. Multisensory tasks typically compare responses made to each of the unisensory stimuli to
responses made to a combination stimulus made from simultaneous presentation of the unisen-
sory stimuli. Since multisensory trials inherently have two stimuli but unisensory trials only have
one stimulus, it is important to evaluate the multisensory trials against a model of the unisensory
trials (Miller 1982, 1986). The race model compares the distribution of response times for the
multisensory condition to a summed distribution from the two unisensory trials (Fig. 11.1; see
Chapter 14 by Wallace et al., and Miller 1982). Using this method, a positive difference between
the multisensory response distribution and the race model is indicative of integration.
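The race-model comparison can be sketched in a few lines of code. The sketch below is illustrative only, not the analysis code used in the studies discussed here; all function names and the toy response times are invented for demonstration.

```python
# Illustrative sketch of the race-model test (Miller 1982). The toy
# response times below are invented for demonstration purposes.

def cdf(rts, t):
    """Empirical probability that a response has been made by time t."""
    return sum(1 for rt in rts if rt <= t) / len(rts)

def race_model(p_a, p_v):
    """Predicted multisensory CDF under independently racing unisensory
    processes: (pAuditory + pVisual) - (pAuditory x pVisual)."""
    return p_a + p_v - p_a * p_v

def violation(aud_rts, vis_rts, multi_rts, t):
    """Positive values indicate that observed multisensory responses are
    faster than the race model predicts, i.e. integration."""
    return cdf(multi_rts, t) - race_model(cdf(aud_rts, t), cdf(vis_rts, t))

aud = [420, 450, 480, 510, 560]    # unisensory auditory RTs (ms)
vis = [400, 440, 470, 520, 580]    # unisensory visual RTs (ms)
multi = [330, 350, 380, 400, 430]  # multisensory RTs (ms)

print(violation(aud, vis, multi, 400) > 0)  # True: integration at 400 ms
```

In practice the difference between the multisensory CDF and the race model is computed across the full range of response times, yielding a difference curve like that shown in panel (B) of Fig. 11.1.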
Experiments using reaction time (RT) measures and the race model to evaluate multisensory
integration have directly demonstrated increases in the magnitude of multisensory interactions in
older adults. For instance, Peiffer and colleagues showed an enhancement in the magnitude of
integration when they evaluated integration in younger (18–35 years old) and older (65–90 years
old) adults using a simple response time task (Peiffer et al. 2007). In this task, participants were
seated in a light- and sound-attenuated testing booth and viewed a fixation point (a light emitting
diode; LED) located at eye level straight ahead. Participants were instructed to press a button on
a response box whenever they detected a visual stimulus, an auditory stimulus, or the simultane-
ous presentation of both stimuli (the multisensory stimulus). The trials were presented in ran-
dom order in one 6-minute block with the onset time of trials randomly jittered. No cues were
presented at any point in the task.
This paradigm had RT as the primary variable of interest and all the stimuli were supra-thresh-
old. The purpose of this paradigm was to try to present a task so simple that response times to
unisensory trials would be equivalent in older and younger adults. If multisensory gains were
larger in the older adults, this would mean that older adults would actually be responding faster
than younger adults on multisensory trials.
As expected, in this simple detection task both younger and older adults exhibited speeded
responses when a multisensory stimulus was presented relative to unisensory trials. This speeding
surpassed that predicted by the race model for both populations. However, the magnitude of
integration was larger in the older adults than the younger adults (Fig. 11.2), meaning that older
adults were speeded more than younger adults by the presence of crossmodal information. Since
unisensory response times were comparable between older and younger adults, the enhanced
integration in older adults resulted in the older population actually responding faster than the
younger population on the multisensory trials. In addition, the integration peak for older adults
is broader than that of younger adults, indicating that integration was occurring over a larger
range of response times. Similar increases in the magnitude of integration in older adults have
been observed in studies using choice discriminations (Laurienti et al. 2006a) as well as saccadic
eye movements (Diederich et al. 2008).
One concern in any multisensory task is the possibility that speeding could be accounted for by
other factors, such as a criterion shift. A criterion shift occurs when the participant’s threshold for
response is changed during a task, and such effects have been observed in multisensory detection
tasks. For instance, a participant may lower the luminance at which they respond to a visual target
when they are expecting a visual target, resulting in faster RTs but also lower accuracy, as the
participant is more likely to respond when the target is absent. Although the design of Peiffer and
colleague’s study precluded the use of catch trials, a criterion shift is an unlikely explanation for

[Figure graphic: panel (A) plots response probability (0–100%) against response time (250–1600 ms) for the auditory, visual, and multisensory conditions and for the race model; panel (B) plots the probability difference against response time (ms).]
Fig. 11.1 Depiction of the race model. (A) Cumulative distribution functions (CDF) for auditory,
visual, and multisensory response times are shown on the left. The CDFs represent the probability
that a person has responded by a given time. For example, approximately 50% of all responses can
be expected to be made to the multisensory stimulus within 500 ms of stimulus onset. The
independent race model represents the summed probabilities of the unisensory conditions minus
the intersection ((pAuditory + pVisual) − (pAuditory × pVisual)). The race model for this example is
shifted to the right of the multisensory condition indicating that the multisensory trials were faster
than predicted by the summed probability of their unisensory trials. (B) A difference curve generated
by subtracting the race model from the multisensory CDF. Positive deflections indicate multisensory
integration. (Reprinted from Neurobiology of Aging, 27 (8), Paul J. Laurienti, Jonathan H. Burdette,
Joseph A. Maldjian, and Mark T. Wallace, Enhanced multisensory integration in older adults,
pp. 1155–63, Copyright (2006), with permission from Elsevier.)

the effects observed. Accuracy was quite high on all trials, with responses made to more than 98% of
stimuli for both younger and older adults. Furthermore, because the trials were randomly inter-
mixed rather than blocked, no cues were used, and the onset time of trials was randomized, partici-
pants were not able to anticipate when a trial would occur or what type of trial would be presented.
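The notion of a criterion shift can be made concrete with standard signal-detection measures, which separate sensitivity (d′) from response criterion (c): a more liberal criterion raises the hit rate and the false-alarm rate together while leaving sensitivity largely unchanged. The sketch below is purely illustrative; the hit and false-alarm rates are invented, not taken from any of the studies discussed.

```python
# Standard signal-detection measures computed from hit and false-alarm
# rates (the rates below are invented for illustration).
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse of the standard normal CDF

def d_prime(hit_rate, fa_rate):
    """Sensitivity: how well signal and noise are discriminated."""
    return z(hit_rate) - z(fa_rate)

def criterion(hit_rate, fa_rate):
    """Response criterion c; more negative means a more liberal bias."""
    return -0.5 * (z(hit_rate) + z(fa_rate))

# A criterion shift moves hits and false alarms together while leaving
# sensitivity roughly constant:
print(round(d_prime(0.90, 0.10), 2), round(criterion(0.90, 0.10), 2))
print(round(d_prime(0.975, 0.30), 2), round(criterion(0.975, 0.30), 2))
```

In the second case the observer responds far more readily (c ≈ −0.72 rather than 0) with almost no change in d′ — faster, more frequent responses purchased at the price of more responses to absent targets, which is precisely the pattern the high accuracy in Peiffer et al.'s data argues against.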

[Figure graphic: two panels plotting the probability difference against response time (ms) for young and older adults.]
Fig. 11.2 Multisensory difference time-course comparing integration in young and older adults. The
figure on the left taken from Peiffer et al. (2007) shows that older adults exhibit enhanced integration
on a simple response-time task. Note that the older adults integrate over a broader range of response
times but the peaks of the distributions are matched. (Reproduced from Peiffer, A.M., Mozolic, J.L.,
Hugenschmidt, C.E., and Laurienti, P.J., Age-related multisensory enhancement in a simple
audiovisual detection task. Neuroreport, 18, pp. 1077–1081 © 2007, Wolters Kluwer Health.) The
figure on the right was taken from Laurienti et al. (2006) and shows enhanced integration in older
adults (65–90 years old) compared to younger adults (18–38 years old) during performance of a task
that required participants to discriminate between the colours red and blue. The visual stimulus was a
coloured disk presented on a computer screen and the auditory stimulus was the pronunciation of
the words ‘red’ and ‘blue.’ These data also show integration over a broader range of response times
in the older adults but the peak is shifted to the right. This right shift in the peak of integration is
likely due to an overall slowing of the older participants that is not compensated for by speeding of
responses during multisensory integration. (Reprinted from Neurobiology of Aging, 27 (8), Paul J.
Laurienti, Jonathan H. Burdette, Joseph A. Maldjian, and Mark T. Wallace, Enhanced multisensory
integration in older adults, pp. 1155–63, Copyright (2006), with permission from Elsevier.)

While a participant’s attention might vary across the task, this would not present a systematic
bias on any one trial type. Another concern in comparing reaction times for younger and older
adults on a cognitive task is the effects of general cognitive slowing, discussed in more detail in
Section 11.5.1 below. This task was specifically designed to minimize the effects of general cogni-
tive slowing. Overall, the enhanced speeding observed in older adults in this task was likely due to
multisensory integration.

11.2.2 Aging may decrease the probability of integration


A recent study by Diederich et al. (2008) utilized a model called the time window of integration
(Colonius et al. 2009; Diederich and Colonius 2009) to evaluate multisensory integration in older
adults. In this study, participants were instructed to make an eye movement to the appearance of a
visual target either to the left or right of fixation. The participants were further instructed to ignore
any sounds, which were presented either on the same side or the opposite side as the visual target.
The window of integration in the Colonius and Diederich model refers to the time period over
which two stimuli can be integrated. The first stimulus to arrive ‘opens a window’ to allow integra-
tion. The second stimulus must arrive within that time window. By shifting the stimulus onset
asynchrony (SOA) they were able to measure the duration that the window was open in younger
(20–22 years old) and older (65–75 years old) adults. Their study demonstrated that older adults
had an integration window of 450 ms in duration, while the younger participants had a window of 275 ms.
The fact that older adults have a larger window of integration suggests that stimuli can be separated
in time by a greater extent than in younger participants and still be integrated.
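The window-of-integration logic can be caricatured in simulation. The sketch below is a deliberate simplification of the Colonius–Diederich model, which specifies the first-stage processing-time distributions and decision stages formally; the exponential stage times and all parameter values here are invented purely for illustration.

```python
import random

# Simplified caricature of the time-window-of-integration idea: the first
# stimulus to complete peripheral processing opens a window; integration
# occurs only if the other stimulus completes within it. The exponential
# stage times and parameter values are invented for illustration.

def p_integration(window_ms, soa_ms, mean_vis=80.0, mean_aud=60.0,
                  n=50_000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        t_vis = rng.expovariate(1 / mean_vis)           # visual onset at 0 ms
        t_aud = soa_ms + rng.expovariate(1 / mean_aud)  # auditory at the SOA
        if abs(t_vis - t_aud) <= window_ms:
            hits += 1
    return hits / n

# Holding processing speed fixed, a wider window (450 vs. 275 ms) can only
# increase the probability that the two stimuli are integrated:
print(p_integration(275, soa_ms=100) < p_integration(450, soa_ms=100))  # True
```

Diederich et al.'s key observation maps onto this caricature as larger mean stage times in older adults: sufficiently slowed early processing can lower the probability of integration even though the window itself is wider.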

However, a critical finding of this study was that older adults exhibited a significant slowing in
early sensory processing, and this slowing actually resulted in a lower probability of integration in
spite of the larger window of integration. Thus, at least under certain circumstances, older adults had
a lower probability of integrating simultaneous visual-auditory stimuli than younger adults. However,
when integration did occur, the older adults gained more from the stimulus combination than their
younger counterparts. This finding of enhanced multisensory benefit is consistent with other studies
showing enhanced integration in older adults (Laurienti et al. 2006a; Peiffer et al. 2007).
These data imply that the increased window of integration cannot account for the enhanced
magnitude of integration. Although the Peiffer et al. study shown in Fig. 11.2 did not manipulate
the SOA, Colonius and Diederich suggested that the expanded window of integration in older
adults may explain the observation of integration over a wider response time period. Such an
explanation is further supported by a study of visuotactile integration in older adults (Poliakoff
et al. 2006). The duration of the window of integration has significant implications. If integration
occurred only for stimuli that were simultaneous then the window of integration would be zero
and even slight delays in processing would abolish integration. If the window is pathologically
widened, one could begin to integrate stimuli that occurred at vastly different times. Events that
occur in close proximity to the individual would not likely emit stimuli that would be separated
in time by 400 ms. Thus integration of such stimuli would be detrimental to sensory processing
as they most likely come from different sources.

11.2.3 The impact of altered multisensory integration in the aged


The functional implications of altered multisensory integration in the elderly have not been fully
evaluated. At first blush one may think that an enhanced magnitude of integration would be ben-
eficial for older adults. This very idea has been exploited to improve balance and reduce falls (Maki
et al. 2008) and to help prevent automobile accidents in older adults (Ho et al. 2007; Spence and
Ho 2008). However, the alterations of integration in older adults also likely result in behavioural
decrements due to the integration of cues arising from separate physical events and due to distrac-
tions across sensory modalities. In a laboratory setting it is possible to have fine control over
stimulus conditions, and appropriate pairing of simultaneous stimuli carrying complementary
information can be achieved. An example of a complementary pairing would be the presentation
of a picture of a cow with the sound of a cow mooing. In such a circumstance it is likely that older
adults will, in fact, benefit more from the multisensory condition than will younger adults.
While one can imagine many situations where enhanced integration would be beneficial, there
are many situations where it could disrupt performance. In the natural world, our sensory sys-
tems are stimulated continuously by environmental energies. Much of the information arriving
at our sensors at any given time is not emitted from the same source and can convey non-match-
ing or even conflicting information. In a laboratory experiment the presentation of a picture of a
cow with the sound of a bird chirping would be an example. Such situations may result in the
inappropriate integration of simultaneously arriving information, producing behavioural decre-
ments. It has previously been shown that non-matching crossmodal stimuli result in poorer
behavioural performance (Laurienti et al. 2004; Spence et al. 2004) and increased integration in
the elderly could exacerbate such decrements.

11.3 Neural mechanisms of altered multisensory integration in older adults
The neural mechanisms that underlie age-related changes in multisensory integration remain
unknown but can be evaluated from two main perspectives. ‘Bottom-up’ modulation occurs
through changes in stimulus properties, while ‘top-down’ modulation occurs through cognitive
processes such as attention. This section will highlight how stimulus properties and cognitive
control can change the magnitude and/or probability of integration. Each of these will be related
back to the aging brain and considered as factors responsible for the increases in multisensory
integration observed in older adults.

11.3.1 Bottom-up modulation of multisensory integration


The principal stimulus factors that alter multisensory integration have been studied from the level
of the single neuron to the level of the functioning individual. This issue is detailed in Chapter 14
by Wallace et al. (for a review see Stein and Stanford 2008). Most commonly identified are the
timing, location, and effectiveness of the stimuli. If a single environmental event occurs in close
proximity to an individual and emits energies that stimulate multiple sensory modalities, the
energies will arrive at an individual’s sensors nearly simultaneously (slight error may occur due to
differences in travel time such as the speed of sound and speed of light). The further away the
event, the greater the disparity in arrival time. The timing of visual-auditory stimuli is critical for
the probability of integration (Meredith et al. 1987). As the separation in time between two sen-
sory stimuli increases, the probability of integration decreases. Previous research has evaluated
the degree of separation that can exist while still achieving integration (Meredith and Stein 1986a,
1996; Van Wanrooij et al. 2009). In general, as one stimulus precedes the other by more than 200 ms
the probability of integration drops precipitously. In humans it has been clearly demonstrated
that the integration of speech signals is dependent on temporal synchrony. For example, illusory
percepts induced by the McGurk effect are most robust when the visual and auditory stimuli fall
within a 200 ms window (van Wassenhove et al. 2007). As discussed above, the temporal window
of integration seems to be widened in older adults, although Colonius and Diederich’s data suggest
that this is not the basis for observed increases in the magnitude of multisensory integration in the
elderly.
As with temporal disparity, increasing the spatial disparity between stimuli decreases the likeli-
hood of integration in certain tasks (Meredith and Stein 1986a; Stein and Meredith 1993),
although spatial effects are not present in all situations (Jones and Jarick 2006). The effects of
space on multisensory integration intuitively make sense, as energies being emitted from a single
source should originate from the same location. With diminished sensory acuity, older adults
may be more prone to integrate information coming from a wider spatial area. The integration of
information that is spatially disparate would increase the likelihood of combining sensory stimuli
from distinct environmental events. It is important to note that integration of spatially disparate
information could lead to improved detection in the laboratory but would likely lead to behav-
ioural degradation in the real world.
To date, most studies have evaluated multisensory integration in older adults using spatially
coincident stimuli. In a study of crossmodal attention, Poliakoff and colleagues (Poliakoff
et al. 2006) compared the effects of crossmodal distractors on RTs in younger and older adults.
In this study, visual targets were presented with tactile distractors or tactile targets were presented
with visual distractors. There were four potential spatial locations for stimuli: top right,
bottom right, top left, and bottom left. The distractors could occur on the same side or the oppo-
site side from the target (e.g. right or left), and they could be congruent or incongruent (e.g. top
or bottom).
There was a main effect of spatial congruency for all age groups for both target conditions. That
is, speeding of response times was observed when the distractor and target were both presented
on the top or both on the bottom for all age groups. There was also an effect of side in all age
groups for both visual and tactile targets, where responses were speeded to distractors that
occurred on the same side as the target. In other words, the fastest response times in all age groups
were on congruent same-side trials. However, only the young adults showed an interaction
between congruency and side in the visual task, and the oldest group did not show an interaction
between congruency and side in either task. Poliakoff et al.’s study did not directly examine mul-
tisensory integration, but these results suggest that aging may differentially affect the spatial
property of multisensory integration, leaving the door open for future investigations. If it is dem-
onstrated that older adults integrate over a larger spatial area, this may provide an opportunity for
developing interventions related to spatial attention.
The effectiveness of individual stimuli modulates the magnitude of integration, with less effec-
tive stimuli resulting in greater multisensory gains at the cellular level (Meredith and Stein 1986b;
Stein and Meredith 1993), though there is discussion in the literature about the generalizability of
this observation to human behavioural performance (Ma et al. 2009; Ross et al. 2007). This phe-
nomenon is known as inverse effectiveness. As with the spatial and temporal factors modulating
integration, inverse effectiveness has an intuitive interpretation. If multisensory integration is a
process by which an individual extracts greater information from the environment than can be
achieved from unisensory processing alone, integration would be most valuable when sensory
signals are vague. For example, if one is walking down a dark alley and hears a faint noise, the
addition of a visual cue could considerably help with the identification of the environmental
event of interest. However, when walking across a street the sound of a blowing horn gives an
unequivocal signal that it is important to move out of the way. Thus one has more to gain when
individual components of a multisensory stimulus are ambiguous and, in fact, that is what is
observed from the level of the neuron to the behaving human (see Holmes 2009 and Senkowski
et al. 2008, however).
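In the single-neuron literature, inverse effectiveness is typically quantified by expressing the multisensory response as a percentage gain over the best unisensory response. A minimal sketch of that index follows; the response magnitudes are invented for illustration.

```python
# Multisensory enhancement expressed as percentage gain over the best
# unisensory response, as in the single-neuron literature. The response
# magnitudes below are invented for illustration.

def enhancement(multi, uni_a, uni_b):
    best_uni = max(uni_a, uni_b)
    return 100 * (multi - best_uni) / best_uni

# Inverse effectiveness: the same two-unit absolute gain is proportionally
# far larger when the unisensory responses are weak.
print(enhancement(12, 10, 8))  # 20.0  (strong unisensory inputs)
print(enhancement(5, 3, 2))    # ~66.7 (weak unisensory inputs)
```

The index makes the intuition explicit: as unisensory effectiveness falls, an identical absolute multisensory gain translates into a much larger proportional enhancement.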
It is well known that older adults have increased sensory thresholds, poorer sensory detection
(Enrietto et al. 1999; Gates et al. 1990; Helfer 1998; R.W. Li et al. 2000; Spear 1993) and sensory
deficits that occur across all sensory modalities (Baloh et al. 1993; Enrietto et al. 1999; Hirvonen
et al. 1997; Hoffman et al. 1998; Kaneda et al. 2000; Lopez et al. 1997; Nusbaum 1999; Wayler et al.
1990; Whipple et al. 1993; Yousem et al. 1999). The fact that individual stimuli are less effective
in older adults suggests that their integration may be increased in magnitude due to inverse
effectiveness.
Recent work by James and colleagues (Kim and James 2010; Stevenson et al. 2009; Stevenson
and James 2009) has used functional brain imaging to demonstrate that multisensory brain
regions are particularly sensitive to changes in stimulus effectiveness in young adults. It would be
very interesting to determine if the magnitude of the neural response changes when stimulus
effectiveness is manipulated in older adults. However, the actual experiments that manipulate
stimulus efficacy to evaluate the role of inverse effectiveness in older adults have not been com-
pleted. Furthermore, it is important to note that diminished sensory processing in one modality
(say vision) could have very different implications than diminished processing in another modal-
ity (say audition) or even both modalities simultaneously. This is an issue that is open for future
studies.

11.3.2 Top-down modulation of multisensory integration


Bottom-up processes are important in governing multisensory interactions, but stimuli can be
congruent in space and time and not be combined. One reason for this is the role of cognitive
processes, particularly attention, in regulating multisensory integration. Studies of modality-
specific selective attention are critical for understanding the processing of multisensory integra-
tion because alterations in processing of any one sensory modality can influence the combination
of information with other sensory modalities. Below is an introduction to crossmodal attention,
followed by a review of current aging research in the realm of crossmodal attention.

Modality-specific attention enhances processing in the attended sensory modality relative to
unattended senses. Behavioural and imaging research indicates that this takes place primarily by
suppressing processing of sensory information in competing modalities rather than enhancing
neural activity in the attended modality. In pioneering work related to modality-specific attention,
Spence and Driver demonstrated that there is little or no behavioural enhancement in processing
information in an attended modality compared to a divided attention state (Spence and Driver
1997; Spence et al. 2001). The effects of modality-specific attention were almost exclusively borne
out through suppression of processing in the non-attended or ignored sensory modalities.
The experimental design of the basic experiment was as follows. Participants were seated in a
dark room focusing on an LED straight ahead. Target LEDs and speakers were located in the four
quadrants surrounding the fixation point. LEDs flanking the fixation cued the participant to
either attend to vision, audition, or to divide their attention between the two modalities. Following
the cue, a target was presented in one of the four quadrants and the participants had to indicate
as rapidly as possible on which side of fixation the target appeared. On the selective attention tri-
als, approximately 80% of the cues were correct, e.g. a visual cue was followed by a visual target.
However on the other 20% of the selective attention trials the cue was wrong. For example,
the participant was cued to attend to vision, but the target was auditory. Regardless of the cue,
the participant was to respond to the location of the target. The effects of the attentional cue were
evaluated by comparing response times on selective attention trials to those on divided attention
trials. When response times were speeded on correctly cued trials (relative to divided attention
trials), this was interpreted as a ‘benefit’ of selective attention. The ‘cost’ of selective attention was
calculated as the slowing that occurred on incorrectly cued trials (again relative to divided atten-
tion). The cost and benefit were also added to determine a total gain in response time due to
crossmodal attention.
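The cost/benefit arithmetic just described is simple enough to state directly. In the sketch below the millisecond values are invented; only the structure of the comparisons against the divided-attention baseline follows the design described above.

```python
# Cost and benefit of modality-specific attention, each computed against
# the divided-attention baseline. The RT values are invented examples.

def attention_effects(rt_valid, rt_invalid, rt_divided):
    benefit = rt_divided - rt_valid   # speeding after a correct cue
    cost = rt_invalid - rt_divided    # slowing after an incorrect cue
    return benefit, cost, benefit + cost

benefit, cost, total = attention_effects(rt_valid=395, rt_invalid=450,
                                         rt_divided=400)
print(benefit, cost, total)  # 5 50 55 -> small benefit, larger cost
```

The example values mirror the qualitative pattern Spence and Driver reported: a modest benefit of a valid cue and a substantially larger cost of an invalid one.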
The results demonstrated that when processing stimuli in the cued modality, there was only
a small benefit in the form of speeded RTs relative to the divided attention. However, when the
stimuli followed an invalid cue, the RTs were significantly slower than the divided attention
conditions. In their work, Spence and Driver utilized meticulous study designs to ensure that the
effects being observed were truly due to modality-specific attention and not other factors such as
feature or space-based attention. Having controlled for possible confounding factors, they were
able to conclude that modality-specific selective attention results in processing costs in the ignored
sensory modalities.
The neural correlates of modality-specific selective attention examined by multiple research
teams using functional brain imaging support these behavioural observations. The consistent
finding in healthy younger adults is that attention to one sensory modality results in suppression
or deactivation of activity in cortical areas that process information for other sensory systems.
Early crossmodal studies revealed that passive sensory stimulation produced suppressive responses
in non-stimulated sensory cortical areas (Haxby et al. 1994; Kawashima et al. 1995; Laurienti et al.
2001). While these studies did not evaluate attention directly, the thought was that attending to
the stimulus likely contributed to the suppression observed. More recently, it has been shown that
when an individual shifts attention from one sensory modality to another, there are small
enhancements in the cortex supporting the attended modality and more robust suppressive
responses in the modality that is being ignored (Johnson and Zatorre 2006; Shomstein and Yantis
2004). In both of these studies, participants were presented with simultaneous streams of visual
and auditory stimuli but were cued to attend either to the visual or to the auditory component of
the stimulus. When a cue indicated that they should shift their attention from the visual modality
to the auditory modality, the activity in visual cortex was reduced. These results occurred even
with no change in stimulation.

It has also been shown that attention alone can produce deactivation in the ignored sensory
cortices even in the absence of a stimulus (Mozolic et al. 2008b). Participants were cued to either
attend to the visual or auditory modality to detect a stimulus. On some of the trials, no stimulus
was actually presented and the mere cueing of one sensory modality resulted in deactivation of
the other modality. It is interesting to note that there was little increase in activity in the attended
sensory domain, a result that is consistent with the behavioural studies of Spence and Driver.
However, crossmodal deactivations were larger when a sensory stimulus was presented in addi-
tion to an attentional cue. This finding suggests that crossmodal deactivations may result from
the combined effects of attention and the presence of a sensory stimulus.
The fact that attention to one sensory modality results in suppression of information process-
ing in ignored modalities suggests that multisensory integration will be reduced during modality-
specific selective attention. If information is being suppressed from ignored modalities it will be
less available for integration in subsequent steps of the processing stream. Two studies using a
dual-task design support this idea (Alsius et al. 2005; Alsius et al. 2007). In both of these studies,
multisensory integration was indexed using the McGurk illusion (McGurk and MacDonald
1976), where incongruent lip movements and pronounced syllables are combined using multi-
sensory integration to form a novel percept. Two tasks were presented to the participants simul-
taneously: the McGurk task and a visual, auditory, or tactile task. When a second task was
presented, the number of illusory perceptions decreased. These studies likely include both effects
of crossmodal attention and effects of dual-task performance, but nonetheless show the impor-
tance of attention on multisensory integration.
Mozolic and colleagues also observed evidence that crossmodal attention can suppress multi-
sensory integration, this time in the context of a single task (Mozolic et al. 2008a). This study
evaluated the effects of modality-specific attention on multisensory integration using a cued task
similar to the Spence and Driver studies. This study included trials with multisensory as well as
unisensory targets; there were no invalid cues, and divided attention trials could be followed by a
visual, auditory, or multisensory target. The results showed that divided attention resulted in
response enhancement with a magnitude comparable to that previously demonstrated using the
same forced-choice task but with no attentional cues (Laurienti et al. 2004). However, when the
participants were cued to attend either to the visual modality or the auditory modality, the mag-
nitude of integration relative to the divided-attention trials was significantly reduced. The inter-
pretation of this finding was that attention suppressed the processing of the ignored sensory
modality, made it unavailable for combination with the other sensory information, and resulted
in decreased multisensory integration.
Given that modality-specific attention suppresses multisensory integration, it is interesting to
note that older adults are particularly susceptible to crossmodal distractions. Alain and colleagues
(Alain and Woods 1999) showed that background auditory stimulation (i.e. noise) presented
during the performance of a visual task resulted in larger brain responses, measured using elec-
troencephalography, in older than in younger adults. However, when a deviant auditory tone was
inserted into the background noise the younger adults actually exhibited larger responses than the
older adults. These data indicate that the older adults process more background information than
the young participants, regardless of the salience of the stimulus. Younger adults exhibited
responses to deviant stimuli, but older adults responded equally to all background sounds. This
conclusion is supported by behavioural data from the same study showing that older participants
make many more errors on a visual-discrimination task when background auditory noise is
present (Alain and Woods 1999).
260 MULTISENSORY PROCESSES IN OLD AGE

If modality-specific selective attention suppresses multisensory integration and older adults
exhibit increased integration as well as increased crossmodal distraction, it is reasonable to
hypothesize that older adults have impaired modality-specific attention. This is the hypothesis
that Hugenschmidt and colleagues tested in a recent study (Hugenschmidt et al. 2009b) using the
spatial discrimination task developed by Spence and Driver. The study evaluated 26 younger
(mean age 28 years) and 26 older adults (mean age 68 years). The results from the young
adults replicated prior findings and showed that modality-specific selective attention resulted
in large performance decrements in the non-attended or ignored modality. There were small
but significant performance enhancements in the attended modality. The interesting finding was
that older adults maintained the ability to engage modality-specific attention and showed similar
performance decrements in the ignored modality and similar enhancements in the attended
modality. Thus, while older adults may exhibit enhanced multisensory integration and crossmo-
dal distraction, impaired modality-specific selective attention is apparently not the underlying
mechanism.
Since older adults have the ability to engage selective attention and suppress the processing in
unattended sensory modalities, they should also exhibit suppressed multisensory integration dur-
ing selective attention. In another study conducted by Hugenschmidt, Mozolic, and colleagues
(Hugenschmidt et al. 2009a), this was in fact demonstrated. Using the same procedures that had been
used to evaluate the effects of attention on multisensory integration in younger adults, they were
able to demonstrate that older adults are also able to suppress integration by selectively attending
to a single sensory modality. However, the older and younger adults did not have exactly the same
integration patterns.
Recall that older adults exhibit an enhanced magnitude of integration (see Fig. 11.2). Across the
three prior studies that have evaluated the magnitude of multisensory integration in aging
(Diederich et al. 2008; Hugenschmidt et al. 2009a; Laurienti et al. 2006a; Peiffer et al. 2007) the
data all showed that the older populations exhibit a 50–100% increase in the magnitude of
integration relative to the younger populations. Under the divided-attention condition,
Hugenschmidt et al. also measured a near doubling of the magnitude of integration in the older
population compared to the younger population. During modality-specific selective attention
both populations showed similar proportional decreases in the magnitude of integration. The
result of this was that even during selective attention older adults continued to exhibit integration
magnitudes that were significantly larger than those of their younger counterparts.
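To make the notion of 'magnitude of integration' concrete, the gain is commonly expressed as the percentage speeding of multisensory responses relative to the faster unisensory response. The sketch below is illustrative only: the reaction times are invented, not values taken from the studies cited above.

```python
def enhancement(rt_multi, rt_best_uni):
    """Multisensory response enhancement: percent speeding of the
    multisensory reaction time relative to the faster unisensory one."""
    return 100.0 * (rt_best_uni - rt_multi) / rt_best_uni

# Hypothetical mean reaction times in ms (illustrative values only).
young_gain = enhancement(rt_multi=310.0, rt_best_uni=335.0)   # ~7.5%
older_gain = enhancement(rt_multi=380.0, rt_best_uni=440.0)   # ~13.6%
```

On these invented numbers the older group shows roughly double the enhancement of the younger group, mirroring the near doubling of integration magnitude reported in the aging studies.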
These findings support the results of the selective attention study described above. Older adults
can, in fact, engage attention and suppress multisensory integration. However, they integrate
more at baseline, possibly because they process more background sensory information during
non-selective attention states. During selective attention, comparable suppression still results in
more integration due to enhanced processing at baseline. The mechanism underlying the
enhanced baseline processing remains unknown but could be related to age-based differences in
cumulative experience. It is possible that the extensive experience and formed associations of
older adults have shifted them to a more automatic integration state. This automatic integration is
consistent with the reduced inhibition observed in older adults. It is interesting to note that inhi-
bition is not fully mature in infants and it is possible that similar enhanced integration could be
present at earlier stages in development. Studies that evaluate competing multisensory inputs
have not been completed in infants and would be valuable for understanding the role of general
inhibition on multisensory integration.

11.4 Age-related alterations in baseline sensory processing


The available data support the hypothesis that there is enhanced background ‘noise’ in the sensory
processing streams of older adults (Gates et al. 1990; Lichtenstein 1992). Overall, there appears to
be a reduction in the relevant signals and an increase in the background noise (i.e. a decreased
signal-to-noise ratio) both within and between sensory modalities. It has been proposed that
reduced inhibitory projections from the thalamus, frontal cortex, and brain stem may contribute
to increased responses to background stimuli (Amenedo and Diaz 1998b, 1998a). The neural
mechanisms that underlie the increased processing of background information across sensory
modalities are likely not associated with attention per se but with the amount of information
processed by sensory systems. Increases in processing of information at baseline would result in
increased noise when processing task-relevant information. Many psychophysical and imaging
experiments are designed to isolate a process or variable of interest by comparing two different
conditions, for instance, isolating the effects of selective attention by using performance during
divided attention as a baseline. This means that studies of baseline cognitive states are inherently
difficult to perform because the goal is to measure brain responses without perturbing the system.
For evaluating baseline states, brain imaging has become the tool of choice, as traditional behavioural
studies cannot be used to assess the resting brain. The benefit of imaging data is that neural activ-
ity can be recorded even in the absence of a task, and this baseline state can be compared across
study populations or correlated with performance on perceptual or cognitive tasks.
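The decreased signal-to-noise ratio described at the start of this section can be illustrated with a toy calculation. The power values below are invented for illustration; they are not measurements from any of the studies discussed here.

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio expressed in decibels."""
    return 10.0 * math.log10(signal_power / noise_power)

# Invented power values: a weaker task-relevant signal combined with
# stronger background noise jointly lowers the SNR.
younger_snr = snr_db(signal_power=100.0, noise_power=10.0)  # 10.0 dB
older_snr = snr_db(signal_power=80.0, noise_power=20.0)     # ~6.0 dB
```

The point of the sketch is simply that the two age-related changes proposed above, reduced relevant signal and increased background noise, compound one another in a single ratio.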
A recent study of resting brain function using perfusion imaging rather than fMRI was per-
formed by Hugenschmidt and colleagues (Hugenschmidt et al. 2009c). The distinct advantage of
perfusion imaging is that it is quantitative and absolute levels of blood flow can be determined at
rest. One can think of resting perfusion imaging as similar to resting positron emission tomo-
graphic imaging. Traditional functional MRI has the limitation of being a relative measure and
cannot measure absolute baseline activity. This study evaluated whole-brain perfusion in younger
and older adults while participants viewed a fixation point (baseline) and while participants
watched a video clip with no sound. Two findings of this study are highly relevant to the discus-
sion of multisensory integration in older adults (Fig. 11.3). First, it was demonstrated that the
level of activity in auditory cortex was reduced when changing from a resting state to viewing a
video. This is a neural correlate of visual attention and replicates prior studies using fMRI that
have highlighted crossmodal deactivations during single sensory stimulation. It is important to
note that engaging the visual system produced suppressions in auditory cortex of the same mag-
nitude in younger and older adults. Thus, the imaging data corroborate the behavioural study
showing that older adults can engage modality-specific attention and suppress the processing of
incoming sensory information in the ignored sensory domain comparably to younger adults. The
other finding that is critical to the current discussion is that older adults exhibited an overall
higher level of activity in the auditory cortex. This finding was present both at baseline and during
visual engagement. The importance of this finding is that older adults have a higher level of audi-
tory processing during low-level visual processing (fixation) as well as during high-level visual
engagement (watching a movie). This increase in baseline sensory processing may contribute to
increased multisensory integration and increased crossmodal distraction in older adults. In fact,
the level of auditory activity was correlated with performance on a visual task with auditory dis-
tracters in the older adults (Fig. 11.3). In other words, the higher the level of baseline auditory
processing in older adults, the worse the performance on the crossmodal distraction task. This
relationship was not significant in the younger population. The outcomes of this study were inter-
preted as representing an increase in the baseline level of sensory processing in older adults that
contributed to increased multisensory interactions.
Other imaging studies provide further support for the notion that there are alterations in the
baseline processing in older adults. Functional brain-imaging studies have successfully identified
a brain network that is most active at rest and is suppressed during the performance of purposeful
tasks (Gusnard et al. 2001; Raichle et al. 2001). This network has been called the default-mode
network (DMN) because it is believed that activity in this network is a default brain state.

Fig. 11.3 Quantitative cerebral perfusion. Average images from 20 young and 20 older adults.
The images are normalized to the global mean. Note the enhanced activity in auditory cortices
in the older adults. Both groups show reductions in auditory activity during the video but the older
adults continued to process more auditory information than the young adults (Reproduced in colour
in the colour plate section).

The
DMN is thought to act as an attentional filter that combines functions of monitoring the external
environment and orienting attention with maintenance of a self-referenced representation of
the world (Buckner et al. 2008; Raichle et al. 2001; Shulman et al. 2007). Network activity is
suppressed during purposeful tasks and the magnitude of suppression is proportional to the
difficulty of the task (McKiernan et al. 2003). Furthermore, DMN activity is related to errors
(Li et al. 2007) and lapses in attention (Weissman et al. 2006). When participants are performing
a task and have a lapse of attention or make an error, activity in the DMN is suppressed less dur-
ing that trial compared to successful trials.
The available data on default-mode network functioning indicate that it could be a crucial player in the
information-processing changes observed in older adults. First, it has been repeatedly demonstrated
that older adults are less able to suppress activity in the DMN during the performance of purposeful
tasks (Grady et al. 2006; Lustig et al. 2003; Persson et al. 2007). It has also been shown that the rela-
tionship between task demands and degree of DMN suppression is disrupted in older adults (Persson
et al. 2007). If increased suppression is necessary for the completion of more difficult tasks, then older
adults will be at a distinct disadvantage as task difficulty increases. In addition, a recent study by
Stevens and colleagues demonstrated that older adults show increased activity in auditory cortex dur-
ing missed trials on a memory task (Stevens et al. 2008). Interestingly, functional connectivity analy-
sis linked this increase in auditory cortical activity with brain regions involved in the DMN.
While there is considerable evidence that older adults do not suppress activity in the DMN
to the same degree as younger adults, a major issue remains unresolved. There are no data pub-
lished to date that specifically evaluate the resting DMN activity level in older adults. This is an
important issue because decreased suppression during tasks could be related to an inability to
suppress the network or due to decreased baseline activity levels. The available research clearly
demonstrates that suppression of the DMN during task performance is important, but conceptu-
alizing the network in terms of suppression implies that the DMN is a sort of nuisance network,
where its importance to voluntary cognitive functions lies primarily in minimizing its activity
during tasks. Following this logic, patients with Alzheimer’s disease, who show significant
hypometabolism in the precuneus/posterior cingulate component of the DMN (Bradley et al.
2002; Chetelat et al. 2008; Langbaum et al. 2009; Petrie et al. 2009; Schroeter et al. 2009), should
have very good cognitive functioning during purposeful tasks because the DMN is not active.
However, this is clearly not the case. This issue requires further study in order to interpret age-
related changes in baseline function and their relationship to multisensory integration.

11.5 Alternative mechanisms for age-related changes in multisensory processing

The neural mechanisms underlying cognitive decline in aging are a subject of great debate, and
there are several theories that warrant discussion. Common theories concerning general cognitive
decline include reduction in speed of processing (Salthouse 1996), processing capacity (Lavie
1995), and inhibitory function (Hasher and Zacks 1988). Below, each of these theories is briefly
reviewed, noting that each has strengths and weaknesses and that each should be considered when
discussing sensory processing. After all, alterations in cognitive function could have implications
for multisensory processing and vice versa. It is easy to envision some combination of each of
these mechanisms contributing to alterations in multisensory integration. One could also con-
sider the implications of combined effects of the mechanisms below with the processes described
in the prior sections of this chapter.

11.5.1 Speed of processing


Reduction in the speed of cognitive processing as a theory of cognitive aging has received substan-
tial attention recently. The theory is that overall processing speed is reduced, and this results in
failed processing secondary to time limitations and disruption in sequential processing (Cerella
1985, 1991, 1994; Salthouse 1996; Verhaeghen and Cerella 2002). Older adults typically exhibit
disproportionate response-time increases with increased demands regardless of the task type
(Cerella 1991, 1994; Fisk et al. 1992; Jacewicz and Hartley 1987), a concept known as general
cognitive slowing. It has been shown that many of the age-related differences in cognitive func-
tion may be attributed to reductions in processing speed (Salthouse 2000; Verhaeghen and
Salthouse 1997). Recent studies have suggested that the slowing is focused at the sensorimotor
transformation, but this finding remains equivocal (Falkenstein et al. 2006; Kolev et al. 2006;
Peiffer et al. 2008; Yordanova et al. 2004b), and the neural mechanisms underlying this
slowing remain a mystery. It is interesting to note that under highly controlled conditions with
simple tasks older adults can perform as fast as younger adults (Looren de Jong et al. 1989; Peiffer
et al. 2007; Yordanova et al. 2004a). However, with increased task difficulty, older adults exhibit
accelerated response slowing. From the point of view of a multisensory integration task, the uni-
sensory trials are more difficult than the multisensory trials. So it is possible that older adults
actually show disproportionate slowing on unisensory trials compared to younger adults. When
multisensory trials are compared to the unisensory trials, older adults will exhibit disproportion-
ate speeding simply because the task is easier rather than because the magnitude of integration has
increased. While more work remains to be done in this area, a study by Peiffer and colleagues took
a first step towards addressing this possibility (Peiffer et al. 2007). The premise of the study was as
follows. If older adults were gaining more than younger adults simply due to diminished process-
ing of the unisensory stimuli, then one could eliminate the enhanced integration by matching
performance on the unisensory trials. The interesting finding was that when they utilized a task
where the young and older adults were matched in performance for the unisensory conditions,
the older adults still gained more than the younger adults. More studies are needed to clarify the
role of general cognitive slowing and enhanced integration in the elderly.
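A standard way to separate genuine integration from the statistical facilitation that arises when two signals merely race each other is Miller's race-model inequality: at any time t, the multisensory cumulative response distribution should not exceed the sum of the two unisensory distributions unless the signals are actually being combined. The implementation below is a generic sketch of that test, not the specific analysis used in the studies discussed in this section.

```python
import numpy as np

def race_model_violation(rt_av, rt_a, rt_v, t_grid):
    """Maximum violation of Miller's race-model inequality,
    P(RT_AV <= t) - min(P(RT_A <= t) + P(RT_V <= t), 1),
    evaluated over the probe times in t_grid. Positive values
    suggest integration beyond statistical facilitation."""
    def cdf(rts, t):
        # Empirical CDF of the reaction times at each probe time.
        return np.mean(np.asarray(rts)[:, None] <= t, axis=0)
    bound = np.minimum(cdf(rt_a, t_grid) + cdf(rt_v, t_grid), 1.0)
    return float(np.max(cdf(rt_av, t_grid) - bound))
```

Applied to real data, rt_av, rt_a, and rt_v would be trial-wise reaction times from the multisensory, auditory-only, and visual-only conditions of the same forced-choice task, so the test is insensitive to overall slowing that affects all conditions equally.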

11.5.2 Inhibitory control


Reduced inhibitory processing was once the dominant process-specific theory to explain cogni-
tive aging (Hasher and Zacks 1988). Vast amounts of work have been dedicated to disproving this
theory (McDowd 1997) and much of that work has been successful. However, there is still con-
siderable evidence that a portion of age-related cognitive changes can be attributed to reduced
inhibition. For example, recent work by Hasher and colleagues demonstrates that background
visual information that is to be ignored during a task is processed more deeply by older adults
(Butler and Zacks 2006; Radvansky et al. 2005). The Stroop task (Stroop 1935) is commonly used
to assess automaticity in word processing, and older adults have been shown to exhibit deficits in
the suppression of automatic processes (Brink and McDowd 1999; Milham et al. 2002). Such
deficits have been argued to be due to reduced inhibition as well as general cognitive slowing
(Cerella 1994; Verhaeghen and Cerella 2002). As discussed above, it has been demonstrated that
older adults cannot suppress background stimulation across sensory modalities as well as younger
adults, and this is believed to be related to decreased inhibitory control (Alain and Woods 1999;
Amenedo and Diaz 1998a; Bunge et al. 2001; Poliakoff et al. 2006). One can clearly see the link
between changes in inhibitory control and the ability to suppress multisensory integration.
However, the mechanisms that underlie this effect remain equivocal and likely do not relate to the
ability to engage attention mechanisms, at least across sensory modalities.

11.5.3 Attentional capacity


Given that older adults are able to selectively attend to a single sensory modality, it is important
to consider that there could be changes in other aspects of attention that result in altered multi-
sensory processing. It has been proposed that the ability to attend to specific aspects of a task
places a processing load on our attentional system—a system with limited capacity. Lavie has
proposed that selective attention depends on this capacity limitation (Lavie and Tsal 1994; Lavie
1995) and further, that older adults have a reduction in available capacity. Maylor and Lavie
manipulated the load being placed on the attentional system and evaluated the degree to which
this load altered attention in younger and older adults (Maylor and Lavie 1998). The findings
showed that older adults exhibited load-dependent effects in attention beyond those observed in
young adults. This finding was interpreted as a reduction in the attentional capacity in the older
adults. It is also important to note that our attentional processing resources appear to be specific
to the sensory modality (Lavie 2005; Rees et al. 2001). While Hugenschmidt et al. (Hugenschmidt
et al. 2009b) showed that older adults can selectively attend to a single modality, no manipula-
tions in the attentional load were performed. It is possible that under higher demands the older
adults would fail to suppress the processing in the ignored sensory domain. Since Hugenschmidt
and colleagues performed their experiments in a highly controlled, sound- and light-attenuated
environment, the sensory load was low. In the natural environment the sensory load is much
higher and could result in crossmodal attention deficits. While such deficits could alter the
amount of integration under selective attention it is unlikely that this could account for the
baseline changes in integration that are observed in older adults.

11.6 Summary and conclusions


In the first part of this chapter it was established that older adults exhibit an increase in the mag-
nitude of multisensory integration. The increase in the magnitude of integration may result in
speeding during controlled laboratory experiments, but it can also result in greater distractibility
and crossmodal interference. It is likely that in the complex sensory environments encountered in
the real world, older adults experience greater performance decrements than performance
enhancements due to the increase in multisensory integration. Research on the interaction
between age and the probability of integration is quite limited. Hopefully future research can
confirm and expand upon the finding that the probability of integration decreases even though
the magnitude increases with age.
It was further demonstrated that multisensory integration can be suppressed substantially by
selectively attending to a single sensory modality. While one may hypothesize that age-related
declines in modality-specific attention could be the driving force behind increased integration,
this does not appear to be the case. Older adults are not only able to engage modality-specific
attention to the same degree as young adults, but they are also able to suppress integration during
selective attention. However, the increase in integration at baseline means that even during selec-
tive attention the magnitude of integration in older adults approaches that of younger adults
when they are not selectively attending to a single sensory modality.
Changes in baseline sensory processing were hypothesized to be related to the increase in inte-
gration at baseline, and brain-imaging data supported this hypothesis. At rest and during visual
engagement, older adults exhibit greater levels of auditory activity than younger adults, indicating
that the older adults have increased levels of crossmodal noise.
Studies of multisensory integration in older adults are in their infancy and much remains to be
learned. For instance, to our knowledge there are no studies examining the influence of the order
in which the senses decline on multisensory processes. Whether someone loses their hearing or
their vision first, and the severity of their sensory deficit(s), may influence the relative weightings
of incoming sensory information. This would almost certainly alter multisensory integration. Of
pressing interest is gaining a better understanding of why older adults exhibit enhanced integra-
tion at baseline and how baseline brain networks may contribute to this outcome. It is likely that
alterations in attentional mechanisms will need to be addressed and the role of attentional capac-
ity remains unresolved. Studies that manipulate the sensory load and evaluate multisensory inte-
gration may be able to address these critical issues.

Acknowledgements
The authors would like to acknowledge the funding from the National Institutes of Health
(AG026353, NS42568, and AG030838) as well as the Kulynych Center for Memory and Cognition
at the Wake Forest University.

References
Alain, C., and Woods, D.L. (1999). Age-related changes in processing auditory stimuli during visual attention:
evidence for deficits in inhibitory control and sensory memory. Psychology and Aging, 14, 507–519.
Alsius, A., Navarra, J., Campbell, R., and Soto-Faraco, S. (2005). Audiovisual integration of speech falters
under high attention demands. Current Biology, 15, 839–43.
Alsius, A., Navarra, J., and Soto-Faraco, S. (2007). Attention to touch weakens audiovisual speech
integration. Experimental Brain Research, 183, 399–404.
Amenedo, E., and Diaz, F. (1998a). Effects of aging on middle-latency auditory evoked potentials: a
cross-sectional study. Biological Psychiatry, 43, 210–219.
Amenedo, E., and Diaz, F. (1998b). Aging-related changes in processing of non-target and target stimuli
during an auditory oddball task. Biological Psychology, 48, 235–67.
Baloh, R.W., Jacobson, K.M., and Socotch, T.M. (1993). The effect of aging on visual-vestibuloocular
responses. Experimental Brain Research, 95, 509–516.
Bradley, K.M., O’Sullivan, V.T., Soper, N.D.W., et al. (2002). Cerebral perfusion SPET correlated with
Braak pathological stage in Alzheimer’s disease. Brain, 125, 1772–81.
Brink, J.M., and McDowd, J.M. (1999). Aging and selective attention: an issue of complexity or multiple
mechanisms? Journal of Gerontology: Psychological Sciences and Social Sciences, 54, 30–33.
Buckner, R.L., Andrews-Hanna, J.R., and Schacter, D.L. (2008). The brain’s default network: anatomy,
function, and relevance to disease. Annals of the New York Academy of Sciences, 1124, 1–38.
Bunge, S.A., Ochsner, K.N., Desmond, J.E., Glover, G.H., and Gabrieli, J.D. (2001). Prefrontal regions
involved in keeping information in and out of mind. Brain, 124, 2074–86.
Butler, K.M., and Zacks, R.T. (2006). Age deficits in the control of prepotent responses: evidence for an
inhibitory decline. Psychology and Aging, 21, 638–43.
Cerella, J. (1985). Information processing rates in the elderly. Psychological Bulletin, 98, 67–83.
Cerella, J. (1991). Age effects may be global, not local: comment on Fisk and Rogers (1991). Journal of
Experimental Psychology: General, 120, 215–23.
Cerella, J. (1994). Generalized slowing in Brinley plots. Journal of Gerontology, 49, 65–71.
Colonius, H., Diederich, A., and Steenken, R. (2009). Time-window-of-integration (TWIN) model for
saccadic reaction time: effect of auditory masker level on visual-auditory spatial interaction in
elevation. Brain Topography, 21, 177–84.
Diederich, A., and Colonius, H. (2009). Crossmodal interaction in speeded responses: time window of
integration model. Progress in Brain Research, 174, 119–35.
Diederich, A., Colonius, H., and Schomburg, A. (2008). Assessing age-related multisensory enhancement
with the time-window-of-integration model. Neuropsychologia, 46, 2556–62.
Enrietto, J.A., Jacobson, K.M., and Baloh, R.W. (1999). Aging effects on auditory and vestibular responses:
a longitudinal study. American Journal of Otolaryngology, 20, 371–78.
Falkenstein, M., Yordanova, J., and Kolev, V. (2006). Effects of aging on slowing of motor-response
generation. International Journal of Psychophysiology, 59, 22–29.
Fisk, A.D., Fisher, D.L., and Rogers, W.A. (1992). General slowing alone cannot explain age-related search
effects: reply to Cerella (1991). Journal of Experimental Psychology: General, 121, 73–78.
Gates, G.A., Cooper, J.C., Jr., Kannel, W.B., and Miller, N.J. (1990). Hearing in the elderly: the Framingham
cohort, 1983–1985. Part I. Basic audiometric test results. Ear and Hearing, 11, 247–56.
Grady, C.L., Springer, M.V., Hongwanishkul, D., McIntosh, A.R., and Winocur, G. (2006). Age-related
changes in brain activity across the adult lifespan. Journal of Cognitive Neuroscience, 18, 227–41.
Gusnard, D.A., Raichle, M.E., and Raichle, M.E. (2001). Searching for a baseline: functional imaging and
the resting human brain. Nature Reviews Neuroscience, 2, 685–94.
Hasher, L., and Zacks, R.T. (1988). Working memory, comprehension, and aging: a review and a new view.
In The psychology of learning and motivation: advances in research and theory (ed. G.H. Bower),
pp. 193–225. Academic Press, New York.
Haxby, J.V., Horwitz, B., Ungerleider, L.G., Maisog, J.M., Pietrini, P., and Grady, C.L. (1994). The
functional organization of human extrastriate cortex: a PET-rCBF study of selective attention to faces
and locations. Journal of Neuroscience, 14, 6336–53.
Helfer, K.S. (1998). Auditory and auditory-visual recognition of clear and conversational speech by older
adults. Journal of the American Academy of Audiology, 9, 234–42.
Hirvonen, T.P., Aalto, H., Pyykko, I., Juhola, M., and Jantti, P. (1997). Changes in vestibulo-ocular reflex of
elderly people. Acta Otolaryngology Supplement, 529, 108–110.
Ho, C., Reed, N., and Spence, C. (2007). Multisensory in-car warning signals for collision avoidance.
Human Factors, 49, 1107–1114.
Hoffman, H.J., Ishii, E.K., and MacTurk, R.H. (1998). Age-related changes in the prevalence of smell/taste
problems among the United States adult population. Results of the 1994 disability supplement to the
National Health Interview Survey (NHIS). Annals of the New York Academy of Sciences, 855, 716–22.
Holmes, N.P. (2009). The principle of inverse effectiveness in multisensory integration: some statistical
considerations. Brain Topography, 21, 168–76.
Hugenschmidt, C.E., Mozolic, J.L., and Laurienti, P.J. (2009a). Suppression of multisensory integration by
modality-specific attention in aging. Neuroreport, 20, 349–53.
Hugenschmidt, C.E., Peiffer, A.M., McCoy, T.P., Hayasaka, S., and Laurienti, P.J. (2009b). Preservation of
crossmodal selective attention in healthy aging. Experimental Brain Research, 198, 273–85.
Hugenschmidt, C.E., Mozolic, J.L., Tan, H., Kraft, R.A., and Laurienti, P.J. (2009c). Age-related increase in
cross-sensory noise in resting and steady-state cerebral perfusion. Brain Topography, 21, 241–51.
Jacewicz, M.M., and Hartley, A.A. (1987). Age differences in the speed of cognitive operations: resolution
of inconsistent findings. Journal of Gerontology, 42, 86–88.
Johnson, J.A., and Zatorre, R.J. (2006). Neural substrates for dividing and focusing attention between
simultaneous auditory and visual events. NeuroImage, 31, 1673–81.
Jones, J.A., and Jarick, M. (2006). Multisensory integration of speech signals: the relationship between space
and time. Experimental Brain Research, 174, 588–94.
Kaneda, H., Maeshima, K., Goto, N., Kobayakawa, T., Ayabe-Kanamura, S., and Saito, S. (2000). Decline in
taste and odor discrimination abilities with age, and relationship between gustation and olfaction.
Chemical Senses, 25, 331–37.
Kawashima, R., O’Sullivan, B.T., and Roland, P.E. (1995). Positron-emission tomography studies of cross-
modality inhibition in selective attentional tasks: closing the ‘mind's eye’. Proceedings of the National
Academy of Sciences USA, 92, 5969–72.
Kim, S., and James, T.W. (2010). Enhanced effectiveness in visuo-haptic object-selective brain regions with
increasing stimulus salience. Human Brain Mapping, 31, 1719–30.
Kolev, V., Falkenstein, M., and Yordanova, J. (2006). Motor-response generation as a source of aging-
related behavioural slowing in choice-reaction tasks. Neurobiology of Aging, 27, 1719–30.
Langbaum, J.B.S., Chen, K., Lee, W., et al. (2009). Categorical and correlational analyses of baseline
fluorodeoxyglucose positron emission tomography images from the Alzheimer’s disease neuroimaging
initiative (ADNI). Neuroimage, 45, 1107–16.
Laurienti, P., Burdette, J., Wallace, M.T., Yen, Y.F., Field, A., and Stein, B. (2001). Deactivation of sensory-specific cortices: evidence for cross-modal inhibition. Neuroimage, 13, S904.
Laurienti, P.J., Kraft, R.A., Maldjian, J.A., Burdette, J.H., and Wallace, M.T. (2004). Semantic congruence is
a critical factor in multisensory behavioral performance. Experimental Brain Research, 158, 405–414.
Laurienti, P.J., Burdette, J.H., Maldjian, J.A., and Wallace, M.T. (2006a). Enhanced multisensory
integration in older adults. Neurobiology of Aging, 27, 1155–63.
Lavie, N. (1995). Perceptual load as a necessary condition for selective attention. Journal of Experimental
Psychology: Human Perception and Performance, 21, 451–68.
Lavie, N. (2005). Distracted and confused?: selective attention under load. Trends in Cognitive Science, 9,
75–82.
Lavie, N., and Tsal, Y. (1994). Perceptual load as a major determinant of the locus of selection in visual
attention. Perception and Psychophysics, 56, 183–97.
Li, C.S., Yan, P., Bergquist, K.L., and Sinha, R. (2007). Greater activation of the ‘default’ brain regions
predicts stop signal errors. Neuroimage, 38, 640–48.
Li, R.W., Edwards, M.H., and Brown, B. (2000). Variation in vernier acuity with age. Vision Research, 40,
3775–81.
Lichtenstein, M.J. (1992). Hearing and visual impairments. Clinics in Geriatric Medicine, 8, 173–82.
Looren de Jong, H., Kok, A., and Van Rooy, J.C. (1989). Stimulus probability and motor response in young
and old adults: an ERP study. Biological Psychology, 29, 125–48.
268 MULTISENSORY PROCESSES IN OLD AGE

Lopez, I., Honrubia, V., and Baloh, R.W. (1997). Aging and the human vestibular nucleus. Journal of
Vestibular Research, 7, 77–85.
Lustig, C., Snyder, A.Z., Bhakta, M., et al. (2003). Functional deactivations: change with age and dementia
of the Alzheimer type. Proceedings of the National Academy of Sciences USA, 100, 14504–14509.
Ma, W.J., Zhou, X., Ross, L.A., Foxe, J.J., and Parra, L.C. (2009). Lip-reading aids word recognition most in
moderate noise: a Bayesian explanation using high-dimensional feature space. PLoS One, 4, e4638.
Maki, B. E., Cheng, K.C., Mansfield, A., et al. (2008). Preventing falls in older adults: new interventions to
promote more effective change-in-support balance reactions. Journal of Electromyography and
Kinesiology, 18, 243–54.
Maylor, E.A., and Lavie, N. (1998). The influence of perceptual load on age differences in selective
attention. Psychology and Aging, 13, 563–73.
McDowd, J.M. (1997). Inhibition in attention and aging. Journal of Gerontology: Psychological Sciences and
Social Sciences, 52, 265–73.
McGurk, H., and MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–48.
McKiernan, K.A., Kaufman, J.N., Kucera-Thompson, J., and Binder, J.R. (2003). A parametric
manipulation of factors affecting task-induced deactivation in functional neuroimaging. Journal of
Cognitive Neuroscience, 15, 394–408.
Meredith, M.A., and Stein, B.E. (1986a). Spatial factors determine the activity of multisensory neurons in
cat superior colliculus. Brain Research, 365, 350–54.
Meredith, M.A., and Stein, B.E. (1986b). Visual, auditory, and somatosensory convergence on cells in
superior colliculus results in multisensory integration. Journal of Neurophysiology, 56, 640–62.
Meredith, M.A., and Stein, B.E. (1996). Spatial determinants of multisensory integration in cat superior
colliculus neurons. Journal of Neurophysiology, 75, 1843–57.
Meredith, M.A., Nemitz, J.W., and Stein, B.E. (1987). Determinants of multisensory integration in superior
colliculus neurons. I. Temporal factors. Journal of Neuroscience, 7, 3215–29.
Milham, M.P., Erickson, K.I., Banich, M.T., et al. (2002). Attentional control in the aging brain: insights
from an fMRI study of the stroop task. Brain and Cognition, 49, 277–96.
Miller, J. (1982). Divided attention: evidence for coactivation with redundant signals. Cognitive Psychology,
14, 247–79.
Miller, J. (1986). Timecourse of coactivation in bimodal divided attention. Perception and Psychophysics, 40,
331–43.
Mozolic, J.L., Hugenschmidt, C.E., Peiffer, A.M., and Laurienti, P.J. (2008a). Modality-specific selective
attention attenuates multisensory integration. Experimental Brain Research, 184, 39–52.
Mozolic, J.L., Joyner, D., Hugenschmidt, C.E., et al. (2008b). Cross-modal deactivations during modality-specific selective attention. BMC Neurology, 8, 35.
Nusbaum, N. J. (1999). Aging and sensory senescence. Southern Medical Journal, 92, 267–75.
Peiffer, A.M., Mozolic, J.L., Hugenschmidt, C.E., and Laurienti, P.J. (2007). Age-related multisensory
enhancement in a simple audiovisual detection task. Neuroreport, 18, 1077–81.
Peiffer, A.M., Maldjian, J.A., and Laurienti, P.J. (2008). Resurrecting Brinley plots for a novel use: meta-
analyses of functional brain imaging data in older adults. International Journal of Biomedical Imaging,
2008, Article ID 167078 (7 pages).
Persson, J., Lustig, C., Nelson, J. K., and Reuter-Lorenz, P.A. (2007). Age differences in deactivation: a link
to cognitive control? Journal of Cognitive Neuroscience, 19, 1021–32.
Petrie, E.C., Cross, D.J., Galasko, D., et al. (2009). Preclinical evidence of Alzheimer changes: convergent
cerebrospinal fluid biomarker and fluorodeoxyglucose positron emission tomography findings.
Archives of Neurology, 66, 632–37.
Poliakoff, E., Ashworth, S., Lowe, C., and Spence, C. (2006). Vision and touch in ageing: crossmodal
selective attention and visuotactile spatial interactions. Neuropsychologia, 44, 507–517.
Radvansky, G.A., Zacks, R.T., and Hasher, L. (2005). Age and inhibition: the retrieval of situation models.
Journal of Gerontology: Psychological Sciences and Social Sciences, 60, 276–78.
Raichle, M.E., MacLeod, A.M., Snyder, A.Z., Powers, W.J., Gusnard, D.A., and Shulman, G.L. (2001).
A default mode of brain function. Proceedings of the National Academy of Sciences USA, 98, 676–82.
Rees, G., Frith, C., and Lavie, N. (2001). Processing of irrelevant visual motion during performance of an
auditory attention task. Neuropsychologia, 39, 937–49.
Ross, L.A., Saint-Amour, D., Leavitt, V.M., Javitt, D.C., and Foxe, J.J. (2007). Do you see what I am saying?
Exploring visual enhancement of speech comprehension in noisy environments. Cerebral Cortex, 17,
1147–53.
Salthouse, T.A. (1996). The processing-speed theory of adult age differences in cognition. Psychological
Review, 103, 403–28.
Salthouse, T.A. (2000). Aging and measures of processing speed. Biological Psychology, 54, 35–54.
Senkowski, D., Saint-Amour, D., Gruber, T., and Foxe, J.J. (2008). Look who’s talking: the deployment of
visuo-spatial attention during multisensory speech processing under noisy environmental conditions.
Neuroimage, 43, 379–87.
Shomstein, S., and Yantis, S. (2004). Control of attention shifts between vision and audition in human
cortex. Journal of Neuroscience, 24, 10702.
Shulman, G.L., Astafiev, S.V., McAvoy, M.P., d’Avossa, G., and Corbetta, M. (2007). Right TPJ deactivation
during visual search: functional significance and support for a filter hypothesis. Cerebral Cortex, 17,
2625–33.
Spear, P.D. (1993). Neural bases of visual deficits during aging. Vision Research, 33, 2589–2609.
Spence, C., and Driver, J. (1997). On measuring selective attention to an expected sensory modality.
Perception and Psychophysics, 59, 389–403.
Spence, C. and Ho, C. (2008). Multisensory interface design for drivers: past, present and future.
Ergonomics, 51, 65–70.
Spence, C., Nicholls, M.E., and Driver, J. (2001). The cost of expecting events in the wrong sensory
modality. Perception and Psychophysics, 63, 330–36.
Spence, C., Pavani, F., Maravita, A., and Holmes, N. (2004). Multisensory contributions to the 3-D
representation of visuotactile peripersonal space in humans: evidence from the crossmodal congruency
task. Journal of Physiology Paris, 98, 171–89.
Stein, B.E., and Stanford, T.R. (2008). Multisensory integration: current issues from the perspective of the
single neuron. Nature Reviews Neuroscience, 9, 255–66.
Stein, B.E., and Meredith, M.A. (1993). The merging of the senses. MIT Press, Cambridge, MA.
Stephen, J.M., Knoefel, J.E., Adair, J., Hart, B., and Aine, C.J. (2010). Aging-related changes in auditory and
visual integration measured with MEG. Neuroscience Letters, 484, 76–80.
Stevens, W.D., Hasher, L., Chiew, K.S., and Grady, C.L. (2008). A neural mechanism underlying memory
failure in older adults. Journal of Neuroscience, 28, 12820–24.
Stevenson, R.A., and James, T.W. (2009). Audiovisual integration in human superior temporal sulcus: inverse
effectiveness and the neural processing of speech and object recognition. Neuroimage, 44, 1210–23.
Stevenson, R.A., Kim, S., and James, T.W. (2009). An additive-factors design to disambiguate neuronal and
areal convergence: measuring multisensory interactions between audio, visual, and haptic sensory
streams using fMRI. Experimental Brain Research, 198, 183–94.
Stroop, J.R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18,
643–62.
Strupp, M., Arbusow, V., Borges Pereira, C., Dieterich, M., and Brandt, T. (1999). Subjective straight-ahead
during neck muscle vibration: effects of ageing. Neuroreport, 10, 3191–94.
Van Wanrooij, M.M., Bell, A.H., Munoz, D.P., and Van Opstal, A.J. (2009). The effect of spatial-temporal
audiovisual disparities on saccades in a complex scene. Experimental Brain Research, 198, 425–37.
van Wassenhove, V., Grant, K.W., and Poeppel, D. (2007). Temporal window of integration in auditory-visual
speech perception. Neuropsychologia, 45, 598–607.
Verhaeghen, P., and Cerella, J. (2002). Aging, executive control, and attention: a review of meta-analyses.
Neuroscience and Biobehavioral Reviews, 26, 849–57.
Verhaeghen, P., and Salthouse, T.A. (1997). Meta-analyses of age-cognition relations in adulthood:
estimates of linear and nonlinear age effects and structural models. Psychological Bulletin, 122, 231–49.
Wayler, A.H., Perlmuter, L.C., Cardello, A.V., Jones, J.A., and Chauncey, H.H. (1990). Effects of age and
removable artificial dentition on taste. Special Care Dentistry, 10, 107–113.
Weissman, D.H., Roberts, K.C., Visscher, K.M., and Woldorff, M.G. (2006). The neural bases of
momentary lapses in attention. Nature Neuroscience, 9, 971–78.
Whipple, R., Wolfson, L., Derby, C., Singh, D., and Tobin, J. (1993). Altered sensory function and balance
in older persons. Journal of Gerontology, 48, 71–76.
Yordanova, J., Kolev, V., Hohnsbein, J., and Falkenstein, M. (2004a). Sensorimotor slowing with ageing is
mediated by a functional dysregulation of motor-generation processes: evidence from high-resolution
event-related potentials. Brain, 127, 351–62.
Yousem, D.M., Maldjian, J.A., Hummel, T., et al. (1999). The effect of age on odor-stimulated functional
MR imaging. American Journal of Neuroradiology, 20, 600–608.
Part B

Atypical multisensory development
Chapter 12

Developmental disorders and multisensory perception
Elisabeth L. Hill, Laura Crane, and Andrew J. Bremner

12.1 Introduction
Developmental disorders are disorders of the brain and nervous system that have their onset
during early development, in contrast to disorders acquired in later life (Tager-Flusberg 1999).
This difference in timing of onset has important implications for how we can characterize a
disorder and its causal origins; if the onset of a disorder is early in development, whether it is
caused by aberrant genes or environment, the phenotypical outcome will be the result of an
interaction between the initial atypical state of the individual and the subsequent ontogenetic
developmental process (Bishop 1997; Karmiloff-Smith 1998, 2009). Across the study of develop-
mental disorders, the developmental role of multisensory, and indeed unisensory, perceptual
deficits has received only patchy consideration. Developmental dyslexia (DD), for example, was
originally characterized as ‘word-blindness’, a visual perceptual impairment, with little consid-
eration given to the developmental ontogeny of this impairment. Moreover, despite the multi-
sensory basis of reading (in which we have to learn relations between auditory phonology
and visual orthography), more recent perceptual accounts of DD have typically not focused on
multisensory impairments. Similarly, despite the sensory abnormalities reported by Kanner
(1943) and Asperger (1944/1991) in their first case study reports of autism, early formal diagnostic classifications (e.g. Wing and Gould’s (1979) ‘triad of impairments’ in socialization, communication, and imagination), and later DSM classifications (where ‘imagination’ was replaced
with repetitive or stereotyped behaviours) make no reference to perceptual/sensory abnormali-
ties. This is despite the numerous reports of such abnormalities across multiple senses in this
disorder (see below).
This less-than-central consideration given to the developmental impact of multisensory per-
ceptual impairments seems somewhat neglectful. Our ability to perceive the environment in a
coherent way, which is largely dependent upon an ability to integrate information across multiple
sensory channels, is arguably among the first cognitive skills that the human infant seeks to mas-
ter (e.g. Piaget 1952; see also Chapter 8 by Bahrick and Lickliter; Chapter 5 by Bremner et al.;
Chapter 7 by Lewkowicz). If such processes are impaired this is likely to have significant down-
stream developmental effects. Furthermore, multisensory processing abnormalities are impli-
cated across a wide range of developmental disorders and, as we shall discuss later, there also
appear to be some surprising symptomatic commonalities between disorders in terms of multi-
sensory processing. This raises the interesting possibility that multisensory processing abnor-
malities could represent a particular vulnerability or, perhaps more importantly, a particular risk
factor in atypical development.
12.2 Why study atypical multisensory development?


12.2.1 The relation between unisensory and multisensory processing impairments in developmental disorders
While the majority of the chapters in this volume examine the typical development of multisen-
sory processes, here we will review the literature pointing to multisensory processing abnormali-
ties in atypically developing individuals. We will consider developmental disorders that are, for
the most part, diagnosed on the basis of behavioural characteristics and are characterized by specific cognitive profiles. We will not consider disorders defined by neurological or genetic markers, or by a sensory receptor impairment (for coverage of atypical multisensory processing in individuals with congenital and late blindness, see Chapter 13 by Röder). We will focus specifically on three developmental
disorders, namely developmental coordination disorder (DCD), autism spectrum disorder
(ASD), and developmental dyslexia (DD) in which abnormalities point most clearly towards a
potential multisensory impairment. Because other disorders, such as attention deficit hyperactiv-
ity disorder (ADHD), are also suggestive of sensory processing impairments, we will refer to these
where appropriate throughout.1
As already stated, there have been relatively few studies of multisensory impairments in devel-
opmental disorders. We believe that there are now a number of converging reasons to pursue the
question of multisensory impairment in atypical development more seriously. The broad litera-
ture on multisensory perceptual processes (cf. Calvert et al. 2004; Spence and Driver 2004; Stein
and Meredith 1993) demonstrates quite conclusively that our senses do not function in isolation,
either at the level of basic perceptual processes or attention. Consequently, the growing literature
investigating sensory processing impairments across various disorders will have to acknowledge,
sooner or later, that, even if the ontogeny of a sensory processing problem is a basic unisensory
deficit, difficulties with one sense will have important consequences for the way the other senses
function, and importantly how multisensory development proceeds. Strong evidence for the
developmental impact of a unisensory deficit on the development of multisensory processes can
be seen from the literatures investigating sensory deprivation in developing animals (Chapter 14
by Wallace et al.) and in human infants and children with sensory loss (Chapter 13 by Röder).
Similarly, a deficit with multisensory integration can have consequences for sensory processing
when that is measured with respect to a single sensory modality.

12.2.2 A particular vulnerability of multisensory integration in developmental disorders: hyposensitivity, hypersensitivity, and sensorimotor problems
Another important reason to consider atypical multisensory development is that evidence
concerning the development of multisensory integration (described throughout this volume)
is increasingly indicating that the ability of typically developing individuals to optimally integrate
the senses to speed and improve the accuracy of perceptual judgements or responses undergoes
significant developments right across infancy, childhood, and into adolescence (e.g. Gori
et al. 2008; Nardini et al. 2008; Neil et al. 2006; Tremblay et al. 2007). Given the extended develop-
ment of multisensory integration and the fact that environmental input is considered to be an

1 We have not devoted an entire section to ADHD as it is as yet unclear whether reported atypical responses
to sensory information arise from sensory processing abnormalities per se or from difficulties with the
top-down control processes implicated in the hyperactive and inattentive difficulties found in this disorder
(Friedman-Hill et al. 2010).

important factor in integration (Chapter 14 by Wallace et al.; Gori et al. 2010), one can postulate
that the typical emergence of such integrative processes could be particularly vulnerable to devia-
tions from an atypical developmental trajectory. Indeed, as already stated, there are some striking
commonalities across disorders in symptoms that are suggestive of a multisensory deficit. One
particular example of this is poor sensorimotor integration (or motor control), which appears to
be apparent in almost all developmental disorders (e.g. DCD, ASD, specific language impairment;
see, respectively, Sugden and Chambers 2005; Mari et al. 2003; Hill 2001). Sensorimotor control
typically requires information about both the body and the relationship between the body and the
external environment (Lee 1980). This information is provided via visual, somatosensory, prop-
rioceptive, and even auditory inputs, all of which require integration for optimal performance
(Chapter 6 by Nardini and Cowie; Ernst and Bülthoff 2004).
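The ‘optimal performance’ at issue here (Ernst and Bülthoff 2004; see also Ernst and Banks 2002) is usually formalized as maximum-likelihood cue combination. The following sketch states the standard model for a visual and a proprioceptive estimate of the same property; the notation is ours rather than drawn from any one of the studies discussed:

```latex
% Maximum-likelihood cue combination (after Ernst and Banks 2002).
% \hat{S}_V and \hat{S}_P are independent, unbiased visual and
% proprioceptive estimates of the same property (e.g. a target's
% location), with variances \sigma_V^2 and \sigma_P^2. The combined
% estimate weights each cue by its relative reliability:
\hat{S}_{VP} = w_V\,\hat{S}_V + w_P\,\hat{S}_P,
\qquad
w_V = \frac{1/\sigma_V^2}{1/\sigma_V^2 + 1/\sigma_P^2},
\qquad
w_P = 1 - w_V .
% The variance of the combined estimate is lower than that of either
% cue alone, which is the sense in which integration is `optimal':
\sigma_{VP}^2 = \frac{\sigma_V^2\,\sigma_P^2}{\sigma_V^2 + \sigma_P^2}
\;\le\; \min\!\left(\sigma_V^2,\, \sigma_P^2\right).
```

On this scheme, atypical integration can show up either as a bimodal estimate no more reliable than the best single cue, or as miscalibrated weights, for instance the over-weighting of vision reported for DCD later in this chapter.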
A further set of symptoms observed across disorders that may indicate multisensory deficits
comes from reports of hyper- or hyposensitivity to perceptual information arising from one
particular channel, such as hypersensitivity to sounds or tactile stimuli at the expense of other
modalities (Grandin and Scariano 1986; O’Brien et al. 2008; O’Neill and Jones 1997; Williams
1992). As we shall explain later, one possible explanation for hyper- or hyposensitivity to sensory
information arriving from a particular modality could be atypical integration of that sense with the
others.
We now move on to describe DCD, ASD, and DD. We will cover the literature indicating
multisensory processing abnormalities in each of these disorders, and will examine possible
developmental trajectories which may have led to such impairments (see Table 12.1 for a sum-
mary of studies examining multisensory processing abnormalities in these disorders).

12.3 Developmental coordination disorder


Developmental coordination disorder (DCD) is diagnosed in children who experience move-
ment difficulties out of proportion with their general development and in the absence of any
medical condition (such as cerebral palsy) or identifiable neurological disease (American
Psychiatric Association 2000; for full diagnostic criteria, see Box 12.1). In daily life, this includes
gross motor difficulties (e.g. poor posture, problems walking up/down stairs with alternate feet,
poor sporting achievements) and fine motor difficulties (e.g. problems with buttoning, tying
laces, handwriting, etc). Individuals with DCD can also be hyper- or hyposensitive to noise,
touch, taste, temperature, or light (e.g. Dare and Gordon 1970), have difficulty with spatial
awareness, display poor body awareness (e.g. Hill and Bishop 1998), and experience socio-emo-
tional problems (e.g. Green et al. 2006; Skinner and Piek 2001; Wilson et al. 2000).
The general consensus, as well as the bulk of the empirical evidence, supports the view that
sensorimotor difficulties are the core deficit in those with DCD. Given that multisensory integra-
tion is at the heart of sensorimotor processes (see Chapter 5 by Bremner et al. and Chapter 6
by Nardini and Cowie), this disorder is a prime candidate for the influence of multisensory
processing abnormalities, with these impacting on, and being impacted by, atypical development
from birth.

12.3.1 Multisensory processing abnormalities in DCD


We began this chapter by pointing out that we do not process information from our sensory
modalities in isolation. For example, in controlling the movements of our body and limbs we
make use of information from a variety of sensory inputs and, in particular, vision and proprio-
ception (our sense of our body’s layout with respect to itself arising from receptors in our limbs).
Like vision, proprioception is a crucial source of input for efficient movement and thus engagement
in activities of daily living, learning, and employment. Both static and dynamic (kinaesthetic) information is provided from the proprioceptors. As such, a number of studies of crossmodal processing of visual and proprioceptive information have been conducted in children with or without DCD.

Table 12.1. Summary of the key studies assessing multisensory integration in groups with DCD, ASD, or DD that are referred to in this chapter. The right-hand column indicates whether the DCD, ASD, or DD group was impaired in multisensory integration, relative to any other groups. (VIQ = verbal IQ; PIQ = performance IQ; FSIQ = full scale IQ)

Reference | Diagnosis (ASD, DCD or DD) | Comparison group | Age (years) | Matching criteria | Multisensory modalities assessed | Group impaired?
Hulme, Smart, and Moran (1982) | DCD | Typical children | 11 | Chronological age | Visual; Spatial | Yes
Lord and Hulme (1987) | DCD | Typical children | Mean age of both groups = 8.8 (SD = 1.5) | Chronological age, gender, VIQ, reading ability | Visual; Spatial | Yes
Mon-Williams, Wann, and Pascal (1999) | DCD | Typical children | 5–7 | Chronological age | Visual; Proprioceptive | Yes
Schoemaker et al. (2001) | DCD | Typical children | 6–12 | Chronological age, gender | Visual; Proprioceptive | Yes
Sigmundsson et al. (1999) | DCD | Typical children | 6–8 | Chronological age, gender | Visual; Proprioceptive | Yes
Smyth and Mason (1998) | DCD | Typical children | 5–8 | Chronological age, gender, IQ | Visual; Proprioceptive | Yes
Bonneh et al. (2008) | ASD | N.A. (case study) | 13 | N.A. (case study) | Visual; Auditory; Tactile | N.A. (case study)
Oberman and Ramachandran (2008) | ASD | Typical children | Mean age ASD group = 9.27 (SD = 1.3); Mean age control group = 9.27 (SD = 1.3) | Chronological age, VIQ, Matrices (boys only) | Visual; Auditory | Yes
Foss-Feig et al. (2010) | ASD | Typical children | 8–17 (Mean age ASD group = 12.6, SD = 2.6; Mean age control group = 12.09, SD = 2.2) | Chronological age, gender, IQ | Visual; Auditory | Yes
Klin et al. (2009) | ASD | Typical infants | 2 (Mean age ASD group = 1.97, SE = .25; Mean age control group = 2.1, SE = .32) | Chronological age | Visual; Auditory | Yes
Smith and Bennetto (2007) | ASD | Typical children | 12–19 | Chronological age, gender, FSIQ, receptive language | Visual; Auditory | Yes
Van der Smagt et al. (2007) | ASD | Typical adults | Mean age ASD group = 20.5 (SD = 3.2); Mean age control group = 20.7 (SD = 2.6) | Chronological age; gender; VIQ; PIQ; FSIQ | Visual; Auditory | No
Williams et al. (2004) | ASD | Typical children | 5–13 | Chronological age, gender | Visual; Auditory | No
Birch and Belmont (1964) | DD | Typical children (boys only) | 9–10 | Chronological age | Visual; Auditory | Yes
Birch and Belmont (1965) | DD | Typical children (boys only) | 9–10 | Chronological age | Visual; Auditory | Yes
Blau et al. (2009) | DD | Typical adults | Mean age DD group = 23.5 (SD = 3.7); Mean age control group = 26.8 (SD = 5.4) | Chronological age, educational level, handedness, IQ | Visual; Auditory | Yes
Hairston et al. (2005) | DD | Typical adults | 21–57 | Chronological age | Visual; Auditory | Yes

Box 12.1. A summary of DSM-IV (1994) diagnostic criteria for DCD

Criterion A:
Performance in daily activities that require motor coordination is substantially below that expected given the individual’s chronological age and measured intelligence. This may be seen in marked delays in achieving motor milestones (such as walking, crawling, sitting), dropping things, “clumsiness”, poor performance in sports, or poor handwriting.
Criterion B:
The disturbance in Criterion A significantly interferes with academic achievement or activities of daily living.
Criterion C:
The difficulties are not due to a general medical condition (such as cerebral palsy or hemiplegia) and the individual does not meet criteria for a Pervasive Developmental Disorder.
Criterion D:
If Mental Retardation is present, the motor difficulties are in excess of those usually associated with it.
N.B. Previously referred to as ‘clumsy child syndrome’ (Gubbay 1975; American Psychiatric Association 1980); now often referred to as ‘dyspraxia’. DCD is estimated to affect 2–5% of individuals (Lingam et al. 2009; APA 2000, respectively). Like most developmental disorders, more males are affected than females (5:1, Henderson and Hall 1982).
Difficulties in using visual and proprioceptive inputs concurrently would be expected to have a
particular impact on gross motor skills, including balancing, posture maintenance, and locomotion
(e.g. walking). To balance and locomote efficiently we must process visual information about the
body and external environment, proprioceptive information about limb and body position, and, on
the basis of this, initiate an appropriate corrective response (see Chapter 6 by Nardini and Cowie).
In addition to the role of proprioception and vision in producing accurate limb movements, the crossmodal calibration and integration of these sources of information is also a critical ingredient in
balance and locomotion (Lee and Aronson 1974; Lee and Lishman 1975; Nardini et al. 2008).
Difficulties in balance are apparent in a significant proportion of children with DCD, evidenced
in performance on standardized motor tests such as the Movement ABC, as well as from the out-
comes of cluster analysis studies such as those reported by Dewey and Kaplan (1994), Hoare
(1994), and Macnab et al. (2001). Visual information is important for accurate, effortless balance,
and improvements in balance are observed over typical development. Interestingly, the reduction
of sway/falls seen in the swinging-room paradigm with age in typical development suggests
that proprioception becomes better able to override the misleading visual information present in
that environment. Lee and Aronson (1974) have argued that this is due to the crossmodal tuning
of the proprioceptors by vision across development.
A small number of papers have considered posture and balance in DCD. All of these highlight
important differences in performance compared to typically developing peers. Wann et al. (1998),
for example, reported that children with DCD swayed more than typically developing peers when
standing upright with their eyes open and showed particularly poor balance relative to controls
when using proprioception alone. Wann et al. (1998) also examined responses to a moving visual
environment in a swinging-room procedure (cf. Lee and Aronson 1974). Here, the children with
DCD who had balance difficulties demonstrated greater postural responses to the swinging envi-
ronment than controls. These data indicate that DCD children with balance problems may be
over-reliant on visual information relative to their peers, and may also have difficulty in using
vision to calibrate proprioceptive control of balance (resulting in the poor balance that Wann
et al. observed when the DCD children’s eyes were closed).
In a relatively recent study, Cherng et al. (2007) examined the effect on balance of varying the
reliability of the proprioceptive/somatosensory input (this was achieved by varying the compli-
ance of the standing surface). When there was one dominant sensory input (eyes open, or fixed
foot support with eyes closed), those with DCD showed no differences in the stability of standing
balance in comparison to their peers, suggesting no significant difficulty in using the senses
individually. Group differences were observed, however, when sensory inputs were altered by
making the surface on which the children were standing less stable. As a result, Cherng et al.
(2007) argued that individual sources of sensory input are adequate in DCD, but that children
with DCD demonstrate a difficulty in retuning their sensorimotor control to cope with different
reliabilities of those sensory signals.
Walking, another sensorimotor task that requires integration of visual and proprioceptive inputs,
has also been studied in children with DCD. Deconinck et al. (2006) demonstrated that although a
group of children with DCD walked at a similar speed as their typical peers in light conditions,
they slowed their walking significantly in dark conditions (whereas typically developing children
did not). This again suggests that those with DCD have a stronger reliance than their peers on
visual input—in this case for locomotion.
The largest group of studies to have investigated multisensory functioning in DCD have
addressed the integration of proprioception and vision in reaching behaviours. They have used a
paradigm developed by von Hofsten and Rösblad (1988). In von Hofsten and Rösblad’s original
study, participants were asked to stick a pin in the underside of the table to match the location of
a spot that could be seen or felt on the table-top. This was achieved in three conditions in which
available sensory information about the target location was varied:
1. visual—participants viewed the target location via the dot on the table top
2. proprioceptive—participants touched the dot on the table-top with their non-reaching hand
with their eyes closed
3. visual and proprioceptive—participants viewed the target location while also touching the dot
on the table-top with their eyes open.
Just as for typically developing children, those with DCD made significantly more errors in the
proprioceptive-only condition, in line with studies demonstrating that vision provides more reli-
able information about location (Ernst and Banks 2002). However, in von Hofsten and Rösblad’s
study, and a number of other investigations (Mon-Williams et al. 1999; Schoemaker et al. 2001;
Sigmundsson et al. 1999; Smyth and Mason 1998), those with DCD performed significantly less
accurately than their peers across all conditions. Interestingly, however, a study by Mon-Williams
et al. (1999) indicated that those with DCD experienced a particular difficulty relative to controls
in the visual condition. In this condition, the children had to reach to a visual location using pro-
prioceptive guidance alone (i.e. they were forced to match spatial locations across the modalities).
280 DEVELOPMENTAL DISORDERS AND MULTISENSORY PERCEPTION

One possibility is that the children with DCD find this task difficult because they have not appro-
priately calibrated proprioceptive guidance to a visual spatial frame of reference. Of course, this
leaves open the question of whether impaired calibration of proprioception is due to upstream
deficits in proprioception, vision, or the processes that allow them to interact. We shall discuss
this question in more detail in the next section.
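The claim that vision provides the more reliable location estimate rests on the maximum-likelihood integration model tested by Ernst and Banks (2002), in which each cue is weighted by its reliability (inverse variance). A minimal sketch, using invented numbers rather than data from any of the studies discussed here:

```python
# Reliability-weighted (maximum-likelihood) cue combination, as in
# Ernst and Banks (2002). All numbers below are illustrative only.

def combine(cues):
    """Fuse (estimate, noise_sd) pairs, weighting each estimate by its
    reliability (inverse variance)."""
    weights = [1.0 / sd ** 2 for _, sd in cues]
    total = sum(weights)
    fused = sum(w * est for w, (est, _) in zip(weights, cues)) / total
    fused_sd = (1.0 / total) ** 0.5
    return fused, fused_sd

# Hypothetical visual and proprioceptive estimates of target position
# (cm), with vision the less noisy cue.
vision = (10.0, 1.0)
proprioception = (14.0, 2.0)
pos, sd = combine([vision, proprioception])
# pos = 10.8: the fused estimate is pulled towards the visual cue.
# sd ~= 0.89: the fused estimate is more precise than either cue alone.
```

On this model, an over-reliance on vision in DCD could be described as weights that are miscalibrated relative to the true reliabilities of the two cues.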
Overall, then, sensorimotor tasks requiring the integration of multisensory inputs (e.g. reaching,
balance, and locomotion) constitute areas of difficulty for those with DCD, predominantly in
difficult or novel situations, including when particular sensory channels (vision or proprioception)
are absent or degraded.

12.3.2 Unisensory origins of multisensory and sensorimotor impairments?

As we have already argued, reported unisensory impairments can both impact on multisensory
functioning and arise as a result of multisensory impairments. This relationship is likely to give
rise to cascading developmental effects on perceptual functioning as sensory impairments have
multiple downstream effects in other modalities and across modalities. It is now well accepted, for
example, that the absence of a sensory channel will have a downstream influence on multisensory
processing (Chapter 13 by Röder). Here, we review the research that has led scientists to argue
that the difficulties with sensorimotor abilities (subserved by multisensory processes) observed in
DCD are due to a concurrent unisensory impairment. Discussion of the relationship between
unisensory impairments and sensorimotor functioning has not typically considered developmen-
tal relationships between unisensory and multisensory impairments. We shall argue that this is an
important oversight.
One important unisensory account of the difficulties experienced by children with DCD has
focused on deficits in proprioceptive/kinaesthetic perception. One paradigm that has typically
identified a proprioceptive deficit is a non-visual arm placing task, which requires a participant to
actively move their arm to match the posture of the other arm (placed by the experimenter) in a
symmetrical arrangement across the body midline (see Fig. 12.1). This task has consistently high-
lighted substantial difficulties in those with DCD as compared to their peers (Cantell et al. 1994;
Hill 1997; Pratt 2009; Smyth and Mason 1997).
Others have investigated the view that a kinaesthetic deficit (i.e. a deficit of dynamic propriocep-
tion) causes DCD. Using their ‘kinaesthetic acuity test’, Laszlo and colleagues reported that chil-
dren with DCD were significantly less accurate in reporting which of their two arms was higher
than the other, thus showing poor ability to discriminate unseen limb position following passive
movement (Laszlo and Bairstow 1985; Laszlo et al. 1988; see Fig. 12.2). This paradigm has the
benefit of not requiring a motor response. Thus, any deficit is unlikely to be due to concurrent dif-
ficulties with motor control. However, a number of replications using this test have yielded ambig-
uous results (Hoare and Larkin 1991; Lord and Hulme 1987; Piek and Coleman-Carman 1995).
Perhaps the main difficulty in arguing that the primary deficit in DCD relates to proprioception
is that proprioceptive difficulties could be caused by developmentally earlier difficulties in senso-
rimotor functioning and in multisensory interactions. For instance, as we have already described,
it is frequently argued that visual tuning of proprioception is required for proprioceptive sensitiv-
ity to develop fully (Lee and Aronson 1974). This interpretation is corroborated by data showing
that people who have been blind from birth sway more when standing upright than sighted, but
blindfolded, individuals (Edwards 1946; see also Gori et al. 2010 regarding the visual tuning of
haptics). Thus, proprioceptive deficits may well be due to developmentally prior difficulties with
vision or with the crossmodal calibration process.

[Figure 12.1 key: LH and RH label the left and right hands; symbols in the original figure indicate a flat, upright hand with fingers and thumb extended; a fist; and a flat, horizontal hand with fingers and thumb extended.]
Fig. 12.1 Schematic representation of the positions used in the arm matching task (RH and LH
indicate which arm was placed by the examiner). All postures are shown in forwards profile, except
those marked with an asterisk, which are shown in sideways profile.

Fig. 12.2 Set-up of the kinaesthetic acuity task (Laszlo and Bairstow 1985). The child grasps the pegs
with his/her hands and must indicate which hand is higher after the hands are moved to the top of
the ramps by the experimenter. The complete set-up (including the child's hands) is covered
throughout the task.

Charles Hulme and colleagues (Hulme et al. 1982; Lord and Hulme 1987) have argued that
visual perceptual processing plays an important role in the difficulties experienced by those with
DCD. They have shown that children with DCD demonstrate deficits in a range of visual tasks,
including size constancy, visual length discrimination, and visual area discrimination. It is argued
that the visual deficit is primary because deficits in these judgements remain even when children
are not required to make a motor response to that information. Hulme and his
colleagues’ position has been further supported by a more recent meta-analysis, which indicates
that visuospatial deficits are strongly implicated in DCD (Wilson and McKenzie 1998). Since
visual perceptual ability is involved in most sensorimotor skills, dysfunction at this level of the
sensorimotor control hierarchy may have a knock-on effect, both in terms of the sensory infor-
mation available for on-line control and the crossmodal calibrations required for optimal senso-
rimotor control (see above). Such a difficulty would thus be predicted to have a developmental
impact affecting a broad variety of sensorimotor and other areas of development over time
(Bishop 1997; Hill 2010; Karmiloff-Smith 1998, 2009; Pratt 2009).
According to this account, a unisensory deficit has an impact on the development of the
multisensory processes involved in sensorimotor control. However, once again, there are some concerns
with this account, not the least of which is that, in a number of sensorimotor tasks, children with
DCD appear to rely more heavily than their peers on visual input (e.g. Deconinck et al. 2006; Wann
et al. 1998). There also remains the possibility that the visual discriminative impairments observed by
Hulme and his colleagues (Hulme et al. 1982; Lord and Hulme 1987) may have arisen through impov-
erished tuning of vision in the context of atypical multisensory interactions earlier in development.
Certainly, the atypical motor behaviours observed in DCD could give rise to such a context.
Thus unisensory accounts of sensorimotor impairments in DCD, whether couched in terms of a proprio-
ceptive (e.g. Laszlo et al. 1988) or a visual deficit (e.g. Lord and Hulme 1987), suffer from a difficulty
in tracing whether unisensory deficits themselves have their developmental origins in prior multi-
sensory impairments of sensorimotor control or crossmodal calibration. However, some recent
intervention studies have indicated that proprioceptive training can have significant, immediate,
and sustained effects on static and dynamic balance tasks as well as manual dexterity and ball skills
at a three-month follow-up (Sims et al. 1996a; see also Sims et al. 1996b; Sims and Morton 1998).
This suggests that there may be something to the unisensory kinaesthetic deficit account.
In summary, research on visual and proprioceptive/kinaesthetic perception has highlighted
deficits in children with DCD. However, unisensory deficit accounts of DCD suffer from an ina-
bility to rule out origins in terms of prior difficulties with sensorimotor control and multisensory
integration and calibration. In order to examine the plausibility of such accounts more thor-
oughly, prospective longitudinal studies investigating the developmental primacy of sensory
deficits will be important. Nonetheless, the research reported by Sims and colleagues (Sims et al.
1996a; see also Sims et al. 1996b; Sims and Morton 1998) highlights the importance of providing
specific kinds of proprioceptive experience in ameliorating a wide range of difficulties in DCD.

12.4 Autism spectrum disorder


Autism spectrum disorder (or ASD) is a developmental disorder characterized by impaired social
interaction and communication, as well as restricted, repetitive and stereotyped patterns of
behaviour, interests, and activities (American Psychiatric Association 2000; World Health
Organization 1993; for full diagnostic criteria see Box 12.2)2. There are wide variations in the

2 Note changes will be seen in the diagnosis of ASD in DSM-V (2013).


AUTISM SPECTRUM DISORDER 283

Box 12.2. A summary of DSM-IV (1994) diagnostic criteria for Autistic Disorder and Asperger's Disorder

Autistic Disorder
Criterion A:
Impairments in social interaction; communication; evidence of restricted and repetitive be-
haviours – example behaviours for each of these include, but are not restricted to:

Social interaction
Lack of peer relationships; lack of spontaneous seeking to share enjoyment, interests etc with
others; clear impairment in the use of nonverbal behaviours such as eye-to-eye gaze, facial
expression or body postures.

Communication
Delayed development (or lack) of spoken language; difficulty initiating or sustaining conver-
sations; stereotyped and repetitive use of language; lack of spontaneous pretend play.

Restricted, repetitive behaviours


High intensity preoccupation with one or more stereotyped and restricted patterns of interest;
inflexible adherence to specific, non-functional routines or rituals; stereotyped and repetitive
motor mannerisms (such as hand flapping).

Criterion B:
Onset before 3 years relating to delays or atypical functioning in one or more of: social interac-
tion; language for social communication; symbolic or imaginative play.

Criterion C:
Behaviours above are not explained by Rett’s Disorder or Childhood Disintegrative Disorder.

Asperger’s Disorder
Criterion A:
Impairments in social interaction; seen for example through lack of peer relationships; lack of
spontaneous seeking to share enjoyment, interests etc with others; clear impairment in the use
of nonverbal behaviours such as eye-to-eye gaze, facial expression or body postures.

Criterion B:
Restricted repetitive behaviours; seen for example through a high intensity preoccupation
with one or more stereotyped and restricted patterns of interest; inflexible adherence to spe-
cific, non-functional routines or rituals; stereotyped and repetitive motor mannerisms (such
as hand flapping).

Criterion C:
Behaviours have clinically significant impact in social, occupational, or other key aspects of
functioning.

Criterion D:
Language development is not clinically delayed.

Criterion E:
Cognitive development and adaptive behaviours are not clinically delayed.

Criterion F:
Behaviours above are not explained by another specific Pervasive Developmental Disorder or
Schizophrenia.
N.B. The autism spectrum can be referred to as autism, autistic disorder, or autism spectrum disorder.
ASD is estimated to affect 1% of the population (Baird et al. 2006). As with most developmental
disorders, more males than females are affected. Learning disability (IQ < 70) is estimated to affect
25% of those diagnosed on the autism spectrum (e.g. Friedman-Hill et al. 2010).

presentation of autism and it is therefore commonly accepted that autism is a spectrum disorder
that varies in severity between individuals.

12.4.1 Hypothesizing ‘core’ multisensory deficits in ASD


Multisensory processing in ASD has gained interest recently, with a number of researchers sug-
gesting that impairments in multisensory integration might be a core deficit in ASD (Bahrick
2010; Bahrick and Todd 2012; Foxe and Molholm 2009; Oberman and Ramachandran 2008).
Foxe and Molholm (2009) have suggested that difficulties in integrating or binding disparate
sensory elements (e.g. linking the movement of a bouncing ball with the sound of the ball hitting
the ground) could lead to confusion in individuals with ASD, which may result in classic symp-
toms of the disorder such as social withdrawal. On the other hand, Bahrick and her colleagues
(Bahrick 2010; Bahrick and Todd 2012) have put forward some more specific hypotheses regarding
the multisensory origins of ASD, suggesting that a deficit in sensitivity to intersensory redundancy
and, in Bahrick and Todd (2012), a deficit in engaging and disengaging attention to
intersensory redundancy in early infancy, could lead to a particular problem in orienting to social
stimuli (which Bahrick argues are intrinsically multisensory), thus explaining the later difficulties
children with ASD have with social interaction. This account makes use of the intersensory
redundancy hypothesis (Chapter 8 by Bahrick and Lickliter), according to which the detection
of, and attention to, amodal multisensory information (information that is redundantly presented
across more than one modality) is crucial for the typical development of perceptual and cognitive
processes.
A rather different multisensory account of ASD has been put forward by Oberman and
Ramachandran (2007, 2008). They suggest that an abnormality in the ‘mirror neuron system’
(Rizzolatti and Craighero 2004) is central to ASD. The mirror neuron system is a brain network

converting sensory stimuli concerning others’ actions into similar (mirrored) sensorimotor
representations in the observer. This system is believed, by some, to subserve understanding
of thoughts, emotions, and actions in others, and to be necessary for the typical development
of imitation, theory of mind, language, empathy, and recognition (Oberman and Ramachandran
2007). Interestingly from the point of view of the current review, Oberman and Ramachandran
(2007, 2008) hint that a deficit in the mirror neuron system might be related to a multisensory
deficit (stating that multisensory systems are ‘mirror-neuron like’). Unfortunately this idea is
not expanded upon and, crucially, Oberman and Ramachandran do not explain how and why a
multisensory integration deficit would imply a deficit in the (sensorimotor) mirror neuron sys-
tem, or vice versa. Perhaps more damaging for their account, there is now much debate about the
existence of a mirror neuron system abnormality in ASD, with some authors demonstrating nor-
mal development of imitation in individuals with ASD (see Fan et al. 2010; Hamilton 2009;
Southgate and Hamilton 2008). Nonetheless, Oberman and Ramachandran (2008) present some
data which may be indicative of a multisensory impairment (see below).
One further account, which bears similarities to both of the aforementioned accounts, is offered
by Gergely (2001), who argues that ASD represents a deficit in responding to ‘imperfect’ multi-
sensory contingencies. Gergely (2001) focuses particularly on visual-proprioceptive contingen-
cies, arguing that a typical process of developmental change in attention to such contingencies in
infancy does not occur in ASD. Infants who are developing typically undergo a shift in attention
from preferring perfect visual-proprioceptive contingencies, which arise when they see their own
body moving (see Chapter 5 by Bremner et al.; Bahrick and Watson 1985), to preferring imperfect
contingencies, which are apparent when another person moves in response to their own move-
ments (Gergely and Watson 1999). This 'contingency switch' is argued to take place in typically
developing children at around 3 months of age, and to be an important process underlying the
emergence of social-orienting behaviour (see Csibra and Gergely 2011). Gergely (2001) proposes
that the switch does not take place in individuals with ASD, explaining the emergence of atypical
social orienting in this group.

12.4.2 Evidence of multisensory deficits in ASD


Despite the emergence of these recent theories concerning the multisensory origins of ASD, there
is relatively little relevant empirical data so far, and certainly little evidence to distinguish between
the accounts described above. Nonetheless, a number of recent studies indicate that multisensory
impairments may represent a promising avenue of research in ASD (Bonneh et al. 2008; Foss-Feig
et al. 2010; Klin et al. 2009; Oberman and Ramachandran 2008; Russo et al. 2010; Smith and
Bennetto 2007). First-hand accounts of individuals with ASD also support the idea of multisensory
abnormalities in this group (e.g. O’Neill and Jones 1997). Some have argued that first-hand
accounts converge on the opinion that autistic perception is ‘monochannel’ (Bonneh et al. 2008).
In other words, in individuals with ASD, it is suggested that attention to one sensory modality can
impair perception in another.
Unusual sensory responses suggestive of ‘monochannel’ perception (hyper- and/or hyposensi-
tivity to individual sensory channels) have been reported retrospectively by observers from as early
as 6–12 months of age (Baranek 1999; Dawson et al. 2000; Freeman 1993) and are therefore one of
the earliest indicators of the disorder (O’Neill and Jones 1997). These unusual responses appear to
persist across the lifespan and have been reported in older children (Kientz and Dunn 1997;
Leekam et al. 2007), adolescents (Jones et al. 2009), and adults (Baron-Cohen et al. 2009; Crane
et al. 2009) with ASD. Such reports have also been confirmed by first-hand accounts of adults with
ASD (Grandin and Scariano 1986; Williams 1992). However, unusual sensory processing is not
included in the diagnostic criteria for ASD, primarily because similar reports are provided across a

range of developmental disorders (e.g. Miller et al. 2001; Rogers et al. 2003).3 We will discuss the
similarities in unusual sensory processing in ASD and DCD in the following section, and highlight
how this pattern of processing may be indicative of developmental disorder more generally, rather
than a specific indicator of a particular disorder. However, as noted above, the single case study
reported by Bonneh et al. (2008) has examined the multisensory basis of 'monochannel' perception
in an individual with ASD, and we report that here.
Given that descriptions of autistic perception as ‘monochannel’ could indicate increased crossmo-
dal extinction (or an exaggerated Colavita sensory dominance effect; see Colavita 1974; Koppen and
Spence 2007; Sinnett et al. 2007), Bonneh et al. (2008) investigated this possibility in the context of a
case-study of a 13-year-old boy (‘AM’) with a diagnosis of ASD. In their first study, AM was presented
with either unimodal (tactile, visual, or auditory stimuli) or bimodal (visual-auditory, visual-tactile,
or auditory-tactile) stimuli and was asked to identify how many and which modalities were presented
on each trial. AM demonstrated a striking inability to report bimodal stimuli (in comparison to
typical 8-year-olds who performed perfectly). Moreover, the child exhibited a hierarchy of extinction
in which visual stimuli extinguished concurrent tactile stimuli, and auditory stimuli extinguished
both visual and tactile stimuli. Further studies demonstrated that AM also had difficulty detecting
whether bimodal stimuli originated from a single location or two locations (e.g. visual flash to the
right and sound to the left), and also that he was unable to respond to a visual colour when an incon-
gruent colour name was presented simultaneously. These findings are in striking accordance with
AM’s report that ‘when I hear, my vision shuts down’ (Bonneh et al. 2008, p. 2).
The case study reported by Bonneh et al. (2008) is certainly intriguing, and points towards an
important avenue of research into multisensory processing in individuals with ASD. However, it
is important to recognize the limitations of this case study which demonstrated crossmodal
extinction in only one, relatively low-functioning (in terms of daily-life functioning) individual.
It will be important to determine whether this kind of crossmodal difficulty is also present in oth-
ers with ASD. At present, there is some evidence to suggest that this is the case. Van der Smagt
et al. (2007) and Foss-Feig et al. (2010) examined low-level influences of auditory information
on visual perception (sounds presented concurrently with visual flashes inducing illusory percep-
tions of additional flashes, cf. Shams et al. 2000). Using this task, Van der Smagt et al. (2007)
found a similar influence of audition on reports of visual percepts in high-functioning adults
with and without ASD. However, by varying parameters of the stimuli, Foss-Feig et al. (2010)
found that children with ASD were more likely than typically developing controls to show an
influence of an auditory beep on their visual percepts across a wider temporal window.
Researchers have also started to investigate a potential multisensory deficit in language process-
ing in ASD (see also the developmental dyslexia section below). Language processing can be seen
as a multisensory integration process, combining auditory and visual information in perceiving
speech and making links between audio–visual input, orthography and articulated speech sounds
in reading. Here, again, the research has revealed some promising findings, but also a number of
conflicting results. Using a speech-in-noise paradigm, which allowed the investigation of audio-
visual processing and lip reading in adolescents with and without ASD, Smith and Bennetto
(2007) reported that those with ASD were significantly poorer than their peers at lip reading, and
gained less benefit from visual information in an audiovisual speech-perception task. They con-
cluded that this performance might reflect a specific deficit in auditory–visual integration.
However, others have failed to find poor performance on similar audiovisual integration tasks.
Using speech synthesizer software and a ‘virtual’ head, Williams et al. (2004) presented speech

3 The importance of sensory symptoms in ASD will be recognised in DSM-V (2013).

stimuli to children with and without ASD in unimodal (visual, auditory) and bimodal conditions
(requiring multisensory integration). A group difference was only identified in the unimodal con-
ditions. An intervention study, along with computer modelling, led these authors to conclude that
children with ASD show normal integration of visual and auditory information.
Working from their premise that the mirror neuron system is impaired in ASD (see above),
Oberman and Ramachandran (2008) investigated the 'bouba-kiki effect' (Köhler 1929;
Ramachandran and Hubbard 2001). This effect arises when participants are asked to pair nonsense
words with shapes. Typical adults (and children) pair these stimuli up with remarkable agreement,
suggesting that participants perceive synaesthetic correspondences between the visual and auditory
stimuli (see Chapter 10 by Maurer et al. for research discussing the role of synaesthesia in develop-
ment; see also Spence 2011). In contrast, children with ASD perform at almost chance levels on this
task. Oberman and Ramachandran (2007, 2008) argue that this arises from poor multisensory inte-
gration (which they locate—perhaps controversially—to the mirror neuron system, see above).
Thus while a number of studies are suggestive of a deficit in multisensory integration in ASD
(Bonneh et al. 2008; Foss-Feig et al. 2010; Klin et al. 2009; Oberman and Ramachandran 2008;
Smith and Bennetto 2007, see Table 12.1), the number of negative results urges caution and fur-
ther research before accepting a core deficit account of ASD in terms of multisensory processing.
As with DCD, studies have yet to clarify the origins of multisensory atypicalities in terms of prior
difficulties with unisensory processing, sensorimotor control, and multisensory integration and
calibration.
However, at least one study has indicated atypicalities in multisensory processing in infants diag-
nosed with ASD (Klin et al. 2009). In an investigation of 2-year-olds' preferences to attend to
biological motion displays with an accompanying soundtrack, Klin et al. found that, whereas typically
developing 2-year-olds’ visual preferences were driven by the presence of appropriately oriented
biological motion in the displays, 90% of the looking in the ASD group of 2-year-olds was driven by
the amount of audiovisual (AV) synchrony present in the stimulus events; they preferred to look at
AV synchrony. Such early manifestations of atypical multisensory processing in children with ASD
indicates that future research into multisensory perception in ASD is certainly warranted. It is
important to note, however, that Klin et al.’s (2009) data is not entirely consistent with current
theories of multisensory disturbances in ASD (particularly those that implicate a difficulty in the
detection of, and attention to, intersensory redundancy; Bahrick 2010; Bahrick and Todd 2012).

12.5 Links between DCD and ASD: hypersensitivity, hyposensitivity, and atypical multisensory processing

Reports of hyper-/hyposensitivity to individual sensory channels across developmental disorders
led to a significant body of occupational therapists and other clinicians adopting a sensory
integration approach to the remediation of DCD, ASD, and other disorders. This has stemmed
predominantly from the work of Anna Jean Ayres (1979), who described ‘sensory integration
disorder’ (later referred to as sensory processing disorder, SPD), arguing that individuals with this
disorder experience difficulty in dealing with cues from sensory sources and using these to both
initiate and control behaviour. On this basis, Ayres and colleagues developed ‘sensory integration
therapy’, which aimed to provide proprioceptive, kinaesthetic, tactile, and vestibular stimulation
to those with DCD, ASD, and other disorders, with the aim of improving the sensory and
sensory-integration deficits that Ayres and others believed were the cause of these conditions.
While this approach remains popular in some areas of the world today, the small number of
reports evaluating the success of this intervention approach suggest that it is, at best, no better
than other therapeutic interventions (Hoehn and Baumeister 1994; Kaplan et al. 1993).

The disappointing results of sensory integration therapy might be partly due to a lack of a sys-
tematic study of the nature of the atypical sensory processes underlying hyper- and hyposensitivity
to sensory stimulation. Until recently, there have been few attempts to identify what aberrant
perceptual process may lead to these symptoms. However, there is the possibility that hypo- or
hypersensitivity could reflect a difficulty with multisensory integration.
B.E. Stein et al. (2009) recently suggested that symptoms of hyper- and hyposensitivity to unisen-
sory inputs may be due to atypical neural integration of separate sensory channels in the superior
colliculus (SC). Drawing from their extensive body of research on the neurophysiological responses
of neurons in cat SC (see Chapter 14 for a discussion of how multisensory integration develops in
the SC), Stein et al. (2009) suggest the development of responses to multisensory inputs in SC as a
model for examining sensory integration disorder, but stop short of explaining how an underdevel-
oped neural response in SC would lead to symptoms such as hyper- or hyposensitivity. One possi-
bility is that a breakdown of multisensory integration may lead to an increased reliance on (or
dominance/capture by) one sensory input, and corresponding extinction of responses to the other
(see Bonneh et al. 2008, described above). It seems logical to predict that an increased reliance on,
or capture by, one sense would lead to hypersensitivity, whereas reduced reliance on another sense
might lead to hyposensitivity to that modality of stimulation. Certainly, much more research is
needed to examine sensory processing abnormalities across disorders. We suggest, specifically, that
it will be informative to test whether such a relationship between crossmodal dominance, extinc-
tion, and specific patterns of hypo- and hypersensitivity exists in those with developmental disor-
ders. Bonneh et al.’s (2008) investigation in an individual with ASD makes a start in this regard.

12.6 Developmental dyslexia


Developmental dyslexia (DD) is diagnosed in those who show a substantial discrepancy between
their reading ability and intelligence despite adequate teaching. Individuals diagnosed with DD
may experience difficulties such as an inability to learn the alphabet, problems in distinguishing
between similar sounding words, the presence of writing errors (e.g. letter reversal), and very
poor spelling (see Box 12.3). DD has also been associated with deficits in oral and written lan-
guage acquisition (dysphasia and dysgraphia, respectively), mathematics, visuospatial ability,

Box 12.3. A summary of DSM-IV (1994) diagnostic criteria for DD


Criterion A:
Reading achievement, as measured by individually administered standardized tests of read-
ing accuracy or comprehension, is substantially below that expected given the individual’s
chronological age, measured intelligence and age-appropriate education.
Criterion B:
The disturbance in Criterion A significantly interferes with academic achievement or activities
of daily living that require reading skills.
Criterion C:
If a sensory deficit is present, the reading difficulties are in excess of those usually associated
with it.
N.B. DD is diagnosed in 3–6% of the population (Rutter et al. 2004), is found across cultures
(Paulesu et al. 2001) and persists into adulthood (Rutter et al. 2006).

motor coordination, and attention (Habib 2000). Genetic effects are moderately important in DD
(DeFries and Alarcon 1996) although environmental effects, such as orthography (the fact that
different languages have different writing systems, some of which are more regular than others;
e.g. Greek versus English) are also important (Furnes and Samuelsson 2010; Paulesu et al. 2001).

12.6.1 Multisensory processing difficulties in DD


Reading, the core deficit in DD, requires us to make rapid temporal parsings between (visual)
written letters and speech sounds. The multisensory nature of this task prompted a number of
early researchers of reading disorders to hypothesize and examine visual–auditory integration
deficits in children with reading difficulties (e.g. Birch and Belmont 1964; Critchley 1970).
For instance, Birch and Belmont (1964, 1965) reported that children with reading difficulties
were impaired at identifying crossmodal equivalence between auditory tap patterns and visual
patterns of dots in a line. They concluded that deficits in detecting equivalences between hearing
and vision contribute to reading difficulties.
However, later researchers noticed some important problems with Birch and Belmont’s test of
auditory–visual equivalence detection (Tallal and Stark 1982; Zurif and Carson 1970). Primarily,
it is possible that the children with reading difficulties were impaired on Birch and Belmont’s
task, not because of a difficulty with noticing equivalences across the modalities, but rather due
to a difficulty in perceiving temporal patterns within one or other modality or across both
modalities. Later studies have been equivocal in terms of supporting Birch and Belmont’s (1965)
interpretation. Snowling (1980) examined dyslexic children’s ability to make grapheme–
phoneme correspondences in the context of a recognition memory task involving pseudowords.
She observed that, compared to reading-age-matched controls, children with DD showed deficits
in recognizing across, but not within, modalities. On the other hand, Zurif and Carson (1970)
found that children with dyslexia were impaired at matching temporal patterns both crossmo-
dally and within modalities, and concluded that perceptual deficits in dyslexia could also be
due to an amodal difficulty with temporal information. Temporal processing accounts of DD
(e.g. J. Stein 2001; Tallal 1980; Temple 2002) have since become some of the more influential
explanations of the disorder. We shall return to discuss these accounts in the next section.
In one more recent study addressing the relationship between visual and auditory temporal
processing in DD, Hairston and colleagues (2005) have identified differences in multisensory
interactions in processing temporal information between adults with DD and controls. Using a
visual ‘temporal order judgement’ (TOJ) task in which participants were asked to judge which of
two visual stimuli had appeared first on a screen (either above or below a fixation cross), Hairston
et al. (2005) first examined the participants’ threshold stimulus onset asynchrony for this task.
Consistent with arguments for a temporal processing deficit (J. Stein 2001; Tallal 1980; Temple
2002), participants with DD had a much higher threshold for discriminating the temporal order
of the stimuli (i.e. they required longer temporal gaps between the stimuli in order to respond
appropriately to their temporal order). In a separate condition, the authors presented an interfer-
ing auditory stimulus (which was not spatially informative). When this was presented close
enough in time to the visual stimuli, all participants showed an improvement in their TOJs with
respect to the visual stimuli. This benefit is thought to reflect a temporal ‘ventriloquism’ of the
visual cues toward the auditory stimulus (Morein-Zamir et al. 2003). Hairston
et al.’s (2005) key finding was that the facilitation conferred by the auditory stimulus was greater
in the group with DD than in the control group. In addition, the temporal window over which
this facilitation occurred was greater for the group with DD. Hairston et al. (2005) concluded that
adults with DD have an expanded temporal window of multisensory integration for auditory and
visual information. They suggest that this expanded temporal window for processing multisensory
information may lead to an increased number of multisensory binding errors. More specifically,
they argue that an increase in the time taken to link visual and auditory information together (e.g.
when linking visual representations of orthography with corresponding auditory information
during reading) may lead to a higher number of reading errors, as well as a general slowing in
reading ability, in those with DD.
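The notion of a threshold stimulus onset asynchrony (SOA), and of how poorer temporal resolution raises it, can be sketched with a small simulation. This is purely illustrative: the cumulative-Gaussian observer model and the sigma values below are assumptions made for the sketch, not Hairston et al.’s (2005) analysis or data.

```python
import math

def p_correct(soa_ms, sigma_ms):
    """Probability of a correct temporal-order judgement at a given SOA,
    modelled as a cumulative Gaussian centred on zero. sigma_ms captures
    temporal resolution: a larger sigma means poorer discrimination
    (a 'wider' temporal window)."""
    return 0.5 * (1.0 + math.erf(soa_ms / (sigma_ms * math.sqrt(2))))

def threshold_soa(sigma_ms, criterion=0.75):
    """Smallest SOA (in ms) at which accuracy reaches the criterion level,
    found by scanning SOAs upward in 1 ms steps."""
    soa = 0
    while p_correct(soa, sigma_ms) < criterion:
        soa += 1
    return soa

# Hypothetical sigma values for illustration only: a control observer
# versus an observer with threefold-poorer temporal resolution.
control_threshold = threshold_soa(sigma_ms=30)
dd_threshold = threshold_soa(sigma_ms=90)
print(control_threshold, dd_threshold)  # the second threshold is ~3x larger
```

On this toy model, widening sigma both raises the TOJ threshold and broadens the range of SOAs over which two events are hard to order, which is one way to picture an ‘expanded temporal window’ of the kind Hairston et al. (2005) describe.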
Thus, a few studies have indicated that multisensory impairments exist in individuals with DD.
However, the argument that multisensory integration deficits make an important contribution to reading
disability (Birch and Belmont 1965) suffers from a paucity of empirical support, and also a lack of
consistency among the findings of studies of multisensory processing in DD. Whilst Snowling
(1980) has demonstrated, and Birch and Belmont (1965) have argued, that individuals with DD
have a particular difficulty in making links between visual and auditory stimuli, more recent
findings by Hairston and colleagues (2005) suggest that individuals with DD actually integrate
auditory–visual information more (or across a longer time-span) than controls. While it is
possible to see how both of these multisensory processing differences could lead to reading defi-
cits, they do not offer a consistent explanation of DD in terms of a core multisensory deficit. Next
we discuss the potential relationships between more accepted ‘core deficit’ accounts of DD and
multisensory functioning.

12.6.2 Other accounts of developmental dyslexia and their developmental relationship to multisensory impairments
Over the last three decades there has been a great deal of debate as to what represents the ‘core deficit’
in DD. Some theories argue that the disorder represents a specific phonological deficit (e.g. Pennington
et al. 1991), while others place it within a broader perceptual and/or sensorimotor domain. John
Stein (2001), for instance, suggests that DD is primarily due to difficulties in processing rapidly
occurring sensory stimuli, irrespective of modality. Meanwhile, Nicolson and Fawcett (1990) have
argued that sensorimotor difficulties rooted in the cerebellum are at the heart of the disorder.
The magnocellular deficit theory of dyslexia (J. Stein 2001; J. Stein and Walsh 1997) was pro-
posed as an explanation for reports of poor visual and auditory temporal processing in those with
dyslexia. It has been argued that a unifying explanation of this performance profile relates to a
specific biological atypicality in the magnocellular neurons in the sensory areas of the central
nervous system (as distinct from parvocellular neurons). In the visual system, magnocellular
neurons are particularly involved in the perception of visual movement, depth, and small differ-
ences in brightness (low-contrast black and white information). Although magnocellular neurons
do not form a distinct pathway in the auditory system (as they do in the visual system), J. Stein
(2001) also argues that a magnocellular deficit can explain auditory processing difficulties reported
in some individuals with DD, including difficulties with rapid auditory temporal processing
(Tallal and Piercy 1973) and auditory frequency discrimination (e.g. Ahissar et al. 2000; McAnally
and Stein 1996). It is claimed that this view explains additional differences in visual processing,
binocular control, vergence, visual crowding, and visuospatial attention reported in groups of
individuals with DD (e.g. Hari et al. 2001; Spinelli et al. 2002; J. Stein 2001; J. Stein and Fowler
1993; J. Stein and Walsh 1997). The deficits entailed by the magnocellular theory of DD could
certainly go some way to explaining difficulties with the multisensory task of reading, as reading
requires us to make rapid temporal translations between visual orthography and auditory pho-
nology. A magnocellular impairment could also underlie some of the multisensory deficits
described in the previous section. For instance, less rapid temporal processing of stimuli (audi-
tory or visual) could result in multisensory integration of auditory and visual information occur-
ring across a wider temporal window (Hairston et al. 2005).
However, while there is some neurophysiological evidence to support the magnocellular deficit
theory of DD in the form of structural and functional brain abnormalities (Eden et al. 1996;
Livingstone et al. 1991), there is now a growing body of evidence against this view, perhaps most
critical being those studies showing that, whilst magnocellular system deficits are more prevalent
in individuals with DD than in the general population, such deficits are not nearly as consistent as
phonological deficits (Ramus et al. 2003a,b).
An additional theory of DD, the ‘automatization’, or ‘cerebellar deficit’, theory was proposed
by Nicolson and Fawcett (1990) following their observation that individuals with dyslexia experi-
enced difficulty in dual tasks involving motor skills, such as balancing and counting at the same
time. In this account, reading deficits are attributed to ‘an inability to become completely fluent
in cognitive and motor skills’ (Fawcett and Nicolson 1992, p. 507), with the cerebellum and asso-
ciated systems being specifically implicated. Further work reported by Nicolson, Fawcett, and
colleagues has documented a broad range of motor deficits in children and adults with dyslexia
(e.g. Fawcett et al. 1996; Stoodley et al. 2005, 2006), as well as suggesting atypicalities in the cere-
bellum in adults with dyslexia (Laycock et al. 2008). However, again this theory suffers from the
limitation that motor difficulties only affect a small subset of individuals with DD (e.g. Ramus
et al. 2003a,b).
It is a little difficult to see how the cerebellar deficit theory could explain multisensory abnor-
malities either specific to reading abilities or more generally. Indeed, there appears to be little
evidence for a causal link between motor impairments and reading difficulties (Ramus et al.
2003b). However, an alternative possibility is that motor impairments and reading difficulties
may arise as a result of a more general multisensory vulnerability. As we saw earlier, multisensory
processes are at the heart of sensorimotor control and may play an important role in other disor-
ders with a motor component, such as DCD. Indeed, Ramus et al. (2003b) make a parallel argu-
ment to this, suggesting that DD may represent a general sensorimotor vulnerability, which
manifests itself in different ways across the population of affected individuals.
However, the most widely accepted account of DD holds that phonological processing
difficulties constitute the ‘core deficit’ of the disorder. According to this account, it is poor detec-
tion and/or discrimination of speech sounds that leads to the difficulties observed in dyslexia.
These difficulties have a striking impact on reading and writing, since both depend on good pho-
nological processing (Muter et al. 2004), and the greatest functional impact is seen in speakers of
languages with more irregular orthographies (i.e. less consistent mappings between speech sounds
and the written word). However, even those whose native language has a regular orthography show a ‘dyslexic’
pattern of brain activation in phonological processing tasks (e.g. Paulesu et al. 2001). Furthermore,
pre-readers at risk for dyslexia show poor phonological learning and awareness skills (Carroll and
Snowling 2004), such as slow learning of rhymes.
While there is no doubt that more general sensory processing abnormalities and sensorimotor
problems are evident in at least a subgroup of those with DD, the bulk of the evidence supports
the view that a phonological deficit is sufficient to cause DD, while additional sensory and senso-
rimotor difficulties are seen in some individuals (Ramus et al. 2003a,b).
A recent neuroimaging study by Blau et al. (2009) has attempted to elucidate the relationship
between the phonological deficits observed in DD and the multisensory impairments in making
links between phonology and orthography when reading. Blau et al. (2009) used fMRI to examine
brain activation in adults with DD in response to speech sounds, visual letters, and multisensory
speech–letter combinations (congruent and incongruent). Replicating previous findings, they
found that adults with DD showed underactivation (relative to controls) of the left superior tem-
poral cortex in response to speech sounds. However, they also found an absence of an audiovisual
congruency effect in the superior temporal cortex, such that, in contrast to controls, no difference
was observed between blood-oxygen-level-dependent (BOLD) responses to congruent and incongruent
speech–letter combinations. The authors conclude that there is an observable neural impairment
in audiovisual multisensory integration in adults with DD, and furthermore demonstrate a strong
relationship between the multisensory impairment and the simple auditory processing impair-
ment that was also observed. Whilst it is difficult to identify a direction of causality, there seems
to be a strong relationship between phonological and multisensory impairments in adults with
DD. Further developmental research will elucidate how these impairments arise and whether one
or the other is the ‘core’ deficit.
While these findings suggest that examining multisensory involvement in DD offers a promising
avenue of research, much work remains to be done in determining the origins of multisensory
impairments (in behaviour and the brain) and their relationship to DD. One particularly important question con-
cerns whether atypical multisensory processing in DD is the cause or result of reading/phono-
logical abnormalities. The majority of studies of multisensory processing in DD have involved
adults, so we have a poor picture of the developmental relationship between multisensory proc-
esses and the symptoms of DD. Further research on multisensory integration in children with DD
and pre-reading children at risk of DD will help to clarify whether multisensory integration is a central
deficit in DD.

12.7 Conclusion
The study of atypical development is useful not only in terms of the treatment of developmental
disorders, but also because it can provide a point of comparison from which to investigate the
processes involved in the emergence of certain skills and behaviours in typical development.
Furthermore, it can also help us to explain individual differences in the population (Elsabbagh
and Johnson 2010). As we pointed out at the start of this chapter, it is surprising to note how little
research has examined sensory and multisensory impairments in developmental disorders. The
three disorders covered in this chapter—DCD, ASD, and DD—have all been proposed at one
point or another to involve atypical sensory processing. Indeed, clinicians would, for the most
part, argue strongly that sensory difficulties are central to at least the first two of the three disor-
ders considered (i.e. DCD and ASD).
In conclusion, this review raises a number of points regarding our understanding of sensory
processing in developmental disorders. Not least, it shows that there are some distinct similarities
in sensory responses that exist across disorders. Hypo- and hypersensitivity to sensory information
are found in ASD, DCD, ADHD, and Williams syndrome, among others. Such responses may
be indicative of disordered multisensory integration (resulting in dominance or extinction of
responses to particular sensory channels). Sensorimotor impairment has been identified across a
similarly large range of developmental disorders (including all of those reported here). As we have
argued earlier, much research on DCD points to an interpretation of sensorimotor impairments
in terms of atypical interactions between the sensory modalities used to guide a range of different
actions.
Thus it is our contention that further multisensory research in developmental disorders is
strongly warranted. A large body of literature (see the other chapters in this volume) now indicates
that multisensory processes have an extended development through infancy and well into child-
hood. Such abilities, if perturbed at an early stage in development, could lead to significant down-
stream impairments, not just in multisensory processes, but also in unisensory and cognitive
abilities. It is certainly a limitation that most studies of sensory processing difficulties in children
with developmental disorders examine processing in one sense at a time. Because unisensory
processing is influenced by multisensory processes and vice versa, developmental disorders
research requires studies to trace the trajectories of emergence of problems in processing the
senses on their own and in combination. Such studies, we hope, will provide answers not just to
the question of how sensory impairments emerge in atypical development, but also help us
understand the ontogeny of multisensory perception more generally.

Acknowledgements
The authors would like to thank the editors of this volume for their insightful comments on an
earlier version of this manuscript. AJB would also like to acknowledge financial support for this
work from the European Research Council under the European Community’s Seventh Framework
Programme (FP7/2007–2013)/ERC Grant agreement no. 241242.

References
Ahissar, M., Protopapas, A., Reid, M., and Merzenich, M.M. (2000). Auditory processing parallels reading
abilities in adults. Proceedings of the National Academy of Sciences USA, 97, 6832–37.
American Psychiatric Association (1980). Diagnostic and statistical manual of mental disorders, 3rd edn
(DSM-III). Washington, DC.
American Psychiatric Association (2000). Diagnostic and statistical manual of mental disorders, text revision
(DSM-IV-TR). Washington, DC.
Ashwin, E., Ashwin, C., Tavassoli, T., Chakrabarti, B., and Baron-Cohen, S. (2009). Eagle-eyed visual acuity
in autism. Biological Psychiatry, 66, e23–24.
Asperger, H. (1944/1991). ‘Autistic psychopathy’ in childhood. In Autism and Asperger syndrome (ed. U.
Frith), pp. 37–92. Cambridge University Press, Cambridge.
Ayres, A.J. (1979). Sensory integration and the child. Western Psychological Services, Los Angeles.
Bahrick, L.E. (2010). Intermodal perception and selective attention to intersensory redundancy:
Implications for typical social developmental and autism. In The Wiley-Blackwell handbook of infant
development, 2nd edn. (eds. J.G. Bremner, and T.D. Wachs), pp. 120–66. Wiley-Blackwell, Oxford, UK.
Bahrick, L.E., and Watson, J.S. (1985). Detection of intermodal proprioceptive-visual contingency as a
potential basis of self-perception in infancy. Developmental Psychology, 21, 963–73.
Bahrick, L.E., and Todd, J.T. (2012). Multisensory processing in autism spectrum disorders: Intersensory
processing disturbance as a basis for atypical development. In The new handbook of multisensory
processes (ed. B.E. Stein). MIT Press, Cambridge, MA.
Bahrick, L.E., Lickliter, R., Castellanos, I., and Vaillant-Molina, M. (2010). Increasing task difficulty
enhances effects of intersensory redundancy: testing a new prediction of the Intersensory Redundancy
Hypothesis. Developmental Science, 13, 731–37.
Baird, G., Simonoff, E., Pickles, A., Chandler, S., Loucas, T., Meldrum, D., and Charman, T. (2006). Prevalence of
disorders of the autism spectrum in a population cohort of children in South Thames: The Special
Needs and Autism Project (SNAP). Lancet, 368, 179–81.
Baranek, G.T. (1999). Autism during infancy: a retrospective video analysis of sensory-motor and social
behaviors at 9–12 months of age. Journal of Autism and Developmental Disorders, 29, 213–24.
Baranek, G.T., David, F.J., Poe, M.D., Stone, W.L., and Watson, L.R. (2006). Sensory experiences
questionnaire: discriminating sensory features in young children with autism, developmental delays,
and typical development. Journal of Child Psychology and Psychiatry, 47, 591–601.
Baron-Cohen, S., Ashwin, E., Ashwin, C., Tavassoli, T., and Chakrabati, B. (2009). Talent in autism: hyper-
systemizing, hyper-attention to detail and sensory hypersensitivity. Philosophical Transactions of the
Royal Society B: Biological Sciences, 364, 1377–83.
Birch, H.G., and Belmont, L. (1964). Auditory-visual integration in normal and retarded readers. Annals of
Dyslexia, 15, 48–96.
Birch, H.G., and Belmont, L. (1965). Auditory-visual integration, intelligence and reading ability in school
children. Perceptual and Motor Skills, 20, 295–305.
Bishop, D.V.M. (1997). Uncommon understanding: development and disorders of language comprehension in
children. Psychology Press, Hove.
Blakemore, S.-J., Tavassoli, T., Calò, S., et al. (2006). Tactile sensitivity in Asperger syndrome. Brain and
Cognition, 61, 5–13.
Blau, V., van Atteveldt, N., Ekkebus, M., Goebel, R., and Blomert, L. (2009). Reduced neural integration
of letters and speech sounds links phonological and reading deficits in adult dyslexia. Current Biology,
19, 503–508.
Bonneh, Y.S., Belmonte, M.K., Pei, F., et al. (2008). Cross-modal extinction in a boy with severely autistic
behaviour and high verbal intelligence. Cognitive Neuropsychology, 25, 635–52.
Calvert, G., Spence, C., and Stein, B.E. (2004). The handbook of multisensory processes. MIT Press,
Cambridge, MA.
Cantell, M.H., Smyth, M.M., and Ahonen, T.K. (1994). Clumsiness in adolescence: educational, motor
and social outcomes of motor delay detected at five years. Adapted Physical Activity Quarterly, 11,
115–29.
Carroll, J.M., and Snowling, M.J. (2004). Language and phonological skills in children at high risk of
reading difficulties. Journal of Child Psychology and Psychiatry, 45, 631–40.
Cherng, R.J., Hsu, Y.W., Chen, Y.J., and Chen, J.Y. (2007). Standing balance of children with developmental
coordination disorder under altered sensory conditions. Human Movement Science, 26, 13–26.
Colavita, F.B. (1974). Insular-temporal lesions and vibrotactile temporal pattern discrimination in cats.
Physiology and Behavior, 12, 215–18.
Crane, L., Goddard, L., and Pring, L. (2009). Sensory processing in adults with autism spectrum disorders.
Autism: The International Journal of Research and Practice, 13, 215–28.
Critchley, M. (1970). The dyslexic child. Heinemann, London.
Csibra, G., and Gergely, G. (2011). Natural pedagogy as evolutionary adaptation. Philosophical Transactions
of the Royal Society B, 366, 1149–57.
Dare, M.T., and Gordon, N. (1970). Clumsy children: a disorder of perception and movement
organisation. Developmental Medicine and Child Neurology, 12, 178–85.
Dawson, G., and Watling, R. (2000). Interventions to facilitate auditory, visual, and motor integration in
autism: A review of the evidence. Journal of Autism and Developmental Disorders, 30, 415–21.
Dawson, G., Osterling, J., Meltzoff, A.N., and Kuhl, P. (2000). Case study of the development of an infant
with autism from birth to 2 years of age. Journal of Applied Developmental Psychology, 21, 299–313.
Deconinck, F.J.A., De Clercq, D., Savelsbergh, G.J.P., et al. (2006). Visual contribution to walking in
children with developmental coordination disorder. Child: Care, Health and Development, 32, 711–22.
DeFries, J.C., and Alarcon, M. (1996). Genetics of specific reading disability, Mental Retardation and
Developmental Disabilities Research Reviews, 2, 39–47.
Dewey, D., and Kaplan, B.J. (1994). Subtyping of developmental motor deficits. Developmental
Neuropsychology, 10, 265–84.
Dunn, W. (1999). Sensory profile: user’s manual. The Psychological Corporation, San Antonio.
Dunn, W., and Bennett, D. (2002). Patterns of sensory processing in children with attention deficit
hyperactivity disorder. OTJR-Occupation Participation and Health, 22, 4–15.
Eden, G.F., VanMeter, J.W., Rumsey, J., Maisog, J.M., Woods, R.P., and Zeffiro, T.A. (1996). Abnormal
processing of visual motion in dyslexia revealed by functional brain imaging. Nature, 382, 66–69.
Edwards, A. (1946). Body sway and vision. Journal of Experimental Psychology, 36, 526–35.
Elsabbagh, M., and Johnson, M.H. (2010). Getting answers from babies about autism. Trends in Cognitive
Sciences, 14, 81–87.
Ernst, M.O., and Banks, M.S. (2002). Humans integrate visual and haptic information in a statistically
optimal fashion. Nature, 415, 429–33.
Ernst, M.O., and Bülthoff, H.H. (2004). Merging the senses into a robust percept. Trends in Cognitive
Sciences, 8, 162–69.
Fan, Y.T., Decety, J., Yang, C.Y., Liu, J.L., and Cheng, Y. (2010). Unbroken mirror neurons in autism
spectrum disorders. Journal of Child Psychology and Psychiatry, 51, 981–88.
Fawcett, A.J., and Nicolson, R.I. (1992). Automatisation deficits in balance for dyslexic children. Perceptual
and Motor Skills, 75, 507–29.
Fawcett, A.J., Nicolson, R.I., and Dean, P. (1996). Developmental dyslexia: the cerebellar deficit hypothesis.
Trends in Neurosciences, 24, 508–511.
Filipek, P.A., Accardo, P.J., Baranek, G.T., et al. (1999). The screening and diagnosis of autistic spectrum
disorders. Journal of Autism and Developmental Disorders, 29, 439–84.
Foss-Feig, J.H., Kwakye, L.D., Cascio, C.J., et al. (2010). An extended multisensory temporal binding
window in autism spectrum disorders. Experimental Brain Research, 203, 381–89.
Foxe, J.J., and Molholm, S. (2009). Ten years at the Multisensory Forum: musings on the evolution of a
field. Brain Topography, 21, 149–54.
Foxe, J.J., and Schroeder, C.E. (2005). The case for feedforward multisensory convergence during early
cortical processing. Neuroreport, 16, 419–23.
Freeman, B. (1993). The syndrome of autism: update and guidelines for diagnosis. Infants and Young
Children, 6, 1–11.
Friedman-Hill, S.R., Wagman, M.R., Gex, S.E., Pine, D.S., Leibenluft, E., and Ungerleider, L. G. (2010).
What does distractibility in ADHD reveal about mechanisms for top-down attentional control?
Cognition, 115, 93–103.
Frith, C. (2004). Is autism a disconnection disorder? The Lancet Neurology, 3, 577.
Furnes, B., and Samuelsson, S. (2010). Predicting reading and spelling difficulties in transparent and
opaque orthographies: a comparison between Scandinavian and US/Australian children. Dyslexia, 16,
119–42.
Gergely, G. (2001). The object of desire: ‘Nearly, but clearly not, like me’: contingency preference in normal
children versus children with autism. Bulletin of the Menninger Clinic, 65, 411–26.
Gergely, G., and Watson, J.S. (1999). Early socio-emotional development: contingency perception and
social-biofeedback model. In Early social cognition: Understanding others in the first months of life
(ed. P. Rochat), pp. 101–36. Lawrence Erlbaum Associates, Hillsdale, NJ.
Ghazanfar, A.A., and Schroeder, C.E. (2006). Is neocortex essentially multisensory? Trends in Cognitive
Sciences, 10, 278–85.
Gordon, N. (1982). The problems of the clumsy child. Health Visitor, 55, 54–57.
Gori, M., Del Viva, M., Sandini, G., and Burr, D. (2008). Young children do not integrate visual and haptic
information. Current Biology, 18, 694–98.
Gori, M., Sandini, G., Martinoli, C., and Burr, D. (2010). Poor haptic orientation discrimination in
non-sighted children may reflect disruption of cross-sensory calibration. Current Biology, 20, 223–25.
Gothelf, D., Farber, N., Raveh, E., Apter, A., and Attias, J. (2006). Hyperacusis in Williams syndrome:
characteristics and associated neuroaudiologic abnormalities. Neurology, 66, 390–95.
Grandin, T., and Scariano, M. (1986). Emergence: Labeled Autistic. Arena Press, Novato, CA.
Green, D., Baird, G., and Sugden, D. (2006). A pilot study of psychopathology in developmental
coordination disorder. Child: Care, Health and Development, 32, 741–50.
Gubbay, S. (1975). The clumsy child—a study of developmental apraxia and agnosic ataxia. W.B. Saunders,
London.
Habib, M. (2000). The neurological basis of developmental dyslexia: an overview and working hypothesis.
Brain, 123, 2373–99.
Hairston, W.D., Burdette, J.H., Flowers, D.L., Wood, F.B., and Wallace, M.T. (2005). Altered temporal profile
of visual-auditory multisensory interactions in dyslexia. Experimental Brain Research, 166, 474–80.
Hamilton, A. F. (2009). Goals, intentions and mental states: challenges for theories of autism. Journal of
Child Psychology and Psychiatry, 50, 881–92.
Happé, F., and Frith, U. (1994). Autism: Beyond theory of mind. Cognition, 50, 115–32.
Happé, F., and Frith, U. (2006). The weak coherence account: detail-focused cognitive style in autism
spectrum disorders. Journal of Autism and Developmental Disorders, 36, 5–25.
Hari, R., Renvall, H., and Tanskanen, T. (2001). Left minineglect in dyslexic adults. Brain, 124, 1373–80.
Heath, S.M., Bishop, D.V.M., Hogben, J.H., and Roach, N.W. (2006). Psychophysical indices of perceptual
functioning in dyslexia: a psychometric analysis. Cognitive Neuropsychology, 23, 905–29.
Henderson, S., and Hall, D. (1982). Concomitants of clumsiness in young school-children. Developmental
Medicine and Child Neurology, 24, 448–60.
Hill, E.L. (1997). An investigation of the motor deficits in developmental coordination disorder and specific
language impairment. Unpublished thesis (PhD), University of Cambridge.
Hill, E.L. (2001). The nonspecific nature of specific language impairment: a review of the literature with
regard to concomitant motor impairments. International Journal of Language and Communication
Disorders, 36, 149–71.
Hill, E.L. (2010). Motor difficulties in specific language impairment: evidence for the Iverson account? – a
commentary on Iverson’s ’Developing language in a developing body: The relationship between motor
development and language development’. Journal of Child Language, 37, 229–61.
Hill, E.L., and Bishop, D.V.M. (1998). A reaching test reveals weak hand preference in specific language
impairment and developmental coordination disorder. Laterality, 3, 295–310.
Hill, E.L., and Frith, U. (2003). Understanding autism: insights from mind and brain. Philosophical
Transactions of the Royal Society B: Biological Sciences, 358, 281–89.
Hoare, D. (1994). Subtypes of developmental coordination disorder. Adapted Physical Activity Quarterly,
11, 158–69.
Hoare, D., and Larkin, D. (1991). Kinaesthetic abilities of clumsy children. Developmental Medicine and
Child Neurology, 33, 671–78.
Hobson, J. A., and Minshew, N. (2008). Sensory sensitivities and performance on sensory perceptual
tasks in high-functioning individuals with autism. Journal of Autism and Developmental Disorders, 38,
1485–98.
Hoehn, T.P., and Baumeister, A.A. (1994). A critique of the application of sensory integration therapy to
children with learning disablities. Journal of Learning Disabilities, 27, 338–50.
Hulme, C., Smart, A., and Moran, G. (1982). Visual perceptual deficits in clumsy children.
Neuropsychologia, 20, 475–81.
Johnson, M. H. (2005). Developmental Cognitive Neuroscience. Wiley-Blackwell, Oxford.
Jones, C., Happé, F., Baird, G., et al. (2009). Auditory discrimination and auditory sensory behaviours in
autism spectrum disorders. Neuropsychologia, 47, 2850–58.
Kanner, L. (1943). Autistic disturbances of affective contact. Nervous Child, 2, 217–50.
Kaplan, B.J., Polatajko, H.J., Wilson, B.N., and Faris, P.D. (1993). Reexamination of sensory integration
treatment. Journal of Learning Disabilities, 26, 342–47.
Karmiloff-Smith, A. (1998). Development itself is the key to understanding developmental disorders.
Trends in Cognitive Sciences, 2, 389–98.
Karmiloff-Smith, A. (2009). Nativism versus neuroconstructivism: Rethinking the study of developmental
disorders. Developmental Psychology, 45, 56–63.
Kientz, M.A., and Dunn, W. (1997). A comparison of the performance of children with and without autism
on the sensory profile. American Journal of Occupational Therapy, 51, 530–37.
Klin, A., Lin, D.J., Gorrindo, P., Ramsay, G., and Jones, W. (2009). Two-year-olds with autism orient to
non-social contingencies rather than biological motion. Nature, 459, 257–61.
Köhler, W. (1929). Gestalt psychology. Liveright, New York.
Koppen, C., and Spence, C. (2007). Audiovisual asynchrony modulates the Colavita visual dominance
effect. Brain Research, 1186, 224–32.
Kronbichler, M., Hutzler, F., and Wimmer, H. (2002). Dyslexia: verbal impairments in the absence of
magnocellular impairments. Neuroreport, 13, 617–20.
Laszlo, J.I., and Bairstow, P.J. (1985). Test of kinaesthetic sensitivity. Holt, Rinehart and Winston Ltd., Eastbourne.
Laszlo, J.I., Bairstow, P.J., Bartrip, J., and Rolfe, U.T. (1988). Clumsiness or perceptuo-motor dysfunction?
In Cognition and action in skilled behaviour (eds. A. Colley and J. Beech), pp. 293–309. Elsevier Science
Publishers B.V., North Holland.
Laufer, Y., Ashkenazi, T., and Josman, N. (2008). The effects of a concurrent cognitive task on the postural
control of young children with and without developmental coordination disorder. Gait and Posture,
27, 347–51.
Laycock, S.K., Wilkinson, I.D., Wallis, L.I., et al. (2008). Cerebellar volume and cerebellar metabolic
characteristics in adults with dyslexia. Annals of the New York Academy of Sciences, 1145, 222–36.
Lee, D.N. (1980). The optic flow field: the foundation of vision. Philosophical Transactions of the Royal
Society London B, 290, 169–79.
Lee, D.N., and Aronson, E. (1974). Visual proprioceptive control of standing in human infants. Perception
and Psychophysics, 15, 529–32.
Lee, D.N., and Lishman, J.R. (1975). Visual proprioceptive control of stance. Journal of Human Movement
Studies, 1, 87–95.
Leekam, S.R., Nieto, C., Libby, S.J., Wing, L., and Gould, J. (2007). Describing the sensory abnormalities of
children and adults with autism. Journal of Autism and Developmental Disorders, 37, 894–910.
Lingam, R., Hunt, L., Golding, J., Jongmans, M., and Emond, A. (2009). Prevalence of developmental
coordination disorder using the DSM-IV at 7 years of age: a UK population-based study. Pediatrics,
123, e693–e700.
Livingstone, M.S., Rosen, G.D., Drislane, F.W., and Galaburda, A.M. (1991). Physiological and anatomical
evidence for a magnocellular defect in developmental dyslexia. Proceedings of the National Academy of
Sciences USA, 88, 7943–47.
Lord, R., and Hulme, C. (1987). Kinaesthetic sensitivity in normal and clumsy children. Developmental
Medicine and Child Neurology, 29, 720–25.
Macnab, J.J., Miller, L.T., and Polatajko, H.J. (2001). The search for subtypes of DCD: Is cluster analysis the
answer? Human Movement Science, 20, 49–72.
Mari, M., Castiello, U., Marks, D., Marraffa, C., and Prior, M. (2003). The reach-to-grasp movement in children
with autism spectrum disorder. Philosophical Transactions of the Royal Society Series B, 358, 393–404.
McAnally, K.I., and Stein, J.F. (1996). Auditory temporal coding in dyslexia. Proceedings of the Royal
Society of London B, 263, 961–65.
Miller, L.J., Reisman, J.E., McIntosh, D.N., and Simon, J. (2001). An ecological model of sensory
modulation: performance of children with fragile X syndrome, autistic disorder, attention-deficit/
hyperactivity disorder, and sensory modulation dysfunction. In Understanding the nature of sensory
integration with diverse populations (eds. S. Smith-Roley, E. Blanche, and R. Schaaf), pp. 57–88.
Therapy Skill Builders, San Antonio, TX.
Milne, E., Swettenham, J., Hansen, P., Campbell, R., Jeffries, H., and Plaisted, K. (2002). High motion
coherence thresholds in children with autism. Journal of Child Psychology and Psychiatry, 43, 255–63.
Mon-Williams, M., Wann, J.P., and Pascal, E. (1999). The integrity of visual-proprioceptive mapping in
developmental coordination disorder. Developmental Medicine and Child Neurology, 41, 247–54.
Morein-Zamir, S., Soto-Faraco, S., and Kingstone, A. (2003). Auditory capture of vision: examining
temporal ventriloquism. Brain Research: Cognitive Brain Research, 17, 154–63.
Mottron, L., Dawson, M., Soulières, I., Hubert, B., and Burack, J. (2006). Enhanced perceptual functioning
in autism: an update, and eight principles of autistic perception. Journal of Autism and Developmental
Disorders, 36, 27–43.
Muter, V., Hulme, C., Snowling, M.J., and Stevenson, J. (2004). Phonemes, rimes, vocabulary, and
grammatical skills as foundations of early reading development: evidence from a longitudinal study.
Developmental Psychology, 40, 665–81.
Nardini, M., Jones, P., Bedford, R., and Braddick, O. (2008). Development of cue integration in human
navigation. Current Biology, 18, 689–93.
298 DEVELOPMENTAL DISORDERS AND MULTISENSORY PERCEPTION

Neil, P.A., Chee-Ruiter, C., Scheier, C., Lewkowicz, D.J., and Shimojo, S. (2006). Development of
multisensory spatial integration and perception in humans. Developmental Science, 9, 454–64.
Nicolson, R.I., and Fawcett, A.J. (1990). Automaticity: a new framework for dyslexia research? Cognition,
30, 159–82.
Oberman, L.M., and Ramachandran, V.S. (2007). The simulating social mind: the role of the mirror
neuron system and simulation in the social and communicative deficits of autism spectrum disorders.
Psychological Bulletin, 133, 310–27.
Oberman, L.S., and Ramachandran, V.S. (2008). Preliminary evidence for deficits in multisensory
integration in autism spectrum disorders: the mirror neuron hypothesis. Social Neuroscience,
3, 348–55.
O’Brien, J., Tsermentseli, S., Cummins, O., Happé, F., Heaton, P., and Spencer, J. (2009). Discriminating
children with autism from children with learning difficulties with an adaptation of the short sensory
profile. Early Child Development and Care, 179, 383–94.
O’Neill, M., and Jones, R.S.P. (1997). Sensory-perceptual abnormalities in autism: a case for more research?
Journal of Autism and Developmental Disorders, 27, 283–93.
Ornitz, E.M. (1989). Autism at the interface between sensory and information processing. In Autism:
nature, diagnosis and treatment (ed. G. Dawson), pp. 174–207. The Guilford Press, New York.
Paulesu, E., Demonet, J.F., Fazio, F., et al. (2001). Dyslexia: cultural diversity and biological unity. Science,
291, 2165–67.
Pennington, B.F., Gilger, J.W., Pauls, D., Smith, S.A., Smith, S.D., and DeFries, J.C. (1991). Evidence for
major gene transmission of developmental dyslexia. Journal of the American Medical Association, 266,
1527–34.
Piaget, J. (1952). The origins of intelligence in children. International University Press, New York.
Piek, J., and Coleman-Carman, R. (1995). Kinaesthetic sensitivity and motor performance of children with
developmental coordination disorder. Developmental Medicine and Child Neurology, 37, 976–84.
Pratt, M.L. (2009). Profiling patterns of movement disturbance and their relationship to cognition and
emotional wellbeing in two developmental disorders. Unpublished thesis (PhD), University of London.
Raberger, T., and Wimmer, H. (2003). On the automaticity/cerebellar deficit hypothesis of dyslexia: balancing
and continuous rapid naming in dyslexic and ADHD children. Neuropsychologia, 41, 1493–97.
Rajendran, G., and Mitchell, P. (2007). Cognitive theories of autism. Developmental Review, 27, 224–60.
Ramachandran, V.S., and Hubbard, E.M., (2001). Synaesthesia: a window into perception, thought and
language. Journal of Consciousness Studies, 8, 3–34.
Ramus, F., and Szenkovits, G. (2008). What phonological deficit? Quarterly Journal of Experimental
Psychology, 61, 129–41.
Ramus, F., Pidgeon, E., and Frith, U. (2003a). The relationship between motor control and phonology in
dyslexic children. Journal of Child Psychology and Psychiatry, 44, 712–22.
Ramus, F., Rosen, G.D., Dakin, S.C., et al. (2003b). Theories of developmental dyslexia: Insights from a
multiple case study of dyslexic adults. Brain, 126, 841–65.
Rizzolatti G., and Craighero, L. (2004). The mirror neuron system. Annual Review of Neuroscience, 27, 169–92.
Rochelle, K.S., and Talcott, J.B. (2006). Impaired balance in developmental dyslexia? A meta-analysis of the
contending evidence. Journal of Child Psychology and Psychiatry, 47, 1159–66.
Rochelle, K.S., Witton, C., and Talcott, J.B. (2009). Symptoms of hyperactivity and inattention can mediate
deficits of postural stability in developmental dyslexia. Experimental Brain Research, 192, 627–33.
Rogers, S.J., Hepburn, S., and Wehner, E. (2003). Parent reports of sensory symptoms in toddlers with autism
and those with other developmental disorders. Journal of Autism and Developmental Disorders, 33, 631–42.
Rutter, M., Caspi, A., Fergusson, D., et al. (2004). Sex differences in developmental reading disability: new
findings from 4 epidemiological studies. Journal of the American Medical Association, 291, 2007–2012.
Rutter, M., Kim-Cohen, J., and Maughan, B. (2006). Continuities and discontinuities in psychopathology
between childhood and adult life. Journal of Child Psychology and Psychiatry, 47, 276–95.
Russo, N., Foxe, J.J., Brandwein, A.B., Altschuler, T., Gomes, H., and Molholm, S. (2010). Multisensory
processing in children with autism: high-density electrical mapping of auditory-somatosensory
integration. Autism Research, 3, 253–67.
Schoemaker, M.M., van der Wees, M., Flapper, B., Verheij-Jansen, N., Scholten-Jaegers, S., and Geuze, R.H.
(2001). Perceptual skills of children with developmental coordination disorder. Human Movement
Science, 20, 111–33.
Shams, L., Kamitani, Y., and Shimojo, S. (2000). Illusions. What you see is what you hear. Nature, 408, 788.
Sigmundsson, H., Whiting, H.T.A., and Ingvaldsen, R.P. (1999). Proximal versus distal control in
proprioceptively guided movements of motor-impaired children. Behavioral Brain Research, 106,
47–54.
Sims, K., and Morton, J. (1998). Modelling the training effects of kinaesthetic acuity measurement in
children. Journal of Child Psychology and Psychiatry, 39, 731–46.
Sims, K., Henderson, S., Hulme, C., and Morton, J. (1996a). The remediation of clumsiness. I: An
evaluation of Laszlo’s kinaesthetic approach. Developmental Medicine and Child Neurology, 38, 976–87.
Sims, K., Henderson, S.E., Morton, J., and Hulme, C. (1996b). The remediation of clumsiness. II: Is
kinaesthesis the answer? Developmental Medicine and Child Neurology, 38, 989–97.
Sinnett, S., Spence, C., and Soto-Faraco, S. (2007). Visual dominance and attention: the Colavita effect
revisited. Perception & Psychophysics, 69, 673–86.
Skinner, R.A., and Piek, J.P. (2001). Psychosocial implications of poor motor coordination in children and
adolescents. Human Movement Science, 20, 73–94.
Smith, E.G., and Bennetto, L. (2007). Audiovisual speech integration and lipreading in autism. Journal of
Child Psychology and Psychiatry 48, 813–21.
Smyth, M.M., and Mason, U.C. (1997). Planning and execution of action in children with and without
developmental coordination disorder. Journal of Child Psychology and Psychiatry, 8, 1023–37.
Smyth, M.M., and Mason, U.C. (1998). Direction of response in aiming to visual and proprioceptive targets in
children with and without developmental coordination disorder. Human Movement Science, 17, 515–39.
Snowling, M.J. (1980). The development of grapheme-phoneme correspondence in normal and dyslexic
readers. Journal of Experimental Child Psychology, 29, 294–305.
Southgate, V., and Hamilton, A.F. (2008). Unbroken mirrors: challenging a theory of autism. Trends in
Cognitive Sciences, 12, 225–29.
Spence, C. (2011). Crossmodal correspondences: A tutorial review. Attention, Perception, and Psychophysics,
73, 971–95.
Spence, C., and Driver, J. (2004). Crossmodal space and crossmodal attention. Oxford University Press, Oxford.
Spinelli, D., De Luca, M., Judica, A., and Zoccolotti, P. (2002). Crowding effects on word identification in
developmental dyslexia. Cortex, 38, 179–200.
Stein, J.F. (2001). The magnocellular theory of developmental dyslexia. Dyslexia, 7, 12–36.
Stein, J.F., and Fowler, M.S. (1993). Unstable binocular control in children with specific reading
retardation. Journal of Research in Reading, 16, 30–45.
Stein, B.E., and Meredith, M.A. (1993). The merging of the senses. MIT Press, Cambridge, MA.
Stein, J.F., and Walsh, V. (1997). To see but not to read; the magnocellular theory of dyslexia. Trends in
Neurosciences, 20, 147–52.
Stein, B.E., Perrault, T.J., Jr., Stanford, T.R., and Rowland, B.A. (2009). Postnatal experiences influence how
the brain integrates information from different senses. Frontiers in Integrative Neuroscience, 3, 21.
Stoodley, C.J., Fawcett, A.J., Nicolson, R.I., and Stein, J.F. (2005). Impaired balancing ability in dyslexic
children. Experimental Brain Research, 167, 370–80.
Stoodley, C.J., Fawcett, A.J., Nicolson, R.I., and Stein, J.F. (2006). Balancing and pointing tasks in dyslexic
and control adults. Dyslexia, 12, 276–88.
Sugden, D., and Chambers, M., eds. (2005). Children with developmental coordination disorder. Whurr
Publishers, London.
Tager-Flusberg, H. (1999). An introduction to research on neurodevelopmental disorders from a cognitive
neuroscience perspective. In Neurodevelopmental disorders (ed. H. Tager-Flusberg), pp. 3–24. MIT
Press, Cambridge, MA.
Tallal, P. (1980). Auditory temporal perception, phonics, and reading disabilities in children. Brain and
Language, 9, 182–98.
Tallal, P., and Piercy, M. (1973). Developmental aphasia: Impaired rate of non-verbal processing as a
function of sensory modality. Neuropsychologia, 11, 389–98.
Tallal, P., and Stark, R.E. (1982). Perceptual/motor profiles of reading impaired children with or without
concomitant oral language deficits. Annals of Dyslexia, 32, 163–76.
Tecchio, F., Benassi, F., Zappasodi, F., et al. (2003). Auditory sensory processing in autism: a
magnetocephalographic study. Biological Psychiatry, 54, 647–54.
Temple, E., Poldrack, R.A., Protopapas, A., et al. (2002). Disruption of the neural response to rapid
acoustic stimuli in dyslexia: evidence from functional MRI. Proceedings of the National Academy of
Sciences, 97, 13907–13912.
Tremblay, C., Champoux, F., Voss, P., Bacon, B.A., Lepore, F., and Théoret, H. (2007). Speech and
non-speech audio-visual illusions: a developmental study. PLoS One, 2, e742.
van der Leij, A., and van Daal, V.H. (1999). Automatization aspects of dyslexia: speed limitations in word
identification, sensitivity to increasing task demands, and orthographic compensation. Journal of
Learning Disabilities, 32, 417–28.
van der Smagt, M.J., van Engeland, H., and Kemner, C. (2007). Brief report: Can you see what is not there?
Low-level auditory-visual integration in autism spectrum disorder. Journal of Autism and
Developmental Disorders, 37, 2014–2019.
von Hofsten, C., and Rösblad, B. (1988). The integration of sensory information in the development of
precise manual pointing. Neuropsychologia, 26, 805–21.
Wann, J. P., Mon-Williams, M., and Rushton, K. (1998). Postural control and coordination disorders: the
swinging room revisited. Human Movement Science, 17, 491–513.
White, S., O’Reilly, H., and Frith, U. (2009). Big heads, small details and autism. Neuropsychologia, 47,
1274–81.
Williams, D. (1992). Nobody nowhere: the remarkable autobiography of an autistic girl. Jessica Kingsley
Publishers, London.
Williams, J.H., Massaro, D.W., Peel, N.J., Bosseler, A., and Suddendorf, T. (2004). Visual-auditory
integration during speech imitation in autism. Research in Developmental Disabilities, 25, 559–75.
Wilson, B.N., Kaplan, B.J., Crawford, S., Campbell, A., and Dewey, D. (2000). Reliability and validity of a
parent questionnaire on childhood motor skills. The American Journal of Occupational Therapy, 54,
484–93.
Wilson, P.H., and McKenzie, B.E. (1998). Information processing deficits associated with developmental
coordination disorder: a meta-analysis of research findings. Journal of Child Psychology and Psychiatry,
39, 829–40.
Wimmer, H., Mayringer, H., and Raberger, T. (1999). Reading and dual-task balancing: evidence against
the automatization deficit explanation of developmental dyslexia. Journal of Learning Disabilities, 32,
473–78.
Wing, L. (1996). The autistic spectrum: a guide for parents and professionals. Constable and Robinson, London.
Wing, L., and Gould, J. (1979). Severe impairments of social interaction and associated abnormalities in
children: epidemiology and classification. Journal of Autism and Developmental Disorders, 9, 11–29.
World Health Organisation (1993). The ICD-10 classification for mental and behavioural disorders:
Diagnostic criteria for research. World Health Organisation, Geneva.
Yap, R.L., and van der Leij, A. (1994). Testing the automatization deficit hypothesis of dyslexia via a
dual-task paradigm. Journal of Learning Disabilities, 27, 660–65.
Zurif, E.B., and Carson, G. (1970). Dyslexia in relation to cerebral dominance and temporal analysis.
Neuropsychologia, 8, 351–61.
Chapter 13

Sensory deprivation and the development of multisensory integration

Brigitte Röder

13.1 Introduction
Environmental events are commonly characterized by input coming from more than one sensory
system: each modality provides unique and redundant information characterizing specific events
(Millar 2008). Research conducted over the past few years has convincingly demonstrated that
multisensory perception and multisensorially-guided actions often lead to more precise and
faster responses than single-modality perception and action (e.g. Driver and Noesselt
2008; Spence and Driver 2004). In order to gain from crossmodal input, individuals must be able
to use crossmodal correspondences that allow them to assign inputs from each of the senses to the
same external event. Adults use supramodal features such as space, time, and semantic meaning
that can, to a certain degree, be coded by all senses in order to link modality-specific features of
an object such as colour, pitch, and weight.
In addition to temporal features (Lewkowicz 2000), spatial supramodal features are the most
frequently investigated: If two sensory events originate from the same spatial location it is more
likely that they belong to the same object than that they belong to different objects. However,
the use of spatial location as a crossmodal binding feature is not a trivial problem for the brain to
solve: At the first processing stages, each sensory system uses modality-specific coordinate sys-
tems, i.e. retinotopically or somatotopically organized maps, or non-spatial representations, such
as the tonotopic representations of the auditory system. Modality-specific spatial codes must be
transformed in a way that allows for the efficient crossmodal matching of location information.
It has been argued that the visual modality, as the sense with the highest spatial precision, guides
crossmodal spatial processing and thus imposes upon other modalities the use of visual spatial
reference frames (Pouget et al. 2002).
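The remapping problem described above can be made concrete with a minimal sketch. This is purely illustrative: the function names, the simple additive remapping rule, and the fixed coincidence tolerance are our assumptions for exposition, not a model taken from the literature or from the chapter. The idea is only that an eye-centred (retinotopic) visual location must be combined with eye position to reach the same head-centred frame in which an auditory location is naturally coded, before spatial coincidence can be tested:

```python
def retinal_to_head_centred(retinal_azimuth_deg, eye_azimuth_deg):
    """Remap a retinotopic (eye-centred) azimuth into head-centred
    coordinates by adding the current eye-in-head position."""
    return retinal_azimuth_deg + eye_azimuth_deg

def same_external_event(retinal_az, eye_az, auditory_head_az, tolerance_deg=5.0):
    """Crude spatial-coincidence test: once both signals are expressed
    in a common (head-centred) frame, compare their azimuths."""
    visual_head_az = retinal_to_head_centred(retinal_az, eye_az)
    return abs(visual_head_az - auditory_head_az) <= tolerance_deg

# A target 10 deg right on the retina while the eyes are rotated 10 deg
# left lies straight ahead of the head, matching a sound at 0 deg azimuth.
match = same_external_event(10.0, -10.0, 0.0)
```

The point of the sketch is the design constraint it exposes: without some such coordinate transformation, comparing a retinotopic visual code with a head-centred auditory code is meaningless, which is why a common spatial frame is considered a prerequisite for crossmodal binding by location.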
This chapter focuses on the development of spatial representations for crossmodal binding.
In particular, the privileged role of vision during the development of spatial representations will
be examined by discussing the results from visually deprived individuals (this is known as
the retrospective developmental approach) and by relating these findings to the results from
prospective studies of development.
As broadly discussed in the other chapters in this volume, different views exist with regard to
how experience shapes multisensory functions (see especially Chapter 8 by Bahrick and Lickliter
and Chapter 7 by Lewkowicz). These views differ mainly in the proposed ‘starting point’ of the
developmental trajectory, i.e. whether or not some multisensory capabilities exist at birth. Views
based on the hierarchical developmental approach of Piaget (1952) assume that each sensory
system emerges first in isolation. It is not before single senses reach a certain degree of maturity
that they start to interact (this is known as the ‘integration view’). In contrast, the ‘differentiation
view’ assumes that all sensory systems are initially mingled and start parcelling out over the first
postnatal months and years (James 1890). Others have postulated that infants are born with
exuberant connections across modalities (‘neonatal synaesthesia’: see Maurer 1997), which are
successively pruned down or displaced by more specific interactions based on the use of supramo-
dal features with increasing complexity (Lewkowicz 2002). In line with this proposal is the finding
that, in infants, sensory stimuli cause widespread brain activity and that activation patterns
gradually become less broad when the individual grows older (Stevens and Neville 2009).
All views are compatible with the idea that early multisensory functions are less specific (when
they emerge) and increase in their specificity and complexity as the child grows older. Lewkowicz
and Ghazanfar (2009) proposed a ‘multisensory perceptual narrowing’ hypothesis, based on the
assumption that many species are born with broadly tuned, but already significant, multisensory
abilities and that experience shapes or extracts multisensory functions that are adaptive to certain
species and in specific environments (Lewkowicz and Ghazanfar 2009). The multisensory percep-
tual narrowing hypothesis stresses that certain multisensory abilities may be lost rather than
gained during ontogeny. Differentiation and narrowing processes most likely go hand in hand
during different stages of multisensory development, and may rely on both selective and con-
structive neural mechanisms (Munakata et al. 2004). As a result, neurocognitive functions such as
multisensory integration become increasingly specialized and fine-tuned.

13.2 Crossmodal plasticity following sensory deprivation


Given the protracted and competitive functional specialization of perceptual-cognitive processes
during ontogeny, the lack of a major sensory channel, such as vision, must be expected to result
in a reorganization of neurocognitive functions: On the one hand, it might be expected that
some systems reorganize in a way that provides enhanced performance based on input from the
intact sensory systems and are thus partially able to compensate for the loss of capacity. On
the other hand, systems that rely on the unique input features of the visual system might come to
operate in a qualitatively different way. This might give rise to lower performance under certain
conditions.
It is beyond the scope of the present chapter to review the extensive literature on crossmodal
reorganization and compensation after sensory deprivation (see Bavelier and Neville 2002;
Merabet and Pascual-Leone 2010; Pavani and Röder in press). Numerous studies have demon-
strated higher activity in blind adults following non-visual stimulation in cortical regions
predominantly associated with the visual system. Crossmodal activity in the ‘visual’ brain
areas of totally blind individuals has been observed in tasks measuring tactile (Amedi et al. 2010;
Burton et al. 2004; Sadato et al. 1996), auditory (Rauschecker 1995; Röder et al. 1996; Weaver and
Stevens 2007) and olfactory (Cuevas et al. 2009) discrimination, Braille reading (Burton and
McLaren 2006; Gizewski et al. 2003; Sadato et al. 1998), sound localization (Collignon, Voss et al.
2009; Rauschecker 1995; Voss et al. 2004), spatial imagery (Röder et al. 1997; Vanlierde et al.
2003), motor responses (Fiehler et al. 2009), voice perception (Gougoux et al. 2009; Klinge et al.
2010), language perception (Röder et al. 2002, see Fig. 13.1), and short- and long-term memory
performance (Raz et al. 2005). The observation that transcranial magnetic stimulation (TMS)
over occipital cortex, which normally disrupts visual processing, can interfere with tactile percep-
tion (Cohen et al. 1997), word generation (Amedi et al. 2004), and auditory localization (Collignon
et al. 2007) in blind but not in sighted humans has generally been interpreted as providing evi-
dence for the functionally adaptive consequences of crossmodal plasticity. Thus, it could
be speculated that the same experience-driven neural mechanisms that result in an increased
Fig. 13.1 Language-related functional magnetic resonance imaging activity in sighted (left) and
congenitally blind (right) humans; panel labels mark Broca’s area, Wernicke’s area, and visual
cortex. While the sighted showed left-lateralized activity in the perisylvian region, the congenitally
blind showed corresponding activity in the right homologous brain structures as well as in visual
cortex. The figures use neuroradiological convention: the left hemisphere is shown on the right side
and vice versa. (Reproduced from European Journal of Neuroscience, 16 (5), Brigitte Röder, Oliver
Stock, Siegfried Bien, Helen Neville, and Frank Rösler, Speech processing activates visual cortex in
congenitally blind humans, pp. 930–6 © 2002, John Wiley and Sons, with permission.) (Reproduced
in colour in the colour plate section.)

specialization and functionality of brain systems during development result in plastic changes
mediating compensatory performance changes in the blind. However, reports of similar cross-
modal brain activation in late-blind adults (Lepore et al. 2010; Rösler et al. 1993; Voss et al. 2006)
and in sighted individuals blindfolded for a few days (Pascual-Leone and Hamilton 2001; Weisser
et al. 2005) have cast doubt on the notion that crossmodal activation of visual cortex is exclusively
a consequence of developmental processes. Researchers have therefore started to wonder to what
extent crossmodal activity in the early sensory cortex may also exist in non-deprived adults (e.g.
Macaluso et al. 2000; Zangaladze et al. 1999). If such activity does occur then the higher crossmo-
dal activity observed in sensory cortices in people with total sensory deprivation may not reflect a
massive neural and cortical reorganization (including the growth of new connections during
development) but rather a stabilization and strengthening of connections that already exist in the
normally developing nervous system (for recent support see Klinge et al. 2010). In terms of mul-
tisensory development, it would follow that the functional specialization observed during
ontogeny might rely more on physiological than on structural-selective mechanisms, or that
these processes are reversible rather than permanent.
Finally, recent studies using well-selected control conditions have also demonstrated strong
similarities in blind and sighted individuals in the functional organization of processes that have
often been tightly linked to the visual modality, including the representation of actions (mirror
neuron system of Ricciardi et al. 2009 and the ‘Theory of Mind’ network of Bedny et al. 2009)
and the dorsal stream for action control (Fiehler et al. 2009b). The latter findings suggest that the
non-deprived modality systems are capable of providing compensatory input to shape brain
systems that have often been exclusively linked to the visual system.

13.3 The role of visual input for multisensory development


Are there unique contributions of the visual system to multisensory processes as well? It has been
argued that synchronized, crossmodal stimulation is necessary to extract crossmodal invariants
(Bahrick et al. 2005; Gibson 1969), which may be crucial for specific and functional crossmodal
binding. If this hypothesis is correct, one would expect that multisensory interactions of the non-
deprived sensory modalities are not affected by sensory deprivation of one modality system or, at
least, are not qualitatively different from those of non-deprived individuals. Since, in
the blind, audio-tactile stimulation is not affected by visual deprivation, audio-tactile crossmodal
interactions should not be significantly affected either. On the other hand, one could argue
that during the development of multisensory interactions, one modality system might take over
the lead and calibrate the remaining sensory modalities. For example, it has been shown that
vision has a strong influence on auditory spatial learning (King 2009). Which modality takes over
the ‘instructional’ role might depend on the particular supramodal binding feature (e.g. space or
time) involved. The point in time at which sensory systems mature may also be important.
Turkewitz and Kenny (1982) proposed that sensory limitations during early postnatal life allow
the earlier-developing sensory system to unfold without interference from later-maturing
systems, but that later-developing systems partially reorganize earlier-developed sensory repre-
sentations. The latter views would predict qualitatively different multisensory, and in particular
spatial, representations as a consequence of visual deprivation, given that vision is the last modal-
ity to fully mature (Gottlieb 1991). Thus, the visual deprivation approach allows one to investi-
gate the key role of experience and the unique contribution of the deprived modality for
multisensory development.

13.3.1 Studies in visually deprived animals


To date, only a few studies have addressed the issue of multisensory processing in individuals who
experience total sensory deprivation. Multisensory neurons seem to exist in primates at birth
(Wallace and Stein 2001), and emerge within the first two weeks in cats (Wallace and Stein 1997).
However, these neurons appear unable to integrate crossmodal input (for more detail on the early
development of neurophysiological correlates of multisensory integration see Chapter 14 by
Wallace et al.). In electrophysiological studies in animals, integration is generally defined as a
multisensory neural firing rate that exceeds the best unisensory response rate (the
maximum criterion; see Goebel and van Atteveldt 2009; Stein and Stanford 2008). Multisensory
enhancement is commonly seen when the two modality components of a crossmodal stimulus
fall within the overlapping area of the modality-specific receptive fields (RF) of a given neuron.
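The maximum criterion amounts to a one-line comparison, often reported together with a percentage enhancement index. The sketch below is illustrative only: the function and variable names and the example firing rates are our assumptions, not values or code from the studies cited here.

```python
def maximum_criterion(multisensory_rate, unisensory_rates):
    """A neuron counts as 'integrating' under the maximum criterion if
    its response to the combined stimulus exceeds its best response to
    either component presented alone. Also returns the enhancement as a
    percentage of the best unisensory rate."""
    best_unisensory = max(unisensory_rates)
    enhancement_pct = 100.0 * (multisensory_rate - best_unisensory) / best_unisensory
    return multisensory_rate > best_unisensory, enhancement_pct

# Hypothetical firing rates (spikes/s): auditory alone, tactile alone,
# and the combined audio-tactile stimulus.
integrates, pct = maximum_criterion(18.0, [10.0, 6.0])  # True, 80.0% enhancement
```

On this definition, a multisensory response that merely matches the best unisensory response does not count as integration, which is exactly the pattern reported below for the dark-reared animals.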
Wallace and his co-workers (2004) reported that cats reared in total darkness have a similar
number of multisensory neurons in the superior colliculus to sighted control animals, with
comparable proportions of neurons assigned to each modality and to each
modality combination (Wallace et al. 2004). However, there were two main
group differences. First, the multisensory neurons of dark-reared animals were not capable of
integrating, i.e. response rates to spatially aligned multisensory stimuli did not exceed response
rates to unisensory stimuli. Second, RFs were larger in the dark-reared animals, irrespective of
modality. The size of the RFs in blind animals resembled those of newborns (Wallace and Stein
1997, 2001). Interestingly, in control animals the shrinkage of receptive fields during development
to an adult size correlated with the emergence of the integrative properties of multisensory neu-
rons. Thus the lack of visual input seems to prevent the shrinkage of RFs in the superior colliculus
and, possibly as a consequence of this, integration properties of neurons remain at the level of
newborn animals. Importantly, this held not only for multisensory functions involving visual
stimuli but also for auditory–tactile combinations. This latter finding suggests that the visual
modality shapes all forms of crossmodal interactions based on spatial features and that synchro-
nized crossmodal experience (audio-tactile experience in this case) is not sufficient.
The effects of visual deprivation on multisensory cortical areas, however, differed somewhat:
Carriere et al. (2007) recorded neural activity in the anterior ectosylvian cortex of dark-reared cats.
Similar to the superior colliculus (SC), the relative proportions of unimodal and multimodal cells
did not differ a great deal between visually-deprived animals and controls (Carriere et al. 2007). In
contrast to the results found in the SC, multisensory neurons with integrative properties were
detected in the ectosylvian cortex of dark-reared animals. However, the response characteristics
differed markedly from those of sighted control animals: First, many neurons, although not respond-
ing to visual stimuli presented alone, were modulated by concurrent visual stimulation. Second,
while bimodal stimulation in control animals resulted in response enhancement, response depres-
sion was observed in the dark-reared animals. Unfortunately, the functional implications of these
altered responses have not yet been investigated. In sum, studies in dark-reared animals suggest that
a neuron’s properties for the integration of inputs from more than one sensory system are experi-
ence-dependent. Moreover, these results suggest a privileged role of visual input for the develop-
ment of multisensory integration capacities, at least when space is used as a cue for binding.
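The integrative properties at issue here are conventionally quantified with the multisensory interaction index of Meredith and Stein: the percentage change of the response to a bimodal stimulus relative to the best unimodal response. A minimal sketch, with hypothetical spike counts (illustrative numbers, not data from the studies cited), shows how response enhancement and response depression are distinguished:

```python
def interaction_index(bimodal, unimodal_responses):
    """Meredith-Stein index (%): (CM - SMmax) / SMmax * 100.
    Positive values indicate multisensory enhancement; negative
    values indicate response depression, as reported for the
    dark-reared animals."""
    best_unimodal = max(unimodal_responses)
    return 100.0 * (bimodal - best_unimodal) / best_unimodal

# Hypothetical mean responses (spikes per trial):
control = interaction_index(18.0, [8.0, 6.0])      # 125.0 -> enhancement
dark_reared = interaction_index(5.0, [8.0, 6.0])   # -37.5 -> depression
```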

13.3.2 Studies in congenitally blind humans


The results from the animal work suggest that the visual deprivation approach might provide a
fruitful method with which to investigate how experience shapes multisensory development. A
clear advantage of investigating humans as compared to animals is that behavioural consequences
of neural changes can be assessed more easily. By contrast, an obvious disadvantage of the sensory-
deprivation approach in humans is that the research participants are rather heterogeneous with
respect to blindness aetiology and previous efforts of rehabilitation. As outlined above, studies in
congenitally blind human adults allow one to test whether visual and/or simultaneous crossmodal
stimulation is necessary for the typical emergence of multisensory functions. The comparison of
congenital and late-blind individuals allows one to disambiguate the role of experience-dependent
influences during ontogeny from the effects of current visual status. In the following section,
studies on multisensory functions in totally blind human adults are discussed. In order to test for
critical or sensitive periods in multisensory development, the recovery of multisensory function
after a transient phase of sensory deprivation has to be investigated. This approach will be addressed
briefly below (for a more detailed discussion, see Lewkowicz and Röder in press).
There is a long tradition of research on space perception in the blind (see e.g. Thinus-Blanc
and Gaunet 1997). Originally, these studies were driven by the main goal of documenting
whether visual input is a necessary prerequisite for the full development of spatial skills in the
non-deprived modalities. While some early studies had reported disadvantages for the congeni-
tally blind as compared to late-blind and sighted control adults in some spatial tasks, numerous
other studies have failed to demonstrate such group differences in various spatial table-top and
locomotion tasks (Klatzky et al. 1995). For example, several spatial imagery studies have sug-
gested that congenitally blind individuals have access to holistic spatial representations with
Euclidean properties (Carpenter and Eisenberg 1978; Marmor and Zaback 1976; Röder and
Rösler 1998; Röder et al. 1997), although these representations might develop later (for a review
see Warren 1994).
306 SENSORY DEPRIVATION AND THE DEVELOPMENT OF MULTISENSORY INTEGRATION

Millar (2008) suggested that the development of the default use of an external (allocentric)
frame of reference rather than a self-referent egocentric frame of reference might depend on the
availability of distal cues during locomotion. Since the visual system provides the
only means of assessing many of these distal cues, the use of an external frame of reference might
initially depend on vision. Since blind people use spatial representations with Euclidean properties, non-visual cues might be sufficient to set up such spatial representations, and might also be
sufficient to adapt auditory spatial representations during development to new input values until
head growth is complete (King 2009). External coding is important for multisensory processing
and locomotion. Recently, it has been demonstrated that only children older than eight years of age use multisensory input (proprioceptive and visual) efficiently for orienting in space (Nardini et al. 2008). In order to benefit from multisensory input during navigation, that is, to achieve more precise localization, different sensory inputs have to be linked spatially. Thus, a spatial representation
must exist that can be accessed by all sensory systems. There is extensive evidence from several
areas of research suggesting that eye-centred or external coordinates are important for linking
inputs of multiple senses (Pouget et al. 2002). Researchers have manipulated limb posture during
perceptual and motor tasks. An example of such a task used in many studies is to ask participants
to distinguish the temporal order of two tactile stimuli presented one after the other to each hand,
at a range of different interstimulus intervals. Participants adopt either a parallel- or a crossed-hands posture, and their task is to indicate which hand was touched first
(Shore et al. 2002; Yamamoto and Kitazawa 2001). Despite the fact that this task does not force
participants to remap tactile input into external space, humans seem to do so automatically:
Temporal order judgements (TOJ) markedly decline when participants adopt a crossed-hands
posture. In order to explain this finding, it has been suggested that due to a default remapping of
somatosensory stimuli into external coordinates, a conflict arises between the misaligned soma-
tosensory (anatomical) coordinates and the external coordinates of the same tactile events
(Kitazawa 2002). Indeed, there is ample evidence from animal research (e.g. Avillac et al. 2004;
Mullette-Gillman et al. 2009) and humans (e.g. Bruns and Röder 2010; Heed and Röder 2010)
that sensory stimuli are coded in multiple reference frames, including an external coding.
A remapping of tactile stimuli into external space occurs within the first 100–200 ms of stimulus
presentation (Azanon and Soto-Faraco 2008; Bruns and Röder 2010; Heed and Röder 2010) and
both coordinates seem to be active at the same time (Heed and Röder 2010). The remapping of
tactile and auditory input into a visual-external frame of reference has been linked to the need to
control actions visually (Pouget et al. 2002, 2004).
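The crossed-hands TOJ analysis can be sketched with classical probit regression: the proportion of, say, "right hand first" responses is transformed through the inverse normal CDF and regressed on the SOA; a shallower slope (a larger sigma) corresponds to a larger just noticeable difference. All numbers below are hypothetical and only illustrate the logic of the analysis:

```python
from statistics import NormalDist

def toj_probit(soas_ms, p_first):
    """Classical probit analysis of temporal-order judgements:
    regress z = Phi^-1(p) on SOA. The fitted slope s gives
    sigma = 1/s, and the JND is taken as the 75% point, 0.674*sigma."""
    z = [NormalDist().inv_cdf(p) for p in p_first]
    n = len(soas_ms)
    mx, mz = sum(soas_ms) / n, sum(z) / n
    slope = (sum((x - mx) * (v - mz) for x, v in zip(soas_ms, z))
             / sum((x - mx) ** 2 for x in soas_ms))
    return 0.674 / slope, slope   # (JND in ms, probit slope per ms)

# Hypothetical proportions of "right hand first" responses:
soas = [-90, -55, -30, -15, 15, 30, 55, 90]            # SOA in ms
uncrossed = [0.02, 0.08, 0.22, 0.38, 0.62, 0.78, 0.92, 0.98]
crossed = [0.15, 0.24, 0.34, 0.43, 0.57, 0.66, 0.76, 0.85]

jnd_u, slope_u = toj_probit(soas, uncrossed)
jnd_c, slope_c = toj_probit(soas, crossed)
# A crossed-hands effect appears as jnd_c > jnd_u; the slope difference
# (uncrossed minus crossed) is the measure used in developmental TOJ
# studies of this paradigm.
```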
To test whether the default use of external coordinates for any sensory localization depends on
visual experience during development, blind human adults were first tested with the tactile TOJ
task, both with parallel- and crossed-hands postures (Röder et al. 2004). The results showed that
the crossing-hand effect (i.e. the decline of TOJ performance under a crossed- compared to an
uncrossed-hand posture) was not observed in humans who had been blind since birth. As shown
in Fig. 13.2, congenitally blind individuals had overall lower thresholds, indicating a higher tempo-
ral resolution. This finding provides another example of crossmodal compensation in the blind:
Congenitally blind adults have a higher temporal resolution within their non-deprived modalities
(Gougoux et al. 2004; Muchnik et al. 1991). In order to exclude the possibility that the lack of a
crossing effect in the congenitally blind group was due to a ceiling effect, the highest-performing
sighted and the lowest-performing blind participants were matched according to their perform-
ance in the parallel-hand posture. As shown in Fig. 13.2, sighted controls, but not congenitally
blind individuals, displayed a crossing-hand effect. The finding that even high-performing sighted
individuals displayed a crossing-hand effect is consistent with other reports showing that crossing-
hand effects cannot be eliminated by extensive practice (Craig and Belser 2006), and has also been
observed in individuals who are generally characterized by enhanced temporal resolution similar
[Figure 13.2: bar chart of just noticeable differences (JND, in ms) for sighted, late-blind, and congenitally blind adults under the parallel (II) and crossed (X) hand postures. Left panel: all participants; right panel: performance-matched subgroups of sighted (n = 7) and congenitally blind (n = 7) participants.]
Fig. 13.2 Just noticeable differences (JND in ms) for sighted controls, congenitally blind and
late-blind adults for both the parallel (II) and crossed-hand posture (X). While sighted and late-blind
adults showed a crossing-hands effect (a decline of tactile localization) the congenitally blind adults
did not. This pattern of results for the congenitally blind and sighted adults was observed even after
matching for performance in the parallel-hand posture (right panel) (Reprinted from Current
Biology, 14 (2), Brigitte Röder, Frank Rösler, and Charles Spence, Early Vision Impairs Tactile
Perception in the Blind, pp. 121–4, Copyright (2004), with permission from Elsevier.) (Reproduced
in colour in the colour plate section.)

to that found in blind individuals (Craig and Belser 2006). These findings have thus been inter-
preted as providing strong support for the claim that an external remapping of tactile stimuli is
developed as a consequence of visual experience (Pouget et al. 2002). However, TOJ tasks are non-speeded, and thus it might be argued that the finding in the congenitally blind did not demonstrate a lack of remapping of tactile stimuli into external space, but rather an enhanced efficiency in resolving the conflict between incongruent anatomical and external codes. This hypothesis can be tested
with event-related potentials (ERPs), since this method allows for continuous monitoring of tactile
processing at a millisecond timescale, unlike behavioural methods, which can only assess
the end product of a sequence of multiple processing steps. Röder et al. (2008) presented tactile
stimuli in a random sequence to the left and the right index finger. A preceding tone indicated
which hand was task relevant in the next trial and spatial attention was manipulated by asking
participants to detect rare deviant touches (double rather than single touches) at the task-relevant
hand only, adopting a parallel- versus a crossed-hand posture. In agreement with earlier reports
(Eimer et al. 2001), the enhancement of ERPs due to attention emerged later and with reduced
amplitude in the crossed- as compared to the parallel-hands posture in sighted individuals. By
contrast, ERP attention effects did not differ between postures in the congenitally blind (Röder
et al. 2008). These results therefore provide strong evidence for the hypothesis that the blind do not
automatically or by default recode tactile stimuli in external coordinates. These conclusions are
also consistent with the observation of van Velzen and colleagues (van Velzen et al. 2006), who
failed to find ERP signs of orienting in external space in the congenitally blind.
Röder et al. (2008) reported two additional findings. First, the spatial attention effect in
congenitally blind individuals emerged later than in the sighted group: The sighted group showed
an early contralateral attention positivity in the uncrossed-hand posture, which was no longer
reliable when the hands were crossed. The blind did not show this effect reliably in any hand
posture. In the sighted, a later attention negativity was observed in both the uncrossed- and
crossed-hand condition as well, although with a lower amplitude in the latter condition
(Fig. 13.3). This attention negativity was the first reliable ERP spatial attention effect recorded in
the blind. However, it was again unaffected by hand posture in this group. Interestingly, this late
spatial attention effect in the congenitally blind was nearly the same size as in the sighted with
crossed hands. Moreover, while the sighted controls performed worse in the crossed than in the
uncrossed condition, performance of the blind did not differ between postures. Surprisingly, the
blind performed in both conditions at the level of the sighted in the crossed-hand posture. These
results might suggest that the use of visual-external coordinates facilitates the orienting of spatial
attention and sensory localization in the sighted. It might be speculated that crossing the hands
prevents a kind of redundancy gain from parallel coding in multiple frames of reference, rather than inducing a specific conflict-processing cost when anatomical and external coordinates are misaligned. This
hypothesis has been supported by developmental studies on the emergence of the crossing hands
effect in the TOJ task (Pagel et al. 2009, see below; see also Röder et al. 2007b). Thus, vision might
further shape and facilitate spatial processes in the tactile and the auditory sensory systems (Bizley
and King 2009) by inducing a re-mapping of all sensory inputs into external coordinates (Mullette-
Gillman et al. 2009). The higher tactile-spatial resolution of the skin when the stimulated body
part is visible might arise from a similar process (Kennett et al. 2001).
In summary, these results might suggest that the development of sensory functions within
one modality system is not totally independent of the development of sensory functions in other
modality systems. While this finding would argue against the ‘integration’ view of multisensory
development, the findings discussed here do not necessarily support the extreme ‘differentiation
account’ or the ‘neonatal synaesthesia account’. Rather the data could be explained as well
in terms of successive development of spatial representations within sensory systems with the

[Figure 13.3: ERP difference waveforms (attended minus unattended; amplitude in μV, 100–500 ms post-stimulus) for sighted (left) and congenitally blind adults (right), plotted separately for the uncrossed and crossed conditions.]
Fig. 13.3 ERP spatial attention effect (attended minus unattended ERPs) in the uncrossed- and
crossed-hand condition for sighted (left panel) and congenitally blind adults (right panel). Crossing
the hands resulted in delayed and reduced ERP attention effects in the sighted only (Reproduced
from European Journal of Neuroscience, 28 (3), Brigitte Röder, Julia Föcker, Kirsten Hötting, and
Charles Spence, Spatial coordinate systems for tactile spatial attention depend on developmental
vision: evidence from event-related potentials in sighted and congenitally blind adult humans,
pp. 475–83 © 2008, John Wiley and Sons, with permission.).
later-developing senses (vision) reorganizing spatial representations of earlier-developing senses (Turkewitz and Kenny 1982). This latter account would predict that when the reorganizing influence of visual input is brought into operation, the existing spatial representations of all modalities
become refined and default modes of how to perform certain perceptual tasks change. This
hypothesis was tested in a multidimensional stimulus selection task in which the targets were
defined along both the temporal and the spatial dimensions.
Given that the blind possess superior temporal-processing skills but, in some respects, less efficient stimulus-localization abilities, it was speculated that under high input load, stimulus-selection strategies
might differ between blind and sighted individuals, i.e. that blind individuals would prefer temporal
rather than spatial selection strategies. Röder et al. (2007a) used a selective attention paradigm in
which the participants were asked to attend to sounds defined by one of two locations (presented in
the left versus right hemifield) and one of two possible interstimulus intervals (short versus long)
while ignoring all stimuli presented at the task-irrelevant location and the task-irrelevant point in
time. Temporal attention, at least in the auditory and tactile system, has been shown to cause a
similar enhancement of early, sensory-related stimulus processing as spatial attention (Lange and
Röder 2010). An analysis of the ERP amplitudes revealed a parallel selection based on space and time
at an early processing stage (90–130 ms after stimulus onset) in the sighted controls. However, at
this early processing level, the congenitally blind adults selected auditory stimuli on the basis of time
only (Röder et al. 2007a). In later processing steps (150–400 ms after stimulus onset), the blind
engaged in a parallel selection based on time and space, while the sighted used a serial selection
strategy, first processing spatial and then temporal stimulus features. Interestingly, ERP spatial
attention effects had a different, more posterior topography in the blind than in the sighted, while
the scalp distribution for temporal ERP attention effects did not differ between groups. A reorgani-
zation of auditory spatial representations in blind individuals is in accord with findings from animal
studies (Rauschecker 1995) and from ERP (Röder et al. 1999b), brain-imaging (Gougoux et al. 2005), and TMS (transcranial magnetic stimulation) studies in humans (Collignon et al. 2009).
Interestingly, there is no evidence yet for a similar reorganization of temporal representations in the
blind. According to the framework of Turkewitz and Kenny (1982), this pattern might suggest that
given the overall higher temporal resolution of the auditory system, temporal processing, in contrast
to spatial processing, is not reorganized by the later-developing visual system.
In contrast, when considering ‘space’ as a supramodal feature, results converge on the conclu-
sion that vision imposes upon the residual senses the use of a visual-external spatial frame of refer-
ence, which might result in an overall processing gain for multisensory perception and action
control in sighted individuals. Based on the findings in dark-reared cats (Carriere et al. 2007;
Wallace et al. 1997, 2004), it might be hypothesized that the congenitally blind do not integrate
inputs based on spatial features (at least not in an automatic fashion) and that the altered default
spatial encoding mode is responsible for, or at least contributes to, these altered multisensory inter-
actions. This hypothesis was directly tested with a multidimensional selective-attention task in
humans who had been blind from birth: Tactile and auditory stimuli were presented in a random
sequence from two different locations; the participants had to select the input according to modality (auditory versus tactile stimuli) and location (left or right hemifield) (Hötting et al. 2004).
Usually, three main ERP attention effects emerge in such tasks and were, indeed, replicated by
Hötting et al. (see Fig. 13.4). Early ERPs (with a latency shorter than 200 ms) were enhanced:
◆ for stimuli of the task-relevant compared to the task-irrelevant modality (intermodal attention effect)
◆ for stimuli presented at the task-relevant vs. the task-irrelevant location within the task-relevant modality (unimodal spatial-attention effect)
◆ and for stimuli presented at the task-relevant vs. the task-irrelevant location within the task-irrelevant modality (crossmodal spatial-attention effect).
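The three effects above are contrasts within a 2 × 2 design crossing attended modality (M+/M−) with attended location (L+/L−), the condition coding used in Fig. 13.4. A sketch with hypothetical mean amplitudes (μV; here a negative-going attention effect, as for the attention negativity described in the text) shows how each contrast is formed:

```python
# Hypothetical mean ERP amplitudes (μV) in an early time window,
# coded by attended modality (M+/M-) and attended location (L+/L-):
amp = {"M+L+": -3.0, "M+L-": -2.0, "M-L+": -1.5, "M-L-": -1.0}

# Intermodal attention effect: attended vs. unattended modality.
intermodal = (amp["M+L+"] + amp["M+L-"]) / 2 - (amp["M-L+"] + amp["M-L-"]) / 2
# Unimodal spatial attention effect: location contrast, attended modality.
unimodal_spatial = amp["M+L+"] - amp["M+L-"]
# Crossmodal spatial attention effect: location contrast, UNattended modality.
crossmodal_spatial = amp["M-L+"] - amp["M-L-"]
# With these illustrative values all three contrasts are negative
# (an enhancement); in the congenitally blind group of Hötting et al.
# (2004) it was the crossmodal contrast that was absent.
```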
[Figure 13.4: somatosensory ERPs (left, electrode cluster C4) and auditory ERPs (right, electrode cluster C5) for sighted and congenitally blind adults, showing unimodal and crossmodal attention effects. Conditions: attended/unattended modality (M+/M−) crossed with attended/unattended side (L+/L−).]

Fig. 13.4 Somatosensory (left) and auditory ERPs (right) of sighted and congenitally blind adults in a
multisensory spatial attention task. While both groups showed unisensory spatial attention effects,
congenitally blind humans did not show crossmodal spatial attention effects at early processing
stages as the sighted did (Reproduced from Experimental Brain Research, 159 (3), Kirsten Hötting,
Altered auditory-tactile interactions in congenitally blind humans: an event-related potential study,
pp. 370–381 © 2004, with permission from Springer Science + Business Media.)

The crossmodal ERP spatial-attention effect, i.e. a spatial attention effect for task-irrelevant stim-
uli, has generally been interpreted as evidence for an automatic recoding of sensory input into a
common spatial representation. Thus, if a recoding of sensory input into a visual-external refer-
ence frame is necessary for an automatic spatial matching of sensory inputs, crossmodal spatial-
attention effects are not expected for congenitally blind adults. This is exactly the pattern of results
obtained by Hötting et al. (2004) (see Fig. 13.4). Accordingly, it has recently been shown that con-
genitally blind individuals do not gain from redundant auditory–tactile cues for localization when
head-centred and somatotopic reference frames are misaligned (Collignon and De Volder 2009).
In summary, these data suggest that visual experience is needed to set up the spatial representations that support default or automatic crossmodal interactions between the sensory systems based on the spatial location of the input, suggesting a privileged role of vision for multisensory
spatial processes.
So far the possible advantage of encoding sensory stimuli in a common frame of reference for
perception has been discussed. What has been neglected thus far is that we are usually not static, passive observers; instead, we actively interact with the world around us. The advantages
of multisensory guided actions have been reported, for example, for saccadic reaction times
(Kirchner and Colonius 2005). Each eye movement changes the relation between the visual world
and the body, including the auditory world, which is coded in head-centred coordinates.
Correspondingly, each manual action changes the relative orientation of the body (hand) and
the visual and auditory world etc. In order to rapidly assess whether a movement or action was
successful (i.e. whether it reached its target position), the brain has to predict the change in the
sensory world as a consequence of the motor action (see e.g. the reafference principle; von Holst
1950). Moreover, the efference copy allows an individual to initiate an anticipatory attention shift
to the intended target position. For example, it has been shown that the processing of visual
(Baldauf and Deubel 2009; Collins et al. 2008, 2010; Eimer and Van Velzen 2006), auditory
(Gherri et al. 2008), and tactile stimuli (Karnath et al. 2002) is enhanced at the motor endpoint of
both saccadic eye movements and manual movements. Thus, the posture of the effectors has to be continuously updated with respect to other body parts and with respect to the external world.
If sensory inputs are remapped into external coordinates, the remapping of body posture is pre-
dicted to occur in external coordinates as well (Heed and Röder 2010; Longo and Haggard 2010;
Schicke and Röder 2006). If vision is essential for the automatic updating of the sensory world in exter-
nal space and with regard to the body, the prediction is that congenitally blind people do not auto-
matically update the sensory world with respect to their body. In turn, if they have to update the
sensory world with respect to their effectors, it will take them longer than sighted controls.
To test these predictions, a standard Simon task was used (Simon et al. 1970): In this task, stimuli are presented at one of two locations, usually one in each hemifield. The participants typically have to respond
with two keys, arranged in the horizontal dimension, to a stimulus dimension unrelated to the
location of the stimulus. For example, participants have to indicate the pitch of the sound origi-
nating from a loudspeaker placed on the left or right by pressing either a left or right response key.
The Simon effect shows up as an increase of reaction times when stimulus and response location
do not correspond. Importantly, when hands are crossed the Simon effect prevails with respect to
the external location of the stimuli and the effectors; e.g. participants respond more rapidly with the left hand, resting on the right key, to stimuli in the right rather than the left hemifield (Simon et al. 1970). While congenitally blind individuals showed a similar Simon effect to
the sighted controls when adopting a parallel uncrossed-hand posture (Röder et al. 2007b), their
Simon effect reversed when they adopted a crossed-hand posture; that is, they were still faster in
responding with the left hand to stimuli of the left hemifield although their left hand was now
located in the right hemifield. Moreover, while the sighted participants showed an overall cost of
hand crossing, the congenitally blind did not.
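The logic of the Simon analysis can be made concrete with hypothetical condition means (all reaction times below are invented for illustration). Trials are coded by the external correspondence between stimulus side and response-key side; the Simon effect is then the cost on non-corresponding trials, and the reversal in the congenitally blind shows up as a negative value:

```python
def simon_effect(rt_corresponding_ms, rt_noncorresponding_ms):
    """Simon effect (ms): RT cost when stimulus side and (external)
    response-key side do not correspond."""
    return rt_noncorresponding_ms - rt_corresponding_ms

# Hypothetical mean RTs (ms), coded by EXTERNAL stimulus-key correspondence:
sighted_uncrossed = simon_effect(450, 480)  # +30: standard Simon effect
sighted_crossed = simon_effect(470, 500)    # +30: effect follows the key side
blind_crossed = simon_effect(505, 475)      # -30: effect follows the hand
# A negative value means responses were faster on externally
# NON-corresponding trials, i.e. the anatomically coded reversal
# reported for congenitally blind participants.
```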
First of all, these results show that spatial representations activated during perceptual coding
and motor control interact similarly in blind and sighted individuals. This hypothesis has recently
been explicitly supported using neuroimaging techniques (Fiehler et al. 2009b). In their task,
participants had to perform movements with a stylus along a predefined path (forming different
shapes), which they had to recall after a delay phase. Congenitally blind individuals activated
similar dorsal pathway structures as sighted controls, suggesting that dorsal pathway structures
important for movement control develop in the absence of vision as well (possibly as a conse-
quence of sensorimotor feedback involving the non-visual senses). Interestingly, Fiehler et al.
observed greater activation of auditory cortices in the blind than in the sighted controls, suggest-
ing that although both sighted and blind individuals use proprioceptive feedback for motor con-
trol, the blind might rely more on auditory cues.
Röder et al. (2007b) hypothesized that sighted people must have an advantage resulting from the
external coding of sensory stimuli and effector location in a situation in which, as is usual in everyday
life, the relative position of effectors and action targets is task relevant. Thus, the same experimental
set-up was used in a second experiment but participants now had to respond with the hand beneath
the sound source rather than to the pitch of the sound. In this situation, larger costs of hand crossing
were observed in the congenitally blind than in the sighted controls. Importantly, however, error rates
in the blind were lower than 10%. That is, they were able to perform the task but it took them longer
to match the location of the hands with the location of the sound source. Thus it might be speculated
that congenitally blind individuals had to perform an extra processing step, i.e. a posture remapping
with respect to the external world, which happened automatically in the sighted. This behavioural
disadvantage of the congenitally blind is reminiscent of disadvantages in tactile localization and tactile
spatial attention orienting (Röder et al. 2008), as well as of the lack of multisensory localization gains
seen in crossed-hands conditions (Collignon and De Volder 2009).
In summary, these retrospective developmental studies in blind people suggest that the domi-
nant reference frame for sensory localization might be the result of an increasing dominance of
vision for spatial cognition. This assumption is in line with developmental studies that have sug-
gested that supramodal (external) reference frames emerge in parallel with an increasing visual
dominance (Warren and Pick 1970). Thus, if there is no need to control eye movements, and if the high-accuracy spatial sense (vision) is lacking from birth, a remapping of sensory input into a common (external) coordinate system might not become the default spatial localization mode, resulting in processing advantages when the external location of stimuli and effectors is task-irrelevant, but in a processing disadvantage when the effector positions have to be
updated with respect to external stimuli.

13.3.3 Studies in late-blind humans


So far, the role of visual experience in shaping multisensory functions during ontogeny has been
discussed. It has implicitly been assumed that the group differences in sensory and multisensory
localization observed between sighted and blind individuals result from a visually guided devel-
opmental shaping of spatial representations. However, data obtained from the congenitally blind
alone cannot be interpreted unequivocally in this way. In fact, the current visual status, rather
than the lack of developmental experience, might be the driving factor for the way sensory stimuli
are processed. For example, a number of studies have demonstrated that human observers inte-
grate input of different senses in an optimal manner. Changes in the reliability of sensory inputs
are immediately taken into account and crossmodal integration adapts accordingly (Alais and
Burr 2004; Ernst and Banks 2002). Accordingly, it has been shown that spatial perception is not
always dominated by vision as the sense that generally has the highest spatial resolution. If visual
input is made less reliable by blurring the visual stimuli, auditory spatial information has a higher
weighting and might even dominate spatial localization (Alais and Burr 2004). This finding indicates that our brain does not (exclusively) rely on innate or acquired biologically constrained sensory biases, but rather adapts the integration process in a highly flexible way to the inputs provided by
the conditions/environment. Thus, the lack of a default coding of sensory input in external coor-
dinates, as observed in the congenitally blind, might result from a flexible adaptation of the brain
to a new sensory situation rather than indicating that experience is necessary for the emergence of
such spatial remapping mechanisms during development.
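The "optimal manner" of integration referred to above is maximum-likelihood cue combination: each cue is weighted by its reliability (the inverse of its variance), so degrading one input shifts the combined estimate toward the other. A minimal sketch with hypothetical localization estimates (all values invented):

```python
def mle_combine(estimates, variances):
    """Reliability-weighted (inverse-variance) cue combination.
    Returns the fused estimate and its variance; the fused variance
    is never larger than that of the most reliable single cue."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fused = sum(w * e for w, e in zip(weights, estimates)) / total
    return fused, 1.0 / total

# Hypothetical visual and auditory location estimates (deg) and variances:
visual_loc, auditory_loc = 0.0, 10.0
sharp, sharp_var = mle_combine([visual_loc, auditory_loc], [1.0, 16.0])
blurred, _ = mle_combine([visual_loc, auditory_loc], [25.0, 16.0])
# With crisp vision the fused estimate sits near the visual location;
# blurring vision (raising its variance) pulls it toward audition,
# as in the blurred-stimulus experiments of Alais and Burr (2004).
```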
This question can be addressed in studies with late-blind individuals: people who lost their
eyesight after the end of the major steps of brain development, i.e. after the age of 12 years. A
number of studies on crossmodal processing using space as a supramodal binding feature (with a
manipulation of hand posture) have been conducted in late-blind individuals (Collignon and De
Volder 2009; Röder et al. 2004, 2007b). The results were both impressive and clear, in that the
performance of the late blind was indistinguishable from that of the sighted controls.
For example, late-blind adults displayed crossed-hand effects in tactile perceptual tasks, such as
the tactile TOJ task (Röder et al. 2004). Moreover, the control of auditory guided actions operated
in external space in the late blind as in the sighted (Röder et al. 2007b), and they showed similar
multisensory advantages when matching body parts and external stimuli in space (Collignon and
De Volder 2009; Röder et al. 2007b). First of all, these data suggest that the prevailing visual status
is unlikely to be the reason for the differences observed between congenitally blind adults and sighted controls. Second, the finding that the late blind do not differ from the sighted in multisensory localization suggests that the brain systems that mediate multisensory processes and action control do
not remain totally plastic throughout life. Rather, developmental visual input may trigger selec-
tive and constructive developmental mechanisms (see e.g. Lewkowicz and Ghazanfar 2009) that
irreversibly result in the coding of spatial information within an external reference frame. The use
of this frame of reference facilitates multisensory processing, even when an individual has been
blind for several decades. On the basis of this evidence, it might be concluded that the optimal
integration observed in adults (Alais and Burr 2004; Ernst and Banks 2002) works within certain
limits of plasticity. The overall restrictions of adult plasticity are the result of experience-driven
neural shaping in the course of brain development (see also Knudsen 2004).

13.4 The development of supramodal spatial representations


Before drawing conclusions with regard to some major principles of multisensory development,
one alternative explanation for the visual deprivation effects reported so far has to be ruled out.
Specifically, it might be argued that humans (and other animals) are born with a preferred or
automatic use of external spatial coordinates for multisensory action control. Following visual
deprivation, this default mode might be switched to the use of anatomically anchored coordinates. Though
this account would be in conflict with the major theories on the development of spatial cognition
(Piaget 1952), other theories have postulated the existence of predominant visual orienting to body
parts as early as the first half year of life (Bremner et al. 2008). In order to exclude the alternative
account that congenitally blind individuals become more anatomically oriented, prospective
developmental studies have to be conducted. Such studies have indeed now provided evidence
that crossing-hand effects in tactile TOJ tasks do not emerge earlier than the fifth to sixth year of
life (Pagel et al. 2009; see Fig. 13.5), suggesting that an automatic remapping of sensory stimuli into
external coordinates does not exist at birth but rather emerges in parallel with an increasing visual
dominance (Warren and Pick 1970). However, spatial correspondence between tactile and visual
(e.g. Bremner et al. 2008), and visual and auditory stimuli (e.g. Morrongiello and Fenwick 1991;
Neil et al. 2006) can be detected by infants within the first year of life. Accordingly, the congenitally
blind are able to update the position of their body in external space (Röder et al. 2007b), but they do not do so
automatically. Thus, developmental vision seems to induce an automatic remapping of sensory

[Fig. 13.5: scatter plot of the crossed-hand effect (regression slope uncrossed minus regression slope crossed, in probit (unitless) per SOA (ms); y-axis from −0.01 to 0.04) against age (5–10 years), one data point per participant.]

Fig. 13.5 Emergence of crossing-hands effects over the course of childhood in a tactile TOJ task.
The figure shows the difference score for each individual participant (open circles) for the crossed
minus the uncrossed condition. Crossing-hands effects, and thus the automatic encoding of sensory
input into external coordinates, emerge within the second half of the fifth year. (Reproduced from
Developmental Science, 12(6), Birthe Pagel, Tobias Heed, Brigitte Röder, Change of reference frame
for tactile localization during child development, pp. 929–37, © 2009, John Wiley and Sons, with
permission.)
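The dependent measure plotted in Fig. 13.5, the uncrossed minus crossed difference of probit-regression slopes, can be sketched as follows (the response proportions and helper name below are invented for illustration; Pagel et al. (2009) describe the actual procedure):

```python
from statistics import NormalDist

def probit_slope(soas_ms, prop_responses):
    """Least-squares slope of probit-transformed TOJ responses on SOA.

    prop_responses holds, per SOA, the proportion of trials on which one
    particular stimulus was judged to have come first; a steeper slope
    means better temporal-order discrimination.
    """
    nd = NormalDist()
    z = [nd.inv_cdf(min(max(p, 0.01), 0.99)) for p in prop_responses]
    mx = sum(soas_ms) / len(soas_ms)
    mz = sum(z) / len(z)
    num = sum((x - mx) * (zi - mz) for x, zi in zip(soas_ms, z))
    den = sum((x - mx) ** 2 for x in soas_ms)
    return num / den  # probit units per ms of SOA

# Invented response proportions for uncrossed vs. crossed postures:
soas = [-200, -100, -50, 50, 100, 200]
p_uncrossed = [0.05, 0.15, 0.35, 0.70, 0.85, 0.95]
p_crossed = [0.20, 0.30, 0.42, 0.58, 0.70, 0.80]

crossed_hand_effect = probit_slope(soas, p_uncrossed) - probit_slope(soas, p_crossed)
# A positive value indicates poorer performance with crossed hands,
# the difference score plotted for each child in Fig. 13.5.
```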

input into external coordinates (while still keeping active the modality-specific coordinates of
touch), which in most situations facilitates the control of actions.

13.5 Critical periods for the development of multisensory functions

Critical periods have been defined as the phases in development during which deviant inputs
result in irreversible changes of function (e.g. Knudsen 2004). Thus, in order to conclude that
multisensory binding based on spatial features is indeed linked to a certain period in life, it is
necessary to show that congenitally blind people whose sight has successfully been restored
(e.g. by cataract surgery) do not acquire these multisensory functions. People who were born
blind or who became blind within the first year of life and whose sight could be restored in adult-
hood are very rare (see Fine et al. 2003 and Sacks 1995). A relatively consistent finding, however,
is that these patients often develop basic visual functions but show marked impairments in visual
spatial orienting, while orienting towards non-visual stimuli seems to be unimpaired (Hyvärinen
et al. 1981). Hyvärinen et al. (1981) suggested that crossmodal plasticity increases the further up
a brain region is in the visual processing hierarchy. In particular, it was observed in primates that
multisensory parietal regions lacked visual responsiveness even after long recovery periods
(Carlson et al. 1987). A similar reorganization of multisensory brain regions has been documented
in cats (Rauschecker 1995) and humans (De Volder et al. 1997; Röder et al. 1999a,b).
Thus it might be speculated that competition between modalities in multisensory brain regions
during development results in an irreversible loss of visual access. Additionally, if certain functions
develop much later in ontogeny, missing input from one modality and the successful competition
of the remaining sensory inputs in multisensory brain regions might translate into behavioural
impairments only at a later point in time, when the more complex function is ready to be
acquired. This phenomenon has been called
the ‘sleeper effect’ (Maurer et al. 2007b). It might be speculated that competition in multisensory
regions resembles the competition of inputs in predominantly unisensory brain regions.
For example, missing input from one eye during the first months of life results in a permanent
impairment of binocular responsiveness of visual cortex neurons (Wiesel and Hubel 1965) and
a lack or impairment of stereo vision and other more complex visual functions (Maurer et al.
2007a; Putzar et al. 2007a). Indeed, the ‘multisensory perceptual narrowing’ view explicitly pro-
poses competitive processes among multisensory connections (Lewkowicz and Ghazanfar 2009).
From these data, it could be predicted that (at least some) multisensory functions do not (fully)
recover after a (not-yet-defined) critical or sensitive period has been passed. As mentioned above,
this prediction is in accordance with findings in animal studies (Carlson et al. 1987; Wallace et al.
2004) and some recent studies in humans (Putzar et al. 2007b, 2010).

13.6 From multisensory development to crossmodal compensation

As mentioned in the section on crossmodal plasticity, sensory deprivation in one modality results
in crossmodal compensation and reorganization in a number of functions associated with the
non-deprived modalities (Bavelier and Neville 2002; Merabet and Pascual-Leone 2010; Pavani
and Röder 2011). It might be speculated that the mechanisms of crossmodal compensation might
differ as a function of the onset of sensory deprivation and thus whether or not certain multisen-
sory capacities have already been developed. Röder et al. (1999b) used an auditory spatial atten-
tion paradigm and simultaneously recorded ERPs. The participants in this study had to attend to
the central speaker (at 0° azimuth) or, in other trials, to the peripheral speaker (at 90° azimuth).

Sounds were presented successively, in a random order, from these two and an additional six
speakers. Participants had to localize sounds at the attended location only, while ignoring all
sounds from the remaining speakers. For central sound sources, blind and sighted participants
displayed a similar spatial tuning (as indicated by the sharpness of the tuning of the ERP spatial
attention effect between 100 and 200 ms). By contrast, a sharper spatial tuning for peripheral
sound sources was observed in the congenitally blind than in sighted controls. Accordingly, at the
behavioural level, more precise sound localization was observed in the congenitally blind than in
the sighted controls for peripheral locations only. Interestingly, the ERP spatial attention effect
was more posteriorly distributed over the scalp in the congenitally blind than in sighted controls.
The authors interpreted this finding as evidence for a reorganization of multisensory space repre-
sentations, in particular an expansion of auditory spatial representations at the expense of visual
spatial representations in parietal brain structures (see Hyvärinen et al. 1981).
Interestingly, the N1 spatial attention effect indicates a processing stage that is crossmodally
modulated (see above; Eimer 2001; Föcker et al. 2010; Hillyard et al. 1984; Hötting et al. 2003).
This crossmodal influence can only emerge after the remapping of sensory input into a coordi-
nate system that can be accessed by all involved senses (see Gondan et al. 2004). Moreover, as
mentioned above, congenitally blind people do not show crossmodal spatial attention effects at
this processing stage. Thus, it might be speculated that a reorganization of spatial representations
(Röder et al. 1999b) can result, on the one hand, in crossmodal compensatory behaviour in the
congenitally blind and, on the other, in a change in crossmodal processing among the non-
deprived senses. Since it has been shown that late-blind adults use external coordinates for cross-
modal processing (Collignon and De Volder 2009; Röder and Rösler 2004), it follows that these
spatial representations should not be reorganized in this group. This is exactly the pattern of
results which was observed using the same auditory spatial attention task (Fieger et al. 2006). The
scalp distribution of auditory spatial attention effects was indistinguishable in late-blind and
sighted adults. Interestingly, however, late-blind individuals performed in the auditory localization
task at a level as enhanced as that of the congenitally blind, despite the fact that no crossmodal
plasticity was observed at early processing stages (100–200 ms post-stimulus). A closer look at
their ERP data revealed that late selection mechanisms (>200 ms)
were more sharply tuned in this group than in their age-matched sighted control group. Thus it
might be concluded that early spatial remapping processes and related representations are set up
during development and cannot be changed considerably thereafter. Due to the limited plasticity
of this processing stage, it cannot be used to support compensatory behaviour. Instead, other
brain systems, possibly involving the slower spatial processing route, may mediate adult compen-
satory plasticity. Incidentally, it might be speculated that the same constraints contribute to the
lack of adaptation of the body schema after the loss of a limb (Flor et al. 2006). The body schema
is known to be represented in multiple spatial reference frames including somatosensory and
visual coordinates (e.g. Botvinick and Cohen 1998; Heed and Röder 2010). These findings indi-
cate that developmental and adult neural plasticity differ but may result in the same behaviour
(compensatory performance improvement) (see Thinus-Blanc and Gaunet 1997 for the advan-
tages of being late blind as compared to congenitally blind in spatial tasks due to access to visual
representations). Thus, early (multisensory) development sets up the constraints for later neural
plasticity.

13.7 Main conclusions and hypotheses


From the results of studies using the sensory deprivation model as a retrospective approach to
the study of multisensory development, one might conclude that none of the extreme views

of multisensory development—neither the integration/hierarchical developmental approach nor
the extreme differentiation view—is able to account for the results on multisensory processes
after sensory deprivation. The finding that visual input reorganizes the spatial representations for
(unisensory) auditory and tactile localization seems to be incompatible with both the hierarchical
development and the differentiation view. A heterochronous development of spatial representations
within each sensory system, with a retroactive influence of later-developing onto earlier-developing
representations, is reminiscent of the ideas of Turkewitz and Kenny (1982) but cannot easily be explained by
the ‘multisensory perceptual narrowing’ view. Finally, the observation of sensitive periods for
the development of multisensory processes seems to be compatible with the view that specific
experience is necessary to trigger both the selective and constructive neural processes required
for the specialization and fine-tuning of multisensory processes. While short-term and small-scale
adaptations are observed that guarantee the optimal use of crossmodal input in various contexts
in adulthood, more extensive reorganizations involving major principles of multisensory
processing, such as the use of default spatial reference frames for sensory localization, seem to
be limited to early development.

Acknowledgements
The author was supported by grants from the German Research Foundation and the European
Research Council. The author thanks Elena Nava and the editors for comments on the chapter
and for English editing.

References
Alais, D., and Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration.
Current Biology, 14, 257–62.
Amedi, A., Floel, A., Knecht, S., Zohary, E., and Cohen, L.G. (2004). Transcranial magnetic stimulation of
the occipital pole interferes with verbal processing in blind subjects. Nature Neuroscience, 7, 1266–70.
Amedi, A., Raz, N., Azulay, H., Malach, R., and Zohary, E. (2010). Cortical activity during tactile exploration
of objects in blind and sighted humans. Restorative Neurology and Neuroscience, 28, 143–56.
Avillac, M., Olivier, E., Deneve, S., Ben Hamed, S., and Duhamel, J.R. (2004). Multisensory integration in
multiple reference frames in the posterior parietal cortex. Cognitive Processing, 5, 159–66.
Azañón, E., and Soto-Faraco, S. (2008). Changing reference frames during the encoding of tactile events.
Current Biology, 18, 1044–49.
Bahrick, L.E., Hernandez-Reif, M., and Flom, R. (2005). The development of infant learning about specific
face-voice relations. Developmental Psychology, 41, 541–52.
Baldauf, D., and Deubel, H. (2009). Attentional selection of multiple goal positions before rapid hand
movement sequences: an event-related potential study. Journal of Cognitive Neuroscience, 21, 18–29.
Bavelier, D., and Neville, H.J. (2002). Cross-modal plasticity: where and how? Nature Reviews Neuroscience,
3, 443–52.
Bedny, M., Pascual-Leone, A., and Saxe, R.R. (2009). Growing up blind does not change the neural bases of
Theory of Mind. Proceedings of the National Academy of Sciences, 106, 11312–11317.
Bizley, J.K., and King, A.J. (2009). Visual influences on ferret auditory cortex. Hearing Research, 258, 55–63.
Botvinick, M., and Cohen, J. (1998). Rubber hands ‘feel’ touch that eyes see. Nature, 391, 756.
Bremner, A.J., Holmes, N.P., and Spence, C. (2008). Infants lost in (peripersonal) space? Trends in
Cognitive Sciences, 12, 298–305.
Bruns, P., and Röder, B. (2010). Tactile capture of auditory localization: an event-related potential study.
European Journal of Neuroscience, 31, 1844–57.
Burton, H., and McLaren, D.G. (2006). Visual cortex activation in late-onset, Braille naive blind individuals: an
fMRI study during semantic and phonological tasks with heard words. Neuroscience Letters, 392, 38–42.

Burton, H., Sinclair, R.J., and McLaren, D.G. (2004). Cortical activity to vibrotactile stimulation: an fMRI
study in blind and sighted individuals. Human Brain Mapping, 23, 210–28.
Carlson, S., Pertovaara, A., and Tanila, H. (1987). Late effects of early binocular visual deprivation on the
function of Brodmann’s area 7 of monkeys (Macaca arctoides). Brain Research, 430, 101–111.
Carpenter, P.A., and Eisenberg, P. (1978). Mental rotation and the frame of reference in blind and sighted
individuals. Perception and Psychophysics, 23, 117–24.
Carriere, B.N., Royal, D.W., Perrault, T.J., et al. (2007). Visual deprivation alters the development of
cortical multisensory integration. Journal of Neurophysiology, 98, 2858–67.
Cohen, L.G., Celnik, P., Pascual-Leone, A., et al. (1997). Functional relevance of cross-modal plasticity in
blind humans. Nature, 389, 180–83.
Collignon, O., Davare, M., Olivier, E., and De Volder, A.G. (2009). Reorganisation of the right occipito-
parietal stream for auditory spatial processing in early blind humans. A transcranial magnetic
stimulation study. Brain Topography, 21, 232–40.
Collignon, O., and De Volder, A.G. (2009). Further evidence that congenitally blind participants react
faster to auditory and tactile spatial targets. Canadian Journal of Experimental Psychology, 63, 287–93.
Collignon, O., Lassonde, M., Lepore, F., Bastien, D., and Veraart, C. (2007). Functional cerebral
reorganization for auditory spatial processing and auditory substitution of vision in early blind
subjects. Cerebral Cortex, 17, 457–65.
Collignon, O., Voss, P., Lassonde, M., and Lepore, F. (2009). Cross-modal plasticity for the spatial
processing of sounds in visually deprived subjects. Experimental Brain Research, 192, 343–58.
Collins, T., Heed, T., and Röder, B. (2010). Visual target selection and motor planning define attentional
enhancement at perceptual processing stages. Frontiers in Human Neuroscience, 4, 14.
Collins, T., Schicke, T., and Röder, B. (2008). Action goal selection and motor planning can be dissociated
by tool use. Cognition, 109, 363–71.
Craig, J.C., and Belser, A.N. (2006). The crossed-hands deficit in tactile temporal-order judgments: the
effect of training. Perception, 35, 1561–72.
Cuevas, I., Plaza, P., Rombaux, P., De Volder, A.G., and Renier, L. (2009). Odour discrimination and
identification are improved in early blindness. Neuropsychologia, 47, 3079–83.
De Volder, A.G., Bol, A., Blin, J., et al. (1997). Brain energy metabolism in early blind subjects: neural
activity in the visual cortex. Brain Research, 750, 235–44.
Driver, J., and Noesselt, T. (2008). Multisensory interplay reveals crossmodal influences on ‘sensory-
specific’ brain regions, neural responses, and judgments. Neuron, 57, 11–23.
Eimer, M. (2001). Crossmodal links in spatial attention between vision, audition, and touch: evidence from
event-related brain potentials. Neuropsychologia, 39, 1292–1303.
Eimer, M., Cockburn, D., Smedley, B., and Driver, J. (2001). Cross-modal links in endogenous spatial
attention are mediated by common external locations: evidence from event-related brain potentials.
Experimental Brain Research, 139, 398–411.
Eimer, M., and van Velzen, J. (2006). Covert manual response preparation triggers attentional modulations
of visual but not auditory processing. Clinical Neurophysiology, 117, 1663–74.
Ernst, M.O., and Banks, M.S. (2002). Humans integrate visual and haptic information in a statistically
optimal fashion. Nature, 415, 429–33.
Fieger, A., Röder, B., Teder-Sälejärvi, W., Hillyard, S.A., and Neville, H.J. (2006). Auditory spatial tuning in
late-onset blindness in humans. Journal of Cognitive Neuroscience, 18, 149–57.
Fiehler, K., Burke, M., Bien, S., Röder, B., and Rösler, F. (2009a). The human dorsal action control system
develops in the absence of vision. Cerebral Cortex, 19, 1–12.
Fiehler, K., Reuschel, J., and Rösler, F. (2009b). Early non-visual experience influences proprioceptive-
spatial discrimination acuity in adulthood. Neuropsychologia, 47, 897–906.
Fine, I., Wade, A.R., Brewer, A.A., et al. (2003). Long-term deprivation affects visual perception and cortex.
Nature Neuroscience, 6, 915–916.

Flor, H., Nikolajsen, L., and Staehelin Jensen, T. (2006). Phantom limb pain: a case of maladaptive CNS
plasticity? Nature Reviews Neuroscience, 7, 873–81.
Föcker, J., Hötting, K., Gondan, M., and Röder, B. (2010). Unimodal and crossmodal gradients of spatial
attention: Evidence from event-related potentials. Brain Topography, 23, 1–13.
Gherri, E., Driver, J., and Eimer, M. (2008). Eye movement preparation causes spatially-specific
modulation of auditory processing: new evidence from event-related brain potentials. Brain Research,
1224, 88–101.
Gibson, E.J. (1969). Principles of perceptual learning and development. Appleton-Century-Crofts, New York.
Gizewski, E.R., Gasser, T., de Greiff, A., Boehm, A., and Forsting, M. (2003). Cross-modal plasticity for
sensory and motor activation patterns in blind subjects. Neuroimage, 19, 968–75.
Goebel, R., and van Atteveldt, N. (2009). Multisensory functional magnetic resonance imaging: a future
perspective. Experimental Brain Research, 198, 153–64.
Gondan, M., Lange, K., Rösler, F., and Röder, B. (2004). The redundant target effect is affected by modality
switch costs. Psychonomic Bulletin and Review, 11, 307–313.
Gottlieb, G. (1991). Experiential canalization of behavioral development: theory. Developmental Psychology,
27, 4–13.
Gougoux, F., Lepore, F., Lassonde, M., Voss, P., Zatorre, R.J., and Belin, P. (2004). Neuropsychology: pitch
discrimination in the early blind. Nature, 430, 309.
Gougoux, F., Zatorre, R.J., Lassonde, M., Voss, P., and Lepore, F. (2005). A functional neuroimaging study
of sound localization: visual cortex activity predicts performance in early-blind individuals. PLoS
Biology, 3, e27.
Gougoux, F., Belin, P., Voss, P., Lepore, F., Lassonde, M., and Zatorre, R.J. (2009). Voice perception in
blind persons: A functional magnetic resonance imaging study. Neuropsychologia, 47, 2967–74.
Heed, T., and Röder, B. (2010). Common anatomical and external coding for hands and feet in tactile
attention: evidence from event-related potentials. Journal of Cognitive Neuroscience, 22, 184–202.
Hillyard, S.A., Simpson, G.V., Woods, D.L., VanVoorhis, S., and Münte, T.F. (1984). Event-related brain
potentials in selective attention to different modalities. In Cortical integration (eds. F. Reinoso-Suárez,
and C. Ajmone-Marsan), pp. 395–414. Raven Press, New York.
Hötting, K., Rösler, F., and Röder, B. (2003). Crossmodal and intermodal attention modulate event-related
brain potentials to tactile and auditory stimuli. Experimental Brain Research, 148, 26–37.
Hötting, K., Rösler, F., and Röder, B. (2004). Altered auditory-tactile interactions in congenitally blind
humans: an event-related potential study. Experimental Brain Research, 159, 370–81.
Hyvärinen, J., Hyvärinen, L., and Linnankoski, I. (1981). Modification of parietal association cortex and
functional blindness after binocular deprivation in young monkeys. Experimental Brain Research, 42, 1–8.
James, W. (1890). Principles of psychology, Vol. 1. Henry Holt, New York.
Karnath, H.O., Reich, E., Rorden, C., Fetter, M., and Driver, J. (2002). The perception of body orientation
after neck-proprioceptive stimulation. Effects of time and of visual cueing. Experimental Brain
Research, 143, 350–58.
Kennett, S., Taylor-Clarke, M., and Haggard, P. (2001). Noninformative vision improves the spatial
resolution of touch in humans. Current Biology, 11, 1188–91.
King, A.J. (2009). Visual influences on auditory spatial learning. Philosophical Transactions of the Royal
Society of London. Series B: Biological Sciences, 364, 331–39.
Kirchner, H., and Colonius, H. (2005). Cognitive control can modulate intersensory facilitation:
speeding up visual antisaccades with an auditory distractor. Experimental Brain Research, 166,
440–44.
Kitazawa, S. (2002). Where conscious sensation takes place. Consciousness and Cognition, 11, 475–77.
Klatzky, R.L., Golledge, R.G., Loomis, J.M., Cicinelli, J.G., and Pellegrino, J.W. (1995). Performance of
blind and sighted persons on spatial tasks. Journal of Visual Impairment and Blindness, 89, 70–82.

Klinge, C., Röder, B., and Büchel, C. (2010). Increased amygdala activation to emotional auditory stimuli in
the blind. Brain, 133, 1729–36.
Knudsen, E.I. (2004). Sensitive periods in the development of the brain and behavior. Journal of Cognitive
Neuroscience, 16, 1412–25.
Lange, K., and Röder, B. (2010). Temporal orienting in audition, touch, and across modalities. In Attention
and time (Eds. A. C. Nobre and J. T. Coull), pp. 393–405. Oxford University Press.
Lepore, N., Voss, P., Lepore, F., et al. (2010). Brain structure changes visualized in early- and late-onset
blind subjects. Neuroimage, 49, 134–40.
Lewkowicz, D.J. (2000). The development of intersensory temporal perception: an epigenetic systems/
limitations view. Psychological Bulletin, 126, 281–308.
Lewkowicz, D.J. (2002). Heterogeneity and heterochrony in the development of intersensory perception.
Brain Research. Cognitive Brain Research, 14, 41–63.
Lewkowicz, D.J., and Ghazanfar, A.A. (2009). The emergence of multisensory systems through perceptual
narrowing. Trends in Cognitive Sciences, 13, 470–78.
Lewkowicz, D.J., and Röder, B. (2011). Development of multisensory processing and the role of early
experience. In The new handbook of multisensory processes (Ed. B. Stein). MIT Press, Cambridge, MA.
Longo, M.R., and Haggard, P. (2010). An implicit body representation underlying human position sense.
Proceedings of the National Academy of Sciences USA, 107, 11727–32.
Macaluso, E., Frith, C.D., and Driver, J. (2000). Modulation of human visual cortex by crossmodal spatial
attention. Science, 289, 1206–1208.
Marmor, G.S., and Zaback, L.A. (1976). Mental rotation by the blind: does mental rotation depend on
visual imagery? Journal of Experimental Psychology: Human Perception and Performance, 2, 515–21.
Maurer, D. (1997). Neonatal synaesthesia: implications for the processing of speech and faces.
In Synaestesia: classical and contemporary readings (Eds. S. Baron-Cohen and J. E. Harrison),
pp. 224–42. Blackwell, Oxford.
Maurer, D., Mondloch, C.J., and Lewis, T.L. (2007a). Effects of early visual deprivation on perceptual and
cognitive development. Progress in Brain Research, 164, 87–104.
Maurer, D., Mondloch, C.J., and Lewis, T.L. (2007b). Sleeper effects. Developmental Science, 10, 40–47.
Merabet, L.B., and Pascual-Leone, A. (2010). Neural reorganization following sensory loss: the opportunity
of change. Nature Reviews Neuroscience, 11, 44–52.
Millar, S. (2008). Space and sense. Psychology Press, New York.
Morrongiello, B.A., and Fenwick, K.D. (1991). Infants’ coordination of auditory and visual depth
information. Journal of Experimental Child Psychology, 52, 277–96.
Muchnik, C., Efrati, M., Nemeth, E., Malin, M., and Hildesheimer, M. (1991). Central auditory skills in
blind and sighted subjects. Scandinavian Audiology, 20, 19–23.
Mullette-Gillman, O.A., Cohen, Y.E., and Groh, J.M. (2009). Motor-related signals in the intraparietal
cortex encode locations in a hybrid, rather than eye-centered reference frame. Cerebral Cortex, 19,
1761–75.
Munakata, Y., Casey, B.J., and Diamond, A. (2004). Developmental cognitive neuroscience: progress and
potential. Trends in Cognitive Sciences, 8, 122–28.
Nardini, M., Jones, P., Bedford, R., and Braddick, O. (2008). Development of cue integration in human
navigation. Current Biology, 18, 689–93.
Neil, P.A., Chee-Ruiter, C., Scheier, C., Lewkowicz, D.J., and Shimojo, S. (2006). Development of
multisensory spatial integration and perception in humans. Developmental Science, 9, 454–64.
Pagel, B., Heed, T., and Röder, B. (2009). Change of reference frame for tactile localization during child
development. Developmental Science, 12, 929–37.
Pascual-Leone, A., and Hamilton, R. (2001). The metamodal organization of the brain. In Progress in brain
research (Eds. C. Casanova and M. Ptito), pp. 427–45. Elsevier Science, Amsterdam.

Pavani, F., and Röder, B. (2011). Crossmodal plasticity as a consequence of sensory loss: Insights from
blindness and deafness. In The new handbook of multisensory processes (Ed. B. Stein). MIT Press,
Cambridge, MA.
Piaget, J. (1952). The origins of intelligence in children. International University Press, New York.
Pouget, A., Deneve, S., and Duhamel, J.R. (2004). A computational neural theory of spatial representations.
In Crossmodal space and crossmodal attention (Eds. C. Spence and J. Driver), pp. 123–40.
Oxford University Press, Oxford.
Pouget, A., Ducom, J.C., Torri, J., and Bavelier, D. (2002). Multisensory spatial representations in eye-
centered coordinates for reaching. Cognition, 83, B1–B11.
Putzar, L., Goerendt, I., Lange, K., Rösler, F., and Röder, B. (2007a). Early visual deprivation impairs
multisensory interactions in humans. Nature Neuroscience, 10, 1243–45.
Putzar, L., Hötting, K., Rösler, F., and Röder, B. (2007b). The development of visual feature binding
processes after visual deprivation in early infancy. Vision Research, 47, 2618–28.
Putzar, L., Goerendt, I., Heed, T., Richard, G., Büchel, C., and Röder, B. (2010). The neural basis of lip-reading
capabilities is altered by early visual deprivation. Neuropsychologia, 48, 2158–66.
Rauschecker, J.P. (1995). Compensatory plasticity and sensory substitution in the cerebral cortex. Trends in
Neurosciences, 18, 36–43.
Raz, N., Amedi, A., and Zohary, E. (2005). V1 activation in congenitally blind humans is associated with
episodic retrieval. Cerebral Cortex, 15, 1459–68.
Ricciardi, E., Bonino, D., Sani, L., et al. (2009). Do we really need vision? How blind people ‘see’ the actions
of others. Journal of Neuroscience, 29, 9719–24.
Röder, B., and Rösler, F. (1998). Visual input does not facilitate the scanning of spatial images. Journal of
Mental Imagery, 22, 165–82.
Röder, B., and Rösler, F. (2004). Compensatory plasticity as a consequence of sensory loss. In The handbook
of multisensory processes (Eds. G. Calvert, C. Spence, and B.E. Stein), pp. 719–47. MIT Press,
Cambridge, MA.
Röder, B., Rösler, F., Hennighausen, E., and Nacker, F. (1996). Event-related potentials during auditory
and somatosensory discrimination in sighted and blind human subjects. Brain Research. Cognitive
Brain Research, 4, 77–93.
Röder, B., Rösler, F., and Hennighausen, E. (1997). Different cortical activation patterns in blind
and sighted humans during encoding and transformation of haptic images. Psychophysiology,
34, 292–307.
Rösler, F., Röder, B., Heil, M., and Hennighausen, E. (1993). Topographic differences of slow event-related
brain potentials in blind and sighted adult human subjects during haptic mental rotation. Brain
Research. Cognitive Brain Research, 1, 145–59.
Röder, B., Rösler, F., and Neville, H.J. (1999a). Effects of interstimulus interval on auditory event-related
potentials in congenitally blind and normally sighted humans. Neuroscience Letters, 264, 53–56.
Röder, B., Teder-Sälejärvi, W., Sterr, A., Rösler, F., Hillyard, S.A., and Neville, H.J. (1999b). Improved
auditory spatial tuning in blind humans. Nature, 400, 162–66.
Röder, B., Stock, O., Bien, S., Neville, H., and Rösler, F. (2002). Speech processing activates visual cortex in
congenitally blind humans. European Journal of Neuroscience, 16, 930–36.
Röder, B., Rösler, F., and Spence, C. (2004). Early vision impairs tactile perception in the blind. Current
Biology, 14, 121–24.
Röder, B., Kramer, U.M., and Lange, K. (2007a). Congenitally blind humans use different stimulus
selection strategies in hearing: an ERP study of spatial and temporal attention. Restorative Neurology
and Neuroscience, 25, 311–22.
Röder, B., Kusmierek, A., Spence, C., and Schicke, T. (2007b). Developmental vision determines the
reference frame for the multisensory control of action. Proceedings of the National Academy of Sciences,
104, 4753–58.

Röder, B., Föcker, J., Hötting, K., and Spence, C. (2008). Spatial coordinate systems for tactile spatial
attention depend on developmental vision: evidence from event-related potentials in sighted and
congenitally blind adult humans. European Journal of Neuroscience, 28, 475–83.
Sacks, O. (1995). To see and not to see. In An anthropologist on Mars (Ed. O. Sacks), pp. 108–52. Vintage
Books, New York.
Sadato, N., Pascual-Leone, A., Grafman, J., Deiber, M.P., Ibanez, V., and Hallett, M. (1998). Neural
networks for Braille reading by the blind. Brain, 121, 1213–29.
Sadato, N., Pascual-Leone, A., Grafman, J., et al. (1996). Activation of the primary visual cortex by Braille
reading in blind subjects. Nature, 380, 526–28.
Schicke, T., and Röder, B. (2006). Spatial remapping of touch: Confusion of perceived stimulus order
across hand and foot. Proceedings of the National Academy of Sciences, 103, 11808–11813.
Shore, D.I., Spry, E., and Spence, C. (2002). Confusing the mind by crossing the hands. Brain Research.
Cognitive Brain Research, 14, 153–63.
Simon, J.R., Hinrichs, J.V., and Craft, J.L. (1970). Auditory S-R compatibility: reaction time as a function of
ear-hand correspondence and ear-response-location correspondence. Journal of Experimental
Psychology, 86, 97–102.
Spence, C., and Driver, J. (2004). Crossmodal space and crossmodal attention. Oxford University Press,
Oxford.
Stein, B.E., and Stanford, T.R. (2008). Multisensory integration: current issues from the perspective of the
single neuron. Nature Reviews Neuroscience, 9, 255–66.
Stevens, C., and Neville, H. (2009). Profiles of development and plasticity in human neurocognition. In The
cognitive neurosciences IV, 4th edn. (Ed. M. Gazzaniga), pp. 165–81. MIT Press, Cambridge, MA.
Thinus-Blanc, C., and Gaunet, F. (1997). Representation of space in blind persons: vision as a spatial sense?
Psychological Bulletin, 121, 20–42.
Turkewitz, G., and Kenny, P.A. (1982). Limitations on input as a basis for neural organization and
perceptual development: a preliminary theoretical statement. Developmental Psychobiology, 15, 357–68.
Van Velzen, J., Eardley, A.F., Forster, B., and Eimer, M. (2006). Shifts of attention in the early blind: an
ERP study of attentional control processes in the absence of visual spatial information.
Neuropsychologia, 44, 2533–46.
Vanlierde, A., de Volder, A.G., Wanet-Defalque, M.C., and Veraart, C. (2003). Occipito-parietal cortex
activation during visuo-spatial imagery in early blind humans. Neuroimage, 19, 698–709.
Von Holst, E., and Mittelstaedt, H. (1950). Das Reafferenzprinzip. Die Naturwissenschaften, 37, 464–76.
Voss, P., Gougoux, F., Lassonde, M., Zatorre, R.J., and Lepore, F. (2006). A positron emission tomography
study during auditory localization by late-onset blind individuals. Neuroreport, 17, 383–88.
Voss, P., Lassonde, M., Gougoux, F., Fortin, M., Guillemot, J.P., and Lepore, F. (2004). Early- and
late-onset blind individuals show supra-normal auditory abilities in far-space. Current Biology, 14,
1734–38.
Wallace, M.T., and Stein, B.E. (1997). Development of multisensory neurons and multisensory integration
in cat superior colliculus. Journal of Neuroscience, 17, 2429–44.
Wallace, M.T., and Stein, B.E. (2001). Sensory and multisensory responses in the newborn monkey
superior colliculus. Journal of Neuroscience, 21, 8886–94.
Wallace, M.T., Perrault, T.J., Jr., Hairston, W.D., and Stein, B.E. (2004). Visual experience is necessary for
the development of multisensory integration. Journal of Neuroscience, 24, 9580–84.
Warren, D.H. (1994). Blindness and children: an individual differences approach. Cambridge University
Press, New York.
Warren, D.H., and Pick, H.L., Jr. (1970). Intermodality relations in localization in blind and sighted people.
Perception and Psychophysics, 8, 430–32.
Weaver, K.E., and Stevens, A.A. (2007). Attention and sensory interactions within the occipital cortex in
the early blind: an fMRI study. Journal of Cognitive Neuroscience, 19, 315–30.
Weisser, V., Stilla, R., Peltier, S., Hu, X., and Sathian, K. (2005). Short-term visual deprivation alters neural
processing of tactile form. Experimental Brain Research, 166, 572–82.
Wiesel, T.N., and Hubel, D.H. (1965). Comparison of the effects of unilateral and bilateral eye closure on
cortical unit responses in kittens. Journal of Neurophysiology, 28, 1029–40.
Yamamoto, S., and Kitazawa, S. (2001). Reversal of subjective temporal order due to arm crossing. Nature
Neuroscience, 4, 759–65.
Zangaladze, A., Epstein, C.M., Grafton, S.T., and Sathian, K. (1999). Involvement of visual cortex in tactile
discrimination of orientation. Nature, 401, 587–90.
Part C
Neural, computational, and evolutionary mechanisms in multisensory development
Chapter 14
Development of multisensory integration in subcortical and cortical brain networks
Mark T. Wallace, Dipanwita Ghose, Aaron R. Nidiffer, Matthew C. Fister, and Juliane Krueger Fister

14.1 Introduction
Recent years have witnessed an explosion of interest in multisensory processes. A wealth
of behavioural and perceptual studies have detailed the benefits (and detriments) of having
combined information from multiple sensory modalities, and in the last 25 years a much better
understanding of the neural underpinnings of these multisensory phenomena has emerged.
Collectively, these studies have highlighted that combined use of multiple sensory channels
can improve detectability, reaction and response times, localization and can induce a host of
perceptual illusions (for a review of this literature see Calvert et al. 2004). On the biological and
physiological side, this work has revealed not only the basic operations performed by multisen-
sory neurons on their different convergent inputs, but also that multisensory processes are much
more broadly distributed throughout the nervous system than originally thought, with interac-
tions being revealed even in early sensory cortical domains (Stein and Meredith 1993; Stein and
Stanford 2008).
Despite this resurgence of interest in multisensory processes, our knowledge of the developmental
events leading up to the creation of a mature multisensory state has lagged behind. This is despite
the presence of a long and rich history of human inquiry into the development of sensory systems
in human infants and children, an interest bolstered by our own personal observations as we watch
our children grow (Piaget 1952). Underscoring the difficulty of this problem is the relatively recent
history of the field of developmental psychobiology, which has provided a host of views and theories
as to the chronology of sensory (and multisensory) development in humans (for a review of this
literature see Lewkowicz 1994; Lewkowicz and Ghazanfar 2009). At the extremes, these theories
posit either an advancing sequence that moves from unisensory to multisensory as development
progresses, or the reverse (i.e. the human newborn is obliged to perceive synaesthetically).
Although substantial evidence has been offered for each of these views (as well as for intermedi-
ate perspectives), there is an inherent and seemingly insurmountable problem associated with
these studies—that because of the immaturity of all systems at these early ages (e.g. language,
motoric, etc.), measuring and indexing multisensory function is difficult and laden with interpretational caveats. For example, we can directly measure multisensory-induced behavioural facilitations on a simple reaction task by asking adult subjects to push a button as quickly as possible. However, measuring facilitative benefits in infants proves much more difficult (although see Neil et al. 2006 for a reaction-time task that can be used with infants). Moreover, in adults we can
measure multisensory-mediated gains in higher-order processes such as speech comprehension by asking participants to report on their percepts, but such a task is not possible in prelinguistic children.
Undoubtedly, the best evidence to date on multisensory processes in human infants has come
from preferential looking studies, in which the babies’ looking time at one of two visual stimuli or
events is the dependent measure. This work has highlighted the complex interplay that is taking
place between perceptual narrowing and broadening processes during early human development
(see Chapter 8 by Bahrick and Lickliter, Chapter 7 by Lewkowicz, and Chapter 4 by Streri).
Nonetheless, and despite the significant insights that have been gained from this work, we still
remain woefully ignorant about the developmental events and progression leading to the creation
of a mature multisensory state. A complementary approach to these questions is to study animal
models, which can be used to study the development of multisensory neural networks. With an
understanding of some of the fundamental developmental milestones in these systems, one can
then begin to make inferences about how the developmental state of these circuits maps onto
known behavioural and perceptual capabilities. The following sections detail some of the key
observations concerning multisensory development (and its malleability) that have been gleaned
from such studies.

14.2 Developmental models for neurally based multisensory studies
One of the preeminent model species for studies of sensory (and multisensory) processes has been
the cat. The reasons for this are manifold, but are partly a result of the marked visual, auditory,
and tactile acuity of this species, and the striking commonalities between the organization of its sensory systems (and sensory brain regions) and that of primate species, including humans.
Seminal studies into the operations performed by individual multisensory neurons were first
carried out in the cat midbrain superior colliculus (SC). The cat SC was an ideal model for this
work for a number of reasons, most notably because it was known as a site for sensory conver-
gence from at least three different sensory systems (vision, hearing, and touch), because its stere-
otyped topographic organization made it very amenable to neurophysiological analyses, and
because of the SC’s well-defined and well-characterized role in the orientation of gaze (see Stein
and Meredith 1993).
Although the SC has remained an important model for multisensory research, more recent
interest has focused on better detailing the organization and response characteristics of cortical
multisensory domains, with the rationale that perceptual processes are more the domain of cortex
as opposed to subcortex. In the cat, this work has focused on the cortex of the anterior ectosylvian
sulcus (AES), an enigmatic cortical area comprising three overlapping unisensory representations (visual, auditory, and somatosensory) whose role in perceptual processes remains to be
elucidated (Stein and Wallace 1996; Wallace et al. 1992).
Although these studies in cat subcortical and cortical multisensory structures represent the
foundation of work completed to date on questions of multisensory neural encoding (and as
outlined below have provided a set of fundamental principles for multisensory processing),
recent work has expanded the number of species investigated in multisensory research to include
rodents, ferrets, and non-human primates (Allman et al. 2008; Avillac et al. 2007; Bizley et al.
2007; Foxe et al. 2002; Ghazanfar et al. 2005; Kayser et al. 2005; King and Palmer 1985; Wallace
et al. 1996, 2004b). In each of these species, the basic operations performed by multisensory
neurons and networks appear to be strikingly similar, suggesting a universal set of principles that
govern the integration of sensory information. Briefly, multisensory integration in this context
refers to the active process of evaluating and synthesizing information arriving from more than
one sensory modality at the single-neuron level, a process that typically can dramatically trans-
form the information encoded by that neuron (as evidenced by firing-rate changes under multi-
sensory conditions).
From a developmental perspective, the cat has been the preferred species to date, not only for
the reasons outlined above, but also because of its relatively short gestation time and the altricial
(i.e. relatively immature) state of the newborn nervous system. Although most of the subsequent
narrative will summarize developmental studies carried out in the cat model, it is important to
point out that both rodents and non-human primates offer distinct advantages over the cat (i.e. ease
of developmental perturbation/manipulation and closer parallels to the human developmental
processes, respectively) and are becoming increasingly important model species. Likewise, the
ferret, another carnivore, is gaining popularity in the multisensory arena (Bizley et al. 2007;
Keniston et al. 2009; Meredith and King 2004; Ramsay and Meredith 2004). Although lacking
some of the foundational work needed to relate unisensory and multisensory systems, the ferret
is much more readily trained in behavioural studies than the cat. Hence, future studies structured
to more directly examine the correlative and causal links between multisensory networks and
behavioural and perceptual processes may be more tractable in the ferret model.

14.3 The framework for developmental studies: multisensory function in adult circuits
Before detailing the developmental chronology of multisensory neurons and the networks they
contribute to, it is necessary to provide a basic foundation of knowledge concerning the charac-
teristics of adult multisensory neurons. By definition, a multisensory neuron is any neuron that
responds to or is influenced by stimuli from more than a single sensory modality. If we look in
either subcortical or cortical multisensory domains in cat and monkey, we find that these neurons
share a number of similarities.
First, the receptive fields of these neurons typically share a great deal of commonality, repre-
senting very similar regions of sensory space regardless of the sensory modality (Meredith and
Stein 1986a; Wallace et al. 1992; Wallace and Stein 1996). Although far from surprising in spa-
tially ordered structures like the SC, the presence of such receptive field register in less spatially
ordered cortical domains suggests that this register is a common organizational feature of multi-
sensory neurons. As an example of this receptive field register, a visual–auditory multisensory
neuron in which the visual receptive field is located in superior frontal space will have an auditory
receptive field representing a very similar portion of space.
Perhaps the most compelling characteristic of the multisensory neurons is the manner in which
they respond when confronted with stimuli from two or more modalities. Often these neurons
exhibit large changes in their activity when presented with stimulus combinations, changes that
frequently differ from what would be predicted based on the component unisensory (e.g. visual
alone, auditory alone) responses. Such changes can be (and have been) quantified in two ways.
Traditionally these responses have been evaluated relative to the largest of the unisensory responses
using a proportional measure called the interactive index (e.g. see Meredith and Stein 1986b). As
an example of this measure, a visual–auditory neuron that responds to a visual stimulus with an
average of 5 spikes per trial and that responds to the visual–auditory combination with 10 spikes
per trial is said to exhibit a 100% response enhancement. On the other hand, under certain
circumstances, the combined response may only be 2 spikes per trial, in which case we would
refer to a 60% response depression. More recent analyses have adopted a second measure of the
multisensory response that incorporates both unisensory responses, and compares these to the
predicted addition of these two responses. Mean statistical contrast thus divides the multisensory
response into superadditive, additive, and subadditive categories (e.g. Perrault et al. 2005; Stanford
et al. 2005; Stein and Stanford 2008). As an example of this metric, a given neuron may show a
visual response of 6 spikes per trial, a somatosensory response of 4 spikes per trial, and a com-
bined response of 14 spikes per trial—resulting in its categorization as superadditive. In contrast,
if the combined response were only 8 spikes per trial we would label this neuron as subadditive.
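The two metrics can be made concrete in a few lines of code. This is purely an illustrative sketch of the arithmetic described above, not an implementation from the literature; the function names are ours:

```python
def interactive_index(unisensory, multisensory):
    """Percent change of the multisensory response relative to the
    largest of the unisensory responses (in spikes per trial)."""
    best = max(unisensory)
    return 100.0 * (multisensory - best) / best

def additivity(unisensory, multisensory):
    """Compare the multisensory response to the predicted sum of the
    unisensory responses (the 'mean statistical contrast' measure)."""
    predicted = sum(unisensory)
    if multisensory > predicted:
        return 'superadditive'
    if multisensory < predicted:
        return 'subadditive'
    return 'additive'

# The worked examples from the text:
interactive_index([5], 10)   # 100.0 -> a 100% response enhancement
interactive_index([5], 2)    # -60.0 -> a 60% response depression
additivity([6, 4], 14)       # 'superadditive' (14 > 6 + 4)
additivity([6, 4], 8)        # 'subadditive'  (8 < 6 + 4)
```

Note that the two measures are not redundant: a response can show a large enhancement relative to the best unisensory response yet still be subadditive relative to the summed prediction.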
Using these two measures to quantify multisensory interactions, it has also been well estab-
lished that the types of interactions one sees (e.g. enhancement, depression, superadditivity, etc.)
are very much dependent on the physical characteristics and relationships of the stimuli to one
another. Three principles have been put forth to capture the basic aspects of these combinatorial
relationships. The spatial principle states that response gains are the typical result of pairings of
stimuli within their respective receptive fields (Meredith and Stein 1986a). In contrast, when one
of these stimuli is outside of its receptive field, either no interaction or a response depression is
observed. Given the marked overlap in the different receptive fields of individual multisensory
neurons, such an organization makes intuitive sense in that sensory cues arising from the same
event are highly likely to fall within both of these receptive fields. The temporal principle is the
time-based correlate of this, and basically reflects the fact that the largest response enhancements
are seen to stimuli that occur in close temporal proximity (Meredith et al. 1987). Nonetheless,
response enhancements are seen over a temporal ‘window’ that generally spans several hundreds
of milliseconds. Finally, the inverse effectiveness principle captures the fact that the largest gains in
response are seen when the individual stimuli are weakly effective on their own (Meredith and
Stein 1986b). As the efficacy of the individual stimuli in eliciting responses grows, the size of the multisensory interaction declines.
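The joint action of these principles can be caricatured in a toy model. The sketch below is purely illustrative (the functional forms and constants are our assumptions, not fits to any data): predicted enhancement shrinks as the unisensory response grows (inverse effectiveness) and as the temporal disparity between the paired stimuli increases (temporal principle).

```python
import math

def toy_enhancement(uni_spikes, disparity_ms, window_ms=300.0, k=5.0):
    """Illustrative only: a hypothetical % enhancement for a stimulus pair.

    uni_spikes   -- strength of the best unisensory response (spikes/trial)
    disparity_ms -- temporal offset between the two stimuli
    window_ms    -- width of the temporal 'window' (hundreds of ms)
    k            -- half-saturation constant for inverse effectiveness
    """
    # Gain falls off smoothly as stimuli move apart in time.
    temporal_gain = math.exp(-(disparity_ms / window_ms) ** 2)
    # Weakly effective stimuli yield proportionally larger gains.
    effectiveness_gain = k / (k + uni_spikes)
    return 100.0 * temporal_gain * effectiveness_gain

# Weak, synchronous stimuli produce the largest predicted gains;
# strong or temporally disparate stimuli produce much smaller ones.
toy_enhancement(1, 0)     # large enhancement
toy_enhancement(20, 0)    # smaller (inverse effectiveness)
toy_enhancement(1, 250)   # smaller (temporal disparity)
```

A spatial term could be added in the same spirit, with gain turning into depression once one stimulus falls outside its receptive field, but the qualitative point is the same: the interaction depends systematically on the physical relationships between the stimuli.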

14.4 The developmental chronology of multisensory circuits


With these adult observations establishing the framework for comparative studies, focus shifted
to an examination of the developmental emergence of multisensory neural processes leading up
to the mature state. For the reasons articulated above, these initial developmental studies were
conducted in the cat, and have been partially reinforced in non-human primate work. In addi-
tion, this work was first carried out in the SC, but has now been extended to multisensory cortical
domains as well. Outside of differences in the timeframe of the developmental events (reflecting
the later maturation of cortex relative to subcortex), the basic developmental processes across
different brain regions are strikingly similar.
Recordings from the SC of newborn kittens revealed there to be no neurons with multisensory
response properties at birth (Wallace and Stein 1997; Wallace et al. 2006). Indeed, the only sen-
sory responses seen at this age are somatosensory, and even these neurons are relatively few and
far between. As development progresses, auditory responses are first seen at about the time that
the ear canal opens, late in the first week of postnatal life. During this same
period the incidence of somatosensory-responsive neurons is on the rise. In the multisensory lay-
ers of the SC, visual responses are the last to appear, being first seen between the second and third
postnatal weeks. Not surprisingly, the appearance of the first multisensory neurons is coincident
with the appearance of responses from two different sensory modalities in the SC. Thus, by the
second week of postnatal life, at a time when somatosensory and auditory inputs are present in
the SC, auditory-somatosensory multisensory neurons are first found. From this point forward,
the proportion of multisensory neurons grows over the ensuing 3–4 months, until the adult mul-
tisensory representation, which typically represents 60–70% of the sensory-responsive popula-
tion in the deep layers of the SC, is attained (Fig. 14.1A).
[Figure: (A) line graph of % multisensory cells against postnatal age (weeks) for the SC and AES; (B) pie charts for the newborn SC (V 49.5%, S 23.2%, M 14.7%, A 12.6%; within M: VA 5.3%, VS 7.4%, AS 1%, VAS 1%) and the adult SC (V 37%, M 27.8%, S 17.6%, A 17.6%; within M: VA 11.1%, VS 9.3%, AS 0.9%, VAS 6.5%).]
Fig. 14.1 (A) The developmental chronology of multisensory neurons in the deep layers of the cat superior colliculus (SC) and the cat anterior ectosylvian sulcus (AES). The percentage of multisensory neurons is plotted as a function of postnatal age. (B) Modality convergence patterns in the SC of the newborn and adult monkey. Pie charts show the distributions of all recorded sensory-responsive neurons in the multisensory laminas (IV–VII) of the SC. A, auditory; V, visual; S, somatosensory; M, multisensory; VA, visual–auditory; VS, visuo-somatosensory; AS, audio-somatosensory; VAS, trimodal. (Reproduced from Wallace, M.T., Carriere, B.N., Perrault, T.J., Jr., Vaughan, J.W., and Stein, B.E., The development of cortical multisensory integration, Journal of Neuroscience, 26, pp. 11844–11849 © 2006 Society for Neuroscience, with permission.) (Reproduced in colour in the colour plate section.)

In the newborn rhesus monkey, the situation is a bit different, in that the more precocial devel-
opmental state of this species advances the timeline for the appearance of sensory responses
(Wallace and Stein 2001). In this species, visual, auditory, and somatosensory responses are
found in the SC immediately after birth, and there is a corresponding population of multisensory
neurons. Nonetheless, at this age the proportion of multisensory neurons is far from that seen in
the adult, suggesting a period of postnatal development is necessary prior to the formation of an
adult multisensory representation (Fig. 14.1B).
In the best-studied multisensory cortical structure, the cat AES, the sequence for the appear-
ance of the different modalities is identical to the SC, but the process appears to be delayed
by several weeks. Although there are hints of somatosensory responses at earlier ages, the first
convincing tactile responses are seen at about postnatal week four (Wallace et al. 2006). These
responses are clustered in the anteriormost portions of the AES, in the area that will become the
fourth somatosensory cortex as development progresses (SIV, see Clemo and Stein 1983). By the
eighth postnatal week, auditory responses are seen in the caudal portions of the dorsal bank of
the AES (Wallace et al. 2006), in the location of the presumptive field AES (FAES, see Clarey
and Irvine 1986). As with the SC, the presence of both somatosensory and auditory responses in
the AES by this age allows for the beginnings of a multisensory (i.e. somatosensory-auditory)
representation. In a pattern that will be retained into the adult state (Wallace et al. 1992), these
multisensory neurons are clustered at the borders between the two major unisensory domains
(i.e. SIV and FAES). Finally, by week 12, visually-responsive neurons are apparent in AES (Wallace
et al. 2006). These neurons are found in the caudal and ventral extent of AES, in a zone that will
become the adult anterior ectosylvian visual area (AEV, see Mucke et al. 1982; Olson and Graybiel
1987). With the appearance of this visually-responsive population, the complete complement of
multisensory neurons is found in AES. As in the SC, the proportion of multisensory neurons in
AES rises during the first 3–4 months of postnatal life, ultimately reaching a final distribution of
20–25% of the total AES sensory-responsive population (Fig. 14.1A).

14.5 The development of multisensory neuronal response properties
Despite their presence early in postnatal life, multisensory neurons are far from mature at these
ages. When compared with adults, neuronal responses have strikingly different temporal dynam-
ics in these young neurons, as manifested in features such as response latency, variability, and
habituation/adaptation. One of the most striking differences between neonatal and adult multi-
sensory neurons is in the size of their receptive fields. In both cat and monkey SC, as well as in the
cat AES, the receptive fields of early multisensory neurons are substantially larger than their adult
counterparts (Wallace and Stein 1997, 2001; Wallace et al. 2006). In fact, in many circumstances
the earliest multisensory neurons respond to stimuli at all locations in sensory space that can be
sampled by the peripheral organ. In each of these models, receptive-field size declines dramati-
cally over the course of postnatal life, thus revealing the tight receptive-field correspondence
between the individual receptive fields of a given multisensory neuron.
Perhaps the most dramatic difference in these early multisensory neurons is in the manner
in which they respond to multisensory stimuli. In contrast to their adult counterparts, these neu-
rons fail to show significant changes in their response profiles when challenged with stimuli from
multiple modalities (Fig. 14.2—see Wallace and Stein 1997, 2001; Wallace et al. 2006). The most
typical response profile for these early multisensory neurons is a multisensory response that is
indistinguishable from one or both of the constituent unisensory responses. Although once
labelled as an absence of multisensory integration, more recent studies would revise this description to say that these neurons are performing largely subadditive operations (Perrault et al. 2005; Stanford et al. 2005).

[Figure: panels A and B show receptive fields, rasters, peristimulus time histograms, and summary bar graphs for the two example neurons.]
Fig. 14.2 Multisensory integration is absent in the earliest AES neurons and develops during postnatal life. (A) Shown at the top are the receptive fields (shading) and stimulus locations (icons) used in sensory testing of this auditory-somatosensory AES neuron in an 8-week-old animal. Rasters and peristimulus time histograms show the neuron's responses to somatosensory (S), auditory (A), and combined (SA) stimulation. Bar graphs summarize these responses and show the lack of any multisensory interaction (0%). (B) Multisensory integration in a visual-auditory neuron at 20 weeks of age (+235% enhancement). A, auditory stimulation; V, visual stimulation; VA, combined visual-auditory stimulation. Conventions are the same as in (A). *P < 0.01, t test. (Reproduced from Wallace, M.T., Carriere, B.N., Perrault, T.J., Jr., Vaughan, J.W., and Stein, B.E., The development of cortical multisensory integration, Journal of Neuroscience, 26, pp. 11844–11849 © 2006 Society for Neuroscience, with permission.)

Even in monkey, where there is a substantial multisensory population present
at birth in the SC, these early neurons appear to be capable of only subadditive interactions.
As postnatal development progresses, multisensory neurons indeed acquire the ability to trans-
form their different sensory inputs into products that are additive/superadditive, a hallmark
feature of adult multisensory circuits. Somewhat intriguingly, this transition from an early
subadditive state to a more additive/superadditive state appears to happen very abruptly in any
given neuron, since there are very few instances of neurons with intermediate or transitional
response profiles (Wallace and Stein 1997). Furthermore, this transition appears to be intimately
tied to receptive-field architecture, in that neurons in developing animals with large receptive
fields predominantly exhibit subadditive interactions whereas neurons with more adult-like
receptive fields are much more likely to be categorized as additive/superadditive.
The sharp transition from one response mode to another (i.e. subadditive to superadditive) in
developing SC neurons appears to be gated by the functional maturation of inputs from cortex
(Wallace and Stein 1994, 2000). Whereas in early multisensory SC neurons the functional deacti-
vation of cortical domains known to target the SC has little influence on multisensory integration,
as soon as an SC neuron exhibits the capacity to perform additive/superadditive operations,
these operations can be compromised by cortical deactivation (Wallace and Stein 2000). Further
reinforcing the importance of these corticotectal inputs, ablation of these areas early in postnatal
life results in the formation of an SC limited to subadditive interactions (Jiang et al. 2001). When
tested behaviourally, these animals also have little capacity to use multisensory cues to facilitate
orientation responses (Jiang et al. 2006).
In multisensory cortical domains such as the AES, the developmental sequence is strikingly
similar to that seen in the SC, with early subadditive neurons acquiring their more interesting
capabilities as development progresses. Here, the developmental switch that is gating this shift of
interactive mode remains unknown. As with the development of the multisensory representations
as a whole, the proportion of neurons with additive/superadditive response modes increases over
the first 3–4 months of postnatal life in cat.

14.6 The role of sensory experience in multisensory development
The protracted timeline over which multisensory processes develop suggests that sensory experience
may play an integral role in the shaping of the final functional state of these circuits. To test this
hypothesis, our laboratory has carried out experiments in which early sensory experiences are
altered, and then examined the consequent impact on multisensory neurons and their integrative
characteristics. In the first of these experiments, cats were raised in an environment devoid of visual
experience from birth until adulthood, hence abolishing the animal’s experience with associating
visual and non-visual cues. Somewhat surprisingly, when examined as adults, these animals were
found to have a fairly normal complement of multisensory neurons in both subcortical (i.e. SC) and
cortical multisensory structures, which included a substantial population of visually responsive
multisensory neurons (Carriere et al. 2007; Wallace et al. 2004a; Wallace and Stein 2007). However,
these neurons were strikingly different from their normally reared counterparts, in that their ability
to integrate crossmodal cues was severely altered by the dark-rearing. Intriguingly, substantial dif-
ferences were noted in the effects of dark-rearing on SC versus cortical multisensory processes.
Whereas in the visually deprived group the vast majority of multisensory neurons in the SC showed
no significant changes in response upon crossmodal pairings (Wallace et al. 2004a), in AES the effect
in many neurons was a transition from response enhancements to response depressions (Carriere
et al. 2007). Here, pairings that typically resulted in significant response gains (i.e. enhancements,
which could be additive or superadditive) in normally-reared animals resulted in significant response depressions. Such an outcome suggests that dark-rearing may alter the local circuit relations in the
cortex such that inhibition is now favoured over excitation.
In addition to eliminating sensory experience in one modality, more recent experiments have
examined the impact of altering the statistical relationships between multisensory stimuli early in
development and then testing the consequent impact on these same circuits. As alluded to earlier,
in ‘normal’ environments spatial and temporal proximity are powerful cues as to the relatedness
of crossmodal cues. What happens if these cues are now yoked in ways that violate the normal
physical world? To do this, cats were raised in environments in which the spatial relationship
between visual and auditory stimuli was systematically altered. Here, the presentation of a visual
stimulus was invariably linked to the presentation of an auditory cue from a different (but fixed)
spatial location, violating the ‘typical’ circumstance in which these cues would normally be spa-
tially congruent (and which likely serves as the substrate for the ‘spatial principle’ described ear-
lier). After raising cats in such an environment, neurophysiological recordings from adults
revealed a marked reorganization in the architecture and processing features of visual-auditory
multisensory neurons (Wallace and Stein 2007). First, these neurons were found to have spatial
receptive fields that had shifted in order to reflect the altered physical world. Thus, if the rearing
environment was such that the auditory cues were always displaced by 30° relative to the visual
cues, the receptive fields were found to be misaligned by a comparable amount. These findings
closely parallel prior studies conducted in birds (i.e. owls—see Knudsen and Brainard 1991) and
mammals (i.e. ferrets—see King et al. 1988), which demonstrated similar shifts in spatial repre-
sentations with developmental perturbations. However, in the current work, we have extended
these findings to show that this reordered receptive field architecture now forms the basis for the
integrative capabilities of these neurons, such that stimuli separated by 30° were those that gave
rise to the maximal multisensory interactions (Fig. 14.3).
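The logic of this receptive-field realignment can be illustrated with a minimal sketch (in Python). The Gaussian tuning profile and its 20° width are purely illustrative assumptions, not parameters reported in these studies: multisensory gain is modelled as falling off with the mismatch between the tested audio-visual disparity and the disparity experienced during rearing, so that control neurons integrate best at 0° whereas disparity-reared neurons integrate best at 30°.

```python
import math

def interaction_gain(av_disparity_deg, reared_offset_deg, width_deg=20.0):
    """Hypothetical Gaussian tuning profile: multisensory gain falls off
    with the mismatch between the tested audio-visual disparity and the
    disparity experienced during rearing (width is an assumed value)."""
    mismatch = av_disparity_deg - reared_offset_deg
    return math.exp(-(mismatch ** 2) / (2 * width_deg ** 2))

disparities = range(-60, 61, 10)
control = {d: interaction_gain(d, reared_offset_deg=0) for d in disparities}
reared = {d: interaction_gain(d, reared_offset_deg=30) for d in disparities}

# Control neurons integrate best for spatially coincident stimuli,
# whereas disparity-reared neurons prefer the reared 30 degree offset.
assert max(control, key=control.get) == 0
assert max(reared, key=reared.get) == 30
```

The sketch captures only the qualitative point of Fig. 14.3: the location of maximal integration tracks the statistics of rearing, not an innately specified alignment.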
Recently, preliminary studies have extended these findings into the temporal domain, and have
shown that raising animals in a world in which visual and auditory stimuli are always presented
in a spatially aligned but temporally disparate manner (i.e. a visual stimulus comes on and is
followed by an auditory stimulus 100 ms later) results in a shift in the temporal-tuning functions of
these neurons such that they ‘prefer’ the time lag experienced during postnatal life. Although still
in their early stages, these experiments also suggest that there is a limit to the degree of temporal
disparity that can be tolerated and still result in a neuron with significant interactive capabilities.
When the visual and auditory stimuli were separated not by 100 ms, but by 250 ms, the ability of
the neuron to support additive/superadditive interactions was abolished. This finding has impor-
tant mechanistic implications, in that it suggests that there is a biophysical constraint on the
integrative process, and narrows the list of likely cellular and molecular processes that govern the
way in which multisensory inputs are synthesized.
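The reported constraint can be expressed as a toy decision rule (Python). The 200 ms cutoff is an assumed illustrative value lying between the two reported lags, not a measured limit:

```python
def preferred_lag(reared_lag_ms, max_shift_ms=200.0):
    """Toy model of the reported biophysical constraint: a neuron comes to
    'prefer' the visual-auditory lag experienced during rearing, but only
    up to some maximal tolerable lag; beyond that, additive/superadditive
    interactions are abolished (returned here as None). The 200 ms cap is
    an illustrative assumption placed between the 100 ms and 250 ms lags
    described in the text."""
    if abs(reared_lag_ms) <= max_shift_ms:
        return reared_lag_ms
    return None

assert preferred_lag(100) == 100   # tuning shifts to the experienced lag
assert preferred_lag(250) is None  # integration abolished at 250 ms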
Taken as a whole, these findings suggest a powerful role for sensory experience in shaping the
final state of the multisensory processing circuits. In addition, these studies argue that sensory
experience by itself is sufficient to engender substantial change in these developing circuits, a
finding in keeping with the large body of work that has documented developmental plasticity in
unisensory systems and in which alterations in the sensory statistics are sufficient to induce sub-
stantial change (for reviews see Katz and Crowley 2002; Hensch 2004, 2005). Furthermore, the
inability of these same sensory manipulations to drive change in adult systems reinforces the
concept of a sensitive period for multisensory plasticity. Still to be addressed is whether plasticity
can be restored if these same statistical manipulations are paired with reinforcement, as has
been shown for the individual sensory systems (Ahissar et al. 1992; Blake et al. 2006; Kilgard and
Merzenich 1998).
RECEPTIVE-FIELD ARCHITECTURE AS A KEY DETERMINANT OF MULTISENSORY INTEGRATION 333

[Figure 14.3 panels: (A) control neuron and (B) spatial-disparity-reared neuron; receptive-field schematics, peristimulus time histograms (impulses), and summary bar graphs (mean impulses/trial, % interaction, predicted sum), with multisensory enhancements of +154% (control) and +144% (disparity-reared), both P < 0.05.]
Fig. 14.3 The spatial constraints of multisensory integration appear dependent on the experiences
received during development. The figure shows data from two neurons, one from a control animal (A)
and one from a spatial-disparity-reared animal (B). Top: receptive fields (visual, dark grey shading;
auditory, light grey shading) of these neurons, along with the locations of the stimuli used to assess
multisensory integration. Middle: responses of these neurons to visual, auditory and combined visual-
auditory (i.e. multisensory) stimulation. Peristimulus time histograms depict the summed neural
responses for a total of 15 trials for each condition with the ramps and square waves at the top
showing the timing of the visual and auditory stimulus, respectively. Summary bar graphs at the
bottom show the mean responses for each condition, along with the magnitude of the multisensory
enhancement (furthest right bar in each graph) and predicted sum of the visual and auditory
responses (dashed line). (Reproduced from Mark T. Wallace and Barry E. Stein, Early Experience
Determines How the Senses Will Interact, Journal of Neurophysiology, 97 (1), pp. 921–926 © 2007
The American Physiological Society with permission.)

14.7 Receptive-field architecture as a key determinant of multisensory integration
These studies examining the role of sensory experience in the development of multisensory
circuits illustrate the important linkage that exists between receptive-field organization and a given
neuron’s integrative capabilities. Taking this even further, recent studies have highlighted that the
microarchitecture of multisensory receptive fields plays a vital and previously unrealized role in
these interactions. Whereas classical multisensory (and unisensory) studies represented receptive
fields as simple bordered response regions, more recent studies have revealed a marked degree of
heterogeneity to these receptive fields (Carriere et al. 2008; Krueger et al. 2009; Royal et al. 2009).
Thus, rather than responding equivalently to stimuli placed anywhere within the receptive field,
neurons in both subcortical and cortical multisensory structures can respond in dramatically different
ways to stimuli placed at different locations within these receptive fields (Fig. 14.4).

[Figure 14.4 panels: (A) visual, auditory, and multisensory spatial receptive fields plotted over azimuth and elevation (deg), each normalized to a 0–1 activity scale; (B) multisensory (M), visual (V), and auditory (A) SRFs, stimulus locations, and evoked responses (spikes/sec versus time from stimulus onset, ms).]
Fig. 14.4 (A) Representative example of an AES neuron exhibiting substantial changes of response
and multisensory interaction as a function of changes in stimulus location within its receptive field.
(a) Visual, auditory, and multisensory spatial receptive fields (SRFs) plotted, with each of the three
representations being normalized to the greatest evoked response and the pseudocolour plots
showing the relative activity scaled to this maximum. Warmer colours indicate higher evoked firing rates.
Such work has sought to define the spatial receptive fields (SRFs) of multisensory neurons, and to
compare and contrast how SRF structure differs between different multisensory areas.
The primary motivation for this work went beyond simply documenting the complexity of
receptive-field structure. If we return to the original principles that appear to capture most of
the integrative features of multisensory neurons, we are immediately confronted with the
realization that space (i.e. SRF architecture) and effectiveness must be intimately interwoven. Extending
this logic, one would predict that the pairing of multisensory stimuli within a region of vigorous
response (i.e. a ‘hot spot’) should result in relatively small interactions, whereas pairings in weakly
effective regions should result in large response gains. Indeed, such a relationship has been found
in both cat SC and cortex (Fig. 14.4; Krueger et al. 2009; Carriere et al. 2008). This work has
greatly broadened our view of multisensory interactions, and has reinforced the idea that space
and effectiveness cannot be thought of as independent entities in gating these interactions.
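This relationship can be made concrete with a brief sketch (Python) using the enhancement index conventionally computed in this literature, i.e. the percentage gain of the combined response over the best unisensory response (Meredith and Stein 1986a), and comparing the combined response with the predicted sum. The response values below are hypothetical, chosen only to illustrate the pattern in Fig. 14.4:

```python
def enhancement_index(multi, vis, aud):
    """Conventional multisensory enhancement index: percentage gain of
    the combined response over the best unisensory response."""
    best_uni = max(vis, aud)
    return 100.0 * (multi - best_uni) / best_uni

def additivity(multi, vis, aud):
    """Classify the interaction against the predicted sum of the
    unisensory responses (simple illustrative thresholding)."""
    total = vis + aud
    if multi > total:
        return 'superadditive'
    if multi < total:
        return 'subadditive'
    return 'additive'

# Hypothetical mean responses (impulses/trial) at a weakly effective
# versus a strongly effective location within the SRF.
weak_gain = enhancement_index(multi=6.0, vis=2.0, aud=1.0)      # +200%
strong_gain = enhancement_index(multi=24.0, vis=20.0, aud=8.0)  # +20%

assert weak_gain > strong_gain                        # inverse effectiveness
assert additivity(6.0, 2.0, 1.0) == 'superadditive'   # weak 'hot spot'
assert additivity(24.0, 20.0, 8.0) == 'subadditive'   # strong 'hot spot'
```

The numbers are invented, but the ordering mirrors the empirical finding: pairings at weakly effective SRF locations yield proportionately larger (often superadditive) gains than pairings at hot spots.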
As a natural extension of these spatially-based studies, more recent work has focused on the time
domain, and has shown that multisensory neurons can be characterized on the basis of their spatio-
temporal receptive fields (STRFs; see Royal et al. 2009). These studies were motivated by the simple
observation that the temporal dynamics of multisensory neurons are strikingly complex and appear
to change as a function of both the spatial location of the stimuli and their relative effectiveness. The
most exciting feature of these data is that they have revealed two critical epochs in the multisensory
response—an early superadditive phase in which responses are preferentially speeded under multi-
sensory conditions and a late superadditive phase in which multisensory responses continue after
the unisensory responses have ended. Although much work is needed to fully characterize the STRF
architecture of multisensory neurons, these early studies greatly add to the SRF work by showing
that space, time, and effectiveness are all seamlessly interrelated. Furthermore, future work is needed
to reveal the functional significance of SRF and STRF structure for multisensory behaviours and
perceptions. The most likely possibility is that this architecture is a necessary element for
dealing with the highly dynamic stimuli that make up the real world. For example, in vision, STRF
structure has been shown to be critical for motion processing, and in audition STRF (in this case,
spectrotemporal receptive field) structure has been shown to dynamically represent changes in both

Fig. 14.4 (Cont.) Symbols relate to the spatial locations of the stimulus pairings represented on the
right (b). (b) Rasters and spike density functions show the details of this neuron’s responses to the
visual stimulus alone (top row), auditory stimulus alone (middle row), and the combined visual-
auditory stimulus (bottom row) presented at three different azimuthal locations (circle, square, and
star on the receptive field plots in (a)). Note that the pairing of a visual stimulus at a highly effective
location within the SRF with an ineffective auditory stimulus resulted in a subadditive interaction
(circle column), whereas the pairing at a weakly effective location resulted in a superadditive
interaction (square column), and the pairing at a location of intermediate visual effectiveness resulted
in an additive interaction (star column). (P<0.01). (B) An example of SRF analysis in an SC neuron
illustrating the relationship between stimulus effectiveness and multisensory integration as a function
of space. (a) SRFs for this visual-auditory neuron. The borders of the multisensory SRF are outlined
with a dotted black line in all three panels. (b) Stimulus locations for two spatially coincident
multisensory pairings (circle and square) within the SRF. (c) Evoked responses for these two locations
for the visual, auditory, and multisensory conditions. Note that whereas the pairing at a weakly
effective location (square) results in a large superadditive multisensory interaction, the pairing at a
location in which a vigorous auditory response could be evoked (circle) results in a clear subadditive
interaction. (Reproduced from Journal of Neurophysiology, 97 (1), Mark T. Wallace and Barry E.
Stein, Early Experience Determines How the Senses Will Interact, pp. 921–926 © 2007, The
American Physiological Society with permission.) (Reproduced in colour in the colour plate section.)
the sensory world and attentional state (Cai et al. 1997; DeAngelis et al. 1995; Fritz et al. 2003; Miller
et al. 2002; Sripati et al. 2006). Finally, and most germane to the current contribution, spatial and
spatiotemporal receptive-field architecture in the context of development needs to be explored. A
host of exciting questions come immediately to the fore when thinking of applying these methods
to the developing nervous system. For example, do the large receptive fields that characterize multi-
sensory neurons early in life have the marked heterogeneity seen in the adult? When do the impor-
tant temporal epochs of multisensory integration emerge? Ongoing studies are beginning to address
these and other questions.

14.8 Studies of multisensory development moving forward: future directions
Although we now have a much better view of the neurodevelopmental events resulting in the
formation of mature multisensory circuits, much work needs to be done in order to better char-
acterize the key developmental processes, and to relate developmental events at the neural level
with their behavioural and perceptual correlates. A number of questions have arisen during the
course of the previously described work, and a few of these are outlined below in an effort to
provide a view into the future of multisensory research.
Technological innovations now provide the ability to examine developmental processes in real
time, to manipulate these processes through molecular genetic means, to look not just at the sin-
gle neuron but at ensembles of interconnected neurons, and to record the activity of neurons
while the animal is actively engaged in a behavioural or perceptual task. In the first of these
advances, chronic recording methods now allow individual electrodes and arrays of electrodes to
be positioned in situ for significant periods of time, allowing the activity of these neurons to be
monitored continuously over the course of days to weeks. Such an approach is ideal for examin-
ing multisensory development, in that it will allow for the monitoring of an individual neuron as
it transitions from one response mode to another while beginning to actively integrate multisen-
sory signals over time. In a preliminary study using this method, we have recently been able to
watch the consolidation of receptive-field microarchitecture in a multisensory SC neuron of a
neonatal cat driven by the repetitive pairing of visual and auditory stimuli in spatial and temporal
coincidence.
One of the most exciting recent developments in neuroscience has been the application of
molecular genetic tools as a means of distinguishing the relative contributions of various neuron
types to specific behavioural and perceptual processes. The use of optogenetic methods (see
Cardin et al. 2009; Miller 2006; Sohal et al. 2009), involving genetic manipulation of specific neu-
ronal cell types coupled with optical stimulation, can now be turned to multisensory circuits in an
effort to parse the functional contribution of the integrative properties of these neurons. In a
similar manner, multielectrode methods are becoming increasingly common in addressing ques-
tions of information encoding in neural systems. Application of these approaches will undoubt-
edly yield valuable new data on multisensory integration, and will bring about a greater focus on
population-based measures of neuronal activity. Indeed, such approaches have recently been
used to great benefit in elucidating a functional role for oscillatory processes in multisensory
systems (Lakatos et al. 2007) and in illustrating an important role for local activity (i.e. local field
potentials) and neuronal coherence measures in multisensory interactions (Chandrasekaran and
Ghazanfar 2009; Ghazanfar et al. 2005; Maier et al. 2004). Perhaps more importantly, these pop-
ulation-based measures are strongly linked to the hemodynamic processes driving the fMRI sig-
nal, and hold great promise for providing the critical bridge between the animal model work and
human studies. Finally, although much of the foundation research on multisensory integration
TRANSLATIONAL OPPORTUNITIES 337

was conducted in anesthetized animals, the field is now at an exciting time in which a transition
to recordings in awake and behaving animals is beginning. From a developmental perspective,
this offers enormous opportunity to relate changes at the neurophysiological level with changes
at the behavioural and perceptual levels, thus enabling important correlative bridges to be built
between these domains.

14.9 Translational opportunities: bridging between the animal model work and human multisensory research
The goal of the animal-model-based research outlined in this contribution is to complement,
inform, and extend our understanding of the developmental processes leading up to the con-
struction of a mature multisensory representation. Ultimately, this work must be directed so as to
form an important empirical foundation for human research, and indeed strong bridges are now
beginning to emerge between these model systems. For example, research from our laboratory
has begun to focus on detailing how multisensory temporal processing changes throughout child-
hood and adolescence. This work is predicated on the animal model studies described above,
which have shown that individual multisensory neurons have fairly broad temporal ‘tuning
curves’ within which they exhibit integrated responses. Stated a bit differently, each neuron
appears to have a relatively large window of time, typically lasting several hundred milliseconds,
within which significant multisensory interactions can be demonstrated. This concept of a ‘tem-
poral window’ for multisensory integration has been shown to extend to the human psycho-
physical realm, in that both behaviour and perception can be impacted significantly when paired
multisensory stimuli (e.g. visual-auditory) are presented within this time span (see Dixon and
Spitz 1980; Shams et al. 2002; van Wassenhove et al. 2007; Powers et al. 2009; Vroomen and
Keetels 2010; Vatakis and Spence 2010).
In applying this construct to developing systems, we have found there to be a significant nar-
rowing of this temporal window as children mature, but that this narrowing appears to be very
much dependent on the nature of the stimuli that are paired. Thus, for low-level, non-speech
stimuli (i.e. simple flashes and beeps), the temporal window was found to be nearly 40% larger
(404 ms versus 290 ms) in children aged 6–11 years than in adults.
Somewhat surprisingly, this window remains enlarged even into adolescence (12–17 years), sug-
gesting that the processes driving the closure of this window for simple stimuli happen very late
in development (Hillock and Wallace, in press). In contrast, a parallel study using speech stimuli
found little difference in temporal binding across the tested ages (6–25 years) (Hillock and Wallace,
submitted), suggesting that the neural mechanisms subserving the temporal processing of more
ethologically relevant multisensory stimuli mature quite early, and providing early evidence for
the idea that assumptions concerning stimulus unity play an important role in facilitating temporal
binding (see Vatakis and Spence 2010). Ongoing EEG/ERP studies are now attempting to
examine these issues in a more mechanistic manner.
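The reported size of the developmental difference can be checked directly (Python; the two window estimates are the values given in the text):

```python
child_window_ms = 404  # children aged 6-11 years (value from the text)
adult_window_ms = 290  # adults (value from the text)

# Percentage by which the children's temporal binding window exceeds
# the adults': 114/290 = 0.3931..., i.e. "nearly 40% larger".
enlargement = 100.0 * (child_window_ms - adult_window_ms) / adult_window_ms
assert round(enlargement) == 39
```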
In addition to this work from our laboratory focusing on later development, other relevant
work has focused on similar questions at earlier ages. For example, recent studies have
shown there to be substantial perceptual narrowing (analogous to the temporal narrowing
described above) during the first year of human development, both in the ability to match
multisensory vocalizations from non-native species and in responses to non-native (i.e. different
language) phonetic contrasts (Lewkowicz and Ghazanfar 2009; Lewkowicz et al. 2008; Pons et al.
2009).
Although the neural mechanisms underlying these developmental changes in perceptual
processes remain unknown, these observations provide a strong predictive framework for
future inquiry. One surprising incongruity has already appeared between the human and animal
models, in that whereas narrowing appears to characterize the development of multisensory
temporal function (and other abilities) in humans, single-unit studies from animal models show
a developmental expansion in the breadth of multisensory temporal tuning (Wallace and Stein
1997). Although these findings can be reconciled by appealing to a population-based argument
(i.e. if the individually tuned units are tuned differently, a broad population distribution can be
created), only with future experimentation can such questions be answered in a manner that will
allow us to more directly relate neurophysiological processes to their behavioural and perceptual
correlates.

Acknowledgements
The work described in this chapter has been supported by the NIH (MH063861), the Vanderbilt
Kennedy Center, the Center for Integrative Neuroscience and the Vanderbilt Vision Research
Center.

References
Ahissar, E., Vaadia, E., Ahissar, M., Bergman, H., Arieli, A., and Abeles, M. (1992). Dependence of cortical
plasticity on correlated activity of single neurons and on behavioral context. Science, 257, 1412–1415.
Allman, B.L., Bittencourt-Navarrete, R.E., Keniston, L.P., Medina, A.E., Wang, M.Y., and Meredith, M.A.
(2008). Do cross-modal projections always result in multisensory integration? Cerebral Cortex, 18,
2066–76.
Avillac, M., Ben Hamed, S., and Duhamel, J.R. (2007). Multisensory integration in the ventral intraparietal
area of the macaque monkey. Journal of Neuroscience, 27, 1922–32.
Bizley, J.K., Nodal, F.R., Bajo, V.M., Nelken, I., and King, A.J. (2007). Physiological and anatomical
evidence for multisensory interactions in auditory cortex. Cerebral Cortex, 17, 2172–89.
Blake, D.T., Heiser, M.A., Caywood, M., and Merzenich, M.M. (2006). Experience-dependent adult cortical
plasticity requires cognitive association between sensation and reward. Neuron, 52, 371–81.
Cai, D., DeAngelis, G.C., and Freeman, R.D. (1997). Spatiotemporal receptive field organization in the
lateral geniculate nucleus of cats and kittens. Journal of Neurophysiology, 78, 1045–61.
Calvert, G.A., Spence, C., and Stein, B.E. (2004). The handbook of multisensory processes. MIT Press,
Cambridge, MA.
Cardin, J.A., Carlen, M., Meletis K., et al. (2009). Driving fast-spiking cells induces gamma rhythm and
controls sensory responses. Nature, 459, 663–67.
Carriere, B.N., Royal, D.W., Perrault, T.J., et al. (2007). Visual deprivation alters the development of cortical
multisensory integration. Journal of Neurophysiology, 98, 2858–67.
Carriere, B.N., Royal, D.W., and Wallace, M.T. (2008). Spatial heterogeneity of cortical receptive fields and
its impact on multisensory interactions. Journal of Neurophysiology, 99, 2357–68.
Chandrasekaran, C., and Ghazanfar, A.A. (2009). Different neural frequency bands integrate faces and
voices differently in the superior temporal sulcus. Journal of Neurophysiology, 101, 773–88.
Clarey, J.C., and Irvine, D.R. (1986). Auditory response properties of neurons in the anterior ectosylvian
sulcus of the cat. Brain Research, 386, 12–19.
Clemo, H.R., and Stein, B.E. (1983). Organization of a fourth somatosensory area of cortex in cat. Journal
of Neurophysiology, 50, 910–25.
DeAngelis, G.C., Ohzawa, I., and Freeman, R.D. (1995). Receptive-field dynamics in the central visual
pathways. Trends in Neuroscience, 18, 451–58.
Dixon, N.F., and Spitz, L. (1980). The detection of auditory visual desynchrony. Perception, 9, 719–21.
REFERENCES 339

Foxe, J.J., Wylie, G.R., Martinez, A., et al. (2002). Auditory-somatosensory multisensory processing in
auditory association cortex: an fMRI study. Journal of Neurophysiology, 88, 540–43.
Fritz, J., Shamma, S., Elhilali, M., and Klein, D. (2003). Rapid task-related plasticity of spectrotemporal
receptive fields in primary auditory cortex. Nature Neuroscience, 6, 1216–23.
Ghazanfar, A.A., Maier, J.X., Hoffman, K.L., and Logothetis, N.K. (2005). Multisensory integration of
dynamic faces and voices in rhesus monkey auditory cortex. Journal of Neuroscience, 25, 5004–5012.
Hensch, T.K. (2004). Critical period regulation. Annual Review of Neuroscience, 27, 549–79.
Hensch, T.K. (2005). Critical period plasticity in local cortical circuits. Nature Reviews Neuroscience,
6, 877–88.
Hillock, A.R., and Wallace, M.T. (2012). Developmental changes in the multisensory temporal binding
window persist into adolescence. Developmental Science. (in press).
Hillock, A.R., and Wallace, M.T. (in preparation). A developmental study of the temporal constraints for
audiovisual speech binding.
Jiang, W., Wallace, M.T., Jiang, H., Vaughan, J.W., and Stein, B.E. (2001). Two cortical areas mediate
multisensory integration in superior colliculus neurons. Journal of Neurophysiology, 85, 506–22.
Jiang, W., Jiang, H., and Stein, B.E. (2006). Neonatal cortical ablation disrupts multisensory development
in superior colliculus. Journal of Neurophysiology, 95, 1380–96.
Katz, L.C., and Crowley, J.C. (2002). Development of cortical circuits: lessons from ocular dominance
columns. Nature Reviews Neuroscience, 3, 34–42.
Kayser, C., Petkov, C.I., Augath, M., and Logothetis, N.K. (2005). Integration of touch and sound in
auditory cortex. Neuron, 48, 373–84.
Keniston, L.P., Allman, B.L., Meredith, M.A., and Clemo, H.R. (2009). Somatosensory and multisensory
properties of the medial bank of the ferret rostral suprasylvian sulcus. Experimental Brain Research,
196, 239–51.
Kilgard, M.P., and Merzenich, M.M. (1998). Cortical map reorganization enabled by nucleus basalis
activity. Science, 279, 1714–1718.
King, A.J., and Palmer, A.R. (1985). Integration of visual and auditory information in bimodal neurones in
the guinea-pig superior colliculus. Experimental Brain Research, 60, 492–500.
King, A.J., Hutchings, M.E., Moore, D.R., and Blakemore, C. (1988). Developmental plasticity in the visual
and auditory representations in the mammalian superior colliculus. Nature, 332, 73–76.
Knudsen, E.I., and Brainard, M.S. (1991). Visual instruction of the neural map of auditory space in the
developing optic tectum. Science, 253, 85–87.
Krueger, J., Royal, D.W., Fister, M.C., and Wallace, M.T. (2009). Spatial receptive field organization of
multisensory neurons and its impact on multisensory interactions. Hearing Research, 258, 47–54.
Lakatos, P., Chen, C.M., O’Connell, M.N., Mills, A., and Schroeder, C.E. (2007). Neuronal oscillations and
multisensory interaction in primary auditory cortex. Neuron, 53, 279–92.
Lewkowicz, D.J. (1994). Development of intersensory perception in human infants. In The development of
intersensory perception: comparative perspectives (eds. D.J. Lewkowicz, and R. Lickliter), pp. 165–203.
Lawrence Erlbaum Associates, Hillsdale, NJ.
Lewkowicz, D.J., and Ghazanfar, A.A. (2009). The emergence of multisensory systems through perceptual
narrowing. Trends in Cognitive Sciences, 13, 470–78.
Lewkowicz, D.J., Sowinski, R., and Place, S. (2008). The decline of cross-species intersensory perception
in human infants: underlying mechanisms and its developmental persistence. Brain Research, 1242,
291–302.
Maier, J.X., Neuhoff, J.G., Logothetis, N.K., and Ghazanfar, A.A. (2004). Multisensory integration of
looming signals by rhesus monkeys. Neuron, 43, 177–81.
Meredith, M.A., and King, A.J. (2004). Spatial distribution of functional superficial-deep connections in the
adult ferret superior colliculus. Neuroscience, 128, 870.
Meredith, M.A., and Stein, B.E. (1986a). Visual, auditory, and somatosensory convergence on cells in
superior colliculus results in multisensory integration. Journal of Neurophysiology, 56, 640–62.
Meredith, M.A., and Stein, B.E. (1986b). Spatial factors determine the activity of multisensory neurons in
cat superior colliculus. Brain Research, 365, 350–54.
Meredith, M.A., Nemitz, J.W., and Stein, B.E. (1987). Determinants of multisensory integration in superior
colliculus neurons. I. Temporal factors. Journal of Neuroscience, 7, 3215–29.
Miller, G. (2006). Optogenetics. Shining new light on neural circuits. Science, 314, 1674–76.
Miller, L.M., Escabi, M.A., Read, H.L., and Schreiner, C.E. (2002). Spectrotemporal receptive fields in the
lemniscal auditory thalamus and cortex. Journal of Neurophysiology, 87, 516–27.
Mucke, L., Norita, M., Benedek, G., and Creutzfeldt, O. (1982). Physiologic and anatomic investigation of a
visual cortical area situated in the ventral bank of the anterior ectosylvian sulcus of the cat.
Experimental Brain Research, 46, 1–11.
Neil, P.A., Chee-Ruiter, C., Scheier, C., Lewkowicz, D.J., and Shimojo, S. (2006). Development of
multisensory spatial integration and perception in humans. Developmental Science, 9, 454–64.
Olson, C.R., and Graybiel, A.M. (1987). Ectosylvian visual area of the cat: location, retinotopic
organization, and connections. Journal of Comparative Neurology, 261, 277–94.
Perrault, T.J., Jr., Vaughan, J.W., Stein, B.E., and Wallace, M.T. (2005). Superior colliculus neurons use
distinct operational modes in the integration of multisensory stimuli. Journal of Neurophysiology, 93,
2575–86.
Piaget, J. (1952). The origins of intelligence in children. International University Press, New York.
Pons, F., Lewkowicz, D.J., Soto-Faraco, S., and Sebastian-Galles, N. (2009). Narrowing of intersensory
speech perception in infancy. Proceedings of the National Academy of Science USA, 106, 10598–10602.
Powers, A.R., Hillock, A.R., and Wallace, M.T. (2009). Perceptual training narrows the temporal window of
multisensory binding. Journal of Neuroscience, 29, 12265–74.
Ramsay, A.M., and Meredith, M.A. (2004). Multiple sensory afferents to ferret pseudosylvian sulcal cortex.
NeuroReport, 15, 461–65.
Royal, D.W., Carriere, B.N., and Wallace, M.T. (2009). Spatiotemporal architecture of cortical receptive
fields and its impact on multisensory interactions. Experimental Brain Research, 198, 127–36.
Shams, L., Kamitani, Y., and Shimojo, S. (2002). Visual illusion induced by sound. Brain Research.
Cognitive Brain Research, 14, 147–52.
Sohal, V.S., Zhang, F., Yizhar, O., and Deisseroth, K. (2009). Parvalbumin neurons and gamma rhythms
enhance cortical circuit performance. Nature, 459, 698–702.
Sripati, A.P., Yoshioka, T., Denchev, P., Hsiao, S.S., and Johnson, K.O. (2006). Spatiotemporal receptive
fields of peripheral afferents and cortical area 3b and 1 neurons in the primate somatosensory system.
Journal of Neuroscience, 26, 2101–2114.
Stanford, T.R., Quessy, S., and Stein, B.E. (2005). Evaluating the operations underlying multisensory
integration in the cat superior colliculus. Journal of Neuroscience, 25, 6499–6508.
Stein, B.E., and Meredith, M.A. (1993). The merging of the senses. MIT Press, Cambridge, MA.
Stein, B.E., and Stanford, T.R. (2008). Multisensory integration: current issues from the perspective of the
single neuron. Nature Reviews Neuroscience, 9, 255–66.
Stein, B.E., and Wallace, M.T. (1996). Comparisons of cross-modality integration in midbrain and cortex.
Progress in Brain Research, 112, 289–99.
van Wassenhove, V., Grant, K.W., and Poeppel, D. (2007). Temporal window of integration in auditory-
visual speech perception. Neuropsychologia, 45, 598–607.
Vatakis, A. and Spence, C. (2010). Audiovisual temporal integration for complex speech, object-action,
animal call, and musical stimuli. In Multisensory object perception in the primate brain
(eds. M.J. Naumer, and J. Kaiser). Springer, New York.
Vroomen, J., and Keetels, M. (2010). Perception of intersensory synchrony: a tutorial review. Attention,
Perception, and Psychophysics, 72, 871–84.
Wallace, M.T., and Stein, B.E. (1994). Cross-modal synthesis in the midbrain depends on input from
cortex. Journal of Neurophysiology, 71, 429–32.
Wallace, M.T., and Stein, B.E. (1996). Sensory organization of the superior colliculus in cat and monkey.
Progress in Brain Research, 112, 301–311.
Wallace, M.T., and Stein, B.E. (1997). Development of multisensory neurons and multisensory integration
in cat superior colliculus. Journal of Neuroscience, 17, 2429–44.
Wallace, M.T., and Stein, B.E. (2000). Onset of cross-modal synthesis in the neonatal superior colliculus is
gated by the development of cortical influences. Journal of Neurophysiology, 83, 3578–82.
Wallace, M.T., and Stein, B.E. (2001). Sensory and multisensory responses in the newborn monkey
superior colliculus. Journal of Neuroscience, 21, 8886–94.
Wallace, M.T., and Stein, B.E. (2007). Early experience determines how the senses will interact. Journal of
Neurophysiology, 97, 921–26.
Wallace, M.T., Meredith, M.A., and Stein, B.E. (1992). Integration of multiple sensory modalities in cat
cortex. Experimental Brain Research, 91, 484–88.
Wallace, M.T., Wilkinson, L.K., and Stein, B.E. (1996). Representation and integration of multiple sensory
inputs in primate superior colliculus. Journal of Neurophysiology, 76, 1246–66.
Wallace, M.T., Perrault, T.J., Jr., Hairston, W.D., and Stein, B.E. (2004a). Visual experience is necessary for
the development of multisensory integration. Journal of Neuroscience, 24, 9580–84.
Wallace, M.T., Ramachandran, R., and Stein, B.E. (2004b). A revised view of sensory cortical parcellation.
Proceedings of the National Academy of Science USA, 101, 2167–72.
Wallace, M.T., Carriere, B.N., Perrault, T.J., Jr., Vaughan, J.W., and Stein, B.E. (2006). The development of
cortical multisensory integration. Journal of Neuroscience, 26, 11844–49.
Chapter 15

In search of the mechanisms of multisensory development

Denis Mareschal, Gert Westermann, and Nadja Althaus

15.1 Introduction
In this chapter, we ask how multisensory perception can develop. That is, what are the mecha-
nisms by which separate modalities become differentiated and integrated during the first years of
life? We argue that computer models provide a powerful tool for answering these questions. We
therefore begin by describing the role of computational modelling in understanding causal
mechanisms of developmental change, going on to characterize connectionist neural network
models as a family of modelling approaches that is particularly well suited for studying learning
and development. In the next three sections we illustrate how neural network models have been
used to grapple with how pairwise multisensory (and likewise, sensorimotor) integration develops, beginning with auditory-visual, then auditory-motor, and finally visual-motor coupling.
Finally, we review our findings and point to some challenges for future research.

15.2 Models as tools for studying mechanisms


Most modern sciences have progressed from a descriptive to a causal-mechanistic stage. This has
been true in the physical sciences and biological sciences, as well as economics and other social
sciences. Developmental science is now mature enough (in terms of having enough data) to begin
the transition into a causal-mechanistic science. The principal historical obstacle to this transition
has been the lack of an appropriate tool to think about information-processing mechanisms and
the processes of developmental change. The introduction into the study of cognitive psychology
of computational modelling methods in the mid-1970s and 1980s (see Boden 1988), swiftly followed by the arrival of connectionist (or neural network) modelling (Rumelhart and McClelland
1986a), has provided just such a tool.
Implemented computational models have many benefits (Lewandowsky 1993). Their key con-
tribution is that they force the researcher to be explicit about the information-processing mecha-
nisms that underlie performance on a task. As such, they test the internal consistency of any
proposed information-processing theory and allow the researcher to explore ranges of behaviours
that may be impossible or unethical to explore empirically. They can also be used to predict per-
formance under extreme limiting conditions, and to explore the complex interactions of the
multiple variables that may impact on performance. For example, in models of multisensory
development one could easily explore how the processing system develops under the manipula-
tion of sensory inputs, including sensory deprivation in one or several senses.
Thus, implemented computer models complement experimental data gathering by placing
constraints on the direction of future empirical investigations. Developing a computer model
forces the user to specify precisely what is meant by the terms in his or her underlying theory.
Terms such as representations, symbols, and variables, and the nature of the input to a system,
must have an exact definition to be implemented within a computer model, which can help, for
example, in highlighting differences in their meaning for developmental and adult processing (see
also Haith 1998). The degree of precision required to construct a working computer model avoids
the possibility of arguments arising from the misunderstanding of imprecise verbal theories.
Moreover, building a model that implements a theory provides a means of testing the internal
self-consistency of the theory. Any gaps or inconsistencies in a theory will become immediately obvious when one tries to implement it as a computer programme: the
inconsistencies will lead to conflict situations in which the computer programme will not be able
to function. Such failures point to a need to reassess the implementation or re-evaluate the the-
ory. An implication of these two points is that the model can be used to work out unexpected
implications of a complex theory. Because the world is highly complex with a multitude of infor-
mation sources that constantly interact, even a simple process theory can lead to unforeseen
behaviours. Here again, the model provides a tool for teasing apart the nature of these interac-
tions and corroborating or falsifying the theory.
Perhaps the main contribution made by computational models of development is to provide an
account of the representations that underlie performance on a task, while also incorporating a
mechanism for representational change (see Mareschal et al. 2007 for an extensive discussion of
this issue). This is a difficult question because it involves observing how representations evolve
over time and tracking the intricate interactions between the developing components of a com-
plex cognitive system and its subjective environment. Building a model and observing how it
evolves over time provides a tangible means of achieving this end. Indeed, models that do not
address transitions but only explain behaviour at discrete ages are not models of development,
even if the relevant data that they explain involve children.
Thus, a model is fundamentally a tool for helping us to reason about the processes that under-
lie a given natural phenomenon. To be of value to the developmental community, a number of
constraints must be satisfied (see Mareschal and Thomas 2007). The first is transparency. A model
must be understandable to those who are going to use it in their everyday research activities. This
does not mean that its dynamic properties need to be immediately graspable. However, the proc-
esses and mechanisms that underlie the system’s behaviour, their mathematical embodiment, and
their computational implementation must all be clear. If the end user cannot understand the
basic processes underlying the developmental model then it is of little value, even if it mimics
completely the unfolding behaviours observed in a child.
Secondly, the model must be grounded. It must make substantial contact with the rich data
already available in all areas of cognitive development. A real danger of interdisciplinary research
(such as the computational modelling of cognitive development) is that expertise in one side of a
multifaceted project is underrepresented. Researchers then rely on secondary sources to guide
their modelling efforts. These secondary sources are often either out of date, of limited scope, or
simple approximations of what the real phenomena look like. Consequently, experts in the field
do not view the model as making any real theoretical contribution.
Thirdly, the model must be plausible. The mechanisms and processes that it proposes must be
consistent with those known or believed to exist in other related domains. Putting aside the issue
of what the appropriate level of description is for a particular phenomenon (i.e. the matter of
whether it is best described at the cognitive level or the neural level of processing) there is a temp-
tation, in the vein of an engineer or computer scientist, to posit mechanisms that will work, with
no regard given to cross-referencing to other models in similar domains. As a result, the model
will be theoretically and empirically isolated and it will become difficult to see how the model
could generalize to any other domain. In terms of levels of description, while empirical phenom-
ena can be independently studied at different levels, the levels are not themselves independent.
For example, a theory at the cognitive level cannot include assumptions or mechanisms that can-
not be delivered by processes at the neural level; assumptions about innate starting points for
cognitive development cannot be inconsistent with what is known about initial constraints on
brain development (Mareschal et al. 2007).
The rest of this chapter illustrates these general principles with specific reference to multisen-
sory integration, by focusing on one computational modelling approach that has been extremely
successful at producing psychologically relevant computational models of learning and develop-
ment in infants and children. This approach has given rise to models of typical but also atypical
learning and development, where key boundary conditions and resource limitations are found to
lie at the heart of the atypical behaviours observed in children; namely, connectionist or neural
network modelling.

15.3 Connectionist models of learning and development


Many different cognitive architectures have been proposed as psychological process models of
development (see Mareschal and Thomas 2007 or Mareschal 2010, for full reviews). Connectionist
networks are computer models loosely based on the principles of neural information processing
(Elman et al. 1996; McLeod et al. 1998; Rumelhart and McClelland 1986a; Shultz 2003). In many
cases, they are not intended to be neural models, but rather cognitive information processing
models that embody general processing principles such as inhibition and excitation within a dis-
tributed, parallel processing system. They attempt to strike the balance between importing some
of the key ideas from the neurosciences while maintaining sufficiently discrete and definable
components to allow questions about behaviour to be formulated in terms of a high-level cogni-
tive computational framework.
From a developmental perspective, connectionist networks are ideal for modelling because
they develop their own internal representations as a result of interacting with an environment
(Plunkett and Sinha 1992). However, these networks are not simply tabula rasa empiricist learn-
ing machines. The representations that they develop can be strongly determined by initial con-
straints (or boundary conditions). In the case of multisensory development, these constraints
can, for example, take the form of different associative learning mechanisms attuned to specific
information in the environment (e.g. to the temporal correlation or spatial correlation between
information in different sensory modalities) or they can take the form of architectural constraints
that guide the flow of information in the system (such as separate sensory channels that become
integrated, or a common multisensory system that becomes differentiated). Although connec-
tionist modelling has its roots in associationist learning paradigms, it has inherited the Hebbian
rather than the Hullian tradition (Hill 1967). That is, what goes on inside the box (inside the net-
work) is as important in determining the overall behaviour of the networks as the correlation
between the inputs (stimuli) and the outputs (responses).
Connectionist networks are made up of simple processing units (idealized neurons) intercon-
nected via weighted communication lines (Hinton 1989) so that activation can flow between
units through these connections. As activation flows through the network, it is transformed by
the set of connection weights between successive layers in the network. An attractive property
of connectionist models as models of development is that they can learn from exposure to an
environment. This learning (i.e. behavioural adaptation) is achieved by adjusting the weights
of the connections between the units depending on observed input patterns. Different types of
connectionist models have different architectures, and here we will describe briefly the two most
common ones: feed-forward networks and self-organizing feature maps.
Feed-forward networks consist of two or more layers of neurons, and typically the neurons in each layer are fully connected to all units in the next higher layer. The first layer is usually the
input layer that receives information from the environment, and the last layer is the output layer
that provides output to the environment. Between the input and output layers can be any number
of hidden layers (typically there is one). Feed-forward models learn to map an input to its corre-
sponding output—for example, the representation of the picture of an object into the name for
this object. When an input is presented to the model, activation is propagated from the input
layer through the weighted connections to the hidden layer(s) and from there to the output layer.
In supervised learning, the output generated by the model is compared with the target that the
model should produce, and the connection weights are then adjusted so that the output will
become more like the target. The most common weight adjustment rule is the backpropagation
algorithm (Rumelhart et al. 1986). Supervised models are very popular and have been applied
to a wide range of psychological phenomena. They have, however, been criticized for a lack
of biological plausibility because it was assumed that neurons do not propagate error signals
backwards. Biologically plausible equivalences of error backpropagation have, however, been
suggested (O’Reilly 1996).
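The supervised learning scheme just described can be sketched in a few lines. This is not any specific published model: the network sizes, input pattern, and learning rate below are purely illustrative, and the weight-update rule is the standard form of backpropagation for sigmoid units.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes: 4 input units, 3 hidden units, 2 output units.
W1 = rng.normal(scale=0.5, size=(4, 3))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(3, 2))   # hidden -> output weights

x = np.array([1.0, 0.0, 1.0, 0.0])        # input pattern
target = np.array([1.0, 0.0])             # target the model should produce
lr = 0.5                                  # learning rate

for _ in range(1000):
    # Forward pass: activation is propagated from the input layer
    # through the weighted connections to the hidden layer and from
    # there to the output layer.
    h = sigmoid(x @ W1)
    y = sigmoid(h @ W2)
    # Backward pass: the discrepancy between output and target is
    # propagated back to adjust the connection weights so that the
    # output becomes more like the target.
    delta_out = (y - target) * y * (1.0 - y)
    delta_hid = (delta_out @ W2.T) * h * (1.0 - h)
    W2 -= lr * np.outer(h, delta_out)
    W1 -= lr * np.outer(x, delta_hid)

h = sigmoid(x @ W1)
y = sigmoid(h @ W2)       # the output now closely matches the target
```

After training, presenting the input pattern produces an output close to the target, illustrating the mapping (e.g. picture of an object to its name) that such networks learn.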
More commonly used than feed-forward networks in models of multisensory integration are
self-organizing maps (SOM)—also known as self-organizing feature maps (Kohonen 1982)—or
variants thereof. These models consist of an input layer and a feature map layer. Feature map
models do not learn a specific task but they cluster the high-dimensional input data on a two-
dimensional map while preserving the neighbourhood relations (topology) between inputs. One
reason why this type of model is attractive is its biological plausibility. Topological maps are ubiq-
uitous in the cortex. For example, neurons in the primary visual cortex (V1) form a topological
representation so that adjacent stimuli in the world, which create activations in adjacent retinal
cells, lead to activity in adjacent cortical cells. This topology preservation persists into higher
visual areas. Likewise, auditory neurons are organized in tonotopic maps reflecting temporal
frequency. Somatosensory maps preserve the general layout of the body surface while distorting
it so that sensitive areas (lips) are expanded and less sensitive areas (back) are shrunk on the
map. Further topological maps have been identified in parietal and frontal cortex (Silver and
Kastner 2009).
In Kohonen’s (1982) algorithm, an input is presented to the input layer of the feature map,
which is fully connected to the output map, and activation progresses to the second layer through
these connections. Activation depends on the distance between the weight vector associated with
each unit and the current input: The best-matching unit (BMU) is the one whose associated
vector has the minimum distance to the input vector. The weights between the input units and
the activated unit are then changed so that the distance to the current input vector is further
decreased, making it more likely that on repeated presentation of the same input the unit
will become activated. A topological representation is achieved by defining a radius of units
(a ‘neighbourhood’) around the BMU for which the connections are adjusted in the same direc-
tion, but to a lesser extent. As a consequence, the units in this neighbourhood will be activated by
similar inputs as the one activating the BMU. The radius is initially large and then gradually
shrinks, as learning progresses, to a final size of zero.
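A minimal sketch of one version of this algorithm follows. The map size, input dimensionality, Gaussian neighbourhood function, and shrinkage schedule are all illustrative choices, not parameters from any model discussed in this chapter.

```python
import numpy as np

rng = np.random.default_rng(1)

# A 10x10 feature map; each unit has a weight vector over a
# 3-dimensional input space (all sizes are illustrative).
map_size = (10, 10)
weights = rng.random(size=map_size + (3,))

def train_step(weights, x, lr, radius):
    # Best-matching unit (BMU): the unit whose associated weight
    # vector has the minimum distance to the input vector.
    dists = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(np.argmin(dists), map_size)
    # Move the BMU and a neighbourhood around it towards the input;
    # units farther from the BMU on the map are adjusted in the same
    # direction but to a lesser extent, which yields the topology.
    rows, cols = np.indices(map_size)
    grid_dist = np.sqrt((rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2)
    influence = np.exp(-(grid_dist ** 2) / (2.0 * radius ** 2))
    weights += lr * influence[..., None] * (x - weights)
    return weights

# The neighbourhood radius is initially large and gradually shrinks
# as learning progresses.
for t in range(500):
    x = rng.random(3)
    radius = max(5.0 * (1.0 - t / 500.0), 0.5)
    weights = train_step(weights, x, lr=0.1, radius=radius)
```

After training, neighbouring units on the map respond to similar inputs, preserving the neighbourhood relations of the input space.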
In models of multisensory integration, often two or more feature maps are used to represent
the inputs from different sensory domains, and connections between the maps realize multisen-
sory integration. These connections are often modified according to the Hebb rule (Hebb 1949).
This is a biologically plausible rule by which the connection strength between two neurons
increases if both neurons are active simultaneously (‘neurons that fire together, wire together’).
The principle of this type of model is that when inputs in two domains (e.g. an object and a
sound) have co-occurred repeatedly, strong Hebbian connections will develop between the visual
and auditory units representing these stimuli. Subsequently, when only one of the inputs (e.g. the
sound) is presented to the model, the corresponding representation in the other domain (the
object) is evoked through the Hebbian connections.
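This principle can be illustrated with a toy simulation in which the two maps are reduced to bare unit vectors; the map sizes, learning rate, and the particular object–sound pairing below are arbitrary illustrative choices.

```python
import numpy as np

n_visual, n_auditory = 6, 6          # illustrative map sizes
hebb = np.zeros((n_visual, n_auditory))

def hebbian_update(hebb, visual, auditory, lr=0.1):
    # Hebb rule: the connection strength between two units increases
    # when both are active simultaneously.
    return hebb + lr * np.outer(visual, auditory)

# A particular object (visual unit 2) repeatedly co-occurs with a
# particular sound (auditory unit 4).
obj = np.zeros(n_visual)
obj[2] = 1.0
snd = np.zeros(n_auditory)
snd[4] = 1.0
for _ in range(20):
    hebb = hebbian_update(hebb, obj, snd)

# Presenting the sound alone now evokes the corresponding visual
# representation through the learned Hebbian connections.
evoked_visual = hebb @ snd           # strongest at visual unit 2
```

The connection between visual unit 2 and auditory unit 4 is the only one strengthened, so the sound alone activates the matching visual representation.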
There are two concerns with the existing approaches using two self-organizing maps connected
by Hebbian links to model multisensory development. First, in order to obtain a highly accurate
mapping between the two domains, the maps for each sensory domain need to be fully trained
before the Hebbian crossdomain connections are learned. This assumption is unrealistic because
presumably multisensory integration begins before unisensory representations are fully formed.
For example, in word-learning as a case of higher-level multisensory integration, this mechanism
implies that the complete phonological and object repertoire have to be developed independently
before the first word–object associations are learned. Second, in this simple type of model, the
organization of representations within each domain is left unaffected by crossmodal integration.
As will be described below, this assumption is also in conflict with empirical data. As a conse-
quence of these disadvantages some models have modified both the algorithm by which the maps
are formed and the formation of crossmodal connections between them (Althaus and Mareschal
2012; Westermann and Miranda 2004).
Psychologists think of knowledge occurring at two levels in connectionist neural networks (see
Munakata and McClelland 2003). On the one hand, there is knowledge stored in the connection
strengths, an accumulation of learning events. The connection strengths determine the pattern of
activation produced by an input and/or by existing activation propagating inside the system. On
the other hand, there is knowledge corresponding to the activations themselves. When activation
is processed through the connections, it gives rise to maintained activity, which serves as both a
representation of current inputs and a working memory for recently presented information.
Note that many connectionist networks are very simple: they may contain some 100 units or so.
This is not to suggest that the part of the brain solving the corresponding task only has 100 neu-
rons. Remember that such models are frequently not intended as neural models, but rather as
information-processing models of behaviour. The models constitute examples of how systems
with similar computational properties to the brain can give rise to a set of observed behaviours.
Sometimes individual units are taken to represent pools of neurons or cell assemblies. According
to this interpretation, the activation level of the units corresponds to the proportion of neurons
firing in the pool (e.g. Changeux and Dehaene 1989). However, to preserve the cognitive
interpretation of the model, activation of a unit in a network corresponds to a conceptual state
with reference to the domain being modelled, rather than the spike train recorded from a single
neuron somewhere in the brain. This is because neural codes typically use distributed representa-
tions rather than localist representations. In distributed codes, the cognitively interpretable unit
is represented by a pattern of activity, not by the activity of a single unit. Since the representations
in a connectionist model of a cognitive process may capture similarity relations among patterns
used in the brain and among concepts entertained by the mind without the units representing the
concepts themselves, there is a sense in which the models exist at a level between the cognitive and
the neural level.

15.4 Modelling multisensory integration


As illustrated throughout this book, we live and grow up within a multisensory world. Indeed,
even at the neural level, children exhibit differential processing of multisensory compared to
unisensory stimuli, as has previously been reported in adults (Brett-Green et al. 2008). Although
many mechanistic models of learning and development have been proposed, most have focused
on learning within a single modality, or focused on learning with amodal representations. This is
because multisensory learning raises some particularly difficult issues. First, representational
formats (codes) across modalities are likely to be very different. For example, in the auditory
cortex, there is both a frequency- and a location-based code of sound, depending on the fre-
quency of the sound encountered. Relating both of these formats simultaneously onto a single
visual code leads to many computational problems. Secondly, different modalities may have dif-
ferent processing requirements. So, while visual information often consists of complex informa-
tion that is available to the eyes all at once, auditory signals (such as speech) involve the rapid
temporal modulation of the processing stream, with individual elements of the auditory scene
following one another. Thus, it is more important for visual information to be integrated spatially
and auditory information to be integrated temporally. Thirdly, there may be enormous differ-
ences in the degrees of freedom with which a sensory system can operate. While the mechanics of
the ear (and thus the auditory system) are relatively fixed, the motor system has an enormous number
of degrees of freedom giving rise to a complex inverse problem in the somatosensory space
(Bremner et al. 2008a).
Models of multi-domain integration have spanned several levels, from low-level models—
multisensory integration between visual and auditory information in speech recognition and
sensorimotor integration between speech sounds and articulatory motor programs in speech
production—to higher-level models combining orthographic and phonological information in
reading or object features and words in word learning. Most models of multisensory integration
use separate feature maps for each domain, which are then linked by Hebbian connections (see
Fig. 15.1). All of these models take an integrative approach to multisensory development, assum-
ing that input to each domain leads to a separate representation on one of the feature maps
and that Hebbian connections then develop to link co-occurring representations in each domain.
One aspect in which models differ is in simulating how the presence of multisensory input affects
the unisensory representations in each domain. Whereas many models merely learn the

[Figure: labels — Modality A, Modality B; Hebbian connection; self-organizing maps; input vectors.]
Fig. 15.1 A model architecture consisting of two self-organizing maps (SOMs) that are interconnected by Hebbian links: each map represents one modality, and crossmodal learning involves a strengthening of Hebbian links between co-activated units. (Reproduced in colour in the colour plate section.)
co-occurrence statistics of inputs across sensory domains, other models are explicit about how multisensory information can affect unisensory representations. There exists considerable evidence that the organization within each domain can be affected through multisensory integration. The best-known example of this is the McGurk effect in audio-visual speech integration
(McGurk and MacDonald 1976). For example, when participants see a face articulating /ga/ while
hearing an audio clip of the syllable /ba/ they perceive /da/. People are not conscious of the
discrepancy between auditory and visual information but form one integrated percept, which in
this case does not correspond to either the visual or auditory stimulus. Similar effects have since
been identified in a range of other domains such as vision and touch (Jousmaki and Hari 1998)
and audition (pitch) and taste (Crisinel and Spence 2009). Although vision usually is the most
robust sense with respect to crossmodal interference, some studies have shown that vision can be
altered by auditory information. For example, a single flash is perceived as multiple flashes when
paired with two auditory stimuli (beeps) (Shams et al. 2000; Shipley 1964).
The ‘multi-modal classifier’ described by De Sa and Ballard (1998) constitutes one example
of a model in which representations within a sensory domain are affected by crossmodal links.
The idea behind this classification model was that instead of using supervised learning of catego-
ries, a multisensory (in this case, audio-visual) learning system can exploit the redundancy present
across sensory modalities to derive categories in an unsupervised way. The model consisted
of separate networks for each modality, and each network fed into a common classification
layer that contained one unit for each category (the number of categories that could be discrimi-
nated was therefore pre-specified by the number of units in this layer). Each modality first
made an independent classification decision, and in a second stage each modality was trained
with the category output of the other modality in order to minimize the discrepancy between
classification decisions for a multisensory stimulus. This crossmodal training resulted in an
adjustment of category boundaries in each domain on the basis of the correlational structure
between domains.
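A toy sketch of this co-training idea can be given with one-dimensional nearest-prototype classifiers. This is not De Sa and Ballard's actual implementation (which used separate networks and realistic audio-visual speech data); all values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two modalities receive correlated, noisy 1-D views of the same
# event; each modality's 'classifier' is a pair of class prototypes,
# initially poorly placed (all values are illustrative).
proto_a = np.array([0.4, 0.6])    # 'auditory' prototypes
proto_v = np.array([0.45, 0.55])  # 'visual' prototypes

def classify(protos, x):
    # Independent nearest-prototype classification decision.
    return int(np.argmin(np.abs(protos - x)))

lr = 0.05
for _ in range(2000):
    cat = rng.integers(2)                 # hidden category: 0 or 1
    xa = cat + rng.normal(scale=0.15)     # auditory view of the event
    xv = cat + rng.normal(scale=0.15)     # visual view of the event
    # Each modality is trained with the category output of the OTHER
    # modality, so no external category labels are ever needed.
    cls_v = classify(proto_v, xv)
    proto_a[cls_v] += lr * (xa - proto_a[cls_v])
    cls_a = classify(proto_a, xa)
    proto_v[cls_a] += lr * (xv - proto_v[cls_a])

# The prototypes in both modalities separate towards the true
# category means (0 and 1) without any supervised labels.
```

The crossmodal redundancy alone pulls the category boundaries in each modality into alignment with the correlational structure between the domains.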
The model was tested on realistic speech data by using audio-visual recordings of English
speakers uttering one of five syllables (/ba/, /va/, /da/, /ga/ and /wa/). For each syllable, 10-ms
segments of processed acoustic data were presented together with digitized images of the mouth
movement of the speaker. Each syllable was recorded 118 times with 98 instances used for train-
ing and 20 for (unisensory) testing. After training on the multisensory data, the visual network
achieved a classification accuracy of 80% and the auditory network of 93% of the test data.
The model therefore demonstrated that multisensory learning improved classification accuracy
even for subsequent unisensory stimuli.
Although the De Sa and Ballard (1998) model did not specifically focus on psychological aspects
of multisensory processing, it did show that exploiting the redundancy and correlations between
the different modalities of naturalistic stimuli can improve the categorization of these stimuli
to an extent that resembles supervised learning of unisensory stimuli. However, it left the issue of
the development of this ability completely untouched (for a discussion of the developmental
importance of crossmodal redundancy on learning see Chapter 8 by Bahrick and Lickliter, and
Chapter 7 by Lewkowicz).
Westermann (2001) described another neural network model directly aimed at explaining
how crossmodal links affect the emergence of representations in each modality. As described
above, there is considerable empirical evidence for such effects (e.g. in audiovisual speech inte-
gration, category learning, and colour categorization). Like most other models it consisted of
two topographic maps, one for each modality, and with Hebbian connections between the
domain maps. Each map consisted of 200 units with Gaussian activation functions that formed
receptive fields over the input space for that domain. An external input activated a population of
neurons with overlapping receptive fields on its respective sensory map. The response of the
model to the input was computed as a population code; that is, the weighted sum of the activa-
tion values of all units. Through being exposed to multisensory inputs the model developed
crossmodal connections between the neurons on each map. These connections developed
according to a covariance rule (Sejnowski 1977): when the activation of a neuron covaried with
that of a neuron on the other domain map (i.e. it was active whenever the other neuron was
active and inactive whenever the other neuron was inactive), a strong crossmodal connection
developed between these neurons. Connections between neurons whose activations were unre-
lated remained weak. The activity reaching a neuron through its crossmodal connections was
added to its ‘external’ activation and thus affected the population-coded response on each map.
In particular, neurons whose activation reliably correlated with activation of the same neurons
in the other modality became more highly activated and therefore ‘pulled’ the population-coded
response toward them, inducing a perceptual change to align the mappings in both domains.
In effect the neurons on each map acted as multisensory neurons because they were activated
both by unisensory input from their domain and, through the crossmodal connections, by input
to the other domain.
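The covariance rule at the heart of this model can be sketched as follows. The activation traces, noise level, and learning rate are illustrative, and the weight is accumulated in batch form for brevity rather than updated online as in the full model.

```python
import numpy as np

rng = np.random.default_rng(3)

# Activation traces over time for units in two domain maps: one pair
# covaries, the other pair is unrelated (values are illustrative).
T = 1000
a = rng.random(T)                             # unit in the first map
b_corr = a + rng.normal(scale=0.05, size=T)   # covaries with a
b_unrel = rng.random(T)                       # unrelated to a

def covariance_weight(pre, post, lr=0.01):
    # Covariance rule (after Sejnowski 1977): strengthen the
    # connection when both units deviate from their mean activation
    # in the same direction, weaken it when they deviate in opposite
    # directions.
    return lr * np.sum((pre - pre.mean()) * (post - post.mean()))

w_corr = covariance_weight(a, b_corr)
w_unrel = covariance_weight(a, b_unrel)
# A strong crossmodal connection develops only for the covarying
# pair; the connection between unrelated units remains weak.
```

The rule thereby picks out exactly the unit pairs whose activity reliably co-occurs across the two maps, which is what allows crossmodal input to pull the population-coded response in each domain.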
To model how several perceptual phenomena can be explained by multisensory integration, a
continuous signal in one modality was paired with two separate signal clusters in the other
modality so that one half of the continuous signal co-occurred with one cluster and the other half
with the other cluster. By being exposed to multisensory stimuli with this correlational structure,
representations in each domain became warped and the continuous signal separated into two
clusters to align with the distribution of signals in the other modality. This process also induced
categorical perception of the continuous data so that the way in which the model perceived this
input (i.e. the population-coded response) was non-linear, with higher discrimination sensitivity
at the category boundary. Finally, the model provided a mechanistic explanation of McGurk-like
sensory illusions. For this simulation, the model that had been trained on the ‘correct’ multisen-
sory data was then exposed to data in which the trained correlational structure was violated so
that each half of the continuous signal in the first modality co-occurred with the ‘wrong’ cluster
in the other modality. This input corresponds to, for example, a visual articulation of /ga/ paired
with an auditory /ba/. The response of the model to this inconsistent input was a ‘compromise’
between the two unisensory inputs that were evident in each domain, corresponding to, for
example, a perceived /da/ when presented with the auditory-visual /ba/-/ga/ pairing. In this way,
the model, which was based on simple Hebbian learning mechanisms, accounted for a range of
behaviors in multisensory processing.

15.5 Modelling the development of audio-visual coupling: the case of early word learning
Infant word learning is an aspect of cognitive development not typically considered from a mul-
tisensory perspective. The reason for this may be that there is no causal relationship between the
visual signal (an object, the ‘referent’) and the auditory signal (a word, the ‘label’), and crossmo-
dal synchrony is not necessarily maintained. Indeed, the infant receives many exposures to words
or objects without the counterpart in the other modality being present. Nevertheless, there is
evidence in the word-learning literature that it is precisely those occurrences where infants are
presented with both label and referent at the same time (often in situations of joint attention
with an adult) that result in successful word–object mapping, and ultimately in word learning. At
the same time, labelling has been shown to facilitate (or at least affect) various cognitive
processes in pre-linguistic infants, such as object individuation (Xu et al. 2005) and
350 IN SEARCH OF THE MECHANISMS OF MULTISENSORY DEVELOPMENT

categorization (Fulkerson and Waxman 2007; Plunkett et al. 2008; Waxman and Markow 1995).
The mechanism(s) underlying these effects remain largely undiscussed in the literature.
In this section, we introduce a novel model simulating word learning as a multisensory (audio-
visual) task. Our model shows that early crossmodal interactions shape the development of cate-
gory representations in both individual modalities, and that categorical perception emerges in
this scenario, but not in a similar model where no crossmodal interactions are possible. We argue
that perceiving labels in a different sensory modality than their referents is a significant aspect of
word learning, and has a major impact on the formation of categories.
Several models with this dual-map architecture have been used previously to simulate word
learning and lexical processing by linking phonological, orthographic, and semantic (Miikkulainen
1997), phonological and semantic (Li et al. 2004), or object and label (Mayor and Plunkett 2010)
maps. The most sophisticated of these models was DevLex, a model of early lexical development
(Li et al. 2004). This model consisted of two growing self-organizing maps, one for processing
lexical-semantic information and one for phonological information. These maps were linked
with Hebbian connections. Phonemes were represented by features such as voiced/unvoiced,
fricative, and dental, and were inserted into a syllabic template that allowed for the representation
of phonological similarities among words. Semantic representations were based on co-occurrence
statistics, so that words occurring in the same textual contexts overlapped in their features.
A second semantic representation, based on semantic feature descriptions, was also used.
The model was trained by being presented simultaneously with the phonological and semantic
information for words. Words were presented incrementally to model the growing lexicon of the
developing child. During learning each map grew by adding units and connections as the size of
the lexicon and the complexity of semantic representations increased. The connections between
the active units on the maps were updated according to the Hebb rule, increasing the connection
strength for co-activated words and semantic representations. Without explicit training the
model developed topographically organized representations for lexical categories (nouns, verbs,
adjectives, and closed-class words) as a consequence of the self-organizing process. The model
also simulated lexical confusion (using the wrong word within a lexical category) (Gershkoff-Stowe
and Smith 1997) as a function of word density, semantic similarity, and age-of-acquisition
effects (Ellis and Morrison 1998).
Although in DevLex (Li et al. 2004) and the Dislex model (Miikkulainen 1997) learning of
topographic organization and crossmodal links occurred simultaneously, the crossmodal con-
nections had no effect on the organization within a domain. To address this shortcoming, we
introduce here a new computational model (Althaus and Mareschal in press), whose architecture
is similar to previous connectionist approaches discussed above, but which combines this with a
novel training algorithm that attempts to exploit the crossmodal information present in word–
object pairings while focusing on the organization of similarity space in the visual and auditory
domains. Thus, experience-based self-organization in one domain causally impacts on self-
organization in the other linked domains. The primary purpose of this example is to illustrate in
detail how a dual-map model of multisensory integration may work.
The model consists of two SOMs (20 × 20 units) that are fully interconnected by Hebbian links.
One of the maps receives ‘visual’ input, the other receives (pre-processed) speech input. As in the
general SOM learning algorithm (Kohonen 1982; see above), presentation of an input pattern
causes one or several units in the map to become active. The best-matching unit (BMU) is the one
whose associated weight vector is least distant from the input vector. Based on a (Gaussian) neighbourhood
function, the BMU and surrounding units are activated. In the present model, the interaction
between the two modalities is realized by additionally propagating activation between the
two maps via the Hebbian links. In addition to the ‘direct’ activation, this causes an ‘indirect’
(crossmodal) activation pattern (again based on a Gaussian neighbourhood function). Together,
these patterns form the ‘joint activation’ pattern, a weighted mixture of Gaussians, which may be
used for updating map weights in both dimensions.
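As a concrete, simplified sketch of one presentation step, the code below computes a BMU, its Gaussian neighbourhood activation, and a joint activation pattern obtained by propagating the other map’s activation through the Hebbian links. The grid size, the `beta` mixing weight, and the normalization are our illustrative assumptions, not the published parameters.

```python
import math

GRID = 10  # 10 x 10 map for illustration; the model described uses 20 x 20

def bmu(weights, x):
    """Index of the best-matching unit: the weight vector closest to input x."""
    return min(range(len(weights)),
               key=lambda u: sum((w - xi) ** 2 for w, xi in zip(weights[u], x)))

def direct_activation(unit, sigma=1.5):
    """Gaussian neighbourhood activation around a unit on the 2-D grid."""
    r0, c0 = divmod(unit, GRID)
    return [math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2 * sigma ** 2))
            for r, c in (divmod(u, GRID) for u in range(GRID * GRID))]

def joint_activation(direct, hebbian, other_direct, beta=0.5):
    """Joint pattern: direct activation mixed with the (normalized) indirect
    activation propagated from the other map through the Hebbian links.
    hebbian[u][v] is the link strength from unit v (other map) to unit u."""
    indirect = [sum(h * a for h, a in zip(row, other_direct)) for row in hebbian]
    peak = max(indirect) or 1.0
    return [(1 - beta) * d + beta * (i / peak)
            for d, i in zip(direct, indirect)]
```

With `beta = 0`, the map behaves like an ordinary unimodal SOM; raising `beta` lets the other modality pull activation (and hence learning) toward crossmodally consistent regions.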
Map weights are updated in two consecutive steps, combining enhancement and inhibition.
The idea behind this is to learn a representation based on a history of similar objects having been
encountered with similar labels (or similar labels with similar objects). In each domain, activation
that is ‘supported’ by evidence from that domain itself (i.e. direct activation) and by evidence
from the other domain (i.e. indirect activation, coming in via the Hebbian links)—in other words
the joint activation pattern—is enhanced. At the same time, units that are activated only by the
direct input are inhibited (i.e. moved away from the present input vector). In this way, aspects
idiosyncratic to individual exemplars (that do not correspond to previous exemplars sharing a
similar label) are disregarded for the purpose of developing a similarity space. In real-world word
learning, this may reflect a scenario where exemplars may appear quite different from each other,
such as a duck versus a parrot. Hearing both associated with the word ‘bird’ may enable a category
representation that is focused on the fact that both animals have some features in common, such
as a beak and wings, rather than attributes that distinguish the two, such as size, proportions, and
colour.
In addition to the representation in the individual domains, Hebbian connection weights
between the maps are also updated at every iteration, thereby updating the mappings between
words and objects.
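The two-step map update and the Hebbian link update might be sketched as follows. This is a schematic of the mechanism as described in the text; the function names, the `threshold`, and the learning-rate values are illustrative assumptions rather than the published algorithm.

```python
def update_map(weights, x, direct_act, joint_act,
               lr=0.1, threshold=0.5, inhibit=0.02):
    """Enhance units whose activation is crossmodally supported (high joint
    activation) by moving their weights toward the input; weakly inhibit
    units activated only directly by moving their weights away from it."""
    for u, w in enumerate(weights):
        if joint_act[u] > threshold:
            step = lr * joint_act[u]          # enhancement
        elif direct_act[u] > threshold:
            step = -inhibit * direct_act[u]   # inhibition
        else:
            continue
        for k in range(len(w)):
            w[k] += step * (x[k] - w[k])

def update_links(hebbian, act_a, act_b, lr=0.05):
    """Hebbian rule: strengthen links between co-active units on the two maps."""
    for i, ai in enumerate(act_a):
        for j, bj in enumerate(act_b):
            hebbian[i][j] += lr * ai * bj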
In the present model, the visual map was trained with a feature-based representation of objects,
acquired by measuring geometrical surface dimensions of toy objects from 11 categories (horses,
cats, dogs, fish, songbirds, eagles, tables, chairs, cars, ships, bikes—8 exemplars each). Auditory
stimuli were obtained by recording 8 tokens of 11 nonsense words, and applying a series of pre-
processing steps to each of these tokens. The model was always presented with pairs of words and
objects, with 100% consistency between categories (e.g. cars were always paired with instances of
the word blicket). However, within categories, exemplars were randomized to simulate a more
realistic situation in which specific objects occur with phonetically distinct utterances (tokens) of
the label.
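The pairing scheme can be written down as a short data-generation routine. This is a sketch: the `(category, exemplar)` tuples are placeholders for the measured object features and pre-processed speech tokens used in the actual simulations.

```python
import random

N_CATEGORIES, N_EXEMPLARS = 11, 8  # 11 object kinds x 8 exemplars/tokens each

def make_training_pairs(objects, words, rng):
    """Pair each object exemplar with a word token: 100% consistent at the
    category level, randomized at the exemplar/token level."""
    pairs = []
    for c in range(N_CATEGORIES):
        tokens = list(range(N_EXEMPLARS))
        rng.shuffle(tokens)  # any token of the category's label may occur
        for e in range(N_EXEMPLARS):
            pairs.append((objects[c][e], words[c][tokens[e]]))
    rng.shuffle(pairs)  # interleave categories within a training epoch
    return pairs

# Placeholder 'features': (category, exemplar) tuples stand in for the
# measured geometrical features and pre-processed speech tokens.
objects = [[(c, e) for e in range(N_EXEMPLARS)] for c in range(N_CATEGORIES)]
words = [[(c, t) for t in range(N_EXEMPLARS)] for c in range(N_CATEGORIES)]
pairs = make_training_pairs(objects, words, random.Random(0))
```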
In order to evaluate the role of interactive crossmodal learning in performance, a similar model
was developed in which architecture and training were identical apart from the Hebbian weights,
which in this case were passive (i.e. there was no indirect activation and representations were
learned separately in both modalities). Visual and auditory category representation in both mod-
els, as well as the quality of mappings from one domain to the other, were assessed using four
metrics (see Figs. 15.2 and 15.3). It is important to note that both models had arrived at 100%
accurate mappings from words to objects and vice versa by 450 epochs of training (as shown by
comprehension and production rates in Figs. 15.2 and 15.3), indicating that the multisensory
integration processes present in the interactive model did not disrupt the learning of these
mappings. The ability to learn word–object mappings accurately is a prerequisite for a beneficial impact
of multisensory integration, even though our focus is on the representation of visual/auditory
exemplars that develops in the individual maps as illustrated in Fig. 15.4.
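The four assessment metrics are straightforward to compute from the exemplar projections. The sketch below follows the definitions given in the caption to Fig. 15.2, with the simplifying assumption that ‘nearest neighbours’ means the single nearest other projection.

```python
def dist(a, b):
    """Euclidean distance between two grid positions."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def clustering(projections, labels):
    """Proportion of projections whose nearest other projection belongs to
    the same category (a 1-nearest-neighbour variant of the measure)."""
    same = 0
    for i, p in enumerate(projections):
        j = min((k for k in range(len(projections)) if k != i),
                key=lambda k: dist(p, projections[k]))
        same += labels[j] == labels[i]
    return same / len(projections)

def discrimination(bmus, labels, category):
    """Number of distinct map units onto which a category's exemplars project."""
    return len({u for u, l in zip(bmus, labels) if l == category})

def mean_exemplar_distance(projections, labels, category):
    """Average euclidean distance between all pairs of one category's exemplars."""
    pts = [p for p, l in zip(projections, labels) if l == category]
    ds = [dist(pts[i], pts[j])
          for i in range(len(pts)) for j in range(i + 1, len(pts))]
    return sum(ds) / len(ds)
```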
As Fig. 15.4a shows, the non-interactive maps developed a scattered representation of exemplar
projections. A topographical map emerged, which shows that similar objects (and words) were
perceived as similar by the model. All exemplars from the same category (or word) were projected
onto map units in close spatial proximity, as indicated by the differently coloured squares in
Fig. 15.4 (note that learning in the model was unsupervised, i.e. the model had no information
about category membership during training). However, no real clusters of exemplars were formed:
distances between exemplars within a category were the same as where a category boundary is

[Figure 15.2: eight panels plotting Clusteringvis, Discriminationvis, MEDvis, and Production (top row) and Clusteringaud, Discriminationaud, MEDaud, and Comprehension (bottom row) against training epochs (0–400); curves show the mean ± 1 standard deviation.]

Fig. 15.2 Performance in the model with interactively developing maps (active Hebbian
connections). Clustering: the proportion of nearest neighbours of each projection that are
projections of exemplars from the same category. Discrimination: the number of distinct units which
are projections of exemplars from the same category. Mean exemplar distance (MED): the average
euclidean distance between all pairs of exemplars from one category. Production/comprehension:
the proportion of correct mappings from the visual to the auditory (auditory to visual) domain.

[Figure 15.3: same panel layout as Fig. 15.2, for the model with independently developing maps.]

Fig. 15.3 Performance in the model with independently developing maps (passive Hebbian
connections).
crossed. In contrast, Fig. 15.4b reveals that categorical perception emerged through training using
the interactive learning mechanism: projections in the interactive scenario formed tight clusters
with large between-category distances. Clearly, learning using multisensory information led to
advantages with regard to category formation and assessing category membership.
The quantitative assessment of the maps throughout the learning phase is shown in Figs. 15.2
and 15.3. Figure 15.2 illustrates the model’s performance with crossmodal interactions (i.e. active
Hebbian weights). Figure 15.3 shows, for comparison, the model’s performance with separately
developing maps and inactive Hebbian weights.
The clustering and mean exemplar distance (MED) metrics give a quantification of the devel-
opment of scattered versus tightly clustered representation, as depicted in Fig. 15.4. Figure 15.3,
shows that in the non-interactive case, clustering and MED decreased over training and settled at
low values. In contrast, the high clustering and MED values at 450 epochs reflect the organization
in the interactive model. Looking at the discrimination metric, however, shows that the interac-
tive model was worse at discriminating between different exemplars than the non-interactive
model. Apparently, the interactive model extracted a feature representation that focused on the
commonalities between exemplars from one category and disregarded idiosyncrasies.
Most interesting from a developmental perspective are the changes in the models’ representations
over time. While the different metrics in Fig. 15.3 show that development in the non-interactive
scenario was monotonic, the maps in the interactive model apparently underwent a phase of reor-
ganization after around 200 epochs of training. Up until this point, development in the interactive
and non-interactive scenarios was very similar but after 200 epochs directions reversed in the inter-
active case: discrimination and MED decreased, while clustering increased. This reorganization
phase coincided temporally with a dip in both comprehension and production measures, indicating
that the mappings between words and objects were also affected by the process. Investigating the
model’s parameter settings at this intermediate stage of training reveals that reorganization also

[Figure 15.4: (a) exemplar projections, non-interactive case; (b) exemplar projections, interactive case.]

Fig. 15.4 Exemplar representations after training: coloured squares are BMUs of at least one
training exemplar, with identical colours representing exemplars from the same category. Dark blue
colours in the grid represent units that were not a BMU. Both maps represent the ‘visual’ domain;
‘auditory’ maps had corresponding overall structure after training. a) In the non-interactive case
(passive Hebbian weights) objects from the same category were projected to units in close
proximity, but there were no tight clusters, and between-category distances were hardly larger than
within-category distances. b) In the interactive case, the model developed tight clusters where
category boundaries are clear (i.e. large between-category gaps, small within-category gaps).
(Reproduced in colour in the colour plate section.)
coincided with the point at which direct and indirect activation contributed equally to the joint
activation pattern.
To investigate the impact of an early versus late onset of crossmodal interactions we simulated
learning with different parameter settings, varying the initial contribution of crossmodal
activation. Because this contribution grows over time, the initial setting also determines the
point at which ‘direct’ and ‘indirect’ activation contribute equally. Results showed that this parameter played an important
role in model performance and the ‘developmental trajectory’ of crossmodal mappings as well as
categorization in the individual domains. Strengthening crossmodal interactions too early in
training results in immature map representations being enhanced, and the consequences are
deficits in performance, both at the level of representations within the modalities and at the level
of intermodal mappings. This is despite a more ‘straightforward’, monotonic map development
that at a first glance appears to be an improvement over the seemingly tedious reorganization
process demonstrated above. Similarly, crossmodal interactions that start too late in development
do not have the possibility of shaping development in the individual domains to the same degree
as early interactions, and the maps settle before reorganization has been completed (see
Zangenehpour et al. 2009 for similar discussions in relation to the primate brain).
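One simple way to parameterize such a manipulation (our illustration; the model’s actual schedule may differ) is a crossmodal mixing weight that starts at an onset value and grows linearly over training, so that the onset value determines when direct and indirect activation contribute equally.

```python
def crossmodal_weight(epoch, beta0, growth=0.005):
    """Contribution of indirect (crossmodal) activation at a given epoch,
    growing linearly from the onset value beta0 and capped at 1.0."""
    return min(1.0, beta0 + growth * epoch)

def equal_contribution_epoch(beta0, growth=0.005):
    """Epoch at which direct and indirect activation contribute equally,
    i.e. where the crossmodal weight reaches 0.5."""
    return max(0.0, (0.5 - beta0) / growth)

# A higher onset value beta0 ('earlier' crossmodal interaction) moves the
# equal-contribution point earlier in training.
```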
In summary, the model discussed here provides evidence that crossmodal interactions during
learning contribute not only to the acquisition of crossmodal mappings but also to the
development of representations in the individual domains, and that the timing of the onset of
crossmodal interactions may play a crucial role in development. The latter in particular is an
aspect of multisensory development that is worth exploring in further work. (For discussions of
the important role of developmental timing in multisensory development, see also Chapter 16 by
Ghazanfar and Chapter 7 by Lewkowicz.)

15.6 Modelling the development of auditory-motor coupling: the case of infant babbling
An extension of the Westermann (2001) model discussed in Section 15.3 replaced one sensory
modality map with a map encoding motor commands, yielding a model of sensorimotor
integration in the learning of speech sounds (Westermann and Miranda 2004) and showing how the
same mechanisms could, in principle, account for multisensory and sensorimotor integration.
The idea here was to explain the formation of preferentially perceived and produced speech
sounds by a link between perception and production. The speech system in infants adapts to their
ambient language during the first years of life: perceptual discrimination narrows from the ability
to discriminate all speech sounds to being able to only discriminate native speech sounds (Werker
and Tees 1984), and production of early sounds converges onto the native repertoire (Boysson-
Bardies and Vihman 1991; but see Oller and Eilers 1998). There is considerable evidence that a
link between perception and production plays an important role in the development of normal
speech. For example, the onset of babbling in deaf infants is not only delayed by several months
compared with hearing infants, but the repertoire of produced sounds is also smaller, indicating
that the ability to perceive sounds plays a role in the ability to produce sounds (Oller and Eilers
1988; Stoel-Gammon and Otomo 1986).
The sensorimotor model learned articulatory and auditory representations simultaneously by
babbling, that is, by creating a random articulation and listening to this articulation. Links
between maps thus developed between articulatory settings and their perceptual consequences.
Due to the effect of sensorimotor links on the population-coded representation on each map, the
model developed prototypical articulations and perceptions in those regions of articulatory space
that led to stable sounds; that is, in which small changes to articulators resulted in small changes
to the produced sounds. When the model was exposed not only to its self-generated sounds but
also to ambient sounds (German or French vowels recorded from native speakers) it adapted its
prototypes to these sounds by selectively enhancing articulator representations that corresponded
to these sounds. This mechanism provided an implementation of the articulatory filter hypothesis
(Vihman 1993), which suggests that the alignment between articulatory and perceptual abilities
provides the basis for producing an infant’s first words. The model also presented a new perspec-
tive on mirror neurons. These are neurons that are active both when an action is observed and
when it is executed, and they have been implicated in the evolution and development of imitative
behavior (Rizzolatti and Craighero 2004). In the model, such neurons emerged from the sensori-
motor links: presentation of an external speech sound to the model resulted in an activation on
the auditory map, and through the sensorimotor connections that had been learned through bab-
bling, this activation also led to excitation of those motor neurons that were used by the model to
produce that external sound itself. The model therefore suggested that mirror neurons can be
construed as simple association neurons that emerge through sensorimotor integration and that
they need not have evolved specifically to support imitative learning.
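The babbling loop at the heart of this account can be caricatured in a few lines of code. The squaring ‘vocal tract’ is purely an illustrative assumption standing in for an articulatory synthesizer; the point is that Hebbian links formed during babbling later let a heard sound activate the motor commands that produce it.

```python
import random

def vocal_tract(articulation):
    """Toy stand-in for the articulatory synthesizer: any smooth mapping
    from a motor parameter to an auditory feature will do here."""
    return articulation ** 2

def babble(n_babbles=1000, lr=0.1, seed=0):
    """Hebbian association of motor commands with their heard outcomes."""
    rng = random.Random(seed)
    links = {}  # (motor_bin, auditory_bin) -> connection strength
    for _ in range(n_babbles):
        articulation = rng.random()        # generate a random articulation ...
        sound = vocal_tract(articulation)  # ... and listen to its consequence
        key = (int(articulation * 10), min(9, int(sound * 10)))
        links[key] = links.get(key, 0.0) + lr
    return links

links = babble()
# 'Mirror-neuron-like' readout: an external sound in auditory bin 8 now
# activates only the motor bins able to produce it (a subset of {8, 9},
# since sqrt(0.8) is about 0.894).
motor_for_bin8 = {m for (m, a) in links if a == 8}
```

Presenting an external sound thus excites exactly the motor units the model itself used to produce that sound, without any dedicated imitation machinery.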

15.7 Modelling the development of visual-motor coupling


Visual-motor coupling is of profound interest to developmental psychology and plays a
fundamental role in Piaget’s early theories of infant cognitive development (Piaget 1952). In fact,
sensorimotor schemas have been described as translating visual codes directly into motor codes (e.g.
Drescher 1991) in much the same way as the dorsal visual cortical route has been described as
translating visual codes into action codes (Milner and Goodale 1995). It is therefore surprising
that there has been relatively little effort to model the development of this coupling (see Bullock
and Grossberg 1990; Schlesinger et al. 2000; Schlesinger and Parisi 2001). The main exception to
this has been in the domain of robotics.
Robots interact in a physical world and need to couple their action to their sensory input.
Consequently, several roboticists have turned to developmentalists for inspiration as to how this
may happen. By studying infants closely and trying to construct robots that gradually combine
sensorimotor information, they provide some interesting mechanistic accounts of how visual and
motor information could be combined. While it is eminently plausible that robots could provide
useful models for developmentalists (see Mareschal et al. 2007; Schlesinger 2003), current
efforts have generally remained at a relatively abstract level (e.g. Smith and Breazeal 2007) or been
dominated by solving technical issues in sensory motor control (e.g. Metta et al. 1999), rather
than focusing on the mechanisms of development (see discussion in Weng et al. 2001). Perhaps
one of the biggest obstacles to developing developmental models of visual-motor integration is
the constant change in physical body size and motor skills of the developing child. Thus, the
motor dimension of the coupling would need to be constantly recalibrated, a point that has been
made extensively in the empirical literature (e.g. Chapter 5 by Bremner et al.; Chapter 6 by
Nardini and Cowie; Bremner et al. 2008a, 2008b; Gori et al. 2008; Nardini et al. 2010). This is
clearly an area that needs future work and the dual-map framework described above provides one
possible avenue of future explorations. Such an endeavour would have the added elegance of
proposing a single general framework for explaining pairwise multisensory integration across
many different sensory modalities.

15.8 General discussion


In this chapter, we have reviewed the importance of developing mechanistic models of development.
Ultimately, mechanisms are at the heart of any scientific explanation. Implemented computational
models (or computer simulations) provide an ideal tool for testing and exploring mechanistic
models of development. However, most models of multisensory integration are not developmental
and focus only on end-state performance. For example, the work of Ma et al. (2006),
or Ma and Pouget (2008), provides an excellent account of cue integration that bridges the neural
and behavioral levels of representations, but does not ask how this ability comes about. This
leaves unanswered the question of how the integration mechanism emerges and how it recali-
brates in response to the changing physical constraints of the growing child. Thus multisensory
integration poses particular challenges for models of development.
A second point that we emphasized was that many typical perceptual and cognitive phenomena
of developmental interest could also be understood as expressions of emerging multisensory inte-
gration. Indeed, early word learning is not just about developing a semantic understanding of
language, but it is also about combining visual and auditory information into an integrated and
coherent representation. Casting word learning in this manner suggests novel ways in which the
developing child may solve the problem of learning language.
We suggested that coupled self-organizing neural maps provide a promising avenue to explore
the emergence of multisensory integration. Each map can organize around the intrinsic structure
present in a particular sensory modality. Hebbian-style links between the sensory maps can pick
up on co-occurrence of particular representations in each of the sensory maps. Once these links
are established, information in one sensory stream can cross the modality gap to influence
processing in the other modality. In one example we described, auditory label information influ-
enced the way visual category representations function in the presence of a label as compared to
the absence of a label. This suggested a mechanistic multisensory integration account of early
word learning. While such maps have been proposed to model auditory-motor and auditory-
visual coupling, it still remains to be seen if the same account can be extended to visual-motor
coupling.
One final comment here is that most of the existing models are disembodied models that only
exist as computer simulations. While this is in general extremely useful for understanding causal
mechanisms of development, it may well be that for modelling and understanding the mecha-
nism of multisensory integration, we need to move to more realistic embodied models (Mareschal
et al. 2007a,b). An immediate example of this is to use the disembodied models as control systems
for embodied robots. Robots face many of the same problems as humans in that they move around
in a physical three-dimensional world and sense their world through multiple sensory streams.
Thus, solving the problem for robots may provide us with useful hints as to how this problem is
solved by the developing child.

Acknowledgements
The writing of this chapter was supported by UK ESRC grants RES-062–23-0819 to DM and
RES-000–22-3394 to GW.

References
Althaus, N. and Mareschal, D. (in press). Early language as cross-modal learning. In Proceedings of the 12th
Neural Computation and Psychology Workshop (ed. E. Davelaar), pp.110–123. World Scientific, Singapore.
Boden, M.A. (1988). Computer models of mind. Cambridge University Press, Cambridge.
Boysson-Bardies, B.D., and Vihman, M.M. (1991). Adaptation to language: Evidence from babbling of
infants according to target language. Language, 67, 297–319.
Bremner, A.J., Holmes, N.P. and Spence, C. (2008a). Infants lost in (peripersonal) space? Trends in
Cognitive Sciences, 12, 298–305.
Bremner, A.J., Mareschal, D., Lloyd-Fox, S., and Spence, C. (2008b). Spatial localization of touch in the first
year of life: Early influence of a visual spatial code and the development of remapping across changes in
limb position. Journal of Experimental Psychology: General, 137, 149–62.
Brett-Green, B.A., Miller, L.J., Gavin, W.J., and Davies, P.L. (2008). Multisensory integration in children:
A preliminary ERP study. Brain Research, 1242, 283–90.
Bullock, D. and Grossberg, S. (1990). Motor skill development and neural networks for position code
invariance under speed and compliance rescaling. In Sensory-motor organizations and development in
infancy and early childhood (eds. H. Block and B. Bertenthal), pp. 1–22. Kluwer Academic, Dordrecht.
Changeux, J., and Dehaene, S. (1989). Neural models of cognitive functions. Cognition, 33, 63–109.
Crisinel, A.-S., and Spence, C. (2009). Implicit association between basic tastes and pitch. Neuroscience
Letters, 464, 39–42.
de Sa, V.R., and Ballard, D.H. (1998). Category learning through multimodality sensing. Neural
Computation, 10, 1097–1117.
Drescher, G.L. (1991). Made-up minds: a constructivist approach to artificial intelligence. MIT Press,
Cambridge, MA.
Ellis, A.W., and Morrison, C.M. (1998). Real age-of-acquisition effects in lexical retrieval. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 24, 515–23.
Fulkerson, A., and Waxman, S. (2007). Words (but not tones) facilitate object categorization: evidence
from 6- and 12-month-olds. Cognition, 105, 218–28.
Gershkoff-Stowe, L., and Smith, L.B. (1997). A curvilinear trend in naming errors as a function of early
vocabulary growth. Cognitive Psychology, 34, 37–71.
Goldstone, R.L., and Barsalou, L.W. (1998). Reuniting perception and conception. Cognition, 65, 231–62.
Goldstone, R.L., Lippa, Y., and Shiffrin, R.M. (2001). Altering object representations through category
learning. Cognition, 78, 27–43.
Gori, M., Del Viva, M., Sandini, G., and Burr, D. (2008). Young children do not integrate visual and haptic form
information. Current Biology, 18, 694–98.
Haith, M.M. (1998). Who put the cog in infant cognition? Is rich interpretation too costly? Infant Behavior
and Development, 21, 167–79.
Hebb, D.O. (1949). The organization of behavior: a neuropsychological theory. Wiley, New York.
Hill, W.F. (1967). Learning: a survey of psychological interpretations. Chandler Publishing, London.
Hinton, G.E. (1989). Connectionist learning procedures. Artificial Intelligence, 40, 185–234.
Jousmäki, V., and Hari, R. (1998). Parchment-skin illusion: sound-biased touch. Current Biology, 8, R190–91.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics,
43, 59–69.
Lewandowsky, S. (1993). The rewards and hazards of computer simulations. Psychological Science, 4, 236–43.
Li, P., Farkas, I., and MacWhinney, B. (2004). Early lexical development in a self-organizing neural
network. Neural Networks, 17, 1345–62.
Ma, W.J., Beck, J.M., Latham, P.E., and Pouget, A. (2006). Bayesian inference with probabilistic population
codes. Nature Neuroscience, 9, 1432–38.
Ma, W.J., and Pouget, A. (2008). Linking neurons to behavior in multisensory perception: A computational
review. Brain Research, 1242, 4–12.
Mareschal, D. (2010). Computational perspectives on cognitive development. Wiley Interdisciplinary
Reviews, 1, 696–708.
Mareschal, D., and Thomas M.S.C. (2007). Computational modelling in developmental psychology. IEEE
Transactions on Evolutionary Computation (Special Issue on Autonomous Mental Development), 11,
137–50.
Mareschal, D., Johnson, M.H., Sirois, S., Spratling, M., Thomas, M., and Westermann, G. (2007a).
Neuroconstructivism, Vol. 1: How the brain constructs cognition. Oxford University Press, Oxford.
Mareschal, D., Sirois, S., Westermann, G., and Johnson, M.H. (2007b). Neuroconstructivism, Vol. 2:
Perspectives and prospects. Oxford University Press, Oxford.
Mayor, J., and Plunkett, K. (2010). A neurocomputational account of taxonomic responding and fast
mapping in early word learning. Psychological Review, 117, 1–31.
McGurk, H., and MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–48.
Metta, G., Sandini, G., and Konczak, J. (1999). A developmental approach to visually guided reaching in
artificial systems. Neural Networks, 12, 1413–27.
Miikkulainen, R. (1997). Dyslexic and category-specific aphasic impairments in a self-organizing feature
map model of the lexicon. Brain and Language, 59, 334–66.
Milner, D.A. and Goodale, M.A. (1995). The visual brain in action. Oxford University Press, Oxford.
Munakata, Y., and McClelland, J.L. (2003). Connectionist models of development. Developmental Science,
6, 413–29.
Nardini, M., Bedford, R., and Mareschal, D. (2010). Fusion of visual cues is not mandatory in children.
Proceedings of the National Academy of Science U.S.A. 107, 17041–46.
O’Reilly, R.C. (1996). Biologically plausible error-driven learning using local activation differences: the
generalized recirculation algorithm. Neural Computation, 8, 895–938.
O’Reilly, R.C., and Munakata, Y. (2000). Computational explorations in cognitive neuroscience:
Understanding the mind by simulating the brain. MIT Press, Cambridge, MA.
Oller, D.K., and Eilers, R.E. (1988). The role of audition in infant babbling. Child Development, 59, 441–49.
Oller, D.K., and Eilers, R.E. (1998). Interpretive and methodological difficulties in evaluating babbling drift.
Revue Parole, 7/8, 147–64.
Piaget, J. (1952). The origins of intelligence in the child. International Universities Press, New York.
Plunkett, K., Hu, J., and Cohen, L. (2008). Labels can override perceptual categories in early infancy.
Cognition, 106, 665–81.
Plunkett, K., and Sinha, C. (1992). Connectionism and developmental theory. British Journal of
Developmental Psychology, 10, 209–54.
Rizzolatti, G., and Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27,
169–92.
Roberson, D., Davies, I., and Davidoff, J. (2000). Color categories are not universal: replications and
new evidence from a stone-age culture. Journal of Experimental Psychology: General, 129, 369–98.
Rumelhart, D.E., and McClelland, J.L. (1986). Parallel distributed processing, Vol. 1: Foundations. MIT Press,
Cambridge, MA.
Rumelhart, D.E., Hinton, G.E., and Williams, R.J. (1986). Learning internal representations by error
propagation. In Parallel distributed processing: explorations in the microstructure of cognition, Vol. 1:
Foundations (eds. D.E. Rumelhart, and J.L. McClelland), pp. 318–62. MIT Press, Cambridge, MA.
Schlesinger, M. (2003). A lesson from robotics: modeling infants as autonomous agents. Adaptive Behavior,
11, 97–107.
Schlesinger, M., and Parisi, D. (2001). Multimodal control of reaching: the role of tactile feedback. IEEE
Transactions on Evolutionary Computation: Special Section on Evolutionary Computation and Cognitive
Science, 5, 122–28.
Schlesinger, M., Parisi, D., and Langer, J. (2000). Learning to reach by constraining the movement search
space. Developmental Science, 3, 67–80.
Sejnowski, T.J. (1977). Storing covariance with nonlinearly interacting neurons. Journal of Mathematical
Biology, 4, 303–312.
Shams, L., Kamitani, Y., and Shimojo, S. (2000). Illusions: what you see is what you hear. Nature, 408, 788.
Shipley, T. (1964). Auditory flutter-driving of visual flicker. Science, 145, 1328–30.
Silver, M.A., and Kastner, S. (2009). Topographic maps in human frontal and parietal cortex. Trends in
Cognitive Sciences, 13, 488–95.
Smith, L., and Breazeal, C. (2007). The dynamic life of developmental process. Developmental Science, 10, 61–68.
Stoel-Gammon, C., and Otomo, K. (1986). Babbling development of hearing-impaired and normally
hearing subjects. Journal of Speech and Hearing Disorders, 51, 33–41.
Vihman, M.M. (1993). Variable paths to early word production. Journal of Phonetics, 21, 61–82.
Waxman, S., and Markow, D. (1995). Words as invitations to form categories: evidence from 12- to
13-month-old infants. Cognitive Psychology, 29, 257–302.
Weng, J., McClelland, J., Pentland, A., et al. (2001). Autonomous mental development by robots and
animals. Science, 291, 599–600.
Werker, J.F., and Tees, R.C. (1984). Cross-language speech perception: evidence for perceptual
reorganization during the first year of life. Infant Behavior and Development, 7, 49–63.
Westermann, G. (2001). A model of perceptual change by domain integration. In Proceedings of the
23rd Annual Conference of the Cognitive Science Society (eds. J. D. Moore and K. Stenning),
pp. 1100–1105. Lawrence Erlbaum Associates, Hillsdale, NJ.
Westermann, G., and Miranda, E.R. (2004). A new model of sensorimotor coupling in the development of
speech. Brain and Language, 89, 393–400.
Xu, F., Cote, M., and Baker, A. (2005). Labeling guides object individuation in 12-month-old infants.
Psychological Science, 16, 372–77.
Zangenehpour, S., Ghazanfar, A.A., Lewkowicz, D.J., and Zatorre, R.J. (2009). Heterochrony and cross-species
intersensory matching by infant vervet monkeys. PLoS ONE, 4, e4302.
Chapter 16

The evolution of multisensory vocal communication in primates and the influence of developmental timing

Asif A. Ghazanfar

16.1 Introduction
Behaviours do not fossilize and typically neither do brains. As a result, understanding the
origins of human behaviour requires the comparative method whereby species-typical behav-
iours of other extant primates and humans are compared. It can be inferred that for any behav-
iour shared by two closely related living species, their last common ancestor must have exhibited
that same behaviour. In this manner, comparative methods can uncover the behavioural capaci-
ties of extinct common ancestors and identify behavioural homologies. However, comparative
studies must also recognize that species-typical behaviours are not only the product of phyloge-
netic processes but ontogenetic ones as well (Gottlieb 1992), and thus understanding the origins
of species-typical behaviours requires understanding the relationship between these two proc-
esses. This integrative approach can inform questions about the homology of behaviours (Deacon
1990 ; Finlay et al. 2001 ), help determine whether the homologies reflect the operation
of the same or different mechanisms (Schneirla 1949), and can point to the range of possible
mechanisms that are available for natural selection to modify (Finlay and Darlington 1995).
Taking development into account will help avoid the mistake of inferring similar underlying
mechanisms when dealing with different levels of phylogenetic and ontogenetic organization
(Schneirla 1949).
In the domain of multisensory processes, there are numerous comparative studies across
many bird and mammalian species that provide insights into the phylogeny and ontogeny of
human multisensory capacities (Lewkowicz and Lickliter 1994). Like newborn humans, the
newly hatched young of many bird species, for example, are able to detect redundant information across
sensory modalities, suggesting a behavioural capacity that is very ancient. We also know from
studies of quail that prenatal experience in one sensory modality can influence the postnatal
development of a different modality (Lickliter 1990; Lickliter and Stoumbos 1991). In this sce-
nario, it is thought that earlier developing sensory systems act as scaffolds for later developing
ones (Turkewitz and Kenny 1982). Using the cat model system, we’ve learned that areas of the
neocortex that are multisensory in the adult only gradually acquire this characteristic (Wallace
et al. 2006). Furthermore, single neurons in the cat association cortex may respond
to multiple modalities, but do not integrate those responses without considerable postnatal
sensory experience (Wallace et al. 2006). By ‘integration’ (here and elsewhere in this chapter),
I mean that the magnitude of neuronal or behavioural responses to multisensory stimuli is greater
than the sum of the responses to unisensory stimuli; the multisensory responses are supralinear.
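This criterion can be stated as a one-line check. The following is only an illustrative sketch (the function name, and the use of a single number such as a mean firing rate to stand in for 'response magnitude', are my own assumptions, not anything from the studies cited):

```python
def is_supralinear(r_multi, r_a, r_v):
    """Integration criterion as defined in this chapter: the response to
    the multisensory stimulus exceeds the sum of the two unisensory
    responses (all values are assumed response magnitudes, e.g. spikes/s)."""
    return r_multi > r_a + r_v

# A response of 25 spikes/s to face + voice, versus 10 and 8 spikes/s to
# face and voice alone, would count as integrative:
print(is_supralinear(25, 10, 8))   # True
print(is_supralinear(15, 10, 8))   # False (sublinear)
```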
Along the same lines as their non-human counterparts, human infants exhibit multisensory
behaviours early in development. With respect to the visual and auditory modalities, human
infants detect changes in synchrony and other spatiotemporal relationships between the sounds
and sights of objects including communication signals such as speech (see Chapter 9 by
Lewkowicz). These infant multisensory capacities require both pre- and post-natal experience to
help refine them (Lewkowicz and Ghazanfar 2009).
In this chapter, I review data that suggest that adult monkeys and humans (infants and adults)
share behavioural and neural homologies in the domain of multisensory vocal communication.
I will then posit that, despite certain similarities in multisensory capacity, it is possible that
neurodevelopmental processes leading to the development of these behaviours may be different.
While there are many putative developmental factors that could have a differential influence on
the organization of the neocortex across species, one prominent factor may simply be the matura-
tion rate of the brain.

16.2 Apparent homologies in primate vocal communication

16.2.1 Behaviour
All mammalian vocalizations are produced by coordinated movements of the lungs, larynx (vocal
folds), and the supralaryngeal vocal tract (Fitch and Hauser 1995; Ghazanfar and Rendall 2008).
The vocal tract consists of the column of air extending through the pharynx, mouth, and nasal cavity.
In larger-bodied mammals, the source signals (sounds generated by the lungs and larynx) travel
through the vocal tract and are filtered according to its shape, resulting in vocal tract resonances
or formants discernable in the spectra of some vocalizations (for non-human primates, see: Fitch
1997; Owren et al. 1997; Rendall et al. 1998). In humans, speech-related vocal tract motion results
in the predictable deformation of the face around the oral aperture and other parts of the face
(Jiang et al. 2002; Yehia et al. 1998, 2002). In fact, the spectral envelope of a speech signal can be
predicted by the three-dimensional motion of the face alone (Yehia et al. 1998), as can the motion
of the tongue (an articulator that is not necessarily coupled with the face) (Yehia et al. 1998; Jiang
et al. 2002). The spatiotemporal behaviour of the vocal tract articulators involved in sound pro-
duction constrains the shape and time-course of visible orofacial movement. Such speech-related
facial motion, distributed around and beyond the mouth, guides the perception of audiovisual
speech.
In non-human primate vocal production, there is a similar link between acoustic output and
facial dynamics. Different rhesus monkey vocalizations are produced with unique lip configura-
tions and mandibular positions, and the motion of such articulators influences the acoustics
of the signal (Hauser et al. 1993; Hauser and Ybarra 1994). Coo calls, like the /u/ in speech, are
produced with the lips protruded, whereas screams, like the /i/ in speech, are produced with the
lips retracted. The jaw position and lip configuration affect the formant frequencies
independent of the source frequency (Hauser et al. 1993; Hauser and Ybarra 1994). Moreover, as
in humans, the articulation of these expressions has visible consequences on facial motion beyond
the oral region. Grimaces, produced during scream vocalizations for instance, cause the skin-
folds around the eyes to increase in number. In addition to these production-related facial move-
ments, some vocalizations are associated with visual cues that are not directly related to the
articulatory movement. Threat vocalizations, for instance, are produced with intense staring,
eyebrows raised, and ears often pulled back (Partan 2002). Head position and motion (e.g. chin-up
versus chin-down versus neutral position) also vary according to vocal expression type
(Partan 2002). Chimpanzees (Pan troglodytes), though less studied in the domain of multisensory
communication, also have a link between facial expression and vocal acoustics. Bauer (1987)
analyzed video and audio tracks of chimpanzee vocalizations and found that a decline in the
fundamental frequency (F0) occurred when submissive screams transitioned into aggressive
barks. These changes in F0 were correlated with changes in visible articulators such as lip and
teeth opening. Thus, it is likely that many of the facial motion cues that humans use for speech-
reading are present in at least some apes and monkeys as well.
Given that both humans and other extant primates exhibit unique facial postures during
the production of their various vocalizations, it is perhaps not surprising that, like humans,
many primates recognize the correspondence between the visual and auditory components
of vocal signals. Macaque monkeys (Old World, Macaca mulatta), capuchins (New World,
Cebus apella) and chimpanzees (Pan troglodytes) all recognize auditory-visual correspondences
between their various vocalizations (Evans et al. 2005; Ghazanfar and Logothetis 2003; Izumi
and Kojima 2004; Parr 2004). For example, rhesus monkeys readily match the facial expressions
of ‘coo’ and ‘threat’ calls with their associated vocal components (Ghazanfar and Logothetis
2003). Perhaps more impressive, rhesus monkeys can also segregate competing voices in a chorus
of coos—much as humans might with speech in a cocktail party scenario—and match them to
the correct number of individuals seen cooing on a video screen (Jordan et al. 2005). Finally,
macaque monkeys use formants (i.e. vocal-tract resonances) as acoustic cues to assess age-related
body size differences among conspecifics (Ghazanfar et al. 2007). They do so by linking, across
modalities, the body size information embedded in the formant spacing of vocalizations (Fitch
1997) with the visual size of animals who are likely to produce such vocalizations (Ghazanfar et al.
2007).
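The acoustic basis of this formant cue can be sketched quantitatively. If the vocal tract is idealized as a uniform tube closed at one end, adjacent formants are spaced by c/2L, so the mean spacing between formants ('formant dispersion'; Fitch 1997) yields an estimate of vocal tract length L, which scales with body size. The code below is my own illustration of that relationship, not an implementation from the cited work:

```python
def vocal_tract_length(formants_hz, c=350.0):
    """Estimate vocal tract length (metres) from formant dispersion.

    Uniform-tube-closed-at-one-end approximation: adjacent formants are
    spaced c / (2 * L) apart, so L = c / (2 * mean spacing). The default
    c is an assumed speed of sound in warm air (m/s).
    """
    spacings = [b - a for a, b in zip(formants_hz, formants_hz[1:])]
    dispersion = sum(spacings) / len(spacings)
    return c / (2.0 * dispersion)

# Formants of an idealized 17.5-cm tube (c = 350 m/s) fall at 500, 1500,
# and 2500 Hz, so the estimate recovers the tube length:
print(vocal_tract_length([500.0, 1500.0, 2500.0]))  # 0.175
```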
Given the few primate species tested so far, the ubiquity of face–voice matching in this taxon is
not known. Filling in the gaps is important for the following reason: If macaques, chimps, humans,
and capuchins all share this capacity, it would suggest that a common ancestral primate also had
this capacity. However, we also know that many primates are small and arboreal (e.g. marmosets,
Callithrix jacchus) and they seem to rely less on facial communication. Would these species show
face–voice matching? Or is this capacity confined to larger primates that frequently use face-to-
face communication? If so, then this would suggest that the face–voice matching capacity shared
by some primates may (in part) be the result of convergent evolution—that is, the behaviour
evolved independently in two lineages because of common socioecological forces and not because
of common ancestry. To date, there are no studies of multisensory behaviours in small primate
species, but the evidence for extensive neuroanatomical connections between sensory areas sug-
gests that the substrate for such behaviours exists (Cappe and Barone 2005).

16.2.2 Neurophysiology
Traditionally, the linking of vision with audition in the multisensory vocal perception described
above would be attributed to the functions of association areas such as the superior temporal
sulcus in the temporal lobe or the principal and intraparietal sulci located in the frontal and pari-
etal lobes, respectively. Although these regions certainly play important roles (see below),
they are not necessary for all types of multisensory behaviours (Ettlinger and Wilson
1990), nor are they the sole regions for multisensory convergence (Driver and Noesselt 2008;
Ghazanfar and Schroeder 2006). The auditory cortex, in particular, has many potential sources of
visual inputs (Ghazanfar and Schroeder 2006), and this is borne out in the increasing number of
studies in different species (ferrets, macaque monkeys, and humans) demonstrating visual modu-
lation of auditory cortical activity (Bizley et al. 2007; Ghazanfar et al. 2005, 2008; Kayser et al.
2007, 2008; Schroeder and Foxe 2002).
Human studies of audiovisual speech show, remarkably, that the auditory cortex is sensitive to
visual speech (Besle et al. 2004; Calvert et al. 1997; Sams et al. 1991; van Wassenhove et al. 2005).
This is true for monkeys as well. Recordings from the auditory cortex of macaque monkeys reveal
that responses to the voice are influenced specifically by the presence of a dynamic face (Ghazanfar
et al. 2005, 2008). Monkey subjects viewing unisensory and multisensory versions of two different
species-typical vocalizations (‘coos’ and ‘grunts’) show both enhanced and suppressed local field
potential (LFP) and single neuron responses in the multisensory condition relative to the unisen-
sory auditory condition (Ghazanfar et al. 2005, 2008). These data are consistent with evoked
potential studies in humans (Besle et al. 2004; van Wassenhove et al. 2005) in that the congruent
combination of faces and voices led to integrative responses (significantly different from unisen-
sory responses) in the vast majority of auditory cortical sites.
The superior temporal sulcus (STS) is another prominent node for processing audiovisual
speech in humans (Calvert et al. 2000; Calvert and Campbell 2003; Wright et al. 2003), and
audiovisual integration more generally (Beauchamp et al. 2004; Noesselt et al. 2007). In a similar
vein, the STS of macaque monkeys also shows integration of faces and voices (Barraclough et al.
2005; Chandrasekaran and Ghazanfar 2009), and a general sensitivity to multisensory inputs
(Benevento et al. 1977; Bruce et al. 1981). One possibility is that the visual influence on auditory
cortex is mediated by its interactions with the STS, as there are reciprocal connections between
the STS and the auditory cortex (Barnes and Pandya 1992; Seltzer and Pandya 1994). Concurrent
recordings of LFPs and spiking activity in the auditory cortex and the STS revealed that functional
interactions, in the form of gamma band (>30 Hz) correlations, between these two regions
increase in strength during presentations of faces and voices together relative to the unisensory
conditions (Ghazanfar et al. 2008). Using neuroimaging, functional connectivity has also been
observed between STS and auditory cortex in humans performing multisensory tasks (Noesselt
et al. 2007). This suggests that humans and monkeys share a neural pathway subserving audio-
visual communication, one that includes the auditory cortex and the STS and their functional
interactions (among other regions and interactions).
The commonalities between the multisensory behaviour and its underlying neural activity in
humans and other primates (particularly, macaque monkeys) make it easy to suggest that the
processes are homologous—descended from a common ancestor. Yet, despite the fact that humans
and monkeys may exhibit similar behavioural capacities and neural activity patterns, their different
developmental and experiential trajectories mean that these similarities
may be superficial. An examination of their developmental time course suggests that we must be
more careful in how we define what counts as a homologous trait.

16.3 Does the timing of brain development influence multisensory behaviour?
Often in contemporary comparative studies, the behavioural capacities of adult non-human
primates are compared with those observed in human infants, with the guiding assumption being
that similar behaviours reflect the same underlying processes. As reviewed above in the domain of
audiovisual communication, it has been shown that adult Old World monkeys can match spe-
cies-specific faces to voices (Ghazanfar and Logothetis 2003; Jordan et al. 2005) and the implicit
assumption is that this is homologous to the ability of human infants to do so (Jordan and
Brannon 2006; Kuhl and Meltzoff 1982; Patterson and Werker 2003). One possible source for this
assumption comes from the pervasive, tenacious, and incorrect idea that ‘ontogeny recapitulates
phylogeny’. According to this scenario, the human infant goes through stages of development that
reflect all the ancestors of humans—as though the human infant brain must go through a stage
that represents a ‘primitive’ adult monkey brain. Any uniquely human behavioural and neural
capacities are ‘added on’ after that stage (in evolutionary biology, this is referred to as ‘terminal
addition’).
The fundamental problem with making claims about homologous behaviours is the possibility
that similar behavioural capacities may be mediated by different underlying processes. This alter-
native scenario is possible because brain development follows different trajectories in monkeys
relative to humans, particularly with regard to timing: Old World monkeys are neurologically
precocial relative to human infants. For example, at birth, the rhesus monkey brain is heavily
myelinated whereas the human brain is only moderately myelinated (Gibson 1991). Likewise,
whereas all sensorimotor tracts are heavily myelinated by 2 to 3 postnatal months in the rhesus
monkey, they are not fully myelinated until 8–12 months of age in humans. These facts suggest that
the postnatal myelination in the rhesus monkey brain is about three to four times faster than in
the human brain (Gibson 1991; Malkova et al. 2006). Although the rate is different, the spatio-
temporal sequence of myelination (and other indices of brain growth) along different neural
pathways is the same between monkeys and humans (Clancy et al. 2000; Kingsbury and Finlay
2001) and generally coincides with the emergence and development of species-specific motor,
socio-emotional, and cognitive behaviours (Antinucci 1989; Konner 1991). Finally, in terms of
overall brain size at birth, Old World monkeys are among the most precocial of all mammals
(Sacher and Staffeldt 1974), possessing around 65% of their adult brain size at birth compared to only
around 25% for human infants (Malkova et al. 2006; Sacher and Staffeldt 1974).
Given that monkeys and humans develop at different rates (‘heterochrony’), it is important to
know how this might influence the behaviour and neural circuitry underlying multisensory com-
munication. Oddly, while there are numerous studies on the development of multisensory proc-
esses in humans and other animals (Lewkowicz and Lickliter 1994), there is only a handful of
studies for non-human primates (Adachi et al. 2006; Batterson et al. 2008; Gunderson 1983;
Gunderson et al. 1990, Zangenehpour et al. 2009). Furthermore, there is only one neurobiological
study of multisensory integration in the developing monkey (Wallace and Stein 2001). This study
suggests that while neurons in the newborn macaque monkey may respond to more than one
modality, they are unable to integrate them—that is, they do not produce supralinear responses
to multisensory stimulation (again, relative to the sum of the unisensory responses) as they do in adult
monkeys. This suggests that the circuits require experience to fully develop, an idea supported by
deprivation experiments in developing cats (Carriere et al. 2007). Taken together, we can posit
the following: if the brains of monkeys and humans develop at different rates, then the role that
experience plays in shaping multisensory circuits may have differing degrees of influence on those
circuits. For instance, a more mature circuit may be less susceptible to the effects of experience.
A recent experiment (described in the next section) suggests that an interaction between the tim-
ing of brain development and social experience may shape the neural circuits underlying both
human and primate vocal communication, but in different ways (Zangenehpour et al. 2009).

16.3.1 Cross-species face–voice matching in infant vervet monkeys

The heterochrony of neural and behavioural development across different primate species raises
the possibility that the development of multisensory circuits may be different in monkeys relative
to humans. In particular, Turkewitz and Kenny (1982) suggested that the neural limitations
imposed by the relatively slow rate of neural development in human infants may actually be
advantageous because the limitations may provide them with greater functional plasticity during
postnatal life. This theoretical observation has received empirical support from studies showing
that infants go through a process of ‘perceptual narrowing’ in their processing of unisensory as
well as multisensory information; that is, where initially they exhibit broad sensory tuning, they
later exhibit narrower tuning. For example, 4–6-month-old human infants can match rhesus
monkey faces and voices, but 8–10-month-old infants no longer do so (Lewkowicz and Ghazanfar
2006). These findings suggest that as human infants acquire greater experience with conspecific
human faces and vocalizations, but none with heterospecific faces and vocalizations and nonna-
tive multisensory speech, their perceptual tuning (and their neural systems) narrows to match
their early experience (for a review, see Lewkowicz and Ghazanfar 2009).
If a relatively immature postnatal state of neural development leaves a developing human
infant more ‘open’ to the effects of early sensory experience then it stands to reason that the more
advanced state of neural development in monkeys might result in a different outcome. This pos-
sibility was investigated in developing infant vervet monkeys (an Old World monkey species;
Chlorocebus pygerythrus) by testing whether they can match the faces and vocalizations of another
species with which they had no prior experience (Zangenehpour et al. 2009). As in the human
infant study described above (Lewkowicz and Ghazanfar 2006), infant vervets ranging in age
from 23 to 65 weeks (∼6 to 16 months) were tested in a preference task in which they viewed pairs
of the same rhesus monkey face producing a coo call on one side and a grunt call on the other side
and heard one of the calls at the same time. Even though the vervets had no prior exposure to
rhesus monkey faces and vocalizations, they matched them. Importantly, they exhibited cross-
species matching well beyond the age of perceptual narrowing in human infants. The reason for
this lack of perceptual narrowing may lie in the precocial neurological development of this Old
World monkey species.
Why do infant vervets continue to match heterospecific faces and voices at a postnatal and
neurological age that, relative to human infants, is beyond the time when multisensory perceptual
narrowing should have occurred? One possibility is that while both young human infants and
monkeys start with a broad range of sensitivity, the monkeys may be ‘stuck’ with this broad range
because of the more precocial state of their nervous system. The other possibility is that monkeys’
precocial brains are not stuck per se but, rather, are less plastic because of their more advanced
developmental state (Kaas 1991). Thus, vervets may still be sensitive to social experience, but it
may take them longer to incorporate the effects of such experience and, consequently, to exhibit
perceptual narrowing. The latter possibility is consistent with the development of vocal behaviour
in vervets in that their ability to produce vocalizations, use them in appropriate contexts, and
respond appropriately to the vocalizations of conspecifics emerges gradually during the first four
years of life (Seyfarth and Cheney 1986). For example, 3-month-old infant vervets produce different
alarm calls according to three general categories: ‘terrestrial predator’, ‘aerial predator’, and
‘snake-like object’, but they do not distinguish between real predators and non-predators. Only
over the course of years do they restrict their alarm-calling to the small number of genuine
predators within each category. It is also consistent with the fact that in Japanese macaques
(another Old World monkey species), unisensory and multisensory representations of faces and
voices are influenced by the amount of exposure they have to conspecifics and heterospecifics
(Adachi et al. 2009; Sugita 2008) and that adults in many primate species show a behavioural
advantage for processing the faces of their own species over others (Dufour et al. 2006).
These comparative developmental data reveal that while monkeys and humans may appear to
share similarities at the behavioural and neural levels their different developmental trajectories
are likely to result in important differences. It is important to keep this in mind when making
claims about homologies at either of these levels.

16.4 Neurodevelopmental processes underlying perceptual narrowing
The kinds of narrowing effects found in the development of speech, face, and music perception,
as well as in the multisensory perception of faces and voices, raise the question of what
putative neural mechanisms might underlie perceptual narrowing effects (Lewkowicz and
Ghazanfar 2009). Naturally, it is tempting to link ‘selectionist’ or regressive theories of neural
development (Cowan et al. 1984; Low and Cheng 2006) with the seemingly regressive nature of
perceptual narrowing. These theories postulate that neural development occurs in two stages, the
first of which is the construction of neuronal networks that are initially diffuse and somewhat
global in nature. This first stage is constructed through genetic and epigenetic factors and sets up
what will ultimately be considered ‘exuberant’ connections. The second stage involves the selec-
tive elimination of some of the connections in this initial network, leading to a more modularized
network that is better adapted to mediate mature perceptual and motor skills needed in the cur-
rent socio-ecological environment. In this stage, the reshaping (or ‘pruning’) of the network
occurs through the competitive stabilization of some synapses versus others. The competition is
decided through neural activity. Neural activity can be spontaneously generated by circuits (e.g.
during foetal stages), but is typically driven by sensory stimulation in postnatal life. This
neurodevelopmental scheme fits perfectly with the phenomena related to perceptual narrowing:
The initially diffuse network mediates the broad tuning of early infant perception, and experience
subsequently sculpts the network to generate more finely tuned perceptual capacities. Indeed, a
recent review of unisensory perceptual narrowing concludes that at the neural level, narrowing is
due to the pruning of exuberant, unneeded synaptic connections (Scott et al. 2007).
Though conceptually elegant, there are many problems with the selectionist scheme (Purves et al.
1996; Quartz and Sejnowski 1997). First, the basic premise of the theory is that there are abnormally
large axonal and dendritic arbors in the developing brain that represent the extra synapses present
in a particular developmental state. Although this may be true for a few brain regions (e.g. the tran-
sient connections between the visual cortex and the spinal cord in the developing rodent brain),
it is not true for many others (e.g. the axonal arborizations of thalamocortical neurons in layer 4 of
the rodent somatosensory cortex: Agmon et al. 1993, and ferret visual cortex: Crowley and Katz
1999). The fact that extra synapses are not widespread in the immature brain throws into question
the broad relevance of selectionist theories for cognitive development (Quartz and Sejnowski 1997).
The most damning evidence against selectionist theories, however, is simply that as the brain
matures, it grows considerably in size during postnatal life (Purves et al. 1996). This growth is attrib-
utable to neurons increasing their morphological complexity through the elaboration of axonal
and dendritic processes. For example, there is an explosive rise in the number of synapses in the
perinatal rhesus monkey brain, followed by a long period during which the number of synapses
is steady (Bourgeois and Rakic 1993). In other words, there is a net gain in synapses over the course
of development. As a result, the narrowing that is observed at the behavioural level is most likely
due to the formation of new neural connections rather than only to the loss of neurons and/or their
connections through a Darwinian-like process of selective pruning.
What does all of this mean for the neural basis of perceptual narrowing? The neural development data suggest that perceptual narrowing is more likely the result of a selective elaboration of synapses, whose relevance is determined by postnatal experience, than of the selective pruning of irrelevant synapses. At best, synaptic pruning and selective synaptic elaboration may work in concert. However, given the overall growth in the size of the brain during development and the lack of widespread evidence of pruning across different brain regions, selective elaboration should be considered the major driving force underlying perceptual narrowing. What would count as evidence for this?
Here is an example: six-month-old human infants readily discriminate between unfamiliar
monkey faces, but older infants cannot (Pascalis et al. 2002). If young infants are trained by their
caregivers to individuate monkey faces (e.g. by giving each monkey face a proper name), then
when they get older, they retain the ability to discriminate monkey faces (Scott and Monesson
2009). By measuring face-specific evoked-response potentials (ERPs) from human infant brains
following trained versus untrained conditions with monkey faces, Scott and Monesson (2010)
were able to distinguish between two hypothetical neural developmental processes. Given that untrained infants discriminate monkey faces when they are younger but not when they are older, one hypothesis is that younger infants should show a face-specific ERP response to monkey faces, whereas older infants should not. In contrast, trained infants should exhibit a face-specific ERP response at all ages. These outcomes would be consistent with the neural pruning hypothesis, whereby untrained infants lose synapses related to monkey-face processing as they get older, but training (experience) can stabilize those synapses. Alternatively, younger infants could lack a face-specific ERP response to monkey
faces (though they can discriminate individuals readily at the perceptual level), but if they are
trained with monkey faces, a monkey-face-specific ERP will be apparent later in life (consistent
with their retention of the ability to discriminate monkey faces). This would suggest that the loss
of monkey-face perception during the normal course of human infancy is not due to neural pruning, but simply to the lack of experience with monkey faces that would lead to neural specialization.
Training with monkey faces would ameliorate that problem. The data are consistent with the gain
of a neural specialization (Scott and Monesson 2010). Untrained infants lack a face-specific ERP
response to monkey faces both when they are younger and when they are older. Infants trained to
individuate monkey faces exhibit a face-specific ERP response to monkey faces only when they are
older. Thus, experience with monkey faces leads to a neural specialization.

16.5 Conclusions: development and evolution


Individual development and species evolution operate on vastly different timescales. Trying to
make sense of how the two trajectories interact and influence each other is, naturally, a formidable task. Yet, it is the only reasonable approach if we are to understand the origins of behaviour.
It is known that all vertebrate species share some similar trajectories during brain development.
For example, brainstem structures finish growing before forebrain structures, and later-developing neural structures tend to be larger in size than earlier-developing ones (Finlay et al. 2001).
In sensory development, the vestibular and somatosensory systems develop before the auditory
system, which in turn develops prior to the visual system in all vertebrates (Gottlieb 1971).
It is likely that the evolution of neural circuits involved in multisensory processes in primates is
constrained by a general developmental trajectory common to all mammalian species, but not as
constrained by the timing of different developing sensory systems (e.g. while order may be
constrained, the time of onset and/or the amount of temporal overlap may not be). These differences in timing, and the way in which they determine the magnitude of the influence of sensory experience, mould brains into their species-typical forms. Thus, understanding the evolution of multisensory processes requires delving into the deep history of developmental processes. In this chapter, I reviewed
what is known about multisensory perception in non-human primates: its neural basis, its development, and the putative neural developmental processes that give rise to it. However, it seems clear that, in order to really gain insight into the evolutionary, neural, and developmental processes related to multisensory perception, we need more animal-model systems that explore the role of pre- and postnatal experience in the formation of multisensory circuits.

Acknowledgements
The authors gratefully acknowledge the scientific contributions of, and numerous discussions with,
the following people: Chand Chandrasekaran, Ipek Kulahci, Luis Lemus, David Lewkowicz, Joost
Maier, Darshana Narayanan, Stephen Shepherd, Daniel Takahashi and Hjalmar Turesson.
This work was supported by NIH R01NS054898, NSF BCS-0547760 CAREER Award and the
James S. McDonnell Scholar Award.
368 THE EVOLUTION OF MULTISENSORY VOCAL COMMUNICATION IN PRIMATES

References
Adachi, I., Kuwahata, H., Fujita, K., Tomonaga, M., and Matsuzawa, T. (2006). Japanese macaques form a
cross-modal representation of their own species in their first year of life. Primates, 47, 350–54.
Adachi, I., Kuwahata, H., Fujita, K., Tomonaga, M., and Matsuzawa, T. (2009). Plasticity of the ability to
form cross-modal representations in infant Japanese macaques. Developmental Science, 12, 446–52.
Agmon, A., Yang, L.T., O’Dowd, D.K., and Jones, E.G. (1993). Organized growth of thalamocortical axons
from the deep tier of terminations into layer IV of developing mouse barrel cortex. Journal of
Neuroscience, 13, 5365–82.
Antinucci, F. (1989). Systematic comparison of early sensorimotor development. In Cognitive structure and
development in nonhuman primates (ed. F. Antinucci), pp. 67–85. Lawrence Erlbaum Associates,
Hillsdale, New Jersey.
Barnes, C.L., and Pandya, D.N. (1992). Efferent cortical connections of multimodal cortex of the superior
temporal sulcus in the rhesus monkey. Journal of Comparative Neurology, 318, 222–44.
Barraclough, N.E., Xiao, D., Baker, C.I., Oram, M.W., and Perrett, D.I. (2005). Integration of visual and
auditory information by superior temporal sulcus neurons responsive to the sight of actions. Journal of
Cognitive Neuroscience, 17, 377–91.
Batterson, V.G., Rose, S.A., Yonas, A., Grant, K.S., and Sackett, G.P. (2008). The effect of experience on the
development of tactual-visual transfer in pigtailed macaque monkeys. Developmental Psychobiology, 50,
88–96.
Bauer, H.R. (1987). Frequency code: orofacial correlates of fundamental frequency. Phonetica, 44, 173–91.
Beauchamp, M.S., Lee, K.E., Argall, B.D., and Martin, A. (2004). Integration of auditory and visual
information about objects in superior temporal sulcus. Neuron, 41, 809–23.
Benevento, L.A., Fallon, J., Davis, B.J., and Rezak, M. (1977). Auditory-visual interactions in single cells in
the cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey.
Experimental Neurology, 57, 849–72.
Besle, J., Fort, A., Delpuech, C., and Giard, M.H. (2004). Bimodal speech: early suppressive visual effects in
human auditory cortex. European Journal of Neuroscience, 20, 2225–34.
Bizley, J.K., Nodal, F.R., Bajo, V.M., Nelken, I., and King, A.J. (2007). Physiological and anatomical
evidence for multisensory interactions in auditory cortex. Cerebral Cortex, 17, 2172–89.
Bourgeois, J.P., and Rakic, P. (1993). Changes of synaptic density in the primary visual cortex of the
macaque monkey from fetal to adult stage. Journal of Neuroscience, 13, 2801–20.
Bruce, C., Desimone, R., and Gross, C.G. (1981). Visual properties of neurons in a polysensory area in
superior temporal sulcus of the macaque. Journal of Neurophysiology, 46, 369–84.
Calvert, G.A., and Campbell, R. (2003). Reading speech from still and moving faces: the neural substrates of
visible speech. Journal of Cognitive Neuroscience, 15, 57–70.
Calvert, G.A., Bullmore, E.T., Brammer, M.J., et al. (1997). Activation of auditory cortex during silent
lipreading. Science, 276, 593–96.
Calvert, G.A., Campbell, R., and Brammer, M.J. (2000). Evidence from functional magnetic resonance
imaging of crossmodal binding in the human heteromodal cortex. Current Biology, 10, 649–57.
Cappe, C., and Barone, P. (2005). Heteromodal connections supporting multisensory integration at low
levels of cortical processing in the monkey. European Journal of Neuroscience, 22, 2886–2902.
Carriere, B.N., Royal, D.W., Perrault, T.J., et al. (2007). Visual deprivation alters the development of
cortical multisensory integration. Journal of Neurophysiology, 98, 2858–67.
Chandrasekaran, C., and Ghazanfar, A.A. (2009). Different neural frequency bands integrate faces and
voices differently in the superior temporal sulcus. Journal of Neurophysiology, 101, 773–88.
Clancy, B., Darlington, R.B., and Finlay, B.L. (2000). The course of human events: predicting the timing of
primate neural development. Developmental Science, 3, 57–66.
Cowan, W.M., Fawcett, J.W., O’Leary, D.D., and Stanfield, B.B. (1984). Regressive events in neurogenesis.
Science, 225, 1258–65.
Crowley, J.C., and Katz, L.C. (1999). Development of ocular dominance columns in the absence of retinal
input. Nature Neuroscience, 2, 1125–30.
Deacon, T.W. (1990). Rethinking mammalian brain evolution. American Zoologist, 30, 629–705.
Driver, J., and Noesselt, T. (2008). Multisensory interplay reveals crossmodal influences on ‘sensory-
specific’ brain regions, neural responses, and judgments. Neuron, 57, 11–23.
Dufour, V., Pascalis, O., and Petit, O. (2006). Face processing limitation to own species in primates: a
comparative study in brown capuchins, Tonkean macaques and humans. Behavioural Processes, 73,
107–113.
Ettlinger, G., and Wilson, W.A. (1990). Cross-modal performance: behavioural processes, phylogenetic
considerations and neural mechanisms. Behavioural Brain Research, 40, 169–92.
Evans, T.A., Howell, S., and Westergaard, G.C. (2005). Auditory-visual cross-modal perception of
communicative stimuli in tufted capuchin monkeys (Cebus apella). Journal of Experimental Psychology: Animal Behavior Processes, 31, 399–406.
Finlay, B.L., and Darlington, R.B. (1995). Linked regularities in the development and evolution of
mammalian brains. Science, 268, 1578–84.
Finlay, B.L., Darlington, R.B., and Nicastro, N. (2001). Developmental structure of brain evolution.
Behavioral and Brain Sciences, 24, 263–308.
Fitch, W.T. (1997). Vocal tract length and formant frequency dispersion correlate with body size in rhesus
macaques. Journal of the Acoustical Society of America, 102, 1213–22.
Fitch, W.T., and Hauser, M.D. (1995). Vocal production in nonhuman primates: acoustics, physiology, and functional constraints on honest advertisement. American Journal of Primatology, 37, 191–219.
Ghazanfar, A.A., and Logothetis, N.K. (2003). Facial expressions linked to monkey calls. Nature, 423, 937–38.
Ghazanfar, A.A., and Rendall, D. (2008). Evolution of human vocal production. Current Biology, 18, R457–60.
Ghazanfar, A.A., and Schroeder, C.E. (2006). Is neocortex essentially multisensory? Trends in Cognitive Sciences, 10, 278–85.
Ghazanfar, A.A., Maier, J.X., Hoffman, K.L., and Logothetis, N.K. (2005). Multisensory integration of
dynamic faces and voices in rhesus monkey auditory cortex. Journal of Neuroscience, 25, 5004–5012.
Ghazanfar, A.A., Turesson, H.K., Maier, J.X., Van Dinther, R., Patterson, R.D., and Logothetis, N.K.
(2007). Vocal tract resonances as indexical cues in rhesus monkeys. Current Biology, 17, 425–30.
Ghazanfar, A.A., Chandrasekaran, C., and Logothetis, N.K. (2008). Interactions between the superior
temporal sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys.
Journal of Neuroscience, 28, 4457–69.
Gibson, K.R. (1991). Myelination and behavioral development: a comparative perspective on questions of
neoteny, altriciality and intelligence. In Brain maturation and cognitive development: comparative and
cross-cultural perspectives (eds. K.R. Gibson and A.C. Petersen), pp. 29–63. Aldine de Gruyter, New York.
Gottlieb, G. (1971). Ontogenesis of sensory function in birds and mammals. In The biopsychology of
development (eds. E. Tobach, L.R. Aronson and E. Shaw), pp. 67–128. Academic Press, New York.
Gottlieb, G. (1992). Individual development and evolution: the genesis of novel behavior. Oxford University
Press, New York.
Gunderson, V.M. (1983). Development of cross-modal recognition in infant pigtail monkeys (Macaca nemestrina). Developmental Psychology, 19, 398–404.
Gunderson, V.M., Rose, S.A., and Grant-Webster, K.S. (1990). Cross-modal transfer in high-risk and low-
risk infant pigtailed macaque monkeys. Developmental Psychology, 26, 576–81.
Hauser, M.D., Evans, C.S., and Marler, P. (1993). The role of articulation in the production of rhesus monkey, Macaca mulatta, vocalizations. Animal Behaviour, 45, 423–33.
Hauser, M.D., and Ybarra, M.S. (1994). The role of lip configuration in monkey vocalizations—experiments
using xylocaine as a nerve block. Brain and Language, 46, 232–44.
Izumi, A. and Kojima, S. (2004). Matching vocalizations to vocalizing faces in a chimpanzee (Pan
troglodytes). Animal Cognition, 7, 179–84.
Jiang, J.T., Alwan, A., Keating, P.A., Auer, E.T., and Bernstein, L.E. (2002). On the relationship between
face movements, tongue movements, and speech acoustics. Eurasip Journal on Applied Signal
Processing, 2002, 1174–88.
Jordan, K.E. and Brannon, E.M. (2006). The multisensory representation of number in infancy. Proceedings
of the National Academy of Sciences U.S.A., 103, 3486–89.
Jordan, K.E., Brannon, E.M., Logothetis, N.K., and Ghazanfar, A.A. (2005). Monkeys match the number of
voices they hear with the number of faces they see. Current Biology, 15, 1034–38.
Kaas, J.H. (1991). Plasticity of sensory and motor maps in adult animals. Annual Review of Neuroscience, 5,
137–67.
Kayser, C., Petkov, C.I., Augath, M., and Logothetis, N.K. (2007). Functional imaging reveals visual
modulation of specific fields in auditory cortex. Journal of Neuroscience, 27, 1824–35.
Kayser, C., Petkov, C.I., and Logothetis, N.K. (2008). Visual modulation of neurons in auditory cortex.
Cerebral Cortex, 18, 1560–74.
Kingsbury, M.A., and Finlay, B.L. (2001). The cortex in multidimensional space: where do cortical areas
come from? Developmental Science, 4, 125–57.
Konner, M. (1991). Universals of behavioral development in relation to brain myelination. In Brain
maturation and cognitive development: comparative and cross-cultural perspectives (eds. K.R. Gibson and
A.C. Petersen), pp. 181–223. Aldine de Gruyter, New York.
Kuhl, P.K., and Meltzoff, A.N. (1982). The bimodal perception of speech in infancy. Science, 218, 1138–41.
Lewkowicz, D.J., and Ghazanfar, A.A. (2006). The decline of cross-species intersensory perception in
human infants. Proceedings of the National Academy of Sciences U.S.A., 103, 6771–74.
Lewkowicz, D.J., and Ghazanfar, A.A. (2009). The emergence of multisensory systems through perceptual
narrowing. Trends in Cognitive Sciences, 13, 470–78.
Lewkowicz, D.J. and Lickliter, R. (1994). The development of intersensory perception: comparative
perspectives. Lawrence Erlbaum Associates, Hillsdale, N.J.
Lickliter, R. (1990). Premature visual-stimulation accelerates intersensory functioning in bobwhite quail
neonates. Developmental Psychobiology, 23, 15–27.
Lickliter, R., and Stoumbos, J. (1991). Enhanced prenatal auditory experience facilitates species-specific
visual responsiveness in bobwhite quail chicks (Colinus virginianus). Journal of Comparative Psychology,
105, 89–94.
Low, L.K. and Cheng, H.J. (2006). Axon pruning: an essential step underlying the developmental plasticity
of neuronal connections. Philosophical Transactions of the Royal Society of London, B. Biological
Sciences, 361, 1531–44.
Malkova, L., Heuer, E., and Saunders, R.C. (2006). Longitudinal magnetic resonance imaging study of
rhesus monkey brain development. European Journal of Neuroscience, 24, 3204–3212.
Noesselt, T., Rieger, J.W., Schoenfeld, M.A., et al. (2007). Audiovisual temporal correspondence modulates human multisensory superior temporal sulcus plus primary sensory cortices. Journal of Neuroscience,
27, 11431–41.
Owren, M.J., Seyfarth, R.M., and Cheney, D.L. (1997). The acoustic features of vowel-like grunt calls in
chacma baboons (Papio cynocephalus ursinus): implications for production processes and functions.
Journal of the Acoustical Society of America, 101, 2951–63.
Parr, L.A. (2004). Perceptual biases for multimodal cues in chimpanzee (Pan troglodytes) affect recognition.
Animal Cognition, 7, 171–78.
Partan, S.R. (2002). Single and multichannel signal composition: facial expressions and vocalizations of
rhesus macaques (Macaca mulatta). Behaviour, 139, 993–1027.
Pascalis, O., de Haan, M., and Nelson, C.A. (2002). Is face processing species-specific during the first year of
life? Science, 296, 1321–23.
Patterson, M.L., and Werker, J.F. (2003). Two-month-old infants match phonetic information in lips and
voice. Developmental Science, 6, 191–96.
Purves, D., White, L.E., and Riddle, D.R. (1996). Is neural development Darwinian? Trends in Neurosciences, 19, 460–64.
Quartz, S.R., and Sejnowski, T.J. (1997). The neural basis of cognitive development: a constructivist
manifesto. Behavioral and Brain Sciences, 20, 537–56.
Rendall, D., Owren, M.J., and Rodman, P.S. (1998). The role of vocal tract filtering in identity cueing in rhesus
monkey (Macaca mulatta) vocalizations. Journal of the Acoustical Society of America, 103, 602–614.
Sacher, G.A., and Staffeldt, E.F. (1974). Relation of gestation time to brain weight for placental mammals:
implications for the theory of vertebrate growth. American Naturalist, 108, 593–615.
Sams, M., Aulanko, R., Hamalainen, M., et al. (1991). Seeing speech: visual information from lip
movements modifies activity in the human auditory cortex. Neuroscience Letters, 127, 141–45.
Schneirla, T.C. (1949). Levels in the psychological capacities of animals. In Philosophy for the future (eds.
R.W. Sellars, V.J. Mcgill and M. Farber), pp. 243–86. Macmillan, New York.
Schroeder, C.E., and Foxe, J.J. (2002). The timing and laminar profile of converging inputs to multisensory
areas of the macaque neocortex. Cognitive Brain Research, 14, 187–98.
Scott, L.S., and Monesson, A. (2009). The origin of biases in face perception. Psychological Science, 20, 676–80.
Scott, L.S., and Monesson, A. (2010). Experience-dependent neural specialization during infancy.
Neuropsychologia, 48, 1857–61.
Scott, L.S., Pascalis, O., and Nelson, C.A. (2007). A domain general theory of the development of perceptual
discrimination. Current Directions in Psychological Science, 16, 197–201.
Seltzer, B., and Pandya, D.N. (1994). Parietal, temporal, and occipital projections to cortex of the superior
temporal sulcus in the rhesus monkey: a retrograde tracer study. Journal of Comparative Neurology,
343, 445–63.
Seyfarth, R.M., and Cheney, D.L. (1986). Vocal development in vervet monkeys. Animal Behaviour, 34,
1640–58.
Sugita, Y. (2008). Face perception in monkeys reared with no exposure to faces. Proceedings of the National
Academy of Sciences U.S.A., 105, 394–98.
Turkewitz, G., and Kenny, P.A. (1982). Limitations on input as a basis for neural organization and
perceptual development: a preliminary theoretical statement. Developmental Psychobiology, 15, 357–68.
van Wassenhove, V., Grant, K.W., and Poeppel, D. (2005). Visual speech speeds up the neural processing of
auditory speech. Proceedings of the National Academy of Sciences U.S.A., 102, 1181–86.
Wallace, M.T., and Stein, B.E. (2001). Sensory and multisensory responses in the newborn monkey
superior colliculus. Journal of Neuroscience, 21, 8886–94.
Wallace, M.T., Carriere, B.N., Perrault, T.J., Vaughan, J.W., and Stein, B.E. (2006). The development of
cortical multisensory integration. Journal of Neuroscience, 26, 11844–49.
Wright, T.M., Pelphrey, K.A., Allison, T., McKeown, M.J., and McCarthy, G. (2003). Polysensory
interactions along lateral temporal regions evoked by audiovisual speech. Cerebral Cortex, 13, 1034–43.
Yehia, H., Rubin, P., and Vatikiotis-Bateson, E. (1998). Quantitative association of vocal-tract and facial
behavior. Speech Communication, 26, 23–43.
Yehia, H.C., Kuratate, T., and Vatikiotis-Bateson, E. (2002). Linking facial animation, head motion and
speech acoustics. Journal of Phonetics, 30, 555–68.
Zangenehpour, S., Ghazanfar, A.A., Lewkowicz, D.J., and Zatorre, R.J. (2009). Heterochrony and cross-
species intersensory matching by infant vervet monkeys. PLoS ONE, 4, e4302.
Author index

Alain, C. 259
Althaus, N. 342–59
Aronson, E. 278
Ayres, A.J. 287
Bahrick, L.E. 160–1, 183–206, 304
Bairstow, P.J. 280, 281
Ballard, D.H. 348
Belmont, L. 289
Berger, S.E. 197
Birch, H.G. 114, 289
Bremner, A.J. 1–26, 113–36, 273–300
Bremner, J.G. 144, 145
Brillat-Savarin, Anton 77
Burgess, N. 144
Burnham, D. 209, 212
Calabresi, M. 207–28
Cardello, A.V. 72
Carriere, B.N. 305, 309, 331, 334–5
Chan, M.M. 77
Christensen, C. 78
Collignon, O. 312
Colombo, J. 195
Colonius, H. 254, 310
Cowie, D. 137–58
Craig, A.D. 51
Crane, L. 273–300
de Sa, V.R. 348
De Volder, A.G. 312
Delaunay-El Allam, M. 46
Diderot, D. 96
Diederich, A. 254
Dodd, R. 209, 212
Driver, J. 258
Durand, K. 29–62
Fawcett, A.J. 290–1
Fister, J.K. 325–41
Fister, M.C. 325–41
Flom, R. 188, 192, 196
Gentaz, E. 98, 100, 104
Gergely, G. 285
Ghazanfar, A.A. 169–71, 173, 302, 325, 360–71
Ghose, D. 325–41
Gibson, E.J. 11–12, 89, 99, 159, 162, 165, 185, 304
Gibson, J.J. 138, 144, 159, 185
Gibson, L.C. 229–50
Gottlieb, G. 165, 166, 168, 304, 360
Graziano, M.S.A. 114, 124
Groh, J.M. 120
Hairston, W.D. 289–90
Harlow, H.F. 47
Hebb, D.O. 345–6, 350–4, 356
Hepper, P.G. 39
Hill, E.L. 273–300
Holmes, N.P. 113–36
Hötting, K. 310
Houston-Price, C. 67
Hugenschmidt, C.E. 251–70
Hulme, C. 280
James, W. 1, 11, 69, 79, 159
Jouen, F. 103
Kane-Martinelli, C. 77
Kaplan, J.N. 47
Kohonen, T. 345, 350
Krueger, J. 334–5
Laszlo, J.I. 280, 281
Laurienti, P.J. 251–70
Lavin, J. 73
Lawless, H. 73
Lee, D.N. 278
Lee, S.A. 138–9
Lefford, A. 114
Lewandowsky, S. 343
Lewkowicz, D.J. 1–26, 46, 48, 90, 159–82, 207–28, 301, 302, 325, 361
Lickliter, R. 163, 164, 183–206
Lishman, J.R. 138–9
Ma, W.J. 356
McGurk, H. 161, 213
MacKain, K. 218–19
Mareschal, D. 3–4, 342–59
Maurer, D. 229–50, 302
Mennella, J.A. 66
Meredith, M.A. 256, 257, 326, 327
Millar, S. 306
Molina, M. 103
Molyneux, W. 88, 97
Morgan, R. 115–16
Nardini, M. 137–58
Navarra, J. 207–28
Nicolson, R.I. 290–1
Nidiffer, A.R. 325–41
Oberman, L.M. 284–5
Oram, N. 70–1
Pagel, B. 125–6, 313
Peiffer, A.M. 252
Philipsen, D.H. 78–9
Piaget, J. 10, 208, 302, 325, 355
Poliakoff, E. 256–7
Pons, F. 171–2, 216
Ramachandran, V.S. 284–5
Renshaw, S. 125
Rider, E.A. 146
Rieser, J.J. 146
Rochat, P. 115–16
Röder, B. 119, 125, 301–22
Rose, S.A. 94, 114–15
Sann, C. 104
Santangelo, V. 197
Schaal, B. 29–62, 89
Schmuckler, M. 115, 141, 145
Shankar, M.U. 74–5
Shumway-Cook, A. 140
Simon, J.R. 311
Smith, L. 12
Soto-Faraco, S. 207–28
Sparks, D.L. 120
Spector, F. 229–50
Spelke, E.S. 144
Spence, C. 1–26, 63–87, 113–36, 258
Stein, B.E. 287–8, 304, 325, 326, 327, 329, 333
Stein, J. 290
Streri, A. 88–112
Stroop, J.R. 231–2, 264
Strupp, M. 251–2
Teinonen, T. 212–13
Tsang, H.Y. 145
Turkewitz, G. 11, 30, 163, 168, 304
Van Beers, R.J. 119, 129
Van der Meer, A.L. 123
Wallace, M.T. 304, 309, 325–41, 360
Wang, R.F. 144
Weikum, W.M. 210, 211, 215
Werker, J.F. 207–28
Westerman, G. 342–59
Wollacott, M.H. 140, 142
Yordanova, J. 263
Subject index

abstract amodal information hypothesis 99 spatial localization 151


active intermodal matching hypothesis 99, 100 auditory cues 64–5
affordances 30, 33, 37, 48, 49 auditory development 7
aging 17–18 auditory spatial learning 304
Alzheimer’s disease 263 auditory-motor coupling models 354–5
amniotic fluid 66, 89 autism spectrum disorder 17, 200, 273, 274, 282–7
amodal crossmodal correspondence 12 core multisensory deficits 282, 284–5
amodal perception 106, 160 diagnostic criteria 283–4
salience of 184–5 evidence for multisensory deficits 285–7
temporal synchrony 163, 164, 170, 171, 185 links with developmental coordination disorder 287–8
amodal synchrony 15
amygdala 34 babbling 354–5
animal studies balance 137–58
application to human research 337–8 development 137–40, 142
multisensory development 329 dynamic 137
multisensory integration 173, 174, 326–7 impairment 278–9
non-human primates static 137
cross-species face-voice matching 364–6 baseline sensory processing, age-related changes 260–3
multisensory integration 173–4 Bayesian decision theory 72, 81
vocalizations 361–2 Bayesian models 150–2
olfaction 47 birds 360
superior colliculus 15–16, 164 bitter taste 64, 67
visual input in multisensory development 304–5 aversion to 67
see also individual species blindness 220
ankle joint proprioception 139, 140 animal studies 304–5
anterior ectosylvian sulcus 326, 329 congenital 305–12
anterior olfactory nucleus 34 crossmodal plasticity 302–3
arbitrary crossmodal correspondence 12 late onset 312–13
arm matching task 281 spatial perception 305
Asperger’s disorder 283–4 ‘blooming buzzing confusion’ 1, 11, 69, 159
see also autism spectrum disorder bodily illusions 119, 126
attention see also body representations
bias 198–9 body changes 118
capacity 264 body representations 113–36
salience 198 canonical multisensory 118–19
selective see selective attention computational challenges 117
attention deficit hyperactivity disorder 274 developmental changes 125–6
atypical multisensory development 273–300 extrapersonal space 113
audiovisual coupling models 349–54 limb distributions 118
audiovisual speech perception 207–28 multisensory nature 114
developmental changes 213–14 neural construction 124–5
early capacity for 207–9 orienting of tactile stimuli 121–3
McGurk effect 161, 209, 212 peripersonal space 113, 114, 117–21
multilingual environments 214–17 and early infant behaviours 123–4
narrowing 171–3 postural remapping 118, 119–21
neural origins 217–19 visual spatial reliance 118, 128–30
real-time 212 visual-proprioceptive/visual-tactile
sensory deprivation 219–20 correspondence 114–17
audiovisual stimuli 143, 164 bodily illusion 126–8
audition 3, 5, 8, 32, 33, 36, 91, 142, 257, 258, 348 bottom-up modulation 256–7
blind adults 99 bounce illusion 161–2
in flavour perception 65–6 brain
links olfactory regions 34
with touch 14 structure and function 15
with vision 14, 362 timing of development 363–5
prenatal development 161 see also individual regions
breastfeeding 41–4, 66 cross-species multisensory perception


odour cues 89 at birth 171, 172
Broca’s area 303 mechanisms 169–71
Brodmann’s area 114 narrowing 169
non-human primates 173–4
canalization 166 crossmodal calibration 8–9
see also perceptual narrowing crossmodal compensation 314–15
canonical multisensory body representations 118–19 crossmodal correspondence 11, 13, 208, 301
orienting to tactile stimuli 121–3 amodal 12
capture 98 arbitrary 12
capuchins (Cebus apella) linguistic processing 207
multisensory matching 173 up-down (vertical) 115, 116
vocalizations 362 crossmodal interactions 348
cats enhancement 65, 67
anterior ectosylvian sulcus 326, 329 neonates 88–112
multisensory development 328–9 experimental evidence 97–101
role of experience 332 haptic-visual 96–105
multisensory integration 326–7, 360 shape versus texture 102–5
superior colliculus 329 touch and vision 92–6
chemical senses 33–4, 89 shape versus texture 102–5
chemoreception/chemoreceptors 5, 33 crossmodal matching 115
complexity 31–2 crossmodal plasticity 302–4
ontogenetic precocity 33 functional imaging 303
spatial cues 33 crossmodal transfer 97, 100–1, 114, 115
unitary precepts 32–3
see also olfaction dark-reared animals 304–5
chemosensation deafness 219–20
development 7 default-mode network 261–3
environmental exposure effects 35 developmental broadening 165–6
information processing 13–14 developmental changes 17–18, 198–9
modularity of 34 audiovisual speech perception 213–14
chemosensory integration 40–1 auditory system 7
chemosensory irritation 89 balance 137–40, 142
chilli peppers 64 body representations 125–6
chimpanzees (Pan troglodytes) chemosensation 7
multisensory matching 173 cognitive 131
vocalizations 361–2 flavour learning 66–8
cognitive aging see developmental changes in old age flavour perception 68–9
cognitive development 131 locomotion 140–2
cognitive map 144 multisensory see multisensory development
cognitive processing speed 263–4 postural remapping 124–5
cognitive slowing, age-related 263 spatial orienting 142–5
colour spatial recall 147
and flavour expectations 68–9, 75, 76 tactile perception 163
older adults 77–8 vestibular system 7
preferences 67–8 visual cues 68–9
colour-grapheme synaesthesia 229, 230, 231, 235, see also age-related changes; experience
240–2 developmental changes in old age 251–70
colour-letter synaesthesia 240–2 audiovisual speech perception 214
coloured hearing synaesthesia 99, 230 baseline sensory processing 260–3
pitch 239–40 flavour perception 75, 77–9
coloured number synaesthesia 99 mechanisms 263–4
common affordances 30 attentional capacity 264
computational modelling 342–4 cognitive processing speed 263–4
see also models inhibitory control 264
conditioned aversion 46 multisensory integration 251–5
connectionist models 344–6 impact 255
contour following 93 magnitude 251–4
cortical brain networks 325–41 probability 254–5
cortical inhibition 236 neural mechanisms 255–60
cortical pruning see synaptic pruning bottom-up modulation 256–7
critical developmental periods 314–16 top-down modulation 257–60
cross-species face-voice matching 364–6 see also developmental changes
developmental coordination disorder 274, 275–82 facilitation


diagnostic criteria 278 across development 196–7
links with autism spectrum disorder 287–8 intersensory 188–94
multisensory processing abnormalities 275, 278–9 unimodal 194–5
unisensory origins 280–2 feed-forward networks 345
developmental differentiation 208 ferrets 326, 327
developmental disorders 17, 273–300 fetus
unisensory vs. multisensory processing 274 flavour learning 66–8
vulnerability of multisensory integration 274–5 olfaction 37–40
see also individual disorders amniotic fluid 66, 89
developmental dyslexia 17, 273, 274, 288–92 associative learning 37–9
diagnostic criteria 288 supramodel properties 39–40
multisensory processing abnormalities 288–90 pseudo-respiratory movements 38
developmental enrichment 5–11 sensory inputs 163
benefits of multiple senses 5–10 fixation 261
challenges of multiple senses 10–11 flavour 13–14, 32
developmental integration 162–3, 208 expectations 81
developmental narrowing 166 and food colour 68–9, 75, 76
see also perceptual narrowing identity 69, 80
developmental psychology 2, 14 intensity 69, 80
developmental robotics 199–200 preferences 67
DevLex 350 flavour learning
diffusion tensor imaging 103, 234–5 later development 66–8
direct receptors 5 prenatal/perinatal 66
direction cells 148 flavour perception 63–87
Dislex 350 decline of 75, 77–9
‘distance’ receptors 131 gustation and olfaction 66–8
dynamic balance 137 prenatal/perinatal learning 67
dysgraphia 288 senses contributing to 64–6
dyslexia, developmental 17, 273, 274, 288–92 visual cues 64–5
dysphasia 288 crossmodal influence 70–3
developmental changes 68–9
ectosylvian cortex 305 expertise 73–5
egocentric coordinate system 144 flavour-interoception association 46
embodied cognition 14 foods
embodiment 14–15, 113, 343 colour preferences 67–8
see also body representations oral-somatosensory qualities 64, 65
entorhinal cortex 34 sensory attributes 67–8
environment 16–17 functional imaging 1
extrapersonal 113, 116, 126, 131 crossmodal plasticity 303
peripersonal see peripersonal environment resting brain function 261, 262
event-related potentials 307–10 synaesthesia 232–3
congenitally blind humans 309–10
evoked-response potentials 367 global emotional moment 51
expectancy effects 68–9, 72 grasping 10, 105, 124
experience 73–5, 159–82, 196–7 grid cells 144, 148
developmental role 331–3 gustation see taste
broadening 165–6
narrowing 166 hand pressure frequency 103
perceptual narrowing 168 haptic information 141
sensory 331–3 haptic perception 9, 93, 94
and synaesthesia 236–7 haptic-visual interaction 96–105
see also developmental changes head direction cells 144
experience-dependent synaptic pruning 233–6 hearing see audition; and entries
expertise see experience under auditory
exploratory procedures 93 Hebbian connections 345–6, 350–4, 356
extrapersonal environment 113, 116, 126, 131 hedonic valence 33
heterochrony 364
face recognition 2 hippocampus 34, 148
other race effect 167 place cells 144, 148
perceptual narrowing 167 hypersensitivity 274–5, 287–8
face-voice matching 361–2 hyposensitivity 274–5, 287–8
cross-species 364–6 hypothalamus 34
378 SUBJECT INDEX

illusion tasks 126–8 self-organizing 345, 347
infants somatosensory 345
babbling 354–5 marmosets (Callithrix jacchus), lack of facial
multisensory development 207–9 communication 362
olfaction 44–52 mean exemplar distance 353
animal studies 47 mechanoreceptors 138
background cues 49–52 medial superior temporal area 138
foreground cues 45–9 medial temporal lobe 144
synaesthesia 229–50 memory 185–7
word learning 349–54 olfactory 36
see also neonates metallic taste 64
information integration, middle temporal cortex 138
Bayesian models 150–2 mirror illusion task 126–8
inheritance 16–17 mirror neurons 284, 303
inhibitory control 264 modality-specific cues 160, 258
interactive index 327 models 342–4
intersensory contingency learning 45 audio-visual coupling 349–54
intersensory facilitation 188–94 auditory-motor coupling 354–5
intersensory redundancy 9–10, 12, 187–97 Bayesian 150–2
intersensory facilitation 188–94 connectionist 344–6
selective attention development 195–6 learning and development 344–6
task difficulty and expertise 196–7 multisensory integration 346–9
unimodal facilitation 194–5 race 252, 253
inverse effectiveness principle 214, 328 transparency 343
visual-motor coupling 355
kinaesthetic acuity task 280, 281 molecular genetics 336
kinaesthetic deficit 280 Movement ABC task 278
kinaesthetic information 278 multimodal classifier 348
knowledge 346 multilingual environment 210, 214–17
utility 33 multiple sensory systems
see also experience anatomy and function 6, 7
benefits 5–10
language challenges 5–10
discrimination 210, 211, 215 multisensory circuits 327–8
multilingual environment 210, 214–17 developmental
olfactory links 36–7 chronology 328–9
perception 209 interactive index 327
phonemes 215–16 inverse effectiveness principle 328
word learning 349–54 mean statistical contrast 328
see also vocal communication response properties 330–1
lateral occipital complex 102 spatial principle 328
learning 17–18, 185–7 multisensory development 164–5
auditory spatial 304 animal studies 326–7
flavour 66–8 atypical 273–300
intersensory contingency 45 critical periods 314–16
models 344–6 differentiation theory 302, 308
olfaction 37–40 infants 207–9
words 349–54 integration theory 302
see also developmental changes; multisensory mechanisms of 342–59
development role of experience 331–3
limbs, changing distribution of 118 role of visual input 304–13
locomotion 137–58 congenitally blind humans 305–12
development of 140–2 late-blind humans 312–13
loudness-brightness matching 238–9 visually deprived animals 304–5
sensory experience in 331–3
macaques (Macaca mulatta) timing of 363–5
multisensory matching 173 vocal communication 360–71
vocalizations 362 see also entries under developmental
McGurk effect 161, 209, 212, 259, 348 multisensory differentiation 302, 308
magnitude mapping 238–9 multisensory illusions 161–2
maps/mapping multisensory integration 302
cognitive 144 cross-species 169, 173, 174, 326–7
magnitude 238–9 at birth 171, 172
postural remapping 118, 119–21 mechanisms 169–71
non-human primates 173–4 odours 5, 13–14, 30
development see multisensory development amniotic fluid 66, 89
inputs 160–2 breast milk 41–4, 66
models 346–9 persistence in memory 36
narrowing see perceptual narrowing older adults see developmental changes in old age
receptive-field architecture 333–6 olfaction 29–62, 66–8
redundancy 160–1 decline of 77
subcortical and cortical brain networks 325–41 early functions 34–5
multisensory interactions 1–26 fetus 37–40
classification 4 associative learning 37–9
developmental enrichment 5–11 supramodal properties 39–40
neonates 88–91 infants and young children 44–52
multisensory speeding 9 background cues 49–52
music perception 167 foreground cues 45–9
myelination 364 linguistic links 36–7
neonates 40–4
narrowing breastfeeding 41–4, 89
developmental 166 chemosensory integration 40–1
perceptual 166–74 social contexts 41–4
audiovisual speech 171–3 neural architecture 34
cross-species 169–71, 173–4 ontogenetic precocity 33
experience effects 168 orthonasal 32, 64
face recognition 167 retronasal 32, 64
multisensory 169 see also chemoreception
music 167 olfactory bulbs 31
speech 166–7 olfactory memory 36
navigation 137–58 olfactory receptors 5, 63, 89
and spatial recall 145–9 olfactory sensory neurons 31
neonates olfactory stimuli, subliminal processing 36
cross-species multisensory perception 171 olfactory tubercle 34
crossmodal interactions 88–112 optic flow 138, 144
experimental evidence 97–101 oral irritation 64
haptic-visual 96–105 orbitofrontal cortex 34
shape versus texture 102–5 organisation of reciprocal assimilation 10
touch and vision 92–6 orienting behaviour 137–58
flavour learning 66–8 development of 142–5
haptic perception 93, 94 and reaching 144–5
manual abilities 96 visual 142–3
multisensory interactions 88–91 orthonasal olfaction 32, 64
numerosity 90, 91 other race effect 167
olfaction 40–4
breastfeeding 41–4, 89 parcellation 69
chemosensory integration 40–1 peekaboo task 144–5
social contexts 41–4 perception 185–7
perceptual abilities 96 amodal see amodal perception
peripersonal spatial representations 123–4 audiovisual speech 207–28
selective attention 184 cross-species multisensory 169–74
stimulus orientation 89 flavour 63–87
tactile perception 93–4 haptic 9, 93, 94
assessment 94–6 language 209
grasping 10, 105 phonemes 215–16
see also infants music 167
neuroconstructivism 3–4 spatial 305
neuroscience 15–16 speech see speech perception
non-human primates tactile 93–4, 163
cross-species face-voice perceptual narrowing 166–74, 302
matching 364–6 audiovisual speech 171–3, 215
multisensory integration 173–4 cross-species 169–71
vocalizations 361–2 experience effects 168
see also individual species face perception 167
multisensory 169
obesity 79–80 music perception 167
odorants 32 neurodevelopmental processes 365–7
odour-taste associations 32–3 speech perception 166–7
periamygdaloid cortex 34 saccadic reaction times 310
peripersonal environment 113, 114, 117–21 salience
canonical multisensory body representations amodal perception 184–5
118–19 attentional 198
and early infant behaviours 123–4 hierarchies 198–9
neural construction 124–5 salty taste 64
orienting to tactile stimuli 121–3 scale errors 113
postural remapping 118, 119–21 selective attention 258
pheromones 35 developmental improvement 195–6
phonemes 350 neonates 184
perception of 215–16 perception, learning and memory 185–7
phonetic discrimination 210 self-motion 137, 144, 145, 146–8
phonetic matching 209 self-organizing maps 345, 347
see also audiovisual speech perception semantic congruency 13
piriform cortex 34 sensorimotor impairment 274–5
pitch, visual associations 239–40 sensory deprivation 301–22
place cells 144, 148 and audiovisual speech perception 219–20
platform perturbation technique 139–40 crossmodal plasticity 302–4
postural remapping 118, 119–21 supramodal spatial representations 313–14
development of 124–5 visual 220
orienting to tactile stimuli 121–3 animal studies 304–5
prefunctional/predisposed stimulus-response congenital blindness 305–12
loops 35 crossmodal plasticity 302–3
prenatal see fetus late onset blindness 312–13
primates sensory experience 331–3
brain development rate 363–5 sensory input 2, 40, 41, 65, 69, 160–2
evolution of vocal communication 360–71 visual 304–13
non-human see non-human primates sensory threshold 257
process-oriented approaches 16–17 sentient self 51
projector synaesthesia 229, 236 sequence-form synaesthesia 230
proprioception 5, 9, 114, 138, 278 shape 102–5
ankle joint 139, 140 similarity of processes 101–2
defective 280 similarity of representation 101–2
prosody 215 Simon effect 311
somatosensory maps 345
race model 252, 253 somesthesis 32, 33
reaching 144–5 sound
reaction time 252, 254, 258 loudness-brightness matching 238–9
receptive fields 304–5 symbolism 242–3
architecture 333–6 sound-shape synaesthesia 242–3
spatial 333–5 sour taste 64
spatiotemporal 335–6 spatial attention 301
receptors 5, 130 chemoreception 33
chemoreceptors see chemoreception/ congenitally blind humans 305–12
chemoreceptors spatial awareness 113–36
direct 5 scale errors 113
‘distance’ 131 spatial coding 10–11
maturation 7–8 spatial correspondence 115
mechanoreceptors 138 spatial integration 12–13
olfactory 5, 63, 89 spatial orientation 143
taste 31, 64 blind people 305
visual 6, 66 spatial principle 328, 332
redundancy spatial recall
intersensory 9–10, 12, 183–206 development of 147
multisensory 160–1 and navigation 145–9
retronasal olfaction 32, 64 spatial receptive fields 333–5
rhesus monkeys spatial representations 313–14
multisensory development 329 extrapersonal 113, 116, 126, 131
vocalizations 362 peripersonal see peripersonal environment
robotics 355 supramodal 313–14
developmental robots 199–200 spatial task 148–9
robust cue integration 151 spatial updating 144
rooting behaviour 34 spatial ventriloquism 161
spatiotemporal coincidence 66 taste 5, 13–14, 63, 64, 66–8
spatiotemporal receptive fields 335–6 bitter 64, 67
speech perception decline of 77
audiovisual see audiovisual speech perception information processing 13–14
discrimination 210 metallic 64
narrowing 166–7 ontogenetic precocity 33, 35
sensitivity 209–13 preferences 67
squirrel monkeys, olfactory development 47–8 umami 35, 64
static balance 137 see also flavour; and individual tastes
stimulus onset asynchrony 254–5 taste receptors 31, 64
stimulus orientation in neonates 89 temporal integration 12–13
Stroop task 231–2, 264 neonates 90
subcortical brain networks 325–41 temporal order judgements 119, 125, 289, 306
subliminal processing 36 temporal synchrony 163, 164, 170, 171, 185, 256
superadditivity 160 development of 208
superior colliculus (SC) 1, 15, 160, 288 terminal addition 363
animal studies 15–16, 164 terminal system 31
in orientation 143 texture 13–14, 102–5
superior temporal sulcus (STS) 363 theory of mind 303
supramodal spatial representations 313–14 top-down modulation 257–60
sweetness 64 touch 5, 8, 114
visual cues 72–3 anatomical/functional development 7
swinging room procedure 138–9 neonates 92–6
developmental coordination disorder 278 see also tactile perception
synaesthesia 16, 33, 99, 229–50, 302, 308 transcranial magnetic stimulation 1, 102
colour-grapheme 229, 230, 231, 235, 240–2 grounding 343
coloured hearing 99, 230, 239–40 interference by 302
coloured numbers 99 plausibility 343–4
developmental implications 238–43 synaesthesia 236
colour-letters 240–2 trigeminal systems 31
loudness-brightness 238–9
sound symbolism 242–3 umami 35, 64
visual associations to pitch 239–40 unimodal facilitation 194–5
developmental origins 233–8 unisensory processing 274
correspondences 238 impairment 280–2
experience 237 unitary percepts 32–3
inhibition 236 utility knowledge 33
pruning 233–6
familial bias 230–1 vervet monkeys (Chlorocebus pygerythrus)
female bias 230 cross-species face-voice matching 364–6
induction of 237 multisensory matching 174
perceptual manifestations 230–1 vestibular system development 7
perceptual reality 231–3 vibrissae (whiskers) 92
behavioural evidence 231–2 visegmatic information 210
neuroimaging studies 232–3 vision 3, 5, 8, 32, 33, 36
projector 229, 236 links
sequence-form 230 with audition 14, 362
synaesthetic congruency 13 with touch 14
synaptic elaboration 366 spatial localization 151
synaptic pruning 233–6, 366 visual cortex 303
visual cues 64–5
tactile perception crossmodal influence 70–3
adults 92–3 developmental changes 68–9
contour following 93 expertise 73–5
development of 163 older adults 78–9
enclosure 93, 94 visual dominance 151
exploratory procedures 93 visual flavour 68
infants 93–4 visual input in multisensory development 304–13
assessment 94–6 congenitally blind humans 305–12
grasping 10, 105 late-blind humans 312–13
lateral motion 93 receptive fields 304–5
tactile stimuli 14–15 visually deprived animals 304–5
orienting to 120–3 visual language discrimination 210
visual orienting 142–3 face-voice matching 361–2
visual receptors 6, 66 neurophysiology 362–3
visual spatial reliance 118 see also language
developmental changes 128–30 vomeronasal system 31
visual system development 7–8
visual-motor coupling models 355 walking 279
visual-proprioceptive correspondence 114–17, 123 Wernicke's area 303
visual-tactile correspondence 114–17 Williams syndrome 292
visually guided reaching 10 wine tasting 73–4
vocal communication 360–71 word learning 349–54
auditory-visual correspondence 361–2 word-blindness 273
behaviour 361–2