The NEURAL BASES of
MULTISENSORY PROCESSES
FRONTIERS IN NEUROSCIENCE
Series Editors
Sidney A. Simon, Ph.D.
Miguel A.L. Nicolelis, M.D., Ph.D.
Published Titles
Apoptosis in Neurobiology
Yusuf A. Hannun, M.D., Professor of Biomedical Research and Chairman, Department
of Biochemistry and Molecular Biology, Medical University of South Carolina, Charleston,
South Carolina
Rose-Mary Boustany, M.D., tenured Associate Professor of Pediatrics and Neurobiology, Duke
University Medical Center, Durham, North Carolina
TRP Ion Channel Function in Sensory Transduction and Cellular Signaling Cascades
Wolfgang B. Liedtke, M.D., Ph.D., Duke University Medical Center, Durham, North Carolina
Stefan Heller, Ph.D., Stanford University School of Medicine, Stanford, California
Neuroproteomics
Oscar Alzate, Ph.D., Department of Cell and Developmental Biology,
University of North Carolina, Chapel Hill, North Carolina
Mark T. Wallace
Vanderbilt University
Nashville, Tennessee
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
Series Preface.................................................................................................................................. xiii
Introduction....................................................................................................................................... xv
Editors..............................................................................................................................................xix
Contributors.....................................................................................................................................xxi
Section I Anatomy
Chapter 2 Cortical and Thalamic Pathways for Multisensory and Sensorimotor Interplay........ 15
Céline Cappe, Eric M. Rouiller, and Pascal Barone
Chapter 3 What Can Multisensory Processing Tell Us about the Functional Organization of Auditory Cortex?..................................................................................................... 31
Jennifer K. Bizley and Andrew J. King
Chapter 12 Early Integration and Bayesian Causal Inference in Multisensory Perception......... 217
Ladan Shams
Chapter 15 The Organization and Plasticity of Multisensory Integration in the Midbrain........ 279
Thomas J. Perrault Jr., Benjamin A. Rowland, and Barry E. Stein
Chapter 22 Visual Abilities in Individuals with Profound Deafness: A Critical Review............ 423
Francesco Pavani and Davide Bottari
Chapter 37 Assessing the Role of Visual and Auditory Cues in Multisensory Perception of Flavor......................................................................................................................... 739
Massimiliano Zampini and Charles Spence
Series Preface
FRONTIERS IN NEUROSCIENCE
The Frontiers in Neuroscience Series presents the insights of experts on emerging experimental
technologies and theoretical concepts that are or will be at the vanguard of neuroscience.
The books cover new and exciting multidisciplinary areas of brain research and describe breakthroughs in fields such as insect sensory neuroscience, primate audition, and biomedical imaging.
The most recent books cover the rapidly evolving fields of multisensory processing and reward.
Each book is edited by experts and consists of chapters written by leaders in a particular field.
Books are richly illustrated and contain comprehensive bibliographies. Chapters provide substantial
background material relevant to the particular subject.
The goal is for these books to be the references neuroscientists use in order to acquaint themselves with new methodologies in brain research. We view our task as series editors to produce
outstanding products and to contribute to the field of neuroscience. We hope that, as the volumes
become available, the effort put in by us, the publisher, the book editors, and individual authors will
contribute to further development of brain research. To the extent that you learn from these books,
we will have succeeded.
Introduction
The field of multisensory research continues to grow at a dizzying rate. Although for those of us
working in the field this is extraordinarily gratifying, it is also a bit challenging to keep up with
all of the exciting new developments in such a multidisciplinary topic at such a burgeoning stage.
For those a bit peripheral to the field, but with an inherent interest in the magic of multisensory
interactions to shape our view of the world, the task is even more daunting. Our objectives for this
book are straightforward—to provide those working within the area a strong overview of the current state of the field, while at the same time providing those a bit outside of the field with a solid introduction to multisensory processes. We feel that the current volume meets these objectives, largely through a choice of topics that span the single cell to the clinic and through the expertise of our authors, each of whom has done an exceptional job explaining their research to an interdisciplinary audience.
The book is organized thematically, with the themes generally building from the more basic to
the more applied. Hence, a reader interested in the progression of ideas and approaches can start at
the beginning and see how the basic science informs the clinical and more applied sciences by reading each chapter in sequence. Alternatively, one can choose to learn more about a specific theme and
delve directly into that section. Regardless of your approach, we hope that this book will serve as
an important reference related to your interests in multisensory processes. The following narrative
provides a bit of an overview to each of the sections and the chapters contained within them.
Section I (Anatomy) focuses on the essential building blocks for any understanding of the neural
substrates of multisensory processing. In Chapter 1, Clemo and colleagues describe how neural convergence and synaptology in multisensory domains might account for the diversity of physiological response properties, and provide elegant examples of structure/function relationships. Chapter 2, from Cappe and colleagues, details the anatomical substrates supporting the growing functional evidence for multisensory interactions in classical areas of unisensory cortex, and highlights the possible thalamic contributions to these processes. In Chapter 3, Bizley and King focus on the unisensory cortical domain that has been best studied for these multisensory influences—auditory cortex. They highlight how visual inputs into the auditory cortex are organized, and detail the possible functional role(s) of these inputs.
Section II, organized around Neurophysiological Bases, provides an overview of how multisensory stimuli can dramatically change the encoding processes for sensory information. Chapter 4, by Meredith and colleagues, addresses whether bimodal neurons throughout the brain share the same integrative characteristics, and shows marked differences in these properties between subcortex and cortex. Chapter 5, from Kajikawa and colleagues, focuses on the nonhuman primate model and bridges what is known about the neural integration of auditory–visual information in monkey cortex with the evidence for changes in multisensory-mediated behavior and perception. In Chapter 6, Kayser and colleagues also focus on the monkey model, with an emphasis now on auditory cortex and the merging of classical neurophysiological analyses with neuroimaging methods used in human subjects (i.e., functional magnetic resonance imaging (fMRI)). This chapter emphasizes not only early multisensory interactions, but also the transformations that take place as one ascends the processing hierarchy as well as the distributed nature of multisensory encoding. The final four chapters in this section then examine evidence from humans. In Chapter 7, Engel and colleagues present compelling evidence for a role of coherent oscillatory activity in linking unisensory and multisensory brain regions and improving multisensory encoding processes. This is followed by a contribution from James and Stevenson (Chapter 8), which focuses on fMRI measures of multisensory integration and which proposes a new criterion based on inverse effectiveness in evaluating and
interpreting the BOLD signal. Chapter 9, by Keetels and Vroomen, reviews the psychophysical and
neuroimaging evidence associated with the perception of the temporal relationships (i.e., synchrony
and asynchrony) between multisensory cues. Finally, this section closes with a chapter from Lacey
and Sathian (Chapter 10), which reviews our current neuroimaging knowledge concerning the mental representations of objects across vision and touch.
Section III, Combinatorial Principles and Modeling, focuses on efforts to gain a better mechanistic handle on multisensory operations and their network dynamics. In Chapter 11, Sarko and colleagues focus on spatiotemporal analyses of multisensory neurons and networks as well as commonalities across both animal and human model studies. This is followed by a contribution from
Shams, who reviews the psychophysical evidence for multisensory interactions and who argues that
these processes can be well described by causal inference and Bayesian modeling approaches. In
Chapter 13, Noppeney returns to fMRI and illustrates the multiple methods of analyses of fMRI
datasets, the interpretational caveats associated with these approaches, and how the combined
use of methods can greatly strengthen the conclusions that can be drawn. The final contribution
(Chapter 14), from Diederich and Colonius, returns to modeling and describes the time-window-of-
integration (TWIN) model, which provides an excellent framework within which to interpret the
speeding of saccadic reaction times seen under multisensory conditions.
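The causal inference framework that Shams reviews can be made concrete with a short numerical sketch. The code below is purely illustrative and not drawn from any chapter in this volume; it follows the standard Gaussian formulation of causal inference for an audiovisual spatial task (e.g., the model of Körding and colleagues), in which an observer infers whether a flash and a beep share a common cause and then fuses or segregates the cues accordingly. The function name and all parameter values are hypothetical.

```python
import math

def causal_inference_estimate(x_v, x_a, sigma_v, sigma_a, sigma_p, p_common):
    """Bayesian causal-inference estimate of auditory location (illustrative).

    x_v, x_a: noisy visual and auditory measurements (deg)
    sigma_v, sigma_a: sensory noise SDs; sigma_p: SD of a zero-mean spatial prior
    p_common: prior probability that both cues arise from one source
    """
    # Likelihood of both measurements under a common cause (C = 1),
    # with the shared source location integrated out (closed form for Gaussians).
    var_sum = (sigma_v**2 * sigma_a**2 + sigma_v**2 * sigma_p**2
               + sigma_a**2 * sigma_p**2)
    like_c1 = math.exp(-0.5 * ((x_v - x_a)**2 * sigma_p**2
                               + x_v**2 * sigma_a**2
                               + x_a**2 * sigma_v**2) / var_sum) \
        / (2 * math.pi * math.sqrt(var_sum))
    # Likelihood under independent causes (C = 2): each cue has its own source.
    like_v = math.exp(-0.5 * x_v**2 / (sigma_v**2 + sigma_p**2)) \
        / math.sqrt(2 * math.pi * (sigma_v**2 + sigma_p**2))
    like_a = math.exp(-0.5 * x_a**2 / (sigma_a**2 + sigma_p**2)) \
        / math.sqrt(2 * math.pi * (sigma_a**2 + sigma_p**2))
    like_c2 = like_v * like_a
    # Posterior probability of a common cause (Bayes' rule).
    post_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))
    # Reliability-weighted fused estimate if the cause is shared ...
    w_v, w_a, w_p = 1 / sigma_v**2, 1 / sigma_a**2, 1 / sigma_p**2
    s_fused = (x_v * w_v + x_a * w_a) / (w_v + w_a + w_p)
    # ... and the auditory-only estimate if the causes are independent.
    s_alone = x_a * w_a / (w_a + w_p)
    # Model averaging: weight each estimate by its causal posterior.
    return post_c1 * s_fused + (1 - post_c1) * s_alone, post_c1
```

With a reliable visual cue nearby, the model assigns a high common-cause probability and the auditory estimate is captured toward the flash (the ventriloquist effect); with widely separated cues, the common-cause posterior collapses and the estimate reverts to a near-unisensory value.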
Section IV encompasses the area of Development and Plasticity. Chapter 15, from Perrault and
colleagues, describes the classic model for multisensory neural studies, the superior colliculus, and
highlights the developmental events leading up to the mature state. In Chapter 16, Fillbrandt and
Ohl explore temporal plasticity in multisensory networks and show changes in the dynamics of
interactions between auditory and visual cortices following prolonged exposure to fixed auditory–
visual delays. The next two contributions focus on human multisensory development. In Chapter 17,
Lewkowicz details the development of multisensory temporal processes, highlighting the increasing
sophistication in these processes as infants grow and gain experience with the world. Chapter 18, by
Burr and Gori, reviews the neurophysiological, behavioral and imaging evidence that illustrates the
surprisingly late development of human multisensory capabilities, a finding that they posit is a result
of the continual need for cross-modal recalibration during development. In Chapter 19, Vroomen
and Baart also discuss recalibration, this time in the context of language acquisition. They argue
that in the process of phonetic recalibration, the visual system instructs the auditory system to build
phonetic boundaries in the presence of ambiguous sound sources. Finally, Chapter 20 focuses on
what can be considered the far end of the developmental process—normal aging. Here, Mozolic and
colleagues review the intriguing literature suggesting enhanced multisensory processing in aging
adults, and highlight a number of possible reasons for these apparent improvements in sensory
function.
Section V, Clinical Manifestations, addresses how perception and action are affected by altered
sensory experience. In Chapter 21, Striem-Amit and colleagues focus on sensory loss, placing
particular emphasis on plasticity following blindness and on efforts to introduce low-cost sensory
substitution devices as rehabilitation tools. The functional imaging evidence they review provides
a striking example of training-induced plasticity. In Chapter 22, Pavani and Bottari likewise consider sensory loss, focusing on visual abilities in profoundly deaf individuals. One contention in
their chapter is that deafness results in enhanced speed of reactivity to visual stimuli, rather than
enhanced visual perceptual abilities. In Chapter 23, Brozzoli and colleagues use the case of visuo-
tactile interactions as an example of how multisensory brain mechanisms can be rendered plastic
both in terms of sensory as well as motor processes. This plasticity is supported by the continuous
and active monitoring of peripersonal space, including both one’s own body and the objects in its
vicinity. In Chapter 24, Aspell and colleagues address the topic of bodily self-consciousness both
in neurological patients and healthy participants, showing how the perception of one’s “self” can be
distorted by multisensory conflicts.
Section VI encompasses the topic of Attention and Spatial Representations. A contribution from
Macaluso opens this section by reviewing putative neural mechanisms for multisensory links in the
our chosen field of inquiry. We are delighted by the diversity of experimental models, methodological approaches, and conceptual frameworks that are used in the study of multisensory processes,
and that are reflected in the current volume. Indeed, in our opinion, the success of our field and
its rapid growth are attributable to this highly multidisciplinary philosophy, and bode well for the
future of multisensory science.
Micah M. Murray
Lausanne, Switzerland
Mark T. Wallace
Nashville, Tennessee
Editors
Micah M. Murray earned a double BA in psychology and English from The Johns Hopkins
University. In 2001, he received his PhD with honors from the Neuroscience Department, Albert
Einstein College of Medicine of Yeshiva University. He worked as a postdoctoral scientist in the
Neurology Clinic and Rehabilitation Department, University Hospital of Geneva, Switzerland. Since
2003 he has held a position within the Department of Clinical Neurosciences and Department of
Radiology at the University Hospital of Lausanne, Switzerland. Currently, he is an associate professor within these departments, adjunct associate professor at Vanderbilt University, as well as associate director of the EEG Brain Mapping Core of the Center for Biomedical Imaging in Lausanne, Switzerland. Dr. Murray has a contiguous record of grant support from the Swiss National Science Foundation. He has received awards for his research from the Leenaards Foundation (2005 Prize for the Promotion of Scientific Research), the faculty of Biology and Medicine at the University of Lausanne (2008 Young Investigator Prize), and from the Swiss National Science Foundation (bonus of excellence in research). His research has been widely covered by the national and international media. He currently holds editorial board positions at Brain Topography (editor-in-chief), Journal of Neuroscience (associate editor), Frontiers in Integrative Neuroscience (associate editor), Frontiers in Auditory Cognitive Neuroscience (associate editor), and the Scientific World Journal. Dr. Murray has authored more than 80 articles and book chapters. His group's research primarily focuses on multisensory interactions, object recognition, learning and plasticity, electroencephalogram-correlated functional MRI (EEG/fMRI) methodological developments, and systems/cognitive neuroscience in general. Research in his group combines psychophysics, EEG, fMRI, and transcranial magnetic stimulation in healthy and clinical populations.
Mark T. Wallace received his BS in biology from Temple University in 1985, and his PhD in
neuroscience from Temple University in 1990, where he was the recipient of the Russell Conwell
Presidential Fellowship. He did a postdoctoral fellowship with Dr. Barry Stein at the Medical
College of Virginia, where he began his research looking at the neural mechanisms of multisensory
integration. Dr. Wallace moved to the Wake Forest University School of Medicine in 1995. In 2006,
Dr. Wallace came to Vanderbilt University, and was named the director of the Vanderbilt Brain
Institute in 2008. He is professor of hearing and speech sciences, psychology, and psychiatry, and
the associate director of the Vanderbilt Silvio O. Conte Center for Basic Neuroscience Research.
He is a member of the Center for Integrative and Cognitive Neuroscience, the Center for Molecular
Neuroscience, the Vanderbilt Kennedy Center, and the Vanderbilt Vision Research Center. Dr.
Wallace has received a number of awards for both research and teaching, including the Faculty
Excellence Award of Wake Forest University and being named the Outstanding Young Investigator
in the Basic Sciences. Dr. Wallace has an established record of research funding from the National
Institutes of Health, and is the author of more than 125 research presentations and publications. He
currently serves on the editorial board of several journals including Brain Topography, Cognitive
Processes, and Frontiers in Integrative Neuroscience. His work has employed a multidisciplinary
approach to examining multisensory processing, and focuses upon the neural architecture of multisensory integration, its development, and its role in guiding human perception and performance.
Contributors
Brian L. Allman
Department of Anatomy and Neurobiology
Virginia Commonwealth University School of Medicine
Richmond, Virginia

Amir Amedi
Department of Medical Neurobiology, Institute for Medical Research Israel–Canada
Hebrew University–Hadassah Medical School
Jerusalem, Israel

Dora E. Angelaki
Department of Anatomy and Neurobiology
Washington University School of Medicine
St. Louis, Missouri

Jane E. Aspell
Laboratory of Cognitive Neuroscience
Ecole Polytechnique Fédérale de Lausanne
Lausanne, Switzerland

Martijn Baart
Department of Medical Psychology and Neuropsychology
Tilburg University
Tilburg, the Netherlands

Pascal Barone
Centre de Recherche Cerveau et Cognition (UMR 5549)
CNRS, Faculté de Médecine de Rangueil
Université Paul Sabatier Toulouse 3
Toulouse, France

Jennifer K. Bizley
Department of Physiology, Anatomy, and Genetics
University of Oxford
Oxford, United Kingdom

Olaf Blanke
Laboratory of Cognitive Neuroscience
Ecole Polytechnique Fédérale de Lausanne
Lausanne, Switzerland

Davide Bottari
Center for Mind/Brain Sciences
University of Trento
Rovereto, Italy

Claudio Brozzoli
Institut National de la Santé et de la Recherche Médicale
Bron, France

Andreja Bubic
Department of Medical Neurobiology, Institute for Medical Research Israel–Canada
Hebrew University–Hadassah Medical School
Jerusalem, Israel

Heinrich H. Bülthoff
Department of Human Perception, Cognition, and Action
Max Planck Institute for Biological Cybernetics
Tübingen, Germany

David Burr
Dipartimento di Psicologia
Università Degli Studi di Firenze
Florence, Italy

Jennifer L. Campos
Department of Psychology
Toronto Rehabilitation Institute
University of Toronto
Toronto, Ontario, Canada

Céline Cappe
Laboratory of Psychophysics
Ecole Polytechnique Fédérale de Lausanne
Lausanne, Switzerland
Barry E. Stein
Department of Neurobiology and Anatomy
Wake Forest School of Medicine
Winston-Salem, North Carolina
Section I
Anatomy
1 Structural Basis of Multisensory Processing: Convergence
H. Ruth Clemo, Leslie P. Keniston, and M. Alex Meredith
CONTENTS
1.1 Introduction...............................................................................................................................3
1.2 Multiple Sensory Projections: Sources......................................................................................3
1.2.1 Multiple Sensory Projections: Termination Patterns.....................................................6
1.2.2 Supragranular Termination of Cross-Modal Projections.............................................. 7
1.3 Do All Cross-Modal Projections Generate Multisensory Integration?.....................................9
1.4 Synaptic Architecture of Multisensory Convergence.............................................................. 10
1.5 Summary and Conclusions...................................................................................................... 11
Acknowledgments............................................................................................................................. 12
References......................................................................................................................................... 12
1.1 INTRODUCTION
For multisensory processing, the requisite, defining step is the convergence of inputs from different sensory modalities onto individual neurons. This arrangement allows postsynaptic currents evoked by different modalities access to the same membrane, to collide and integrate there on the common ground of an excitable bilayer. Naturally, one would expect a host of biophysical and architectural features to play a role in shaping those postsynaptic events as they spread across the membrane, but much more can be written about what is unknown of the structural basis for multisensory integration than of what is known. Historically, however, what has primarily been the focus of anatomical investigations of multisensory processing has been the identification of sources of inputs that converge in multisensory regions. Although a few recent studies have begun to assess the features of convergence (see below), most of what is known about the structural basis of multisensory processing lies in the sources and pathways essentially before convergence.
assessments of multisensory pathways were those that injected tracers into the STS and identified
the different cortical sources of inputs to that region. With tracer injections into the upper "polysensory" STS bank, retrogradely labeled neurons were identified in adjoining auditory areas of the
STS, superior temporal gyrus, and supratemporal plane, and in visual areas of the inferior parietal
lobule and the lateral intraparietal sulcus, with a somewhat more restricted projection from the
parahippocampal gyrus and the inferotemporal visual area, as illustrated in Figure 1.1 (Seltzer and
Pandya 1994; Saleem et al. 2000). Although inconclusive about potential somatosensory inputs to
the STS, this study did mention the presence of retrogradely labeled neurons in the inferior parietal
lobule, an area that processes both visual and somatosensory information (e.g., Seltzer and Pandya
1980).
Like the STS, the feline anterior ectosylvian sulcus (AES) is located at the intersection of the
temporal, parietal, and frontal lobes, contains multisensory neurons (e.g., Rauschecker and Korte
1993; Wallace et al. 1992; Jiang et al. 1994), and exhibits a higher-order visual area within its lower
(ventral) bank (Mucke et al. 1982; Olson and Graybiel 1987). This has led to some speculation that
these regions might be homologous. However, a fourth somatosensory area (SIV) representation
(Clemo and Stein 1983) is found anterior along the AES, whereas somatosensory neurons are predominantly found in the posterior STS (Seltzer and Pandya 1994). The AES also contains distinct
modality-specific regions (somatosensory SIV, visual AEV, and auditory FAES) with multisensory
neurons found primarily at the intersection between these different representations (Meredith 2004;
Wallace et al. 2004; Carriere et al. 2007; Meredith and Allman 2009), whereas the subdivisions of
the upper STS bank are largely characterized by multisensory neurons (e.g., Benevento et al. 1977;
Bruce et al. 1981; Hikosaka et al. 1988). Further distinctions between the STS and the AES reside in
the cortical connectivity of the latter, as depicted in Figure 1.2. Robust somatosensory inputs reach
the AES from somatosensory areas SI–SIII (Burton and Kopf 1984; Reinoso-Suarez and Roda
1985) and SV (Mori et al. 1996; Clemo and Meredith 2004); inputs to AEV arrive from the extrastriate visual area posterolateral lateral suprasylvian (PLLS), with smaller contributions from the
anterolateral lateral suprasylvian (ALLS) and the posteromedial lateral suprasylvian (PMLS) visual
areas (Olson and Graybiel 1987); auditory inputs to the FAES project from the rostral suprasylvian
sulcus (RSS), second auditory area (AII), and posterior auditory field (PAF) (Clemo et al. 2007; Lee
and Winer 2008). The laminar origin of these projections is provided in only a few of these reports.
FIGURE 1.1 Cortical afferents to monkey STS. On this lateral view of monkey brain, the entire extent of
STS is opened (dashed lines) to reveal upper and lower banks. On upper bank, multisensory regions TP0–4
are located (not depicted). Auditory inputs (black arrows) from adjoining superior temporal gyrus, planum
temporale, preferentially target anterior portions of upper bank. Visual inputs, primarily from parahippocampal gyrus (medium gray arrow) but also from inferior parietal lobule (light gray arrow), also target upper
STS bank. Somatosensory inputs were comparatively sparse, limited to posterior aspects of STS, and may
arise from part of inferior parietal lobule (light gray arrow). Note that inputs intermingle within their areas
of termination.
FIGURE 1.2 Cortical afferents to cat AES. On this lateral view of cat cortex, the AES is opened (dashed
lines) to reveal dorsal and ventral banks. The somatosensory representation SIV on the anterior dorsal bank
receives inputs (light gray arrow) from somatosensory areas SI, SII, SIII, and SV. The auditory field of the
AES (FAES) in the posterior end of the sulcus receives inputs (black arrows) primarily from the rostral
suprasylvian auditory field, and sulcal portion of the anterior auditory field as well as portions of dorsal zone
of the auditory cortex, AII, and PAF. The ectosylvian visual (AEV) area in the ventral bank receives visual
inputs (dark gray arrow) primarily from PLLS and, to a lesser extent, from adjacent ALLS and PMLS visual
areas. Note that the SIV, FAES, and AEV domains, as well as their inputs, are largely segregated from one
another.
The AES is not alone as a cortical site of convergence of inputs from representations of different
sensory modalities, as the posterior ectosylvian gyrus (an auditory–visual area; Bowman and Olson
1988), PLLS visual area (an auditory–visual area; Yaka et al. 2002; Allman and Meredith 2007),
and the rostral suprasylvian sulcus (an auditory–somatosensory area; Clemo et al. 2007) have had
their multiple sensory sources examined.
Perhaps the most functionally and anatomically studied multisensory structure is not in the cortex, but the midbrain. This six-layered region contains spatiotopic representations of visual, auditory, and somatosensory modalities within its intermediate and deep layers (for review, see Stein and Meredith 1993). Although unisensory, bimodal, and trimodal neurons are intermingled with one another in this region, the multisensory neurons predominate (63%; Wallace and Stein 1997). Despite their numbers, structure–function relationships have been determined for only a few multisensory neurons. The largest, often most readily identifiable on cross section (or via recording)
are the tectospinal and tectoreticulospinal neurons, with somata averaging 35 to 40 µm in diameter
whose dendritic arbors can extend up to 1.4 mm (Moschovakis and Karabelas 1985; Behan et al.
1988). These large multipolar neurons have a high incidence of multisensory properties, usually
as visual–auditory or visual–somatosensory bimodal neurons (Meredith and Stein 1986). Another
form of morphologically distinct superior colliculus (SC) neuron also shows multisensory properties: the nitric oxide synthase (NOS)-positive interneuron. These excitatory local circuit neurons have
been shown to receive bimodal inputs largely from the visual and auditory modalities (Fuentes-Santamaria et al. 2008). Thus, unlike most other structures identified as multisensory, the SC contains morphological classes of neurons that highly correlate with multisensory activity. Ultimately,
this could contribute to understanding how multisensory circuits are formed and their relation to
particular features of multisensory processing.
Because the SC is a multisensory structure, anatomical tracers injected into it have identified
numerous cortical and subcortical areas representing different sensory modalities that supply its
inputs. However, identification of the sources of multiple sensory inputs to this, or any, area provides little more than anatomical confirmation that projections from different sensory modalities were involved. More pertinent is the information relating to the other end of the projection, the axon terminals, whose influence is responsible for the generation of multisensory effects on the postsynaptic membrane. Despite the fact that axon terminals are at the physical point of multisensory convergence, few studies of multisensory regions outside of the SC have addressed this specific issue.
FIGURE 1.3 Sensory segregation and multisensory convergence in SC. This coronal section through cat
SC shows alternating cellular and fibrous layers (SO, stratum opticum; SGI, stratum griseum intermediale).
Terminal boutons form a discontinuous, patchy distribution across multisensory layers with somatosensory
(dark gray, from SIV) and visual (light gray, from AEV) inputs that largely occupy distinct, nonoverlapping
domains. (Redrawn from Harting, J.K. et al., J. Comp. Neurol., 324, 379–414, 1992.) A tectoreticulospinal neuron (redrawn from Behan, M. et al., J. Comp. Neurol., 270, 171–184, 1988.) is shown, to scale, repeating across
the intermediate layer where dendritic arbor virtually cannot avoid contacting multiple input domains from
different modalities. Accordingly, tectoreticulospinal neurons are known for their multisensory properties.
FIGURE 1.4 Supragranular cross-modal projections from auditory FAES (black injection site) to somatosensory SIV. Coronal sections through SIV correlate with levels shown on lateral diagram of ferret cortex;
location of AES is indicated by arrow. On each coronal section, SIV region is denoted by dashed lines roughly
perpendicular to pial surface, and location of layer IV (granular layer) is indicated by dashed line essentially
parallel to the gray-white border. Each dot is equivalent to one bouton labeled from FAES; note that a prepon-
derance of labeled axon terminals are found in the supragranular layers. (Redrawn from Dehner, L.R. et al.,
Cereb. Cortex, 14, 387–403, 2004.)
These different input patterns suggest a complex spatial relationship with the recipient neurons and
may provide a useful testing ground on which to determine the synaptic architecture underlying
multisensory processing.
With regard to cortical multisensory areas, only relatively recent studies have examined the ter-
mination patterns of multiple sensory projections (e.g., projections from auditory and visual sources
to a target area) or cross-modal projections (e.g., projections from an auditory source to a visual tar-
get area). It had been observed that tracer injections into the anterior dorsal bank of the AES, where somatosensory area SIV is located, produced retrograde labeling in the posterior aspects of the AES, where the auditory field of the AES (FAES) is found (Reinoso-Suarez and Roda 1985). This potential cross-modal projection was further examined by Dehner et al. (2004), who injected tracers into auditory
FAES and identified orthograde projection terminals in SIV (see Figure 1.4). These experiments
were repeated with the tracer systematically placed in different portions of the FAES, showing
the constancy of the projection’s preference for terminating in the upper, supragranular layers of
SIV (Dehner et al. 2004). Functionally, such a cross-modal projection between auditory and soma-
tosensory areas would be expected to generate bimodal auditory–somatosensory neurons. However,
such bimodal neurons have rarely been observed in SIV (Clemo and Stein 1983; Rauschecker and
Korte 1993; Dehner et al. 2004) and stimulation of FAES (through indwelling electrodes) failed to
elicit a single example of orthodromic activation via this cross-modal pathway (Dehner et al. 2004).
Eventually, single- and combined-modality stimulation revealed that somatosensory SIV neurons
received subthreshold influences from auditory inputs, which was described as a “new” form of
multisensory convergence that was distinct from the well-known bimodal patterns identified in the
SC and elsewhere (Dehner et al. 2004). These functional distinctions are depicted in Figure 1.5,
where hypothetical circuits that produce different multisensory effects are illustrated. Ultimately,
these experiments (Dehner et al. 2004) indicate that bimodal neurons are not the only form of mul-
tisensory neuron.
FIGURE 1.5 Different patterns of sensory convergence result in different forms of processing. In each panel,
neuron (gray) receives inputs (black) from sensory modalities “A” and/or “B.” In bimodal condition (left),
neuron receives multiple inputs from both modalities, such that it can be activated by stimulus “A” alone or
by stimulus “B” alone. Furthermore, when both “A + B” are stimulated together, inputs converge on the same
neuron and their responses integrate. In subthreshold condition (center), neuron still receives inputs from both
modalities, but inputs from modality “B” are so reduced and occur at low-priority locations that stimulation
of “B” alone fails to activate the neuron. However, when “B” is combined with “A,” activity is modulated
(facilitation or suppression). In contrast, unisensory neurons (right) receive inputs from only a single modality
“A” and stimulation of “B” has no effect alone or in combination with “A.”
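The three convergence patterns just described can be captured by a toy linear threshold neuron. This is only an illustrative sketch: the weights, threshold, and function name below are invented for demonstration and do not come from the studies cited.

```python
# Toy linear threshold neuron; weights w_a and w_b are illustrative values
# chosen to reproduce the three convergence patterns of Figure 1.5.
def response(w_a, w_b, stim_a, stim_b, threshold=1.0):
    """Return the suprathreshold drive, or 0.0 if the neuron stays silent."""
    drive = w_a * stim_a + w_b * stim_b
    return drive if drive >= threshold else 0.0

# Bimodal: strong inputs from both modalities; the neuron fires to either
# stimulus alone, and the combined response integrates both inputs.
bimodal_a  = response(1.2, 1.2, 1, 0)   # fires to 'A' alone
bimodal_ab = response(1.2, 1.2, 1, 1)   # integrated bimodal response

# Subthreshold: a weak 'B' input; silent to 'B' alone, but 'B' modulates
# (here, facilitates) the response to 'A'.
sub_b  = response(1.2, 0.4, 0, 1)       # no response to 'B' alone
sub_ab = response(1.2, 0.4, 1, 1)       # larger than the 'A'-alone response

# Unisensory: no 'B' input at all; 'B' has no effect alone or combined.
uni_b  = response(1.2, 0.0, 0, 1)
uni_ab = response(1.2, 0.0, 1, 1)       # same as 'A' alone
```

Shrinking w_b from 1.2 toward 0 moves the model neuron along the continuum discussed later in the chapter, from bimodal through subthreshold to unisensory behavior.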
of the samples (Meredith et al. 2006). These projections also showed a preference for supragranu-
lar termination, as illustrated in Figure 1.6. In another study (Clemo et al. 2008), several auditory
corticocortical projections were demonstrated to terminate in the visual PLLS area, but only those
projections from FAES were present within the entire extent of the PLLS corresponding with the
distribution of subthreshold multisensory neurons (Allman and Meredith 2007). These projections
from FAES to PLLS showed an overwhelming preference for termination in the supragranular
FIGURE 1.6 Corticocortical projections to multisensory areas preferentially terminate in supragranular lay-
ers. In “A,” all panels represent coronal sections through RSS with layer IV approximated by dashed line. For
each area injected (e.g., AI, SIV, AEV, etc.), each dot represents one labeled axon terminal (bouton). (Redrawn
from Clemo, H.R. et al., J. Comp. Neurol., 503, 110–127, 2007; Clemo, H.R. et al., Exp. Brain Res., 191, 37–47,
2008; Meredith, M.A. et al., Exp. Brain Res., 172:472–484, 2006.)
layers (see Figure 1.6). Thus, it might seem that cross-modal projections that have supragranular
terminations underlie a specific form of multisensory processing. However, in the auditory field of
the rostral suprasylvian sulcus (which is part of the rostral suprasylvian sulcal cortex; Clemo et al.
2007), projections from somatosensory area SIV have a similar supragranular distribution, but both
subthreshold and bimodal forms of multisensory neurons are present. Therefore, it is not conclusive
that the supragranular projections and subthreshold multisensory processing correlate. It is clear,
however, that cross-modal corticocortical projections are strongly characterized by supragranular
patterns of termination.
FIGURE 1.7 Patterns of sensory convergence (black; from modality “A” or “B”) onto individual neurons
(gray) result in different forms of processing (similar to Figure 1.5). Synaptic arrangement depicted in middle
panel is adjusted such that inputs from modality “B” are light (left center) or very sparse (right center), sug-
gesting a slight difference of effect of modality “B” on responses elicited by “A.” In addition, because each
of these effects results from simple yet systematic changes in synaptic arrangement, these patterns suggest
that multisensory convergence occurs over a continuum of synaptic arrangements that, on one end, produces
bimodal multisensory properties, whereas on the other, it underlies only unisensory processing.
that produce bimodal neurons to, at the other end, the complete lack of inputs from a second modal-
ity that defines unisensory neurons (see Figure 1.7).
FIGURE 1.8 (See color insert.) Confocal images of a somatosensory SIV neuron (red) contacted by boutons
that originated in auditory FAES (green). A three-dimensional rendering of a trimmed confocal stack contain-
ing a calretinin-positive SIV neuron (red; scale bar, 10 μm) that was contacted by two axons (green) labeled
from auditory area FAES. Each of the axo-dendritic points of contact are enlarged on the right (white arrows;
scale bar, 1.0 μm) to reveal the putative bouton swelling. (From Keniston, L.P. et al., Exp. Brain Res., 202,
725–731, 2010. With permission.)
projections to somatosensory area SIV (Keniston et al. 2010). First, a tracer (fluoroemerald, linked
to biotinylated dextran amine) was injected into the auditory FAES and allowed to transport to
SIV. Next, because inhibitory interneurons represent only about 20% of cortical neurons, immu-
nofluorescent tags of specific subclasses of interneurons would make them stand out against the
neuropil. Therefore, immunocytochemical techniques were used to rhodamine-label SIV interneu-
rons containing a calcium-binding protein (e.g., parvalbumin, calbindin, calretinin). Double-labeled
tissue sections were examined by a laser-scanning confocal microscope (TCS SP2 AOBS, Leica
Microsystems) and high-magnification image stacks were collected, imported into Volocity
(Improvision, Lexington, Massachusetts), and deconvolved (AutoQuant, Media Cybernetics). A
synaptic contact was defined as an axon swelling that showed no gap between it and the immuno-
positive neuron. Of the 33 immunopositive neurons identified, a total of 59 contacts were observed
with axon terminals labeled from the FAES, two of which are illustrated in Figure 1.8. Sixty-four
percent (21 of 33) of interneurons showed one or more contacts; the average was 2.81 (±1.4), with a
maximum of 5 found on one neuron. Thus, the anatomical techniques used here visualized cross-modal convergence at the neuronal level and provided some of the first insights into the synaptic architecture of multisensory connections.
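As a minimal arithmetic check on the figures quoted above: only the published totals are used, and the attribution of the 2.81 mean to the contacted (rather than all) interneurons is an inference from those totals, since per-neuron counts are not given in the text.

```python
# Reproducing the aggregate contact statistics quoted above:
# 59 labeled contacts on 21 of 33 immunopositive SIV interneurons.
n_neurons   = 33   # immunopositive SIV interneurons examined
n_contacted = 21   # interneurons receiving >= 1 labeled FAES contact
n_contacts  = 59   # total axo-dendritic contacts observed

pct_contacted = 100 * n_contacted / n_neurons   # fraction with contacts
mean_contacts = n_contacts / n_contacted        # mean among contacted neurons

print(f"{pct_contacted:.0f}% contacted")        # -> 64% contacted
print(f"mean contacts: {mean_contacts:.2f}")    # -> mean contacts: 2.81
```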
how the terminations of those inputs generate multisensory effects. Furthermore, because multisen-
sory processing is not restricted to only bimodal (or trimodal) neurons, the synaptic architecture
of multisensory convergence may be revealed to be as distinct and varied as the perceptions and
behaviors these multisensory circuits subserve.
ACKNOWLEDGMENTS
This study was supported by NIH grant NS039460.
REFERENCES
Allman, B.L., and M.A. Meredith. 2007. Multisensory processing in “unimodal” neurons: Cross-modal sub-
threshold auditory effects in cat extrastriate visual cortex. Journal of Neurophysiology 98:545–549.
Allman, B.L., R.E. Bittencourt-Navarrete, L.P. Keniston, A.E. Medina, M.Y. Wang, and M.A. Meredith. 2008.
Do cross-modal projections always result in multisensory integration? Cerebral Cortex 18:2066–2076.
Allman, B.L., L.P. Keniston, and M.A. Meredith. 2009. Not just for bimodal neurons anymore: The contribu-
tion of unimodal neurons to cortical multisensory processing. Brain Topography 21:157–167.
Behan, M., P.P. Appell, and M.J. Graper. 1988. Ultrastructural study of large efferent neurons in the supe-
rior colliculus of the cat after retrograde labeling with horseradish peroxidase. Journal of Comparative
Neurology 270:171–184.
Benevento, L.A., J.H. Fallon, B. Davis, and M. Rezak. 1977. Auditory–visual interaction in single cells in the
cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey. Experimental
Neurology 57:849–872.
Bental, E., N. Dafny, and S. Feldman. 1968. Convergence of auditory and visual stimuli on single cells in the
primary visual cortex of unanesthetized unrestrained cats. Experimental Neurology 20:341–351.
Bowman, E.M., and C.R. Olson. 1988. Visual and auditory association areas of the cat’s posterior ectosylvian
gyrus: Cortical afferents. Journal of Comparative Neurology 272:30–42.
Bruce, C., R. Desimone, and C.G. Gross. 1981. Visual properties of neurons in a polysensory area in superior
temporal sulcus of the macaque. Journal of Neurophysiology 46:369–384.
Burton, H., and E.M. Kopf. 1984. Ipsilateral cortical connections from the second and fourth somatic sensory
areas in the cat. Journal of Comparative Neurology 225:527–553.
Carriere, B.N., D.W. Royal, T.J. Perrault, S.P. Morrison, J.W. Vaughan, B.E. Stein, and M.T. Wallace. 2007.
Visual deprivation alters the development of cortical multisensory integration. Journal of Neurophysiology
98:2858–2867.
Clemo, H.R., and M.A. Meredith. 2004. Cortico-cortical relations of cat somatosensory areas SIV and SV.
Somatosensory & Motor Research 21:199–209.
Clemo, H.R., and B.E. Stein. 1983. Organization of a fourth somatosensory area of cortex in cat. Journal of
Neurophysiology 50:910–925.
Clemo, H.R., B.L. Allman, M.A. Donlan, and M.A. Meredith. 2007. Sensory and multisensory representations
within the cat rostral suprasylvian cortex. Journal of Comparative Neurology 503:110–127.
Clemo, H.R., G.K. Sharma, B.L. Allman, and M.A. Meredith. 2008. Auditory projections to extrastriate visual
cortex: Connectional basis for multisensory processing in ‘unimodal’ visual neurons. Experimental Brain
Research 191:37–47.
Dehner, L.R., L.P. Keniston, H.R. Clemo, and M.A. Meredith. 2004. Cross-modal circuitry between auditory
and somatosensory areas of the cat anterior ectosylvian sulcal cortex: A ‘new’ inhibitory form of multi-
sensory convergence. Cerebral Cortex 14:387–403.
Falchier, A., C. Clavagnier, P. Barone, and H. Kennedy. 2002. Anatomical evidence of multimodal integration
in primate striate cortex. Journal of Neuroscience 22:5749–5759.
Fishman, M.C., and P. Michael. 1973. Integration of auditory information in the cat’s visual cortex. Vision
Research 13:1415–1419.
Fuentes-Santamaria, V., J.C. Alvarado, B.E. Stein, and J.G. McHaffie. 2008. Cortex contacts both output neu-
rons and nitrergic interneurons in the superior colliculus: Direct and indirect routes for multisensory
integration. Cerebral Cortex 18:1640–1652.
Harting, J.K., and D.P. Van Lieshout. 1991. Spatial relationships of axons arising from the substantia nigra,
spinal trigeminal nucleus, and the pedunculopontine tegmental nucleus within the intermediate gray of
the cat superior colliculus. Journal of Comparative Neurology 305:543–558.
Harting, J.K., B.V. Updyke, and D.P. Van Lieshout. 1992. Corticotectal projections in the cat: Anterograde
transport studies of twenty-five cortical areas. Journal of Comparative Neurology 324:379–414.
Harting, J.K., S. Feig, and D.P. Van Lieshout. 1997. Cortical somatosensory and trigeminal inputs to the
cat superior colliculus: Light and electron microscopic analyses. Journal of Comparative Neurology
388:313–326.
Hikosaka, K., E. Iwai, H. Saito, and K. Tanaka. 1988. Polysensory properties of neurons in the anterior
bank of the caudal superior temporal sulcus of the macaque monkey. Journal of Neurophysiology
60:1615–1637.
Horn, G., and R.M. Hill. 1966. Responsiveness to sensory stimulation of units in the superior colliculus and
subjacent tectotegmental regions of the rabbit. Experimental Neurology 14:199–223.
Illing, R.-B., and A.M. Graybiel. 1986. Complementary and non-matching afferent compartments in the cat’s
superior colliculus: Innervation of the acetylcholinesterase-poor domain of the intermediate gray layer.
Neuroscience 18:373–394.
Jiang, H., F. Lepore, M. Ptito, and J.P. Guillemot. 1994. Sensory interactions in the anterior ectosylvian cortex
of cats. Experimental Brain Research 101:385–396.
Keniston, L.P., S.C. Henderson, and M.A. Meredith. 2010. Neuroanatomical identification of crossmodal audi-
tory inputs to interneurons in somatosensory cortex. Experimental Brain Research 202:725–731.
Lee, C.C., and J.A. Winer. 2008. Connections of cat auditory cortex: III. Corticocortical system. Journal of
Comparative Neurology 507:1920–1943.
Meredith, M.A. 2004. Cortico-cortical connectivity and the architecture of cross-modal circuits. In Handbook of
Multisensory Processes. C. Spence, G. Calvert, and B. Stein, eds. 343–355. Cambridge, MA: MIT Press.
Meredith, M.A., and B.L. Allman. 2009. Subthreshold multisensory processing in cat auditory cortex. Neuroreport 20:126–131.
Meredith, M.A., and B.E. Stein. 1986. Visual, auditory, and somatosensory convergence on cells in superior
colliculus results in multisensory integration. Journal of Neurophysiology 56:640–662.
Meredith, M.A., L.P. Keniston, L.R. Dehner, and H.R. Clemo. 2006. Crossmodal projections from somatosen-
sory area SIV to the auditory field of the anterior ectosylvian sulcus (FAES) in cat: Further evidence for
subthreshold forms of multisensory processing. Experimental Brain Research 172:472–484.
Monteiro, G., H.R. Clemo, and M.A. Meredith. 2003. Auditory cortical projections to the rostral suprasylvian
sulcal cortex in the cat: Implications for its sensory and multisensory organization. Neuroreport 14:
2139–2145.
Mori, A., T. Fuwa, A. Kawai et al. 1996. The ipsilateral and contralateral connections of the fifth somatosensory
area (SV) in the cat cerebral cortex. Neuroreport 7:2385–2387.
Morrell, F. 1972. Visual system’s view of acoustic space. Nature 238:44–46.
Moschovakis, A.K., and A.B. Karabelas. 1985. Observations on the somatodendritic morphology and axonal
trajectory of intracellularly HRP-labeled efferent neurons located in the deeper layers of the superior col-
liculus of the cat. Journal of Comparative Neurology 239:276–308.
Mucke, L., M. Norita, G. Benedek, and O. Creutzfeldt. 1982. Physiologic and anatomic investigation of a
visual cortical area situated in the ventral bank of the anterior ectosylvian sulcus of the cat. Experimental
Brain Research 46:1–11.
Murata, K., H. Cramer, and P. Bach-y-Rita. 1965. Neuronal convergence of noxious, acoustic, and visual stim-
uli in the visual cortex of the cat. Journal of Neurophysiology 28:1223–1239.
Olson, C.R., and A.M. Graybiel. 1987. Ectosylvian visual area of the cat: Location, retinotopic organization,
and connections. Journal of Comparative Neurology 261:277–294.
Rauschecker, J.P., and M. Korte. 1993. Auditory compensation for early blindness in cat cerebral cortex.
Journal of Neuroscience 13:4538–4548.
Reinoso-Suarez, F., and J.M. Roda. 1985. Topographical organization of the cortical afferent connections to the
cortex of the anterior ectosylvian sulcus in the cat. Experimental Brain Research 59:313–324.
Rockland, K.S., and H. Ojima. 2003. Multisensory convergence in calcarine visual areas in macaque monkey.
International Journal of Psychophysiology 50:19–26.
Saleem, K.S., W. Suzuki, K. Tanaka, and T. Hashikawa. 2000. Connections between anterior inferotem-
poral cortex and superior temporal sulcus regions in the macaque monkey. Journal of Neuroscience
20:5083–5101.
Seltzer, B., and D.N. Pandya. 1980. Converging visual and somatic sensory input to the intraparietal sulcus of
the rhesus monkey. Brain Research 192:339–351.
Seltzer, B., and D.N. Pandya. 1994. Parietal, temporal, and occipital projections to cortex of the superior
temporal sulcus in the rhesus monkey: A retrograde tracer study. Journal of Comparative Neurology
343:445–463.
Shore, S.E., Z. Vass, N.L. Wys, and R.A. Altschuler. 2000. Trigeminal ganglion innervates the auditory brain-
stem. Journal of Comparative Neurology 419:271–285.
Spinelli, D.N., A. Starr, and T.W. Barrett. 1968. Auditory specificity in unit recordings from cat’s visual cortex.
Experimental Neurology 22:75–84.
Stein, B.E., and M.A. Meredith. 1993. Merging of the Senses. Cambridge, MA: MIT Press.
Toldi, J., O. Feher, and L. Feuer. 1984. Dynamic interactions of evoked potentials in a polysensory cortex of
the cat. Neuroscience 13:945–952.
Vinkenoog, M., M.C. van den Oever, H.B. Uylings, and F.G. Wouterlood. 2005. Random or selective neuroana-
tomical connectivity. Study of the distribution of fibers over two populations of identified interneurons in
cerebral cortex. Brain Research. Brain Research Protocols 14:67–76.
Wallace, M.T., and B.E. Stein. 1997. Development of multisensory neurons and multisensory integration in cat
superior colliculus. Journal of Neuroscience 17:2429–2444.
Wallace, M.T., M.A. Meredith, and B.E. Stein. 1992. The integration of multiple sensory inputs in cat cortex.
Experimental Brain Research 91:484–488.
Wallace, M.T., R. Ramachandran, and B.E. Stein. 2004. A revised view of sensory cortical parcellation.
Proceedings of the National Academy of Sciences 101:2167–2172.
Wang, Y., S. Celebrini, Y. Trotter, and P. Barone. 2008. Visuo–auditory interactions in the primary visual cortex
of the behaving monkey: Electrophysiological evidence. BMC Neuroscience 9:79.
Yaka, R., N. Notkin, U. Yinon, and Z. Wollberg. 2002. Visual, auditory and bimodal activity in the banks of the
lateral suprasylvian sulcus in the cat. Neuroscience and Behavioral Physiology 32:103–108.
2 Cortical and Thalamic
Pathways for Multisensory
and Sensorimotor Interplay
Céline Cappe, Eric M. Rouiller, and Pascal Barone
CONTENTS
2.1 Introduction............................................................................................................................. 15
2.2 Cortical Areas in Multisensory Processes............................................................................... 15
2.2.1 Multisensory Association Cortices.............................................................................. 15
2.2.1.1 Superior Temporal Sulcus............................................................................. 16
2.2.1.2 Intraparietal Sulcus....................................................................................... 16
2.2.1.3 Frontal and Prefrontal Cortex....................................................................... 16
2.2.2 Low-Level Sensory Cortical Areas............................................................................. 17
2.2.2.1 Auditory and Visual Connections and Interactions...................................... 17
2.2.2.2 Auditory and Somatosensory Connections and Interactions........................ 19
2.2.2.3 Visual and Somatosensory Connections and Interactions............................ 19
2.2.2.4 Heteromodal Projections and Sensory Representation................................. 19
2.3 Thalamus in Multisensory Processes......................................................................................20
2.3.1 Thalamocortical and Corticothalamic Connections...................................................20
2.3.2 Role of Thalamus in Multisensory Integration............................................................ 21
2.4 Higher-Order, Lower-Order Cortical Areas and/or Thalamus?.............................................. 23
2.5 Conclusions..............................................................................................................................24
Acknowledgments.............................................................................................................................24
References.........................................................................................................................................24
2.1 INTRODUCTION
Numerous studies in both monkeys and humans have provided evidence for multisensory integration in high-level and low-level cortical areas. This chapter focuses on the anatomical pathways contributing to multisensory integration. We first describe the anatomical connections between different sensory cortical areas, treating briefly the well-known connections between associative cortical areas and then the more recently described connections targeting low-level sensory cortical areas. Next we describe the connections of the thalamus with different sensory and motor areas and their potential role in multisensory and sensorimotor integration. Finally, we discuss the several ways in which the brain may integrate information from the environment across the different senses.
features about these regions, focusing on the superior temporal sulcus (STS), the intraparietal sul-
cus, and the frontal cortex.
Graziano et al. 1994, 1999). Somatosensory responses may be mediated by connections with soma-
tosensory area S2 and parietal ventral (PV) somatosensory area (Disbrow et al. 2003) and with
the posterior parietal cortex, such as areas 5, 7a, 7b, anterior intraparietal area (AIP), and VIP (see
Kaas and Collins 2004). Visual inputs could also come from the posterior parietal region. The belt
and parabelt auditory areas project to regions rostral to the premotor cortex (Hackett et al. 1999;
Romanski et al. 1999) and may contribute to auditory activation, as well as connections from the
trimodal portion of area 7b to the premotor cortex (Graziano et al. 1999).
Anterior to the premotor cortex, the prefrontal cortex plays a key role in temporal integration and
is related to evaluative and cognitive functions (Milner et al. 1985; Fuster 2001). Much of this cortex
has long been considered to be multisensory (Bignall 1970), but some regions are characterized by a predominance of one sensory modality, such as an auditory domain in the ventral prefrontal
region (Suzuki 1985; Romanski and Goldman-Rakic 2002; Romanski 2004). This region receives
projections from auditory, visual, and multisensory cortical regions (e.g., Gaffan and Harrison 1991;
Barbas 1986; Romanski et al. 1999; Fuster et al. 2000), which are mediated through different func-
tional streams ending separately in the dorsal and ventral prefrontal regions (Barbas and Pandya
1987; Kaas and Hackett 2000; Romanski et al. 1999). This cortical input arising from different
modalities confers on the prefrontal cortex a role in cross-modal association (see Petrides and Iversen
1976; Joseph and Barone 1987; Barone and Joseph 1989; Ettlinger and Wilson 1990) as well as in
merging sensory information especially in processing conspecific auditory and visual communica-
tion stimuli (Romanski 2007; Cohen et al. 2007).
It is important to note that there is probably a tendency for a decrease in the density of these auditory–visual interconnections when going from rodents to carnivores to primates. This probably means a higher incidence of cross-modal responses in unisensory areas of rodents (Wallace et al. 2004),
whereas such responses are not present in the primary visual or auditory cortex of the monkey
(Lakatos et al. 2007; Kayser et al. 2008; Wang et al. 2008).
On the behavioral side, experiments conducted in animals have in most cases examined multisensory integration of spatial cues, for instance, the correspondence between auditory space and visual space. These experiments were mainly conducted in cats (Stein et al. 1989; Stein and
Meredith 1993; Gingras et al. 2009). For example, Stein and collaborators (1989) trained cats to
move toward visual or auditory targets with weak salience, resulting in poor performance that did
not exceed 25% on average. When the same stimuli were presented in spatial and temporal con-
gruence, the percentage of correct detections increased to nearly 100%. In monkeys, only a few
experiments have been conducted on behavioral facilitation induced by multimodal stimulation
(Frens and Van Opstal 1998; Bell et al. 2005). In line with human studies, simultaneous presenta-
tion in monkeys of a sound during a visually guided saccade induced a reduction of about 10% to
15% of saccade latency depending on the visual stimulus contrast level (Wang et al. 2008). Recently,
we have shown behavioral evidence for multisensory facilitation between vision and hearing in
macaque monkeys (Cappe et al. 2010). Monkeys were trained to perform a simple detection task
to stimuli, which were auditory (noise), visual (flash), or auditory–visual (noise and flash) at dif-
ferent intensities. By varying the intensity of individual auditory and visual stimuli, we observed
that, when the stimuli are of weak saliency, the multisensory condition had a significant facilitatory
effect on reaction times, which disappeared at higher intensities (Cappe et al. 2010). We applied the “race model” (Raab 1962) to the behavioral data; this model supposes that the faster unisensory channel is responsible for the shortening of reaction time (“the faster the winner”), corresponding to a separate-activation model (Miller 1982). It turns out that the multisensory benefit
at low intensity derives from a coactivation mechanism (Miller 1982) that implies a convergence
of hearing and vision to produce multisensory interactions and a reduction in reaction time. The
anatomical studies previously described suggest that such a convergence may take place at the lower
levels of cortical sensory processing.
In humans, numerous behavioral studies, using a large panel of different paradigms and various
types of stimuli, showed the benefits of auditory–visual combination stimuli compared to unisen-
sory stimuli (see Calvert et al. 2004 for a review; Romei et al. 2007; Cappe et al. 2009b as recent
examples).
From a functional point of view, many studies have shown multisensory interactions early in
time and in different sensory areas with neuroimaging and electrophysiological methods. Auditory–
visual interactions have been revealed in the auditory cortex or visual cortex using electrophysi-
ological or neuroimaging methods in cats and monkeys (Ghazanfar et al. 2005; Bizley et al. 2007;
Bizley and King 2008; Cappe et al. 2007; Kayser et al. 2007, 2008; Lakatos et al. 2007; Wang et al.
2008). More specifically, electrophysiological studies in monkeys, revealing multisensory interactions in primary sensory areas such as V1 or A1, showed that cross-modal stimuli (i.e., auditory or visual stimuli, respectively) exert a modulatory rather than a driving influence on the “sensory-specific” response, acting on the oscillatory activity (Lakatos et al. 2007; Kayser et al. 2008) or on the latency of the neuronal responses (Wang et al. 2008). These mechanisms can enhance the speed of sensory processing and induce a reduction of reaction times (RTs) during multisensory stimulation.
Neurons recorded in the primary visual cortex showed a significant reduction in visual response
latencies, specifically in suboptimal conditions (Wang et al. 2008). It is important to mention that, in
the primary sensory areas of the primate, authors have reported the absence of nonspecific sensory
responses at the spiking level (Wang et al. 2008; Lakatos et al. 2007; Kayser et al. 2008). These
kinds of interactions between hearing and vision were also reported in humans using neuroimaging
techniques (Giard and Peronnet 1999; Molholm et al. 2002; Lovelace et al. 2003; Laurienti et al.
2004; Martuzzi et al. 2007).
Cortical and Thalamic Pathways for Multisensory and Sensorimotor Interplay 19
representations in areas 1/3b. Similarly, auditory and multimodal projections to area V1 are promi-
nent toward the representation of the peripheral visual field (Falchier et al. 2002, 2010; Hall and
Lomber 2008), and only scattered neurons in the auditory cortex send a projection to foveal V1. The
fact that heteromodal connections are coupling specific sensory representations across modalities
probably reflects an adaptive process for behavioral specialization. This is in agreement with human
and monkey data showing that the neuronal network involved in multisensory integration, as well
as its expression at the level of the neuronal activity, is highly dependent on the perceptual task in
which the subject is engaged. In humans, the detection or discrimination of bimodal objects, as
well as the perceptual expertise of subjects, differentially affect both the temporal aspects and the
cortical areas at which multisensory interactions occur (Giard and Peronnet 1999; Fort et al. 2002).
Similarly, we have shown that the visuo–auditory interactions observed at the level of V1 neurons
are observed only in behavioral situations during which the monkey has to interact with the stimuli
(Wang et al. 2008).
Such an influence of the perceptual context on the neuronal expression of multisensory interac-
tion is also present when analyzing the phenomena of cross-modal compensation after sensory
deprivation in humans. In blind subjects (Sadato et al. 1996), the efficacy of somatosensory stimulation in activating the visual cortex is maximal during an active discrimination task
(Braille reading). This suggests that the mechanisms of multisensory interaction at early stages of sensory processing and the cross-modal compensatory mechanisms are probably mediated through common neuronal pathways involving the heteromodal connections described previously.
showed that somatosensory inputs may reach the auditory cortex (CM and CL) through connections
coming from the medial part of the medial geniculate nucleus (MGm) or the multisensory nuclei
[posterior, suprageniculate, limitans, and medial pulvinar (PuM)]. All these thalamocortical projec-
tions are consistent with the presence of thalamic territories possibly integrating different sensory
modalities with motor attributes.
We calculated the degree of overlap between thalamocortical and CT connections in the thalamus
to determine the projections to areas of the same modality, as previously described (Tanné-Gariépy
et al. 2002; Morel et al. 2005; Cappe et al. 2009c). The degree of overlap may range between 0%
when two thalamic territories projecting to two distinct cortical areas are spatially completely seg-
regated and 100% when the two thalamic territories fully overlap (considering a spatial resolution of
0.5 mm, further details in Cappe et al. 2009c). Thalamic nuclei with spatially intermixed thalamo-
cortical cells projecting to auditory or premotor cortices were located mainly in the PuM, VA, and
CL nuclei. The overlap between the projections to the auditory and parietal cortical areas concerned
different thalamic nuclei such as PuM, CL, and to a lesser extent, LP and PuL. The projections to
the premotor and posterior parietal cortex overlapped primarily in PuM, LP, MD, and also in VA,
VLpd, and CL. Quantitatively, we found that projections from the thalamus to the auditory and
motor cortical areas overlapped to an extent ranging from 4% to 12% through the rostral thalamus
and increased up to 30% in the caudal part of the thalamus. In PuM, the degree of overlap between
thalamocortical projections to auditory and premotor cortex ranged from 14% to 20%; PuM is thus
the thalamic nucleus in which the maximum overlap between thalamocortical projections was found.
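As a rough illustration, the overlap metric described above can be sketched in a few lines of Python. The voxel binning, data layout, and function names below are illustrative assumptions, not the analysis code of Cappe et al. (2009c); only the 0.5 mm resolution and the 0%–100% range are taken from the text.

```python
# Illustrative sketch (not the original analysis code): degree of overlap
# between two thalamic territories, each given as 3-D coordinates (in mm)
# of retrogradely labeled cells, discretized at 0.5 mm resolution.

RESOLUTION_MM = 0.5  # spatial resolution stated in the text


def to_voxels(points, res=RESOLUTION_MM):
    """Bin (x, y, z) coordinates into a set of voxel indices."""
    return {tuple(int(c // res) for c in p) for p in points}


def overlap_percent(points_a, points_b, res=RESOLUTION_MM):
    """0% = fully segregated territories, 100% = full overlap
    (shared voxels as a percentage of all occupied voxels)."""
    va, vb = to_voxels(points_a, res), to_voxels(points_b, res)
    occupied = va | vb
    return 100.0 * len(va & vb) / len(occupied) if occupied else 0.0
```

Whether the published measure normalizes by the union of occupied voxels, as above, or by the smaller territory is not specified here; the union-based (Jaccard-style) normalization is one plausible reading.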
Aside from the thalamocortical connections, CT connections were also investigated in the same
study, concerning, in particular, the parietal areas PE and PEa injected with a tracer with antero-
grade properties (biotinylated dextran amine; Cappe et al. 2007). Indeed, areas PE and PEa send
CT projections to the thalamic nuclei PuM, LP, and to a lesser extent, VPL, CM, CL, and MD (PEa
only for MD). These thalamic nuclei contained both small and giant CT endings. The existence
of these two types of CT endings reflects the possibility for CT connections to represent
either feedback or feedforward projections (for review, see Rouiller and Welker 2000; Sherman and
Guillery 2002, 2005; Sherman 2007). In contrast to the feedback CT projection originating from
cortical layer VI, the feedforward CT projection originates from layer V and terminates in the thala-
mus in the form of giant endings, which can ensure highly secure and rapid synaptic transmission
(Rouiller and Welker 2000). Considering the thalamocortical and CT projections, some thalamic nuclei (PuM,
LP, VPL, CM, CL, and MD) could play a role in the integration of different sensory information
with or without motor attributes (Cappe et al. 2007, 2009c). Moreover, parietal areas PE and PEa
may send, via the giant endings, feedforward CT projection and transthalamic projections to remote
cortical areas in the parietal, temporal, and frontal lobes contributing to polysensory and senso-
rimotor integration (Cappe et al. 2007, 2009c).
As discussed in Section 2.3.1, the medial part of the pulvinar nucleus is the main candidate (although other thalamic
nuclei such as LP, VPL, MD, or CL may also play a role) to represent an alternative to corticocorti-
cal loops by which information can be transferred between cortical areas belonging to different sen-
sory and sensorimotor modalities (see also Shipp 2003). From a functional point of view, neurons in
PuM respond to visual stimuli (Gattass et al. 1979) and auditory stimuli (Yirmiya and Hocherman
1987), which is consistent with our hypothesis.
Another point is that, as our injections in the different sensory and motor areas included corti-
cal layer I (Cappe et al. 2009c), it is likely that some of these projections providing multimodal
information to the cortex originate from the so-called “matrix” calbindin-immunoreactive neurons
distributed in all thalamic nuclei and projecting diffusely and relatively widely to the cortex (Jones
1998).
Four different mechanisms of multisensory and sensorimotor interplay can be proposed based
on the pattern of convergence and divergence of thalamocortical and CT connections (Cappe et al.
2009c). First, some restricted thalamic territories sending divergent projections to cortical areas
afford different sensory and/or motor inputs which can be mixed simultaneously. Although such
a multimodal integration in the temporal domain cannot be excluded (in case the inputs reach the
cerebral cortex at the exact same time), it is less likely to provide massive multimodal interplay
than an actual spatial convergence of projections. More convincingly, this pattern could support
a temporal coincidence mechanism as a synchronizer between remote cortical areas, allowing a
higher perceptual saliency of multimodal stimuli (Fries et al. 2001). Second, thalamic nuclei could
act as integrators of multisensory information, rapidly relaying the integrated information to the
cortex through their multiple thalamocortical connections. In PuM, considerable mixing of territories
projecting to cortical areas belonging to several modalities is in line with previously reported con-
nections with several cortical domains, including visual, auditory, somatosensory, and prefrontal
and motor areas. Electrophysiological recordings showed visual and auditory responses in this
thalamic nucleus (see Cappe et al. 2009c for an extensive description). According to our analysis,
PuM, LP, MD, MGm, and MGd could play the role of integrator (Cappe et al. 2009c). Third, the
spatial convergence of different sensory and motor inputs at the cortical level coming from thal-
amocortical connections of distinct thalamic territories suggests a fast multisensory interplay. In
our experiments (Cappe et al. 2009c), the widespread distribution of thalamocortical inputs to the
different cortical areas injected could imply that this mechanism of convergence plays an impor-
tant role in multisensory and motor integration. By their cortical connection patterns, thalamic
nuclei PuM and LP, for instance, could play this role for auditory–somatosensory interplay in area
5 (Cappe et al. 2009c). Fourth, the cortico–thalamo–cortical route can support rapid and secure
transfer from area 5 (PE/PEa; Cappe et al. 2007) to the premotor cortex via the giant terminals
of these CT connections (Guillery 1995; Rouiller and Welker 2000; Sherman and Guillery 2002,
2005; Sherman 2007). These giant CT endings, consistent with the principle of a transthalamic loop,
have been shown to be present in different thalamic nuclei (e.g., Schwartz et al. 1991; Rockland
1996; Darian-Smith et al. 1999; Rouiller et al. 1998, 2003; Taktakishvili et al. 2002; Rouiller and
Durif 2004) and may well also apply to PuM, as demonstrated by the overlap between connections
to the auditory cortex and to the premotor cortex, allowing an auditory–motor integration (Cappe
et al. 2009c).
Thus, recent anatomical findings at the thalamic level (Komura et al. 2005; de la Mothe et al. 2006b;
Hackett et al. 2007; Cappe et al. 2007, 2009c) may represent the anatomical support for multisensory
behavioral phenomena as well as for multisensory integration at the functional level. Indeed,
some nuclei in the thalamus, such as the medial pulvinar, receive either mixed sensory inputs or
projections from different sensory cortical areas and project to sensory and premotor areas (Cappe
et al. 2009c). Sensory modalities may thus already be fused at the thalamic level before being
directly conveyed to the premotor cortex and consequently participating in the redundant signal
effect expressed by faster reaction times in response to auditory–visual stimulation (Cappe et al.
2010).
Cortical and Thalamic Pathways for Multisensory and Sensorimotor Interplay 23
FIGURE 2.1 Hypothetical scenarios for multisensory and motor integration through anatomically identified
pathways. (a) High-level cortical areas as a pathway for multisensory and motor integration. (b) Low-level
cortical areas as a pathway for multisensory integration. (c) Thalamus as a pathway for multisensory and
motor integration. (d) Combined cortical and thalamic connections as a pathway for multisensory and motor
integration.
Low-level and higher-order cortical areas, as well as the thalamus, have now been shown to take part
in multisensory integration. The question now is to determine how this multisensory system is
organized and how its different parts communicate to allow a unified perception of the world.
2.5 CONCLUSIONS
Obviously, we are just beginning to understand the complexity of interactions in the sensory sys-
tems and between the sensory and the motor systems. More work is needed in both the neural and
perceptual domains. At the neural level, additional studies are needed to understand the extent and
hierarchical organization of multisensory interactions. At the perceptual level, further experiments
should explore the conditions necessary for cross-modal binding and plasticity, and investigate the
nature of the information transfer between sensory systems. Such studies will form the basis for a
new comprehension of how the different sensory and/or motor systems function together.
ACKNOWLEDGMENTS
This study was supported by the following grants: the CNRS ATIP program (to P.B.), the Swiss
National Science Foundation grants 31-61857.00 and 310000-110005 (to E.M.R.), and the
Swiss National Science Foundation Center of Competence in Research on “Neural Plasticity and
Repair” (to E.M.R.).
REFERENCES
Amedi, A., G. Jacobson, T. Hendler, R. Malach, and E. Zohary. 2002. Convergence of visual and tactile shape
processing in the human lateral occipital complex. Cerebral Cortex 12:1202–12.
Amedi, A., W.M. Stern, J.A. Camprodon et al. 2007. Shape conveyed by visual-to-auditory sensory substitution
activates the lateral occipital complex. Nature Neuroscience 10:687–9.
Andersen, R.A., C. Asanuma, G. Essick, and R.M. Siegel. 1990. Corticocortical connections of anatomically
and physiologically defined subdivisions within the inferior parietal lobule. Journal of Comparative
Neurology 296:65–113.
Andersen, R.A., L.H. Snyder, D.C. Bradley, and J. Xing. 1997. Multimodal representation of space in the
posterior parietal cortex and its use in planning movements. Annual Review of Neuroscience 20:
303–30.
Avillac, M., S. Ben Hamed, and J.R. Duhamel. 2007. Multisensory integration in the ventral intraparietal area
of the macaque monkey. Journal of Neuroscience 27:1922–32.
Barbas, H. 1986. Pattern in the laminar origin of corticocortical connections. Journal of Comparative Neurology
252:415–22.
Barbas, H., and D.N. Pandya. 1987. Architecture and frontal cortical connections of the premotor cortex (area
6) in the rhesus monkey. Journal of Comparative Neurology 256:211–28.
Barone, P., and J.P. Joseph. 1989. Role of the dorsolateral prefrontal cortex in organizing visually guided behav-
ior. Brain, Behavior and Evolution 33:132–5.
Barraclough, N.E., D. Xiao, C.I. Baker, M.W. Oram, and D.I. Perrett. 2005. Integration of visual and auditory
information by superior temporal sulcus neurons responsive to the sight of actions. Journal of Cognitive
Neuroscience 17:377–91.
Baylis, G.C., E.T. Rolls, and C.M. Leonard. 1987. Functional subdivisions of the temporal lobe neocortex.
Journal of Neuroscience 7:330–42.
Bell, A.H., M.A. Meredith, A.J. Van Opstal, and D.P. Munoz. 2005. Crossmodal integration in the primate
superior colliculus underlying the preparation and initiation of saccadic eye movements. Journal of
Neurophysiology 93:3659–73.
Benevento, L.A., J. Fallon, B.J. Davis, and M. Rezak. 1977. Auditory–visual interaction in single cells in the
cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey. Experimental
Neurology 57:849–72.
Bignall, K.E. 1970. Auditory input to frontal polysensory cortex of the squirrel monkey: Possible pathways.
Brain Research 19:77–86.
Bizley, J.K., and A.J. King. 2008. Visual–auditory spatial processing in auditory cortical neurons. Brain
Research 1242:24–36.
Bizley, J.K., F.R. Nodal, V.M. Bajo, I. Nelken, and A.J. King. 2007. Physiological and anatomical evidence for
multisensory interactions in auditory cortex. Cerebral Cortex 17:2172–89.
Blatt, G.J., R.A. Andersen, and G.R. Stoner. 1990. Visual receptive field organization and cortico-cortical con-
nections of the lateral intraparietal area (area LIP) in the macaque. Journal of Comparative Neurology
299:421–45.
Bremmer, F., F. Klam, J.R. Duhamel, S. Ben Hamed, and W. Graf. 2002. Visual-vestibular interactive responses
in the macaque ventral intraparietal area (VIP). European Journal of Neuroscience 16:1569–86.
Bruce, C., R. Desimone, and C.G. Gross. 1981. Visual properties of neurons in a polysensory area in superior
temporal sulcus of the macaque. Journal of Neurophysiology 46:369–84.
Budinger, E., and H. Scheich. 2009. Anatomical connections suitable for the direct processing of neuronal
information of different modalities via the rodent primary auditory cortex (review). Hearing Research
258:16–27.
Budinger, E., P. Heil, A. Hess, and H. Scheich. 2006. Multisensory processing via early cortical stages:
Connections of the primary auditory cortical field with other sensory systems. Neuroscience 143:1065–83.
Bullier, J. 2006. What is feed back? In 23 Problems in Systems Neuroscience, ed. J.L. van Hemmen and T.J.
Sejnowski, 103–132. New York: Oxford University Press.
Calvert, G.A. 2001. Crossmodal processing in the human brain: Insights from functional neuroimaging studies
(review). Cerebral Cortex 11:1110–23.
Calvert, G., C. Spence, and B.E. Stein, eds. 2004. The Handbook of Multisensory Processes. Cambridge, MA:
MIT Press.
Campi, K.L., K.L. Bales, R. Grunewald, and L. Krubitzer. 2010. Connections of auditory and visual cortex in
the prairie vole (Microtus ochrogaster): Evidence for multisensory processing in primary sensory areas.
Cerebral Cortex 20:89–108.
Cappe, C., and P. Barone. 2005. Heteromodal connections supporting multisensory integration at low levels of
cortical processing in the monkey. European Journal of Neuroscience 22:2886–902.
Cappe, C., A. Morel, and E.M. Rouiller. 2007. Thalamocortical and the dual pattern of corticothalamic projec-
tions of the posterior parietal cortex in macaque monkeys. Neuroscience 146:1371–87.
Cappe, C., E.M. Rouiller, and P. Barone. 2009a. Multisensory anatomic pathway (review). Hearing Research
258:28–36.
26 The Neural Bases of Multisensory Processes
Cappe, C., G. Thut, V. Romei, and M.M. Murray. 2009b. Selective integration of auditory-visual looming cues
by humans. Neuropsychologia 47:1045–52.
Cappe, C., A. Morel, P. Barone, and E.M. Rouiller. 2009c. The thalamocortical projection systems in primate:
An anatomical support for multisensory and sensorimotor integrations. Cerebral Cortex 19:2025–37.
Cappe, C., M.M. Murray, P. Barone, and E.M. Rouiller. 2010. Multisensory facilitation of behavior in monkeys:
Effects of stimulus intensity. Journal of Cognitive Neuroscience 22:2850–63.
Cohen, Y.E., B.E. Russ, and G.W. Gifford 3rd. 2005. Auditory processing in the posterior parietal cortex
(review). Behavioral and Cognitive Neuroscience Reviews 4:218–31.
Cohen, Y.E., F. Theunissen, B.E. Russ, and P. Gill. 2007. Acoustic features of rhesus vocalizations and their
representation in the ventrolateral prefrontal cortex. Journal of Neurophysiology 97:1470–84.
Colby, C.L., and M.E. Goldberg. 1999. Space and attention in parietal cortex (review). Annual Review of
Neuroscience 22:319–49.
Crick, F., and C. Koch. 1998. Constraints on cortical and thalamic projections: The no-strong-loops hypothesis.
Nature 391:245–50.
Cusick, C.G., B. Seltzer, M. Cola, and E. Griggs. 1995. Chemoarchitectonics and corticocortical terminations
within the superior temporal sulcus of the rhesus monkey: Evidence for subdivisions of superior temporal
polysensory cortex. Journal of Comparative Neurology 360:513–35.
Darian-Smith, C., A. Tan, and S. Edwards. 1999. Comparing thalamocortical and corticothalamic microstruc-
ture and spatial reciprocity in the macaque ventral posterolateral nucleus (VPLc) and medial pulvinar.
Journal of Comparative Neurology 410:211–34.
de la Mothe, L.A., S. Blumell, Y. Kajikawa, and T.A. Hackett. 2006a. Cortical connections of the auditory cortex
in marmoset monkeys: Core and medial belt regions. Journal of Comparative Neurology 496:27–71.
de la Mothe, L.A., S. Blumell, Y. Kajikawa, and T.A. Hackett. 2006b. Thalamic connections of the auditory cor-
tex in marmoset monkeys: Core and medial belt regions. Journal of Comparative Neurology 496:72–96.
Desimone, R., and C.G. Gross. 1979. Visual areas in the temporal cortex of the macaque. Brain Research
178:363–80.
Disbrow, E., E. Litinas, G.H. Recanzone, J. Padberg, and L. Krubitzer. 2003. Cortical connections of the second
somatosensory area and the parietal ventral area in macaque monkeys. Journal of Comparative
Neurology 462:382–99.
Duhamel, J.R., C.L. Colby, and M.E. Goldberg. 1998. Ventral intraparietal area of the macaque: Congruent
visual and somatic response properties. Journal of Neurophysiology 79:126–36.
Ettlinger, G., and W.A. Wilson. 1990. Cross-modal performance: Behavioural processes, phylogenetic consid-
erations and neural mechanisms (review). Behavioural Brain Research 40:169–92.
Falchier, A., S. Clavagnier, P. Barone, and H. Kennedy. 2002. Anatomical evidence of multimodal integration
in primate striate cortex. Journal of Neuroscience 22:5749–59.
Falchier, A., C.E. Schroeder, T.A. Hackett et al. 2010. Low level intersensory connectivity as a fundamental
feature of neocortex. Cerebral Cortex 20:1529–38.
Felleman, D.J., and D.C. Van Essen. 1991. Distributed hierarchical processing in the primate cerebral cortex.
Cerebral Cortex 1:1–47.
Fogassi, L., V. Gallese, L. Fadiga, G. Luppino, M. Matelli, and G. Rizzolatti. 1996. Coding of peripersonal
space in inferior premotor cortex (area F4). Journal of Neurophysiology 76:141–57.
Fort, A., C. Delpuech, J. Pernier, and M.H. Giard. 2002. Dynamics of corticosubcortical cross-modal opera-
tions involved in audio-visual object detection in humans. Cerebral Cortex 12:1031–39.
Forster, B., C. Cavina-Pratesi, S.M. Aglioti, and G. Berlucchi. 2002. Redundant target effect and intersensory facili-
tation from visual-tactile interactions in simple reaction time. Experimental Brain Research 143:480–487.
Foxe, J.J., I.A. Morocz, M.M. Murray, B.A. Higgins, D.C. Javitt, and C.E. Schroeder. 2000. Multisensory
auditory–somatosensory interactions in early cortical processing revealed by high-density electrical
mapping. Brain Research. Cognitive Brain Research 10:77–83.
Foxe, J.J., G.R. Wylie, A. Martinez et al. 2002. Auditory–somatosensory multisensory processing in auditory
association cortex: An fMRI study. Journal of Neurophysiology 88:540–3.
Frens, M.A., and A.J. Van Opstal. 1998. Visual–auditory interactions modulate saccade-related activity in mon-
key superior colliculus. Brain Research Bulletin 46:211–24.
Fries, P., S. Neuenschwander, A.K. Engel, R. Goebel, and W. Singer. 2001. Rapid feature selective neuronal
synchronization through correlated latency shifting. Nature Neuroscience 4:194–200.
Fu, K.M., T.A. Johnston, A.S. Shah et al. 2003. Auditory cortical neurons respond to somatosensory stimula-
tion. Journal of Neuroscience 23:7510–5.
Fuster, J.M. 2001. The prefrontal cortex—an update: Time is of the essence (review). Neuron 30:319–33.
Fuster, J.M., M. Bodner, and J.K. Kroger. 2000. Cross-modal and cross-temporal association in neurons of
frontal cortex. Nature 405:347–51.
Gaffan, D., and S. Harrison. 1991. Auditory–visual associations, hemispheric specialization and temporal–
frontal interaction in the rhesus monkey. Brain 114:2133–44.
Galletti, C., M. Gamberini, D.F. Kutz, P. Fattori, G. Luppino, M. Matelli. 2001. The cortical connections of
area V6: An occipito-parietal network processing visual information. European Journal of Neuroscience
13:1572–88.
Gattass, R., E. Oswaldo-Cruz, and A.P. Sousa. 1979. Visual receptive fields of units in the pulvinar of cebus
monkey. Brain Research 160:413–30.
Ghazanfar, A.A. 2009. The multisensory roles for auditory cortex in primate vocal communication (review).
Hearing Research 258:113–20.
Ghazanfar, A.A., and C.E. Schroeder. 2006. Is neocortex essentially multisensory? (review). Trends in Cognitive
Sciences 10:278–85.
Ghazanfar, A.A., J.X. Maier, K.L. Hoffman, and N.K. Logothetis. 2005. Multisensory integration of dynamic
faces and voices in rhesus monkey auditory cortex. Journal of Neuroscience 25:5004–12.
Ghazanfar, A.A., C. Chandrasekaran, and N.K. Logothetis. 2008. Interactions between the superior tempo-
ral sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys. Journal of
Neuroscience 28:4457–69.
Giard, M.H., and F. Peronnet. 1999. Auditory–visual integration during multimodal object recognition in
humans: A behavioral and electrophysiological study. Journal of Cognitive Neuroscience 11:473–90.
Gifford 3rd, G.W., and Y.E. Cohen. 2004. Effect of a central fixation light on auditory spatial responses in area
LIP. Journal of Neurophysiology 91:2929–33.
Gingras, G., B.A. Rowland, and B.E. Stein. 2009. The differing impact of multisensory and unisensory integra-
tion on behavior. Journal of Neuroscience 29:4897–902.
Giray, M., and R. Ulrich. 1993. Motor coactivation revealed by response force in divided and focused attention.
Journal of Experimental Psychology. Human Perception and Performance 19:1278–91.
Gondan, M., B. Niederhaus, F. Rösler, and B. Röder. 2005. Multisensory processing in the redundant-target
effect: A behavioral and event-related potential study. Perception & Psychophysics 67:713–26.
Gottlieb, J. 2007. From thought to action: The parietal cortex as a bridge between perception, action, and cogni-
tion (review). Neuron 53:9–16.
Graziano, M.S., G.S. Yap, and C.G. Gross. 1994. Coding of visual space by premotor neurons. Science
266:1054–7.
Graziano, M.S., L.A. Reiss, and C.G. Gross. 1999. A neuronal representation of the location of nearby sounds.
Nature 397:428–30.
Guillery, R.W. 1995. Anatomical evidence concerning the role of the thalamus in corticocortical communica-
tion: A brief review. Journal of Anatomy 187:583–92.
Gutierrez, C., M.G. Cola, B. Seltzer, and C. Cusick. 2000. Neurochemical and connectional organization of the
dorsal pulvinar complex in monkeys. Journal of Comparative Neurology 419:61–86.
Hackett, T.A., I. Stepniewska, and J.H. Kaas. 1998. Thalamocortical connections of the parabelt auditory cortex
in macaque monkeys. Journal of Comparative Neurology 400:271–86.
Hackett, T.A., I. Stepniewska, and J.H. Kaas. 1999. Prefrontal connections of the parabelt auditory cortex in
macaque monkeys. Brain Research 817:45–58.
Hackett, T.A., L.A. de La Mothe, I. Ulbert, G. Karmos, J. Smiley, and C.E. Schroeder. 2007. Multisensory
convergence in auditory cortex: II. Thalamocortical connections of the caudal superior temporal plane.
Journal of Comparative Neurology 502:924–52.
Hagen, M.C., O. Franzén, F. McGlone, G. Essick, C. Dancer, and J.V. Pardo. 2002. Tactile motion activates the
human middle temporal/V5 (MT/V5) complex. European Journal of Neuroscience 16:957–64.
Hall, A.J., and S.G. Lomber. 2008. Auditory cortex projections target the peripheral field representation of
primary visual cortex. Experimental Brain Research 190:413–30.
Hecht, D., M. Reiner, and A. Karni. 2008. Enhancement of response times to bi- and tri-modal sensory stimuli
during active movements. Experimental Brain Research 185:655–65.
Heffner, R.S., and H.E. Heffner. 1992. Visual factors in sound localization in mammals. Journal of Comparative
Neurology 317:219–32.
Hikosaka, K., E. Iwai, H. Saito, and K. Tanaka. 1988. Polysensory properties of neurons in the anterior bank of
the caudal superior temporal sulcus of the macaque monkey. Journal of Neurophysiology 60:1615–37.
Huffman, K.J., and L. Krubitzer. 2001. Area 3a: topographic organization and cortical connections in marmoset
monkeys. Cerebral Cortex 11:849–67.
Innocenti, G.M., P. Berbel, and S. Clarke. 1988. Development of projections from auditory to visual areas in
the cat. Journal of Comparative Neurology 272:242–59.
James, T.W., G.K. Humphrey, J.S. Gati, P. Servos, R.S. Menon, and M.A. Goodale. 2002. Haptic study of three-
dimensional objects activates extrastriate visual areas. Neuropsychologia 40:1706–14.
Jepma, M., E.J. Wagenmakers, G.P. Band, and S. Nieuwenhuis. 2009. The effects of accessory stimuli on
information processing: Evidence from electrophysiology and a diffusion model analysis. Journal of
Cognitive Neuroscience 21:847–64.
Jones, E.G. 1998. Viewpoint: The core and matrix of thalamic organization. Neuroscience 85:331–45.
Joseph, J.P., and P. Barone. 1987. Prefrontal unit activity during a delayed oculomotor task in the monkey.
Experimental Brain Research 67:460–8.
Kaas, J.H., and C.E. Collins. 2001. Evolving ideas of brain evolution. Nature 411:141–2.
Kaas, J., and C.E. Collins. 2004. The resurrection of multisensory cortex in primates: connection patterns that
integrate modalities. In The Handbook of Multisensory Processes, ed. G. Calvert, C. Spence, and B.E.
Stein, 285–93. Cambridge, MA: MIT Press.
Kaas, J.H., and T.A. Hackett. 2000. Subdivisions of auditory cortex and processing streams in primates.
Proceedings of the National Academy of Sciences of the United States of America 97:11793–9.
Kaas, J.H., and A. Morel. 1993. Connections of visual areas of the upper temporal lobe of owl monkeys: The
MT crescent and dorsal and ventral subdivisions of FST. Journal of Neuroscience 13:534–46.
Kayser, C., and N.K. Logothetis. 2009. Directed interactions between auditory and superior temporal cortices
and their role in sensory integration (review). Frontiers in Integrative Neuroscience 3:7. doi: 10.3389/
neuro.07.007.2009.
Kayser, C., C.I. Petkov, M. Augath, and N.K. Logothetis. 2005. Integration of touch and sound in auditory
cortex. Neuron 48:373–84.
Kayser, C., C.I. Petkov, M. Augath, and N.K. Logothetis. 2007. Functional imaging reveals visual modulation
of specific fields in auditory cortex. Journal of Neuroscience 27:1824–35.
Kayser, C., C.I. Petkov, and N.K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral
Cortex 18:1560–74.
Komura, Y., R. Tamura, T. Uwano, H. Nishijo, and T. Ono. 2005. Auditory thalamus integrates visual inputs
into behavioral gains. Nature Neuroscience 8:1203–9.
Krubitzer, L.A., and J.H. Kaas. 1990. The organization and connections of somatosensory cortex in marmosets.
Journal of Neuroscience 10:952–74.
Lakatos, P., C.M. Chen, M.N. O’Connell, A. Mills, and C.E. Schroeder. 2007. Neuronal oscillations and multi-
sensory interaction in primary auditory cortex. Neuron 53:279–92.
Lamarre, Y., L. Busby, and G. Spidalieri. 1983. Fast ballistic arm movements triggered by visual, auditory, and
somesthetic stimuli in the monkey: I. Activity of precentral cortical neurons. Journal of Neurophysiology
50:1343–58.
Laurienti, P.J., R.A. Kraft, J.A. Maldjian, J.H. Burdette, and M.T. Wallace. 2004. Semantic congruence is a
critical factor in multisensory behavioral performance. Experimental Brain Research 158:405–14.
Lewis, J.W., and D.C. Van Essen. 2000. Corticocortical connections of visual, sensorimotor, and multimodal pro-
cessing areas in the parietal lobe of the macaque monkey. Journal of Comparative Neurology 428:112–37.
Linden, J.F., A. Grunewald, and R.A. Andersen. 1999. Responses to auditory stimuli in macaque lateral intra-
parietal area: II. Behavioral modulation. Journal of Neurophysiology 82:343–58.
Lovelace, C.T., B.E. Stein, and M.T. Wallace. 2003. An irrelevant light enhances auditory detection in humans:
A psychophysical analysis of multisensory integration in stimulus detection. Brain Research. Cognitive
Brain Research 17:447–53.
Martuzzi, R., M.M. Murray, C.M. Michel et al. 2007. Multisensory interactions within human primary cortices
revealed by BOLD dynamics. Cerebral Cortex 17:1672–9.
Maunsell, J.H., and D.C. Van Essen. 1983. The connections of the middle temporal visual area (MT) and their
relationship to a cortical hierarchy in the macaque monkey. Journal of Neuroscience 3:2563–86.
Miller, J. 1982. Divided attention: Evidence for coactivation with redundant signals. Cognitive Psychology
14:247–79.
Miller, J., R. Ulrich, and Y. Lamarre. 2001. Locus of the redundant-signals effect in bimodal divided attention:
A neurophysiological analysis. Perception & Psychophysics 63:555–62.
Milner, B., M. Petrides, and M.L. Smith. 1985. Frontal lobes and the temporal organization of memory. Human
Neurobiology 4:137–42.
Molholm, S., W. Ritter, M.M. Murray, D.C. Javitt, C.E. Schroeder, and J.J. Foxe. 2002. Multisensory auditory–
visual interactions during early sensory processing in humans: A high-density electrical mapping study.
Brain Research. Cognitive Brain Research 14:115–28.
Mordkoff, J.T., J. Miller, and A.C. Roch. 1996. Absence of coactivation in the motor component: Evidence
from psychophysiological measures of target detection. Journal of Experimental Psychology. Human
Perception and Performance 22:25–41.
Morel, A., J. Liu, T. Wannier, D. Jeanmonod, and E.M. Rouiller. 2005. Divergence and convergence of thalamo-
cortical projections to premotor and supplementary motor cortex: A multiple tracing study in macaque
monkey. European Journal of Neuroscience 21:1007–29.
Murray, M.M., S. Molholm, C.M. Michel et al. 2005. Grabbing your ear: Rapid auditory–somatosensory
multisensory interactions in low-level sensory cortices are not constrained by stimulus alignment. Cerebral
Cortex 15:963–74.
Palmer, S.M., and M.G. Rosa. 2006. A distinct anatomical network of cortical areas for analysis of motion in
far peripheral vision. European Journal of Neuroscience 24:2389–405.
Pandya, D.N., and B. Seltzer. 1982. Intrinsic connections and architectonics of posterior parietal cortex in the
rhesus monkey. Journal of Comparative Neurology 204:196–210.
Petrides, M., and S.D. Iversen. 1976. Cross-modal matching and the primate frontal cortex. Science 192:1023–4.
Poremba, A., R.C. Saunders, A.M. Crane, M. Cook, L. Sokoloff, and M. Mishkin. 2003. Functional mapping
of the primate auditory system. Science 299:568–72.
Raab, D.H. 1962. Statistical facilitation of simple reaction times. Transactions of the New York Academy of
Sciences 24:574–90.
Rizzolatti, G., L. Fogassi, and V. Gallese. 1997. Parietal cortex: From sight to action (review). Current Opinion
in Neurobiology 7:562–7.
Rockland, K.S. 1996. Two types of corticopulvinar terminations: Round (type 2) and elongate (type 1). Journal
of Comparative Neurology 368:57–87.
Rockland, K.S., and H. Ojima. 2003. Multisensory convergence in calcarine visual areas in macaque monkey.
International Journal of Psychophysiology 50:19–26.
Romanski, L.M. 2004. Domain specificity in the primate prefrontal cortex (review). Cognitive, Affective &
Behavioral Neuroscience 4:421–9.
Romanski, L.M. 2007. Representation and integration of auditory and visual stimuli in the primate ventral
lateral prefrontal cortex. Cerebral Cortex 17 Suppl. no. 1, i61–9.
Romanski, L.M., M. Giguere, J.F. Bates, and P.S. Goldman-Rakic. 1997. Topographic organization of medial
pulvinar connections with the prefrontal cortex in the rhesus monkey. Journal of Comparative Neurology
379:313–32.
Romanski, L.M., J.F. Bates, and P.S. Goldman-Rakic. 1999. Auditory belt and parabelt projections to the pre-
frontal cortex in the rhesus monkey. Journal of Comparative Neurology 403:141–57.
Romanski, L.M., and P.S. Goldman-Rakic. 2002. An auditory domain in primate prefrontal cortex. Nature
Neuroscience 5:15–6.
Romei, V., M.M. Murray, L.B. Merabet, and G. Thut. 2007. Occipital transcranial magnetic stimulation has
opposing effects on visual and auditory stimulus detection: Implications for multisensory interactions.
Journal of Neuroscience 27:11465–72.
Rouiller, E.M., and C. Durif. 2004. The dual pattern of corticothalamic projection of the primary auditory cor-
tex in macaque monkey. Neuroscience Letters 358:49–52.
Rouiller, E.M., J. Tanné, V. Moret, I. Kermadi, D. Boussaoud, and E. Welker. 1998. Dual morphology and
topography of the corticothalamic terminals originating from the primary, supplementary motor, and
dorsal premotor cortical areas in macaque monkeys. Journal of Comparative Neurology 396:169–85.
Rouiller, E.M., and E. Welker. 2000. A comparative analysis of the morphology of corticothalamic projections
in mammals. Brain Research Bulletin 53:727–41.
Rouiller, E.M., T. Wannier, and A. Morel. 2003. The dual pattern of corticothalamic projection of the premotor
cortex in macaque monkeys. Thalamus & Related Systems 2:189–97.
Russ, B.E., A.M. Kim, K.L. Abrahamsen, R. Kiringoda, and Y.E. Cohen. 2006. Responses of neurons in the
lateral intraparietal area to central visual cues. Experimental Brain Research 174:712–27.
Sadato, N., A. Pascual-Leone, J. Grafman et al. 1996. Activation of the primary visual cortex by Braille reading
in blind subjects. Nature 380:526–8.
Salin, P.A., and J. Bullier. 1995. Corticocortical connections in the visual system: Structure and function.
Physiological Reviews 75:107–54.
Sathian, K., and A. Zangaladze. 2002. Feeling with the mind’s eye: Contribution of visual cortex to tactile
perception (review). Behavioural Brain Research 135:127–32.
Schall, J.D., A. Morel, D.J. King, and J. Bullier. 1995. Topography of visual cortex connections with fron-
tal eye field in macaque: Convergence and segregation of processing streams. Journal of Neuroscience
15:4464–87.
30 The Neural Bases of Multisensory Processes
Schlack, A., S.J. Sterbing-D’Angelo, K. Hartung, K.P. Hoffmann, and F. Bremmer. 2005. Multisensory space
representations in the macaque ventral intraparietal area. Journal of Neuroscience 25:4616–25.
Schroeder, C.E., and J.J. Foxe. 2002. The timing and laminar profile of converging inputs to multisensory areas
of the macaque neocortex. Cognitive Brain Research 14:187–98.
Schroeder, C.E., R.W. Lindsley, C. Specht, A. Marcovici, J.F. Smiley, and D.C. Javitt. 2001. Somatosensory
input to auditory association cortex in the macaque monkey. Journal of Neurophysiology 85:1322–7.
Schwartz, M.L., J.J. Dekker, and P.S. Goldman-Rakic. 1991. Dual mode of corticothalamic synaptic termina-
tion in the mediodorsal nucleus of the rhesus monkey. Journal of Comparative Neurology 309:289–304.
Seltzer, B., and D.N. Pandya. 1994. Parietal, temporal, and occipital projections to cortex of the superior
temporal sulcus in the rhesus monkey: A retrograde tracer study. Journal of Comparative Neurology
343:445–63.
Sherman, S.M. 2007. The thalamus is more than just a relay. Current Opinion in Neurobiology 17:417–22.
Sherman, S.M., and R.W. Guillery. 2002. The role of the thalamus in the flow of information to the cortex.
Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 357:1695–708.
Sherman, S.M., and R.W. Guillery. 2005. Exploring the Thalamus and Its Role in Cortical Function. Cambridge:
MIT Press.
Shipp, S. 2003. The functional logic of cortico-pulvinar connections. Philosophical Transactions of the Royal
Society of London. Series B, Biological Sciences 358:1605–24.
Smiley, J.F., T.A. Hackett, I. Ulbert, G. Karmos, P. Lakatos, D.C. Javitt, and C.E. Schroeder. 2007. Multisensory
convergence in auditory cortex, I. Cortical connections of the caudal superior temporal plane in macaque
monkeys. Journal of Comparative Neurology 502:894–923.
Smiley, J.F., and A. Falchier. 2009. Multisensory connections of monkey auditory cerebral cortex. Hearing
Research 258:37–46.
Sperdin, H., C. Cappe, J.J. Foxe, and M.M. Murray. 2009. Early, low-level auditory–somatosensory multisen-
sory interactions impact reaction time speed. Frontiers in Integrative Neuroscience 3:2. doi:10.3389/
neuro.07.002.2009.
Stein, B.E., and M.A. Meredith. 1993. The Merging of the Senses. Cambridge, MA: MIT Press.
Stein, B.E., M.A. Meredith, W.S. Huneycutt, and L. McDade. 1989. Behavioral indices of multisensory integration: Orientation to visual cues is affected by auditory stimuli. Journal of Cognitive Neuroscience 1:12–24.
Suzuki, H. 1985. Distribution and organization of visual and auditory neurons in the monkey prefrontal cortex.
Vision Research 25:465–9.
Taktakishvili, O., E. Sivan-Loukianova, K. Kultas-Ilinsky, and I.A. Ilinsky. 2002. Posterior parietal cortex projections to the ventral lateral and some association thalamic nuclei in Macaca mulatta. Brain Research Bulletin 59:135–50.
Tanné-Gariépy, J., E.M. Rouiller, and D. Boussaoud. 2002. Parietal inputs to dorsal versus ventral premotor areas in the macaque monkey: Evidence for largely segregated visuomotor pathways. Experimental Brain Research 145:91–103.
Wallace, M.T., R. Ramachandran, and B.E. Stein. 2004. A revised view of sensory cortical parcellation.
Proceedings of the National Academy of Sciences of the United States of America 101:2167–72.
Wang, Y., S. Celebrini, Y. Trotter, and P. Barone. 2008. Visuo–auditory interactions in the primary visual cortex
of the behaving monkey: Electrophysiological evidence. BMC Neuroscience 9:79.
Yirmiya, R., and S. Hocherman. 1987. Auditory- and movement-related neural activity interact in the pulvinar
of the behaving rhesus monkey. Brain Research 402:93–102.
Zampini, M., D. Torresan, C. Spence, and M.M. Murray. 2007. Auditory–somatosensory multisensory interac-
tions in front and rear space. Neuropsychologia 45:1869–77.
3 What Can Multisensory Processing Tell Us about the Functional Organization of Auditory Cortex?
Jennifer K. Bizley and Andrew J. King
CONTENTS
3.1 Introduction ............................................................................................................. 31
3.2 Functional Specialization within Auditory Cortex? ................................................ 32
3.3 Ferret Auditory Cortex: A Model for Multisensory Processing .............................. 33
3.3.1 Organization of Ferret Auditory Cortex ......................................................... 33
3.3.2 Surrounding Cortical Fields ........................................................................... 35
3.3.3 Sensitivity to Complex Sounds ....................................................................... 36
3.3.4 Visual Sensitivity in Auditory Cortex ............................................................ 36
3.3.5 Visual Inputs Enhance Processing in Auditory Cortex .................................. 39
3.4 Where Do Visual Inputs to Auditory Cortex Come From? ..................................... 40
3.5 What Are the Perceptual Consequences of Multisensory Integration in the
Auditory Cortex? ...................................................................................................... 41
3.5.1 Combining Auditory and Visual Spatial Representations in the Brain .......... 42
3.5.2 A Role for Auditory Cortex in Spatial Recalibration? .................................... 43
3.6 Concluding Remarks ............................................................................................... 44
References ...................................................................................................................... 44
3.1 INTRODUCTION
The traditional view of sensory processing is that the pooling and integration of information across
different modalities takes place in specific areas of the brain only after extensive processing within
modality-specific subcortical and cortical regions. This seems like a logical arrangement because
our various senses are responsible for transducing different forms of energy into neural activity
and give rise to quite distinct perceptions. To a large extent, each of the sensory systems can oper-
ate independently. We can, after all, understand someone speaking by telephone or read a book
perfectly well without recourse to cues provided by other modalities. It is now clear, however, that
multisensory convergence is considerably more widespread in the brain, and particularly the cere-
bral cortex, than was once thought. Indeed, even the primary cortical areas in each of the main
senses have been claimed as part of the growing network of multisensory regions (Ghazanfar and
Schroeder 2006).
It is clearly beneficial to be able to combine information from the different senses. Although the
perception of speech is based on the processing of sound, what we actually hear can be influenced by
visual cues provided by lip movements. This can result in an improvement in speech intelligibility
in the presence of other distracting sounds (Sumby and Pollack 1954) or even a subjective change
in the speech sounds that are perceived (McGurk and MacDonald 1976). Similarly, the accuracy
with which the source of a sound can be localized is affected by the availability of both spatially
congruent (Shelton and Searle 1980; Stein et al. 1989) and conflicting (Bertelson and Radeau 1981)
visual stimuli. With countless other examples of cross-modal interactions at the perceptual level
(Calvert and Thesen 2004), it is perhaps not surprising that multisensory convergence is so widely
found throughout the cerebral cortex.
The major challenge that we are now faced with is to identify the function of multisensory inte-
gration in different cortical circuits, and particularly at early levels of the cortical hierarchy—the
primary and secondary sensory areas—which are more likely to be involved in general-purpose
processing relating to multiple sound parameters than in task-specific computational operations
(Griffiths et al. 2004; King and Nelken 2009). In doing so, we have to try to understand how other
modalities influence the sensitivity or selectivity of cortical neurons in those areas while retaining
the modality specificity of the percepts to which the activity of the neurons contributes. By inves-
tigating the sources of origin of these inputs and the way in which they interact with the dominant
input modality for a given cortical area, we can begin to constrain our ideas about the potential
functions of multisensory integration in early sensory cortex.
In this article, we focus on the organization and putative functions of visual inputs to the audi-
tory cortex. Although anatomical and physiological studies have revealed multisensory interactions
in visual and somatosensory areas, it is arguably the auditory cortex where most attention has been
paid and where we may be closest to answering these questions.
field (AAF), ventral PAF (VPAF), and secondary auditory cortex (A2) do not appear to contribute
to this task (Malhotra and Lomber 2007). Moreover, a double dissociation between PAF and AAF
in the same animals has been demonstrated, with impaired sound localization produced by cooling
of PAF but not AAF, and impaired temporal pattern discrimination resulting from inactivation of
AAF but not PAF (Lomber and Malhotra 2008). Lastly, anatomical projection patterns in nonhu-
man primates support differential roles for rostral and caudal auditory cortex, with each of those
areas having distinct prefrontal targets (Hackett et al. 1999; Romanski et al. 1999).
Despite this apparent wealth of data in support of functional specialization within the auditory
cortex, there are a number of studies that indicate that sensitivity to both spatial and nonspatial
sound attributes is widely distributed across different cortical fields (Harrington et al. 2008; Stecker
et al. 2003; Las et al. 2008; Hall and Plack 2009; Recanzone 2008; Nelken et al. 2008; Bizley et al.
2009). Moreover, in humans, circumscribed lesions within the putative “what” and “where” path-
ways do not always result in the predicted deficits in sound recognition and localization (Adriani et
al. 2003). Clearly defined output pathways from auditory cortex to prefrontal cortex certainly seem
to exist, but what the behavioral deficits observed following localized deactivation or damage imply
about the functional organization of the auditory cortex itself is less clear-cut. Loss of activity in
any one part of the network will, after all, affect both upstream cortical areas and potentially the
responses of subcortical neurons that receive descending projections from that region of the cor-
tex (Nakamoto et al. 2008). Thus, a behavioral deficit does not necessarily reflect the specialized
properties of the neurons within the silenced cortical area per se, but rather the contribution of the
processing pathways that the area is integral to.
Can the distribution and nature of multisensory processing in the auditory cortex help reconcile
the apparently contrasting findings outlined above? If multisensory interactions in the cortex are
to play a meaningful role in perception and behavior, it is essential that the neurons can integrate
the corresponding multisensory features of individual objects or events, such as vocalizations and
their associated lip movements or the visual and auditory cues originating from the same location
in space. Consequently, the extent to which spatial and nonspatial sound features are processed in
parallel in the auditory cortex should also be apparent in both the multisensory response properties
of the neurons found there and the sources of origin of its visual inputs. Indeed, evidence for task-
specific activation of higher cortical areas by different stimulus modalities has recently been pro-
vided in humans (Renier et al. 2009). In the next section, we focus on the extent to which anatomical
and physiological studies of multisensory convergence and processing in the auditory cortex of the
ferret have shed light on this issue. In recent years, this species has gained popularity for studies of
auditory cortical processing, in part because of its particular suitability for behavioral studies.
FIGURE 3.1 Visual inputs to ferret auditory cortex. (a) Ferret sensory cortex. Visual (areas 17–20, PS, SSY,
AMLS), posterior parietal (PPr, PPc), somatosensory (S1, SIII, MRSS), and auditory areas (A1, AAF, PPF,
PSF, and ADF) have been identified. In addition, LRSS and AVF are multisensory regions, although many
of the areas classified as modality specific also contain some multisensory neurons. (b) Location of neurons
in visual cortex that project to auditory cortex. Tracer injections made into core auditory cortex (A1: BDA,
shown in black, and AAF: CTβ, shown in gray) result in retrograde labeling in early visual areas. Every fifth
section (50 µm thick) was examined, but for the purpose of illustration, labeling from four sections was col-
lapsed onto single sections. Dotted lines mark the limit between cortical layers IV and V; dashed lines delimit
the white matter (wm). (c) Tracer injections made into belt auditory cortex. Retrograde labeling after an injec-
tion of CTβ into the anterior fields (on the borders of ADF and AVF) is shown in gray, and retrograde labeling
resulting from a BDA injection into the posterior fields PPF and PSF is shown in black. Note the difference
in the extent and distribution of labeling after injections into the core and belt areas of auditory cortex. Scale
bars in (b) and (c), 1 mm. (d) Summary of sources of visual cortical input to auditory cortex. (Anatomical data
adapted with permission from Bizley, J.K. et al., Cereb. Cortex, 17, 2172–89, 2007.)
Auditory Cortex according to Multisensory Processing 35
Neurons in the posterior fields can be distinguished from those in the primary areas by the
temporal characteristics of their responses; discharges are often sustained and they vary in latency
and firing pattern in a stimulus-dependent manner. The frequency response areas of posterior field
neurons are often circumscribed, exhibiting tuning for sound level as well as frequency. As such,
the posterior fields in the ferret resemble PAF and VPAF in the cat (Stecker et al. 2003; Phillips and
Orman 1984; Loftus and Sutter 2001) and cortical areas R and RT in the marmoset monkey (Bizley
et al. 2005; Bendor and Wang 2008), although whether PPF and PSF actually correspond to these
fields is uncertain.
Neurons in ADF also respond to pure tones, but are not tonotopically organized (Bizley et al.
2005). The lack of tonotopicity and the broad, high-threshold frequency response areas that char-
acterize this field are also properties of cat A2 (Schreiner and Cynader 1984). However, given that
ferret ADF neurons seem to show relatively greater spatial sensitivity than those in surrounding
cortical fields (see following sections), which is not a feature of cat A2, it seems unlikely that
these areas are homologous. Ventral to ADF lies AVF. Although many of the neurons that have
been recorded there are driven by sound, the high incidence of visually responsive neurons (see
Section 3.3.4) makes it likely that AVF should be regarded as a parabelt or higher multisensory field.
Given its proximity to the somatosensory area on the medial bank of the rostral suprasylvian sulcus
(MRSS) (Keniston et al. 2009), it is possible that AVF neurons might also be influenced by tactile
stimuli, but this remains to be determined.
Other studies have also highlighted the multisensory nature of the anterior ectosylvian gyrus. For
example, Ramsay and Meredith (2004) described an area surrounding the pseudosylvian sulcus that
receives largely segregated inputs from the primary visual and somatosensory cortices, which they
termed the pseudosylvian sulcal cortex. Manger et al. (2005) reported that a visually responsive
area lies parallel to the pseudosylvian sulcus on the posterolateral half of the anterior ectosylvian
gyrus, which also contains bisensory neurons that respond either to both visual and tactile or to
visual and auditory stimulation. They termed this area AEV, following the terminology used for the
visual region within the cat’s anterior ectosylvian sulcus. Because this region overlaps in part with
the acoustically responsive areas that we refer to as ADF and AVF, further research using a range of
stimuli will be needed to fully characterize this part of the ferret’s cortex. However, the presence of
a robust projection from AVF to the superior colliculus (Bajo et al. 2010) makes it likely that this is
equivalent to the anterior ectosylvian sulcus in the cat.
(2005) as the ferret homologue of primate motion-processing area MT. This region has also been
described by Manger et al. (2008) as the posteromedial suprasylvian visual area, but we will stay
with the terminology used in our previous articles and refer to it as SSY. PS has not been compre-
hensively investigated and, to our knowledge, neither of these sulcal fields have been tested with
auditory or somatosensory stimuli. On the lateral banks of the suprasylvian sulcus, at the dorsal and caudal edges of the ectosylvian gyrus, there remains an area of uninvestigated cortex. On the basis of its proximity to
AMLS and SSY, this region has tentatively been divided into the anterolateral lateral suprasylvian
visual area (ALLS) and the posterolateral lateral suprasylvian visual area (PLLS) by Manger et al.
(2008). However, because these regions of the sulcal cortex lie immediately adjacent to the primary
auditory fields, it is much more likely that they are multisensory in nature.
artificial stimuli presented under anesthesia. Sensitivity to visual stimulation was defined as a statisti-
cally significant change in spiking activity after the presentation of light flashes from a light-emitting
diode (LED) positioned in the contralateral hemifield or by a significant modulation of the response to
auditory stimulation even if the LED by itself was apparently ineffective in driving the neuron.
Although the majority of neurons recorded in the auditory cortex were classified as auditory alone,
the activity of more than one quarter was found to be influenced by visual stimulation. Figure 3.2a
shows the relative proportion of different response types observed in the auditory cortex as a whole.
FIGURE 3.2 Visual–auditory interactions in ferret auditory cortex. (a) Proportion of neurons (n = 716) that
responded to contralaterally presented noise bursts (auditory), to light flashes from an LED positioned in the
contralateral visual field (visual), to both of these stimuli (AV), or whose responses to the auditory stimulus
were modulated by the presentation of the visual stimulus, which did not itself elicit a response (AVmod).
(b) Bar graph showing the relative proportions of unisensory auditory (white), unisensory visual (black), and
bisensory (gray) neurons recorded in each auditory field. The actual numbers of neurons recorded are given at
the top of each column. (c) Proportion of neurons whose spike rates in response to combined visual–auditory
stimulation were enhanced or suppressed. Total number of bisensory neurons in each field: A1, n = 9; AAF,
n = 16; PPF, n = 13; PSF, n = 32; ADF, n = 32; AVF, n = 24. (d) Distribution of mutual information (MI) values
obtained when two reduced spike statistics were used: spike count and mean spike latency. Points above the
unity line indicate that mean response latency was more informative about the stimulus than spike count. This
was increasingly the case for all three stimulus conditions when the spike counts were low. (Data adapted from Bizley, J.K. et al., Cereb. Cortex, 17, 2172–89, 2007, and Bizley, J.K., and King, A.J., Hearing Res., 258, 55–63, 2009.)
Bisensory neurons comprised both those neurons whose spiking responses were altered by auditory
and visual stimuli and those whose auditory response was modulated by the simultaneously presented
visual stimulus. The fact that visual stimuli can drive spiking activity in the auditory cortex has also
been described in highly trained monkeys (Brosch et al. 2005). Nevertheless, this finding is unusual, as
most reports emphasize the modulatory nature of nonauditory inputs on the cortical responses to sound
(Ghazanfar 2009; Musacchia and Schroeder 2009). At least part of the explanation for this is likely to
be that we analyzed our data by calculating the mutual information between the neural responses and
the stimuli that elicited them. Information (in bits) was estimated by taking into account the temporal
pattern of the response rather than simply the overall spike count. This method proved to be substan-
tially more sensitive than a simple spike count measure, and allowed us to detect subtle, but nonetheless
significant, changes in the neural response produced by the presence of the visual stimulus.
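The idea behind this comparison can be sketched in a few lines of Python. The toy plug-in estimator below is purely illustrative: the stimulus labels, response symbols, and trial numbers are hypothetical, and the published analysis reduced each response to richer temporal-pattern statistics. It simply shows how two stimuli that evoke identical spike counts can nevertheless be discriminated perfectly once spike timing is taken into account.

```python
from collections import Counter
from math import log2

def mutual_information(stimuli, responses):
    """Plug-in estimate of I(S; R) in bits from paired discrete samples.

    Each response has already been reduced to a symbol, e.g. a binned
    spike pattern ("10" = early spike, "01" = late spike) or a plain
    spike count.
    """
    n = len(stimuli)
    p_joint = Counter(zip(stimuli, responses))   # joint counts over trials
    p_s = Counter(stimuli)                       # stimulus marginals
    p_r = Counter(responses)                     # response marginals
    mi = 0.0
    for (s, r), c in p_joint.items():
        p_sr = c / n
        mi += p_sr * log2(p_sr / ((p_s[s] / n) * (p_r[r] / n)))
    return mi

# Toy data: two stimuli evoke identical spike counts but different
# temporal patterns (hypothetical numbers, for illustration only).
stims = [0] * 100 + [1] * 100
patterns = ["10"] * 100 + ["01"] * 100   # early vs late spike
counts = [1] * 200                        # one spike on every trial

print(mutual_information(stims, patterns))  # 1.0 bit: timing is informative
print(mutual_information(stims, counts))    # 0.0 bits: count alone is blind
```

In this contrived case the temporal pattern carries 1 bit about the stimulus while the spike count carries none, which is the sense in which a pattern-based information measure can be more sensitive than a count-based one.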
Although neurons exhibiting visual–auditory interactions are found in all six areas of the ferret
cortex, the proportion of such neurons varies in different cortical areas (Figure 3.2b). Perhaps not
surprisingly, visual influences are least common in the primary areas, A1 and AAF. Nevertheless,
approximately 20% of the neurons recorded in those regions were found to be sensitive to visual
stimulation, and even included some unisensory visual responses. In the fields on the posterior
ectosylvian gyrus and ADF, 40% to 50% of the neurons were found to be sensitive to visual stimuli.
This rose to 75% in AVF, which, as described in Section 3.3.1, should probably be regarded as a
multisensory rather than as a predominantly auditory area.
We found that visual stimulation could either enhance or suppress the neurons’ response to sound
and, in some cases, increased the precision in their spike timing without changing the overall firing
rate (Bizley et al. 2007). Analysis of all bisensory neurons, including both neurons in which there
was a spiking response to each sensory modality and those in which concurrent auditory–visual
stimulation modulated the response to sound alone, revealed that nearly two-thirds produced stron-
ger responses to bisensory than to unisensory auditory stimulation. Figure 3.2c shows the propor-
tion of response types in each cortical field. Although the sample size in some areas was quite small,
the relative proportions of spiking responses that were either enhanced or suppressed varied across
the auditory cortex. Apart from the interactions in A1, the majority of the observed interactions
were facilitatory rather than suppressive.
Although a similar trend for a greater proportion of sites to show enhancement as compared with
suppression has been reported for local field potential data in monkey auditory cortex, analysis of
spiking responses revealed that suppressive interactions are more common (Kayser et al. 2008).
This trend was found across four different categories of naturalistic and artificial stimuli, so the
difference in the proportion of facilitatory and suppressive interactions is unlikely to reflect the
use of different stimuli in the two studies. By systematically varying onset asynchronies between
the visual and auditory stimuli, we did observe in a subset of neurons that visual stimuli could
have suppressive effects when presented 100 to 200 ms before the auditory stimuli, which were not
apparent when the two modalities were presented simultaneously (Bizley et al. 2007). This finding,
along with the results of several other studies (Meredith et al. 2006; Dehner et al. 2004; Allman et
al. 2008), emphasizes the importance of using an appropriate combination of stimuli to reveal the
presence and nature of cross-modal interactions.
Examination of the magnitude of cross-modal facilitation in ferret auditory cortex showed that
visual–auditory interactions are predominantly sublinear. In other words, both the mutual informa-
tion values (in bits) and the spike rates in response to combined auditory–visual stimulation are
generally less than the linear sum of the responses to the auditory and visual stimuli presented in
isolation, although some notable exceptions to this have been found (e.g., Figure 2E, F of Bizley et
al. 2007). This is unsurprising as the stimulus levels used in that study were well above threshold
and, according to the “inverse effectiveness principle” (Stein et al. 1988), were unlikely to produce
supralinear responses to combined visual–auditory stimulation. Consistent with this is the observa-
tion of Kayser et al. (2008), showing that, across stimulus types, multisensory facilitation is more
common for those stimuli that are least effective in driving the neurons.
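The classification of interactions as sublinear or supralinear amounts to comparing the bisensory response with the linear sum of the two unisensory responses. A minimal sketch, using hypothetical response values rather than any formula taken from the studies cited above:

```python
def additivity_index(auditory, visual, bisensory):
    """Ratio of the bisensory response to the linear sum of the
    unisensory responses (spikes/trial or bits, as appropriate).

    A value below 1 indicates sublinear integration; above 1,
    supralinear integration.
    """
    linear_sum = auditory + visual
    if linear_sum <= 0:
        raise ValueError("need a positive unisensory response to compare against")
    return bisensory / linear_sum

# Hypothetical neuron driven well above threshold: the combined
# response is enhanced relative to sound alone (18 > 12) yet
# sublinear relative to the summed unisensory responses (18 < 21).
print(additivity_index(12.0, 9.0, 18.0))  # ≈ 0.857 -> sublinear
```

Note that, as in the example, a response can be facilitatory (larger than either unisensory response) and still sublinear, which is the pattern most commonly observed in these data.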
As mentioned above, estimates of the mutual information between the neural responses and each
of the stimuli that produce them take into account the full spike discharge pattern. It is then possible
to isolate the relative contributions of spike number and spike timing to the neurons’ sensitivity to
multisensory stimulation. It has previously been demonstrated in both ferret and cat auditory cortex
that the stimulus information contained in the complete spike pattern is conveyed by a combination
of spike count and mean spike latency (Nelken et al. 2005). By carrying out a similar analysis of
the responses to the brief stimuli used to characterize visual–auditory interactions in ferret auditory
cortex, we found that more than half the neurons transmitted more information in the timing of their
responses than in their spike counts (Bizley et al. 2007). This is in agreement with the results of
Nelken et al. (2005) for different types of auditory stimuli. We found that this was equally the case for
unisensory auditory or visual stimuli and for combined visual–auditory stimulation (Figure 3.2d).
FIGURE 3.3 Box plots displaying the amount of information transmitted by neurons in each of five ferret
cortical fields about LED location (a), sound-source location (b), or the location of temporally and spatially
congruent auditory–visual stimuli (c). Only neurons for which there was a significant unisensory visual or
auditory response are plotted in (a) and (b), respectively, whereas (c) shows the multisensory mutual informa-
tion values for all neurons recorded, irrespective of their response to unisensory stimulation. The box plots
show the median (horizontal bar), interquartile range (boxes), spread of data (tails), and outliers (cross sym-
bols). The notch indicates the distribution of data about the median. There were significant differences in the
mutual information values in different cortical fields (Kruskal–Wallis test; LED location, p = .0001; auditory
location, p = .0035; bisensory stimulus location, p < .0001). Significant post hoc pairwise differences (Tukey–
Kramer test, p < .05) between individual cortical fields are shown by the lines above each box plot. Note that
neurons in ADF transmitted the most spatial information irrespective of stimulus modality. (Adapted with
permission from Bizley, J.K., and King, A.J., Brain Res., 1242, 24–36, 2008.)
conditions, spatial sensitivity was found to be highest in ADF, supporting the notion that there is
some functional segregation across the auditory cortex, with the anterior fields more involved in
spatial processing. Relative to the responses to sound alone, the provision of spatially coincident
visual cues frequently altered the amount of information conveyed by the neurons about stimulus
location. Bisensory stimulation reduced the spatial information in the response in one third of these
cases, but increased it in the remaining two thirds. Thus, overall, visual inputs to the auditory cortex
appear to enhance spatial processing.
Because of the simple stimuli that were used in these studies, it was not possible to determine
whether or how visual inputs might affect the processing of nonspatial information in ferret auditory
cortex. However, a number of studies in primates have emphasized the benefits of visual influences
on auditory cortex in terms of the improved perception of vocalizations. In humans, lip reading
has been shown to activate the auditory cortex (Molholm et al. 2002; Giard and Peronnet 1999;
Calvert et al. 1999), and a related study in macaques has shown that presenting a movie of a mon-
key vocalizing can modulate the auditory cortical responses to that vocalization (Ghazanfar et al.
2005). These effects were compared to a visual control condition in which the monkey viewed a
disk that was flashed on and off to approximate the movements of the animal’s mouth. In that study,
the integration of face and voice stimuli was found to be widespread in both core and belt areas of
the auditory cortex. However, to generate response enhancement, a greater proportion of recording
sites in the belt areas required the use of a real monkey face, whereas nonselective modulation of
auditory cortical responses was more common in the core areas. Because a number of cortical areas
have now been shown to exhibit comparable sensitivity to monkey calls (Recanzone 2008), it would
be of considerable interest to compare the degree to which face and non-face visual stimuli can
modulate the activity of the neurons found there. This should help us determine the relative extent
to which each area might be specialized for processing communication signals.
These data revealed a clear projection pattern whereby specific visual cortical fields innervate spe-
cific auditory fields. A sparse direct projection exists from V1 to the core auditory cortex (A1 and
AAF), which originates from the region of V1 that represents the peripheral visual field. This find-
ing mirrors that of the reciprocal A1 to V1 projection in monkeys and cats, which terminates in
the peripheral field representation of V1 (Rockland and Ojima 2003; Falchier et al. 2002; Hall and
Lomber 2008). Ferret A1 and AAF are also weakly innervated by area V2. The posterior auditory
fields, PPF and PSF, are innervated principally by areas 20a and 20b, thought to be part of the visual
form-processing pathway (Manger et al. 2004). In contrast, the largest inputs to the anterior fields,
ADF and AVF, come from SSY, which is regarded as part of the visual “where” processing stream
(Philipp et al. 2006).
Interestingly, this difference in the sources of cortical visual input, which is summarized in
Figure 3.1d, appears to reflect the processing characteristics of the auditory cortical fields con-
cerned. As described above, the fields on the posterior ectosylvian gyrus are more sensitive to
pitch and timbre, parameters that contribute to the identification of a sound source, whereas spatial
sensitivity for auditory, visual, and multisensory stimuli is greatest in ADF (Figure 3.3). This func-
tional distinction therefore matches the putative roles of the extrastriate areas that provide the major
sources of cortical visual input to each of these regions.
These studies appear to support the notion of a division of labor across the nonprimary areas
of ferret auditory cortex, but it would be premature to conclude that distinct fields are responsible
for the processing of spatial and nonspatial features of the world. Thus, although PSF is innervated
by nonspatial visual processing areas 20a and 20b (Figure 3.1c), the responses of a particularly
large number of neurons found there show an increase in transmitted spatial information when a
spatially congruent visual stimulus is added to the auditory stimulus (Bizley and King 2008). This
could be related to a need to integrate spatial and nonspatial cues when representing objects and
events in the auditory cortex. The possibility that connections between the visual motion-sensi-
tive area SSY and the fields on the anterior ectosylvian gyrus are involved in processing spatial
information provided by different sensory modalities is supported by a magnetoencephalography
study in humans showing that audio–visual motion signals are integrated in the auditory cortex
(Zvyagintsev et al. 2009). However, we must not forget that visual motion also plays a key role in
the perception of communication calls. By making intracranial recordings in epileptic patients,
Besle et al. (2008) found that the visual cues produced by lip movements activate MT followed,
approximately 10 ms later, by secondary auditory areas, where they alter the responses to sound in
ways that presumably influence speech perception. Thus, although the influence of facial expres-
sions on auditory cortical neurons is normally attributed to feedback from the superior temporal
sulcus (Ghazanfar et al. 2008), the availability of lower-level visual signals that provide cues to
sound onset and offset may be important as well.
relies on both vocal calls and facial gestures. The role of multisensory processing in receptive audi-
tory communication is considered in more detail in other chapters in this volume. Here, we will
focus on the consequences of merging spatial information across different sensory modalities in
the auditory cortex.
with a straightforward transformation into eye-centered coordinates. Rather, spatial tuning seems to
take on an intermediate form between eye-centered and head-centered coordinates (Werner-Reiss
et al. 2003).
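One way to picture such an "intermediate" reference frame is as partial compensation for eye position: the neuron's preferred sound location, expressed in head-centered coordinates, shifts by some fraction of the eye displacement, with 0 corresponding to purely head-centered and 1 to purely eye-centered tuning. The sketch below is a toy illustration; the function and the 0.5 default are hypothetical, not values reported by Werner-Reiss et al.:

```python
def predicted_peak(head_centered_loc_deg, eye_pos_deg, compensation=0.5):
    """Predicted tuning peak (deg, head-centered coordinates) for a neuron
    whose reference frame lies between head-centered (compensation=0) and
    eye-centered (compensation=1): the tuning curve shifts by a fraction
    of the eye displacement. The 0.5 default is illustrative only."""
    return head_centered_loc_deg - compensation * eye_pos_deg

# A fully eye-centered neuron shifts its preferred location by the whole
# eye deviation; an intermediate neuron shifts by only part of it.
print(predicted_peak(20.0, 10.0, compensation=1.0))  # 10.0
print(predicted_peak(20.0, 10.0, compensation=0.0))  # 20.0
print(predicted_peak(20.0, 10.0))                    # 15.0
```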
of auditory space (Aytekin et al. 2008). Although vision is not essential for the recalibration of
auditory space in monaurally occluded ferrets, it is certainly possible that training with congruent
multisensory cues might result in faster learning than that seen with auditory cues alone, as shown
in humans for a motion detection task (Kim et al. 2008).
REFERENCES
Adriani, M., P. Maeder, R. Meuli et al. 2003. Sound recognition and localization in man: Specialized cortical
networks and effects of acute circumscribed lesions. Experimental Brain Research 153:591–604.
Alain, C., S.R. Arnott, S. Hevenor, S. Graham, and C.L. Grady. 2001. “What” and “where” in the human
auditory system. Proceedings of the National Academy of Sciences of the United States of America
98:12301–6.
Alais, D., and D. Burr. 2004. The ventriloquist effect results from near-optimal bimodal integration. Current
Biology 14:257–62.
Allman, B.L., L.P. Keniston, and M.A. Meredith. 2008. Subthreshold auditory inputs to extrastriate visual
neurons are responsive to parametric changes in stimulus quality: Sensory-specific versus non-specific
coding. Brain Research 1242:95–101.
Allman, B.L., L.P. Keniston, and M.A. Meredith. 2009. Adult deafness induces somatosensory conversion of
ferret auditory cortex. Proceedings of the National Academy of Sciences of the United States of America
106:5925–30.
Aytekin, M., C.F. Moss, and J.Z. Simon. 2008. A sensorimotor approach to sound localization. Neural
Computation 20:603–35.
Bajo, V.M., F.R. Nodal, J.K. Bizley, and A.J. King. 2010. The non-lemniscal auditory cortex in ferrets:
Convergence of corticotectal inputs in the superior colliculus. Frontiers in Neuroanatomy 4:18.
Barrett, D.J., and D.A. Hall. 2006. Response preferences for “what” and “where” in human non-primary audi-
tory cortex. NeuroImage 32:968–77.
Beer, A.L., and T. Watanabe. 2009. Specificity of auditory-guided visual perceptual learning suggests cross-
modal plasticity in early visual cortex. Experimental Brain Research 198:353–61.
Bendor, D., and X. Wang. 2005. The neuronal representation of pitch in primate auditory cortex. Nature
436:1161–5.
Bendor, D., and X. Wang. 2008. Neural response properties of primary, rostral, and rostrotemporal core fields
in the auditory cortex of marmoset monkeys. Journal of Neurophysiology 100:888–906.
Auditory Cortex according to Multisensory Processing 45
Bertelson, P., and M. Radeau. 1981. Cross-modal bias and perceptual fusion with auditory-visual spatial dis-
cordance. Perception & Psychophysics 29:578–84.
Besle, J., C. Fischer, A. Bidet-Caulet, F. Lecaignard, O. Bertrand, and M.H. Giard. 2008. Visual activation
and audiovisual interactions in the auditory cortex during speech perception: Intracranial recordings in
humans. Journal of Neuroscience 28:14301–10.
Bizley, J.K., and A.J. King. 2008. Visual-auditory spatial processing in auditory cortical neurons. Brain
Research 1242:24–36.
Bizley, J.K., and A.J. King. 2009. Visual influences on ferret auditory cortex. Hearing Research 258:55–63.
Bizley, J.K., F.R. Nodal, I. Nelken, and A.J. King. 2005. Functional organization of ferret auditory cortex.
Cerebral Cortex 15:1637–53.
Bizley, J.K., F.R. Nodal, V.M. Bajo, I. Nelken, and A.J. King. 2007. Physiological and anatomical evidence for
multisensory interactions in auditory cortex. Cerebral Cortex 17:2172–89.
Bizley, J.K., K.M. Walker, B.W. Silverman, A.J. King, and J.W. Schnupp. 2009. Interdependent encoding of
pitch, timbre, and spatial location in auditory cortex. Journal of Neuroscience 29:2064–75.
Bizley, J.K., K.M. Walker, A.J. King, and J.W. Schnupp. 2010. Neural ensemble codes for stimulus period-
icity in auditory cortex. Journal of Neuroscience 30:5078–91.
Bonath, B., T. Noesselt, A. Martinez et al. 2007. Neural basis of the ventriloquist illusion. Current Biology
17:1697–703.
Brosch, M., E. Selezneva, and H. Scheich. 2005. Nonauditory events of a behavioral procedure activate audi-
tory cortex of highly trained monkeys. Journal of Neuroscience 25:6797–806.
Budinger, E., P. Heil, A. Hess, and H. Scheich. 2006. Multisensory processing via early cortical stages:
Connections of the primary auditory cortical field with other sensory systems. Neuroscience 143:
1065–83.
Budinger, E., A. Laszcz, H. Lison, H. Scheich, and F.W. Ohl. 2008. Non-sensory cortical and subcortical con-
nections of the primary auditory cortex in Mongolian gerbils: Bottom-up and top-down processing of
neuronal information via field AI. Brain Research 1220:2–32.
Cahill, L., F. Ohl, and H. Scheich. 1996. Alteration of auditory cortex activity with a visual stimulus through
conditioning: A 2-deoxyglucose analysis. Neurobiology of Learning and Memory 65:213–22.
Calvert, G.A., and T. Thesen. 2004. Multisensory integration: Methodological approaches and emerging prin-
ciples in the human brain. Journal of Physiology, Paris 98:191–205.
Calvert, G.A., M.J. Brammer, E.T. Bullmore, R. Campbell, S.D. Iversen, and A.S. David. 1999. Response
amplification in sensory-specific cortices during crossmodal binding. Neuroreport 10:2619–23.
Campi, K.L., K.L. Bales, R. Grunewald, and L. Krubitzer. 2010. Connections of auditory and visual cortex in
the prairie vole (Microtus ochrogaster): Evidence for multisensory processing in primary sensory areas.
Cerebral Cortex 20:89–108.
Cantone, G., J. Xiao, and J.B. Levitt. 2006. Retinotopic organization of ferret suprasylvian cortex. Visual
Neuroscience 23:61–77.
Cappe, C., A. Morel, P. Barone, and E.M. Rouiller. 2009. The thalamocortical projection systems in primate:
An anatomical support for multisensory and sensorimotor interplay. Cerebral Cortex 19:2025–37.
Dehner, L.R., L.P. Keniston, H.R. Clemo, and M.A. Meredith. 2004. Cross-modal circuitry between auditory
and somatosensory areas of the cat anterior ectosylvian sulcal cortex: A ‘new’ inhibitory form of multi-
sensory convergence. Cerebral Cortex 14:387–403.
Falchier, A., S. Clavagnier, P. Barone, and H. Kennedy. 2002. Anatomical evidence of multimodal integration
in primate striate cortex. Journal of Neuroscience 22:5749–59.
Frissen, I., J. Vroomen, B. De Gelder, and P. Bertelson. 2005. The aftereffects of ventriloquism: Generalization
across sound-frequencies. Acta Psychologica 118:93–100.
Fu, K.M., A.S. Shah, M.N. O’Connell et al. 2004. Timing and laminar profile of eye-position effects on audi-
tory responses in primate auditory cortex. Journal of Neurophysiology 92:3522–31.
Ghazanfar, A.A. 2009. The multisensory roles for auditory cortex in primate vocal communication. Hearing
Research 258:113–20.
Ghazanfar, A.A., and C.E. Schroeder. 2006. Is neocortex essentially multisensory? Trends in Cognitive Sciences
10:278–85.
Ghazanfar, A.A., J.X. Maier, K.L. Hoffman, and N.K. Logothetis. 2005. Multisensory integration of dynamic
faces and voices in rhesus monkey auditory cortex. Journal of Neuroscience 25:5004–12.
Ghazanfar, A.A., C. Chandrasekaran, and N.K. Logothetis. 2008. Interactions between the superior tempo-
ral sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys. Journal of
Neuroscience 28:4457–69.
46 The Neural Bases of Multisensory Processes
Giard, M.H., and F. Peronnet. 1999. Auditory-visual integration during multimodal object recognition in
humans: A behavioral and electrophysiological study. Journal of Cognitive Neuroscience 11:473–90.
Goodale, M.A., and D.A. Westwood. 2004. An evolving view of duplex vision: Separate but interacting cortical
pathways for perception and action. Current Opinion in Neurobiology 14:203–11.
Griffiths, T.D., J.D. Warren, S.K. Scott, I. Nelken, and A.J. King. 2004. Cortical processing of complex sound:
A way forward? Trends in Neurosciences 27:181–5.
Groh, J.M., A.S. Trause, A.M. Underhill, K.R. Clark, and S. Inati. 2001. Eye position influences auditory
responses in primate inferior colliculus. Neuron 29:509–18.
Hackett, T.A., I. Stepniewska, and J.H. Kaas. 1999. Prefrontal connections of the parabelt auditory cortex in
macaque monkeys. Brain Research 817:45–58.
Hackett, T.A., L.A. De La Mothe, I. Ulbert, G. Karmos, J. Smiley, and C.E. Schroeder. 2007a. Multisensory
convergence in auditory cortex: II. Thalamocortical connections of the caudal superior temporal plane.
Journal of Comparative Neurology 502:924–52.
Hackett, T.A., J.F. Smiley, I. Ulbert et al. 2007b. Sources of somatosensory input to the caudal belt areas of
auditory cortex. Perception 36:1419–30.
Hall, A.J., and S.G. Lomber. 2008. Auditory cortex projections target the peripheral field representation of
primary visual cortex. Experimental Brain Research 190:413–30.
Hall, D.A., and C.J. Plack. 2009. Pitch processing sites in the human auditory brain. Cerebral Cortex
19:576–85.
Harrington, I.A., G.C. Stecker, E.A. Macpherson, and J.C. Middlebrooks. 2008. Spatial sensitivity of neurons
in the anterior, posterior, and primary fields of cat auditory cortex. Hearing Research 240:22–41.
Hartline, P.H., R.L. Vimal, A.J. King, D.D. Kurylo, and D.P. Northmore. 1995. Effects of eye position on audi-
tory localization and neural representation of space in superior colliculus of cats. Experimental Brain
Research 104:402–8.
Imaizumi, K., N.J. Priebe, P.A. Crum, P.H. Bedenbaugh, S.W. Cheung, and C.E. Schreiner. 2004. Modular
functional organization of cat anterior auditory field. Journal of Neurophysiology 92:444–57.
Jay, M.F., and D.L. Sparks. 1987. Sensorimotor integration in the primate superior colliculus: II. Coordinates
of auditory signals. Journal of Neurophysiology 57:35–55.
Jenison, R.L. 2000. Correlated cortical populations can enhance sound localization performance. Journal of the
Acoustical Society of America 107:414–21.
Kacelnik, O., F.R. Nodal, C.H. Parsons, and A.J. King. 2006. Training-induced plasticity of auditory localiza-
tion in adult mammals. PLoS Biology 4:627–38.
Kayser, C., C.I. Petkov, M. Augath, and N.K. Logothetis. 2007. Functional imaging reveals visual modulation
of specific fields in auditory cortex. Journal of Neuroscience 27:1824–35.
Kayser, C., C.I. Petkov, and N.K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral
Cortex 18:1560–74.
Keniston, L.P., B.L. Allman, and M.A. Meredith. 2008. The rostral suprasylvian sulcus (RSSS) of the ferret: A
‘new’ multisensory area. Society for Neuroscience Abstracts 38:457.10.
Keniston, L.P., B.L. Allman, M.A. Meredith, and H.R. Clemo. 2009. Somatosensory and multisensory properties
of the medial bank of the ferret rostral suprasylvian sulcus. Experimental Brain Research 196:239–51.
Kim, R.S., A.R. Seitz, and L. Shams. 2008. Benefits of stimulus congruency for multisensory facilitation of
visual learning. PLoS ONE 3:e1532.
King, A.J. 2009. Visual influences on auditory spatial learning. Philosophical Transactions of the Royal Society
of London. Series B, Biological Sciences 364:331–9.
King, A.J., and M.E. Hutchings. 1987. Spatial response properties of acoustically responsive neurons in the
superior colliculus of the ferret: A map of auditory space. Journal of Neurophysiology 57:596–624.
King, A.J., and I. Nelken. 2009. Unraveling the principles of auditory cortical processing: Can we learn from
the visual system? Nature Neuroscience 12:698–701.
King, A.J., and J.C. Middlebrooks. 2011. Cortical representation of auditory space. In The Auditory Cortex,
eds. J.A. Winer and C.E. Schreiner, 329–41. New York: Springer.
King, A.J., J.W. Schnupp, and T.P. Doubell. 2001. The shape of ears to come: Dynamic coding of auditory
space. Trends in Cognitive Sciences 5:261–70.
Las, L., A.H. Shapira, and I. Nelken. 2008. Functional gradients of auditory sensitivity along the anterior ecto-
sylvian sulcus of the cat. Journal of Neuroscience 28:3657–67.
Lewald, J. 2002. Rapid adaptation to auditory–visual spatial disparity. Learning and Memory 9:268–78.
Loftus, W.C., and M.L. Sutter. 2001. Spectrotemporal organization of excitatory and inhibitory receptive fields
of cat posterior auditory field neurons. Journal of Neurophysiology 86:475–91.
Lomber, S.G., and S. Malhotra. 2008. Double dissociation of ‘what’ and ‘where’ processing in auditory cortex.
Nature Neuroscience 11:609–16.
Maeder, P.P., R.A. Meuli, M. Adriani et al. 2001. Distinct pathways involved in sound recognition and localiza-
tion: A human fMRI study. NeuroImage 14:802–16.
Malhotra, S., and S.G. Lomber. 2007. Sound localization during homotopic and heterotopic bilateral cooling deacti-
vation of primary and nonprimary auditory cortical areas in the cat. Journal of Neurophysiology 97:26–43.
Manger, P.R., I. Masiello, and G.M. Innocenti. 2002. Areal organization of the posterior parietal cortex of the
ferret (Mustela putorius). Cerebral Cortex 12:1280–97.
Manger, P.R., H. Nakamura, S. Valentiniene, and G.M. Innocenti. 2004. Visual areas in the lateral temporal
cortex of the ferret (Mustela putorius). Cerebral Cortex 14:676–89.
Manger, P.R., G. Engler, C.K. Moll, and A.K. Engel. 2005. The anterior ectosylvian visual area of the fer-
ret: A homologue for an enigmatic visual cortical area of the cat? European Journal of Neuroscience
22:706–14.
Manger, P.R., G. Engler, C.K. Moll, and A.K. Engel. 2008. Location, architecture, and retinotopy of the antero-
medial lateral suprasylvian visual area (AMLS) of the ferret (Mustela putorius). Visual Neuroscience
25:27–37.
McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature 264:746–8.
McLaughlin, D.F., R.V. Sonty, and S.L. Juliano. 1998. Organization of the forepaw representation in ferret
somatosensory cortex. Somatosensory & Motor Research 15:253–68.
Meredith, M.A., L.R. Keniston, L.R. Dehner, and H.R. Clemo. 2006. Crossmodal projections from somatosen-
sory area SIV to the auditory field of the anterior ectosylvian sulcus (FAES) in cat: Further evidence for
subthreshold forms of multisensory processing. Experimental Brain Research 172:472–84.
Merigan, W.H., and J.H. Maunsell. 1993. How parallel are the primate visual pathways? Annual Review of
Neuroscience 16:369–402.
Middlebrooks, J.C., and E.I. Knudsen. 1984. A neural code for auditory space in the cat’s superior colliculus.
Journal of Neuroscience 4:2621–34.
Molholm, S., W. Ritter, M.M. Murray, D.C. Javitt, C.E. Schroeder, and J.J. Foxe. 2002. Multisensory auditory-
visual interactions during early sensory processing in humans: A high-density electrical mapping study.
Brain Research Cognitive Brain Research 14:115–28.
Musacchia, G., and C.E. Schroeder. 2009. Neuronal mechanisms, response dynamics and perceptual functions
of multisensory interactions in auditory cortex. Hearing Research 258:72–9.
Nakamoto, K.T., S.J. Jones, and A.R. Palmer. 2008. Descending projections from auditory cortex modulate
sensitivity in the midbrain to cues for spatial position. Journal of Neurophysiology 99:2347–56.
Nelken, I., G. Chechik, T.D. Mrsic-Flogel, A.J. King, and J.W. Schnupp. 2005. Encoding stimulus informa-
tion by spike numbers and mean response time in primary auditory cortex. Journal of Computational
Neuroscience 19:199–221.
Nelken, I., J.K. Bizley, F.R. Nodal, B. Ahmed, A.J. King, and J.W. Schnupp. 2008. Responses of auditory
cortex to complex stimuli: Functional organization revealed using intrinsic optical signals. Journal of
Neurophysiology 99:1928–41.
Passamonti, C., C. Bertini, and E. Ladavas. 2009. Audio-visual stimulation improves oculomotor patterns in
patients with hemianopia. Neuropsychologia 47:546–55.
Philipp, R., C. Distler, and K.P. Hoffmann. 2006. A motion-sensitive area in ferret extrastriate visual cortex: An
analysis in pigmented and albino animals. Cerebral Cortex 16:779–90.
Phillips, D.P., and S.S. Orman. 1984. Responses of single neurons in posterior field of cat auditory cortex to
tonal stimulation. Journal of Neurophysiology 51:147–63.
Radeau, M., and P. Bertelson. 1974. The after-effects of ventriloquism. Quarterly Journal of Experimental
Psychology 26:63–71.
Ramsay, A.M., and M.A. Meredith. 2004. Multiple sensory afferents to ferret pseudosylvian sulcal cortex.
Neuroreport 15:461–5.
Rauschecker, J.P., and B. Tian. 2000. Mechanisms and streams for processing of “what” and “where” in auditory
cortex. Proceedings of the National Academy of Sciences of the United States of America 97:11800–6.
Recanzone, G.H. 1998. Rapidly induced auditory plasticity: The ventriloquism aftereffect. Proceedings of the
National Academy of Sciences of the United States of America 95:869–75.
Recanzone, G.H. 2000. Spatial processing in the auditory cortex of the macaque monkey. Proceedings of the
National Academy of Sciences of the United States of America 97:11829–35.
Recanzone, G.H. 2008. Representation of con-specific vocalizations in the core and belt areas of the auditory
cortex in the alert macaque monkey. Journal of Neuroscience 28:13184–93.
Redies, C., M. Diksic, and H. Riml. 1990. Functional organization in the ferret visual cortex: A double-label
2-deoxyglucose study. Journal of Neuroscience 10:2791–803.
Renier, L.A., I. Anurova, A.G. De Volder, S. Carlson, J. Vanmeter, and J.P. Rauschecker. 2009. Multisensory
integration of sounds and vibrotactile stimuli in processing streams for “what” and “where.” Journal of
Neuroscience 29:10950–60.
Rice, F.L., C.M. Gomez, S.S. Leclerc, R.W. Dykes, J.S. Moon, and K. Pourmoghadam. 1993. Cytoarchitecture
of the ferret suprasylvian gyrus correlated with areas containing multiunit responses elicited by stimula-
tion of the face. Somatosensory & Motor Research 10:161–88.
Rockland, K.S., and H. Ojima. 2003. Multisensory convergence in calcarine visual areas in macaque monkey.
International Journal of Psychophysiology 50:19–26.
Romanski, L.M., B. Tian, J. Fritz, M. Mishkin, P.S. Goldman-Rakic, and J.P. Rauschecker. 1999. Dual streams
of auditory afferents target multiple domains in the primate prefrontal cortex. Nature Neuroscience
2:1131–6.
Schreiner, C.E., and M.S. Cynader. 1984. Basic functional organization of second auditory cortical field (AII)
of the cat. Journal of Neurophysiology 51:1284–305.
Schroeder, C.E., and J.J. Foxe. 2002. The timing and laminar profile of converging inputs to multisensory areas
of the macaque neocortex. Brain Research Cognitive Brain Research 14:187–98.
Schroeder, C.E., P. Lakatos, Y. Kajikawa, S. Partan, and A. Puce. 2008. Neuronal oscillations and visual ampli-
fication of speech. Trends in Cognitive Sciences 12:106–13.
Shelton, B.R., and C.L. Searle. 1980. The influence of vision on the absolute identification of sound-source
position. Perception & Psychophysics 28:589–96.
Smiley, J.F., T.A. Hackett, I. Ulbert et al. 2007. Multisensory convergence in auditory cortex, I. Cortical con-
nections of the caudal superior temporal plane in macaque monkeys. Journal of Comparative Neurology
502:894–923.
Stecker, G.C., B.J. Mickey, E.A. Macpherson, and J.C. Middlebrooks. 2003. Spatial sensitivity in field PAF of
cat auditory cortex. Journal of Neurophysiology 89:2889–903.
Stein, B.E., and T.R. Stanford. 2008. Multisensory integration: Current issues from the perspective of the
single neuron. Nature Reviews Neuroscience 9:255–66.
Stein, B.E., W.S. Huneycutt, and M.A. Meredith. 1988. Neurons and behavior: The same rules of multisensory
integration apply. Brain Research 448:355–8.
Stein, B.E., M.A. Meredith, W.S. Huneycutt, and L. McDade. 1989. Behavioral indices of multisensory inte-
gration: Orientation of visual cues is affected by auditory stimuli. Journal of Cognitive Neuroscience
1:12–24.
Sumby, W.H., and I. Pollack. 1954. Visual contribution to speech intelligibility in noise. Journal of the
Acoustical Society of America 26:212–15.
Thomas, H., J. Tillein, P. Heil, and H. Scheich. 1993. Functional organization of auditory cortex in the
Mongolian gerbil (Meriones unguiculatus). I. Electrophysiological mapping of frequency representation
and distinction of fields. European Journal of Neuroscience 5:882–97.
Tian, B., D. Reser, A. Durham, A. Kustov, and J.P. Rauschecker. 2001. Functional specialization in rhesus
monkey auditory cortex. Science 292:290–3.
Wallace, M.T., R. Ramachandran, and B.E. Stein. 2004. A revised view of sensory cortical parcellation.
Proceedings of the National Academy of Sciences of the United States of America 101:2167–72.
Warren, J.D., and T.D. Griffiths. 2003. Distinct mechanisms for processing spatial sequences and pitch
sequences in the human auditory brain. Journal of Neuroscience 23:5799–804.
Werner-Reiss, U., K.A. Kelly, A.S. Trause, A.M. Underhill, and J.M. Groh. 2003. Eye position affects activity
in primary auditory cortex of primates. Current Biology 13:554–62.
Woods, T.M., and G.H. Recanzone. 2004. Visually induced plasticity of auditory spatial perception in macaques.
Current Biology 14:1559–64.
Woods, T.M., S.E. Lopez, J.H. Long, J.E. Rahman, and G.H. Recanzone. 2006. Effects of stimulus azimuth
and intensity on the single-neuron activity in the auditory cortex of the alert macaque monkey. Journal
of Neurophysiology 96:3323–37.
Wright, B.A., and Y. Zhang. 2006. A review of learning with normal and altered sound-localization cues in
human adults. International Journal of Audiology 45 (Suppl. 1): S92–8.
Zvyagintsev, M., A.R. Nikolaev, H. Thonnessen, O. Sachs, J. Dammers, and K. Mathiak. 2009. Spatially con-
gruent visual motion modulates activity of the primary auditory cortex. Experimental Brain Research
198:391–402.
Section II
Neurophysiological Bases
4 Are Bimodal Neurons the
Same throughout the Brain?
M. Alex Meredith, Brian L. Allman,
Leslie P. Keniston, and H. Ruth Clemo
CONTENTS
4.1 Introduction............................................................................................................................. 51
4.2 Methods................................................................................................................................... 52
4.2.1 Surgical Procedures..................................................................................................... 52
4.2.2 Recording..................................................................................................................... 52
4.2.3 Data Analysis............................................................................................................... 53
4.3 Results...................................................................................................................................... 54
4.3.1 Anterior Ectosylvian Sulcal Cortex............................................................................. 54
4.3.2 Posterolateral Lateral Suprasylvian Cortex................................................................. 54
4.3.3 Rostral Suprasylvian Sulcal Cortex............................................................................. 59
4.3.4 Superior Colliculus...................................................................................................... 59
4.4 Discussion................................................................................................................................60
4.4.1 Bimodal Neurons with Different Integrative Properties.............................................60
4.4.2 Bimodal Neurons in SC and Cortex Differ.................................................................60
4.4.3 Bimodal Neurons in Different Cortical Areas Differ..................................................60
4.4.4 Population Contribution to Areal Multisensory Function........................................... 61
4.4.5 Methodological Considerations................................................................................... 62
4.5 Conclusions.............................................................................................................................. 63
Acknowledgments............................................................................................................................. 63
References......................................................................................................................................... 63
4.1 INTRODUCTION
It is a basic tenet of neuroscience that different neural circuits underlie different functions or behav-
iors. For the field of multisensory processing, however, this concept appears to be superseded by
the system’s requirements: convergence of inputs from different sensory modalities onto individual
neurons is the requisite, defining step. This requirement is fulfilled by the bimodal neuron, which
has been studied for half a century now (Horn and Hill 1966) and has come to represent the basic
unit of multisensory processing (but see Allman et al. 2009). Bimodal neurons are ubiquitous: they
are found throughout the neuraxis and in nervous systems across the animal kingdom (for review,
see Stein and Meredith 1993). Bimodal (and trimodal) neurons exhibit suprathreshold responses to
stimuli from more than one sensory modality and, when those stimuli are combined, often integrate
their responses (i.e., show a significant response change relative to the unisensory responses). As
revealed almost exclusively by studies of the superior colliculus (SC), bimodal neurons integrate
multisensory information according to the spatial, temporal, and physical parameters of the stim-
uli involved (for review, see Stein and Meredith 1993). The generality of these principles and the
FIGURE 4.1 Lateral view of cat brain depicts multisensory recording sites in cortex and midbrain (labeled sites include the superior colliculus and anterior ectosylvian cortex).
4.2 METHODS
4.2.1 Surgical Procedures
A two-part implantation/recording procedure was used as described in detail in previous reports
(Meredith and Stein 1986; Meredith et al. 2006). First, the animals were anesthetized (pentobarbi-
tal, 40 mg/kg) and their heads were secured in a stereotaxic frame. Sterile techniques were used to
perform a craniotomy that exposed the targeted recording area and a recording well was implanted
over the opening. The scalp was then sutured closed around the implant and routine postoperative
care was provided. Approximately 7 to 10 days elapsed before the recording experiment.
4.2.2 Recording
Recording experiments were initiated by anesthetizing the animal (ketamine, 35 mg/kg, and
acepromazine, 3.5 mg/kg initial dose; with 8 and 1 mg kg−1 h−1 supplements, respectively) and securing
the implant to a supporting bar. A leg vein was cannulated for continuous administration of fluids,
supplemental anesthetics, and, to prevent spontaneous movements, a muscle relaxant (pancuronium
bromide, 0.3 mg/kg initial dose; 0.2 mg kg−1 h−1 supplement). The animal was intubated through
the mouth and maintained on a ventilator; expired CO2 was monitored and maintained at ~4.5%.
A glass-insulated tungsten electrode (impedance <1.0 MΩ) was used for recording. A hydraulic
microdrive was used to advance the electrode and to record the depth of identified neurons. Neuronal
activity was amplified and routed through a counter (for SC recordings) or to a PC for storage and
analysis (for cortical recordings). Neurons were identified by their spontaneous activity and by their
responses to somatosensory (puffs of air through a pipette, brush strokes and taps, manual pres-
sure and joint movement, and stroking by calibrated von Frey hairs), auditory (claps, clicks,
whistles, and hisses), and/or visual (flashed or moving spots or bars of light from a handheld oph-
thalmoscope projected onto the translucent hemisphere, or dark stimuli from a rectangular piece
of black cardboard) search stimuli. Sensory receptive fields were mapped using adequate stimuli in
each modality and were graphically recorded. During recording, the depth of each identified neuron
was noted and tabulated along with its sensory responsivity (e.g., auditory, visual, somatosensory,
bimodal, or trimodal) and the level of evoked activity obtained during quantitative tests
(see below). Multiple recording penetrations were performed in a single experiment and success-
ful recording penetrations were marked with a small electrolytic lesion. At the conclusion of the
experiment, the animal was euthanized and the brain fixed and blocked stereotaxically. Standard
histological techniques were used to stain and mount the tissue. A projecting microscope was used
to trace sections and to reconstruct recording penetrations from the lesion sites.
For selected neurons in each recording area, quantitative tests were conducted to document their
responses to sensory/multisensory stimulation. Electronically gated, repeatable somatosensory,
auditory, and visual stimuli were presented. Somatosensory stimuli were produced by an electroni-
cally driven, modified shaker (Ling, 102A) whose amplitude, velocity, and temporal delay were
independently set to either indent the skin or deflect hairs. Auditory stimulation consisted of a white
noise burst, 100 ms duration, generated by a solenoid-gated air hose (for some SC recordings), or an
electronic waveform played through a hoop-mounted speaker (for all other recordings) positioned
in contralateral auditory space. Visual stimuli were generated by a projector that cast an image of
a light bar through a rotating prism (to determine angle of trajectory) onto a galvanometer-driven
mirror (to control the delay, amplitude, and velocity of movement). This image was projected onto a
translucent Plexiglas hemisphere (92 cm diameter) positioned in front of the animal. Visual stimuli
of effective size and luminosity were moved through the visual receptive field at an effective ori-
entation, direction, and speed. These controlled somatosensory, auditory, and visual stimuli were
presented alone and in paired combinations (i.e., visual–auditory, auditory–somatosensory, visual–
somatosensory). An interstimulus interval of 7 to 15 s was used to avoid habituation; each test was
repeated 10 to 25 times.
4.3 RESULTS
4.3.1 Anterior Ectosylvian Sulcal Cortex
The banks of the anterior ectosylvian sulcus (AES) contain auditory (field of the AES; Clarey and
Irvine 1990), visual (AEV; Olson and Graybiel 1987), and somatosensory (SIV; Clemo and Stein
1983) representations. Numerous studies of this region have identified bimodal neurons (Wallace
et al. 1992; Rauschecker and Korte 1993; Jiang et al. 1994a, 1994b) particularly at the intersection
of the different sensory representations (Meredith 2004; Carriere et al. 2007). The bimodal neu-
rons described in the present study were collected during the recordings reported by Meredith and
Allman (2009).
A total of 193 neurons were identified in six penetrations in three cats, of which 24% (n = 46/193) were bimodal.
These neurons exhibited suprathreshold responses to independent presentations of auditory and
visual (n = 39), auditory and somatosensory (n = 6), or visual and somatosensory (n = 1) stimuli. A
typical example is illustrated in Figure 4.2, where the presentation of either auditory or visual stim-
uli vigorously activated this neuron. Furthermore, the combination of visual and auditory stimuli
induced an even stronger response representing a significant (p < .05, paired t-test) enhancement
of activity (36%) over that elicited by the most effective stimulus presented alone (see Meredith
and Stein 1986 for criteria). This response increment was representative of bimodal AES neurons
because the population average level of enhancement was 34% (see Figure 4.3). This modest level of
multisensory integration was collectively achieved by neurons of widely different activity levels. As
illustrated in Figure 4.4, responses to separate or combined-modality stimulation ranged, on average,
between 1 and 50 spikes/trial [response averages to the weakest (5.1 ± 4.9 standard deviation
(SD)) and best (8.9 ± 7.9 SD) separate stimuli and to combined-modality stimulation (11.7 ± 9.9 SD)
are also shown in Figure 4.3]. However, only a minority (46%; n = 21/46) of bimodal neurons showed
response enhancement to the available stimuli and most showed levels of activity that plotted close
to the line of unity in Figure 4.4. Figure 4.5 shows that the highest levels of enhancement were gen-
erally achieved in those neurons with lower levels of unimodal response activity. Specifically, the
neurons showing >75% response change (average 130%) exhibited responses to unimodal stimuli
that averaged 6.6 spikes/trial. As illustrated in Figure 4.6, however, most (85%; n = 39/46) bimodal
neurons demonstrated response enhancements of <75%. In addition, a few (11%; 5/46) AES bimodal
neurons even showed smaller responses to combined-modality stimulation than to the most effective
unimodal stimulus.
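The enhancement measure used above (proportional change of the combined-modality response relative to the most effective single-modality response; Meredith and Stein 1986) can be sketched in a few lines of Python. The spike counts below are invented for illustration and are not data from this study:

```python
def mean(xs):
    """Mean spikes/trial across repeated presentations."""
    return sum(xs) / len(xs)

def enhancement_index(best_unimodal, combined):
    """Percent change of the combined-modality response (CM) over the
    most effective unimodal response (SMmax): 100 * (CM - SMmax) / SMmax."""
    return 100.0 * (combined - best_unimodal) / best_unimodal

# Hypothetical spikes/trial for one bimodal neuron (10-25 trials would be
# typical here; 5 shown for brevity).
auditory = [5, 6, 4, 7, 5]
visual = [8, 9, 7, 10, 8]
combined = [11, 12, 10, 13, 11]

best = max(mean(auditory), mean(visual))
print(f"{enhancement_index(best, mean(combined)):.1f}% enhancement")
```

A neuron would then be classed as integrative only if the combined response also differed significantly from the best unimodal response (the chapter uses a paired t-test at p < .05, omitted in this sketch).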
Another measure of multisensory processing is the proportional relationship of the activity
evoked by the combined stimuli to that of the sum of responses to the different separate-modality
stimuli (e.g., King and Palmer 1985). This analysis for bimodal AES neurons is presented in
Figure 4.7, which indicates that fewer neurons (17%; n = 8/46) show superadditive activity com-
pared with those that show statistically significant levels of response enhancement (46%; n =
21/46). Given that bimodal neurons represent only about 25% of the AES neurons (Jiang et al.
1994b; Meredith and Allman 2009), and that multisensory integration occurs in a portion of that
population (17–46%, depending on the criterion for integration), these data suggest that integrated
multisensory signals in response to effective sensory stimuli contribute to a small portion of the
output from the AES.
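The two criteria compared here, statistical enhancement versus summative (superadditive) responses, amount to different comparisons on the same mean responses. A minimal sketch with invented numbers follows; the chapter's statistical criterion additionally requires a paired t-test at p < .05, which is omitted here:

```python
def mean(xs):
    return sum(xs) / len(xs)

def exceeds_best(unimodal_a, unimodal_b, combined):
    """Direction of the statistical criterion: the combined response is
    larger than the most effective unimodal response (the required
    significance test is omitted in this sketch)."""
    return mean(combined) > max(mean(unimodal_a), mean(unimodal_b))

def is_superadditive(unimodal_a, unimodal_b, combined):
    """Summative criterion (King and Palmer 1985): the combined response
    exceeds the SUM of the two unimodal responses."""
    return mean(combined) > mean(unimodal_a) + mean(unimodal_b)

a, v, av = [5, 6, 4], [8, 9, 7], [11, 12, 10]
print(exceeds_best(a, v, av))      # True: 11 > 8
print(is_superadditive(a, v, av))  # False: 11 < 5 + 8
```

Because superadditivity is the stricter comparison, it is consistent that the superadditive counts (8/46 in the AES) are smaller than the statistically enhanced counts (21/46).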
FIGURE 4.2 For each recording area (a–d), individual bimodal neurons showed responses to both unimodal stimuli presented separately as well as to their combina-
tion stimuli, as illustrated by rasters (1 dot = 1 spike) and histograms (10 ms time bins). Waveforms above each raster/histogram indicate stimulation condition (square
wave labeled “A” = auditory; ramp labeled “V” = visual; ramp labeled “S” = somatosensory; presented separately or in combination). Bar graphs depict mean (and
standard deviation) of responses to different stimulus conditions; numerical percentage indicates proportional difference between the most effective unimodal stimulus
and the response elicited by stimulus combination (i.e., integration). Asterisk (*) indicates that response change between these two conditions was statistically significant
(p < .05 paired t-test).
[Figure 4.3 bar graphs: mean responses for AES, PLLS, RSS, and SC, with integration values of 24 ± 4% (PLLS), 37 ± 4% (RSS), and 88 ± 12% (SC).]
FIGURE 4.3 For each recording area, average response levels (and standard error of the mean [SEM]) for
population of bimodal neurons. Responses to unimodal stimuli were grouped by response level (lowest, best),
not by modality. Percentage (and SEM) indicates proportional change between the best unimodal response
and that elicited by combined stimulation (i.e., integration). In each area, combined response was statistically
greater than that evoked by the most effective unimodal stimulus (p < .05; paired t-test).
FIGURE 4.4 For neural areas sampled, response of a given bimodal neuron to the most effective unimodal
stimulus (x axis) was plotted against its response to the stimulus combination (y axis). Bimodal
neurons in each area almost always showed activity that plotted above the line of unity (dashed line).
FIGURE 4.5 For each recording area, response of a given bimodal neuron to the most effective unimodal
stimulus (x axis) was plotted against proportional change (interaction) elicited by combined stimuli (y axis).
Most bimodal neurons exhibited interactions > 0, but level of interaction generally decreased with increasing
levels of spiking activity.
2007). The bimodal neurons described in the present study were collected during PLLS recordings
reported by Allman and Meredith (2007).
A total of 520 neurons were identified in eight penetrations in three cats, of which 9% (n = 49/520)
were visual–auditory bimodal. A typical example is illustrated in Figure 4.2, where the presentation
of either auditory or visual stimuli vigorously activated the neuron. In addition, when the same visual
and auditory stimuli were combined, an even stronger response was evoked. The combined response
represented a significant (p < .05, paired t-test) enhancement of activity (39%) over that elicited by
the most effective stimulus presented alone (see Meredith and Stein 1986 for criteria). This response
increment was slightly larger than the average magnitude of integration (24%) seen in the population
of bimodal PLLS neurons [response averages to the weakest (4.7 ± 5.4 SD) and best (7.1 ± 6.8 SD)
separate stimuli and to combined-modality stimulation (8.8 ± 8.8 SD) are shown in Figure 4.3]. This
modest response increment was generated by neurons of widely different activity levels. As illustrated
in Figure 4.4, PLLS responses to separate or combined-modality stimulation produced between 1 and
50 mean spikes/trial. However, only a minority (39%; n = 19/49) of bimodal neurons showed signifi-
cant response enhancement to the available stimuli and most showed levels of activity that plotted
close to the line of unity in Figure 4.4. Figure 4.5 shows that levels of response interaction were gener-
ally the same across activity levels. Furthermore, all PLLS interaction magnitudes represented <75%
change, as also depicted in Figure 4.6. A few (16%; 8/49) PLLS bimodal neurons even showed smaller
responses to the combined stimuli than elicited by the most effective unimodal stimulus.
Analysis of the proportional change in bimodal PLLS neurons resulting from combined-modality
stimulation revealed that even fewer neurons (10%; n = 5/49) achieved superadditive levels of activity
than showed statistically significant response enhancement (39%; n = 19/49). Given that bimodal
neurons represent only about 25% of the PLLS neurons (Allman and Meredith 2007), and that mul-
tisensory integration occurs in a portion of that population (10–39%, depending on the criterion for
integration), these data suggest that integrated multisensory signals in response to effective sensory
stimuli contribute to a small portion of the output from the PLLS.
[Figure 4.6 histograms: interaction magnitudes binned from >−25% to >175%, with means of 34% (AES), 24% (PLLS), 37% (RSS), and 88% (SC).]
FIGURE 4.6 For each recording area, many bimodal neurons showed low levels of interaction (–25% to
25%). However, only AES and SC exhibited integrated levels in excess of 175%.
FIGURE 4.7 Multisensory interactions in bimodal neurons can be evaluated by statistical (paired t-test between
best unimodal and combined responses) or by summative (combined response exceeds sum of both unimodal
responses) methods. For each area, fewer combined responses met the summative criterion than the
statistical one. However, only in SC was integration (by either method) achieved by >50% of neurons.
larger for responses with lower levels of activity. Given the levels of enhancement achieved by such
a large proportion of SC bimodal neurons, it did not seem surprising that >48% of neurons showed
enhancement levels in excess of a 75% change (see Figure 4.6). In contrast, few SC neurons (3%;
3/97) produced combined responses that were lower than that elicited by the most effective single-
modality stimulus.
Analysis of the proportional change in bimodal SC neurons resulting from combined-modality
stimulation revealed that a majority (56%; n = 45/81) achieved superadditive levels of activity; a
large majority also demonstrated statistically significant levels of response enhancement (76%; n =
62/81). Given that bimodal neurons represent a majority of neurons in the deep layers of the SC
(63%; Wallace and Stein 1997), and that significant levels of multisensory response enhancement
are achieved in more than three-fourths of those, these data suggest that integrated multisensory
signals are a robust component of sensory signals in the SC.
4.4 DISCUSSION
4.4.1 Bimodal Neurons with Different Integrative Properties
Bimodal neurons clearly differ from one another (Perrault et al. 2005). In the SC, some bimodal
neurons are highly integrative and exhibit integrated, superadditive responses to a variety of stimu-
lus combinations, whereas others never produce superadditive levels in spite of the full range of
stimuli presented. Thus, different bimodal neurons exhibit different functional ranges. The ques-
tion of whether bimodal neurons elsewhere in the brain might also exhibit integrative differences
was examined in the present study. Bimodal neurons in the AES, PLLS, and RSS were tested for
their responses to combined-modality stimuli that revealed that some cortical neurons generated
multisensory integrated responses whereas others did not. It should be pointed out that the present
study did not make an exhaustive characterization of the integrative capacity of each neuron (as
done by Perrault et al. 2005). However, the present sampling methods appear to have overestimated
(not underestimated) the proportion of integrative neurons because 45% of the SC sample showed
superadditive response levels, whereas fewer (28%) were identified using more intensive methods
(Perrault et al. 2005). Regardless of these testing differences, these combined studies indicate that
bimodal neurons from across the brain are a diverse group.
AES bimodal neurons showed a greater incidence of significant response enhancement (46%; n =
21/46) and higher levels of integration (34% average) than those in the RSS (34%; 33/97 showed
significant response change; 24% average). Furthermore, bimodal neurons in these regions showed
significantly different (p < .01 t-test) spike counts in response to adequate separate and combined-
modality stimuli. AES neurons averaged 8.9 ± 7.9 SD spikes/trial in response to the most effective
separate-modality stimulus, and 11.7 ± 9.9 SD spikes/trial to the combined stimuli. In contrast,
RSS neurons averaged 2.8 ± 2.2 SD spikes/trial in response to the most effective separate-modality
stimulus, and 3.6 ± 2.9 SD spikes/trial to the combined stimuli. In addition, nearly 20% of RSS
neurons showed combined responses that were less than the maximal unimodal responses, com-
pared with 11% of AES bimodal neurons. Thus, by a variety of activity measures, the multisensory
processing capacity is clearly different for bimodal neurons in different cortical areas. Measures of
multisensory processing in bimodal PLLS neurons appear to fall between those obtained for AES
and RSS.
FIGURE 4.8 Bimodal neurons with different functional modes, when distributed in different proportions,
underlie regions exhibiting different multisensory properties. Each panel shows same array of neurons, except
that proportions of unisensory (white), low-integrator (gray), and high-integrator (black) multisensory neurons
are different. Areas in which low-integrator neurons predominate show low overall levels of multisensory
integration (left), whereas those with a large proportion of high-integrators (right) exhibit high levels of mul-
tisensory integration. Intermediate proportions of low- and high-integrators collectively generate intermedi-
ate levels of multisensory integration at areal level. Ultimately, these arrangements may underlie a range of
multisensory processes that occur along a continuum from one extreme (no integration, not depicted) to the
other (high integration).
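The scheme in Figure 4.8 implies that an area's overall integration can be read as a proportion-weighted mixture of its neuron types. A toy sketch of that reading follows; the proportions and per-type enhancement levels are invented for illustration, not taken from the figure:

```python
def areal_integration(proportions, enhancement_levels):
    """Population-level enhancement as a proportion-weighted average over
    neuron types (unisensory, low-integrator, high-integrator)."""
    assert abs(sum(proportions) - 1.0) < 1e-9
    return sum(p * e for p, e in zip(proportions, enhancement_levels))

# Hypothetical percent enhancement per type: unisensory neurons integrate
# at 0%, low-integrators modestly, high-integrators strongly.
levels = [0.0, 25.0, 150.0]

low_integrator_area = areal_integration([0.70, 0.25, 0.05], levels)
high_integrator_area = areal_integration([0.30, 0.30, 0.40], levels)
print(round(low_integrator_area, 1), round(high_integrator_area, 1))
```

Intermediate mixtures produce intermediate areal values, matching the continuum from no integration to high integration described in the caption.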
et al. 2007; Meredith et al. 2006; Clemo et al. 2007; Allman and Meredith 2007). Therefore,
from an areal level, the comparatively weak multisensory signal from a cortical area is likely
to be further diluted by the fact that only a small proportion of bimodal neurons contribute to
that signal. It should also be pointed out that many cortical areas have now been demonstrated
to contain subthreshold multisensory (also termed “modulatory”) neurons. These neurons are
activated by inputs from only one modality, but that response can be subtly modulated by influ-
ences from another to show modest (but statistically significant) levels of multisensory interac-
tion (Dehner et al. 2004; Meredith et al. 2006; Carriere et al. 2007; Allman and Meredith 2007;
Meredith and Allman 2009). Collectively, these observations suggest that cortical multisensory
activity is characterized by comparatively low levels of integration. In the context of the behav-
ioral/perceptual role of cortex, these modest integrative levels may be appropriate. For example,
when combining visual and auditory inputs to facilitate speech perception (e.g., the cocktail
party effect), it is difficult to imagine how accurate perception would be maintained if every
neuron showed a response change in excess of 1200%. On the other hand, for behaviors in which
survival is involved (e.g., detection), multisensory interactions >1200% would clearly provide an
adaptive advantage.
to avoid “false positives” while identifying sites of multisensory integration within the cortex
(see Laurienti et al. 2005 for review). Based on the multisensory characteristics of SC neurons
(Perrault et al. 2005), however, Laurienti and colleagues cautioned that multisensory stimuli
would not likely generate superadditive responses in the blood oxygenation level–dependent sig-
nal as measured by functional magnetic resonance imaging (Laurienti et al. 2005). The results
of the present study further support this caution because proportionally fewer cortical neurons
reveal superadditive responses than SC neurons (Figure 4.7), and the magnitude of response
enhancement is considerably smaller in the cortex (Figure 4.6). On the other hand, given the
tenuous relationship between single neuron discharge activity (i.e., action potentials) and brain
hemodynamics underlying changes in the blood oxygenation level–dependent signal (Logothetis
et al. 2001; Laurienti et al. 2005; Sirotin and Das 2009; Leopold 2009), it remains debatable
whether effects identified in single-unit electrophysiological studies are appropriate to charac-
terize/define multisensory processing in neuroimaging studies in the first place. How this issue
is resolved, however, does not change the fact that electrophysiological measures of multisensory
processing at the neuronal level reveal differences among bimodal neurons from different brain
regions.
4.5 CONCLUSIONS
Bimodal neurons are known to differ functionally within the same structure, the SC. The present
study shows that this variation also occurs within the cortex. Ultimately, by varying the propor-
tional representation of the different types of bimodal neurons (defined by functional ranges), dif-
ferent neural areas can exhibit different levels of multisensory integration in response to the same
multisensory stimulus.
ACKNOWLEDGMENTS
Collection of superior colliculus data was supported by grants NS019065 (to B.E. Stein) and NS06838
(to M.A. Meredith); collection of cortical data was supported by grant NS039460 (to M.A. Meredith).
REFERENCES
Allman, B.L., and M.A. Meredith. 2007. Multisensory processing in ‘unimodal’ neurons: Cross-modal sub-
threshold auditory effects in cat extrastriate visual cortex. Journal of Neurophysiology 98:545–549.
Allman, B.L., L.P. Keniston, and M.A. Meredith. 2009. Not just for bimodal neurons anymore: The contribu-
tion of unimodal neurons to cortical multisensory processing. Brain Topography 21:157–167.
Carriere, B.N., D.W. Royal, T.J. Perrault, S.P. Morrison, J.W. Vaughan, B.E. Stein, and M.T. Wallace. 2007.
Visual deprivation alters the development of cortical multisensory integration. Journal of Neurophysiology
98:2858–2867.
Clarey, J.C., and D.R.F. Irvine. 1990. The anterior ectosylvian sulcal auditory field in the cat: I. An electro-
physiological study of its relationship to surrounding auditory cortical fields. Journal of Comparative
Neurology 301:289–303.
Clemo, H.R., B.L. Allman, M.A. Donlan, and M.A. Meredith. 2007. Sensory and multisensory representations
within the cat rostral suprasylvian cortices. Journal of Comparative Neurology 503:110–127.
Clemo, H.R., and B.E. Stein. 1983. Organization of a fourth somatosensory area of cortex in cat. Journal of
Neurophysiology 50:910–925.
Dehner, L.R., L.P. Keniston, H.R. Clemo, and M.A. Meredith. 2004. Cross-modal circuitry between auditory
and somatosensory areas of the cat anterior ectosylvian sulcal cortex: A ‘new’ inhibitory form of multi-
sensory convergence. Cerebral Cortex 14:387–403.
Horn, G., and R.M. Hill. 1966. Responsiveness to sensory stimulation of units in the superior colliculus and
subjacent tectotegmental regions of the rabbit. Experimental Neurology 14:199–223.
Jiang, H., F. Lepore, M. Ptito, and J.P. Guillemot. 1994a. Sensory interactions in the anterior ectosylvian cortex
of cats. Experimental Brain Research 101:385–396.
Jiang, H., F. Lepore, M. Ptito, and J.P. Guillemot. 1994b. Sensory modality distribution in the anterior ectosylvian cortex (AEC) of cats. Experimental Brain Research 97:404–414.
King, A.J., and A.R. Palmer. 1985. Integration of visual and auditory information in bimodal neurones in the
guinea-pig superior colliculus. Experimental Brain Research 60:492–500.
Laurienti, P.J., T.J. Perrault, T.F. Stanford, M.T. Wallace, and B.E. Stein. 2005. On the use of superadditivity
as a metric for characterizing multisensory integration in functional neuroimaging studies. Experimental
Brain Research 166:289–297.
Leopold, D.A. 2009. Neuroscience: Pre-emptive blood flow. Nature 457:387–388.
Logothetis, N.K., J. Pauls, M. Augath, T. Trinath, and A. Oeltermann. 2001. Neurophysiological investigation
of the basis of the fMRI signal. Nature 412:150–157.
Meredith, M.A. 2004. Cortico-cortical connectivity and the architecture of cross-modal circuits. In Handbook of
Multisensory Processes, eds. C. Spence, G. Calvert, and B. Stein, 343–355. Cambridge, MA: MIT Press.
Meredith, M.A., and B.L. Allman. 2009. Subthreshold multisensory processing in cat auditory cortex.
Neuroreport 20:126–131.
Meredith, M.A., and B.E. Stein. 1983. Interactions among converging sensory inputs in the superior colliculus.
Science 221:389–391.
Meredith, M.A., and B.E. Stein. 1985. Descending efferents of the superior colliculus relay integrated multi-
sensory information. Science 227:657–659.
Meredith, M.A., and B.E. Stein. 1986. Visual, auditory, and somatosensory convergence on cells in the superior
colliculus results in multisensory integration. Journal of Neurophysiology 56:640–662.
Meredith, M.A., and B.E. Stein. 1996. Spatial determinants of multisensory integration in cat superior colliculus neurons. Journal of Neurophysiology 75:1843–1857.
Meredith, M.A., L.R. Keniston, L.R. Dehner, and H.R. Clemo. 2006. Cross-modal projections from somatosensory area SIV to the auditory field of the anterior ectosylvian sulcus (FAES) in cat: Further evidence for subthreshold forms of multisensory processing. Experimental Brain Research 172:472–484.
Meredith, M.A., J.W. Nemitz, and B.E. Stein. 1987. Determinants of multisensory integration in superior col-
liculus neurons: I. Temporal factors. Journal of Neuroscience 7:3215–3229.
Olson, C.R., and A.M. Graybiel. 1987. Ectosylvian visual area of the cat: Location, retinotopic organization,
and connections. Journal of Comparative Neurology 261:277–294.
Palmer, L.A., A.C. Rosenquist, and R.J. Tusa. 1978. The retinotopic organization of lateral suprasylvian visual
areas in the cat. Journal of Comparative Neurology 177:237–256.
Perrault, T.J., J.W. Vaughan, B.E. Stein, and M.T. Wallace. 2005. Superior colliculus neurons use distinct opera-
tional modes in the integration of multisensory stimuli. Journal of Neurophysiology 93:2575–2586.
Rauschecker, J.P., and M. Korte. 1993. Auditory compensation for early blindness in cat cerebral cortex.
Journal of Neuroscience 13:4538–4548.
Sirotin, Y.B., and A. Das. 2009. Anticipatory haemodynamic signals in sensory cortex not predicted by local
neuronal activity. Nature 457:475–479.
Stecker, G.C., I.A. Harrington, E.A. MacPherson, and J.C. Middlebrooks. 2005. Spatial sensitivity in the dorsal
zone (area DZ) of cat auditory cortex. Journal of Neurophysiology 94:1267–1280.
Stein, B.E., and M.A. Meredith. 1993. Merging of the Senses. Cambridge, MA: MIT Press.
Stein, B.E., M.A. Meredith, W.S. Huneycutt, and L. McDade. 1989. Behavioral indices of multisensory integration: Orientation to visual cues is affected by auditory stimuli. Journal of Cognitive Neuroscience 1:12–24.
Sugihara, T., M.D. Diltz, B.B. Averbeck, and L.M. Romanski. 2006. Integration of auditory and visual
communication information in the primate ventrolateral prefrontal cortex. Journal of Neuroscience
26:11138–11147.
Wallace, M.T., and B.E. Stein. 1997. Development of multisensory neurons and multisensory integration in cat
superior colliculus. Journal of Neuroscience 17:2429–2444.
Wallace, M.T., M.A. Meredith, and B.E. Stein. 1992. Integration of multiple sensory inputs in cat cortex.
Experimental Brain Research 91:484–488.
5 Audiovisual Integration
in Nonhuman Primates
A Window into the Anatomy
and Physiology of Cognition
Yoshinao Kajikawa, Arnaud Falchier, Gabriella Musacchia,
Peter Lakatos, and Charles E. Schroeder
CONTENTS
5.1 Behavioral Capacities..............................................................................................................66
5.1.1 Recognition..................................................................................................................66
5.1.2 Fusion and Illusions.....................................................................................................66
5.1.3 Perception.................................................................................................................... 67
5.2 Neuroanatomical and Neurophysiological Substrates.............................................................68
5.2.1 Prefrontal Cortex......................................................................................................... 69
5.2.2 Posterior Parietal Cortex............................................................................................. 71
5.2.3 STP Area..................................................................................................................... 72
5.2.4 MTL Regions............................................................................................................... 73
5.2.5 Auditory Cortex........................................................................................................... 74
5.2.6 Visual Cortex............................................................................................................... 75
5.2.7 Subcortical Regions..................................................................................................... 76
5.3 Functional Significance of Multisensory Interactions............................................................. 77
5.3.1 Influences on Unimodal Perception............................................................................. 77
5.3.1.1 Influence on Temporal Dynamics of Visual Processing............................... 77
5.3.1.2 Sound Localization....................................................................................... 78
5.3.2 AV Recognition........................................................................................................... 79
5.4 Principles of Multisensory Interaction.................................................................................... 79
5.4.1 Inverse Effectiveness................................................................................................... 80
5.4.2 Temporal Contiguity....................................................................................................80
5.4.3 Spatial Contiguity........................................................................................................ 81
5.5 Mechanisms and Dynamics of Multisensory Interaction........................................................ 82
5.5.1 Phase Reset: Mechanisms............................................................................................ 82
5.5.2 Phase Reset: Dependence on Types of Stimuli........................................................... 83
5.6 Importance of Salience in Low-Level Multisensory Interactions........................................... 83
5.6.1 Role of (Top-Down) Attention.....................................................................................84
5.6.2 Attention or Saliency of Stimuli.................................................................................. 85
5.7 Conclusions, Unresolved Issues, and Questions for Future Studies........................................ 85
5.7.1 Complex AV Interactions............................................................................................. 85
5.7.2 Anatomical Substrates of AV Interaction.................................................................... 85
5.7.3 Implication of Motor Systems in Modulation of Reaction Time................................. 85
5.7.4 Facilitation or Information?......................................................................................... 86
5.1.1 Recognition
One of the most ubiquitous AV functions in everyday human life is recognizing and matching the
sight and sounds of other familiar humans. Nonhuman primates can also recognize the sight and
sound of a familiar object and can express this association behaviorally. Primates reliably associate
coincident auditory and visual signals of conspecific vocalizations (Evans et al. 2005; Ghazanfar
and Logothetis 2003; Jordan et al. 2005; Sliwa et al. 2009) and can match pictures to vocal sounds
of both conspecifics and familiar humans (Izumi and Kojima 2004; Kojima et al. 2003; Martinez
and Matsuzawa 2009). Monkeys can also identify a picture in which the number of individuals
matches the number of vocal sounds (Jordan et al. 2005). Although it appears that primates rec-
ognize the AV components of a talking face much better when the individual is socially familiar,
familiarity does not appear to be a critical component of audiovisual recognition; many of the
studies cited above showed that primates can correctly match AV vocalizations from other primate
species (Martinez and Matsuzawa 2009; Zangenehpour et al. 2009). Facial movement, on the other
hand, appears to be a key component for nonhuman primates in recognizing the vocal behavior of
others. When matching a visual stimulus to a vocalization, primates correctly categorized a still
face as a mismatch (Izumi and Kojima 2004; Evans et al. 2005; Ghazanfar and Logothetis 2003)
and performed poorly when only the back view was presented (Martinez and Matsuzawa 2009).
AV matching by monkeys is not limited to facial recognition. Ghazanfar et al. (2002) showed
that a rising-intensity sound attracted a monkey’s attention to a similar degree as a looming visual
object (Schiff et al. 1962). These auditory and visual signals are signatures of an approaching object.
Monkeys preferentially look at the corresponding looming rather than receding visual signal when
presented with a looming sound. This was not the case when the monkey was presented with either
a receding sound or white noise control stimulus with an amplitude envelope matching that of the
looming sound (Maier et al. 2004). Therefore, monkeys presumably form single events by associat-
ing sound and visual attributes at least for signals of approaching objects.
Taken together, these data indicate that the dynamic structure of the visual stimulus and com-
patibility between two modalities is vital for AV recognition in primates and suggest a common
mechanistic nature across primate species.
syllables, mostly “da” (McGurk and MacDonald 1976). The illusion persists even when the listener
is aware of the mismatch, which indicates that visual articulations are automatically integrated into
speech perception (Green et al. 1991; Soto-Faraco and Alsius 2009).
Vatakis et al. (2008) examined whether auditory and visual components of monkey vocalizations
elicited a fused perception in humans. It is well known that people are less sensitive to temporal
asynchrony when auditory and visual components of speech are matched compared to a mismatched
condition (called the “unity effect”). Capitalizing on this phenomenon, Vatakis and colleagues used
a temporal order judgment task with matched and mismatched sounds and movies of monkey vocal-
izations across a range of stimulus onset asynchronies (SOA). The unity effect was observed for
human speech vocalization, but was not observed when people observed monkey vocalizations.
The authors also obtained negative results for human vocalizations mimicking monkey vocalizations, suggesting that, for human observers, the fusion of face–voice components is limited to human speech. This may be because monkey vocal repertoires are much more limited than those of humans and show a large dissimilarity between facial expressive components and sound (Chakladar
et al. 2008; Partan 2002).
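In TOJ experiments of this kind, temporal sensitivity is usually quantified by fitting a psychometric function to the proportion of "vision first" responses across SOAs; a shallower slope (a larger just-noticeable difference, JND) for matched face–voice pairs is the signature of the unity effect. The analysis can be sketched as follows, using hypothetical response proportions and SciPy's `curve_fit`:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

# Hypothetical TOJ data: SOA in ms (negative = sound first), and the
# proportion of "vision first" responses at each SOA.
soa = np.array([-300.0, -200.0, -100.0, 0.0, 100.0, 200.0, 300.0])
p_vision_first = np.array([0.05, 0.12, 0.30, 0.52, 0.71, 0.88, 0.95])

def psychometric(x, pss, sigma):
    """Cumulative Gaussian: pss is the 50% point, sigma sets the slope."""
    return norm.cdf(x, loc=pss, scale=sigma)

(pss, sigma), _ = curve_fit(psychometric, soa, p_vision_first, p0=[0.0, 100.0])

# JND: half the SOA range between the 25% and 75% points of the fit.
jnd = (norm.ppf(0.75) - norm.ppf(0.25)) * sigma / 2.0

print(f"PSS = {pss:.1f} ms, JND = {jnd:.1f} ms")
```

A larger fitted JND for matched than for mismatched pairs would indicate binding; the point of subjective simultaneity (PSS) is the fitted 50% point. The data above are illustrative only.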
Another famous AV illusion, the "ventriloquist effect," also appears to have a counterpart in nonhuman primate perception. Under the right conditions, a sound may be perceived as originating from the location of a visual stimulus despite a spatial disparity between the two. After training a monkey
to identify the location of a sound source, Recanzone’s group introduced a 20 to 60 min period of
spatially disparate auditory (tones) and visual (dots) stimuli (Woods and Recanzone 2004). The
consequence of this manipulation appeared in the sound lateralization task as a shift of the
"auditory center spot" in the direction of the sound's location, relative to the visual fixation spot, during
the adaptation period. The underlying neural mechanism of this effect may be similar to the realignment
of visual and auditory spatial maps after adapting to an optical prism displacing the visual space
(Cui et al. 2008; Knudsen and Knudsen 1989).
What about the perception of multisensory moving objects? Preferential looking at paired looming sounds and visual signals suggests that monkeys associate the auditory and visual attributes of approaching objects (Maier et al. 2004). However, longer looking does not necessarily imply fused perception; it may instead reflect attentional attraction to moving stimuli after their congruency has been assessed. Fused
perception of looming AV signals was supported by human studies, showing the redundant signal
effect (see Section 5.1.3 for more details) in reaction time (shorter reaction time to congruent loom-
ing AV signals) under the condition of bimodal attention (Cappe et al. 2010; see also Romei et al.
2009 for data suggesting preattentive effects of looming auditory signals). Interestingly, for such an
AV looming effect to happen, the spectrum of the sound has to be dynamically structured along
with sound intensity. It is not known which other attributes of a visual stimulus, other than motion,
could contribute to this effect. It is likely that auditory and visual stimuli must be related, not only
in spatial and temporal terms, but also in dynamic spectral dimensions in both modalities in order
for an attentional bias or performance enhancement to appear.
5.1.3 Perception
Visual influences on auditory perception, and vice versa, are well established in humans (Sumby
and Pollack 1954; Raab 1962; Miller 1982; Welch and Warren 1986; Sams et al. 1991; Giard and
Peronnet 1999; for review, see Calvert 2001; Stein and Meredith 1993) and have been examined in
several studies of nonhuman primates (described below). Using simple auditory and visual stimuli,
such as tones and dots, the following studies show that auditory and visual information interact
to modulate perception in monkeys.
Barone's group trained monkeys to make a saccade to a visual target that started to flash the
moment the fixation point disappeared (Wang et al. 2008). In half of the trials, the visual target
was presented with a brief task-irrelevant noise burst. The result was faster saccadic reaction times when
the visual target was accompanied by a sound than when it was not. Frens and Van Opstal (1998) also
studied the influence of auditory stimulation on saccadic responses in monkeys performing tasks
similar to that of Wang et al. (2008). They showed not only a shortening of reaction time, but also
that reaction time depended on the magnitude of the spatial and temporal shift between visual and
auditory stimuli; smaller distances and closer timing yielded shorter reaction times, demonstrating
spatial and temporal effects of sound on visual localization. These results are compatible with
human psychophysical studies of AV integration (Frens et al. 1995; Diederich and Colonius 2004;
Perrott et al. 1990) and suggest that the underlying mechanism may be common to both human and
nonhuman primates.
Like humans, monkeys have also been shown to have shorter manual reaction times to bimodal
targets compared with unimodal targets. In a simple detection task in which a monkey had to report
the detection of a light flash (V alone), noise sound (A alone), or both (AV) stimuli by manual
response, reaction times to AV stimuli were faster than V alone regardless of its brightness (Cappe
et al. 2010; see also Miller et al. 2001, showing similar data for small data sets). When the sound was
loud, reaction times to AV stimuli and A alone were not different. When sound intensity was low,
the overall reaction time was longer and the response to AV stimuli was still faster than A alone.
A study from our laboratory showed that reaction times to perceptual “oddballs,” or novel stimuli
in a train of standard stimuli, were faster for AV tokens than for the visual or auditory tokens pre-
sented alone (Kajikawa and Schroeder 2008). Monkeys were presented with a series of standard AV
stimuli (monkey picture and vocal sound) with an occasional oddball embedded in the series that
differed from the standard in image (V alone), sound (A alone), or both (AV). The monkey
had to respond manually upon detecting such oddballs. Although intensity levels
were fixed, reaction times to AV oddballs were faster than to either A alone or V alone oddballs.
In addition, the probability of a correct response was highest for the AV oddball and lowest for the
A alone condition. Therefore, not only the detection of signals but also their categorization benefited
from AV integration.
This pattern of reaction times conforms to the results of human psychophysics studies showing
faster reaction time to bimodal than unimodal stimuli (Frens et al. 1995; Diederich and Colonius
2004; Perrott et al. 1990). Observations of faster reaction in response to bimodal compared with
unimodal stimuli in different motor systems suggest that AV integration occurs in sensory systems
before the motor system is engaged to generate a behavioral response (or that a similar integration
mechanism is present in several motor systems).
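The standard way to decide whether such bimodal speedups exceed mere statistical facilitation is Miller's race model inequality: if independent unimodal channels race to trigger the response, then at every time t, P(RT ≤ t | AV) cannot exceed P(RT ≤ t | A) + P(RT ≤ t | V). A minimal sketch of this test on simulated reaction times (all values hypothetical; real RT distributions are skewed):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reaction times (ms); Gaussians suffice to illustrate the test.
rt_a = rng.normal(320, 40, 200)    # auditory alone
rt_v = rng.normal(350, 45, 200)    # visual alone
rt_av = rng.normal(280, 35, 200)   # bimodal (audiovisual)

def ecdf(sample, t):
    """Empirical cumulative distribution P(RT <= t) evaluated at each t."""
    return np.searchsorted(np.sort(sample), t, side="right") / len(sample)

t = np.arange(150.0, 500.0, 5.0)
# Miller's bound: under separate-channel (race) processing,
# F_AV(t) <= F_A(t) + F_V(t) at every t (capped at 1).
bound = np.minimum(ecdf(rt_a, t) + ecdf(rt_v, t), 1.0)
violation = ecdf(rt_av, t) - bound

print(f"Maximum race-model violation: {violation.max():.3f}")
```

A positive violation at any t is evidence that the two modalities are integrated before the response is triggered, rather than processed in independent channels.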
Differences in task demands complicate efforts to define the role of attention in the effect
of AV integration on reaction times. In the study conducted by Wang et al. (2008), monkeys were
required to monitor only the occurrence of the visual stimulus. The task-irrelevant sound therefore
acted exogenously, from outside the attended sensory domain; it likely drew the monkey's
attention, but this possibility could not be assessed. In contrast, Cappe et al. (2010) and Kajikawa
and Schroeder (2008) used monkeys that were actively paying attention to both visual and auditory
modalities during every trial. It is worth noting that the sound stimuli used by Wang et al. (2008)
did not act as distracters. Hence, it was possible that monkeys could do the task by paying attention
to both task-relevant visual stimuli and task-irrelevant sound (see Section 5.6).
FIGURE 5.1 (See color insert.) Connections mediating multisensory interactions in primate auditory
cortex. Primate auditory cortices receive a variety of inputs from other sensory and multisensory areas.
Somatosensory areas (PV, parietoventral area; Ri, retroinsular area; S2, secondary somatosensory cortex)
and their projections to auditory cortex are shown in red. Blue areas and lines denote known visual inputs
(FST, fundus of superior temporal area; Pro, prostriata; V1, primary visual cortex; V2, secondary visual
cortex). Feedback inputs from higher cognitive areas (7A, Brodmann’s area 7A; 23, Brodmann’s area 23;
31, Brodmann’s area 31; DLPFC, dorsolateral prefrontal cortex; VLPFC, ventrolateral prefrontal cortex) are
shown in green. Multisensory feedforward inputs from thalamic nuclei (Li, limitans; MP, medial pulvinar;
MGm, medial division of medial geniculate; Po, posterior nucleus; SG, suprageniculate nucleus) are shown
in purple.
medial temporal lobe (MTL). Even though most studies could not elucidate the relationship between
behavior and physiology because they did not test the monkey’s behavior in conjunction with physi-
ological measures, these studies provide promising indirect evidence that is useful in directing
future behavioral/physiological studies.
et al. 2007; Azuma and Suzuki 1984; Kikuchi-Yorioka and Sawaguchi 2000; Vaadia et al. 1986).
Conversely, response selectivity to macaque vocal sounds was found in VLPFC (Cohen et al. 2009;
Gifford et al. 2005; Romanski and Goldman-Rakic 2002; Romanski et al. 2005) and orbitofrontal
cortex (Rolls et al. 2006). These two areas may correspond to face-selective regions of frontal lobe
in nonhuman primates (Parr et al. 2009; Tsao et al. 2008b). Taken together, these findings support
the notion that, as in the visual system, sensitivities to the location and to nonspatial features of
sounds are segregated in PFC.
Although the dorsolateral stream in PFC has largely been shown to be sensitive to location, audi-
tory responses to species-specific vocalizations were also found in regions of DLPFC in squirrel
monkey (Newman and Lindsley 1976; Wollberg and Sela 1980) and macaque monkey (Bon and
Lucchetti 2006). Interestingly, visual fixation diminished responses to vocal sounds in some neu-
rons (Bon and Lucchetti 2006). Taken together with the results of Rao et al. (1997) showing that
neurons of the “what” and “where” visual stream are distributed over a region spanning both the
DLPFC and VLPFC, these studies suggest that the “what” auditory stream might extend outside
the VLPFC.
Apart from showing signs of analogous processing streams in the auditory and visual pathways, PFC
is anatomically primed to process multisensory stimuli. In addition to auditory cortical afferents,
the DLPFC and VLPFC have reciprocal connections with rostral and caudal STP subdivisions
(Seltzer and Pandya 1989). The VLPFC also receives inputs from the PPC, a presumed “where”
visual region (Petrides and Pandya 2009). Within both the DLPFC and VLPFC, segregated projec-
tions of different sensory afferents exist. Area 8 receives projections from visual cortices (occipital
and IPS) in its caudal part, and auditory-responsive cortices [superior temporal gyrus (STG) and
STP] in its rostral part (Barbas and Mesulam 1981). Similar segregation of visual [inferior temporal
(IT)] and auditory (STG and STP) afferents exists within VLPFC (Petrides and Pandya 2002). Thus,
DLPFC and VLPFC contain intermingled regions that receive auditory projections, visual projections,
or both. Additionally, orbitofrontal cortex and medial PFC receive inputs
from IT, STP, and STG (Barbas et al. 1999; Carmichael and Price 1995; Cavada et al. 2000; Kondo
et al. 2003; Saleem et al. 2008), and may contribute to AV integration (see Poremba et al. 2003).
Not surprisingly, bimodal properties of PFC neurons have been described in numerous studies.
Some early studies described neurons responsive to both tones and visual stimuli (Kubota et al.
1980; Aou et al. 1983). However, because these studies used sound as a cue to initiate an immediate
behavioral response, the neuronal response to the sound might have been related to motor
execution. Other studies of PFC employed tasks in which oculomotor or manual responses were
delayed from sensory cues (Artchakov et al. 2007; Ito 1982; Joseph and Barone 1987; Kikuchi-
Yorioka and Sawaguchi 2000; Vaadia et al. 1986; Watanabe 1992). Despite the delayed response,
populations of neurons still responded to both visual and auditory stimuli. Such responses showed
spatial tuning and depended on task conditions, such as the task's modality and its demands
(discrimination, active detection, or passive reception; Vaadia et al. 1986) or reward/no-reward
contingency (Watanabe 1992). One report shows that visuospatial and audiospatial working memory
processes seem to share a common neural mechanism (Kikuchi-Yorioka and Sawaguchi 2000).
The behavioral tasks used in studies described so far did not require any comparison of visual
and auditory events. Fuster et al. (2000) trained monkeys to learn pairing of tones and colors and
perform a cross-modal delayed matching task using tones as the sample cue and color signals as the
target. They found that PFC neurons in those monkeys had elevated firing during the delay period
that was not present on error trials. Therefore, PFC contains many neurons whose responsiveness
to auditory and visual signals depends on behavioral conditions, and it may serve to associate the two modalities.
Romanski's group explored multisensory responses in VLPFC (Sugihara et al. 2006) and found
that this region contains unimodal visual, unimodal auditory, and bimodal AV responsive subregions
(Romanski et al. 2002, 2005). Their group used movies, images, and sounds of vocalizing monkeys
as stimuli, and presented them unimodally or bimodally while subjects fixated.
Although individual neurons responded to only one modality or to both, about half of the neurons
Audiovisual Integration in Nonhuman Primates 71
Even though most PPC studies used simple stimuli such as LED flashes and noise bursts, one
study also examined LIP responses to vocal sounds and showed that LIP neurons can carry
information about the acoustic features of sounds in addition to their spatial location (Gifford and Cohen
2005). In that study, sounds were delivered passively to monkeys during visual fixation. This
seems inconsistent with the previously mentioned findings that the manifestation of auditory responses
in PPC requires behavioral relevance of the sounds (Grunewald et al. 1999; Linden et al. 1999).
Nevertheless, that study suggested the possibility that auditory coding in PPC may not be limited to
spatial information. Similarly, the existence of face-selective patches was shown in the PPC of
chimpanzees using PET (Parr et al. 2009).
Although these studies suggest AV integration in PPC, responses to stimuli in bimodal condi-
tions have not yet been directly examined in monkeys.
of current source density (CSD), which reflects a pattern of afferent termination across cortical lay-
ers in response to sounds (click) and lights (flash), indicated that STP receives feedforward auditory
and visual inputs to layer IV (Schroeder and Foxe 2002).
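CSD is conventionally estimated as the sign-inverted second spatial derivative of the laminar LFP profile, so that current sinks (sites of net inward current, such as active synaptic input to a layer) appear as negative values. A minimal sketch of this estimate on a hypothetical laminar LFP snapshot (uniform contact spacing and unit tissue conductivity assumed):

```python
import numpy as np

# Hypothetical laminar LFP snapshot (mV): one value per contact of a
# linear multielectrode with uniform inter-contact spacing h (mm).
lfp = np.array([0.02, 0.05, 0.10, -0.30, -0.55, -0.25, 0.08, 0.04])
h = 0.1

# Second-difference CSD estimate (tissue conductivity set to 1, so units
# are arbitrary): CSD_i = -(lfp[i-1] - 2*lfp[i] + lfp[i+1]) / h**2.
# With this convention, current sinks appear as negative CSD values.
csd = -(lfp[:-2] - 2.0 * lfp[1:-1] + lfp[2:]) / h**2

sink_contact = int(np.argmin(csd)) + 1  # +1: CSD is undefined at the edge contacts
print(f"Largest current sink at contact {sink_contact}")
```

In laminar recordings, the depth of the earliest sink indicates the cortical layer of afferent termination, which is how feedforward (layer IV) input is distinguished from feedback input to extragranular layers.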
Lesion studies indicate that STP processes certain dimensions of sound and vision used for
discrimination. Monkeys with lesions of STG and STP areas showed impairments of auditory, but not
visual, working memory and of auditory pattern discrimination, while hearing itself was spared
(Iversen and Mishkin 1973; Colombo et al. 2006). Although IT lesions impair many visual
tasks, combined IT and STP lesions (Aggleton and Mishkin 1990; Eacott et al. 1993) selectively impair
visual discrimination of objects while sparing performance of other visual tasks. These
findings suggest that multisensory responses in STP are not simply sensory, but are involved in
cognitive processing of certain aspects of sensory signals.
A series of recent studies examined AV integration in STS during visual fixation using more
naturalistic stimuli: the sounds and sights of conspecific vocalizations, naturally occurring scenes,
and artificial movies (Barraclough et al. 2005; Dahl et al. 2009; Chandrasekaran and Ghazanfar
2009; Ghazanfar et al. 2008; Kayser and Logothetis 2009; Maier et al. 2008). As in previous studies
(Benevento et al. 1977; Watanabe and Iwai 1991), neuronal firing to bimodal stimuli was found to
be either stronger or weaker than firing to unimodal stimuli. Barraclough et al. (2005) showed
that whether the response to AV stimuli was enhanced or suppressed relative to the visual response
depended on the size of the visual response. Incongruent pairs of sounds and scenes seem to evoke
weaker responses (Barraclough et al. 2005; Maier et al. 2008).
To our knowledge, no animal studies have used task conditions requiring active behavioral
discrimination. The results are therefore not conclusive as to whether the STS can associate/integrate
information from different modalities to form a recognizable identity. However, the bimodal
responsiveness of STP areas, their specialization for objects such as faces in the visual modality, and
their sensitivity to the congruence of signals in different modalities suggest that they are involved in
such cognitive processes and/or AV perception.
In the hippocampus, a small population of neurons responds to both auditory and visual cues for
moving tasks in which monkeys control their own spatial translation and position (Ono et al. 1993).
Even without task demands, hippocampal neurons exhibit spatial tuning properties to auditory and
visual stimuli (Tamura et al. 1992).
Neurons in the amygdala respond to face or vocalization of conspecifics passively presented
(Brothers et al. 1990; Kuraoka and Nakamura 2007; Leonard et al. 1985). Some neurons respond
selectively to emotional content (Hoffman et al. 2007; Kuraoka and Nakamura 2007). Multisensory
responses to different sensory cues were also shown in the amygdala of monkeys performing sev-
eral kinds of tasks to retrieve food or drink, avoid aversive stimuli, or discriminate sounds associ-
ated with reward (Nishijo et al. 1988a). These responses reflected affective values of those stimuli
rather than the sensory aspect (Nishijo et al. 1988b).
These data corroborate the notion that sensory activity in MTL is less likely to contribute
to detection, but more related to sensory association, evaluation, or other cognitive processes
(Murray and Richmond 2001). The integrity of these structures is presumably needed for the
formation and retention of cross-modal associational memory (Murray and Gaffan 1994; Squire
et al. 2004).
isolation modulate activity in the extragranular layer of A1 (Lakatos et al. 2009) and the same pattern
is observed with attended auditory stimuli in V1 (Lakatos et al. 2008). These findings strengthen the
hypothesis that nonspecific thalamic projections (Sherman and Guillery 2002) or pulvinar-mediated
lateral connections (Cappe et al. 2009) contribute to AV integration in A1.
The groups of Ghazanfar and Logothetis have shown that concurrent visual stimuli systematically
influence auditory cortical responses in A1 as well as in the lateral associative auditory
cortices and STP (Ghazanfar et al. 2005; Hoffman et al. 2008; Kayser et al. 2007, 2008). These
studies used complex and natural AV stimuli, which are more efficient in evoking responses in
some nonprimary auditory areas (Petkov et al. 2008; Rauschecker et al. 1995; Russ et al. 2008).
Their initial study (Ghazanfar et al. 2005) revealed that movies of vocalizations presented with the
associated sounds could modulate local field potential (LFP) responses in A1 and the lateral belt.
Kayser et al. (2008) showed visual responses in the LFP at frequency bands near 10 Hz. This frequency
component responded preferentially to faces, and the preference was stronger in the lateral belt than
in A1 (Hoffman et al. 2008). However, multiunit activity (MUA) barely showed visual responses that
correlated in magnitude with the LFP response. AV interactions appeared as a small enhancement in
the LFP and suppression in MUA (see also Kayser and Logothetis 2009).
Although AV integration in areas previously thought to be unisensory is intriguing and provocative,
the use of behavioral tasks is imperative to determine the significance of this
phenomenon. Brosch et al. (2005) employed a task in which an LED flash cued the beginning of
an auditory sequence. Monkeys were trained to touch a bar to initiate the trial and to signal the
detection of a change in the auditory sequence. They found that some neurons in AC responded to
LED, but only when the monkey touched the bar after detecting the auditory change. This response
disappeared when the monkey had to perform a visual task that did not require auditory attention.
Although this may be due in part to the monkeys being highly trained (or potentially
overtrained) on the experimental task, these findings also point to the importance of engaging
auditory attention in evoking responses to visual stimuli. Findings like these, which elucidate the integrative
responses of individual and small populations of neurons, can provide key substrates to understand
the effects of bimodal versus unimodal attention on cross-modal responses demonstrated in humans
(Jääskeläinen et al. 2007; McDonald et al. 2003; Rahne and Böckmann-Barthel 2009; Talsma et al.
2009; von Kriegstein and Giraud 2006).
The timing of cross-modal effects in primary auditory and posterior auditory association corti-
ces in resting or anesthetized monkeys seemed consistent with the cross-modal influence of touch
and sight in monkeys engaged in an auditory task. In resting monkeys, the somatosensory CSD
response elicited by electrical stimulation of the median nerve had an onset latency as short as 9 ms
(Lakatos et al. 2007; Schroeder et al. 2001), and single neurons responded to air puff stimulation
on the dorsum of the hand in anesthetized monkeys with a latency of about 30 ms (Fu et al. 2003).
Cutaneous sensory responses of single units in AC during an active task peaked at 20 ms (Brosch et al. 2005),
slower than responses to direct electrical activation of afferent fibers but faster than those in the passive condition.
Similarly, visual responses of single units in AC were observed from 60 ms and peaked at around
100 ms after LED onset during an active task (Brosch et al. 2005). This is within the same
range as the onset latency of neuronal firing and the peak timing of LFP responses, about 100 ms,
to complex visual stimuli in AC when monkeys were simply fixating (Hoffman et al. 2007;
Kayser et al. 2008). The effect of gaze direction/saccades will also need to be taken into account
in future studies because it has been proposed that it can considerably affect auditory processing
(Fu et al. 2004; Groh et al. 2001; Werner-Reiss et al. 2006).
2002). The peripheral visual field representation of area V2 also receives feedback inputs from cau-
dal STG/auditory belt region (Rockland and Ojima 2003). A preference for vocal sounds, relative to
other sounds, was found in the nonprimary visual cortex using functional MRI (fMRI) in monkeys
(Petkov et al. 2008).
In contrast to studies of visual responses in auditory cortex, few studies have recorded
auditory responses in visual cortex during the performance of a task. Wang et al. (2008) recorded
V1 single-unit firing while monkeys performed a visual detection task. Concurrent presentation of
auditory and visual stimuli not only shortened saccadic reaction time, but also increased the neu-
ronal response magnitude and reduced response latency. This effect was greatest when the intensity
of visual stimuli was of a low to moderate level, and disappeared when the luminance of the visual
stimuli was intense. When monkeys were not performing a task, no auditory effect was observed in
V1 (see Section 5.6.1).
In a series of studies from our laboratory, a selective attention task was employed to deter-
mine whether attention to auditory stimuli influenced neuronal activity in V1 (Lakatos et al. 2008,
2009; Mehta et al. 2000a, 2000b). In these studies, tones and flashes were presented alternately,
and monkeys had to monitor a series of either visual or auditory stimuli while ignoring the other
modality. The visual response was stronger when monkeys tracked the visual series than when they
tracked the auditory series. In the attend-auditory condition, it appeared that a phase reset of ongo-
ing neuronal oscillations occurred earlier than the visual response (Lakatos et al. 2009). This effect
disappeared when the same stimuli were ignored. Thus, auditory influences on V1 were observed
only when auditory stimuli were attended. This contrasts with the findings of Wang et al. (2008), in
which sound affected V1 activity in monkeys performing a visual task. As we propose later, control
of attention likely plays a major role in the manifestation of auditory effects in V1 (see Section
5.6.2).
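A phase reset of ongoing oscillations, as opposed to an additive evoked response, is typically detected as a rise in inter-trial phase coherence (ITC) without a corresponding rise in power. The computation can be sketched on simulated single-frequency trials (all parameters hypothetical; SciPy's Hilbert transform is used to extract phase):

```python
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(1)
fs, n_trials, n_samp, f, onset = 1000, 100, 500, 8.0, 250
t = np.arange(n_samp) / fs

# Simulated ongoing oscillations: random phase on each trial before
# "stimulus onset" (sample 250); phase reset to a common value at onset.
# Amplitude never changes, so there is no added evoked power.
trials = np.empty((n_trials, n_samp))
for k in range(n_trials):
    phi = rng.uniform(0.0, 2.0 * np.pi)
    trials[k, :onset] = np.cos(2.0 * np.pi * f * t[:onset] + phi)
    trials[k, onset:] = np.cos(2.0 * np.pi * f * (t[onset:] - t[onset]))

# Inter-trial coherence: length of the mean unit phase vector across trials.
phase = np.angle(hilbert(trials, axis=1))
itc = np.abs(np.mean(np.exp(1j * phase), axis=0))

print(f"ITC before onset: {itc[50:200].mean():.2f}, "
      f"after onset: {itc[300:450].mean():.2f}")
```

Because the simulated amplitude never changes, the post-onset jump in ITC isolates the phase-reset signature; in real data, this dissociation of phase alignment from power is what distinguishes a reset of ongoing oscillations from an evoked response.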
influenced by covert orienting of concurrent visual events. This covert orienting may contribute to
the visual influence observed on portions of human auditory brainstem responses that are roughly
localized to the IC (Musacchia et al. 2006).
Studies of thalamic projections to the primary auditory cortex show that multisensory connec-
tions are present in centers previously thought to be “unisensory” (de la Mothe et al. 2006b; Hackett
et al. 2007; Jones 1998). Multiple auditory cortices also receive divergent afferents originating from
common thalamic nuclei (Cappe et al. 2009; Jones 1998). In addition, the connections between
thalamic nuclei and cortices are largely reciprocal. Even though the functions of those thalamic
nuclei remain to be clarified, they may contribute to multisensory responsiveness in cerebral cortices.
Bimodal responsiveness was shown in a few thalamic nuclei (Matsumoto et al. 2001; Tanibuchi and
Goldman-Rakic 2003).
effect on the neural response, either through increased firing rate or a faster response (for review,
see Stein and Stanford 2008), suggesting that AV stimuli should sharpen behavioral sensitivity
in some fashion. In humans, AV stimuli shorten reaction times during
target detection (Diederich and Colonius 2004; Giard and Peronnet 1999; Molholm et al. 2002,
2007) and improve temporal order judgments (Hairston et al. 2006; Santangelo and Spence
2009).
In the monkey, Wang et al. (2008) showed electrophysiological results consistent with this notion.
During a visual localization task, the effect of AV enhancement in V1 occurred as shorter response
latency. Interestingly, no appreciable enhancement of visual response was elicited by auditory stim-
uli when monkeys were not engaged in tasks.
The auditory stimuli by themselves did not evoke firing responses in V1, suggesting that the auditory
influence on V1 activity is a subthreshold phenomenon. Suprathreshold responses in V1 begin
at about 25 to 30 ms poststimulation (Chen et al. 2007; Musacchia and Schroeder 2009). For auditory
input to influence visual responses, it must arrive within a short temporal window, a few milliseconds
before the visual input (Lakatos et al. 2007; Schroeder et al. 2008). Auditory responses in the
auditory system generally begin much earlier than visual responses in V1, but for some natural events
such as speech, visible signals lead the accompanying sounds (Chandrasekaran et al. 2009; for review,
see Musacchia and Schroeder 2009). For these events, the precedence of visual input relative to
auditory input is likely a requirement for very early AV interactions in early sensory areas.
Fixation during head restraint does not allow any eye movement. During fixation, subjects
can pay visual attention to locations off from the fixated spot (covert attention) or listen carefully.
Neuronal activity correlates of such processes were seen in PFC (Artchakov et al. 2007; Kikuchi-
Yorioka and Sawaguchi 2000) and PPC (Andersen et al. 1997). Meanwhile, subjects must continually
issue oculomotor command signals to maintain a steady eye position, so a signal conveying the
fixated location, and differentiating central from deviant positions, should be present. A possible
correlate of such a signal, a change in spontaneous activity dependent on gaze direction, was
described in AC but not in IC (Werner-Reiss et al. 2006). Even though the source of this eye-position
signal to AC is unknown, its presence suggests AC as one of the candidate sites for inducing
the ventriloquist aftereffect.
It is worth mentioning that, despite its name, the ventriloquist aftereffect is quite different
from the ventriloquist effect. The ventriloquist effect occurs when auditory and visual signals stem
from a shared vicinity, and it requires neither fixation on a visual spot nor a steady eye-position
signal. In contrast, the ventriloquist aftereffect concerns the spatial coding of purely auditory events.
Hence, the study of this phenomenon may help clarify which type of neuronal coding is the main
strategy for the cortical encoding of sound location.
5.3.2 AV Recognition
Identifying a previously known AV object, such as a speaker’s face and voice, requires AV inte-
gration, discrimination, and retention. This process likely relies on accurate encoding of complex
stimulus features in sensory cortices and more complex multiplexing in higher-order multisensory
association cortices. Multisensory cortices in the “what” pathway probably function to unite these
sensory attributes. In humans, audiovisual integration plays an important role in person recogni-
tion (Campanella and Belin 2007). Several studies have shown that unimodal memory retrieval of
multisensory experiences activated unisensory cortices, presumably because of multisensory asso-
ciation (Wheeler et al. 2000; Nyberg et al. 2000; Murray et al. 2004, 2005; von Kriegstein and
Giraud 2006), and that such memory depended on the meaningfulness of the combined signals
(Lehmann and Murray 2005).
Differential responses to vocal sounds were observed in PFC (Gifford et al. 2005; Romanski
et al. 2005), STG (Rauschecker et al. 1995; Russ et al. 2008), and AC (Ghazanfar et al. 2005).
Differential responses to faces were found in PFC (Rolls et al. 2006), temporal lobe cortices (Eifuku
et al. 2004), and amygdala (Kuraoka and Nakamura 2007). Some of these structures may possess
selectivity to both vocal sounds and faces. Because recognition involves a previously learned object,
the process likely relies in part on working and long-term memory centers. The fact that the identification
of correspondence between vocal sound and face is better when the individuals are socially famil-
iar (Martinez and Matsuzawa 2009) supports this notion. PFC and MTL are also involved in the
association of simple auditory and visual stimuli as shown by delayed match to sample task studies
(Colombo and Gross 1994; Fuster et al. 2000; Gibson and Maunsell 1997). Lesions in MTL (Murray
and Gaffan 1994) or PFC (Gaffan and Harrison 1991) impaired performance in tasks requiring
memory and AV association. These findings implicate PFC, STG, and MTL in AV recognition.
lations of auditory cortex at an SOA of 0 ms, and that was abolished by introducing a perceivable
delay between stimuli (160 ms).
These results suggest that AV interaction in AC can appear as either enhancement (when audio
and visual stimuli are nearly synchronous or separated by less than about 100 ms) or suppression
(at delays longer than 100 ms). These data should nevertheless be interpreted with some
caution. In the first study, the AV interaction effect was attributed to the interaction between
movements of the mouth and the following vocal sound (Ghazanfar et al. 2005). However, because
the mouth movement started immediately after the abrupt appearance of the first movie frame,
the sudden change in the screen image could capture visual attention. In other studies, an abrupt
visual change was shown to elicit a brief freeze of gaze position in monkeys (Cui et al. 2009)
and in humans (e.g., Engbert and Kliegl 2003). Therefore, the onset of the movie itself could
evoke transient activity. This would suggest that the observed effects were related simply to visual
response or a transient change in covert visual attention. Because LFPs capture the response of a
large population of neurons, such activity generated in non-AC structures may be superimposed.
Further studies are necessary to dissociate the AV interaction into mouth movement-related and
other components.
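As a toy summary of this SOA dependence (the 100 ms boundary is only the approximate value suggested by these studies, not a fixed constant):

```python
def av_modulation(soa_ms):
    """Classify the expected sign of AV interaction in AC by audio-visual
    stimulus onset asynchrony (SOA, ms): near-synchronous pairs enhance,
    clearly separated pairs suppress. The 100 ms cutoff is an assumption
    abstracted from the studies discussed above."""
    if abs(soa_ms) < 100:
        return "enhancement"
    return "suppression"

print(av_modulation(0))    # synchronous pair, as in Ghazanfar et al. (2005)
print(av_modulation(160))  # perceivable delay abolishes enhancement
```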
Cross-modal responses during an intermodal selective attention task were observed in response
to unimodal stimuli (Lakatos et al. 2008, 2009). What would be the effect of a phase reset when
auditory and visual stimuli are presented simultaneously? Wang et al. (2008) analyzed neuronal
firing responses to light with or without paired auditory noise stimuli using single-unit recordings
in V1. When stimuli were presented passively, firing rate in a population of V1 neurons increased
and remained high for 500 ms. During visual detection tasks, V1 population responses to a visual
target without sound showed two peaks in their temporal pattern. The timing of each peak after
response onset fell within the cycle length of the gamma or theta frequency bands. In response to
AV stimuli, an additional peak appeared in the temporal firing pattern near the time of a full
delta cycle. Although translation of firing activity into underlying membrane potential is not
straightforward, those activity parameters are roughly monotonically proportional to each other
(e.g., Anderson et al. 2000). Thus, the oscillatory pattern of neuronal firing suggests oscillatory
modulation of neuronal excitability by the nonauditory stimuli.
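The band assignments above rest on simple arithmetic: a peak interval is compatible with a band if it falls within that band's cycle-length range (period in ms = 1000 / frequency in Hz). A small sketch, using commonly assumed band boundaries (exact cutoffs vary across studies):

```python
# Commonly assumed frequency-band boundaries in Hz (cutoffs vary by study)
BANDS = {"delta": (1, 4), "theta": (4, 8), "gamma": (30, 80)}

def cycle_range_ms(lo_hz, hi_hz):
    """Cycle-length range in ms for a band (period = 1000 / frequency)."""
    return 1000.0 / hi_hz, 1000.0 / lo_hz

def compatible_bands(interval_ms):
    """Bands whose full-cycle length could match a given inter-peak interval."""
    return [name for name, (lo, hi) in BANDS.items()
            if cycle_range_ms(lo, hi)[0] <= interval_ms <= cycle_range_ms(lo, hi)[1]]

print(compatible_bands(25))   # ~one gamma cycle (40 Hz)
print(compatible_bands(150))  # ~one theta cycle (6.7 Hz)
print(compatible_bands(400))  # ~one full delta cycle (2.5 Hz)
```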
However, AV integration has been observed in many higher cortical areas even when subjects were
only required to maintain visual fixation without further demands of a task (PFC, Sugihara et al.
2006; STP, Barraclough et al. 2005; AC, Ghazanfar et al. 2005; Kayser et al. 2008). Does this mean
audiovisual interactions happen automatically? The answer may depend on the level of the system
being studied, as well as the behavioral states, as discussed below.
enhancement occurs only in sensory systems or elsewhere as well. Because Miller et al. (2001)
showed that motor cortical activation triggered by sensory stimuli reflects signals that were
already integrated by the stage of primary motor cortex, it is possible that activation of PPC, PFC,
particularly PM areas, or SC is facilitated by redundant sensory inputs. These possibilities have
not yet been fully dissociated. Additional sources for facilitated reaction times were also suggested
by the findings of Wang et al. (2008): when intense visual stimuli were presented, additional
auditory stimuli did not affect the visual response in V1 but did influence saccadic reaction time.
This suggests either that the visual response is facilitated somewhere in the visual system outside
of V1 or that auditory stimuli directly affect motor responses.
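The question of additional sources is entangled with a purely statistical effect: even without any neural interaction, an independent race between two channels predicts faster redundant-target reaction times. A minimal simulation under assumed Gaussian RT distributions (the means and spreads are arbitrary, not values from the cited studies):

```python
import random
from statistics import mean

random.seed(1)  # fixed seed for a reproducible illustration

def rt_samples(mu_ms, sd_ms, n=50_000):
    """Unimodal reaction times drawn from an assumed Gaussian (ms)."""
    return [random.gauss(mu_ms, sd_ms) for _ in range(n)]

visual = rt_samples(250, 30)    # assumed visual-alone RT distribution
auditory = rt_samples(230, 30)  # assumed auditory-alone RT distribution

# Independent race: the response is triggered by whichever channel wins
redundant = [min(v, a) for v, a in zip(visual, auditory)]

print(f"visual alone:     {mean(visual):6.1f} ms")
print(f"auditory alone:   {mean(auditory):6.1f} ms")
print(f"redundant (race): {mean(redundant):6.1f} ms")
```

Observed reaction times faster than the race-model bound (Miller's race-model inequality) are the usual evidence for genuine integration beyond such a statistical race.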
definitive role of sensory attention in AV integration. To obtain a clear picture of the role attention
plays in multisensory interactions, more studies are needed in which attention, even unimodal
attention, is controlled through behavioral tasks and stimuli. It will also be important to investigate
attentional load, because differences in selective attention may emerge only under high-load
conditions: under a high attentional load in the attended modality, subjects may try to ignore stimuli
of irrelevant modalities, either consciously or unconsciously.
ACKNOWLEDGMENT
This work was supported by grant nos. K01MH082415, R21DC10415, and R01MH61989.
REFERENCES
Aggleton, J.P., and M. Mishkin. 1990. Visual impairments in macaques following inferior temporal lesions are
exacerbated selectively by additional damage to superior temporal sulcus. Behavioural Brain Research
39:262–274.
Allman, B.L., L.P. Keniston, and M.A. Meredith. 2008. Subthreshold auditory inputs to extrastriate visual
neurons are responsive to parametric changes in stimulus quality: Sensory-specific versus non-specific
coding. Brain Research 1242:95–101.
Allman, B.L., and M.A. Meredith. 2007. Multisensory processing in “unimodal” neurons: Cross-modal sub-
threshold auditory effects in cat extrastriate visual cortex. Journal of Neurophysiology 98:545–549.
Allon, N., and Z. Wollberg. 1978. Responses of cells in the superior colliculus of the squirrel monkey to audi-
tory stimuli. Brain Research 159:321–330.
Andersen, R.A., L.H. Snyder, D.C. Bradley, and J. Xing. 1997. Multimodal representation of space in
the posterior parietal cortex and its use in planning movements. Annual Review of Neuroscience
20:303–330.
Anderson, J., I. Lampl, I. Reichova, M. Carandini, and D. Ferster. 2000. Stimulus dependence of two-state
fluctuations of membrane potential in cat visual cortex. Nature Neuroscience 3:617–621.
Anderson, K.C., and R.M. Siegel. 1999. Optic flow selectivity in the anterior superior temporal polysensory
area, STPa, of the behaving monkey. Journal of Neuroscience 19:2681–2691.
Anderson, K.C., and R.M. Siegel. 2005. Three-dimensional structure-from-motion selectivity in the anterior
superior temporal polysensory area, STPa, of the behaving monkey. Cerebral Cortex 15:1299–1307.
Aosaki, T., M. Kimura, and A.M. Graybiel. 1995. Temporal and spatial characteristics of tonically active neu-
rons of the primate’s striatum. Journal of Neurophysiology 73:1234–1252.
Aou, S., Y. Oomura, H. Nishino, et al. 1983. Functional heterogeneity of single neuronal activity in the monkey
dorsolateral prefrontal cortex. Brain Research 260:121–124.
Artchakov, D., D. Tikhonravov, V. Vuontela, I. Linnankoski, A. Korvenoja, and S. Carlson. 2007. Processing of
auditory and visual location information in the monkey prefrontal cortex. Experimental Brain Research
180:469–479.
Azuma, M., and H. Suzuki. 1984. Properties and distribution of auditory neurons in the dorsolateral prefrontal
cortex of the alert monkey. Brain Research 298:343–346.
Baizer, J.S., L.G. Ungerleider, and R. Desimone. 1991. Organization of visual inputs to the inferior temporal
and posterior parietal cortex in macaques. Journal of Neuroscience 11:168–190.
Baizer, J.S., R. Desimone, and L.G. Ungerleider. 1993. Comparison of subcortical connections of inferior tem-
poral and posterior parietal cortex in monkeys. Visual Neuroscience 10:59–72.
Barbas, H., H. Ghashghaei, S.M. Dombrowski, and N.L. Rempel-Clower. 1999. Medial prefrontal cortices
are unified by common connections with superior temporal cortices and distinguished by input from
memory-related areas in the rhesus monkey. Journal of Comparative Neurology 410:343–367.
Barbas, H., and M.M. Mesulam. 1981. Organization of afferent input to subdivisions of area 8 in the rhesus
monkey. Journal of Comparative Neurology 200:407–431.
Barnes, C.L., and D.N. Pandya. 1992. Efferent cortical connections of multimodal cortex of the superior tem-
poral sulcus in the rhesus monkey. Journal of Comparative Neurology 318:222–244.
Barraclough, N.E., D. Xiao, C.I. Baker, M.W. Oram, and D.I. Perrett. 2005. Integration of visual and auditory
information by superior temporal sulcus neurons responsive to the sight of actions. Journal of Cognitive
Neuroscience 17:377–391.
88 The Neural Bases of Multisensory Processes
Baylis, G.C., E.T. Rolls, and C.M. Leonard. 1987. Functional subdivisions of the temporal lobe neocortex.
Journal of Neuroscience 7:330–342.
Bell, A.H., B.D. Corneil, D.P. Munoz, and M.A. Meredith. 2003. Engagement of visual fixation suppresses sen-
sory responsiveness and multisensory integration in the primate superior colliculus. European Journal of
Neuroscience 18:2867–2873.
Benevento, L.A., J. Fallon, B.J. Davis, and M. Rezak. 1977. Auditory–visual interaction in single cells in the
cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey. Experimental
Neurology 57:849–872.
Besle, J., O. Bertrand, and M.H. Giard. 2009. Electrophysiological (EEG, sEEG, MEG) evidence for multiple
audiovisual interactions in the human auditory cortex. Hearing Research 258:143–151.
Blatt, G.J., D.N. Pandya, and D.L. Rosene. 2003. Parcellation of cortical afferents to three distinct sectors in
the parahippocampal gyrus of the rhesus monkey: An anatomical and neurophysiological study. Journal
of Comparative Neurology 466:161–179.
Bolognini, N., I. Senna, A. Maravita, A. Pascual-Leone, and L.B. Merabet. 2010. Auditory enhancement
of visual phosphene perception: The effect of temporal and spatial factors and of stimulus intensity.
Neuroscience Letters 477:109–114.
Bon, L., and C. Lucchetti. 2006. Auditory environmental cells and visual fixation effect in area 8B of macaque
monkey. Experimental Brain Research 168:441–449.
Born, R.T., and D.C. Bradley. 2005. Structure and function of visual area MT. Annual Review of Neuroscience
28:157–189.
Brosch, M., E. Selezneva, and H. Scheich. 2005. Nonauditory events of a behavioral procedure activate audi-
tory cortex of highly trained monkeys. Journal of Neuroscience. 25:6797–6806.
Brothers, L., B. Ring, and A. Kling. 1990. Response of neurons in the macaque amygdala to complex social
stimuli. Behavioural Brain Research 41:199–213.
Bruce, C.J., R. Desimone, and C.G. Gross. 1981. Visual properties of neurons in polysensory area in superior
temporal sulcus of the macaque. Journal of Neurophysiology 46:369–384.
Bruce, C.J., R. Desimone, and C.G. Gross. 1986. Both striate cortex and superior colliculus contribute to
visual properties of neurons in superior temporal polysensory area of macaque monkey. Journal of
Neurophysiology 55:1057–1075.
Burton, H., and E.G. Jones. 1976. The posterior thalamic region and its cortical projection in New World and
Old World monkeys. Journal of Comparative Neurology 168:249–302.
Carmichael, S.T., and J.L. Price. 1995. Sensory and premotor connections of the orbital and medial prefrontal
cortex of macaque monkeys. Journal of Comparative Neurology 363:642–664.
Calvert, G.A. 2001. Crossmodal processing in the human brain: Insights from functional neuroimaging studies.
Cerebral Cortex 11:1110–1123.
Calvert, G.A., and R. Campbell. 2003. Reading speech from still and moving faces: The neural substrates of
visible speech. Journal of Cognitive Neuroscience 15:57–70.
Campanella, S., and P. Belin. 2007. Integrating face and voice in person perception. Trends in Cognitive
Sciences 11:535–543.
Cappe, C., and P. Barone. 2005. Heteromodal connections supporting multisensory integration at low levels of
cortical processing in the monkey. European Journal of Neuroscience 22:2886–2902.
Cappe, C., A. Morel, P. Barone, and E. Rouiller. 2009. The thalamocortical projection systems in primate: An
anatomical support for multisensory and sensorimotor interplay. Cerebral Cortex 19:2025–2037.
Cappe, C., M.M. Murray, P. Barone, and E.M. Rouiller. 2010. Multisensory facilitation of behavior in monkeys:
Effects of stimulus intensity. Journal of Cognitive Neuroscience 22:2850–2863.
Cate, A.D., T.J. Herron, E.W. Yund, et al. 2009. Auditory attention activates peripheral visual cortex. PLoS
ONE 4:e4645.
Cavada, C., and P.S. Goldman-Rakic. 1989a. Posterior parietal cortex in rhesus monkey: I. Parcellation of areas
based on distinctive limbic and sensory corticocortical connections. Journal of Comparative Neurology
287:393–421.
Cavada, C., and P.S. Goldman-Rakic. 1989b. Posterior parietal cortex in rhesus monkey: II. Evidence for
segregated corticocortical networks linking sensory and limbic areas with the frontal lobe. Journal of
Comparative Neurology 287:422–445.
Cavada, C., T. Company, J. Tejedor, R.J. Cruz-Rizzolo, and F. Reinoso-Suarez. 2000. The anatomical connec-
tions of the macaque monkey orbitofrontal cortex. A review. Cerebral Cortex 10:220–242.
Chakladar, S., N.K. Logothetis, and C.I. Petkov. 2008. Morphing rhesus monkey vocalizations. Journal of
Neuroscience Methods 170:45–55.
Audiovisual Integration in Nonhuman Primates 89
Chandrasekaran, C., and A.A. Ghazanfar. 2009. Different neural frequency bands integrate faces and voices
differently in the superior temporal sulcus. Journal of Neurophysiology 101:773–788.
Chandrasekaran, C., A. Trubanova, S. Stillittano, A. Caplier, and A.A. Ghazanfar. 2009. The natural statistics
of audiovisual speech. PLoS Computational Biology 5:e1000436.
Chen, C.M., P. Lakatos, A.S. Shah, et al. 2007. Functional anatomy and interaction of fast and slow visual
pathways in macaque monkeys. Cerebral Cortex 17:1561–1569.
Cheney, D.L., and R.M. Seyfarth. 1990. How Monkeys See the World. Chicago: University of Chicago Press.
Ciaramitaro, V.M., G.T. Buracas, and G.M. Boynton. 2007. Spatial and crossmodal attention alter responses to
unattended sensory information in early visual and auditory human cortex. Journal of Neurophysiology
98:2399–2413.
Clower, D.M., R.A. West, J.C. Lynch, and P.L. Strick. 2001. The inferior parietal lobule is the target of output
from the superior colliculus, hippocampus, and cerebellum. Journal of Neuroscience. 21:6283–6291.
Cohen, Y.E. 2009. Multimodal activity in the parietal cortex. Hearing Research 258:100–105.
Cohen, Y.E., and R.A. Andersen. 2000. Reaches to sounds encoded in an eye-centered reference frame. Neuron
27:647–652.
Cohen, Y.E., and R.A. Andersen. 2002. A common reference frame for movement plans in the posterior parietal
cortex. Nature Reviews. Neuroscience 3:553–562.
Cohen, Y.E., A.P. Batista, and R.A. Andersen. 2002. Comparison of neural activity preceding reaches to audi-
tory and visual stimuli in the parietal reach region. Neuroreport 13:891–894.
Cohen, Y.E., I.S. Cohen, and G.W. Gifford III. 2004. Modulation of LIP activity by predictive auditory and
visual cues. Cerebral Cortex 14:1287–1301.
Cohen, Y.E., B.E. Russ, S.J. Davis, A.E. Baker, A.L. Ackelson, and R. Nitecki. 2009. A functional role for the
ventrolateral prefrontal cortex in non-spatial auditory cognition. Proceedings of the National Academy of
Sciences of the United States of America 106:20045–20050.
Colombo, M., and C.G. Gross. 1994. Responses of inferior temporal cortex and hippocampal neurons
during delayed matching to sample in monkeys (Macaca fascicularis). Behavioral Neuroscience
108:443–455.
Colombo, M., H.R. Rodman, and C.G. Gross. 1996. The effects of superior temporal cortex lesions on the
processing and retention of auditory information in monkeys (Cebus apella). Journal of Neuroscience.
16:4501–4517.
Cooke, D.F., and M.S.A. Graziano. 2004a. Super-flinchers and nerves of steel: Defensive movements altered
by chemical manipulation of a cortical motor area. Neuron 43:585–593.
Cooke, D.F., and M.S.A. Graziano. 2004b. Sensorimotor integration in the precentral gyrus: Polysensory neu-
rons and defensive movements. Journal of Neurophysiology 91:1648–1660.
Cui, Q.N., L. Bachus, E. Knoth, W.E. O’Neill, and G.D. Paige. 2008. Eye position and cross-sensory learning
both contribute to prism adaptation of auditory space. Progress in Brain Research 171:265–270.
Cui, J., M. Wilke, N.K. Logothetis, D.A. Leopold, and H. Liang. 2009. Visibility states modulate microsaccade
rate and direction. Vision Research 49:228–236.
Cusick, C.G., B. Seltzer, M. Cola, and E. Griggs. 1995. Chemoarchitectonics and corticocortical terminations
within the superior temporal sulcus of the rhesus monkey: Evidence for subdivisions of superior temporal
polysensory cortex. Journal of Comparative Neurology 360:513–535.
Cynader, M., and N. Berman. 1972. Receptive field organization of monkey superior colliculus. Journal of
Neurophysiology 35:187–201.
Dahl, C.D., N.K. Logothetis, and C. Kayser. 2009. Spatial organization of multisensory responses in temporal
association cortex. Journal of Neuroscience. 29:11924–11932.
de la Mothe, L.A., S. Blumell, Y. Kajikawa, and T.A. Hackett. 2006a. Cortical connections of the auditory cortex
in marmoset monkeys: Core and medial belt regions. Journal of Comparative Neurology 496:27–71.
de la Mothe, L.A., S. Blumell, Y. Kajikawa, and T.A. Hackett. 2006b. Thalamic connections of the auditory cor-
tex in marmoset monkeys: Core and medial belt regions. Journal of Comparative Neurology 496:72–96.
De Souza, W.C., S. Eifuku, R. Tamura, H. Nishijo, and T. Ono. 2005. Differential characteristics of face neu-
ron responses within the anterior superior temporal sulcus of macaques. Journal of Neurophysiology
94:1251–1266.
Dehner, L.R., L.P. Keniston, H.R. Clemo, and M.A. Meredith. 2004. Cross-modal circuitry between auditory
and somatosensory areas of the cat anterior ectosylvian sulcal cortex: A ‘new’ inhibitory form of multi-
sensory convergence. Cerebral Cortex 14:387–403.
Desimone, R., and C.G. Gross. 1979. Visual areas in the temporal cortex of the macaque. Brain Research
178:363–380.
Diederich, A., and H. Colonius. 2004. Modeling the time course of multisensory interaction in manual and
saccadic responses. In Handbook of Multisensory Processes, ed. G. Calvert, C. Spence, and B.E. Stein,
373–394. Cambridge, MA: MIT Press.
Disbrow, E., E. Litinas, G.H. Recanzone, J. Padberg, and L. Krubitzer. 2003. Cortical connections of the sec-
ond somatosensory area and the parietal ventral area in macaque monkeys. Journal of Comparative
Neurology 462:382–399.
Dobelle, W.H., M.G. Mladejovsky, and J.P. Girvin. 1974. Artificial vision for the blind: Electrical stimulation
of visual cortex offers hope for a functional prosthesis. Science 183:440–444.
Duffy, C.J., and R.H. Wurtz. 1991. Sensitivity of MST neurons to optic flow stimuli: I. A continuum of response
selectivity to large-field stimuli. Journal of Neurophysiology 65:1329–1345.
Eacott, M.J., C.A. Heywood, C.G. Gross, and A. Cowey. 1993. Visual discrimination impairments fol-
lowing lesions of the superior temporal sulcus are not specific for facial stimuli. Neuropsychologia
31:609–619.
Eifuku, S., W.C. De Souza, R. Tamura, H. Nishijo, and T. Ono. 2004. Neuronal correlates of face identification
in the monkey anterior temporal cortical areas. Journal of Neurophysiology 91:358–371.
Engbert, R., and R. Kliegl. 2003. Microsaccades uncover the orientation of covert attention. Vision Research
43:1035–1045.
Evans, T.A., S. Howell, and G.C. Westergaard. 2005. Auditory–visual cross-modal perception of communica-
tive stimuli in tufted capuchin monkeys (Cebus apella). Journal of Experimental Psychology. Animal
Behavior Processes 31:399–406.
Falchier, A., S. Clavagnier, P. Barone, and H. Kennedy. 2002. Anatomical evidence of multimodal integration
in primate striate cortex. Journal of Neuroscience. 22:5749–5759.
Falchier, A., C.E. Schroeder, T.A. Hackett, et al. 2010. Projection from visual areas V2 and prostriata to caudal
auditory cortex in the monkey. Cerebral Cortex 20:1529–1538.
Felleman, D.J., and J.H. Kaas. 1984. Receptive field properties of neurons in middle temporal visual area (MT)
of owl monkeys. Journal of Neurophysiology 52:488–513.
Fogassi, L., V. Gallese, L. Fadiga, F. Luppino, M. Matelli, and G. Rizzolatti. 1996. Coding of peripersonal
space in inferior premotor cortex (area F4). Journal of Neurophysiology 76:141–157.
Frens, M.A., and A.J. Van Opstal. 1998. Visual–auditory interactions modulate saccade-related activity in mon-
key superior colliculus. Brain Research Bulletin 46:211–224.
Frens, M.A., A.J. Van Opstal, and R.F. Van der Willigen. 1995. Spatial and temporal factors determine
auditory–visual interactions in human saccadic eye movements. Perception & Psychophysics 57:802–816.
Fu, K.G., T.A. Johnston, A.S. Shah, et al. 2003. Auditory cortical neurons respond to somatosensory stimula-
tion. Journal of Neuroscience. 23:7510–7515.
Fu, K.G., A.S. Shah, M.N. O’Connell, et al. 2004. Timing and laminar profile of eye-position effects on audi-
tory responses in primate auditory cortex. Journal of Neurophysiology 92:3522–3531.
Fuster, J.M., M. Bodner, and J.K. Kroger. 2000. Cross-modal and cross-temporal association in neurons of
frontal cortex. Nature 405:347–351.
Gaffan, D., and S. Harrison. 1991. Auditory–visual associations, hemispheric specialization and temporal–
frontal interaction in the rhesus monkey. Brain 114:2133–2144.
Ghazanfar, A.A., and N.K. Logothetis. 2003. Facial expressions linked to monkey calls. Nature 423:934–934.
Ghazanfar, A.A., and L.R. Santos. 2004. Primate brains in the wild: The sensory bases for social interactions.
Nature Reviews. Neuroscience 5:603–616.
Ghazanfar, A.A., and C.E. Schroeder. 2006. Is neocortex essentially multisensory? Trends in Cognitive Sciences
10:278–285.
Ghazanfar, A.A., J.G. Neuhoff, and N.K. Logothetis. 2002. Auditory looming perception in rhesus monkeys.
Proceedings of the National Academy of Sciences of the United States of America 99:15755–15757.
Ghazanfar, A.A., J.X. Maier, K.L. Hoffman, and N.K. Logothetis. 2005. Multisensory integration of dynamic
faces and voices in rhesus monkey auditory cortex. Journal of Neuroscience. 25:5004–5012.
Ghazanfar, A.A., K. Nielsen, and N.K. Logothetis. 2006. Eye movements of monkey observers viewing vocal-
izing conspecifics. Cognition 101:515–529.
Ghazanfar, A.A., C. Chandrasekaran, and N.K. Logothetis. 2008. Interactions between the superior tempo-
ral sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys. Journal of
Neuroscience. 28:4457–4469.
Giard, M.H., and F. Peronnet. 1999. Auditory–visual integration during multimodal object recognition in
humans: A behavioral and electrophysiological study. Journal of Cognitive Neuroscience 11:473–490.
Gibson, J.R., and J.H.R. Maunsell. 1997. Sensory modality specificity of neural activity related to memory in
visual cortex. Journal of Neurophysiology 78:1263–1275.
Gifford III, G.W., and Y.E. Cohen. 2005. Spatial and non-spatial auditory processing in the lateral intraparietal
area. Experimental Brain Research 162:509–512.
Gifford III, G.W., K.A. MacLean, M.D. Hauser, and Y.E. Cohen. 2005. The neurophysiology of functionally
meaningful categories: Macaque ventrolateral prefrontal cortex plays a critical role in spontaneous cat-
egorization of species-specific vocalizations. Journal of Cognitive Neuroscience 17:1471–1482.
Goldman-Rakic, P.S., A.R. Cools, and K. Srivastava. 1996. The prefrontal landscape: Implications of functional
architecture for understanding human mentation and the central executive. Philosophical Transactions of
the Royal Society of London. Series B, Biological Sciences 351:1445–1453.
Goodale, M.A., and A.D. Milner. 1992. Separate visual pathways for perception and action. Trends in
Neurosciences 15:20–25.
Graziano, M.S.A., and S. Gandhi. 2000. Location of the polysensory zone in the precentral gyrus of anesthe-
tized monkeys. Experimental Brain Research 135:259–266.
Graziano, M.S.A., X.T. Hu, and C.G. Gross. 1997. Visuospatial properties of ventral premotor cortex. Journal
of Neurophysiology 77:2268–2292.
Graziano, M.S.A., L.A.J. Reiss, and C.G. Gross. 1999. A neuronal representation of the location of nearby
sounds. Nature 397:428–430.
Graziano, M.S.A., G.S. Yap, and C.G. Gross. 1994. Coding of visual space by premotor neurons. Science
266:1054–1057.
Green, K.P., P.K. Kuhl, A.N. Meltzoff, and E.B. Stevens. 1991. Integrating speech information across talk-
ers, gender, and sensory modality: Female faces and male voices in the McGurk effect. Perception &
Psychophysics 50:524–536.
Groh, J.M., A.S. Trause, A.M. Underhill, K.R. Clark, and S. Inati. 2001. Eye position influences auditory
responses in primate inferior colliculus. Neuron 29:509–518.
Grunewald, A., J.F. Linden, and R.A. Andersen. 1999. Responses to auditory stimuli in macaque lateral intra-
parietal area: I. Effects of training. Journal of Neurophysiology 82:330–342.
Gu, Y., D.E. Angelaki, and G.C. DeAngelis. 2008. Neural correlates of multisensory cue integration in macaque
MSTd. Nature Neuroscience 11:1201–1210.
Hackett, T.A. 2002. The comparative anatomy of the primate auditory cortex. In: Primate Audition: Ethology
and Neurobiology, ed. Asif A. Ghazanfar, 199–226. Boca Raton, FL: CRC.
Hackett, T.A., L.A. de la Mothe, I. Ulbert, G. Karmos, J.F. Smiley, and C.E. Schroeder. 2007. Multisensory
convergence in auditory cortex: II. Thalamocortical connections of the caudal superior temporal plane.
Journal of Comparative Neurology 502:894–923.
Hackett, T.A., T.M. Preuss, and J.H. Kaas. 2001. Architectonic identification of the core region in auditory
cortex of macaques, chimpanzees, and humans. Journal of Comparative Neurology 441:197–222.
Hackett, T.A., I. Stepniewska, and J.H. Kaas. 1999. Prefrontal connections of the parabelt auditory cortex in
macaque monkeys. Brain Research 817:45–58.
Hairston, W.D., D.A. Hodges, J.H. Burdette, and M.T. Wallace. 2006. Auditory enhancement of visual tempo-
ral order judgment. Neuroreport 17:791–795.
Hikosaka, K., E. Iwai, H. Saito, and K. Tanaka. 1988. Polysensory properties of neurons in the anterior
bank of the caudal superior temporal sulcus of the macaque monkey. Journal of Neurophysiology
60:1615–1637.
Hikosaka, O., M. Sakamoto, and S. Usui. 1989. Functional properties of monkey caudate neurons: II. Visual
and auditory responses. Journal of Neurophysiology 61:799–813.
Hoffman, K.L., A.A. Ghazanfar, I. Gauthier, and N.K. Logothetis. 2008. Category-specific responses to faces
and objects in primate auditory cortex. Frontiers in Systems Neuroscience 1:2.
Hoffman, K.L., K.M. Gothard, M.C. Schmid, and N.K. Logothetis. 2007. Facial-expression and gaze-selective
responses in the monkey amygdala. Current Biology 17:766–772.
Ito, S. 1982. Prefrontal activity of macaque monkeys during auditory and visual reaction time tasks. Brain
Research 247:39–47.
Iversen, S.D., and M. Mishkin. 1973. Comparison of superior temporal and inferior prefrontal lesions on audi-
tory and non-auditory task in rhesus monkeys. Brain Research 55:355–367.
Izumi, A., and S. Kojima. 2004. Matching vocalizations to vocalizing faces in chimpanzee (Pan troglodytes).
Animal Cognition 7:179–184.
Jääskeläinen, I.P., J. Ahveninen, J.W. Belliveau, T. Raij, and M. Sams. 2007. Short-term plasticity in auditory
cognition. Trends in Neurosciences 30:653–661.
Jay, M.F., and D.L. Sparks. 1984. Auditory receptive fields in primate superior colliculus shift with changes in
eye position. Nature 309:345–347.
Jones, E.G. 1998. Viewpoint: The core and matrix of thalamic organization. Neuroscience 85:331–345.
Jordan, K.E., E.M. Brannon, N.K. Logothetis, and A.A. Ghazanfar. 2005. Monkeys match the number of voices
they hear to the number of faces they see. Current Biology 15:1034–1038.
Joseph, J.P., and P. Barone. 1987. Prefrontal unit activity during a delayed oculomotor task in the monkey.
Experimental Brain Research 67:460–468.
Kaas, J.H., and T.A. Hackett. 2000. Subdivisions of auditory cortex and processing streams in primates.
Proceedings of the National Academy of Sciences of the United States of America 97:11793–11799.
Kajikawa, Y., and C.E. Schroeder. 2008. Face–voice integration and vocalization processing in the monkey.
Abstracts Society for Neuroscience 852.22.
Kayser, C., and N.K. Logothetis. 2009. Directed interactions between auditory and superior temporal cortices
and their role in sensory integration. Frontiers in Integrative Neuroscience 3:7.
Kayser, C., C.I. Petkov, M. Augath, and N.K. Logothetis. 2005. Integration of touch and sound in auditory
cortex. Neuron 48:373–384.
Kayser, C., C.I. Petkov, M. Augath, and N.K. Logothetis. 2007. Functional imaging reveals visual modulation
of specific fields in auditory cortex. Journal of Neuroscience 27:1824–1835.
Kayser, C., C.I. Petkov, and N.K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral
Cortex 18:1560–1574.
Kayser, C., N.K. Logothetis, and S. Panzeri. 2010. Visual enhancement of the information representation in
auditory cortex. Current Biology 20:19–24.
Keysers, C., E. Kohler, M.A. Umilta, L. Nanetti, L. Fogassi, and V. Gallese. 2003. Audiovisual mirror neurons
and action recognition. Experimental Brain Research 153:628–636.
Kikuchi-Yorioka, Y., and T. Sawaguchi. 2000. Parallel visuospatial and audiospatial working memory pro-
cesses in the monkey dorsolateral prefrontal cortex. Nature Neuroscience 3:1075–1076.
Kimura, M. 1992. Behavioral modulation of sensory responses of primate putamen neurons. Brain Research
578:204–214.
Knudsen, E.I., and P.F. Knudsen. 1989. Vision calibrates sound localization in developing barn owls. Journal
of Neuroscience 9:3306–3313.
Kohler, E., C. Keysers, M.A. Umilta, L. Fogassi, V. Gallese, and G. Rizzolatti. 2002. Hearing sounds, under-
standing actions: Action representation in mirror neurons. Science 297:846–848.
Kojima, S., A. Izumi, and M. Ceugniet. 2003. Identification of vocalizers by pant hoots, pant grunts and screams
in a chimpanzee. Primates 44:225–230.
Kondo, H., K.S. Saleem, and J.L. Price. 2003. Differential connections of the temporal pole with the orbital and
medial prefrontal networks in macaque monkeys. Journal of Comparative Neurology 465:499–523.
Kosmal, A., M. Malinowska, and D.M. Kowalska. 1997. Thalamic and amygdaloid connections of the audi-
tory association cortex of the superior temporal gyrus in rhesus monkey (Macaca mulatta). Acta
Neurobiologiae Experimentalis 57:165–188.
Kubota, K., M. Tonoike, and A. Mikami. 1980. Neuronal activity in the monkey dorsolateral prefrontal cortex
during a discrimination task with delay. Brain Research 183:29–42.
Kuraoka, K., and K. Nakamura. 2007. Responses of single neurons in monkey amygdala to facial and vocal
emotions. Journal of Neurophysiology 97:1379–1387.
Lakatos, P., C.-M. Chen, M. O’Connell, A. Mills, and C.E. Schroeder. 2007. Neuronal oscillations and multi-
sensory interaction in primary auditory cortex. Neuron 53:279–292.
Lakatos, P., G. Karmos, A.D. Mehta, I. Ulbert, and C.E. Schroeder. 2008. Entrainment of neural oscillations as
a mechanism of attentional selection. Science 320:110–113.
Lakatos, P., M.N. O’Connell, A. Barczak, A. Mills, D.C. Javitt, and C.E. Schroeder. 2009. The leading sense:
Supramodal control of neurophysiological context by attention. Neuron 64:419–430.
Lakatos, P., A.S. Shah, K.H. Knuth, I. Ulbert, G. Karmos, and C.E. Schroeder. 2005. An oscillatory hierarchy controlling neuronal excitability and stimulus processing in the auditory cortex. Journal of Neurophysiology 94:1904–1911.
Lehmann, C., M. Herdener, F. Esposito, et al. 2006. Differential patterns of multisensory interactions in core
and belt areas of human auditory cortex. Neuroimage 31:294–300.
Lehmann, S., and M.M. Murray. 2005. The role of multisensory memories in unisensory object discrimination.
Brain Research. Cognitive Brain Research 24:326–334.
Leonard, C.M., E.T. Rolls, F.A. Wilson, and G.C. Baylis. 1985. Neurons in the amygdala of the monkey with
responses selective for faces. Behavioural Brain Research 15:159–176.
Levy, R., and P.S. Goldman-Rakic. 2000. Segregation of working memory functions within the dorsolateral
prefrontal cortex. Experimental Brain Research 133:23–32.
Audiovisual Integration in Nonhuman Primates 93
Lewis, J.W., and D.C. Van Essen. 2000. Corticocortical connections of visual, sensorimotor, and multimodal processing areas in the parietal lobe of the macaque monkey. Journal of Comparative Neurology
428:112–137.
Linden, J.F., A. Grunewald, and R.A. Andersen. 1999. Responses to auditory stimuli in macaque lateral intra-
parietal area: II. Behavioral modulation. Journal of Neurophysiology 82:343–358.
Maier, J.X., J.G. Neuhoff, N.K. Logothetis, and A.A. Ghazanfar. 2004. Multisensory integration of looming
signals by rhesus monkeys. Neuron 43:177–181.
Maier, J.X., C. Chandrasekaran, and A.A. Ghazanfar. 2008. Integration of bimodal looming signals through
neuronal coherence in the temporal lobe. Current Biology 18:963–968.
Martinez, L., and T. Matsuzawa. 2009. Auditory–visual intermodal matching based on individual recognition
in a chimpanzee (Pan troglodytes). Animal Cognition 12:S71–S85.
Matsumoto, N., T. Minamimoto, A.M. Graybiel, and M. Kimura. 2001. Neurons in the thalamic CM-Pf com-
plex supply striatal neurons with information about behaviorally significant sensory events. Journal of
Neurophysiology 85:960–976.
Mazzoni, P., R.P. Bracewell, S. Barash, and R.A. Andersen. 1996. Spatially tuned auditory responses in area
LIP of macaques performing delayed memory saccades to acoustic targets. Journal of Neurophysiology
75:1233–1241.
McDonald, J.J., W.A. Teder-Sälejärvi, F. Di Russo, and S.A. Hillyard. 2003. Neural substrates of perceptual
enhancement by cross-modal spatial attention. Journal of Cognitive Neuroscience 15:10–19.
McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature 264:746–748.
McNaughton, B.L., F.P. Battaglia, O. Jensen, E.I. Moser, and M.B. Moser. 2006. Path integration and the neural basis of the ‘cognitive map.’ Nature Reviews. Neuroscience 7:663–678.
Mehta, A.D., I. Ulbert, and C.E. Schroeder. 2000a. Intermodal selective attention in monkeys: I. Distribution
and timing of effects across visual areas. Cerebral Cortex 10:343–358.
Mehta, A.D., I. Ulbert, and C.E. Schroeder. 2000b. Intermodal selective attention in monkeys: II. Physiological
mechanisms of modulation. Cerebral Cortex 10:359–370.
Meredith, M.A., B.L. Allman, L.P. Keniston, and H.R. Clemo. 2009. Auditory influences on non-auditory cor-
tices. Hearing Research 258:64–71.
Meredith, M.A., J.W. Nemitz, and B.E. Stein. 1987. Determinants of multisensory integration in superior col-
liculus neurons: I. Temporal factors. Journal of Neuroscience 7:3215–3229.
Meredith, M.A., and B.E. Stein. 1983. Interactions among converging sensory inputs in the superior colliculus.
Science 221:389–391.
Meyer, K., J.T. Kaplan, R. Essex, C. Webber, H. Damasio, and A. Damasio. 2010. Predicting visual stimuli on the basis of activity in auditory cortices. Nature Neuroscience 13:667–668.
Miller, J.O. 1982. Divided attention: Evidence for coactivation with redundant signals. Cognitive Psychology
14:247–279.
Miller, J., R. Ulrich, and Y. Lamarre. 2001. Locus of the redundant-signals effect in bimodal divided attention:
A neurophysiological analysis. Perception & Psychophysics 63:555–562.
Mohedano-Moriano, A., P. Pro-Sistiaga, M.M. Arroyo-Jimenez, et al. 2007. Topographical and laminar distri-
bution of cortical input to the monkey entorhinal cortex. Journal of Anatomy 211:250–260.
Mohedano-Moriano, A., A. Martinez-Marcos, P. Pro-Sistiaga, et al. 2008. Convergence of unimodal and poly-
modal sensory input to the entorhinal cortex in the fascicularis monkey. Neuroscience 151:255–271.
Molholm, S., W. Ritter, M.M. Murray, D.C. Javitt, C.E. Schroeder, and J.J. Foxe. 2002. Multisensory auditory–
visual interactions during early sensory processing in humans: A high-density electrical mapping study.
Brain Research. Cognitive Brain Research 14:115–128.
Molholm, S., A. Martinez, M. Shpaner, and J.J. Foxe. 2007. Object-based attention is multisensory: Co-activation
of an object’s representations in ignored sensory modalities. European Journal of Neuroscience 26:
499–509.
Mullette-Gilman, O.A., Y.E. Cohen, and J.M. Groh. 2005. Eye-centered, head-centered, and complex coding of
visual and auditory targets in the intraparietal sulcus. Journal of Neurophysiology 94:2331–2352.
Mullette-Gilman, O.A., Y.E. Cohen, and J.M. Groh. 2009. Motor-related signals in the intraparietal cortex
encode locations in a hybrid, rather than eye-centered reference frame. Cerebral Cortex 19:1761–1775.
Murata, A., L. Fadiga, L. Fogassi, V. Gallese, V. Raos, and G. Rizzolatti. 1997. Object representation in the
ventral premotor cortex (area F5) of the monkey. Journal of Neurophysiology 78:2226–2230.
Murray, E.A., and D. Gaffan. 1994. Removal of the amygdala plus subjacent cortex disrupts the retention of both
intramodal and crossmodal associative memories in monkeys. Behavioral Neuroscience 108:494–500.
Murray, E.A., and B.J. Richmond. 2001. Role of perirhinal cortex in object perception, memory, and associations. Current Opinion in Neurobiology 11:188–193.
94 The Neural Bases of Multisensory Processes
Murray, M.M., C.M. Michel, R.G. de Peralta, et al. 2004. Rapid discrimination of visual and multisensory
memories revealed by electrical neuroimaging. Neuroimage 21:125–135.
Murray, M.M., J.J. Foxe, and G.R. Wylie. 2005. The brain uses single-trial multisensory memories to discrimi-
nate without awareness. Neuroimage 27:473–478.
Musacchia, G., M. Sams, T. Nicol, and N. Kraus. 2006. Seeing speech affects acoustic information processing
in the human brainstem. Experimental Brain Research 168:1–10.
Musacchia, G., and C.E. Schroeder. 2009. Neuronal mechanisms, response dynamics and perceptual functions
of multisensory interactions in auditory cortex. Hearing Research 258:72–79.
Nager, W., K. Estorf, and T.F. Münte. 2006. Crossmodal attention effects on brain responses to different stimu-
lus classes. BMC Neuroscience 7:31.
Navarra, J., A. Alsius, S. Soto-Faraco, and C. Spence. 2010. Assessing the role of attention in the audiovisual
integration of speech. Information Fusion 11:4–11.
Neal, J.W., R.C. Pearson, and T.P. Powell. 1990. The connections of area PG, 7a, with cortex in the parietal,
occipital and temporal lobes of the monkey. Brain Research 532:249–264.
Nelissen, K., W. Vanduffel, and G.A. Orban. 2006. Charting the lower superior temporal region, a new motion-
sensitive region in monkey superior temporal sulcus. Journal of Neuroscience 26:5929–5947.
Newman, J.D., and D.F. Lindsley. 1976. Single unit analysis of auditory processing in squirrel monkey frontal
cortex. Experimental Brain Research 25:169–181.
Nishijo, H., T. Ono, and H. Nishino. 1988a. Topographic distribution of modality-specific amygdalar neurons
in alert monkey. Journal of Neuroscience 8:3556–3569.
Nishijo, H., T. Ono, and H. Nishino. 1988b. Single neuron responses in amygdala of alert monkey during com-
plex sensory stimulation with affective significance. Journal of Neuroscience 8:3570–3583.
Nyberg, L., R. Habib, A.R. McIntosh, and E. Tulving. 2000. Reactivation of encoding-related brain activ-
ity during memory retrieval. Proceedings of the National Academy of Sciences of the United States of
America 97:11120–11124.
Ono, T., K. Nakamura, H. Nishijo, and S. Eifuku. 1993. Monkey hippocampal neurons related to spatial and
nonspatial functions. Journal of Neurophysiology 70:1516–1529.
Oram, M.W., and D.I. Perrett. 1996. Integration of form and motion in the anterior superior temporal polysen-
sory area (STPa) of the macaque monkey. Journal of Neurophysiology 76:109–129.
Oram, M.W., D.I. Perrett, and J.K. Hietanen. 1993. Directional tuning of motion-sensitive cells in the anterior
superior temporal polysensory area of the macaque. Experimental Brain Research 97:274–294.
Padberg, J., B. Seltzer, and C.G. Cusick. 2003. Architectonics and cortical connections of the upper bank
of the superior temporal sulcus in the rhesus monkey: An analysis in the tangential plane. Journal of
Comparative Neurology 467:418–434.
Padberg, J., E. Disbrow, and L. Krubitzer. 2005. The organization and connections of anterior and posterior
parietal cortex in titi monkeys: Do new world monkeys have an area 2? Cerebral Cortex 15:1938–1963.
Parr, L.A., E. Hecht, S.K. Barks, T.M. Preuss, and J.R. Votaw. 2009. Face processing in the chimpanzee brain.
Current Biology 19:50–53.
Partan, S.R. 2002. Single and multichannel signal composition: Facial expressions and vocalizations of rhesus
macaques (Macaca mulatta). Behaviour 139:993–1027.
Perrett, D.I., E.T. Rolls, and W. Caan. 1982. Visual neurones responsive to faces in the monkey temporal cortex.
Experimental Brain Research 47:329–342.
Perrott, D.R., K. Saberi, K. Brown, and T.Z. Strybel. 1990. Auditory psychomotor coordination and visual
search performance. Perception & Psychophysics 48:214–226.
Petkov, C.I., C. Kayser, T. Steudel, K. Whittingstall, M. Augath, and N.K. Logothetis. 2008. A voice region in
the monkey brain. Nature Neuroscience 11:367–374.
Petrides, M., and D.N. Pandya. 2002. Comparative cytoarchitectonic analysis of the human and the macaque
ventrolateral prefrontal cortex and corticocortical connection patterns in the monkey. European Journal
of Neuroscience 16:291–310.
Petrides, M., and D.N. Pandya. 2009. Distinct parietal and temporal pathways to the homologues of Broca’s
area in the monkey. PLoS Biology 7:e1000170.
Phelps, E.A., and J.E. LeDoux. 2005. Contributions of the amygdala to emotion processing: From animal mod-
els to human behavior. Neuron 48:175–187.
Pinsk, M.A., K. DeSimone, T. Moore, C.G. Gross, and S. Kastner. 2005. Representations of faces and body
parts in macaque temporal cortex: A functional MRI study. Proceedings of the National Academy of
Sciences of the United States of America 102:6996–7001.
Poremba, A., R.C. Saunders, A.M. Crane, M. Cook, L. Sokoloff, and M. Mishkin. 2003. Functional mapping
of the primate auditory system. Science 299:568–572.
Porter, K.K., R.R. Metzger, and J.M. Groh. 2007. Visual- and saccade-related signals in the primate infe-
rior colliculus. Proceedings of the National Academy of Sciences of the United States of America
104:17855–17860.
Posner, M.I., C.R.R. Snyder, and B.J. Davidson. 1980. Attention and the detection of signals. Journal of
Experimental Psychology. General 109:160–174.
Raab, D.H. 1962. Statistical facilitation of simple reaction times. Transactions of the New York Academy of
Sciences 24:574–590.
Rahne, T., and M. Böckmann-Barthel. 2009. Visual cues release the temporal coherence of auditory objects in
auditory scene analysis. Brain Research 1300:125–134.
Ramos-Estebanez, C., L.B. Merabet, K. Machii, et al. 2007. Visual phosphene perception modulated by sub-
threshold crossmodal sensory stimulation. Journal of Neuroscience 27:4178–4181.
Rao, S.C., G. Rainer, and E.K. Miller. 1997. Integration of what and where in the primate prefrontal cortex.
Science 276:821–824.
Rauschecker, J.P., and B. Tian. 2000. Mechanisms and streams for processing of “what” and “where” in
auditory cortex. Proceedings of the National Academy of Sciences of the United States of America
97:11800–11806.
Rauschecker, J.P., B. Tian, and M. Hauser. 1995. Processing of complex sounds in the macaque nonprimary
auditory cortex. Science 268:111–114.
Rauschecker, J.P., and L.R. Harris. 1989. Auditory and visual neurons in the cat’s superior colliculus selective
for the direction of apparent motion stimuli. Brain Research 490:56–63.
Recanzone, G.H., D.C. Guard, M.L. Phan, and T.K. Su. 2000. Correlation between the activity of single auditory
cortical neurons and sound-localization behavior in the macaque monkey. Journal of Neurophysiology
83:2723–2739.
Ringo, J.L., and S.G. O’Neill. 1993. Indirect inputs to ventral temporal cortex of monkey: The influence on unit
activity of alerting auditory input, interhemispheric subcortical visual input, reward, and the behavioral
response. Journal of Neurophysiology 70:2215–2225.
Rizzolatti, G., and L. Craighero. 2004. The mirror-neuron system. Annual Review of Neuroscience 27:169–192.
Rizzolatti, G., L. Fadiga, V. Gallese, and L. Fogassi. 1996. Premotor cortex and the recognition of motor
actions. Brain Research. Cognitive Brain Research 3:131–141.
Rockland, K.S., and H. Ojima. 2003. Multisensory convergence in calcarine visual areas in macaque monkey.
International Journal of Psychophysiology 50:19–26.
Rolls, E.T., H.D. Critchley, A.S. Browning, and K. Inoue. 2006. Face-selective and auditory neurons in the
primate orbitofrontal cortex. Experimental Brain Research 170:74–87.
Romanski, L.M., B.B. Averbeck, and M. Diltz. 2005. Neural representation of vocalizations in the primate
ventrolateral prefrontal cortex. Journal of Neurophysiology 93:734–747.
Romanski, L.M., J.F. Bates, and P.S. Goldman-Rakic. 1999a. Auditory belt and parabelt projections to the pre-
frontal cortex in the rhesus monkey. Journal of Comparative Neurology 403:141–157.
Romanski, L.M., and P.S. Goldman-Rakic. 2002. An auditory domain in primate prefrontal cortex. Nature
Neuroscience 5:15–16.
Romanski, L.M., B. Tian, J. Fritz, M. Mishkin, P.S. Goldman-Rakic, and J.P. Rauschecker. 1999b. Dual streams
of auditory afferents target multiple domains in the primate prefrontal cortex. Nature Neuroscience
2:1131–1136.
Romei, V., M.M. Murray, L.B. Merabet, and G. Thut. 2007. Occipital transcranial magnetic stimulation has
opposing effects on visual and auditory stimulus detection: Implications for multisensory interactions.
Journal of Neuroscience 27:11465–11472.
Romei, V., M.M. Murray, C. Cappe, and G. Thut. 2009. Preperceptual and stimulus-selective enhancement of
low-level human visual cortex excitability by sounds. Current Biology 19:1799–1805.
Russ, B.E., A.L. Ackelson, A.E. Baker, and Y.E. Cohen. 2008. Coding of auditory-stimulus identity in the audi-
tory non-spatial processing stream. Journal of Neurophysiology 99:87–95.
Saleem, K.S., W. Suzuki, K. Tanaka, and T. Hashikawa. 2000. Connections between anterior inferotemporal cortex
and superior temporal sulcus regions in the macaque monkey. Journal of Neuroscience 20:5083–5101.
Saleem, K.S., H. Kondo, and J.L. Price. 2008. Complementary circuits connecting the orbital and medial
prefrontal networks with the temporal, insular, and opercular cortex in the macaque monkey. Journal of
Comparative Neurology 506:659–693.
Sams, M., R. Aulanko, M. Hämäläinen, et al. 1991. Seeing speech: Visual information from lip movements
modifies activity in the human auditory cortex. Neuroscience Letters 127:141–145.
Santangelo V., and C. Spence. 2009. Crossmodal exogenous orienting improves the accuracy of temporal order
judgments. Experimental Brain Research 194:577–586.
Santos-Benitez, H., C.M. Magarinos-Ascone, and E. Garcia-Austt. 1995. Nucleus basalis of Meynert cell
responses in awake monkeys. Brain Research Bulletin 37:507–511.
Schiff, W., J.A. Caviness, and J.J. Gibson. 1962. Persistent fear responses in rhesus monkeys to the optical
stimulus of “looming.” Science 136:982–983.
Schlack, A., S.J. Sterbing-D’Angelo, K. Hartung, K.-P. Hoffmann, and F. Bremmer. 2005. Multisensory space
representations in the macaque ventral intraparietal area. Journal of Neuroscience 25:4616–4625.
Schmolesky, M.T., Y. Wang, D.P. Hanes, et al. 1998. Signal timing across the macaque visual system. Journal
of Neurophysiology 79:3272–3278.
Schroeder, C.E., and J.J. Foxe. 2002. The timing and laminar profile of converging inputs to multisensory areas
of the macaque neocortex. Brain Research. Cognitive Brain Research 14:187–198.
Schroeder, C.E., and J.J. Foxe. 2005. Multisensory contributions to low-level, ‘unisensory’ processing. Current
Opinion in Neurobiology 15:454–458.
Schroeder, C.E., and P. Lakatos. 2009. Low-frequency neuronal oscillations as instruments of sensory selec-
tion. Trends in Neurosciences 32:9–18.
Schroeder, C.E., P. Lakatos, Y. Kajikawa, S. Partan, and A. Puce. 2008. Neuronal oscillations and visual ampli-
fication of speech. Trends in Cognitive Sciences 12:106–113.
Schroeder, C.E., R.W. Lindsley, C. Specht, A. Marcovici, J.F. Smiley, and D.C. Javitt. 2001. Somatosensory input
to auditory association cortex in the macaque monkey. Journal of Neurophysiology 85:1322–1327.
Seltzer, B., M.G. Cola, C. Gutierrez, M. Massee, C. Weldon, and C.G. Cusick. 1996. Overlapping and non-
overlapping cortical projections to cortex of the superior temporal sulcus in the rhesus monkey: Double
anterograde tracer studies. Journal of Comparative Neurology 370:173–190.
Seltzer, B., and D.N. Pandya. 1978. Afferent cortical connections and architectonics of the superior temporal
sulcus and surrounding cortex in the rhesus monkey. Brain Research 149:1–24.
Seltzer, B., and D.N. Pandya. 1989. Frontal lobe connections of the superior temporal sulcus in the rhesus
monkey. Journal of Comparative Neurology 281:97–113.
Seltzer, B., and D.N. Pandya. 1994. Parietal, temporal, and occipital projections to cortex of the superior
temporal sulcus in the rhesus monkey: A retrograde tracer study. Journal of Comparative Neurology
343:445–463.
Sherman, S.M., and R.W. Guillery. 2002. The role of the thalamus in the flow of information to the cor-
tex. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences
357:1695–1708.
Sliwa, J., J.-R. Duhamel, O. Pascalis, and S. Wirth. 2009. Cross-modal recognition of identity in rhesus monkeys for familiar conspecifics and humans. Abstracts Society for Neuroscience 684.14.
Smiley, J.F., T.A. Hackett, I. Ulbert, et al. 2007. Multisensory convergence in auditory cortex, I. Cortical con-
nections of the caudal superior temporal plane in macaque monkeys. Journal of Comparative Neurology
502:894–923.
Soto-Faraco, S., and A. Alsius. 2009. Deconstructing the McGurk–MacDonald illusion. Journal of Experimental
Psychology. Human Perception and Performance 35:580–587.
Squire, L.R., C.E.L. Stark, and R.E. Clark. 2004. The medial temporal lobe. Annual Review of Neuroscience
27:279–306.
Starr, A., and M. Don. 1972. Responses of squirrel monkey (Saimiri sciureus) medial geniculate units to binaural click stimuli. Journal of Neurophysiology 35:501–517.
Stein, B.E., and M.A. Meredith. 1993. The Merging of the Senses. Cambridge, MA: MIT Press.
Stein, B.E., W. Jiang, M.T. Wallace, and T.R. Stanford. 2001. Nonvisual influences on visual-information pro-
cessing in the superior colliculus. Progress in Brain Research 134:143–156.
Stein, B.E., M.W. Wallace, T.R. Stanford, and W. Jiang. 2002. Cortex governs multisensory integration in the
midbrain. Neuroscientist 8:306–314.
Stein, B.E., and T.R. Stanford. 2008. Multisensory integration: Current issues from the perspective of the single
neuron. Nature Reviews. Neuroscience 9:255–266.
Stevenson, R.A., and T.W. James. 2009. Audiovisual integration in human superior temporal sulcus: Inverse
effectiveness and the neural processing of speech and object recognition. Neuroimage 44:1210–1223.
Stricanne, B., R.A. Andersen, and P. Mazzoni. 1996. Eye-centered, head-centered, and intermediate coding of
remembered sound locations in area LIP. Journal of Neurophysiology 76:2071–2076.
Sugihara, T., M.D. Diltz, B.B. Averbeck, and L.M. Romanski. 2006. Integration of auditory and visual
communication information in the primate ventrolateral prefrontal cortex. Journal of Neuroscience
26:11138–11147.
Sumby, W.H., and I. Pollack. 1954. Visual contribution to speech intelligibility in noise. Journal of the
Acoustical Society of America 26:212–215.
Suzuki, W.A., and D.G. Amaral. 1994. Perirhinal and parahippocampal cortices of the macaque monkey:
Cortical afferents. Journal of Comparative Neurology 350:497–533.
Talsma, D., D. Senkowski, and M.G. Woldorff. 2009. Intermodal attention affects the processing of the tempo-
ral alignment of audiovisual stimuli. Experimental Brain Research 198:313–328.
Tamura, R., T. Ono, M. Fukuda, and K. Nakamura. 1992. Spatial responsiveness of monkey hippocampal neu-
rons to various visual and auditory stimuli. Hippocampus 2:307–322.
Tanaka, K., K. Hikosaka, H. Saito, M. Yukie, Y. Fukada, and E. Iwai. 1986. Analysis of local and wide-field
movements in the superior temporal visual areas of the macaque monkey. Journal of Neuroscience
6:134–144.
Tanibuchi I., and P.S. Goldman-Rakic. 2003. Dissociation of spatial-, object-, and sound-coding neurons in the
mediodorsal nucleus of the primate thalamus. Journal of Neurophysiology 89:1067–1077.
Teder-Sälejärvi, W.A., T.F. Münte, F. Sperlich, and S.A. Hillyard. 1999. Intra-modal and cross-modal spatial
attention to auditory and visual stimuli. An event-related brain potential study. Brain Research. Cognitive
Brain Research 8:327–343.
Théoret, H., L. Merabet, and A. Pascual-Leone. 2004. Behavioral and neuroplastic changes in the
blind: Evidence for functionally relevant cross-modal interactions. Journal of Physiology, Paris
98:221–233.
Tian, B., D. Reser, A. Durham, A. Kustov, and J.P. Rauschecker. 2001. Functional specialization in rhesus
monkey auditory cortex. Science 292:290–293.
Tsao, D.Y., W.A. Freiwald, R.B.H. Tootell, and M.S. Livingstone. 2006. A cortical region consisting entirely of
face-selective cells. Science 311:670–674.
Tsao, D.Y., S. Moeller, and W.A. Freiwald. 2008a. Comparing face patch systems in macaques and humans.
Proceedings of the National Academy of Sciences of the United States of America 105:19514–19519.
Tsao, D.Y., N. Schweers, S. Moeller, and W.A. Freiwald. 2008b. Patches of face-selective cortex in the macaque
frontal lobe. Nature Neuroscience 11:877–879.
Turner, B.H., M. Mishkin, and M. Knapp. 1980. Organization of the amygdalopetal projections from modality-
specific cortical association areas in the monkey. Journal of Comparative Neurology 191:515–543.
Ungerleider, L.G., and M. Mishkin. 1982. Two cortical visual systems. In Analysis of Visual Behavior, ed. D.J.
Ingle, M.A. Goodale, and R.J.W. Mansfield, 549–586. Cambridge: MIT Press.
Ungerleider, L.G., S.M. Courtney, and J.V. Haxby. 1998. A neural system for human visual working memory.
Proceedings of the National Academy of Sciences of the United States of America 95:883–890.
Updyke, B.V. 1974. Characteristics of unit responses in superior colliculus of the cebus monkey. Journal of
Neurophysiology 37:896–909.
Vaadia, E., D.A. Benson, R.D. Hienz, and M.H. Goldstein Jr. 1986. Unit study of monkey frontal cortex: Active
localization of auditory and of visual stimuli. Journal of Neurophysiology 56:934–952.
van Atteveldt, N., A. Roebroeck, and R. Goebel. 2009. Interaction of speech and script in human auditory
cortex: Insights from neuro-imaging and effective connectivity. Hearing Research 258:152–164.
Vatakis, A., A.A. Ghazanfar, and C. Spence. 2008. Facilitation of multisensory integration by the “unity effect”
reveals that speech is special. Journal of Vision 8(9):14.
von Kriegstein, K., and A.-L. Giraud. 2006. Implicit multisensory associations influence voice recognition.
PLoS Biology 4:e326.
Wallace, M.T., L.K. Wilkinson, and B.E. Stein. 1996. Representation and integration of multiple sensory inputs
in primate superior colliculus. Journal of Neurophysiology 76:1246–1266.
Wang, Y., S. Celebrini, Y. Trotter, and P. Barone. 2008. Visuo-auditory interactions in the primary visual cortex
of the behaving monkey: Electrophysiological evidence. BMC Neuroscience 9:79.
Watanabe, M. 1992. Frontal units of the monkey coding the associative significance of visual and auditory
stimuli. Experimental Brain Research 89:233–247.
Watanabe, J., and E. Iwai. 1991. Neuronal activity in visual, auditory and polysensory areas in the monkey
temporal cortex during visual fixation task. Brain Research Bulletin 26:583–592.
Welch, R., and D. Warren. 1986. Intersensory interactions. In Handbook of Perception and Human Performance,
ed. K.R. Boff, L. Kaufman, and J.P. Thomas, 21–36. New York: Wiley.
Werner-Reiss, U., K.A. Kelly, A.S. Trause, A.M. Underhill, and J.M. Groh. 2006. Eye position affects activity
in primary auditory cortex of primates. Current Biology 13:554–562.
Wheeler, M.E., S.E. Petersen, and R.L. Buckner. 2000. Memory’s echo: Vivid remembering reactivates
sensory-specific cortex. Proceedings of the National Academy of Sciences of the United States of America
97:11125–11129.
Wilson, F.A.W., and E.T. Rolls. 1990. Neuronal responses related to reinforcement in the primate basal fore-
brain. Brain Research 509:213–231.
Wilson, F.A.W., S.P.O. Scalaidhe, and P.S. Goldman-Rakic. 1993. Dissociation of object and spatial processing
in primate prefrontal cortex. Science 260:1955–1958.
Wollberg, Z., and J. Sela. 1980. Frontal cortex of the awake squirrel monkey: Responses of single cells to visual
and auditory stimuli. Brain Research 198:216–220.
Woods, T.M., and G.H. Recanzone. 2004. Visually induced plasticity of auditory spatial perception in macaques.
Current Biology 14:1559–1564.
Wurtz, R.H., and J.E. Albano. 1980. Visual–motor function of the primate superior colliculus. Annual Review
of Neuroscience 3:189–226.
Yeterian, E.H., and D.N. Pandya. 1989. Thalamic connections of the cortex of the superior temporal sulcus in
the rhesus monkey. Journal of Comparative Neurology 282:80–97.
Zangenehpour, S., A.A. Ghazanfar, D.J. Lewkowicz, and R.J. Zatorre. 2009. Heterochrony and cross-species
intersensory matching by infant vervet monkeys. PLoS ONE 4:e4302.
6 Multisensory Influences
on Auditory Processing
Perspectives from fMRI
and Electrophysiology
Christoph Kayser, Christopher I. Petkov,
Ryan Remedios, and Nikos K. Logothetis
CONTENTS
6.1 Introduction.............................................................................................................................99
6.2 The Where and How of Sensory Integration......................................................................... 100
6.3 Using Functional Imaging to Localize Multisensory Influences in Auditory Cortex........... 101
6.4 Multisensory Influences along the Auditory Processing Stream.......................................... 102
6.5 Multisensory Influences and Individual Neurons.................................................................. 104
6.6 Multisensory Influences and Processing of Communication Signals................................... 106
6.7 Conclusions............................................................................................................................ 109
References....................................................................................................................................... 109
6.1 INTRODUCTION
Traditionally, perception has been described as a modular function, with the different sensory modalities operating as independent and separate processes. On this view, sensory integration was assumed to occur only after sufficient unisensory processing, and only in higher association
cortices (Jones and Powell 1970; Ghazanfar and Schroeder 2006). Studies in the past decade, how-
ever, promote a different view, and demonstrate that the different modalities interact at early stages
of processing (Kayser and Logothetis 2007; Schroeder and Foxe 2005; Foxe and Schroeder 2005).
A good model for this early integration hypothesis has been the auditory cortex, where multisensory
influences from vision and touch have been reported using a number of methods and experimental
paradigms (Kayser et al. 2009c; Schroeder et al. 2003; Foxe and Schroeder 2005). In fact, anatomi-
cal afferents are available to provide information about nonacoustic stimuli (Rockland and Ojima
2003; Cappe and Barone 2005; Falchier et al. 2002) and neuronal responses showing cross-modal
influences have been described in detail (Lakatos et al. 2007; Kayser et al. 2008, 2009a; Bizley et al.
2006). These novel insights, together with the traditional notion that multisensory processes are
more prominent in higher association regions, suggest that sensory integration is a rather distributed
process that emerges over several stages.
Of particular interest in the context of sensory integration are stimuli with particular behavioral
significance, such as sights and sounds related to communication (Campanella and Belin 2007;
Petrini et al. 2009; Ghazanfar and Logothetis 2003; von Kriegstein and Giraud 2006; von Kriegstein
et al. 2006). Indeed, a famous scenario used to exemplify sensory integration—the cocktail party—
concerns exactly this: when in a loud and noisy environment, we can better understand a person
talking to us when we observe the movements of his/her lips at the same time (Sumby and Pollack
1954; Ross et al. 2007). In this situation, the visual information about lip movements enhances the
(perceived) speech signal, hence providing an example of how visual information can enhance
auditory perception. However, as for many psychophysical phenomena, the exact neural substrate
mediating the sensory integration underlying this behavioral benefit remains elusive.
In this review, we discuss findings on early multisensory influences on auditory processing, and provide evidence that sensory integration is distributed across several processing stages. In particular, we discuss some of the methodological aspects relevant for studies
seeking to localize and characterize multisensory influences, and emphasize some of the recent
results pertaining to speech and voice integration.
This statistical criterion, in conjunction with the verification of these principles, has become the
standard approach to detect neural processes related to sensory integration. In addition, recent work
has introduced more elaborate concepts derived from information theory and stimulus decoding.
Such methods can be used to investigate whether neurons indeed become more informative about
the sensory stimuli, and whether they allow better stimulus discrimination in multisensory com-
pared to unisensory conditions (Bizley et al. 2006; Bizley and King 2008; Kayser et al. 2009a).
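As a toy sketch of these two approaches (illustrative only: the function names and the crude quantile-binning information estimator below are our own assumptions, not taken from the cited studies), one might compare a bimodal response against the additive prediction and estimate how much information a discretized response carries about stimulus identity:

```python
import numpy as np

def additivity_index(resp_av, resp_a, resp_v):
    """Contrast the bimodal response with the sum of the unisensory ones:
    positive values indicate superadditive, negative subadditive interactions."""
    additive = resp_a + resp_v
    return (resp_av - additive) / (abs(additive) + 1e-12)

def stimulus_information(responses, labels, n_bins=4):
    """Shannon mutual information I(stimulus; response) in bits, with the
    response discretized into quantile bins (a deliberately simple estimator)."""
    responses, labels = np.asarray(responses, float), np.asarray(labels)
    edges = np.quantile(responses, np.linspace(0.0, 1.0, n_bins + 1))
    binned = np.clip(np.digitize(responses, edges[1:-1]), 0, n_bins - 1)
    stims = np.unique(labels)
    joint = np.zeros((stims.size, n_bins))
    for i, s in enumerate(stims):
        for b in binned[labels == s]:
            joint[i, b] += 1.0
    joint /= joint.sum()  # joint distribution p(stimulus, response bin)
    marg = joint.sum(1, keepdims=True) @ joint.sum(0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / marg[nz])).sum())
```

On this decoding view, a neuron "integrates" when its multisensory responses carry more stimulus information than either unisensory condition alone, rather than merely showing a larger mean response.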
FIGURE 6.1 (See color insert.) Mapping individual auditory fields using fMRI. (a) Schematic of organi-
zation of monkey auditory cortex. Three primary auditory fields (core region) are surrounded by secondary
fields (belt region) as well as higher association areas (parabelt). Electrophysiological studies have shown that
several of these fields contain an ordered representation of sound frequency (tonotopic map, indicated on left),
and that core and belt fields prefer narrow- and broadband sounds, respectively. These two functional proper-
ties can be exploited to map layout of these auditory fields in individual subjects using functional imaging.
(b) Single-slice fMRI data showing frequency-selective BOLD responses to low and high tones (left panel)
and a complete (smoothed) frequency map obtained from stimulation using six frequency bands (right panel).
Combining frequency map with an estimate of core region and anatomical landmarks to delineate the parabelt
results in a full parcellation of auditory cortex in individual subjects. This parcellation is indicated in the left
panel as white dashed lines and is shown in full in panel a.
frequency preference map which allowed determining the anterior–posterior borders of potential
fields. In addition, the preference to sounds of different bandwidths often allowed a segregation of
core and belt fields, hence providing borders in medial–lateral directions. When combined with the
known organization of auditory cortex, the evidence from these activation patterns allowed a more
complete parcellation into distinct core and belt fields, and provided constraints for the localization
of the parabelt regions (Figure 6.1b). This functional localization procedure for auditory fields now
serves as a routine tool to delineate auditory structures in experiments involving auditory cortex.
[Figure 6.2 graphic: panels a–d — activation maps thresholded from p < 0.01 to p < 10⁻⁷ with field outlines (RT, R, MM, A1, CM, CL, CPB); response time course (percent change); 3-D rendering of core, medial belt, lateral belt, STG, and uSTS; bar plot of percent of total activation (0–40%) for A1, caudal belt, parabelt/STG, and uSTS (lower to higher areas).]
FIGURE 6.2 (See color insert.) Imaging multisensory influences in monkey auditory cortex. (a) Data from
an experiment with audiovisual stimulation. Sensory activations to auditory (left) and visual (right) stimuli are
shown on single image slices (red to yellow voxels). An outline of auditory fields is indicated (white lines).
Time course illustrates multisensory enhancement during combined audiovisual stimulation (data from one
session, averaged over 36 repeats of the stimulus). For details, see Kayser, C. et al. (2007). (b) Schematic of
auditory fields exhibiting significant visual influences. Visual influences (shown in blue) were most prominent
in caudal fields, and effects in A1 were only observed in alert animals. (From Kayser, C. et al., J. Neurosci.,
27, 1824–1835, 2007. With permission.) (c) Three-dimensional rendering of a segment of a monkey brain.
Different structures investigated in fMRI experiments are color coded and comprise classical auditory cortex
(core and belt) as well as auditory association cortex (parabelt) and general association cortex (STS). Please
note that this figure serves as an illustration only, and individual structures have been sketched based on
approximate anatomical location, not on functional criteria. (d) Strength of visual influence along auditory
hierarchy. The graph displays contribution of responses to (unimodal) visual stimuli to total fMRI-BOLD
activation obtained during auditory, visual, and audiovisual stimulation. This was computed as a fraction (in
percentage) of BOLD response to visual stimulation relative to sum of BOLD responses to all three conditions.
Visual contribution increases from lower to higher areas.
interactions in secondary and higher auditory regions occurred reliably in both anesthetized and
alert animals. In addition, we found multisensory interactions in the core region A1, but only in the
alert animal, indicating that these early interactions could be dependent on the vigilance of the ani-
mal, perhaps involving cognitive or top-down influences. To rule out nonspecific modulatory pro-
jections as the source of these effects, we tested two functional criteria of sensory integration: the
principles of temporal coincidence and inverse effectiveness. We found both criteria to be obeyed,
and multisensory influences were stronger when sensory stimuli were in temporal coincidence and
when unisensory stimuli were less effective in eliciting BOLD responses. Overall, these findings
not only confirm previous results from human imaging, but also localize multisensory influences
mostly to secondary fields and demonstrate a clear spatial organization, with caudal regions being
most susceptible to multisensory inputs (Kayser et al. 2009c).
In addition to providing a good localization of cross-modal influences (the “where” question),
functional imaging can also shed light on the relative influence of visual stimuli on auditory pro-
cessing at several processing stages. Because fMRI allows measuring responses at many locations
at the same time, we were able to quantify visual influences along multiple stages in the caudal
auditory network (Figure 6.2c). Using the above-mentioned localization technique in conjunction
with anatomical landmarks, we defined several regions of interest outside the classical auditory
cortex: these comprised the caudal parabelt, the superior temporal gyrus, as well as the upper bank
of the STS (uSTS). The uSTS is a well-known multisensory area where neuronal responses as well
as fMRI activations to stimulation of several modalities have been described (Benevento et al.
1977; Bruce et al. 1981; Beauchamp et al. 2004, 2008; Dahl et al. 2009). As a result, one should
expect a corresponding increase in visual influence when proceeding from the auditory core to the
uSTS. This was indeed the case, as shown in Figure 6.2d: visual influences were relatively small
in auditory core and belt fields, as described above. In the parabelt/STG region, an auditory association cortex, visual influences already contributed a considerable proportion of the total activation, and they were stronger still in the uSTS. As a rule of thumb, it seemed that the contribution
of visual stimuli to the total measured activation roughly doubled from stage to stage along this
hierarchy.
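The contribution measure behind this rule of thumb reduces to a simple ratio: the BOLD response to visual stimulation divided by the sum of responses in the auditory, visual, and audiovisual conditions. A minimal sketch, with illustrative BOLD amplitudes (the ROI labels and numbers are hypothetical, not values from the study), chosen so the visual share roughly doubles from stage to stage:

```python
def visual_contribution(bold_a, bold_v, bold_av):
    """Visual contribution to total activation, in percent: the BOLD
    response to visual stimulation relative to the sum of responses
    in the auditory, visual, and audiovisual conditions."""
    return 100.0 * bold_v / (bold_a + bold_v + bold_av)

# Hypothetical BOLD amplitudes (% signal change) along the hierarchy.
rois = {
    "core/belt":    (1.0, 0.11, 1.1),
    "parabelt/STG": (1.0, 0.25, 1.2),
    "uSTS":         (1.0, 0.65, 1.4),
}
for name, (a, v, av) in rois.items():
    print(f"{name:12s} visual contribution: {visual_contribution(a, v, av):.0f}%")
```

With these illustrative numbers the fraction rises from about 5% to 10% to 21%, mimicking the stage-to-stage doubling described in the text.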
Although human functional imaging has described multisensory influences at different stages of auditory processing, and in a number of behavioral contexts, imaging studies in the animal model localized these influences to specific, identified fields. These results promote a model in which
multisensory influences already exist at early processing stages and progressively increase in higher
areas. This suggests that sensory integration is a distributed process involving several processing
stages to varying degrees, in opposition to the traditional idea of a modular organization of sensory
processing into independent unisensory processes modules.
2002), these observations demonstrate that multisensory input to the auditory cortex occurs at the
synaptic level. These results provide a direct neural basis for the multisensory influences seen in
imaging studies, but do not yet reveal whether the neural information representation benefits from
the multisensory input.
Other studies provide evidence for multisensory influences on the firing of individual neurons in
the auditory cortex. For example, measurements in ferret auditory cortex revealed that 15% of the
neurons in core fields are sensitive to nonauditory inputs such as flashes of light (Bizley et al. 2006;
and see Cappe et al. 2007 for similar results in monkeys). We investigated such visual influences
in the macaque and found that a similar proportion (12%) of neurons in the auditory core revealed
multisensory interactions in their firing rates. Of these, nearly 4% responded to both acoustic and
visual stimuli when presented individually, and hence, constitute bimodal neurons. The remain-
ing 8% responded to unimodal sounds but did not respond to unimodal visual stimuli; however,
their responses were enhanced (or reduced) by the simultaneous presentation of both stimuli. This
response pattern does not conform to the traditional notion of bimodal neurons but represents a
kind of multisensory influence typically called subthreshold response modulation (Dehner et al.
2004). Similar subthreshold response modulations have been observed in a number of cortical
areas (Allman et al. 2008a, 2008b; Allman and Meredith 2007; Meredith and Allman 2009), and
suggest that multisensory influences can fall along a continuum, ranging from true unimodal neu-
rons to the classical bimodal neuron that exhibits suprathreshold responses to stimuli in several
modalities.
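The distinction between bimodal and subthreshold-modulated units amounts to a small decision rule over three statistical outcomes: a significant response to sound alone, a significant response to light alone, and a significant difference between audiovisual and auditory-alone responses. The sketch below is a simplified illustration of that logic; the function name and boolean inputs are hypothetical, not the exact statistical procedure of the cited studies.

```python
def classify_unit(responds_a, responds_v, av_modulated):
    """Assign a unit to a multisensory category from three significance
    flags: response to auditory alone, response to visual alone, and
    modulation of the auditory response by simultaneous visual input."""
    if responds_a and responds_v:
        return "bimodal"
    if responds_a and av_modulated:
        return "subthreshold-modulated"
    if responds_a:
        return "unimodal auditory"
    if responds_v:
        return "unimodal visual"
    return "unresponsive"

# The cases described in the text:
print(classify_unit(True, True, False))   # bimodal (~4% of units)
print(classify_unit(True, False, True))   # subthreshold-modulated (~8%)
print(classify_unit(True, False, False))  # unimodal auditory
```

Seen this way, the continuum from unimodal to bimodal neurons corresponds to how strongly the nonpreferred modality must be driven before it crosses from modulating responses to eliciting them outright.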
Notably, the fraction of neurons with significant multisensory influences in the auditory cortex was considerably smaller than the fraction of sites showing similar response properties in the
local field potential (LFP), or the spatial area covered by the voxels showing multisensory responses
in the imaging data. Hence, although visual input seems to be widely present at the subthreshold
level, only a minority of neurons actually exhibit significant changes of their firing rates. This sug-
gests that the effect of visual stimulation on auditory information coding in early auditory cortex
is weaker than one would estimate from the strong multisensory influences reported in imaging
studies.
When testing the principles of temporal coincidence and inverse effectiveness for these audi-
tory cortex neurons, we found both to be obeyed: the relative timing of auditory and visual stimuli
was as important in shaping the multisensory influence as was the efficacy of the acoustic stimu-
lus (Kayser et al. 2008). Similar constraints of spatiotemporal stimulus alignment on audiovisual
response modulations in the auditory cortex have been observed in other studies as well (Bizley
et al. 2006). Additional experiments using either semantically congruent or incongruent audiovisual stimuli revealed that visual influences in the auditory cortex also show specificity to more complex
stimulus attributes. For example, neurons integrating information about audiovisual communication
signals revealed reduced visual modulation when the acoustic communication call was paired with
a moving disk instead of the movie displaying the conspecific animal (Ghazanfar et al. 2008). A
recent study also revealed that pairing a natural sound with a mismatching movie abolishes multi-
sensory benefits for acoustic information representations (Kayser et al. 2009a). Altogether, this sug-
gests that visual influences in the primary and secondary auditory fields indeed provide functionally
specific visual information.
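The inverse-effectiveness principle tested above can be made concrete with the standard enhancement index, which expresses the multisensory response relative to the strongest unisensory response. The numbers below are illustrative, not measurements from the cited experiments:

```python
def enhancement(resp_av, resp_a, resp_v):
    """Multisensory enhancement in percent, relative to the strongest
    unisensory response (cf. Stein and Meredith 1993)."""
    best_uni = max(resp_a, resp_v)
    return 100.0 * (resp_av - best_uni) / best_uni

# Hypothetical responses (spikes/s): a weakly effective and a highly
# effective acoustic stimulus, each paired with the same visual input.
weak = enhancement(resp_av=6.0, resp_a=4.0, resp_v=2.0)      # (6-4)/4  -> +50%
strong = enhancement(resp_av=22.0, resp_a=20.0, resp_v=2.0)  # (22-20)/20 -> +10%
print(f"weak auditory stimulus:   {weak:+.0f}% enhancement")
print(f"strong auditory stimulus: {strong:+.0f}% enhancement")
# Inverse effectiveness: the same absolute gain yields a proportionally
# larger enhancement when the unisensory stimulus is less effective.
```

Temporal coincidence enters the same framework by computing this index as a function of audiovisual onset asynchrony, with enhancement expected to peak near synchrony.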
Given that imaging studies reveal an increase of multisensory influence in higher auditory
regions, one should expect a concomitant increase in the proportion of multisensory neurons.
Indeed, when probing neurons in a classical association cortex, such as the STS, much stronger
multisensory influences are visible in the neurons' firing. Using the same stimuli and statistical
criteria, a recent study revealed a rather homogeneous population of unimodal and bimodal neurons in the upper bank of the STS (Dahl et al. 2009): about half the neurons responded significantly to
both sensory modalities, whereas 28% of the neurons preferred the visual and 19% preferred the
auditory modality. Importantly, this study not only revealed a more complex interplay of auditory
and visual information representations in this region, but detailed electrophysiological mappings
demonstrated that a spatial organization of neurons according to their modality preferences exists
in the STS: neurons preferring the same modality (auditory or visual) occurred in close spatial proximity or intermingled with bimodal neurons, whereas neurons preferring different modalities occurred only at spatially separated sites. This organization at the scale of individual neurons
led to extended patches of same modality preference when analyzed at the scale of millimeters,
revealing large-scale regions that preferentially respond to the same modality. These results lend
support to the notion that topographical organizations might serve as a general principle of inte-
grating information within and across the sensory modalities (Beauchamp et al. 2004; Wallace et
al. 2004).
These insights from studies of multisensory integration at the neuronal level are in concordance
with the notion that sensory integration is a distributed hierarchical process that extends over sev-
eral processing stages. Given the difficulty in characterizing and interpreting the detailed effect of
multisensory influences at a single processing stage, a comparative approach might prove useful:
comparing multisensory influences at different stages using the same stimuli might not only help in understanding the contribution of individual stages to the process of sensory integration, but also clarify the exact benefit a particular region derives from receiving multisensory input.
In fact, the multisensory influences in this region were found to depend on stimulus parameters such
as the face–voice onset asynchrony or the match of visual and acoustic vocalizations, suggesting a
good degree of specificity of the visual input. At the other end of this pathway, in the ventrolateral
prefrontal cortex, 46% of the neurons were found to reflect audiovisual components of vocalization
signals (Sugihara et al. 2006). Although the existence of a dedicated “what” pathway is still debated
(Bizley and Walker 2009; Hall 2003; Wang 2000), these results highlight the prominence of multi-
sensory influences in the implicated areas.
In addition to these stages of the presumed “what” pathway, two other regions have recently
been highlighted in the context of vocal communication sounds. Recording in the primate
insula, we recently found a large cluster of neurons that respond preferentially to conspecific
vocalizations, when contrasted with a large set of other natural sounds (Remedios et al. 2009)
(Figure 6.3a). Many of these neurons not only responded more strongly to conspecific vocaliza-
tions, but also responded selectively to only a few examples, and their responses allowed the
decoding of the identity of individual vocalizations. This suggests that the insular cortex might
play an important role in the representation of vocal communication sounds. Notably, this
response preference to conspecific vocalizations is also supported by functional imaging studies
in animals (Figure 6.3b) and humans (Griffiths et al. 1997; Rumsey et al. 1997; Kotz et al. 2003;
Meyer et al. 2002; Zatorre et al. 1994). In addition, lesions of the insula often manifest as deficits
in sound or speech recognition (auditory agnosia) and speech production, confirming a central
function of this structure in communication-related processes (Habib et al. 1995; Cancelliere and
Kertesz 1990; Engelien et al. 1995). Notably, some of the neurons in this auditory responsive region in the insula also show sensitivity to visual stimuli or response interactions during audiovisual stimulation (R. Remedios and C. Kayser, unpublished data). However, the vast majority of units in this structure are not affected by visual stimuli, suggesting that this region is likely not concerned with the sensory integration of information related to communication calls, but mostly processes acoustic input.
Another region that has recently been implicated in the processing of communication sounds
is the so-called voice region in the anterior temporal lobe. A preference for the human voice,
in particular, the identity of a human speaker, has been found in the human anterior temporal
lobe (Belin and Zatorre 2003; Belin et al. 2000; von Kriegstein et al. 2003) and a similar pref-
erence for conspecific vocalizations and the identity of a monkey caller has been observed in
the anterior temporal lobe of the nonhuman primate (Petkov et al. 2008). For example, high-
resolution functional imaging revealed several regions in the superior temporal lobe responding
preferentially to the presentation of conspecific macaque vocalizations over other vocalizations
and natural sounds (see the red clusters in the middle panel of Figure 6.3c), as has been seen in
humans (Belin et al. 2000; von Kriegstein et al. 2003). These results can be interpreted as evi-
dence for sensitivity to the acoustic features that distinguish the vocalizations of members of the
species from other sounds. Further experiments have shown that one of these regions, located in the anterior temporal lobe, responds more vigorously to sounds that come from different speakers,
whose meaning is constant, rather than to those that come from the same speaker, whose mean-
ing and acoustics vary (Belin and Zatorre 2003; von Kriegstein et al. 2003; Petkov et al. 2008).
These observations support the conclusion of a high-level correspondence in the processing of
species-specific vocal features and a common cross-species substrate in the brains of human and
nonhuman primates.
Notably, this human voice region can also be influenced by multisensory input. For
instance, von Kriegstein and colleagues (2006) used face and voice stimuli to first localize the
human “face” and “voice” selective regions. They then showed that the activity of each of these
regions was modulated by multisensory input. Comparable evidence from the animal model
is still unavailable. Ongoing work in our laboratory is pursuing this question (Perrodin et al.
2009a, 2009b).
[Figure 6.3 graphic: panels a–c — insula single-unit raster (0–1500 ms) and normalized population response (155 units) to Mvoc, Asnd, and Esnd; fMRI slice showing % BOLD response in insula and auditory cortex (AC); map of core, belt, and parabelt fields (Pro, Ts1, Ts2, Tpt) marking the voice area and preferences for conspecific vocalizations versus other sounds, with normalized responses per sound category.]
FIGURE 6.3 (See color insert.) Response preferences to (vocal) communication sounds. Preferences to con-
specific communication sounds have been found in insula (panels a and b) and in anterior temporal lobe (panel
c). In both cases, responses to conspecific communication sounds (Mvoc) have been contrasted with sounds of
other animals (Asnd) and environmental sounds (Esnd). (a) Data from an electrophysiological investigation of
insula neurons. (From Remedios, R. et al., J. Neurosci., 29, 1034–1045, 2009. With permission.) Upper panel
displays one example neuron, showing a strong response to Mvocs. Lower panel displays normalized popu-
lation response to three sound categories (mean ± SEM). (b) Example data from a single fMRI experiment
showing voxels significantly preferring conspecific vocalizations over other sounds (color code) in a single
slice. Such voxels were found in anterior auditory cortex (field TS2), core and lateral belt, and in insula. Bar
plot displays BOLD signal change for different conditions (mean ± SEM for insula voxels). (c) Identification
of a voice region in monkey brain using functional imaging. (From Petkov, C.I. et al., Nat. Neurosci., 11,
367–374, 2008. With permission.) Preferences to conspecific vocal sounds (red voxels) were found in caudal
auditory cortex (as also seen in b), and on anterior temporal lobe (voice area). This location of voice area is
consistent with studies on voice processing in human brain, and suggests a common basis of voice processing
in human and nonhuman primates. Bar plot displays BOLD signal change in voice region for different sound
conditions (mean ± SEM across experiments).
Multisensory Influences on Auditory Processing 109
6.7 CONCLUSIONS
During everyday actions, we benefit tremendously from the combined input provided by our different
sensory modalities. Although seldom experienced explicitly, only this combined sensory input makes
an authentic and coherent percept of our environment possible (Adrian 1928; Stein and Meredith
1993). In fact, multisensory integration helps us to react faster or with higher precision (Calvert et al.
2004; Hershenson 1962), improves our learning capacities (Montessori 1967; Oakland et al. 1998),
and sometimes even completely alters our percept (McGurk and MacDonald 1976). As a result, understanding sensory integration and its neural basis not only sheds light on brain function and perception, but could also provide improved strategies for learning and rehabilitation programs (Shams and Seitz 2008).
Evidence from functional imaging and electrophysiology demonstrates that this process of sen-
sory integration is likely distributed across multiple processing stages. Multisensory influences
are already present at early stages, such as in the primary auditory cortex, but increase along the
processing hierarchy and are ubiquitous in higher association cortices. Existing data suggest that
multisensory influences at early stages are specific to basic stimulus characteristics such as spatial
and temporal localization, but are not specialized toward particular kinds of stimuli, such as communication signals. Whether, where, and how multisensory influences become more specialized remains to be investigated by future work. In this search, a comparative approach comparing the multisensory influences at multiple processing stages during the same stimulation paradigm might prove especially useful. As highlighted here, such an approach ideally proceeds using a combination of methods that probe neural responses at different spatiotemporal scales, such as electrophysiology and functional imaging. Certainly, much remains to be learned before we fully understand the neural basis underlying the behavioral gains provided by multisensory stimuli.
REFERENCES
Adrian, E.D. 1928. The Basis of Sensations. New York: Norton.
Allman, B.L., and M.A. Meredith. 2007. Multisensory processing in “unimodal” neurons: Cross-modal sub-
threshold auditory effects in cat extrastriate visual cortex. Journal of Neurophysiology 98:545–9.
Allman, B.L., R.E. Bittencourt-Navarrete, L.P. Keniston et al. 2008a. Do cross-modal projections always result
in multisensory integration? Cerebral Cortex 18:2066–76.
Allman, B.L., L.P. Keniston, and M.A. Meredith. 2008b. Subthreshold auditory inputs to extrastriate visual
neurons are responsive to parametric changes in stimulus quality: Sensory-specific versus non-specific
coding. Brain Research 1242:95–101.
Beauchamp, M.S., B.D. Argall, J. Bodurka, J.H. Duyn, and A. Martin. 2004. Unraveling multisensory integra-
tion: Patchy organization within human STS multisensory cortex. Nature Neuroscience 7:1190–2.
Beauchamp, M.S., N.E. Yasar, R.E. Frye, and T. Ro. 2008. Touch, sound and vision in human superior temporal
sulcus. NeuroImage 41:1011–20.
Belin, P., and R.J. Zatorre. 2003. Adaptation to speaker’s voice in right anterior temporal lobe. Neuroreport
14:2105–9.
Belin, P., R.J. Zatorre, P. Lafaille, P. Ahad, and B. Pike. 2000. Voice-selective areas in human auditory cortex.
Nature 403:309–12.
Benevento, L.A., J. Fallon, B.J. Davis, and M. Rezak. 1977. Auditory–visual interaction in single cells in the
cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey. Experimental
Neurology 57:849–72.
Bernstein, L.E., E.T. Auer Jr., J.K. Moore et al. 2002. Visual speech perception without primary auditory cortex
activation. Neuroreport 13:311–5.
Bizley, J.K., and A.J. King. 2008. Visual–auditory spatial processing in auditory cortical neurons. Brain
Research 1242:24–36.
Bizley, J.K., and K.M. Walker. 2009. Distributed sensitivity to conspecific vocalizations and implications for
the auditory dual stream hypothesis. Journal of Neuroscience 29:3011–3.
Bizley, J.K., F.R. Nodal, V.M. Bajo, I. Nelken, and A.J. King. 2006. Physiological and anatomical evidence for
multisensory interactions in auditory cortex. Cerebral Cortex 17:2172–89.
Bruce, C., R. Desimone, and C.G. Gross. 1981. Visual properties of neurons in a polysensory area in superior
temporal sulcus of the macaque. Journal of Neurophysiology 46:369–84.
Calvert, G.A. 2001. Crossmodal processing in the human brain: Insights from functional neuroimaging studies.
Cerebral Cortex 11:1110–23.
Calvert, G.A., and R. Campbell. 2003. Reading speech from still and moving faces: The neural substrates of
visible speech. Journal of Cognitive Neuroscience 15:57–70.
Calvert, G.A., M.J. Brammer, E.T. Bullmore et al. 1999. Response amplification in sensory-specific cortices
during crossmodal binding. Neuroreport 10:2619–23.
Calvert, G., C. Spence, and B.E. Stein. 2004. The Handbook of Multisensory Processes. Cambridge: MIT
Press.
Calvert, G.A., E.T. Bullmore, M.J. Brammer et al. 1997. Activation of auditory cortex during silent lipreading.
Science 276:593–6.
Campanella, S., and P. Belin. 2007. Integrating face and voice in person perception. Trends in Cognitive
Sciences 11:535–43.
Cancelliere, A.E., and A. Kertesz. 1990. Lesion localization in acquired deficits of emotional expression and
comprehension. Brain and Cognition 13:133–47.
Cappe, C., and P. Barone. 2005. Heteromodal connections supporting multisensory integration at low levels of
cortical processing in the monkey. European Journal of Neuroscience 22:2886–902.
Cappe, C., G. Loquet, P. Barone, and E.M. Rouiller. 2007. Neuronal responses to visual stimuli in auditory
cortical areas of monkeys performing an audio-visual detection task. European Brain and Behaviour
Society. Trieste.
Chiry, O., E. Tardif, P.J. Magistretti, and S. Clarke. 2003. Patterns of calcium-binding proteins support
parallel and hierarchical organization of human auditory areas. European Journal of Neuroscience
17:397–410.
Clarke, S., and F. Rivier. 1998. Compartments within human primary auditory cortex: Evidence from cyto-
chrome oxidase and acetylcholinesterase staining. European Journal of Neuroscience 10:741–5.
Cohen, Y.E., F. Theunissen, B.E. Russ, and P. Gill. 2007. Acoustic features of rhesus vocalizations and their
representation in the ventrolateral prefrontal cortex. Journal of Neurophysiology 97:1470–84.
Dahl, C., N. Logothetis, and C. Kayser. 2009. Spatial organization of multisensory responses in temporal asso-
ciation cortex. Journal of Neuroscience 29:11924–32.
Dehner, L.R., L.P. Keniston, H.R. Clemo, and M.A. Meredith. 2004. Cross-modal circuitry between auditory
and somatosensory areas of the cat anterior ectosylvian sulcal cortex: A ‘new’ inhibitory form of multi-
sensory convergence. Cerebral Cortex 14:387–403.
Engel, S.A., D.E. Rumelhart, B.A. Wandell et al. 1994. fMRI of human visual cortex. Nature 369:525.
Engelien, A., D. Silbersweig, E. Stern et al. 1995. The functional anatomy of recovery from auditory
agnosia. A PET study of sound categorization in a neurological patient and normal controls. Brain
118(Pt 6):1395–409.
Ernst, M.O., and H.H. Bülthoff. 2004. Merging the senses into a robust percept. Trends in Cognitive Science
8:162–9.
Falchier, A., S. Clavagnier, P. Barone, and H. Kennedy. 2002. Anatomical evidence of multimodal integration
in primate striate cortex. Journal of Neuroscience 22:5749–59.
Formisano, E., D.S. Kim, F. Di Salle et al. 2003. Mirror-symmetric tonotopic maps in human primary auditory
cortex. Neuron 40:859–69.
Foxe, J.J., and C.E. Schroeder. 2005. The case for feedforward multisensory convergence during early cortical
processing. Neuroreport 16:419–23.
Foxe, J.J., G.R. Wylie, A. Martinez et al. 2002. Auditory–somatosensory multisensory processing in auditory
association cortex: An fMRI study. Journal of Neurophysiology 88:540–3.
Fullerton, B.C., and D.N. Pandya. 2007. Architectonic analysis of the auditory-related areas of the superior
temporal region in human brain. Journal of Comparative Neurology 504:470–98.
Ghazanfar, A.A., C. Chandrasekaran, and N.K. Logothetis. 2008. Interactions between the superior tempo-
ral sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys. Journal of
Neuroscience 28:4457–69.
Ghazanfar, A.A., and N.K. Logothetis. 2003. Neuroperception: Facial expressions linked to monkey calls.
Nature 423:937–8.
Ghazanfar, A.A., and C.E. Schroeder. 2006. Is neocortex essentially multisensory? Trends in Cognitive Sciences
10:278–85.
Ghazanfar, A.A., J.X. Maier, K.L. Hoffman, and N.K. Logothetis. 2005. Multisensory integration of dynamic
faces and voices in rhesus monkey auditory cortex. Journal of Neuroscience 25:5004–12.
Griffiths, T.D., A. Rees, C. Witton et al. 1997. Spatial and temporal auditory processing deficits following right
hemisphere infarction. A psychophysical study. Brain 120(Pt 5):785–94.
Habib, M., G. Daquin, L. Milandre et al. 1995. Mutism and auditory agnosia due to bilateral insular damage—
role of the insula in human communication. Neuropsychologia 33:327–39.
Hackett, T.A., I. Stepniewska, and J.H. Kaas. 1998. Subdivisions of auditory cortex and ipsilateral cortical connec-
tions of the parabelt auditory cortex in macaque monkeys. Journal of Comparative Neurology 394:475–95.
Hall, D.A. 2003. Auditory pathways: Are ‘what’ and ‘where’ appropriate? Current Biology 13:R406–8.
Hershenson, M. 1962. Reaction time as a measure of intersensory facilitation. Journal of Experimental
Psychology 63:289–93.
Jones, E.G., and T.P. Powell. 1970. An anatomical study of converging sensory pathways within the cerebral
cortex of the monkey. Brain 93:793–820.
Juergens, E., A. Guettler, and R. Eckhorn. 1999. Visual stimulation elicits locked and induced gamma oscilla-
tions in monkey intracortical- and EEG-potentials, but not in human EEG. Experimental Brain Research
129:247–59.
Kaas, J.H., and T.A. Hackett. 2000. Subdivisions of auditory cortex and processing streams in primates.
Proceedings of the National Academy of Sciences of the United States of America 97:11793–9.
Kayser, C., and N.K. Logothetis. 2007. Do early sensory cortices integrate cross-modal information? Brain
Structure and Function 212:121–32.
Kayser, C., C.I. Petkov, M. Augath, and N.K. Logothetis. 2005. Integration of touch and sound in auditory
cortex. Neuron 48:373–84.
Kayser, C., C.I. Petkov, M. Augath, and N.K. Logothetis. 2007. Functional imaging reveals visual modulation
of specific fields in auditory cortex. Journal of Neuroscience 27:1824–35.
Kayser, C., C.I. Petkov, and N.K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral
Cortex 18:1560–74.
Kayser, C., N. Logothetis, and S. Panzeri. 2009a. Visual enhancement of the information representation in audi-
tory cortex. Current Biology (in press).
Kayser, C., M.A. Montemurro, N. Logothetis, and S. Panzeri. 2009b. Spike-phase coding boosts and stabilizes
the information carried by spatial and temporal spike patterns. Neuron 61:597–608.
Kayser, C., C.I. Petkov, and N.K. Logothetis. 2009c. Multisensory interactions in primate auditory cortex:
fMRI and electrophysiology. Hearing Research (in press). doi:10.1016/j.heares.2009.02.011.
Kosaki, H., T. Hashikawa, J. He, and E.G. Jones. 1997. Tonotopic organization of auditory cortical fields
delineated by parvalbumin immunoreactivity in macaque monkeys. Journal of Comparative Neurology
386:304–16.
Kotz, S.A., M. Meyer, K. Alter et al. 2003. On the lateralization of emotional prosody: An event-related func-
tional MR investigation. Brain and Language 86:366–76.
Lakatos, P., C.M. Chen, M.N. O’Connell, A. Mills, and C.E. Schroeder. 2007. Neuronal oscillations and multi-
sensory interaction in primary auditory cortex. Neuron 53:279–92.
Laurienti, P.J., T.J. Perrault, T.R. Stanford, M.T. Wallace, and B.E. Stein. 2005. On the use of superadditivity
as a metric for characterizing multisensory integration in functional neuroimaging studies. Experimental
Brain Research 166:289–97.
Lauritzen, M. 2005. Reading vascular changes in brain imaging: Is dendritic calcium the key? Nature
Neuroscience Reviews 6(1):77–85.
Lehmann, C., M. Herdener, F. Esposito et al. 2006. Differential patterns of multisensory interactions in core and
belt areas of human auditory cortex. NeuroImage 31:294–300.
Leopold, D.A. 2009. Neuroscience: Pre-emptive blood flow. Nature 457:387–8.
Logothetis, N.K. 2002. The neural basis of the blood-oxygen-level-dependent functional magnetic resonance
imaging signal. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences
357:1003–37.
Logothetis, N.K. 2008. What we can do and what we cannot do with fMRI. Nature 453:869–78.
Logothetis, N.K., H. Guggenberger, S. Peled, and J. Pauls. 1999. Functional imaging of the monkey brain.
Nature Neuroscience 2:555–62.
Martuzzi, R., M.M. Murray, C.M. Michel et al. 2006. Multisensory interactions within human primary cortices
revealed by BOLD dynamics. Cerebral Cortex 17:1672–9.
McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature 264:746–8.
Meredith, M.A., and B.L. Allman. 2009. Subthreshold multisensory processing in cat auditory cortex.
Neuroreport 20:126–31.
Merzenich, M.M., and J.F. Brugge. 1973. Representation of the cochlear partition of the superior temporal
plane of the macaque monkey. Brain Research 50:275–96.
112 The Neural Bases of Multisensory Processes
Meyer, M., K. Alter, A.D. Friederici, G. Lohmann, and D.Y. Von Cramon. 2002. FMRI reveals brain regions
mediating slow prosodic modulations in spoken sentences. Human Brain Mapping 17:73–88.
Mitzdorf, U. 1985. Current source-density method and application in cat cerebral cortex: Investigation of
evoked potentials and EEG phenomena. Physiological Reviews 65:37–100.
Montessori, M. 1967. The Absorbent Mind. New York: Henry Holt & Co.
Morel, A., P.E. Garraghty, and J.H. Kaas. 1993. Tonotopic organization, architectonic fields, and connections of
auditory cortex in macaque monkeys. Journal of Comparative Neurology 335:437–59.
Oakland, T., J.L. Black, G. Stanford, N.L. Nussbaum, and R.R. Balise. 1998. An evaluation of the dyslexia
training program: A multisensory method for promoting reading in students with reading disabilities.
Journal of Learning Disabilities 31:140–7.
Pekkola, J., V. Ojanen, T. Autti et al. 2005. Attention to visual speech gestures enhances hemodynamic activity
in the left planum temporale. Human Brain Mapping 27:471–7.
Perrodin, C., C. Kayser, N. Logothetis, and C. Petkov. 2009a. Visual influences on voice-selective neurons
in the anterior superior-temporal plane. International Conference on Auditory Cortex. Magdeburg,
Germany, 2009.
Perrodin, C., L. Veit, C. Kayser, N.K. Logothetis, and C.I. Petkov. 2009b. Encoding properties of neurons sensi-
tive to species-specific vocalizations in the anterior temporal lobe of primates. International Conference
on Auditory Cortex. Magdeburg, Germany, 2009.
Petkov, C.I., C. Kayser, M. Augath, and N.K. Logothetis. 2006. Functional imaging reveals numerous fields in
the monkey auditory cortex. PLoS Biology 4:e215.
Petkov, C.I., C. Kayser, T. Steudel et al. 2008. A voice region in the monkey brain. Nature Neuroscience
11:367–74.
Petkov, C.I., C. Kayser, M. Augath, and N.K. Logothetis. 2009. Optimizing the imaging of the monkey auditory
cortex: Sparse vs. continuous fMRI. Magnetic Resonance Imaging 27:1065–73.
Petrini, K., M. Russell, and F. Pollick. 2009. When knowing can replace seeing in audiovisual integration of
actions. Cognition 110:432–9.
Rauschecker, J.P. 1998. Cortical processing of complex sounds. Current Opinion in Neurobiology 8:516–21.
Rauschecker, J.P., and B. Tian. 2000. Mechanisms and streams for processing of what and where in auditory
cortex. Proceedings of the National Academy of Sciences of the United States of America 97:11800–6.
Rauschecker, J.P., and B. Tian. 2004. Processing of band-passed noise in the lateral auditory belt cortex of the
rhesus monkey. Journal of Neurophysiology 91:2578–89.
Rauschecker, J.P., B. Tian, and M. Hauser. 1995. Processing of complex sounds in the macaque nonprimary
auditory cortex. Science 268:111–4.
Rauschecker, J.P., B. Tian, T. Pons, and M. Mishkin. 1997. Serial and parallel processing in rhesus monkey
auditory cortex. Journal of Comparative Neurology 382:89–103.
Recanzone, G.H., D.C. Guard, and M.L. Phan. 2000. Frequency and intensity response properties of single neu-
rons in the auditory cortex of the behaving macaque monkey. Journal of Neurophysiology 83:2315–31.
Remedios, R., N.K. Logothetis, and C. Kayser. 2009. An auditory region in the primate insular cortex respond-
ing preferentially to vocal communication sounds. Journal of Neuroscience 29:1034–45.
Rockland, K.S., and H. Ojima. 2003. Multisensory convergence in calcarine visual areas in macaque monkey.
International Journal of Psychophysiology 50:19–26.
Romanski, L.M., B. Tian, J. Fritz et al. 1999. Dual streams of auditory afferents target multiple domains in the
primate prefrontal cortex. Nature Neuroscience 2:1131–6.
Romanski, L.M., B.B. Averbeck, and M. Diltz. 2005. Neural representation of vocalizations in the primate
ventrolateral prefrontal cortex. Journal of Neurophysiology 93:734–47.
Ross, L.A., D. Saint-Amour, V.M. Leavitt, D.C. Javitt, and J.J. Foxe. 2007. Do you see what I am saying?
Exploring visual enhancement of speech comprehension in noisy environments. Cerebral Cortex 17:
1147–53.
Rumsey, J.M., B. Horwitz, B.C. Donohue et al. 1997. Phonological and orthographic components of word
recognition. A PET-rCBF study. Brain 120(Pt 5):739–59.
Schroeder, C.E., and J.J. Foxe. 2002. The timing and laminar profile of converging inputs to multisensory areas
of the macaque neocortex. Brain Research. Cognitive Brain Research 14:187–98.
Schroeder, C.E., and J. Foxe. 2005. Multisensory contributions to low-level, ‘unisensory’ processing. Current
Opinion in Neurobiology 15:454–8.
Schroeder, C.E., R.W. Lindsley, C. Specht et al. 2001. Somatosensory input to auditory association cortex in
the macaque monkey. Journal of Neurophysiology 85:1322–7.
Schroeder, C.E., J. Smiley, K.G. Fu et al. 2003. Anatomical mechanisms and functional implications of multi-
sensory convergence in early cortical processing. International Journal of Psychophysiology 50:5–17.
Multisensory Influences on Auditory Processing 113
Schurmann, M., G. Caetano, Y. Hlushchuk, V. Jousmaki, and R. Hari. 2006. Touch activates human auditory
cortex. NeuroImage 30:1325–31.
Shams, L., and A.R. Seitz. 2008. Benefits of multisensory learning. Trends in Cognitive Sciences 12:411–7.
Stein, B.E. 1998. Neural mechanisms for synthesizing sensory information and producing adaptive behaviors.
Experimental Brain Research 123:124–35.
Stein, B.E., and T.R. Stanford. 2008. Multisensory integration: Current issues from the perspective of the single
neuron. Nature Reviews Neuroscience 9:255–66.
Stein, B.E., and M.A. Meredith. 1993. Merging of the Senses. Cambridge: MIT Press.
Sugihara, T., M.D. Diltz, B.B. Averbeck, and L.M. Romanski. 2006. Integration of auditory and visual communi-
cation information in the primate ventrolateral prefrontal cortex. Journal of Neuroscience 26:11138–47.
Sumby, W.H., and I. Pollack. 1954. Visual contribution to speech intelligibility in noise. Journal of the Acoustical
Society of America 26:212–5.
Talavage, T.M., M.I. Sereno, J.R. Melcher et al. 2004. Tonotopic organization in human auditory cortex revealed
by progressions of frequency sensitivity. Journal of Neurophysiology 91:1282–96.
Tian, B., D. Reser, A. Durham, A. Kustov, and J.P. Rauschecker. 2001. Functional specialization in rhesus
monkey auditory cortex. Science 292:290–3.
van Atteveldt, N., E. Formisano, R. Goebel, and L. Blomert. 2004. Integration of letters and speech sounds in
the human brain. Neuron 43:271–82.
van Wassenhove, V., K.W. Grant, and D. Poeppel. 2005. Visual speech speeds up the neural processing of
auditory speech. Proceedings of the National Academy of Sciences of the United States of America
102:1181–6.
von Kriegstein, K., and A.L. Giraud. 2006. Implicit multisensory associations influence voice recognition.
PLoS Biology 4:e326.
von Kriegstein, K., E. Eger, A. Kleinschmidt, and A.L. Giraud. 2003. Modulation of neural responses to speech
by directing attention to voices or verbal content. Brain Research. Cognitive Brain Research 17:48–55.
von Kriegstein, K., A. Kleinschmidt, and A.L. Giraud. 2006. Voice recognition and cross-modal responses to
familiar speakers’ voices in prosopagnosia. Cerebral Cortex 16:1314–22.
Wallace, M.T., R. Ramachandran, and B.E. Stein. 2004. A revised view of sensory cortical parcellation.
Proceedings of the National Academy of Sciences of the United States of America 101:2167–72.
Wang, X. 2000. On cortical coding of vocal communication sounds in primates. Proceedings of the National
Academy of Sciences of the United States of America 97:11843–9.
Warnking, J., M. Dojat, A. Guerin-Dugue et al. 2002. fMRI retinotopic mapping—step by step. NeuroImage
17:1665–83.
Wessinger, C.M., J. Vanmeter, B. Tian et al. 2001. Hierarchical organization of the human auditory cortex
revealed by functional magnetic resonance imaging. Journal of Cognitive Neuroscience 13:1–7.
Zatorre, R.J., A.C. Evans, and E. Meyer. 1994. Neural mechanisms underlying melodic perception and memory
for pitch. Journal of Neuroscience 14:1908–19.
7 Multisensory Integration
through Neural Coherence
Andreas K. Engel, Daniel Senkowski, and Till R. Schneider
CONTENTS
7.1 Introduction........................................................................................................................... 115
7.2 Views on Cross-Modal Integration........................................................................................ 116
7.2.1 Integration by Convergence....................................................................................... 116
7.2.2 Integration through Neural Coherence...................................................................... 117
7.3 Oscillatory Activity in Cross-Modal Processing................................................................... 117
7.3.1 Oscillations Triggered by Multisensory Stimuli....................................................... 117
7.3.2 Effects of Cross-Modal Semantic Matching on Oscillatory Activity....................... 119
7.3.3 Modulation of Cross-Modal Oscillatory Responses by Attention............................ 119
7.3.4 Percept-Related Multisensory Oscillations................................................................ 121
7.4 Functional Role of Neural Synchrony for Cross-Modal Interactions.................................... 123
7.5 Outlook.................................................................................................................................. 125
References....................................................................................................................................... 126
7.1 INTRODUCTION
The inputs delivered by different sensory organs provide us with complementary information about
the environment. Multisensory interactions occur constantly in the brain to evaluate the cross-modal matching or conflict of these signals. The outcome of these interactions is of critical importance
for perception, cognitive processing, and the control of action (Meredith and Stein 1983, 1985;
Stein and Meredith 1993; Macaluso and Driver 2005; Kayser and Logothetis 2007). Recent stud-
ies have revealed that a vast amount of cortical operations, including those carried out by primary
regions, are shaped by inputs from multiple sensory modalities (Amedi et al. 2005; Ghazanfar and
Schroeder 2006; Kayser and Logothetis 2007, 2009). Multisensory integration is highly automatic: it can occur even when there is no meaningful relationship between the different sensory inputs, and even in the absence of perceptual awareness, as demonstrated in pioneering research on
multisensory interactions in the superior colliculus of anesthetized cats (Meredith and Stein 1983,
1985; Stein and Meredith 1993; Stein et al. 2002). Clearly, these findings suggest the fundamental
importance of multisensory processing for development (Sur et al. 1990; Shimojo and Shams 2001;
Bavelier and Neville 2002) and normal functioning of the nervous system.
In recent years, an increasing number of studies have aimed at characterizing multisensory cortical regions, revealing multisensory processing in the superior temporal sulcus, the intraparietal sulcus, and frontal regions, as well as in the insula and claustrum (Calvert 2001; Ghazanfar and Schroeder
2006; Kayser and Logothetis 2007). Interestingly, there is increasing evidence that neurons in areas
formerly considered unimodal, such as auditory belt areas (Foxe et al. 2002; Kayser et al. 2005;
Macaluso and Driver 2005; Ghazanfar and Schroeder 2006; Kayser and Logothetis 2007), can also
exhibit multisensory characteristics. Furthermore, numerous subcortical structures are involved in
multisensory processing. In addition to the superior colliculus (Meredith and Stein 1983, 1985),
this includes the striatum (Nagy et al. 2006), the cerebellum (Baumann and Greenlee 2007), the
amygdala (Nishijo et al. 1988), and there is evidence for cross-modal interactions at the level of the
thalamus (Komura et al. 2005).
Whereas the ubiquity and fundamental relevance of multisensory processing have become
increasingly clear, the neural mechanisms underlying multisensory interaction are much less
well understood. In this chapter, we review recent studies that may cast new light on this issue.
Although classical studies have postulated a feedforward convergence of unimodal signals as the
primary mechanism for multisensory integration (Stein and Meredith 1993; Meredith 2002), there
is now evidence that both feedback and lateral interaction may also be relevant (Driver and Spence
2000; Foxe and Schroeder 2005; Ghazanfar and Schroeder 2006; Kayser and Logothetis 2007).
Beyond this changing view on the anatomical substrate, there is increasing awareness that com-
plex dynamic interactions of cell populations, leading to coherent oscillatory firing patterns, may
be crucial for mediating cross-systems integration in the brain (von der Malsburg and Schneider
1986; Singer and Gray 1995; Singer 1999; Engel et al. 1992, 2001; Varela et al. 2001; Herrmann
et al. 2004a; Fries 2005). Here, we will consider the hypothesis that synchronized oscillations may
also provide a potential mechanism for cross-modal integration and for the selection of informa-
tion that is coherent across different sensory channels. We will (1) contrast the two different views
on cross-modal integration that imply different mechanisms (feedforward convergence vs. neural
coherence), (2) review recent studies on oscillatory responses and cross-modal processing, and
(3) discuss functional aspects and scenarios for the involvement of neural coherence in cross-modal
interaction.
conditions of passive stimulation (i.e., subjects were not required to perform any task), the authors
reported an increase of phase coherence in the lower beta band between temporal and parietal
electrode sites. The authors therefore suggested that meaningful semantic inputs are processed in a
modality-independent network of temporal and parietal areas.
Additional evidence for the involvement of oscillatory beta responses in multisensory process-
ing comes from a study in which subjects were instructed to respond to the appearance of any
stimulus in a stream of semantically meaningless auditory, visual, and multisensory audiovisual
stimuli (Senkowski et al. 2006). In the cross-modal condition, an enhancement was observed for
evoked oscillations, i.e., early oscillatory activity that is phase-locked to stimulus onset. This inte-
gration effect, which specifically occurred in the beta band, predicted the shortening of reaction
times observed for multisensory audiovisual stimuli, suggesting an involvement of beta activity
in the multisensory processing of behaviorally relevant stimuli. Cross-modal effects on evoked
beta responses have also been reported in a sensory gating paradigm (Kisley and Cornwell 2006),
in which auditory and somatosensory stimuli were presented at short or long interstimulus intervals under conditions of passive stimulation. Auditory and somatosensory evoked beta responses were higher when the preceding stimulus came from the other modality than when it came from the same modality, suggesting a cross-modal gating effect on oscillatory activity in
this frequency range. Further EEG investigations have focused on the examination of oscilla-
tory activity in response to basic auditory, visual, and audiovisual stimuli during passive stimula-
tion (Sakowitz et al. 2000, 2001, 2005). In these studies, multisensory interactions were found in
evoked oscillatory responses across a wide range of frequencies and across various scalp sites,
indicating an involvement of neural synchronization of cell assemblies in different frequency
bands and brain regions.
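The distinction drawn above between evoked oscillations (phase-locked to stimulus onset) and total oscillatory activity can be made concrete with a toy computation on simulated trials. This is an illustrative sketch, not the analysis pipeline of the cited studies; the 20 Hz "beta" frequency, noise level, and trial count are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 1000                       # sampling rate (Hz)
t = np.arange(0, 0.5, 1 / fs)   # 500 ms epoch
f_osc = 20                      # toy beta-band frequency (Hz)
n_trials = 50

# Each simulated trial contains a burst phase-locked to stimulus onset
# ("evoked"), a burst with random phase per trial ("induced"), and noise.
evoked_part = np.sin(2 * np.pi * f_osc * t)
trials = np.empty((n_trials, t.size))
for k in range(n_trials):
    induced_part = np.sin(2 * np.pi * f_osc * t + rng.uniform(0, 2 * np.pi))
    trials[k] = evoked_part + induced_part + 0.5 * rng.standard_normal(t.size)

def band_power(x, f, fs):
    """Power at frequency f, via projection onto a complex exponential."""
    carrier = np.exp(-2j * np.pi * f * np.arange(x.size) / fs)
    return np.abs(np.mean(x * carrier)) ** 2

# Evoked power: average trials FIRST, then measure power. Random-phase
# activity cancels in the average; only phase-locked activity survives.
evoked_power = band_power(trials.mean(axis=0), f_osc, fs)

# Total power: measure per trial, then average. Keeps both components.
total_power = np.mean([band_power(tr, f_osc, fs) for tr in trials])
```

By convexity, the evoked estimate can never exceed the total; the gap between the two reflects the non-phase-locked (induced) component.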
Compelling evidence for an association between oscillatory responses and multisensory process-
ing comes from a recent study on somatosensory modulation of processing in primary auditory
cortex of alert monkeys (Lakatos et al. 2007). The authors investigated the effect of median nerve
stimulation on auditory responses and observed a pronounced augmentation of oscillations in the
delta, theta, and gamma frequency ranges. Further analysis revealed that this effect was mainly due
to a phase resetting of auditory oscillations by the somatosensory inputs. Another intriguing obser-
vation in the same study was that systematic variation of the relative delay between somatosensory
and auditory inputs led to multisensory response enhancements at intervals corresponding to the
cycle length of gamma, theta, and delta band oscillations. In contrast, for intermediate delays, the
paired stimulus response was smaller than the responses to auditory stimuli alone. Further support
for phase resetting as a potential mechanism of cross-modal interaction comes from a recent study
focusing on visual modulation of auditory processing in the monkey (Kayser et al. 2008). Using
auditory and visual stimuli while recording in the auditory core and belt regions of awake behav-
ing monkeys, the authors observed both enhancement and suppression of unit and field potential
responses. Importantly, visual stimuli could be shown to modulate the phase angle of auditory alpha
and theta band activity.
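A minimal sketch of the phase-resetting account described above: suppose a somatosensory input resets an ongoing theta-band oscillation to its most excitable phase at time zero; an auditory input arriving a full cycle later then falls on a high-excitability phase, whereas one arriving half a cycle later falls on a low-excitability phase. The 8 Hz frequency, cosine excitability profile, and 0.5 modulation depth are arbitrary toy assumptions, not values from Lakatos et al. (2007):

```python
import numpy as np

f = 8.0          # toy ongoing theta-band oscillation (Hz)
cycle = 1.0 / f  # cycle length, 125 ms

def excitability(delay):
    """Excitability of a population whose ongoing oscillation was phase-
    reset to its most excitable phase at t = 0. An input arriving `delay`
    seconds later is gated by cos(2*pi*f*delay): maximal at integer
    multiples of the cycle, minimal at half-cycle offsets."""
    return np.cos(2 * np.pi * f * delay)

def paired_response(delay, unimodal=1.0):
    """Toy paired response: unimodal drive scaled by the phase of the
    reset oscillation at the moment the second input arrives."""
    return unimodal * (1.0 + 0.5 * excitability(delay))

alone = 1.0                              # auditory stimulus by itself
aligned = paired_response(cycle)         # delay of one full cycle
misaligned = paired_response(cycle / 2)  # delay of half a cycle
```

In this caricature, cycle-aligned delays enhance the paired response above the unimodal baseline and half-cycle delays suppress it below, mirroring the delay-dependent enhancement and depression reported in the study.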
Two recent studies have addressed interactions between auditory and multisensory regions in the
superior temporal sulcus in behaving monkeys. One of the studies examined the effect of audiovi-
sual looming signals on neural oscillations in the two regions (Maier et al. 2008). The main finding
of this study was enhanced gamma band coherence between the two structures for cross-modally
coherent looming signals compared to unimodal or receding motion inputs. This suggests that
coupling of neuronal populations between primary sensory areas and higher-order multisensory
structures may be functionally relevant for the integration of audiovisual signals. In a recent study,
Kayser and Logothetis (2009) have investigated directed interactions between auditory cortex and
multisensory sites in the superior temporal sulcus. Their analysis, which was confined to frequen-
cies below the gamma band, suggests that superior temporal regions provide one major source of
visual influences to the auditory cortex and that the beta band is involved in directed information
flow through coupled oscillations.
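Coherence of the kind measured in these studies quantifies, per frequency, how consistent the phase and amplitude relationship between two signals is across data segments. The following sketch computes Welch-style magnitude-squared coherence on simulated field potentials that share a 40 Hz component; the signals, noise levels, and channel labels are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
fs = 1000
n = 20 * fs  # 20 s of simulated field potentials
t = np.arange(n) / fs

# A shared 40 Hz component couples the two channels; each channel also
# carries independent noise. Labels are purely illustrative.
shared = np.sin(2 * np.pi * 40 * t)
x = shared + rng.standard_normal(n)        # "auditory cortex" LFP
y = 0.8 * shared + rng.standard_normal(n)  # "STS" LFP

def msc(x, y, fs, nperseg=1000):
    """Magnitude-squared coherence via Welch-style segment averaging."""
    segs = len(x) // nperseg
    X = np.fft.rfft(x[:segs * nperseg].reshape(segs, nperseg), axis=1)
    Y = np.fft.rfft(y[:segs * nperseg].reshape(segs, nperseg), axis=1)
    Sxy = (X * np.conj(Y)).mean(axis=0)   # averaged cross-spectrum
    Sxx = (np.abs(X) ** 2).mean(axis=0)   # averaged auto-spectra
    Syy = (np.abs(Y) ** 2).mean(axis=0)
    freqs = np.fft.rfftfreq(nperseg, 1 / fs)
    return freqs, np.abs(Sxy) ** 2 / (Sxx * Syy)

freqs, coh = msc(x, y, fs)
gamma_coh = coh[np.argmin(np.abs(freqs - 40.0))]    # shared rhythm
control_coh = coh[np.argmin(np.abs(freqs - 10.0))]  # no shared rhythm
```

Coherence is high only at the frequency of the shared rhythm; frequencies carrying no common signal stay near the bias floor of roughly one over the number of segments.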
In line with other studies (Foxe et al. 2002; Kayser et al. 2005; Ghazanfar and Schroeder 2006;
Kayser and Logothetis 2007), these data support the notion that inputs from other modalities and
from multisensory association regions can shape, in a context-dependent manner, the processing of
stimuli in presumed unimodal cortices. Taken together, the findings discussed above suggest that
modulation of both the power and the phase of oscillatory activity could be important mechanisms
of cross-modal interaction.
[Figure 7.1, panels (a)–(c): example congruent and incongruent stimuli ("sheep", "ring"), time–frequency panels spanning –200 to 800 ms, and source localization slices at x = –52, y = –32, z = –8.]
FIGURE 7.1 Enhanced gamma band activity during semantic cross-modal matching. (a) Semantically
congruent and incongruent objects were presented in a cross-modal visual-to-auditory priming paradigm.
(b) GBA in response to auditory target stimuli (S2) was enhanced following congruent compared to incongru-
ent stimuli. Square in right panel indicates a time-frequency window in which GBA difference was signifi-
cant. (c) Source localization of GBA (40–50 Hz) between 120 and 180 ms after auditory stimulus onset (S2)
using “linear beamforming” method (threshold at z = 2.56). Differences between congruent and incongruent
conditions are prominent in left middle temporal gyrus (BA 21; arrow). This suggests that enhanced GBA
reflects cross-modal semantic matching processes in lateral temporal cortex. (Adapted with permission from
Schneider, T.R. et al., NeuroImage, 42, 1244–1254, 2008.)
an important role in multisensory processing (Driver and Spence 2000; Macaluso et al. 2000; Foxe
et al. 2005; Talsma and Woldorff 2005). The effect of spatial selective attention on GBA in a mul-
tisensory setting has recently been investigated (Senkowski et al. 2005). Subjects were presented
with a stream of auditory, visual, and combined audiovisual stimuli to the left and right hemispaces
and had to attend to a designated side to detect occasional target stimuli in either modality. An
enhancement of the evoked GBA was found for attended compared to unattended multisensory
stimuli. In contrast, no effect of spatial attention was observed for unimodal stimuli. An additional
analysis of the gamma band phase distribution suggested that attention primarily acts to enhance
GBA phase-locking, compatible with the idea already discussed above that cross-modal interactions
can affect the phase of neural signals.
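Phase-locking across trials, of the kind this analysis examined, is commonly summarized by the phase-locking value: the length of the mean unit vector of the single-trial phases. The sketch below is illustrative only; the von Mises concentration chosen for the "attended" condition and the trial count are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 200

# Single-trial phases of a gamma-band response at one post-stimulus
# latency. "Attended": phases cluster around a preferred angle
# (von Mises distribution); "unattended": phases are uniform.
attended = rng.vonmises(mu=0.0, kappa=4.0, size=n_trials)
unattended = rng.uniform(-np.pi, np.pi, size=n_trials)

def phase_locking_value(phases):
    """Length of the mean unit phase vector across trials:
    1 = perfect phase-locking, 0 = random phase from trial to trial."""
    return np.abs(np.mean(np.exp(1j * phases)))

plv_attended = phase_locking_value(attended)
plv_unattended = phase_locking_value(unattended)
```

A higher PLV with unchanged single-trial power is the signature of an effect acting on phase consistency rather than on response amplitude.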
The effects of nonspatial intermodal attention and the temporal relation between auditory and
visual inputs on the early evoked GBA have been investigated in another EEG study (Senkowski
et al. 2007). Subjects were presented with a continuous stream of centrally presented unimodal
and bimodal stimuli while they were instructed to detect an occasional auditory or visual target.
Using combined auditory and visual stimuli with different onset delays revealed clear effects on the
evoked GBA. Although there were no significant differences between the two attention conditions,
an enhancement of the GBA was observed when auditory and visual inputs of multisensory stimuli
were presented simultaneously (i.e., 0 ± 25 ms; Figure 7.2). This suggests that the integration of
auditory and visual inputs, as reflected in high-frequency oscillatory activity, is sensitive to the rela-
tive onset timing of the sensory inputs.
[Figure 7.2, panel (c): time–frequency plots (20–80 Hz, –100 to 400 ms) of evoked GBA for the A|V(0 ± 25 ms), auditory-only, and difference conditions, with scalp maps for the 50–100 ms window.]
FIGURE 7.2 Effect of relative timing of multisensory stimuli on gamma band oscillations. (a) Horizontal
gratings and sinusoidal tones were presented with different stimulus onset asynchronies. (b) GBA to auditory
and visual components of multisensory audiovisual stimuli was extracted for five asynchrony ranges centered
about –100, –50, 0, +50, and +100 ms delay between visual and auditory stimulus, respectively. GBA evoked
with multisensory inputs was compared to GBA to unisensory control stimuli. (c) An enhancement of evoked
GBA compared to unimodal input was observed when auditory and visual inputs were presented within the smallest relative onset asynchrony window (0 ± 25 ms). This shows that the precision of temporal synchrony has an effect
on early cross-modal processing as reflected by evoked GBA. (Adapted with permission from Senkowski, D.
et al., Neuropsychologia, 45, 561–571, 2007.)
finger receiving a tactile stimulus, as compared to a spatial cross-modal misalignment. This finding
suggests a close relationship between multisensory tactile–visual stimulation and phase coherence
in gamma band oscillations. In sum, the findings discussed in this section suggest that oscillatory
activity, in particular at gamma band frequencies, can reflect perceptual changes resulting from
cross-modal interactions.
[Figure 7.3, panels (a)–(d): schematic diagrams of interactions among auditory cortex, visual cortex, multisensory temporal cortex, multisensory parietal cortex, and prefrontal/premotor regions.]
FIGURE 7.3 Scenarios for large-scale neural communication during cross-modal perception. The model
proposed here is compatible with a number of different patterns of neural interactions. The figure refers to
the case of audiovisual interactions. (a) Multisensory interactions by coherence change between early sen-
sory areas. (b) Alternatively, changes in neural coherence or power might occur mainly within or between
multisensory association cortices, e.g., superior temporal and parietal regions. (c) Combining both scenarios,
neural synchrony among unimodal regions could also be associated with enhanced oscillatory activity in mul-
tisensory areas. (d) Multisensory perception might also involve oscillatory activity in frontal regions, which is
likely to exert a modulatory influence on temporal patterns in parietal and temporal regions.
et al. 2007) and in modulating synaptic weights (Markram et al. 1997; Bi and Poo 2001), such a
mechanism would then lead to a selection of strongly synchronized populations and suppression of
decorrelated activity.
A third case may be cross-modal modulation, i.e., the bias of a percept by concurrent input from
a different sensory modality. The model suggested here predicts that the inputs from the second
modality can change the temporal structure of activity patterns in the first modality. One possible
mechanism for such a modulation by oscillatory inputs is suggested by studies discussed above
(Lakatos et al. 2007; Kayser et al. 2008). Both “lateral” interactions between assemblies in early
areas as well as top-down influences could lead to a shift in phase of the respective local oscillations,
thus entraining the local population into a temporal pattern that may be optimally suited to enhance
the effect on downstream assemblies. The prediction is that this phase resetting or phase shifting should be maximally effective in the case of spatially, temporally, or semantically matching cross-modal information. Such a mechanism might help to explain why cross-modal context can often lead to
biases in the processing of information in one particular sensory system and might contribute to
understanding the nature of “early” multisensory integration (Foxe and Schroeder 2005). Because
such modulatory effects might occur on a range of time scales (defined by different frequency bands
in oscillatory activity), this mechanism may also account for broader temporal integration windows
that have been reported for multisensory interactions (Vroomen and Keetels 2010).
Finally, our hypothesis might also help to account for key features of multisensory process-
ing such as the superadditivity or subadditivity of responses (Stein and Meredith 1993; Meredith
2002) and the principle of “inverse effectiveness” (Kayser and Logothetis 2007). Because of non-
linear dendritic processing, appropriately timed inputs will generate a much stronger postsynaptic
response in target neuronal populations than temporally uncoordinated afferent signals (König et al.
1996; Singer 1999; Fries 2005) and, therefore, matching cross-modal inputs can have an impact
that differs strongly from the sum of the unimodal responses. Conversely, incongruent signals from
two modalities might result in temporally desynchronized inputs and, therefore, in “multisensory
depression” in downstream neural populations (Stein et al. 2002).
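The argument from nonlinear dendritic processing can be made concrete with a toy coincidence detector: below threshold, each input alone produces no output, but near-coincident inputs summate above threshold, so the paired response exceeds the sum of the unimodal responses (superadditivity), while desynchronized inputs do not. The EPSP shape, threshold, and spike times are arbitrary toy choices, not a model from the cited work:

```python
import numpy as np

def postsynaptic_response(spike_times_ms, threshold=1.5, tau=5.0):
    """Toy coincidence detector: each input spike adds a unit-amplitude
    EPSP decaying with time constant tau (ms); the output is the time-
    integral of membrane potential above threshold, a stand-in for the
    spiking drive of the downstream population."""
    t = np.arange(0.0, 100.0, 0.1)  # 100 ms at 0.1 ms resolution
    v = np.zeros_like(t)
    for s in spike_times_ms:
        v += np.where(t >= s, np.exp(-(t - s) / tau), 0.0)
    return np.maximum(v - threshold, 0.0).sum() * 0.1

auditory_alone = postsynaptic_response([20.0])     # subthreshold alone
visual_alone = postsynaptic_response([20.0])       # subthreshold alone
synchronous = postsynaptic_response([20.0, 21.0])  # coincident inputs
asynchronous = postsynaptic_response([20.0, 45.0]) # desynchronized
```

Here the synchronous pair drives the detector above threshold even though each input alone, and the desynchronized pair, produce no output at all, which is the extreme case of a paired response differing from the sum of the unimodal responses.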
7.5 OUTLOOK
Although partially supported by data, the hypothesis that neural synchrony may play a role in
multisensory processing clearly requires further experimental testing. Thus far, only a relatively
small number of multisensory studies have used coherence measures to explicitly address interactions across different neural systems. Very likely, substantial progress can be achieved by studies in humans if the approaches used are suitable for capturing dynamic cross-systems interactions among specific brain regions. Such investigations may be carried out using MEG (Gross et al. 2001; Siegel
et al. 2007, 2008), combination of EEG with functional magnetic resonance imaging (Debener et
al. 2006) or intracerebral multisite recordings (Lachaux et al. 2003), if the recordings are com-
bined with advanced source modeling techniques (Van Veen et al. 1997) and analysis methods
that quantify, e.g., directed information transfer between the activated regions (Supp et al. 2007).
In addition, some of the earlier EEG studies on multisensory oscillations involving visual stimuli
(e.g., Yuval-Greenberg and Deouell 2007) appear to be confounded by artifacts related to microsaccades (Yuval-Greenberg et al. 2008), a methodological issue that needs to be clarified and can possibly be avoided by using MEG (Fries et al. 2008). To characterize the role of correlated activity for
multisensory processing at the cellular level, further microelectrode studies in higher mammals will
be indispensable.
The model put forward here has several implications. We believe that the study of synchroniza-
tion phenomena may lead to a new view on multisensory processing that considers the dynamic
interplay of neural populations as a key to cross-modal integration and stimulates the development of
new research approaches and experimental strategies. Conversely, the investigation of multisensory
interactions may also provide a crucial test bed for further validation of the temporal correlation
hypothesis (Engel et al. 1992; Singer and Gray 1995; Singer 1999), because task- or percept-related
changes in coherence between independent neural sources have hardly been shown in humans thus
far. In this context, the role of oscillations in different frequency bands is yet another unexplored
issue that future studies will have to address. As discussed above, multisensory effects are often, but
not exclusively, observed in higher frequency ranges, and it is unclear why gamma band oscillations
figure so prominently.
Finally, abnormal synchronization across sensory channels may play a role in conditions of
abnormal cross-modal perception such as synesthesia (Hubbard and Ramachandran 2005) or in
disorders such as schizophrenia or autism. In synesthesia, excessively strong multisensory coher-
ence might occur, which then would not just modulate processing in unimodal regions but actually
drive sensory neurons even in the absence of a proper stimulus. In contrast, abnormal weakness of
cross-modal coupling might account for the impairment of multisensory integration that is observed
in patients with schizophrenia (Ross et al. 2007) or autism (Iarocci and McDonald 2006). Thus,
research on cross-modal binding may help to advance our understanding of brain disorders that
partly result from dysfunctional integrative mechanisms (Schnitzler and Gross 2005; Uhlhaas and
Singer 2006).
REFERENCES
Amedi, A., K. von Kriegstein, N.M. van Atteveldt, M.S. Beauchamp, M.J. Naumer. 2005. Functional imaging
of human crossmodal identification and object recognition. Experimental Brain Research 166:559–571.
Bauer, M., R. Oostenveld, M. Peeters, P. Fries. 2006. Tactile spatial attention enhances gamma-band activ-
ity in somatosensory cortex and reduces low-frequency activity in parieto-occipital areas. Journal of
Neuroscience 26:490–501.
Baumann, O., and M.W. Greenlee. 2007. Neural correlates of coherent audiovisual motion perception. Cerebral
Cortex 17:1433–1443.
Bavelier, D., and H.J. Neville. 2002. Cross-modal plasticity: Where and how? Nature Reviews. Neuroscience
3:443–452.
Bhattacharya, J., L. Shams, S. Shimojo. 2002. Sound-induced illusory flash perception: Role of gamma band
responses. Neuroreport 13:1727–1730.
Bi, G.-Q., and M.-M. Poo. 2001. Synaptic modification by correlated activity: Hebb’s postulate revisited.
Annual Review of Neuroscience 24:139–166.
Bressler, S.L., and W.J. Freeman. 1980. Frequency analysis of olfactory system EEG in cat, rabbit, and rat.
Electroencephalography and Clinical Neurophysiology 50:19–24.
Brosch, M., E. Budinger, H. Scheich. 2002. Stimulus-related gamma oscillations in primate auditory cortex.
Journal of Neurophysiology 87:2715–2725.
Calvert, G.A. 2001. Crossmodal processing in the human brain: Insights from functional neuroimaging studies.
Cerebral Cortex 11:1110–1123.
Castelo-Branco, M., R. Goebel, S. Neuenschwander, W. Singer. 2000. Neural synchrony correlates with surface
segregation rules. Nature 405:685–689.
Chandrasekaran, C., and A.A. Ghazanfar. 2009. Different neural frequency bands integrate faces and voices differently in the superior temporal sulcus. Journal of Neurophysiology 101:773–788.
Csicsvari, J., B. Jamieson, K.D. Wise, G. Buzsaki. 2003. Mechanisms of gamma oscillations in the hippocampus of the behaving rat. Neuron 37:311–322.
Debener, S., C.S. Herrmann, C. Kranczioch, D. Gembris, A.K. Engel. 2003. Top-down attentional processing
enhances auditory evoked gamma band activity. Neuroreport 14:683–686.
Debener, S., M. Ullsperger, M. Siegel, A.K. Engel. 2006. Single-trial EEG-fMRI reveals the dynamics of cog-
nitive function. Trends in Cognitive Sciences 10:558–563.
Doesburg, S.M., L.L. Emberson, A. Rahi, D. Cameron, L.M. Ward. 2007. Asynchrony from synchrony:
Long-range gamma-band neural synchrony accompanies perception of audiovisual speech asynchrony.
Experimental Brain Research 185:11–20.
Driver, J., and C. Spence. 2000. Multisensory perception: Beyond modularity and convergence. Current Biology
10:R731–R735.
Engel, A.K., P. König, W. Singer. 1991a. Direct physiological evidence for scene segmentation by temporal cod-
ing. Proceedings of the National Academy of Sciences of the United States of America 88:9136–9140.
Multisensory Integration through Neural Coherence 127
Engel, A.K., P. König, A.K. Kreiter, W. Singer. 1991b. Interhemispheric synchronization of oscillatory neuronal responses in cat visual cortex. Science 252:1177–1179.
Engel, A.K., P. König, A.K. Kreiter, T.B. Schillen, W. Singer. 1992. Temporal coding in the visual cortex: New
vistas on integration in the nervous system. Trends in Neurosciences 15:218–226.
Engel, A.K., P. Fries, W. Singer. 2001. Dynamic predictions: Oscillations and synchrony in top-down process-
ing. Nature Reviews. Neuroscience 2:704–716.
Farmer, S.F. 1998. Rhythmicity, synchronization and binding in human and primate motor systems. Journal of
Physiology 509:3–14.
Foxe, J.J., and C.E. Schroeder. 2005. The case for feedforward multisensory convergence during early cortical
processing. Neuroreport 16:419–423.
Foxe, J.J., G.R. Wylie, A. Martinez et al. 2002. Auditory-somatosensory multisensory processing in auditory
association cortex: An fMRI study. Journal of Neurophysiology 88:540–543.
Foxe, J.J., G.V. Simpson, S.P. Ahlfors, C.D. Saron. 2005. Biasing the brain’s attentional set: I. cue driven
deployments of intersensory selective attention. Experimental Brain Research 166:370–392.
Fries, P. 2005. A mechanism for cognitive dynamics: Neuronal communication through neuronal coherence.
Trends in Cognitive Sciences 9:474–480.
Fries, P., P.R. Roelfsema, A.K. Engel, P. König, W. Singer. 1997. Synchronization of oscillatory responses in
visual cortex correlates with perception in interocular rivalry. Proceedings of the National Academy of
Sciences of the United States of America 94:12699–12704.
Fries, P., S. Neuenschwander, A.K. Engel, R. Goebel, W. Singer. 2001. Modulation of oscillatory neuronal
synchronization by selective visual attention. Science 291:1560–1563.
Fries, P., D. Nikolic, W. Singer. 2007. The gamma cycle. Trends in Neurosciences 30:309–316.
Fries, P., R. Scheeringa, R. Oostenveld. 2008. Finding gamma. Neuron 58:303–305.
Ghazanfar, A.A., and C.E. Schroeder. 2006. Is neocortex essentially multisensory? Trends in Cognitive Sciences
10:278–285.
Gray, C.M., P. König, A.K. Engel, W. Singer. 1989. Oscillatory responses in cat visual cortex exhibit inter-
columnar synchronization which reflects global stimulus properties. Nature 338:334–337.
Gross, J., J. Kujala, M. Hamalainen et al. 2001. Dynamic imaging of coherent sources: Studying neural inter-
actions in the human brain. Proceedings of the National Academy of Sciences of the United States of
America 98:694–699.
Gruber, T., and M.M. Müller. 2005. Oscillatory brain activity dissociates between associative stimulus content
in a repetition priming task in the human EEG. Cerebral Cortex 15:109–116.
Herrmann, C.S., M.H. Munk, A.K. Engel. 2004a. Cognitive functions of gamma-band activity: Memory match
and utilization. Trends in Cognitive Sciences 8:347–355.
Herrmann, C.S., D. Lenz, S. Junge, N.A. Busch, B. Maess. 2004b. Memory-matches evoke human gamma-
responses. BMC Neuroscience 5:13.
Hubbard, E.M., and V.S. Ramachandran. 2005. Neurocognitive mechanisms of synesthesia. Neuron 48:
509–520.
Hummel, F., and C. Gerloff. 2005. Larger interregional synchrony is associated with greater behavioral success
in a complex sensory integration task in humans. Cerebral Cortex 15:670–678.
Iarocci, G., and J. McDonald. 2006. Sensory integration and the perceptual experience of persons with autism.
Journal of Autism and Developmental Disorders 36:77–90.
Kaiser, J., W. Lutzenberger, H. Ackermann, N. Birbaumer. 2002. Dynamics of gamma-band activity induced by
auditory pattern changes in humans. Cerebral Cortex 12:212–221.
Kaiser, J., I. Hertrich, H. Ackermann, K. Mathiak, W. Lutzenberger. 2005. Hearing lips: Gamma-band activity
during audiovisual speech perception. Cerebral Cortex 15:646–653.
Kaiser, J., I. Hertrich, H. Ackermann, W. Lutzenberger. 2006. Gamma-band activity over early sensory areas
predicts detection of changes in audiovisual speech stimuli. NeuroImage 30:1376–1382.
Kanayama, N., A. Sato, H. Ohira. 2007. Crossmodal effect with rubber hand illusion and gamma-band activity.
Psychophysiology 44:392–402.
Kayser, C., and N.K. Logothetis. 2007. Do early sensory cortices integrate crossmodal information? Brain
Structure and Function 212:121–132.
Kayser, C., and N.K. Logothetis. 2009. Directed interactions between auditory and superior temporal cortices
and their role in sensory integration. Frontiers in Integrative Neuroscience 3:7.
Kayser, C., C.I. Petkov, M. Augath, N.K. Logothetis. 2005. Integration of touch and sound in auditory cortex.
Neuron 48:373–384.
Kayser, C., C.I. Petkov, N.K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral
Cortex 18:1560–1574.
Kisley, M.A., and Z.M. Cornwell. 2006. Gamma and beta neural activity evoked during a sensory gating
paradigm: Effects of auditory, somatosensory and cross-modal stimulation. Clinical Neurophysiology
117:2549–2563.
Komura, Y., R. Tamura, T. Uwano, H. Nishijo, T. Ono. 2005. Auditory thalamus integrates visual inputs into
behavioral gains. Nature Neuroscience 8:1203–1209.
König, P., A.K. Engel, W. Singer. 1995. Relation between oscillatory activity and long-range synchronization
in cat visual cortex. Proceedings of the National Academy of Sciences of the United States of America
92:290–294.
König, P., A.K. Engel, W. Singer. 1996. Integrator or coincidence detector? The role of the cortical neuron
revisited. Trends in Neurosciences 19:130–137.
Lachaux, J.P., D. Rudrauf, P. Kahane. 2003. Intracranial EEG and human brain mapping. Journal of Physiology, Paris 97:613–628.
Lakatos, P., C.M. Chen, M.N. O’Connell, A. Mills, C.E. Schroeder. 2007. Neuronal oscillations and multisen-
sory interaction in primary auditory cortex. Neuron 53:279–292.
Macaluso, E., and J. Driver. 2005. Multisensory spatial interactions: A window onto functional integration in
the human brain. Trends in Neurosciences 28:264–271.
Macaluso, E., C.D. Frith, J. Driver. 2000. Modulation of human visual cortex by crossmodal spatial attention.
Science 289:1206–1208.
Maier, J.X., C. Chandrasekaran, A.A. Ghazanfar. 2008. Integration of bimodal looming signals through neu-
ronal coherence in the temporal lobe. Current Biology 18:963–968.
Markram, H., J. Lübke, M. Frotscher, B. Sakmann. 1997. Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science 275:213–215.
McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature 264:746–748.
Meredith, M.A. 2002. On the neuronal basis for multisensory convergence: A brief overview. Brain Research.
Cognitive Brain Research 14:31–40.
Meredith, M.A., and B.E. Stein. 1983. Interactions among converging sensory inputs in the superior colliculus.
Science 221:389–391.
Meredith, M.A., and B.E. Stein. 1985. Descending efferents from the superior colliculus relay integrated mul-
tisensory information. Science 227:657–659.
Mishra, J., A. Martinez, T.J. Sejnowski, S.A. Hillyard. 2007. Early cross-modal interactions in auditory and
visual cortex underlie a sound-induced visual illusion. Journal of Neuroscience 27:4120–4131.
Müller, M.M., T. Gruber, A. Keil. 2000. Modulation of induced gamma band activity in the human EEG by
attention and visual information processing. International Journal of Psychophysiology 38:283–299.
Nagy, A., G. Eördegh, Z. Paroczy, Z. Markus, G. Benedek. 2006. Multisensory integration in the basal ganglia.
European Journal of Neuroscience 24:917–924.
Nishijo, H., T. Ono, H. Nishino. 1988. Topographic distribution of modality-specific amygdalar neurons in alert
monkey. Journal of Neuroscience 8:3556–3569.
Roelfsema, P.R., P. König, A.K. Engel, R. Sireteanu, W. Singer. 1994. Reduced synchronization in the visual
cortex of cats with strabismic amblyopia. European Journal of Neuroscience 6:1645–1655.
Roelfsema, P.R., A.K. Engel, P. König, W. Singer. 1997. Visuomotor integration is associated with zero time-
lag synchronization among cortical areas. Nature 385:157–161.
Ross, L.A., D. Saint-Amour, V.M. Leavitt, S. Molholm, D.C. Javitt, J.J. Foxe. 2007. Impaired multisensory
processing in schizophrenia: Deficits in the visual enhancement of speech comprehension under noisy
environmental conditions. Schizophrenia Research 97:173–183.
Rowland, B.A., S. Quessy, T.R. Stanford, B.E. Stein. 2007. Multisensory integration shortens physiological
response latencies. Journal of Neuroscience 27:5879–5884.
Sakowitz, O.W., M. Schürmann, E. Basar. 2000. Oscillatory frontal theta responses are increased upon bisen-
sory stimulation. Clinical Neurophysiology 111:884–893.
Sakowitz, O.W., R.Q. Quiroga, M. Schürmann, E. Basar. 2001. Bisensory stimulation increases gamma-responses over multiple cortical regions. Brain Research. Cognitive Brain Research 11:267–279.
Sakowitz, O.W., R.Q. Quiroga, M. Schürmann, E. Basar. 2005. Spatio-temporal frequency characteristics of
intersensory components in audiovisual evoked potentials. Brain Research. Cognitive Brain Research
23:316–326.
Salinas, E., and T.J. Sejnowski. 2001. Correlated neuronal activity and the flow of neural information. Nature
Reviews Neuroscience 2:539–550.
Sanes, J.N., and J.P. Donoghue. 1993. Oscillations in local field potentials of the primate motor cortex during
voluntary movement. Proceedings of the National Academy of Sciences of the United States of America
90:4470–4474.
Schnitzler, A., and J. Gross. 2005. Normal and pathological oscillatory communication in the brain. Nature
Reviews Neuroscience 6:285–296.
Schneider, T.R., S. Debener, R. Oostenveld, A.K. Engel. 2008. Enhanced EEG gamma-band activity reflects
multisensory semantic matching in visual-to-auditory object priming. NeuroImage 42:1244–1254.
Senkowski, D., D. Talsma, C.S. Herrmann, M.G. Woldorff. 2005. Multisensory processing and oscillatory gamma responses: Effects of spatial selective attention. Experimental Brain Research 166:411–426.
Senkowski, D., S. Molholm, M. Gomez-Ramirez, J.J. Foxe. 2006. Oscillatory beta activity predicts response
speed during a multisensory audiovisual reaction time task: A high-density electrical mapping study.
Cerebral Cortex 16:1556–1565.
Senkowski, D., D. Talsma, M. Grigutsch, C.S. Herrmann, M.G. Woldorff. 2007. Good times for multisensory
integration: Effects of the precision of temporal synchrony as revealed by gamma-band oscillations.
Neuropsychologia 45:561–571.
Senkowski, D., T.R. Schneider, J.J. Foxe, A.K. Engel. 2008. Crossmodal binding through neural coherence:
Implications for multisensory processing. Trends in Neurosciences 31:401–409.
Senkowski, D., T.R. Schneider, R. Tandler, A.K. Engel. 2009. Gamma-band activity reflects multisensory
matching in working memory. Experimental Brain Research 198:363–372.
Shams, L., Y. Kamitani, S. Shimojo. 2000. Illusions. What you see is what you hear. Nature 408:788.
Shimojo, S., and L. Shams. 2001. Sensory modalities are not separate modalities: Plasticity and interactions.
Current Opinion in Neurobiology 11:505–509.
Siegel, M., T.H. Donner, R. Oostenveld, P. Fries, A.K. Engel. 2007. High-frequency activity in human visual
cortex is modulated by visual motion strength. Cerebral Cortex 17:732–741.
Siegel, M., T.H. Donner, R. Oostenveld, P. Fries, A.K. Engel. 2008. Neuronal synchronization along the dorsal
visual pathway reflects the focus of spatial attention. Neuron 60:709–719.
Singer, W. 1999. Neuronal synchrony: A versatile code for the definition of relations? Neuron 24:49–65.
Singer, W., and C.M. Gray. 1995. Visual feature integration and the temporal correlation hypothesis. Annual
Review of Neuroscience 18:555–586.
Stanford, T.R., S. Quessy, B.E. Stein. 2005. Evaluating the operations underlying multisensory integration in
the cat superior colliculus. Journal of Neuroscience 25:6499–6508.
Stein, B.E., and M.A. Meredith. 1993. The Merging of the Senses. Cambridge, MA: MIT Press.
Stein, B.E., M.W. Wallace, T.R. Stanford, W. Jiang. 2002. Cortex governs multisensory integration in the mid-
brain. Neuroscientist 8:306–314.
Supp, G.G., A. Schlögl, N. Trujillo-Barreto, M.M. Müller, T. Gruber. 2007. Directed cortical information flow
during human object recognition: Analyzing induced EEG gamma-band responses in brain’s source
space. PLoS ONE 2:e684.
Sur, M., S.L. Pallas, A.W. Roe. 1990. Cross-modal plasticity in cortical development: Differentiation and speci-
fication of sensory neocortex. Trends in Neurosciences 13:227–233.
Tallon-Baudry, C., and O. Bertrand. 1999. Oscillatory gamma activity in humans and its role in object repre-
sentation. Trends in Cognitive Sciences 3:151–162.
Tallon-Baudry, C., O. Bertrand, C. Delpuech, J. Pernier. 1996. Stimulus specificity of phase-locked and non-
phase-locked 40 Hz visual responses in human. Journal of Neuroscience 16:4240–4249.
Talsma, D., and M.G. Woldorff. 2005. Selective attention and multisensory integration: Multiple phases of
effects on the evoked brain activity. Journal of Cognitive Neuroscience 17:1098–1114.
Uhlhaas, P.J., and W. Singer. 2006. Neural synchrony in brain disorders: Relevance for cognitive dysfunctions
and pathophysiology. Neuron 52:155–168.
Van Veen, B.D., W. van Drongelen, M. Yuchtman, A. Suzuki. 1997. Localization of brain electrical activity via
linearly constrained minimum variance spatial filtering. IEEE Transactions on Bio-Medical Engineering
44:867–880.
Varela, F., J.P. Lachaux, E. Rodriguez, J. Martinerie. 2001. The brainweb: Phase synchronization and large-
scale integration. Nature Reviews. Neuroscience 2:229–239.
von der Malsburg, C., and W. Schneider. 1986. A neural cocktail-party processor. Biological Cybernetics
54:29–40.
von Stein, A., P. Rappelsberger, J. Sarnthein, H. Petsche. 1999. Synchronization between temporal and parietal
cortex during multimodal object processing in man. Cerebral Cortex 9:137–150.
Vroomen, J., and M. Keetels. 2010. Perception of intersensory synchrony: A tutorial review. Attention,
Perception, & Psychophysics 72:871–884.
Wehr, M., and G. Laurent. 1996. Odour encoding by temporal sequences of firing in oscillating neural assem-
blies. Nature 384:162–166.
Widmann, A., T. Gruber, T. Kujala, M. Tervaniemi, E. Schröger. 2007. Binding symbols and sounds: Evidence
from event-related oscillatory gamma-band activity. Cerebral Cortex 17:2696–2702.
Womelsdorf, T., P. Fries, P.P. Mitra, R. Desimone. 2006. Gamma-band synchronization in visual cortex predicts
speed of change detection. Nature 439:733–736.
Womelsdorf, T., J.M. Schoffelen, R. Oostenveld et al. 2007. Modulation of neuronal interactions through neu-
ronal synchronization. Science 316:1609–1612.
Yuval-Greenberg, S., and L.Y. Deouell. 2007. What you see is not (always) what you hear: Induced gamma
band responses reflect cross-modal interactions in familiar object recognition. Journal of Neuroscience
27:1090–1096.
Yuval-Greenberg, S., O. Tomer, A.S. Keren, I. Nelken, L.Y. Deouell. 2008. Transient induced gamma-band
response in EEG as a manifestation of miniature saccades. Neuron 58:429–441.
8 The Use of fMRI to Assess Multisensory Integration
Thomas W. James and Ryan A. Stevenson
CONTENTS
8.1 Principles of Multisensory Enhancement.............................................................................. 131
8.2 Superadditivity and BOLD fMRI.......................................................................................... 133
8.3 Problems with Additive Criterion.......................................................................................... 134
8.4 Inverse Effectiveness............................................................................................................. 136
8.5 BOLD Baseline: When Zero Is Not Zero.............................................................................. 138
8.6 A Difference-of-BOLD Measure........................................................................................... 139
8.7 Limitations and Future Directions........................................................................................ 143
8.8 Conclusions............................................................................................................................ 144
Acknowledgments........................................................................................................................... 144
References....................................................................................................................................... 145
Although scientists have only recently had the tools available to noninvasively study the neural
mechanisms of multisensory perceptual processes in humans (Calvert et al. 1999), the study of
multisensory perception has had a long history in science (James 1890; Molyneux 1688). Before the
advent of neuroimaging techniques, such as functional magnetic resonance imaging (fMRI) and
high-density electrical recording, the study of neural mechanisms, using single-unit recording, was
restricted to nonhuman animals such as monkeys and cats. These groundbreaking neurophysiologi-
cal studies established many principles for understanding multisensory processing at the level of
single neurons (Meredith and Stein 1983), and continue to improve our understanding of multisen-
sory mechanisms at that level (Stein and Stanford 2008).
It is tempting to consider that neuroimaging measurements, like blood oxygenation level–
dependent (BOLD) activation measured with fMRI, are directly comparable with findings from
single-unit recordings. Although several studies have established clear links between BOLD activa-
tion and neural activity (Attwell and Iadecola 2002; Logothetis and Wandell 2004; Thompson et al.
2003), there remains a fundamental difference between BOLD activation and single-unit activity:
BOLD activation is measured from the vasculature supplying a heterogeneous population of neu-
rons, whereas single-unit measures are taken from individual neurons (Scannell and Young 1999).
The ramifications of this difference are not inconsequential because the principles of multisensory
phenomena established using single-unit recording may not apply to population-based neuroimaging
data (Calvert et al. 2000). The established principles must be tested theoretically and empirically, and
where they fail, they must be replaced with new principles that are specific to the new technique.
The first class of neurons is unisensory. They produce significant neural activity (i.e., an increase in spike count above spontaneous baseline) with only one modality of sensory input, and this response
is not modulated by concurrent input from any other sensory modality. The second class of neurons is
bimodal (or trimodal). They produce significant neural activity with two or more modalities of unisen-
sory input (Meredith and Stein 1983; Stein and Stanford 2008). With single-unit recording, bimodal
neurons can be identified by testing their response with unisensory stimuli from two different sensory
modalities. The premise is simple: if the neuron produces significant activity with both modalities,
then it is bimodal. However, bimodal activation only implies a convergence of sensory inputs, not the
integration of those inputs (Stein et al. 2009). Bimodal neurons can be further tested for multisensory
integration by using multisensory stimuli. When tested with a multisensory stimulus, most bimodal
neurons produce activity that is greater than the maximum activity produced with either unisensory stimulus alone, an effect termed multisensory enhancement. The criterion usually used to identify multisensory enhancement is called the maximum criterion or rule (AV > Max(A,V)). A minority of neurons produce activity
that is lower than the maximum criterion, which is considered multisensory suppression. Whether the
effect is enhancement or suppression, a change in activity of a neuron when the subject is stimulated
through a second sensory channel only occurs if those sensory channels interact. Thus, multisensory
enhancement and suppression are indicators that information is being integrated. The third class of
neurons is subthreshold. They have patterns of activity that look unisensory when they are tested with
only unisensory stimuli, but when tested with multisensory stimuli, show multisensory enhancement
(Allman and Meredith 2007; Allman et al. 2008; Meredith and Allman 2009). For example, a sub-
threshold neuron may produce significant activity with visual stimuli, but not with auditory stimuli.
Because it does not respond significantly with both, it cannot be classified as bimodal. However, when
tested with combined audiovisual stimuli, the neuron shows multisensory enhancement and thus inte-
gration. For graphical representations of each of these three classes of neurons, see Figure 8.1.
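These classification rules are purely arithmetic on spike counts, so they can be sketched directly. The following toy Python functions are ours, not the chapter's; a real analysis would use a statistical test against spontaneous baseline rather than a boolean significance flag.

```python
def classify_neuron(sig_a, sig_v, av_count, a_count, v_count):
    """Classify a neuron from single-unit responses (toy sketch).

    sig_a / sig_v: does the neuron respond significantly above its
    spontaneous baseline to auditory / visual stimuli alone?
    *_count: mean impulse counts under AV, A, and V stimulation.
    """
    if sig_a and sig_v:
        return "bimodal"          # significant responses to both modalities
    if (sig_a or sig_v) and av_count > max(a_count, v_count):
        return "subthreshold"     # unisensory profile, but AV enhancement
    if sig_a or sig_v:
        return "unisensory"
    return "unresponsive"

def multisensory_effect(av_count, a_count, v_count):
    """Maximum-criterion test for a bimodal neuron: AV > Max(A, V)."""
    best_uni = max(a_count, v_count)
    if av_count > best_uni:
        return "enhancement"
    if av_count < best_uni:
        return "suppression"
    return "no interaction"
```

Note that the subthreshold case only becomes visible once the multisensory stimulus is tested: with unisensory testing alone, such a neuron is indistinguishable from a unisensory one.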
[Figure 8.1: Response profiles of the three neuron classes, plotted as impulse counts against input modality (A, V, AV). Top row: unisensory auditory and unisensory visual neurons. Middle row: bimodal neurons (enhanced, suppressed, and superadditive). Bottom row: subthreshold auditory and subthreshold visual neurons.]
A majority of bimodal and subthreshold neurons show multisensory enhancement (i.e., exceed
the maximum criterion when stimulated with a multisensory stimulus); however, neurons that show
multisensory enhancement can be further subdivided into those that are superadditive and those
that are subadditive. Superadditive neurons show multisensory activity that exceeds a criterion that
is greater than the sum of the unisensory activities (AV > Sum(A,V); Stein and Meredith 1993). In
the case of subthreshold neurons, neural activity is only elicited by a single unisensory modality;
therefore, the criterion for superadditivity is the same as (or very similar to) the maximum crite-
rion. However, in the case of bimodal neurons, the criterion for superadditivity is usually much
greater than the maximum criterion. Thus, superadditive bimodal neurons can show extreme levels
of multisensory enhancement. Although bimodal neurons that are superadditive are, by definition,
multisensory (because they must also exceed the maximum criterion), the majority of multisensory
enhancing neurons are not superadditive (Alvarado et al. 2007; Perrault et al. 2003; Stanford et al.
2007). To be clear, in single-unit studies, superadditivity is not a criterion for identifying multisen-
sory enhancement, but instead is used to classify the degree of enhancement.
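The distinction can be made explicit in a short sketch (hypothetical spike counts; the thresholds come directly from the two criteria just described):

```python
def enhancement_degree(av_count, a_count, v_count):
    """Subdivide multisensory enhancement for a bimodal neuron.

    Enhancement itself requires the maximum criterion (AV > Max(A,V));
    superadditivity additionally requires AV > Sum(A,V).
    """
    if av_count <= max(a_count, v_count):
        return "not enhanced"
    if av_count > a_count + v_count:
        return "superadditive"
    return "subadditive"
```

For a neuron with A = 8 and V = 5 impulses, any AV response above 8 counts as enhancement, but only responses above 13 are superadditive, which is why superadditivity marks an extreme degree of enhancement rather than its threshold.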
[Figure 8.2: Modeled BOLD response of a voxel containing only unisensory auditory (A cells) and unisensory visual (V cells) neurons, shown for A, V, and AV stimulation alongside the Max(A,V) and Sum(A,V) criteria.]
Because the unisensory neurons respond similarly under unisensory and multisensory stimulation (otherwise they would be classified as subthreshold neurons), the modeled AV activation is the same as the additive criterion.
For comparison, we include the maximum criterion (the Max(A,V) bar), which is the crite-
rion used in single-unit recording, and sometimes used with BOLD fMRI (Beauchamp 2005; van
Atteveldt et al. 2007). The maximum criterion is clearly much more liberal than the additive cri-
terion, and the model in Figure 8.2 shows that the use of the maximum criterion with BOLD data
could produce false-positives in brain regions containing only two pools of unisensory neurons
and no multisensory neurons. That is, if a single voxel contained only unisensory neurons and no
neurons with multisensory properties, the BOLD response will still exceed the maximum criterion.
Thus, the simple model shown in Figure 8.2 demonstrates both the utility of the additive criterion
for assessing multisensory interactions in populations containing a mixture of unisensory and mul-
tisensory neurons, and that the maximum criterion, which is sometimes used in place of the additive
criterion, may inappropriately identify unisensory areas as multisensory.
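The false-positive argument reduces to a few lines of arithmetic. In this minimal sketch, the pool contributions are hypothetical round numbers, and the voxel's BOLD response is assumed to be the simple sum of its pools:

```python
# Hypothetical pool contributions (arbitrary units): the voxel contains
# only unisensory neurons, and each pool responds the same under AV
# stimulation as under its own modality.
a_pool = 0.6   # auditory cells
v_pool = 0.6   # visual cells

bold_a = a_pool            # auditory-only stimulation (V pool silent)
bold_v = v_pool            # visual-only stimulation (A pool silent)
bold_av = a_pool + v_pool  # AV stimulation drives both pools

# The maximum criterion flags this purely unisensory voxel as multisensory...
assert bold_av > max(bold_a, bold_v)
# ...while the additive criterion correctly does not.
assert not bold_av > bold_a + bold_v
```

The AV response exceeds Max(A,V) simply because two independent unisensory pools are driven at once, with no integration anywhere in the voxel.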
It should be noted that the utility of the additive criterion applied to BOLD fMRI data is different
conceptually from the superadditivity label used with single units. The additive criterion is used to
identify multisensory interactions with BOLD activation. This is analogous to maximum criterion
being used to identify multisensory interactions in single-unit activity. Thus, superadditivity with
single units is not analogous to the additive criterion with BOLD fMRI. The term superadditivity is
used with single-unit recordings as a label to describe a subclass of neurons that exceed not only the maximum criterion but also the superadditivity criterion.
[Figure 8.3: Simulated BOLD responses for unisensory A and V input; for five modeled responses to AV input (AV: max, AV: supermax, AV: additive, AV: superadditive, and AV: Laurienti); and for the Max(A,V) and Sum(A,V) criteria. Each bar stacks the contributions of A cells, V cells, and AV cells, and the accompanying table rows give these contributions together with the total BOLD value.]
The simulation covered several stimulation conditions,
including unisensory auditory, unisensory visual, and multisensory audiovisual. The Sum(A,V) col-
umn is simply the sum of the audio and visual BOLD signals and represents the additive criterion
(null hypothesis). The audiovisual stimulus conditions were simulated using five different models,
the maximum model, the supermaximum model, the additive model, the superadditive model, and
the Laurienti model. The first three rows of the table represent the contributions of different classes
of neurons to BOLD activation, including auditory unisensory neurons (A cells), visual unisensory
neurons (V cells), and audiovisual multisensory neurons (AV cells). To be clear, the BOLD value in
the bottom-most row is the sum of the A, V, and AV cells’ contributions. Summing these contributions is based on the assumption that voxels (or clusters of voxels) contain mixtures of unisensory
and multisensory neurons, not a single class of neurons. Although the “contributions” have no units,
they are simulated based on the statistics of recorded impulse counts (spike counts) from neurons
in the superior colliculus, as reported by Laurienti et al. (2005). Unisensory neurons were explicitly
modeled to respond similarly under multisensory stimulation as they did under unisensory stimula-
tion, otherwise they would be classified as subthreshold neurons, which were not considered in the
models.
The five models of BOLD activation under audiovisual stimulation differed in the calculation
of only one value: the contribution of the AV multisensory neurons. For the maximum model, the
contribution of AV cells was calculated as the maximum of the AV cell contributions with visual
and auditory unisensory stimuli. For the supermaximum model, the contribution of AV neurons was calculated as 150% of the AV cell contribution used for the maximum model. For the additive model,
the contribution of AV cells was calculated as the sum of AV cell contributions with visual and audi-
tory unisensory stimuli. For the superadditive model, the contribution of AV cells was calculated as
150% of the AV cell contribution used for the additive model. Finally, for the Laurienti model, the
contribution of the AV cells was based on the statistics of recorded impulse counts. What the table
makes clear is that, based on Laurienti’s statistics, the additive criterion is too conservative, which is
consistent with what has been found in practice (Beauchamp 2005; Beauchamp et al. 2004a, 2004b;
Laurienti et al. 2005; Stevenson et al. 2007).
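The five models differ only in the AV-cell term, so the comparison against the additive criterion can be made concrete with a sketch. The contributions below are illustrative round numbers chosen by us, not the chapter's exact simulated values, and the empirical Laurienti model is omitted:

```python
# Illustrative contributions (arbitrary units). av_to_a / av_to_v are the
# AV cells' responses to unisensory A and V stimulation.
a_cells, v_cells = 0.60, 0.60
av_to_a, av_to_v = 0.49, 0.54

bold_a = a_cells + av_to_a            # BOLD under auditory stimulation
bold_v = v_cells + av_to_v            # BOLD under visual stimulation
additive_criterion = bold_a + bold_v  # Sum(A,V), the null hypothesis

def bold_av(model):
    """Total AV BOLD: unisensory pools respond as they did alone, plus a
    model-dependent AV-cell contribution."""
    av_cells = {
        "maximum":       max(av_to_a, av_to_v),
        "supermaximum":  1.5 * max(av_to_a, av_to_v),
        "additive":      av_to_a + av_to_v,
        "superadditive": 1.5 * (av_to_a + av_to_v),
    }[model]
    return a_cells + v_cells + av_cells

# Only a strongly superadditive AV pool pushes the voxel past the criterion;
# the additive model lands exactly on it, and the maximum-based models fall short.
assert bold_av("superadditive") > additive_criterion
assert bold_av("maximum") < additive_criterion
```

This illustrates why, with realistic mixtures of cell types, the additive criterion is hard to exceed in practice.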
Laurienti and colleagues (2005) suggest three reasons why the simulated BOLD activation may
not exceed the additive criterion based on the known neurophysiology: first, the proportion of AV
neurons is small compared to unisensory neurons; second, of those multisensory neurons, only a
small proportion are superadditive; and third, superadditive neurons have low impulse counts relative to other neurons. For population-based measurements to exceed the additive criterion, the average impulse count of the pool of bimodal neurons must be significantly superadditive. The presence of superadditive neurons in the pool is not enough by itself
because those superadditive responses are averaged with other subadditive, and even suppressive,
responses. According to Laurienti’s statistics, the result of this averaging is a value somewhere
between maximum and additive. Thus, even though the additive criterion is appropriate because
it represents the correct null hypothesis, the statistical distribution of cell and impulse counts in
multisensory brain regions may make it practically intractable as a criterion.
[Figure 8.4: Simulated BOLD responses (stacked contributions of A cells, V cells, and AV cells) for A, V, and AV stimulation and the Sum(A,V) criterion. Left panel: high-quality stimuli, where the AV response is subadditive. Right panel: degraded stimuli, where it is superadditive.]
The left side of Figure 8.4 shows the simulation with highly effective stimuli, including a sizable
contribution of the multisensory neurons. On the right in Figure 8.4, a similar situation is shown,
but with less effective, degraded stimuli. In general, neurons in multisensory regions decrease their
impulse counts when stimuli are less salient. However, the size of the decrease is different across
different classes of neurons and different stimulus conditions (Alvarado et al. 2007). In our simu-
lation, impulse counts of unisensory neurons were reduced by 30% from the values simulated by
the Laurienti model. Impulse counts of bimodal neurons were reduced by 75% under unisensory
stimulus conditions, and by 50% under multisensory stimulus conditions. This difference in reduc-
tion for bimodal neurons between unisensory and multisensory stimulus conditions reflects inverse
effectiveness, that is, the multisensory gain increases with decreasing stimulus effectiveness.
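The degradation step just described can be sketched as follows. The impulse counts are assumed for illustration; the reduction factors (30% for unisensory neurons, 75% for bimodal neurons under unisensory stimulation, 50% under multisensory stimulation) are the ones given in the text.

```python
# Minimal sketch (assumed impulse counts): applying the reduction factors
# from the text makes the pooled AV response exceed Sum(A, V) only for the
# degraded stimuli, mirroring inverse effectiveness.

def pooled(pool, cond):
    return sum(n[cond] for n in pool) / len(pool)

high = (
    [{"A": 1.0, "V": 0.0, "AV": 1.0}] * 40    # auditory-only neurons
    + [{"A": 0.0, "V": 1.0, "AV": 1.0}] * 40  # visual-only neurons
    + [{"A": 0.6, "V": 0.6, "AV": 1.0}] * 10  # bimodal (subadditive) neurons
)

def degrade(n):
    if n["A"] > 0 and n["V"] > 0:              # bimodal neuron
        return {"A": n["A"] * 0.25,            # -75% under unisensory stimulation
                "V": n["V"] * 0.25,
                "AV": n["AV"] * 0.5}           # -50% under multisensory stimulation
    return {k: v * 0.7 for k, v in n.items()}  # unisensory neuron: -30%

low = [degrade(n) for n in high]

for label, p in [("high quality", high), ("low quality", low)]:
    av, crit = pooled(p, "AV"), pooled(p, "A") + pooled(p, "V")
    print(f"{label}: AV = {av:.3f}, Sum(A,V) = {crit:.3f}, exceeds: {av > crit}")
```

Because the bimodal neurons lose proportionately less under multisensory than under unisensory stimulation, the pooled AV response crosses the additive criterion once the stimuli are degraded.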
Using these reductions in activity with stimulus degradation, BOLD activation with the AV
stimulus now exceeds the additive criterion. Admittedly, the reductions that were assigned to
the different classes of neurons were chosen somewhat arbitrarily; other combinations of reductions would lead to AV activation that does not exceed the criterion.
However, the reductions shown are based on statistics of impulse counts taken from single-unit
recording data, and are consistent with the principle of inverse effectiveness reported routinely
in the single-unit recording literature (Meredith and Stein 1986). Furthermore, there is empirical
evidence from neuroimaging showing an increased likelihood of exceeding the additive criterion
as stimulus quality is degraded (Stevenson and James 2009; Stevenson et al. 2007, 2009). Figure
8.5 compares AV activation with the additive criterion at multiple levels of stimulus quality. These
are a subset of data from a study reported elsewhere (Stevenson and James 2009). Stimulus quality
was degraded by parametrically varying the signal-to-noise ratio (SNR) of the stimuli until partici-
pants were able to correctly identify the stimuli at a given accuracy. This was done by embedding
the audio and visual signals in constant external noise and lowering the root mean square contrast
of the signals. AV activation exceeded the additive criterion at low SNR, but failed to exceed the
criterion at high SNR.
Although there is significant empirical and theoretical evidence suggesting that the additive
criterion is too conservative at high stimulus SNR, the data presented in Figure 8.5 suggest that
the additive criterion may be more appropriate at low SNR. However, there are two possible problems with using low-SNR stimuli to assess multisensory integration with BOLD fMRI. First,
based on the data in Figure 8.5, the change from failing to meet the additive criterion to exceeding
the additive criterion is gradual, not a sudden jump at a particular level of SNR. Thus, the choice
of SNR level(s) is extremely important for the interpretation of the results. Second, there may be
problems with using the additive criterion with measurements that lack a natural zero, such as
BOLD.
[Figure 8.5: BOLD responses for the AV condition and the Sum(A,V) criterion at stimulus-quality levels of 95%, 85%, 75%, and 65% identification accuracy.]
FIGURE 8.5 Assessing inverse effectiveness empirically with BOLD activation. These are a subset of data
reported elsewhere. (From Stevenson, R.A. and James, T.W., NeuroImage, 44, 1210–23, 2009. With permission.)
138 The Neural Bases of Multisensory Processes
AV > A + V (8.1)
[Figure 8.6: Simulated raw BOLD signal for the A, V, AV, and baseline conditions in two experiments with different baseline levels (top), and the resulting % BOLD change for A, V, AV, and Sum(A,V) relative to the subadditive and superadditive boundaries (bottom).]
Expressed relative to the baseline condition, Equation 8.1 becomes

(AV − baseline)/baseline > (A − baseline)/baseline + (V − baseline)/baseline,

and then

AV > A + V − baseline. (8.4)
Equation 8.4 clearly shows that the level of activation produced by the baseline condition influences
the additive criterion. An increase in activation of the baseline condition causes the additive crite-
rion to become more liberal (Figure 8.6). The fact that the additive criterion can be influenced by the
activation of the experimenter-chosen baseline condition may explain why similar experiments from
different laboratories produce different findings when that criterion is used (Beauchamp 2005).
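The baseline dependence can be seen in a short worked example. The raw signal values below are hypothetical; the same raw A, V, and AV signals fail the additive criterion against a low baseline but exceed it against a higher one.

```python
# Worked example (hypothetical raw scanner values): the additive criterion
# applied to percent signal change flips its verdict when only the baseline
# condition's activation changes.

def percent_change(signal, baseline):
    return (signal - baseline) / baseline

A_raw, V_raw, AV_raw = 520.0, 540.0, 560.0  # assumed raw signal values

for baseline in (480.0, 510.0):
    a = percent_change(A_raw, baseline)
    v = percent_change(V_raw, baseline)
    av = percent_change(AV_raw, baseline)
    print(f"baseline {baseline:.0f}: AV% = {av:.4f}, A% + V% = {a + v:.4f}, "
          f"superadditive: {av > a + v}")
```

A more active baseline condition shrinks each unisensory percent change faster than the multisensory one, so the criterion becomes easier to exceed without any change in the stimulus-driven responses.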
Note that the inequality sign is different in Equation 8.5 than in Equation 8.1. Equation 8.1 is used
to test the directional hypothesis that AV activation exceeds the additive criterion. Subadditivity, the
hypothesis that AV activation is less than the additive criterion, is rarely, if ever, used as a criterion
by itself. It has been used in combination with superadditivity, for instance, showing that a
brain region exceeds the additive criterion with semantically congruent stimuli but does not exceed
the additive criterion with semantically incongruent stimuli (Calvert et al. 2000). This example
(using both superadditivity and subadditivity), however, is testing two directional hypotheses, rather
than testing one nondirectional hypothesis. Equation 8.5 is used to test a nondirectional hypothesis,
and we suggest that it should be nondirectional for two reasons. First, the order in which the two
terms are subtracted to produce each delta is arbitrary. For each delta term, if the least effective
stimulus condition is subtracted from the most effective condition, then Equation 8.5 can be rewrit-
ten as ΔAV < ΔA + ΔV to test for inverse effectiveness, that is, the multisensory difference should
be less than the sum of the unisensory differences. If, however, the differences were taken in the
opposite direction (i.e., most effective subtracted from least effective), Equation 8.5 would need to
be rewritten with the inequality in the opposite direction (i.e., ΔAV > ΔA + ΔV). Second, inverse
effectiveness may not be the only meaningful effect that can be seen with difference measures,
perhaps especially if the measures are used to assess function across the whole brain. This point is
discussed further at the end of the chapter (Figure 8.9).
Each component of Equation 8.5 can be rewritten with the baseline activation made explicit. The
equation for the audio component would be
ΔA = (A1 − baseline)/baseline − (A2 − baseline)/baseline, (8.6)
where A1 and A2 represent auditory stimulus conditions with different levels of stimulus quality.
When Equation 8.5 is rewritten by substituting Equation 8.6 for each of the three stimulus conditions, all baseline variables in both the denominator and the numerator cancel out, producing the following equation:

AV1 − AV2 ≠ (A1 − A2) + (V1 − V2). (8.7)
The key importance of Equation 8.7 is that the baseline variable cancels out when relative differ-
ences are used instead of absolute values. Thus, the level of baseline activation has no influence on
a criterion calculated from BOLD differences.
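A quick numeric check, with hypothetical signal values, confirms this cancellation: the comparison of the multisensory difference with the sum of unisensory differences does not depend on which baseline is used.

```python
# Numeric check (hypothetical values) that baseline cancels out of the
# additive differences criterion: the sign of dAV - (dA + dV) is the same
# for any choice of baseline.

def delta(high, low, baseline):
    """Difference of two percent-signal-change measurements (Equation 8.6)."""
    return (high - baseline) / baseline - (low - baseline) / baseline

A1, A2, V1, V2, AV1, AV2 = 540.0, 525.0, 545.0, 528.0, 570.0, 550.0

for baseline in (480.0, 510.0):
    dA = delta(A1, A2, baseline)
    dV = delta(V1, V2, baseline)
    dAV = delta(AV1, AV2, baseline)
    # baseline scales all three deltas by the same 1/baseline factor,
    # so the sign of dAV - (dA + dV) is baseline-independent
    print(f"baseline {baseline:.0f}: dAV - (dA + dV) = {dAV - (dA + dV):+.5f}")
```

The magnitudes of the deltas scale with 1/baseline, but all three scale together, so the decision about the criterion is unchanged.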
The null hypothesis represented by Equation 8.5 is similar to the additive criterion in that the
sum of two unisensory values is compared to a multisensory value. Those values, however, are
relative differences instead of absolute BOLD percentage signal changes. If the multisensory differ-
ence is less (or greater) than the additive difference criterion, one can infer an interaction between
sensory channels, most likely in the form of a third pool of multisensory neurons in addition to
unisensory neurons. The rationale for using additive differences is illustrated in Figure 8.7. The
simulated data for the null hypothesis reflect the contributions of neurons in a brain region that
contains only unisensory auditory and visual neurons (Figure 8.7a). In the top panel, the horizontal
axis represents the stimulus condition, either unisensory auditory (A) or visual (V), or multisensory
audiovisual (AV). The subscripts 1 and 2 represent different levels of stimulus quality. For example,
A1 is high-quality audio and A2 is low-quality audio. To relate these simulated data to the data in
Figure 8.2 and the absolute additive criterion, the height of the stacked bar for AV1 is the absolute
additive criterion (or null hypothesis) for the high-quality stimuli, and the height of the AV2 stacked
bar is the absolute additive criterion for the low-quality stimuli. Those absolute additive criteria,
however, suffer from the issues discussed above. Evaluating the absolute criterion at multiple levels
of stimulus quality provides the experimenter with more information than evaluating it at only one
level, but a potentially better way of assessing multisensory integration is to use a criterion based
on differences between the high- and low-quality stimulus conditions. The null hypothesis for this
additive differences criterion is illustrated in the bottom panel of Figure 8.7a. The horizontal axis
shows the difference in auditory (ΔA), visual (ΔV), and audiovisual (ΔAV) stimuli, all calculated
as differences in the heights of the stacked bars in the top panel. The additive differences criterion,
labeled Sum(ΔA,ΔV), is also shown, and is the same as the difference in multisensory activation
(ΔAV). Thus, for a brain region containing only two pools of unisensory neurons, the appropriate
null hypothesis to be tested is provided by Equation 8.5.
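This null hypothesis can be sketched directly. The impulse counts below are assumed for illustration (echoing the simulated values in Figure 8.7), and unisensory neurons are assumed to respond to the AV condition with their preferred-modality response; with only unisensory pools, ΔAV equals ΔA + ΔV exactly.

```python
# Sketch of the two-pool null hypothesis (assumed impulse counts): with only
# unisensory A and V neurons, the multisensory difference equals the sum of
# the unisensory differences, so Equation 8.5 is the appropriate null.

def pooled(pool, cond):
    return sum(n[cond] for n in pool) / len(pool)

def make_pool(a_resp, v_resp):
    # Unisensory neurons respond to AV with their preferred-modality response.
    a_cells = [{"A": a_resp, "V": 0.0, "AV": a_resp}] * 50
    v_cells = [{"A": 0.0, "V": v_resp, "AV": v_resp}] * 50
    return a_cells + v_cells

pool1 = make_pool(0.80, 0.60)  # high-quality stimuli (condition subscript 1)
pool2 = make_pool(0.56, 0.42)  # low-quality stimuli (condition subscript 2)

dA = pooled(pool1, "A") - pooled(pool2, "A")
dV = pooled(pool1, "V") - pooled(pool2, "V")
dAV = pooled(pool1, "AV") - pooled(pool2, "AV")
print(f"dAV = {dAV:.3f}, dA + dV = {dA + dV:.3f}")  # identical under the null
```

Adding a third, multisensory pool with inverse effectiveness (as in Figure 8.7b) breaks this equality, which is what the additive differences criterion detects.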
The Use of fMRI to Assess Multisensory Integration 141
[Figure 8.7: (a) Simulated % BOLD change for a population containing only unisensory A cells and V cells; (b) the same population with a third pool of AV cells. Top panels: responses for input conditions A1, A2, V1, V2, AV1, and AV2, with the differences ΔA, ΔV, and ΔAV marked. Bottom panels: BOLD differences ΔA, ΔV, and ΔAV compared with the additive differences criterion Sum(ΔA,ΔV).]
The data in Figure 8.7b apply the additive differences criterion to the simulated BOLD activation
data shown in Figure 8.4. Recall from Figure 8.4 that the average contribution of the multisensory
neurons is subadditive for high-quality stimuli (A1, V1, AV1), but is superadditive with low-quality
stimuli (A2, V2, AV2). In other words, the multisensory pool shows inverse effectiveness. The data
in the bottom panel of Figure 8.7b are similar to the bottom panel of Figure 8.7a, but with the addi-
tion of this third pool of multisensory neurons to the population. Adding the third pool makes ΔAV
(the difference in multisensory activation) significantly less than the additive differences criterion
(Sum(ΔA,ΔV)), and rejects the null hypothesis of only two pools of unisensory neurons.
Figure 8.8 shows the same additive differences analysis performed on the empirical data from
Figure 8.5 (Stevenson and James 2009; Stevenson et al. 2009). The empirical data show the same
pattern as the simulated data. With both the simulated and empirical data, ΔAV was less than
Sum(ΔA,ΔV), a pattern of activation similar to inverse effectiveness seen in single units. In single-
unit recording, there is a positive relation between stimulus quality and impulse count (or effective-
ness). This same relation was seen between stimulus quality and BOLD activation. Although most
neurons show this relation, the multisensory neurons tend to show smaller decreases (proportion-
ately) than the unisensory neurons. Thus, as the effectiveness of the stimuli decreases, the multisen-
sory gain increases. Decreases in stimulus quality also had a smaller effect on multisensory BOLD
activation than on unisensory BOLD activation, suggesting that the results in Figure 8.8 could (but
do not necessarily) reflect the influence of inversely-effective neurons.
[Figure 8.8: Additive differences analysis of the empirical data from Figure 8.5; BOLD differences at stimulus-quality transitions of 95–85%, 85–75%, and 75–65% accuracy.]

In summary, we have demonstrated some important theoretical limitations of the criteria commonly used in BOLD fMRI studies to assess multisensory integration. First, the additive criterion is susceptible to variations in baseline. Second, the additive criterion is sensitive only if the average activity profile of the multisensory neurons in the neuronal population is superadditive, which,
empirically, only occurs with very low-quality stimuli. A combination of these two issues may
explain the inconsistency in empirical findings using the additive criterion (Beauchamp 2005;
Calvert et al. 2000; Stevenson et al. 2007). Third, the maximum criterion tests a null hypothesis that
is based on a homogeneous population of only multisensory neurons. Existing single-unit recording
data suggest that multisensory brain regions have heterogeneous populations containing unisensory,
bimodal, and sometimes, subthreshold neurons. Thus, the null hypothesis tested with the maximum
criterion is likely to produce false-positive results in unisensory brain regions.
FIGURE 8.9 A whole-brain statistical parametric map of regions demonstrating audiovisual neuronal convergence as assessed by the additive differences criterion. [Insets show BOLD activity for A, V, and AV conditions at high and low stimulus quality.]
As a potential solution to these concerns, we have developed a new criterion for assessing mul-
tisensory integration using relative BOLD differences instead of absolute BOLD measurements.
Relative differences are not influenced by changes in baseline, protecting the criterion from incon-
sistencies across studies. The null hypothesis to be tested is the sum of unisensory differences
(additive differences), which is based on the assumption of a heterogeneous population of neurons.
In addition to the appropriateness of the null hypothesis tested, the additive differences criterion
produced positive results in known multisensory brain regions when tested empirically (Stevenson
et al. 2009). Evidence for inverse effectiveness with audiovisual stimuli was found in known mul-
tisensory brain regions such as the superior temporal gyrus and inferior parietal lobule, but also
in regions that have garnered less attention from the multisensory community, such as the medial
frontal gyrus and parahippocampal gyrus (Figure 8.9). These results were found across different
pairings of sensory modalities and with different experimental designs, suggesting the use of addi-
tive differences may be of general use for assessing integration across sensory channels. A num-
ber of different brain regions, such as the insula and caudate nucleus, also showed an effect that
appeared to be the opposite of inverse effectiveness (Figure 8.9). BOLD activation in these brain
regions showed the opposite relation with stimulus quality to that seen in sensory brain regions; that is, high-quality stimuli produced less activation than low-quality stimuli. Because of this opposite relation,
we termed the effect observed in these regions indirect inverse effectiveness. More research will be
needed to assess the contribution of indirect inverse effectiveness to multisensory neural processing
and behavior.
A further issue is how BOLD fMRI measurements relate to data collected using single-unit recording. BOLD activation reflects a hemodynamic response, which
itself is the result of local neural activity. The exact relationship, however, between neural activ-
ity and BOLD activation is unclear. There is evidence that increased spiking produces small brief
local reductions in tissue oxygenation, followed by large sustained increases in tissue oxygenation
(Thompson et al. 2003). Neural spike count, however, is neither the only predictor of BOLD activation levels nor the best one. The correlation of BOLD activation with local field potentials is
stronger than the correlation of BOLD with spike count (Heeger et al. 2000; Heeger and Ress 2002;
Logothetis and Wandell 2004). Whereas spikes reflect the output of neurons, local field potentials
are thought to reflect the postsynaptic potentials or input to neurons. This distinction between input
and output and its relationship with BOLD activation raises some concerns about relating studies using BOLD fMRI to studies using single-unit recording. Of course, spike count is also highly
correlated with local field potentials, suggesting that spike count, local field potentials, and BOLD
activation are all interrelated and, in fact, that the correlations among them may be related to another
variable that is responsible for producing all of the phenomena (Attwell and Iadecola 2002).
Multisensory single-unit recordings are mostly performed in monkey and cat superior colliculus
and monkey superior temporal sulcus or cat posterolateral lateral suprasylvian area (Allman and
Meredith 2007; Allman et al. 2008; Barraclough et al. 2005; Benevento et al. 1977; Bruce et al. 1981;
Hikosaka et al. 1988; Meredith 2002; Meredith and Stein 1983, 1986; Stein and Meredith 1993; Stein
and Stanford 2008). With BOLD fMRI, whole-brain imaging is routine, which allows for exploration
of the entire cortex. The principles that are derived from investigation of specific brain areas may not
always apply to other areas of the brain. Thus, whole-brain investigation has the distinct promise of
producing unexpected results. The unexpected results could be because of the different proportions
of known classes of neurons, or the presence of other classes of multisensory neurons that have not
yet been found with single-unit recording. It is possible that the indirect inverse effectiveness effect
described above (Figure 8.9) may reflect the combined activity of types of multisensory neurons with
response profiles that have not yet been discovered with single-unit recording.
8.8 CONCLUSIONS
We must stress that each method used to investigate multisensory interactions has a unique set
of limitations and assumptions, whether the method is fMRI, high-density recording, single-unit
recording, behavioral reaction time, or others. Differences between methods can have a great impact
on how multisensory interactions are assessed. Thus, it should not be assumed that a criterion that is
empirically tested and theoretically sound when used with one method will be similarly sound when
applied to another method. We have developed a method for assessing multisensory integration
using BOLD fMRI that makes fewer assumptions than established methods. Because BOLD mea-
surements have an arbitrary baseline, a criterion that is based on relative BOLD differences instead
of absolute BOLD values is more interpretable and reliable. Also, the use of BOLD differences is
not limited to comparing across multisensory channels, but should be equally effective when com-
paring across unisensory channels. Finally, it is also possible that the use of relative differences may
be useful with other types of measures, such as EEG, which also use an arbitrary baseline. However,
before using the additive differences criterion with other measurement methods, it should be tested
both theoretically and empirically, as we have done here with BOLD fMRI.
ACKNOWLEDGMENTS
This research was supported in part by the Indiana METACyt Initiative of Indiana University, funded
in part through a major grant from the Lilly Endowment, Inc., the IUB Faculty Research Support
Program, and the Indiana University GPSO Research Grant. We appreciate the insights provided
by Karin Harman James, Sunah Kim, and James Townsend, by other members of the Perception and
Neuroimaging Laboratory, and by other members of the Indiana University Neuroimaging Group.
REFERENCES
Allman, B.L., and M.A. Meredith. 2007. Multisensory processing in “unimodal” neurons: Cross-modal sub-
threshold auditory effects in cat extrastriate visual cortex. Journal of Neurophysiology 98:545–9.
Allman, B.L., L.P. Keniston, and M.A. Meredith. 2008. Subthreshold auditory inputs to extrastriate visual
neurons are responsive to parametric changes in stimulus quality: Sensory-specific versus non-specific
coding. Brain Research 1242:95–101.
Alvarado, J.C., J.W. Vaughan, T.R. Stanford, and B.E. Stein. 2007. Multisensory versus unisensory integration:
Contrasting modes in the superior colliculus. Journal of Neurophysiology 97:3193–205.
Attwell, D., and C. Iadecola. 2002. The neural basis of functional brain imaging signals. Trends in Neurosciences
25:621–5.
Barraclough, N.E., D. Xiao, C.I. Baker, M.W. Oram, and D.I. Perrett. 2005. Integration of visual and auditory
information by superior temporal sulcus neurons responsive to the sight of actions. Journal of Cognitive
Neuroscience 17:377–91.
Beauchamp, M.S. 2005. Statistical criteria in FMRI studies of multisensory integration. Neuroinformatics
3:93–113.
Beauchamp, M.S., B.D. Argall, J. Bodurka, J.H. Duyn, and A. Martin. 2004a. Unraveling multisensory integra-
tion: Patchy organization within human STS multisensory cortex. Nature Neuroscience 7:1190–2.
Beauchamp, M.S., K.E. Lee, B.D. Argall, and A. Martin. 2004b. Integration of auditory and visual information
about objects in superior temporal sulcus. Neuron 41:809–23.
Benevento, L.A., J. Fallon, B.J. Davis, and M. Rezak. 1977. Auditory–visual interaction in single cells in the
cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey. Experimental
Neurology 57:849–72.
Binder, J.R., J.A. Frost, T.A. Hammeke et al. 1999. Conceptual processing during the conscious resting state.
A functional MRI study. Journal of Cognitive Neuroscience 11:80–95.
Boynton, G.M., S.A. Engel, G.H. Glover, and D.J. Heeger. 1996. Linear systems analysis of functional mag-
netic resonance imaging in human V1. Journal of Neuroscience 16:4207–21.
Boynton, G.M., and E.M. Finney. 2003. Orientation-specific adaptation in human visual cortex. The Journal of
Neuroscience 23:8781–7.
Bruce, C., R. Desimone, and C.G. Gross. 1981. Visual properties of neurons in a polysensory area in superior
temporal sulcus of the macaque. Journal of Neurophysiology 46:369–84.
Calvert, G.A., M.J. Brammer, E.T. Bullmore et al. 1999. Response amplification in sensory-specific cortices
during crossmodal binding. NeuroReport 10:2619–23.
Calvert, G.A., R. Campbell, and M.J. Brammer. 2000. Evidence from functional magnetic resonance imaging
of crossmodal binding in the human heteromodal cortex. Current Biology 10:649–57.
Calvert, G.A., P.C. Hansen, S.D. Iversen, and M.J. Brammer. 2001. Detection of audio-visual integration sites
in humans by application of electrophysiological criteria to the BOLD effect. NeuroImage 14:427–38.
Dale, A.M., and R.L. Buckner. 1997. Selective averaging of rapidly presented individual trials using fMRI.
Human Brain Mapping 5:329–40.
Friston, K.J., E. Zarahn, O. Josephs, R.N. Henson, and A.M. Dale. 1999. Stochastic designs in event-related
fMRI. NeuroImage 10:607–19.
Glover, G.H. 1999. Deconvolution of impulse response in event-related BOLD fMRI. NeuroImage 9:416–29.
Heeger, D.J., A.C. Huk, W.S. Geisler, and D.G. Albrecht. 2000. Spikes versus BOLD: What does neuroimaging
tell us about neuronal activity? Nature Neuroscience 3:631–3.
Heeger, D.J., and D. Ress. 2002. What does fMRI tell us about neuronal activity? Nature Reviews Neuroscience
3:142–51.
Hikosaka, K., E. Iwai, H. Saito, and K. Tanaka. 1988. Polysensory properties of neurons in the anterior bank of
the caudal superior temporal sulcus of the macaque monkey. Journal of Neurophysiology 60:1615–37.
James, W. 1890. The Principles of Psychology. New York: Henry Holt & Co.
Laurienti, P.J., T.J. Perrault, T.R. Stanford, M.T. Wallace, and B.E. Stein. 2005. On the use of superadditivity
as a metric for characterizing multisensory integration in functional neuroimaging studies. Experimental
Brain Research 166:289–97.
Logothetis, N.K., and B.A. Wandell. 2004. Interpreting the BOLD signal. Annual Review of Physiology
66:735–69.
Meredith, M.A. 2002. On the neuronal basis for multisensory convergence: A brief overview. Brain Research.
Cognitive Brain Research 14:31–40.
Meredith, M.A., and B.L. Allman. 2009. Subthreshold multisensory processing in cat auditory cortex.
NeuroReport 20:126–31.
Meredith, M.A., and B.E. Stein. 1983. Interactions among converging sensory inputs in the superior colliculus.
Science 221:389–91.
Meredith, M.A., and B.E. Stein. 1986. Visual, auditory, and somatosensory convergence on cells in superior
colliculus results in multisensory integration. Journal of Neurophysiology 56:640–62.
Molyneux, W. 1688. Letter to John Locke. In E.S. de Beer (ed.), The correspondence of John Locke. Oxford:
Clarendon Press.
Perrault Jr., T.J., J.W. Vaughan, B.E. Stein, and M.T. Wallace. 2003. Neuron-specific response characteristics
predict the magnitude of multisensory integration. Journal of Neurophysiology 90:4022–6.
Scannell, J.W., and M.P. Young. 1999. Neuronal population activity and functional imaging. Proceedings of the
Royal Society of London. Series B. Biological Sciences 266:875–81.
Stanford, T.R., and B.E. Stein. 2007. Superadditivity in multisensory integration: Putting the computation in
context. NeuroReport 18:787–92.
Stark, C.E., and L.R. Squire. 2001. When zero is not zero: The problem of ambiguous baseline conditions in
fMRI. Proceedings of the National Academy of Sciences of the United States of America 98:12760–6.
Stein, B.E., and M.A. Meredith. 1993. The Merging of the Senses. Cambridge, MA: The MIT Press.
Stein, B.E., and T.R. Stanford. 2008. Multisensory integration: Current issues from the perspective of the single
neuron. Nature Reviews Neuroscience 9:255–66.
Stein, B.E., T.R. Stanford, R. Ramachandran, T.J. Perrault Jr., and B.A. Rowland. 2009. Challenges in quantify-
ing multisensory integration: Alternative criteria, models, and inverse effectiveness. Experimental Brain
Research 198:113–26.
Stevens, S.S. 1946. On the theory of scales of measurement. Science 103:677–80.
Stevenson, R.A., and T.W. James. 2009. Audiovisual integration in human superior temporal sulcus: Inverse
effectiveness and the neural processing of speech and object recognition. NeuroImage 44:1210–23.
Stevenson, R.A., M.L. Geoghegan, and T.W. James. 2007. Superadditive BOLD activation in superior temporal
sulcus with threshold non-speech objects. Experimental Brain Research 179:85–95.
Stevenson, R.A., S. Kim, and T.W. James. 2009. An additive-factors design to disambiguate neuronal and areal
convergence: Measuring multisensory interactions between audio, visual, and haptic sensory streams
using fMRI. Experimental Brain Research 198:183–94.
Thompson, J.K., M.R. Peterson, and R.D. Freeman. 2003. Single-neuron activity and tissue oxygenation in the
cerebral cortex. Science 299:1070–2.
van Atteveldt, N.M., E. Formisano, L. Blomert, and R. Goebel. 2007. The effect of temporal asynchrony on the
multisensory integration of letters and speech sounds. Cerebral Cortex 17:962–74.
9 Perception of Synchrony
between the Senses
Mirjam Keetels and Jean Vroomen
CONTENTS
9.1 Introduction........................................................................................................................... 147
9.2 Measuring Intersensory Synchrony: Temporal Order Judgment Task and Simultaneity
Judgment Task....................................................................................................................... 148
9.3 Point of Subjective Simultaneity............................................................................................ 150
9.3.1 Attention Affecting PSS: Prior Entry........................................................................ 151
9.4 Sensitivity for Intersensory Asynchrony............................................................................... 152
9.4.1 Spatial Disparity Affects JND................................................................................... 153
9.4.2 Stimulus Complexity Affects JND............................................................................ 154
9.4.3 Stimulus Rate Affects JND....................................................................................... 155
9.4.4 Predictability Affects JND........................................................................................ 155
9.4.5 Does Intersensory Pairing Affect JND?.................................................................... 156
9.5 How the Brain Deals with Lags between the Senses............................................................ 156
9.5.1 Window of Temporal Integration.............................................................................. 156
9.5.2 Compensation for External Factors........................................................................... 158
9.5.3 Temporal Recalibration............................................................................................. 161
9.5.4 Temporal Ventriloquism............................................................................................ 164
9.6 Temporal Synchrony: Automatic or Not?.............................................................................. 167
9.7 Neural Substrates of Temporal Synchrony............................................................................ 169
9.8 Conclusions............................................................................................................................ 170
References....................................................................................................................................... 171
9.1 INTRODUCTION
Most of our real-world perceptual experiences are specified by synchronous, redundant, and/or complementary multisensory perceptual attributes. As an example, a talker can be heard and seen at the
same time, and as a result, we typically have access to multiple features across the different senses
(i.e., lip movements, facial expression, pitch, speed, and temporal structure of the speech sound).
This is highly advantageous because it increases perceptual reliability and saliency and, as a result,
it might enhance learning, discrimination, or the speed of a reaction to the stimulus (Sumby and
Pollack 1954; Summerfield 1987). However, the multisensory nature of perception also raises the
question about how the different sense organs cooperate so as to form a coherent representation of
the world. In recent years, this has been the focus of much behavioral and neuroscientific research
(Calvert et al. 2004). The most commonly held view among researchers in multisensory perception
is what has been referred to as the “assumption of unity.” It states that the more (amodal) properties information from different modalities shares, the more likely the brain is to treat that information as originating
from a common object or source (see, e.g., Bedford 1989; Bertelson 1999; Radeau 1994; Stein and
Meredith 1993; Welch 1999; Welch and Warren 1980). Without a doubt, the most important amodal
property is temporal coincidence (e.g., Radeau 1994). From this perspective, one expects intersen-
sory interactions to occur if, and only if, information from the different sense organs arrives at
around the same time in the brain; otherwise, two separate events are perceived rather than a single
multimodal one.
The perception of time and, in particular, synchrony between the senses is not straightforward
because there is no dedicated sense organ that registers time in an absolute scale. Moreover, to
perceive synchrony, the brain has to deal with differences in physical (outside the body) and neural
(inside the body) transmission times. Sounds, for example, travel through air much more slowly than visual information does (approximately 300,000,000 m/s for vision vs. 330 m/s for audition), whereas no
physical transmission time through air is involved for tactile stimulation as it is presented directly
at the body surface. The neural processing time also differs between the senses, and it is typically
slower for visual than it is for auditory stimuli (approximately 50 vs. 10 ms, respectively), whereas
for touch, the brain may have to take into account where the stimulation originated from as the trav-
eling time from the toes to the brain is longer than from the nose (the typical conduction velocity
is 55 m/s, which results in a ~30 ms difference between toe and nose when this distance is 1.60 m;
Macefield et al. 1989). Because of these differences, one might expect that for audiovisual events,
only those occurring at the so-called “horizon of simultaneity” (Pöppel 1985; Pöppel et al. 1990)—a
distance of approximately 10 to 15 m from the observer—will result in the approximate synchro-
nous arrival of auditory and visual information at the primary sensory cortices. Sounds will arrive
before visual stimuli if the audiovisual event is within 15 m from the observer, whereas vision will
arrive before sounds for events farther away. Surprisingly, despite these naturally occurring lags, observers perceive intersensory synchrony for most multisensory events in the external world, and not only for those at approximately 15 m.
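The arrival-time arithmetic above can be sketched in a few lines. All constants are the approximate values quoted in the text (speed of sound, neural latencies, tactile conduction velocity); the function name and print labels are ours, for illustration only.

```python
# Sketch of the arrival-time arithmetic described above. All constants are
# the approximate values quoted in the text; names are illustrative.

SPEED_OF_SOUND = 330.0        # m/s, through air
VISUAL_LATENCY = 0.050        # s, approximate neural processing time (vision)
AUDITORY_LATENCY = 0.010      # s, approximate neural processing time (audition)
CONDUCTION_VELOCITY = 55.0    # m/s, typical tactile conduction velocity

def arrival_difference(distance_m):
    """Auditory minus visual arrival time at cortex, in seconds.

    Positive values mean the sound arrives later than the light.
    Light's travel time (~distance / 3e8 s) is negligible and ignored.
    """
    return (distance_m / SPEED_OF_SOUND + AUDITORY_LATENCY) - VISUAL_LATENCY

# Horizon of simultaneity: the distance at which the sound's travel time
# exactly cancels audition's neural head start.
horizon = (VISUAL_LATENCY - AUDITORY_LATENCY) * SPEED_OF_SOUND
print(f"horizon of simultaneity ~ {horizon:.1f} m")          # ~13 m, i.e. 10-15 m
print(f"at 1 m : {arrival_difference(1.0) * 1000:+.1f} ms")  # audition leads
print(f"at 30 m: {arrival_difference(30.0) * 1000:+.1f} ms") # vision leads

# Tactile example: toe-to-nose conduction difference over ~1.60 m.
print(f"toe vs. nose: {1.60 / CONDUCTION_VELOCITY * 1000:.0f} ms")  # ~29 ms
```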
In recent years, a substantial amount of research has been devoted to understanding how the
brain handles these timing differences (Calvert et al. 2004; King 2005; Levitin et al. 2000; Spence
and Driver 2004; Spence and Squire 2003). Here, we review several key issues about intersensory
timing. We start with a short overview of how intersensory timing is generally measured, and then
discuss several factors that affect the point of subjective simultaneity and sensitivity. In the sections
that follow, we address several ways in which the brain might deal with naturally occurring lags
between the senses.
notice. A steep psychometric curve thus implies a small JND, and sensitivity is thus good as observ-
ers are able to detect small asynchronies (see Figure 9.1).
The second task that has been used often is the SJ task. Here, stimuli are also presented at
various SOAs, but rather than judging which stimulus came first, observers now judge whether
the stimuli were presented simultaneously or not. In the SJ task, one usually obtains a bell-shaped
Gaussian curve if the percentage of “simultaneous” responses is plotted as a function of the SOA.
For the audiovisual case, the raw data are usually not mirror-symmetric, but skewed toward more
“simultaneous” responses on the “light-first” side of the axis. Once a curve is fitted on the raw data,
one can, as in the TOJ task, derive the PSS and the JND: the peak of the bell shape corresponds to
the PSS, and the width of the bell shape corresponds to the JND.
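The two curve shapes, and the way the PSS and JND are read off them, can be sketched as follows. The logistic and Gaussian forms and all parameter values here are illustrative assumptions for demonstration, not fits to real data.

```python
import math

# Illustrative sketch of the TOJ and SJ psychometric curves and of how the
# PSS and JND are derived from them. All forms and parameters are assumed.

def toj_curve(soa, pss=10.0, slope=20.0):
    """Proportion of 'vision-first' responses in a TOJ task (logistic)."""
    return 1.0 / (1.0 + math.exp(-(soa - pss) / slope))

def sj_curve(soa, pss=10.0, width=60.0):
    """Proportion of 'simultaneous' responses in an SJ task (Gaussian bell)."""
    return math.exp(-((soa - pss) ** 2) / (2.0 * width ** 2))

def toj_pss_jnd(pss=10.0, slope=20.0):
    """TOJ: PSS = 50% point; JND = half the 25%-to-75% SOA span."""
    soa25 = pss - slope * math.log(3.0)   # logistic solved for p = 0.25
    soa75 = pss + slope * math.log(3.0)   # logistic solved for p = 0.75
    return pss, (soa75 - soa25) / 2.0

def sj_jnd(width=60.0, criterion=0.75):
    """SJ: interval at which 'simultaneous' responses drop to the criterion."""
    return width * math.sqrt(-2.0 * math.log(criterion))

pss, jnd = toj_pss_jnd()
print(f"TOJ: PSS = {pss:.0f} ms, JND = {jnd:.1f} ms")
print(f"SJ:  peak (PSS) at 10 ms, 75% interval ~ {sj_jnd():.1f} ms")
```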
The TOJ and SJ tasks have, in general, been used more or less interchangeably, despite the fact
that comparative studies have found differences in performance measures derived from both tasks.
Possibly, this reflects the fact that judgments about simultaneity and temporal order are based on different
sources of information (Hirsh and Fraisse 1964; Mitrani et al. 1986; Schneider and Bavelier 2003;
Zampini et al. 2003a). As an example, van Eijk et al. (2008) examined task effects on the PSS.
They presented observers with a sound and a light, or a bouncing ball and an impact sound, at various
SOAs, and had them perform three tasks: an audiovisual TOJ task (“sound-first” or “light-first”
responses required), an SJ task with two response categories (SJ2; “synchronous” or “asynchro-
nous” responses required), and an SJ task with three response categories (SJ3; “sound-first,” “syn-
chronous,” or “light-first” responses required). Results from both stimulus types showed that the
individual PSS values for the two SJ tasks correlated well, but there was no correlation between the
[Figure 9.1 plot: percentage of “synchronous” or “V-first” responses (0–100%) plotted against stimulus onset asynchrony (in ms), from A-first (–80 ms) to V-first (+80 ms), with the PSS and JND indicated on the curves.]
FIGURE 9.1 S-shaped curve typically obtained in a TOJ task and bell-shaped curve typically obtained in a simultaneity judgment (SJ) task. Stimuli from different modalities are presented at varying SOAs, ranging from clearly auditory-first (A-first) to clearly vision-first (V-first). In a TOJ task, the participant judges which stimulus came first, sound or light, whereas in an SJ task, subjects judge whether the stimuli were synchronous or not. The PSS represents the interval at which information from the different modalities is perceived as maximally simultaneous (here ~0 ms). In an SJ task, this is the point at which the most “synchronous” responses are given; in a TOJ task, it is the point at which 50% of responses are vision-first and 50% auditory-first. The JND represents the smallest interval observers can reliably notice (in this example, ~27 ms). In an SJ task, this is the average interval (of A-first and V-first) at which a participant gives 75% “synchronous” responses. In a TOJ task, it is the difference between the SOAs at the 25% and 75% points divided by two.
TOJ and SJ tasks. This led the authors to conclude, debatably, that the SJ task should be preferred over the TOJ task if one wants to measure the perception of audiovisual synchrony.
In our view, there is no straightforward solution about how to measure the PSS or JND for
intersensory timing because the tasks are subject to different kinds of response biases (see Schneider
and Bavelier 2003; Van Eijk et al. 2008; Vatakis et al. 2007, 2008b for discussion). In the TOJ task,
in which only temporal order responses can be given (“sound-first” or “light-first”), observers may
be inclined to adopt the assumption that stimuli are never simultaneous, which thus may result in
rather low JNDs. On the other hand, in the SJ task, observers may be inclined to assume that stimuli
actually belong together because the “synchronous” response category is available. Depending on
criterion settings, this may result in many “synchronous” responses, and thus, a wide bell-shaped
curve which will lead to the invalid conclusion that sensitivity is poor.
In practice, both the SJ and TOJ task will have their limits. The SJ2 task suffers heavily from the
fact that observers have to adopt a criterion about what counts as “simultaneous/nonsimultaneous.”
And in the SJ3 task, the participant has to dissociate sound-first stimuli from synchronous ones, and
light-first stimuli from synchronous ones. Hence, in the SJ3 task there are two criteria: a “sound-first/
simultaneous” criterion, and a “light-first/simultaneous” criterion. If observers change, for whatever
reason, their criterion (or criteria) along the experiment or between experimental manipulations, it
changes the width of the curve and the corresponding JND. If sensitivity is the critical measure, one
should thus be careful using the SJ task because JNDs depend heavily on these criterion settings.
A different critique can be applied to the TOJ task. Here, the assumption is made that observ-
ers respond at about 50% for each of the two response alternatives when maximally unsure about
temporal order. In practice, however, participants may adopt a different strategy and respond, for example, “sound-first” (while others may, for arbitrary reasons, respond “light-first”) whenever unsure about temporal order. Such a response bias will shift the derived 50% point toward one side of the
continuum or the other, and the 50% point will then not be a good measure of the PSS, the point at
which simultaneity is supposed to be maximal. If performance of an individual observer on an SJ
task is compared with a TOJ task, it should thus come as no great surprise that the PSS and JND derived from the two tasks do not converge.
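The effect of such a guessing bias on the derived 50% point can be made concrete with a toy simulation. The observer model below, its parameter values, and the function names are all invented for illustration: a participant with a true PSS of 0 ms who, whenever genuinely unsure, always falls back on the guess “sound-first.”

```python
import math

# Toy simulation of the TOJ response bias discussed above: an observer
# whose true PSS is 0 ms but who, whenever genuinely unsure, always
# answers 'sound-first'. All parameters are invented for illustration.

def p_light_first(soa, sensory_noise=30.0, p_unsure=0.3):
    """Probability of a 'light-first' response at a given SOA (ms).

    With probability p_unsure the observer has no percept of temporal
    order and falls back on the fixed guess 'sound-first' (contributing
    nothing to 'light-first'); otherwise an unbiased logistic applies.
    """
    unbiased = 1.0 / (1.0 + math.exp(-soa / sensory_noise))
    return (1.0 - p_unsure) * unbiased  # guesses never add 'light-first'

def measured_pss(step=0.01):
    """Find the SOA at which 'light-first' responses cross 50%."""
    soa = -200.0
    while p_light_first(soa) < 0.5:
        soa += step
    return soa

# The true PSS is 0 ms, but the guessing habit drags the 50% point upward.
print(f"measured PSS = {measured_pss():.1f} ms")
```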
sounds on the sense organs (King and Palmer 1985). There will then be a preference for vision to
have a head start over sound so as to be perceived as simultaneous.
Besides this possibility, though, there are many other reasons why the PSS can differ quite substantially from 0 ms SOA. To point out just a few: the PSS depends, among other factors, on stimulus intensity (more intense stimuli are processed faster or come to consciousness more quickly; Jaskowski 1999; Neumann and Niepel 2004; Roufs 1963; Sanford 1971; Smith 1933), stimulus duration (Boenke et
al. 2009), the nature of the response that participants have to make (e.g., “Which stimulus came
first?” vs. “Which stimulus came second?”; see Frey 1990; Shore et al. 2001), individual differ-
ences (Boenke et al. 2009; Mollon and Perkins 1996; Stone et al. 2001), and the modality to which
attention is directed (Mattes and Ulrich 1998; Schneider and Bavelier 2003; Shore et al. 2001, 2005;
Stelmach and Herdman 1991; Zampini et al. 2005c). We do not intend to list all the factors known thus far; instead, we single out the one that has been particularly important in theorizing about perception in general, namely the role of attention.
participant’s task was to attend either the auditory or visual modality, and to respond to infrequent
targets in that modality at an attended location (e.g., respond to a slightly longer tone on the left).
The attended modality was constant during the experiment (but varied between subjects), and the
relevant location was specified at the beginning of each block of trials. The authors found enhanced
negativity in the ERP for stimuli at attended locations if compared to nonattended locations. The
negativity started at about 150 ms poststimulus for visual stimuli and at about 100 ms for auditory
stimuli. Evidence for a cross-modal link in spatial attention was also found, as the enhancement
(although smaller) was also found for stimuli at the attended location in the unattended modality
(see also Spence and Driver 1996; Spence et al. 2000 for behavioral results). Since then, analogous
results have been found by many others. For example, Eimer and Schröger (1998) found similar
results using a different design in which the side of the attended location varied from trial to trial.
Again, their results demonstrated enhanced negativities (between 160 and 280 ms after stimulus
onset) for attended locations as compared to unattended locations, and the effect was again bigger
for the relevant rather than irrelevant modality.
The critical issue for the idea of prior entry is whether these ERP effects also reflect that attended
stimuli are processed faster. In most EEG studies, attention affects the amplitude of the ERP rather
than speed (for a review, see Eimer and Driver 2001). The problem is that there are many other inter-
pretations for an amplitude modulation rather than increased processing speed (e.g., less smearing
of the EEG signal over trials if attended). A shift in the latencies of the ERP would have been easier
to interpret in terms of increased processing speed, but the problem is that even if a latency shift
in the ERP is obtained, it is usually small if compared to the behavioral data. As an example, in an
ERP study by Vibell et al. (2007), attention was directed toward the visual or tactile modality in a
visual–tactile TOJ task. Results showed that the peak latency of the visual evoked potentials (P1
and N1) was earlier when attention was directed to vision (P1 = 147 ms, and N1 = 198 ms) rather
than when directed to touch (P1 = 151 ms, and N1 = 201 ms). This shift in the P1 may be taken as
evidence that attention indeed speeds up perception in the attended modality, but it should also be
noted that the 4-ms shift in the ERP is in a quite different order of magnitude than the 38 ms shift
of the PSS in the behavioral data, or the 133 ms shift reported by Spence et al. (2001) in a similar
study. In conclusion, there is both behavioral and electrophysiological support for the idea that atten-
tion speeds up perceptual processing, but the underlying neural mechanisms remain, for the time
being, elusive.
ity, whether it is speech or not, and—more controversially—the semantic congruency. Some of these
factors will be described below.
* In the McGurk illusion (McGurk and MacDonald 1976), it is shown that the perception of nonambiguous speech tokens
can be modified by the simultaneous presentation of visually incongruent articulatory gestures. Typically, when pre-
sented with an auditory syllable /ba/ dubbed onto a face articulating /ga/, participants report hearing /da/. The occurrence
of this so-called McGurk effect has been taken as a particularly powerful demonstration of the use of visual information
in speech perception.
Perception of Synchrony between the Senses 155
Stekelenburg 2009), and by whether there is a sharp transition that can serve as a temporal anchor
(Fujisaki and Nishida 2005). Each of these stimulus characteristics—and likely many others—need
to be controlled if one wants to compare across stimuli in a nonarbitrary way. Below, we address
some of these factors.
* It has also been reported that the presentation rate may shift the PSS. In a study by Arrighi et al. (2006), participants were
presented a video of hands drumming on a conga at various rates (1, 2, and 4 Hz). Observers were asked to judge whether
the auditory and visual streams appeared to be synchronous or not (an SJ task). Results showed that the auditory delay for
maximum simultaneity (the PSS) varied inversely with drumming tempo from about 80 ms at 1 Hz, and 60 ms at 2 Hz,
to 40 ms at 4 Hz. Video sequences of random drumming motion, and of a disk moving along a motion profile matching the drummer's hands, produced similar results, with higher tempos requiring less auditory delay.
better temporal sensitivity if visual predictive information about sound onset was available (the left
display) rather than if it was absent (the right display).
9.5 HOW THE BRAIN DEALS WITH LAGS BETWEEN THE SENSES
In any multisensory environment, the brain has to deal with lags in arrival and processing time
between the different senses. Surprisingly, though, despite these lags, temporal coherence is usually maintained, and only in exceptional circumstances, such as thunder heard after lightning, is a single multisensory event perceived as temporally separated. This raises the question of
how temporal coherence is maintained. In our view, at least four options are available: (1) the brain
might be insensitive to small lags, or it could just ignore them (a window of temporal integration);
(2) the brain might be “intelligent” and bring deeply rooted knowledge about the external world into
play that allows it to compensate for various external factors; (3) the brain might be flexible and shift
its criterion about synchrony in an adaptive fashion (recalibration); or (4) the brain might actively
shift the time at which one information stream is perceived to occur toward the other (temporal
ventriloquism). Below, we discuss each of these notions. It should be noted beforehand that these options are not mutually exclusive.
have occurred simultaneously (see Figure 9.2, panel 1). Many have alluded to this concept, but what
is less satisfying about it is that it is basically a description rather than an explanation. To make this
point clear, some have reported that the temporal window for audiovisual speech can be quite large
because it can range from approximately 40 ms audio-first to 240 ms vision-first. However, sensitiv-
ity for intersensory asynchronies (JND) is usually much smaller than the size of this window. For
example, Munhall et al. (1996) demonstrated that exact temporal coincidence between the auditory
and visual parts of audiovisual speech stimuli is not a very strict constraint on the McGurk effect
(McGurk and MacDonald 1976). Their results demonstrated that the McGurk effect was biggest
when vowels were synchronized (see also McGrath and Summerfield 1985), but the effect survived
even if audition lagged vision by 180 ms (see also Soto-Faraco and Alsius 2007, 2009; these studies
FIGURE 9.2 Synchrony can be perceived despite lags. How is this accomplished? Four possible mechanisms
are depicted for audiovisual stimuli like a flash and beep. Similar mechanisms might apply for other stimuli
and other modality pairings. Time is represented on the x-axis, and accumulation of sensory evidence on the
y-axis. A stimulus is time-stamped once it surpasses a sensory threshold. Stimuli in audition and vision are
perceived as being synchronous if they occur within a certain time window. (1) The brain might be insensitive
to naturally occurring lags because the window of temporal integration is rather wide. (2) The brain might
compensate for predictable variability—here, sound distance—by adjusting perceived occurrence of a sound
in accordance with sound travel time. (3) Temporal recalibration. Three different mechanisms might underlie
adaptation to asynchrony: (a) a shift in criterion about synchrony for adapted stimuli or modalities, (b) a wid-
ening of temporal window for adapted stimuli or modalities, and (c) a change in threshold of sensory detection
(when did the stimulus occur?) within one of adapted modalities. (4) Temporal ventriloquism: a visual event
is actively shifted toward an auditory event.
show that participants can still perceive a McGurk effect when they can quite reliably perform
TOJs). Outside the speech domain, similar findings have been reported. In a study by Shimojo et al.
(2001), the role of temporal synchrony was examined using the streaming–bouncing illusion (i.e.,
two identical visual targets that move across each other and are normally perceived as a streaming
motion are typically perceived to bounce when a brief sound is presented at the moment that the
visual targets coincide; Sekuler et al. 1997). The phenomenon is dependent on the timing of the
sound relative to the coincidence of the moving objects. Although it has been demonstrated that a
brief sound induced the visual bouncing percept most effectively when it was presented about 50 ms
before the moving objects coincided, their data furthermore showed a rather large temporal window
of integration because intervals ranging from 250 ms before visual coincidence to 150 ms after
coincidence still induced the bouncing percept (see also Bertelson and Aschersleben 1998, for the
effect of temporal asynchrony on spatial ventriloquism; or Shams et al. 2002, for the illusory-flash
effect). All these intersensory effects thus occur at asynchronies that are much larger than JNDs
normally reported when directly exploring the effect of asynchrony using TOJ or SJ tasks (van
Wassenhove et al. 2007). One might argue that even though observers do notice small delays between the senses, the brain can still ignore them if doing so helps other purposes, such as understanding speech (Soto-Faraco and Alsius 2007, 2009). But the question then becomes why there is more than one window: one for understanding, the other for noticing timing differences.
Besides the width of the temporal window varying with the purpose of the task, it has also been
found to vary for different kinds of stimuli. As already mentioned, the temporal window is much
smaller for clicks and flashes than it is for audiovisual speech. However, why would the size be
different for different stimuli? Does the brain have a separate window for each stimulus and each
purpose? If so, we are left with explaining how and why it varies. Some have taken the concept of
a window quite literally, and have argued that “speech is special” because the window for audiovisual speech is wide (van Wassenhove et al. 2007; Vatakis et al. 2008a). We would rather refrain from such speculations, and consider it more useful to examine the critical features that determine when perception of simultaneity becomes easy (a small window) or difficult (a large window). The size of the window is thus, in our view, the factor that needs to be explained rather than the explanation itself.
~5 ms sound delay, and the delay increased when the LEDs were farther away. The increment was
consistent with the velocity of sounds up to a viewing distance of about 10 m, after which it leveled
off. This led the authors to conclude that lags between auditory and visual inputs are perceived as
synchronous not because the brain has a wide temporal window for audiovisual integration, but
because the brain actively changes the temporal location of the window depending on the distance
of the source.
Alais and Carlile (2005) came to similar conclusions, but with different stimuli. In their study,
auditory stimuli were presented over a loudspeaker and auditory distance was simulated by varying
the direct-to-reverberant energy ratio as a depth cue for sounds (Bronkhorst 1995; Bronkhorst and
Houtgast 1999). The near sounds simulated a depth of 5 m and had substantial amounts of direct
energy with a sharp transient onset; the far sounds simulated a depth of 40 m and did not have a
transient. The visual stimulus was a Gaussian blob on a computer screen in front of the observer
without variations in the distance. Note that, again, no attempt was made to equate auditory and
visual distance, thus again undermining the underlying notion. The effect of apparent auditory dis-
tance on temporal alignment with the blob on the screen was measured in a TOJ task. The authors
found compensation for depth, thus the PSS in the audiovisual TOJ task shifted with the apparent
distance of the sound in accordance with the speed of sound through air up to 40 m. On closer inspection of their data, however, it is clear that the shift in the PSS was mainly caused by the fact that
sensitivity for intersensory synchrony became increasingly worse for more distant sounds. Judging
from their figures, sensitivity for nearby sounds at 5 m was in the normal range, but for the most
distant sound, sensitivity was extremely poor as it never reached plateau, and even at a sound delay
of 200 ms, 25% of the responses were still “auditory-first” (see also Arnold et al. 2005; Lewald and
Guski 2004). This suggests that observers, while performing the audiovisual TOJ task, could not
use the onset of the far sound as a cue for temporal order, possibly because it lacked a sharp transient, and that they had to rely on other cues instead. Setting such contested stimuli and data aside, there are others who simply failed to observe compensation for distance (Arnold et al. 2005; Heron et al. 2007;
Lewald and Guski 2004; Stone et al. 2001). For example, Stone et al. (2001) used an audiovisual SJ
task and varied stimulus–observer distances from 0.5 m in the near condition to 3.5 m in the far con-
dition. This resulted in a 3-m difference that would theoretically correspond to an 11 ms difference
in the PSS if sound travel time were not compensated (a sound velocity of 330 m/s corresponds
to ~3.5 m/11 ms). For three out of five subjects, the PSS values were indeed shifted in that direction,
which led the authors to conclude that distance was not compensated. Against this conclusion, it
should be said that the SJ tasks depend heavily on criterion settings, that “three-out-of-five” is not
persuasively above chance, and that the range of distances was rather restricted.
Less open to these kinds of criticisms is a study by Lewald and Guski (2004). They used a
rather wide range of distances (1, 5, 10, 20, and 50 m), and their audiovisual stimuli (a sequence of
five beeps/flashes) were delivered by colocated speakers/LEDs placed in the open field. Note that
in this case, there were no violations in the “naturalness” of the audiovisual stimuli and that they
were physically colocated. Using this setup, the authors did not observe compensation for distance.
Rather, their results showed that when the physical observer–stimulus distance increased, the PSS
shifted precisely with the variation in sound transmission time through air. For audiovisual stimuli
that are far away, sounds thus had to be presented earlier than for nearby stimuli to be perceived
as simultaneous, and there was no sign that the brain would compensate for sound–traveling time.
The authors also suggested that the discrepancy between their findings and those who did find com-
pensation for distance lies in the fact that the latter simulated distance rather than using the natural
situation.
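The physical delays that the PSS tracked in the Lewald and Guski study reduce to simple arithmetic. The sketch below uses the viewing distances from their study and the speed of sound quoted earlier in this chapter; the function name is ours.

```python
# Sound travel time through air for the viewing distances used by
# Lewald and Guski (2004); speed of sound as quoted in the text.

SPEED_OF_SOUND = 330.0  # m/s

def sound_delay_ms(distance_m):
    """Physical arrival lag of the sound relative to the light (ms)."""
    return distance_m / SPEED_OF_SOUND * 1000.0

for d in (1, 5, 10, 20, 50):
    print(f"{d:>3} m -> sound arrives {sound_delay_ms(d):6.1f} ms late")

# Without compensation, the PSS should shift by exactly these amounts,
# from ~3 ms at 1 m up to ~152 ms at 50 m.
```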
Similar conclusions were also reached by Arnold et al. (2005), who examined whether the stream/
bounce illusion (Sekuler et al. 1997) varies with distance. The authors examined whether the opti-
mal time to produce a “bounce” percept varied with the distance of the display, which ranged from
~1 to ~15 m. The visual stimuli were presented on a computer monitor—keeping retinal proper-
ties constant—and the sounds were presented either over loudspeakers at these distances or over
headphones. The optimal time to induce a bounce percept shifted with the distance of the sound if
they were presented over loudspeakers, but there was no shift if the sound was presented over head-
phones. Similar effects of timing shifts with viewing distance after loudspeaker, but not headphone,
presentation were obtained in an audiovisual TOJ task in which observers judged whether a sound
came before or after two disks collided. This led the authors to conclude that there is no compensa-
tion for distance if distance is real and presented over speakers rather than simulated and presented
over headphones.
This conclusion might well be correct, but it raises the question of how to account for the findings
by Kopinska and Harris (2004). These authors reported complete compensation for distance despite
using colocated sounds and lights produced at natural distances. In their study, the audiovisual
stimulus was a bright disk that flashed once on a computer monitor and it was accompanied by a
tone burst presented from the computer’s inbuilt speaker. Participants were seated at various dis-
tances from the screen (1, 4, 8, 16, 24, and 32 m) and made TOJs about the flash and the sound. The
authors also selectively slowed down visual processing by presenting the visual stimulus at 20° of
eccentricity rather than in the fovea, or by having observers wear darkened glasses. As an additional
control, they used simple reaction time tasks and found that all these variations—distance, eccen-
tricity, and dark glasses—had predictable effects on auditory or visual speeded reaction. However,
audiovisual simultaneity was not affected by distance, eccentricity, or darkened glasses. Thus, there
was no shift in the PSS despite the fact that the change in distance, illumination, and retinal location
affected simple reaction times. This made the authors conclude that observers recover the external
world by taking into account all kinds of predictable variations, most importantly distance, alluding
to similar phenomena such as size or color constancy.
There are some studies that varied audiovisual distance in a natural way, but came to diametri-
cally opposing conclusions: Lewald and Guski (2004) and Arnold et al. (2005) found no compensa-
tion for distance, whereas Kopinska and Harris (2004) reported complete compensation. What’s the
critical difference between them? Our conjecture is that they differ in two critical aspects, that is,
(1) whether distance was randomized on a trial-by-trial basis or blocked, and (2) whether sensitivity
for temporal order was good or poor. In the study by Lewald and Guski, the distance of the stimuli
was varied on a trial-by-trial basis as they used a setup of five different speakers/LEDs. In Kopinska
and Harris’s study, though, the distance between the observer and the screen was blocked over trials
because otherwise subjects would have to be shifted back and forth after each trial. If the distance is
blocked, then either adaptation to the additional sound lag may occur (i.e., recalibration), or subjects
may equate response probabilities to the particular distance that they are seated. Either way, the
effect of distance on the PSS will diminish if trials are blocked, and no shift in the PSS will then
be observed, leading to the “wrong” conclusion that distance is compensated. This line of reason-
ing corresponds with a recent study by Heron et al. (2007). In their study, participants performed a
TOJ task in which audiovisual stimuli (a white disk and a click) were presented at varying distances
(0, 5, 10, 20, 30, and 40 m). Evidence for compensation was only found after a period of adaptation
(1 min + 5 top-up adaptation stimuli between trials) to the naturally occurring audiovisual asyn-
chrony associated with a particular viewing distance. No perceptual compensation for distance-
induced auditory delays could be demonstrated whenever there was no adaptation period (although
we should note that in this study, observer distance was always blocked).
The second potentially relevant difference between studies that do or do not demonstrate com-
pensation is the difficulty of the stimuli. Lewald and Guski (2004) used a sequence of five pulses/
sounds, whereas Kopinska and Harris (2004) presented a single sound/flash. In our experience, a
sequence of pulses/flashes drastically improves accuracy for temporal order if compared to a single
pulse/flash because there are many more cues in the signal. In the study by Arnold et al. (2005),
judgments about temporal order could also be relatively accurate because the two colliding disks
provided anticipatory information about when to expect the sound. Most likely, observers in the
study of Kopinska and Harris were inaccurate because their single sound/flash stimuli without
anticipatory information were difficult (unfortunately, none of the studies reported JNDs). In effect,
this amounts to adding noise to the psychometric function, which then effectively masks the effect
of distance on temporal order. It might easily lead one to conclude “falsely” that there is compensa-
tion for distance.
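This masking-by-noise argument can be made concrete with a toy calculation: a fixed PSS shift produces a much smaller change in observable response proportions when the psychometric function is shallow (noisy). The logistic form and all numbers below are our illustrative assumptions, not values from any of the studies discussed.

```python
import math

# Toy calculation of how noise can mask a PSS shift. A fixed 30 ms shift
# in the PSS changes response proportions far less when the psychometric
# function is shallow. The logistic form and numbers are illustrative.

def p_vision_first(soa, pss, slope):
    """Proportion of 'vision-first' responses (logistic in SOA, ms)."""
    return 1.0 / (1.0 + math.exp(-(soa - pss) / slope))

def max_proportion_change(pss_shift, slope):
    """Largest change in 'vision-first' proportion (at the midpoint)
    produced by shifting the PSS by pss_shift ms."""
    return (p_vision_first(pss_shift / 2.0, 0.0, slope)
            - p_vision_first(-pss_shift / 2.0, 0.0, slope))

# Steep (sensitive) versus shallow (noisy) observer, same 30 ms shift:
for slope in (20.0, 100.0):
    change = max_proportion_change(30.0, slope)
    print(f"slope {slope:5.1f} ms: max proportion change = {change:.3f}")
```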
[Figure 9.3 schematic: stimulus trains along a time axis for each exposure lag; symbols denote the visual, auditory, and vibrotactile stimuli.]
FIGURE 9.3 Schematic illustration of exposure conditions typically used in a temporal recalibration para-
digm. During exposure, participants are exposed to a train of auditory–visual (AV) or tactile–visual (TV)
stimulus pairs (panels a and b, respectively) with a lag of –100, 0, or +100 ms. To explore possible shifts in
perceived simultaneity or sensitivity to asynchrony, typically a TOJ or SJ task is performed in a subsequent
test phase. (From Fujisaki, W. et al., Nat. Neurosci., 7, 773–8, 2004; Vroomen, J. et al., Cogn. Brain Res.,
22, 32–5, 2004; Keetels, M., Vroomen, J., Percept. Psychophys., 70, 765–71, 2008; Keetels, M., Vroomen, J.,
Neurosci. Lett., 430, 130–4, 2008. With permission.)
changed (see Figure 9.2, panel 3b). For example, in an attempt to perceive simultaneity during light-
first exposure, participants might delay processing time in the visual modality by adopting a more
stringent criterion for sensory detection of visual stimuli. After exposure to light-first audiovisual
pairings, one might then expect slower processing times of visual stimuli in general, and other
modality pairings that involve the visual modality, say vision–touch, would then also be affected.
Two strategies have been undertaken to explore the mechanism underlying temporal recalibration. The first is to examine whether temporal recalibration generalizes to other stimuli within the adapted modalities; the second is to examine whether temporal recalibration affects different
modality pairings than the ones adapted. Fujisaki et al. (2004) demonstrated that adaptation to temporal misalignment remained effective even when the visual test stimulus was very different from the exposure situation. The authors exposed observers to asynchronous tone-
flash stimulus pairs and later tested them on the “stream/bounce” illusion (Sekuler et al. 1997).
Fujisaki et al. reported that the optimal delay for obtaining a bounce percept in the stream/bounce
illusion was shifted in the same direction as the adapted lag. Furthermore, after exposure to a “wall-
display,” in which tones were timed with a ball bouncing off the inner walls of a square, similar
shifts in the PSS on the bounce percept were found (a ~45 ms difference when comparing the PSS
of the –235 ms sound-first exposure with the +235 ms vision-first exposure). Audiovisual temporal
recalibration thus generalized well to other visual stimuli.
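The numbers quoted above imply a modest adaptation "gain." The derivation below is our own back-of-the-envelope arithmetic on those figures, not a quantity the authors report.

```python
# Back-of-the-envelope arithmetic on the Fujisaki et al. (2004) numbers
# quoted above: a ~45 ms PSS shift between the -235 ms (sound-first) and
# +235 ms (vision-first) exposure conditions. The 'gain' is our derived
# quantity, not a figure reported by the authors.

pss_shift_ms = 45.0
exposure_range_ms = 235.0 - (-235.0)   # total difference between adapted lags

# Fraction of the adapted asynchrony absorbed by recalibration:
gain = pss_shift_ms / exposure_range_ms
print(f"adaptation gain ~ {gain:.2f}")   # roughly 10% of the adapted lag
```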
Navarra et al. (2005) and Vatakis et al. (2008b) also tested generalization for audiovisual tempo-
ral recalibration using stimuli from different domains (speech/nonspeech). Their observers had to
monitor a continuous speech stream for target words that were presented either in synchrony with
the video of a speaker, or with the audio stream lagging 300 ms behind. During the monitoring
Perception of Synchrony between the Senses 163
task, participants performed a TOJ (Navarra et al. 2005; Vatakis et al. 2007) or SJ task (Vatakis
et al. 2008b) on simple flashes and white noise bursts that were overlaid on the video. Their results
showed that sensitivity, rather than a shift in the PSS, became worse if subjects were exposed to
desynchronized rather than synchronized audiovisual speech. Similar effects (larger JNDs) were
found with music stimuli. This led the authors to conclude that the “window of temporal integra-
tion” was widened (see Figure 9.2, panel 3c) because of asynchronous exposure (see also Navarra
et al. 2007 for effects on JND after adaptation to asynchronous audio–tactile stimuli). The authors
argued that this effect on the JND may reflect an initial stage of recalibration in which a more
lenient criterion is adopted for simultaneity. With prolonged exposure, subjects may then shift the
PSS. An alternative explanation—also considered by the authors, but rejected—might be that sub-
jects became confused by the nonmatching exposure stimuli, which as a result may also affect the
JND rather than the PSS because it adds noise to the distribution.
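The two signatures discussed here, a shift of the PSS and a widening of the JND, correspond to the mean and the slope of the psychometric function fitted to TOJ responses. A minimal sketch of that fit, using hypothetical response proportions (not data from any study cited here) and assuming SciPy is available:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

# Hypothetical visual TOJ data: sound-light SOAs (ms; negative = sound first)
# and the proportion of "light first" responses at each SOA. Illustrative
# values only, not data from any study cited in the text.
soas = np.array([-200.0, -100.0, -50.0, 0.0, 50.0, 100.0, 200.0])
p_light_first = np.array([0.05, 0.15, 0.30, 0.55, 0.75, 0.90, 0.98])

def psychometric(soa, mu, sigma):
    """Cumulative Gaussian: mu is the PSS, sigma sets the slope."""
    return norm.cdf(soa, loc=mu, scale=sigma)

(mu, sigma), _ = curve_fit(psychometric, soas, p_light_first, p0=(0.0, 50.0))

pss = mu                      # 50% point: point of subjective simultaneity
jnd = sigma * norm.ppf(0.75)  # half the 25%-75% interval: discrimination threshold

# Temporal recalibration shows up as a change in pss (the curve slides along
# the SOA axis); a widened temporal window shows up as a larger jnd (the
# curve flattens) with pss largely unchanged.
print(f"PSS = {pss:.1f} ms, JND = {jnd:.1f} ms")
```

On this view, added noise in the internal timing distribution flattens the fitted curve (larger JND) without moving its midpoint, which is exactly the pattern the confusion account predicts.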
The second way to study the underlying mechanisms of temporal recalibration is to examine
whether temporal recalibration generalizes to different modality pairings. Hanson et al. (2008)
explored whether a “supramodal” mechanism might be responsible for the recalibration of multi-
sensory timing. They examined whether adaptation to audiovisual, audio–tactile, and tactile–visual
asynchronies (10 ms flashes, noise bursts, and taps on the left index finger) generalized across
modalities. The data showed that a brief period of repeated exposure to ±90 ms asynchrony in any
of these pairings resulted in shifts of about 70 ms of the PSS on subsequent TOJ tasks, and that
the size and nature of the shifts were very similar across all three pairings. This led them to conclude that there is a "general mechanism." Opposite conclusions, though, were reached by Harrar and Harris (2005). They exposed participants for 5 min to audiovisual pairs with a fixed time lag
(250 ms light-first), but did not obtain shifts in the PSSs for touch–light pairs. In an extension of this
topic (Harrar and Harris 2008), observers were exposed for 5 min to ~100 ms lags of light-first stim-
uli for the audiovisual case, and touch-first stimuli for the auditory–tactile and visual–tactile case.
Participants were tested on each of these pairs before and after exposure. Shifts of the PSS in the
predicted direction were only found in the audiovisual exposure–test stimuli, but not for the other
cases. Di Luca et al. (2007) also exposed participants to asynchronous audiovisual pairs (~200 ms
lags of sound-first and light-first) and measured the PSS for audiovisual, audio–tactile, and visual–
tactile test stimuli. Besides obtaining a shift in the PSS for audiovisual pairs, the effect was found
to generalize to audio–tactile, but not to visual–tactile test pairs. This pattern led the authors to conclude that adaptation resulted in a phenomenal shift of the auditory event (Di Luca et al. 2007).
Navarra et al. (2009) also recently reported that the auditory rather than visual modality is more
flexible. Participants were exposed to synchronous or asynchronous audiovisual stimuli (224 ms
vision-first, or 84 ms auditory-first for 5 min of exposure) after which they performed a speeded
reaction time task on unimodal visual or auditory stimuli. In contrast with the idea that visual
stimuli get adjusted in time to the relatively more accurate auditory stimuli (Hirsh and Sherrick
1961; Shipley 1964; Welch 1999; Welch and Warren 1980), their results seemed to show the oppo-
site, namely, that auditory rather than visual stimuli were shifted in time. The authors reported that
simple reaction times to sounds became approximately 20 ms faster after vision-first exposure and
about 20 ms slower after auditory-first exposure, whereas simple reaction times for visual stimuli
remained unchanged. They explained this finding by alluding to the idea that visual information can serve as the temporal anchor because it provides a more exact estimate of the time of occurrence of a distal event than auditory information does, since light travel time, unlike that of sound, is effectively independent of distance.
Further research is needed, however, to examine whether a change in simple reaction times is truly
reflective of a change in the timing of that event, as there is quite some evidence showing that the
two do not always go hand-in-hand (e.g., reaction times are more affected by variations in intensity
than TOJs; Jaskowski and Verleger 2000; Neumann and Niepel 2004).
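The distance argument that recurs in this discussion can be made concrete with a little arithmetic: light from a distal event arrives effectively instantaneously, whereas sound arrives later in proportion to distance. A small illustrative computation (the speed-of-sound figure is the standard ~343 m/s value, not one taken from the chapter):

```python
# Why audiovisual timing needs correction for distance: light arrival is
# effectively instantaneous at everyday distances, whereas sound lags in
# proportion to distance. Standard speed of sound in air (~20 degrees C);
# this value is an assumption, not drawn from the chapter.
SPEED_OF_SOUND_M_S = 343.0

def audiovisual_lag_ms(distance_m: float) -> float:
    """Lag (ms) of the sound behind the light at the observer's position."""
    return distance_m / SPEED_OF_SOUND_M_S * 1000.0

for d in (1.0, 10.0, 34.3):
    print(f"{d:5.1f} m -> sound lags by {audiovisual_lag_ms(d):5.1f} ms")
```

At about 34 m the natural lag already reaches 100 ms, the order of magnitude of the adaptation lags used in the recalibration studies above, whereas tactile events at the body surface involve no such distance-dependent lag.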
To summarize, there is as yet no clear explanation for the mechanism underlying temporal recalibration, as there is some discrepancy in the data regarding generalization across modalities. It seems safe to conclude that the audiovisual exposure–test situation is the most reliable one for obtaining
a shift in the PSS. Arguably, audiovisual pairs are more flexible because the brain has to correct
for timing differences between auditory and visual stimuli because of naturally occurring delays
caused by distance. Tactile stimuli might be more rigid in time because visual–tactile and audio–
tactile events always occur at the body surface, so less compensation for latency differences might
be required here. As already mentioned above, a widening of the JND, rather than a shift in the PSS, has also been observed; it possibly reflects an initial stage of recalibration in which a more lenient criterion for simultaneity is adopted. The reliability of each modality on its own is also likely to play a role. Visual stimuli are known to be less reliable in time than auditory or tactile stimuli (Fain 2003), and as a consequence they may be more malleable (Ernst and Banks 2002; Ernst et al. 2000; Ernst and Bulthoff 2004), but there is also evidence that the auditory modality is, in fact, the one that is shifted.
FIGURE 9.4 A schematic illustration of conditions typically used to demonstrate auditory–visual temporal
ventriloquism (panel a) and tactile–visual temporal ventriloquism (panel b). The first capturing stimulus (i.e.,
either a sound or a vibro–tactile stimulus) precedes the first light by 100 ms, whereas the second capturing
stimulus trails the second light by 100 ms. Baseline condition consists of presentation of two capturing stimuli
simultaneous with light onsets. Temporal ventriloquism is typically shown by improved visual TOJ sensitivity
when capture stimuli are presented with a 100-ms interval. (From Scheier, C.R. et al., Invest. Ophthalmol. Vis.
Sci., 40, 4169, 1999; Morein-Zamir, S. et al., Cogn. Brain Res., 17, 154–63, 2003; Vroomen, J., Keetels, M.,
J. Exp. Psychol. Hum. Percept. Perform., 32, 1063–71, 2006; Keetels, M. et al., Exp. Brain Res., 180, 449–56,
2007; Keetels, M., Vroomen, J., Percept. Psychophys., 70, 765–71, 2008; Keetels, M., Vroomen, J., Neurosci.
Lett., 430, 130–4, 2008. With permission.)
In the flash-lag effect (FLE; Mackay 1958; Nijhawan 1994, 1997, 2002), a flash appears to lag behind a moving visual stimulus even though
the stimuli are presented at the same physical location. To induce temporal ventriloquism, Vroomen
and de Gelder added a single click presented slightly before, at, or after the flash (intervals of 0, 33,
66, and 100 ms). The results showed that the sound attracted the temporal onset of the flash and
shifted it in the order of ~5%. A sound ~100 ms before the flash thus made the flash appear ~5 ms
earlier, and a sound 100 ms after the flash made the flash appear ~5 ms later. A sound, including the
synchronous one, also improved sensitivity on the visual task because JNDs on the visual task were
better if a sound was present rather than absent.
Yet another recent manifestation of temporal ventriloquism used an apparent visual motion par-
adigm. Visual apparent motion occurs when a stimulus is flashed in one location and is followed by
another identical stimulus flashed in another location (Korte 1915). Typically, an illusory movement
is observed that starts at the lead stimulus and is directed toward the second lagging stimulus (the
strength of the illusion depends on the exposure time of the stimuli, and the temporal and spatial
separation between them). Getzmann (2007) explored the effects of irrelevant sounds on this motion
illusion. In this study, two temporally separated visual stimuli (SOAs ranging from 0 to 350 ms)
were presented and participants classified their impression of motion using a categorization system.
The results demonstrated that sounds intervening between the visual stimuli facilitated the impres-
sion of apparent motion relative to no sounds, whereas sounds presented before the first and after
the second visual stimulus reduced motion perception (see Bruns and Getzmann 2008 for similar
results). The idea was that because exposure time and spatial separation were both held constant in
this study, the impression of apparent motion was systematically affected by the perceived length of
the interstimulus interval. The effect was explained in terms of temporal ventriloquism, as sounds
attracted the illusory onset of visual stimuli.
Freeman and Driver (2008) investigated whether the timing of a static sound could influence spa-
tiotemporal processing of visual apparent motion. Apparent motion was induced by visual stimuli
alternating between opposite hemifields. The perceived direction typically depends on the relative
timing interval between the left–right and right–left flashes (e.g., rightward motion dominating
when left–right interflash intervals are shortest; von Grunau 1986). In their study, the interflash
intervals were always 500 ms (ambiguous motion), but sounds could slightly lead the left flash and
lag the right flash by 83 ms or vice versa. Because of temporal ventriloquism, this variation made
visual apparent motion depend on the timing of the sound stimuli (e.g., more rightward responses if
a sound preceded the left flash, and lagged the right flash, and more leftward responses if a sound
preceded the right flash, and lagged the left flash).
The temporal ventriloquist effect has also been used as a diagnostic tool to examine whether
commonality in space is a constraint on intersensory pairing. Vroomen and Keetels (2006) adopted
the visual TOJ task of Scheier et al. (1999) and replicated that sounds improved sensitivity in the
AVVA version of the visual TOJ task. Importantly, the temporal ventriloquist effect was unaffected
by whether sounds and lights were colocated or not. For example, the authors varied whether the
sounds came from a central location or a lateral one, whether the sounds were static or moving,
and whether the sounds and lights came from the same or different sides of fixation at either small
or large spatial disparities. None of these variations affected the temporal ventriloquist effect, despite the fact that discordant sounds were shown to attract reflexive spatial attention and to interfere with speeded visual discrimination. These results led the authors to conclude that intersensory interactions in general do not require spatial correspondence between the components of the cross-modal stimuli (see also Keetels et al. 2007).
In another study (Keetels and Vroomen 2008a), it was explored whether touch affects vision on
the time dimension as audition does (visual–tactile ventriloquism), and whether spatial disparity
between the vibrator and lights modifies this effect. Given that tactile stimuli are spatially better
defined than tones because of their somatotopic rather than tonotopic initial coding, this study pro-
vided a strong test case for the notion that spatial co-occurrence between the senses is required for
intersensory temporal integration. The results demonstrated that tactile–visual stimuli behaved like
audiovisual stimuli, in that temporally misaligned tactile stimuli captured the onsets of the lights
and spatial discordance between the stimuli did not harm this phenomenon.
Besides exploring whether spatial disparity affects temporal ventriloquism, the effect of synes-
thetic congruency between modalities was also recently explored (Keetels and Vroomen 2010;
Parise and Spence 2008). Parise and Spence (2008) suggested that pitch size synesthetic congru-
ency (i.e., a natural association between the relative pitch of a sound and the relative size of a visual
stimulus) might affect temporal ventriloquism. In their study, participants made visual TOJs about small-sized and large-sized visual stimuli while high-pitched or low-pitched tones were presented before the first and after the second light. The results showed that, at large sound–light intervals, sensitivity for visual temporal order was better for synesthetically congruent than incongruent
pairs. In a more recent study, Keetels and Vroomen (2010) reexamined this effect and showed that
this congruency effect could not be attributed to temporal ventriloquism, as it disappeared at short
sound–light intervals if compared to a synchronous AV baseline condition that excludes response
biases. In addition, synesthetic congruency did not affect temporal ventriloquism even if partici-
pants were made explicitly aware of congruency before testing, challenging the view that synes-
thetic congruency affects temporal ventriloquism.
Stekelenburg and Vroomen (2005) also investigated the time course and the electrophysiological correlates of the audiovisual temporal ventriloquist effect using event-related potentials (ERPs) in the FLE paradigm. Their results
demonstrated that the amplitude of the visual N1 was systematically affected by the temporal inter-
val between the visual target flash and the task-irrelevant sound in the FLE paradigm (Mackay
1958; Nijhawan 1994, 1997, 2002). If a sound was presented in synchrony with the flash, the N1
amplitude was larger than when the sound lagged the visual stimulus, and it was smaller when the
sound led the flash. No latency shifts, however, were found. Yet, based on the latency of the crossmodal effect (N1 at 190 ms) and its localization in the occipitoparietal cortex, this study confirmed
the sensory nature of temporal ventriloquism. An explanation for the absence of a temporal shift of
the ERP components may lie in the small size of the temporal ventriloquist effect found (3 ms). Such
a small temporal difference may not be reliably reflected in the ERPs because it reaches the lower
limit of the temporal resolution of the sampled EEG.
In most of the studies examining temporal ventriloquism (visual TOJ, FLE, reporting clock position or motion direction), the timing of the visual stimulus is the task-relevant dimension. Recently, however, Vroomen and Keetels (2009) explored whether a temporally offset sound could improve the identification of a visual stimulus when temporal order is not involved. In this study, it was
examined whether four-dot masking was affected by temporal ventriloquism. In the four-dot mask-
ing paradigm, visual target identification is impaired when a briefly presented target is followed by
a mask that consists of four dots that surround but do not touch the visual target (Enns 2004; Enns
and DiLollo 1997, 2000). The idea tested was that a sound presented slightly before the target and
slightly after the mask might lengthen the perceived interval between target and mask. By lengthen-
ing the perceived target–mask interval, there is more time for the target to consolidate, and in turn
target identification should be easier. Results were in line with this hypothesis: a small release from four-dot masking was reported (a 1% improvement, corresponding to an increase of the target–mask ISI of 4.4 ms) when two sounds were presented at approximately 100-ms intervals before the target and after the mask, rather than when only a single sound was presented before the target or no sound at all.
To summarize, there are by now many demonstrations that vision is flexible on the time dimen-
sion. In general, the perceived timing of a visual event is attracted toward other events in audition
and touch, provided that the lag between them is less than ~200 ms. The deeper reason why there is
this mutual attraction is still untested, although in our view it serves to reduce natural lags between the senses so that they go unnoticed, thus maintaining intersensory coherence.
If so, one can ask what the relationship is between temporal ventriloquism and temporal recalibration. Although temporal ventriloquism occurs immediately when a temporal asynchrony is presented, whereas temporal recalibration manifests itself as an aftereffect, both effects are explained as perceptual
solutions to maintain intersensory synchrony. The question can then be asked whether the same
mechanism underlies the two phenomena. At first sight, one might argue that the magnitude of
the temporal ventriloquist effect seems smaller than the temporal recalibration effects (temporal
ventriloquism: Morein-Zamir et al. 2003, ~15 ms JND improvement; Scheier et al. 1999, 15 ms
JND improvement; Vroomen and Keetels 2006, ~6 ms JND improvement; temporal recalibration:
Fujisaki et al. 2004, ~30 ms PSS shifts for 225 ms adaptation lags; Hanson et al. 2008, ~35 ms
PSS shifts for 90 ms adaptation lags; Navarra et al. 2009, ~20 ms shifts in reaction times; although
relatively small effects were found by Vroomen et al. 2004, ~8 ms PSS shifts for 100 ms adaptation
lags). However, these magnitudes cannot be compared directly because the temporal ventriloquist
effect refers to an improvement in JNDs, whereas the temporal recalibration effect is typically a
shift of the PSS. Moreover, in studies measuring temporal recalibration, there is usually much more
exposure to temporal asynchronies than in studies measuring temporal ventriloquism. Therefore,
it remains up to future studies to examine whether the mechanisms that are involved in temporal
ventriloquism and temporal recalibration are the same.
does it "pop out"? In a study by van de Par and Kohlrausch (2004), this question was addressed by presenting observers with a visual display of several circles independently moving up and down along a Gaussian profile. Along with the motion display, a concurrent sound was presented whose amplitude was modulated coherently with one of the circles. The participant's task was to identify the coherently moving visual circle as quickly as possible. The authors found that response times increased approximately linearly with the number of distracters (~500 ms/distracter), indicating a slow serial search process rather than pop-out.
Fujisaki et al. (2006) came to similar conclusions. They examined search functions for a visual
target that changed in synchrony with an auditory stimulus. The visual display consisted of two, four,
or eight luminance-modulated Gaussian blobs presented at 5, 10, 20, and 40 Hz that were accompa-
nied by a white noise sound whose amplitude was modulated in synch with one of the visual stimuli.
Other displays contained clockwise/counterclockwise rotations of windmills synchronized with a
sound whose frequency was modulated up or down at a rate of 10 Hz. The observers’ task was to
indicate which visual stimulus was luminance-modulated in synch with the sound. Search func-
tions for both displays were slow (~1 s/distractor in target-present displays), and increased linearly
with the number of visual distracters. In a control experiment, it was also shown that synchrony
discrimination was unaffected by the presence of distractors if attention was directed at the visual
target. Fujisaki et al. therefore concluded that perception of audiovisual synchrony is a slow and
serial process based on a comparison of salient temporal features that need to be individuated from
within-modal signal streams.
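The serial-versus-pop-out diagnostic used in these studies is the slope of the search function, that is, how response time grows with the number of items in the display. A minimal sketch with made-up response times (only loosely in the range discussed above, not actual data from either study):

```python
import numpy as np

# Made-up mean response times (s) for finding the sound-synchronized visual
# target at three display set sizes; illustrative values, not data from the
# studies cited in the text.
set_sizes = np.array([2.0, 4.0, 8.0])
rts_serial = np.array([2.1, 4.0, 8.2])     # grows with set size: serial search
rts_popout = np.array([0.62, 0.63, 0.65])  # flat: target "pops out"

def search_slope(sizes, rts):
    """Slope (s/item) of the best-fitting line RT = intercept + slope * set size.

    A near-zero slope indicates pop-out; a large positive slope (on the
    order of 0.5-1 s per distracter) indicates a slow serial process.
    """
    slope, _intercept = np.polyfit(sizes, rts, 1)
    return slope

print(f"serial search: {search_slope(set_sizes, rts_serial) * 1000:.0f} ms/item")
print(f"pop-out:       {search_slope(set_sizes, rts_popout) * 1000:.0f} ms/item")
```

Framed this way, van de Par and Kohlrausch (2004) and Fujisaki et al. (2006) report steep slopes, whereas the pip-and-pop results discussed next report slopes near zero.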
Others, though, came to quite opposing conclusions and found that intersensory synchrony can
be detected in an automatic fashion. Most notably, van der Burg et al. (2008b) reported an interest-
ing study in which they showed that a simple auditory pip can drastically reduce search times for
a color-changing object that is synchronized with the pip. The authors presented a horizontal or
vertical target line among a large array of oblique lines. Each of the lines (target and distracters)
changed color from green-to-red or red-to-green in a random fashion. If a pip sound was synchro-
nized with a color change, visual attention was automatically drawn to the location of the line that
changed color. When the sound was synchronized with the color change of the target, search times
improved drastically and the number of irrelevant distracters had virtually no effect on search times
(a nearly flat slope, indicating pop-out). The authors concluded that the temporal information of the auditory signal was integrated with the visual signal, generating a relatively salient emergent feature that automatically draws spatial attention (see also van der Burg et al. 2008a). Similar effects were
also demonstrated for tactile stimuli instead of auditory pips (Olivers and van der Burg 2008; van
der Burg et al. 2009).
Kanai et al. (2007) also explored temporal correspondences in visually ambiguous displays. They
presented multiple disks flashing sequentially at one of eight locations in a circle, thus inducing the
percept of a disk revolving around fixation. A sound was presented at one particular location in every
cycle, and participants had to indicate the disk that was temporally aligned with the sound. The disk
seen as being synchronized with the sound was perceived as brighter with a sharper onset and offset
(Vroomen and de Gelder 2000). Moreover, this percept fluctuated over time, and the apparently synchronized position changed every 5 to 10 s. Kanai et al. explored whether this flexibility was dependent on attention by having observers
perform a concurrent task in which they had to count the number of X’s in a letter stream. The results
demonstrated that the transitions disappeared whenever attention was distracted from the stimulus.
On the other hand, if attention was directed to one particular visual event—either by making it
“pop-out” by a using a different color, by presenting a cue next to the target dot, or by overtly cueing
it—the perceived timing of the sound was attracted toward that event. These results thus suggest that
perception of intersensory synchrony is flexible, and is not completely immune to attention.
These opposing views on the role of attention can be reconciled on the assumption that percep-
tion of synchrony depends on a matching process of salient temporal features (Fujisaki et al. 2006;
Fujisaki and Nishida 2007). Saliency may be lost when stimuli are presented at fast rates (typi-
cally above 4 Hz), when perceptually grouped into other streams, or if they lack a sharp transition
(Keetels et al. 2007; Sanabria et al. 2004; Vroomen and de Gelder 2004a; Watanabe and Shimojo
2001). In line with this notion, studies reporting that audiovisual synchrony detection is slow either presented stimuli at fast rates (>4 Hz, up to 80/s) or used stimuli lacking a sharp onset/offset (e.g., van de Par and Kohlrausch, using a Gaussian amplitude modulation). Others reporting automatic detection of auditory–visual synchrony used much slower rates (1.11 Hz; van der Burg et al. 2008b) and sharp transitions (a pip).
More recently, in an fMRI study, Dhamala et al. (2007) examined the networks that are involved
in the perception of physically synchronous versus asynchronous audiovisual events. Two timing
parameters were varied: the SOA between sound and light (–200 to +200 ms) and the stimulation
rate (0.5–3.5 Hz). In the behavioral task, observers had to report whether stimuli were perceived
as simultaneous, sound-first, light-first, or “Can’t tell,” resulting in the classification of three dis-
tinct perceptual states, that is, the perception of synchrony, asynchrony, and “no clear perception.”
The fMRI data showed that each of these stages involved activation in different brain networks.
Perception of asynchrony activated the primary sensory, prefrontal, and inferior parietal cortices, whereas perception of synchrony disengaged the inferior parietal cortex and further recruited the superior colliculus (SC).
An fMRI study by Noesselt et al. (2007) also explored the effect of temporal correspondence
between auditory and visual streams. The stimuli were arranged such that auditory and visual
streams were temporally corresponding or not, using irregular and arrhythmic temporal patterns
that either matched between audition and vision or mismatched substantially while maintaining
the same overall temporal statistics. For the coincident audiovisual streams, there was an increase
in the BOLD response in multisensory STS contralateral to the visual stream. The contralateral
primary visual and auditory cortex were also found to be affected by the synchrony–asynchrony
manipulations, and a connectivity analysis indicated enhanced influence from mSTS on primary
sensory areas during temporal correspondence.
In an EEG paradigm, Senkowski et al. (2007) examined the neural mechanisms underlying
intersensory synchrony by measuring oscillatory gamma-band responses (GBRs; 30–80 Hz).
Oscillatory GBRs have been linked to feature integration mechanisms and to multisensory pro-
cessing. The authors reasoned that GBRs might also be sensitive to the temporal alignment of
intersensory stimulus components. The temporal synchrony of auditory and visual components of a
multisensory signal was varied (tones and horizontal gratings with SOAs ranging from –125 to +125
ms). The GBRs to the auditory and visual components of multisensory stimuli were extracted for
five subranges of asynchrony and compared with GBRs to unisensory control stimuli. The results
revealed that multisensory interactions were strongest in the early GBRs when the sound and light
stimuli were presented with the closest synchrony. These effects were most evident over medial–
frontal brain areas after 30 to 80 ms and over occipital areas after 60 to 120 ms, indicating that
temporal synchrony may have an effect on early intersensory interactions in the human cortex.
Overall, it should be noted that there is a lot of variation in the outcomes of studies that have
examined the neural basis of intersensory temporal synchrony. At present, the issue is far from
resolved and more research has to be performed to unravel the exact neural substrates underlying
it. A consistent observation, however, is that the SC and mSTS are repeatedly reported in intersensory synchrony
detection studies, which at least suggests a prominent role for these structures in the processing
of intersensory stimuli based on their temporal correspondence. For the time being, however, it is
unknown how these areas would affect the perception of intersensory synchrony if they were dam-
aged or temporarily blocked by, for example, transcranial magnetic stimulation.
9.8 CONCLUSIONS
In recent years, a substantial amount of research has been devoted to understanding how the brain
handles lags between the senses. The most important conclusion we draw is that intersensory timing
is flexible and adaptive. The flexibility is clearly demonstrated by studies showing one or another
variant of temporal ventriloquism. In that case, small lags go unnoticed because the brain actively
shifts one information stream (usually vision) toward the other, possibly to maintain temporal
coherence. The adaptive part rests on studies of temporal recalibration demonstrating that observ-
ers are flexible in adopting what counts as synchronous. The extent to which temporal recalibration
generalizes to other stimuli and domains, however, remains to be further explored. The idea that the
brain compensates for predictable variability between the senses—most notably distance—is, in
our view, not well-founded. We are more enthusiastic about the notion that intersensory synchrony
is perceived mostly in an automatic fashion, provided that the individual components of the stimuli
are sufficiently salient. The neural mechanisms that underlie this ability are of clear importance for
future research.
REFERENCES
Alais, D., and S. Carlile. 2005. Synchronizing to real events: Subjective audiovisual alignment scales with
perceived auditory depth and speed of sound. Proceedings of the National Academy of Sciences of the
United States of America 102(6);2244–7.
Arnold, D.H., A. Johnston, and S. Nishida. 2005. Timing sight and sound. Vision Research 45(10);1275–84.
Arrighi, R., D. Alais, and D. Burr. 2006. Perceptual synchrony of audiovisual streams for natural and artificial
motion sequences. Journal of Vision 6(3);260–8.
Asakawa, K., A. Tanaka, and H. Imai. 2009. Temporal Recalibration in Audio-Visual Speech Integration Using
a Simultaneity Judgment Task and the McGurk Identification Task. Paper presented at the 31st Annual
Meeting of the Cognitive Science Society (July 29–August 1, 2009). Amsterdam, The Netherlands.
Bald, L., F.K. Berrien, J.B. Price, and R.O. Sprague. 1942. Errors in perceiving the temporal order of auditory
and visual stimuli. Journal of Applied Psychology 26;283–388.
Bedford, F.L. 1989. Constraints on learning new mappings between perceptual dimensions. Journal of
Experimental Psychology. Human Perception and Performance 15(2);232–48.
10 Representation of Object Form in Vision and Touch
Simon Lacey and Krish Sathian
CONTENTS
10.1 Introduction
10.2 Cortical Regions Involved in Visuo-Haptic Shape Processing
10.2.1 Lateral Occipital Complex
10.2.2 Parietal Cortical Regions
10.3 Do Vision and Touch Share a Common Shape Representation?
10.3.1 Potential Role of Visual Imagery
10.3.2 A Modality-Independent Shape Representation?
10.4 Properties of Shared Representation
10.4.1 View-Dependence in Vision and Touch
10.4.2 Cross-Modal View-Independence
10.5 An Integrative Framework for Visuo-Haptic Shape Representation
Acknowledgments
References
10.1 INTRODUCTION
The idea that the brain processes sensory inputs in parallel modality-specific streams has given way
to the concept of a “metamodal” brain with a multisensory task-based organization (Pascual-Leone
and Hamilton 2001). For example, recent research shows that many cerebral cortical regions previ-
ously considered to be specialized for processing various aspects of visual input are also activated
during analogous tactile or haptic tasks (reviewed by Sathian and Lacey 2007). In this article,
which concentrates on shape processing in humans, we review the current state of knowledge about
the mental representation of object form in vision and touch. We begin by describing the cortical
regions showing multisensory responses to object form. Next, we consider the extent to which the
underlying representation of object form is explained by cross-modal visual imagery or multisen-
sory convergence. We then review recent work on the view-dependence of visuo-haptic shape rep-
resentations and the resulting model of a multisensory, view-independent representation. Finally,
we discuss a recently presented conceptual framework of visuo-haptic shape processing as a basis
for future investigations.
three-dimensional shape perception (Amedi et al. 2001; Stilla and Sathian 2008; Zhang et al. 2004)
and tactile two-dimensional shape perception (Stoesz et al. 2003; Prather et al. 2004). Neurological
case studies indicate that the LOC is necessary for both haptic and visual shape perception: a patient
with a left occipitotemporal cortical lesion, likely including the LOC, was found to exhibit tactile
in addition to visual agnosia (inability to recognize objects), although somatosensory cortex and
basic somatosensory function were intact (Feinberg et al. 1986). Another patient with bilateral LOC
lesions could not learn new objects either visually or haptically (James et al. 2006). LOtv is thought
to be a processor of geometric shape because it is not activated during object recognition triggered
by object-specific sounds (Amedi et al. 2002). Interestingly, though, LOtv does respond when audi-
tory object recognition is mediated by a visual–auditory sensory substitution device that converts
visual shape information into an auditory stream, but only when individuals (whether sighted or
blind) are specifically trained in a manner permitting generalization to untrained objects and not
when merely arbitrary associations are taught (Amedi et al. 2007). This dissociation further bolsters
the idea that LOtv is concerned with geometric shape information, regardless of the input sensory
modality.
visual imagery can be considerably less than during haptic shape perception, suggesting that visual
imagery may be relatively unimportant in haptic shape perception (Amedi et al. 2001; see also Reed
et al. 2004). However, performance on the visual imagery task has not generally been monitored, so
that lower levels of LOC activity during visual imagery could simply reflect participants not main-
taining their visual images throughout the imagery scan. Because both the early and late blind show
shape-related activity in the LOC evoked by tactile input (Amedi et al. 2003; Burton et al. 2002;
Pietrini et al. 2004; Stilla et al. 2008; reviewed by Pascual-Leone et al. 2005; Sathian 2005; Sathian
and Lacey 2007), or by auditory input when sensory substitution devices were used (Amedi et al.
2007; Arno et al. 2001; Renier et al. 2004, 2005), some have concluded that visual imagery does not
account for cross-modal activation of visual cortex. Although this is true for the early-blind, it cer-
tainly does not exclude the use of visual imagery in the sighted, especially in view of the abundant
evidence for cross-modal plasticity resulting from visual deprivation (Pascual-Leone et al. 2005;
Sathian 2005; Sathian and Lacey 2007).
It is also important to be clear about what is meant by “visual imagery,” which is often treated
as a unitary ability. Recent research has shown that there are two different kinds of visual imagery:
“object imagery” (images that are pictorial and deal with the actual appearance of objects in terms
of shape, color, brightness, and other surface properties) and “spatial imagery” (more schematic
images dealing with the spatial relations of objects and their component parts and with spatial
transformations; Kozhevnikov et al. 2002, 2005; Blajenkova et al. 2006). This distinction is relevant
because both vision and touch encode spatial information about objects (for example, size, shape, and the relative positions of different object features), and such information may well be encoded in a modality-independent spatial representation (Lacey and Campbell 2006). Support for this possibility is provided by recent work showing that spatial, but not object, imagery scores were correlated
with accuracy on cross-modal, but not within-modal, object identification for a set of closely similar
and previously unfamiliar objects (Lacey et al. 2007a). Thus, it is probably beneficial to explore the
roles of object and spatial imagery rather than taking an undifferentiated visual imagery approach.
We return to this idea later but, as an aside, we note that the object–spatial dimension of imagery
can be viewed as orthogonal to the modality involved, as there is evidence that early-blind individu-
als perform both object-based and spatially based tasks equally well (Aleman et al. 2001; see also
Noordzij et al. 2007). However, the object–spatial dimension of haptically derived representations
remains unexplored.
Candidate regions for such a modality-independent representation include the right LOC and the left CIP, because activation magnitudes during visual
and haptic processing of (unfamiliar) shape are significantly correlated across subjects in these
regions (Stilla and Sathian 2008). Furthermore, the time taken to scan both visual images (Kosslyn
1973; Kosslyn et al. 1978) and haptically derived images (Röder and Rösler 1998) increases with
the spatial distance to be inspected. Also, the time taken to judge whether two objects are the same
or mirror images increases nearly linearly with increasing angular disparity between the objects
for mental rotation of both visual (Shepard and Metzler 1971) and haptic stimuli (Marmor and
Zaback 1976; Carpenter and Eisenberg 1978; Hollins 1986; Dellantonio and Spagnolo 1990). The
same relationship was found when the angle between a tactile stimulus and a canonical angle was
varied, with associated activity in the left anterior IPS (Prather et al. 2004), an area also active dur-
ing mental rotation of visual stimuli (Alivisatos and Petrides 1997), and probably corresponding to
AIP (Grefkes and Fink 2005; Shikata et al. 2008). Similar processing has been found with sighted,
early- and late-blind individuals (Carpenter and Eisenberg 1978; Röder and Rösler 1998). These
findings suggest that spatial metric information is preserved in both vision and touch, and that both
modalities rely on similar, if not identical, imagery processes (Röder and Rösler 1998).
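The near-linear dependence of response time on angular disparity is commonly summarized by fitting a straight line, whose slope gives the time cost per degree and whose reciprocal gives an implied mental rotation rate. A minimal sketch, using purely illustrative numbers rather than data from the studies cited:

```python
import numpy as np

def fit_rotation_rate(angles_deg, rts_ms):
    """Fit RT = intercept + slope * angle by least squares.

    Returns (slope in ms/deg, intercept in ms, implied rotation
    rate in deg/s). All numbers used here are illustrative only.
    """
    slope, intercept = np.polyfit(angles_deg, rts_ms, 1)
    rate_deg_per_s = 1000.0 / slope  # convert ms/deg to deg/s
    return slope, intercept, rate_deg_per_s

# Illustrative data: a 500 ms baseline plus 10 ms per degree of disparity
angles = np.array([0, 45, 90, 135, 180])
rts = 500 + 10 * angles
```

On these fabricated data the fit recovers a 10 ms/deg slope, that is, a rotation rate of 100 deg/s; it is the comparability of such slopes across visual and haptic stimuli that licenses the claim of shared imagery processes.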
A further functional equivalence between visual and haptic object representation is that each
has preferred or canonical views of objects. In vision, the preferred view for both familiar and
unfamiliar objects is one in which the main axis is angled at 45° to the observer (Palmer et al. 1981;
Perrett et al. 1992). Recently, Woods et al. (2008) have shown that haptic object recognition also has
canonical views—again independently of familiarity—but that these are defined by reference to the
midline of the observer’s body, the object’s main axis being aligned either parallel or perpendicu-
lar to the midline. This may be due to grasping and object function: Craddock and Lawson (2008)
found that haptic recognition was better for objects in typical rather than atypical orientations; for
example, a cup oriented with the handle to the right for a right-handed person.
In a recent study, during a visual object imagery task, participants listened to word pairs and decided whether the objects designated
by those words had the same or different shapes. Thus, in contrast with earlier studies, participants
had to process their images throughout the scan and this could be verified by monitoring their
performance. In a separate session, participants performed a haptic shape discrimination task. For
one group of subjects, the haptic objects were familiar; for the other group, they were unfamiliar.
We found that both intertask correlations and connectivity were modulated by object familiarity
(Deshpande et al. 2010; Lacey et al. 2010). Although the LOC was active bilaterally during both
visual object imagery and haptic shape perception, there was an intertask correlation only for famil-
iar shape. Analysis of connectivity showed that visual object imagery and haptic familiar shape
perception engaged quite similar networks characterized by top-down paths from prefrontal and
parietal regions into the LOC, whereas a very different network emerged during haptic perception
of unfamiliar shape, featuring bottom-up inputs from S1 to the LOC (Deshpande et al. 2010).
Based on these findings and on the literature reviewed earlier in this chapter, we proposed a
conceptual framework for visuo-haptic object representation that integrates the visual imagery and
multisensory approaches (Lacey et al. 2009b). In this proposed framework, the LOC houses a rep-
resentation that is independent of the input sensory modality and is flexibly accessible via either
bottom-up or top-down pathways, depending on object familiarity (or other task attributes). Haptic
perception of familiar shape uses visual object imagery via top-down paths from prefrontal and
parietal areas into the LOC whereas haptic perception of unfamiliar shape may use spatial imagery
processes and involves bottom-up pathways from the somatosensory cortex to the LOC. Because
there is no stored representation of an unfamiliar object, its global shape has to be computed by
exploring it in its entirety and the framework would therefore predict the somatosensory drive of
LOC. The IPS has been implicated in visuo-haptic perception of both shape and location (Stilla
and Sathian 2008; Gibson et al. 2008). We might therefore expect that, to compute global shape in
unfamiliar objects, the IPS would be involved in processing the relative spatial locations of object
parts. For familiar objects, global shape can be inferred easily, perhaps from distinctive features
that are sufficient to retrieve a visual image, and so the framework predicts increased contribution
from parietal and prefrontal regions. Clearly, objects are not exclusively familiar or unfamiliar and
individuals are not purely object or spatial imagers: these are continua along which objects and indi-
viduals may vary. In this respect, an individual differences approach is likely to be productive (see
Lacey et al. 2007b; Motes et al. 2008) because these factors may interact, with different weights in
different circumstances, for example task demands or individual history (visual experience, train-
ing, etc.). More work is required to define and test this framework.
ACKNOWLEDGMENTS
This work was supported by the National Eye Institute, the National Science Foundation, and the
Veterans Administration.
REFERENCES
Aleman, A., L. van Lee, M.H.M. Mantione, I.G. Verkoijen, and E.H.F. de Haan. 2001. Visual imagery without
visual experience: Evidence from congenitally totally blind people. Neuroreport 12:2601–2604.
Alivisatos, B., and M. Petrides. 1997. Functional activation of the human brain during mental rotation.
Neuropsychologia 35:111–118.
Amedi, A., R. Malach, T. Hendler, S. Peled, and E. Zohary. 2001. Visuo-haptic object-related activation in the
ventral visual pathway. Nature Neuroscience 4:324–330.
Amedi, A., G. Jacobson, T. Hendler, R. Malach, and E. Zohary. 2002. Convergence of visual and tactile shape
processing in the human lateral occipital complex. Cerebral Cortex 12:1202–1212.
Amedi, A., N. Raz, P. Pianka, R. Malach, and E. Zohary. 2003. Early ‘visual’ cortex activation correlates with
superior verbal memory performance in the blind. Nature Neuroscience 6:758–766.
Representation of Object Form in Vision and Touch 185
Amedi, A., W.M. Stern, J.A. Camprodon et al. 2007. Shape conveyed by visual-to-auditory sensory substitution
activates the lateral occipital complex. Nature Neuroscience 10:687–689.
Arno, P., A.G. De Volder, A. Vanlierde et al. 2001. Occipital activation by pattern recognition in the early blind
using auditory substitution for vision. NeuroImage 13:632–645.
Blajenkova, O., M. Kozhevnikov, and M.A. Motes. 2006. Object-spatial imagery: A new self-report imagery
questionnaire. Applied Cognitive Psychology 20:239–263.
Buelte, D., I.G. Meister, M. Staedtgen et al. 2008. The role of the anterior intraparietal sulcus in crossmodal
processing of object features in humans: An rTMS study. Brain Research 1217:110–118.
Burton, H., A.Z. Snyder, T.E. Conturo, E. Akbudak, J.M. Ollinger, and M.E. Raichle. 2002. Adaptive changes
in early and late blind: A fMRI study of Braille reading. Journal of Neurophysiology 87:589–607.
Carpenter, P.A., and P. Eisenberg. 1978. Mental rotation and the frame of reference in blind and sighted indi-
viduals. Perception & Psychophysics 23:117–124.
Craddock, M., and R. Lawson. 2008. Repetition priming and the haptic recognition of familiar and unfamiliar
objects. Perception & Psychophysics 70:1350–1365.
Dellantonio, A., and F. Spagnolo. 1990. Mental rotation of tactual stimuli. Acta Psychologica 73:245–257.
Deshpande, G., X. Hu, R. Stilla, and K. Sathian. 2008. Effective connectivity during haptic perception: A
study using Granger causality analysis of functional magnetic resonance imaging data. NeuroImage
40:1807–1814.
Deshpande, G., X. Hu, S. Lacey, R. Stilla, and K. Sathian. 2010. Object familiarity modulates effective con-
nectivity during haptic shape perception. NeuroImage 49:1991–2000.
De Volder, A.G., H. Toyama, Y. Kimura et al. 2001. Auditory triggered mental imagery of shape involves visual
association areas in early blind humans. NeuroImage 14:129–139.
Easton, R.D., A.J. Greene, and K. Srinivas. 1997a. Transfer between vision and haptics: Memory for 2-D pat-
terns and 3-D objects. Psychonomic Bulletin & Review 4:403–410.
Easton, R.D., K. Srinivas, and A.J. Greene. 1997b. Do vision and haptics share common representations?
Implicit and explicit memory within and between modalities. Journal of Experimental Psychology.
Learning, Memory, and Cognition 23:153–163.
Feinberg, T.E., L.J. Rothi, and K.M. Heilman. 1986. Multimodal agnosia after unilateral left hemisphere lesion.
Neurology 36:864–867.
Gauthier, I., W.G. Hayward, M.J. Tarr et al. 2002. BOLD activity during mental rotation and view-dependent
object recognition. Neuron 34:161–171.
Gibson, G., R. Stilla, and K. Sathian. 2008. Segregated visuo-haptic processing of texture and location. Abstract,
Human Brain Mapping.
Grefkes, C., S. Geyer, T. Schormann, P. Roland, and K. Zilles. 2001. Human somatosensory area 2: Observer-
independent cytoarchitectonic mapping, interindividual variability, and population map. NeuroImage
14:617–631.
Grefkes, C., P.H. Weiss, K. Zilles, and G.R. Fink. 2002. Crossmodal processing of object features in human
anterior intraparietal cortex: An fMRI study implies equivalencies between humans and monkeys. Neuron
35:173–184.
Grefkes, C., A. Ritzl, K. Zilles, and G.R. Fink. 2004. Human medial intraparietal cortex subserves visuomotor
coordinate transformation. NeuroImage 23:1494–1506.
Grefkes, C., and G. Fink. 2005. The functional organization of the intraparietal sulcus in humans and monkeys.
Journal of Anatomy 207:3–17.
Grill-Spector, K., T. Kushnir, S. Edelman, G. Avidan, Y. Itzchak, and R. Malach. 1999. Differential processing of
objects under various viewing conditions in the human lateral occipital complex. Neuron 24:187–203.
Hollins, M. 1986. Haptic mental rotation: More consistent in blind subjects? Journal of Visual Impairment &
Blindness 80:950–952.
Iwamura, Y. 1998. Hierarchical somatosensory processing. Current Opinion in Neurobiology 8:522–528.
James, T.W., G.K. Humphrey, J.S. Gati, R.S. Menon, and M.A. Goodale. 2002a. Differential effects of view on
object-driven activation in dorsal and ventral streams. Neuron 35:793–801.
James, T.W., G.K. Humphrey, J.S. Gati, P. Servos, R.S. Menon, and M.A. Goodale. 2002b. Haptic study of
three-dimensional objects activates extrastriate visual areas. Neuropsychologia 40:1706–1714.
James, T.W., K.H. James, G.K. Humphrey, and M.A. Goodale. 2006. Do visual and tactile object representa-
tions share the same neural substrate? In Touch and Blindness: Psychology and Neuroscience, ed. M.A.
Heller and S. Ballesteros, 139–155. Mahwah, NJ: Lawrence Erlbaum Associates.
Kosslyn, S.M. 1973. Scanning visual images: Some structural implications. Perception & Psychophysics
14:90–94.
Kosslyn, S.M., T.M. Ball, and B.J. Reiser. 1978. Visual images preserve metric spatial information: Evidence
from studies of image scanning. Journal of Experimental Psychology. Human Perception and Performance
4:47–60.
Kozhevnikov, M., M. Hegarty, and R.E. Mayer. 2002. Revising the visualiser–verbaliser dimension: Evidence
for two types of visualisers. Cognition and Instruction 20:47–77.
Kozhevnikov, M., S.M. Kosslyn, and J. Shephard. 2005. Spatial versus object visualisers: A new characterisa-
tion of cognitive style. Memory & Cognition 33:710–726.
Lacey, S., and C. Campbell. 2006. Mental representation in visual/haptic crossmodal memory: Evidence from
interference effects. Quarterly Journal of Experimental Psychology 59:361–376.
Lacey, S., A. Peters, and K. Sathian. 2007a. Cross-modal object representation is viewpoint-independent. PLoS
ONE 2:e890. doi:10.1371/journal.pone.0000890.
Lacey, S., C. Campbell, and K. Sathian. 2007b. Vision and touch: Multiple or multisensory representations of
objects? Perception 36:1513–1521.
Lacey, S., M. Pappas, A. Kreps, K. Lee, and K. Sathian. 2009a. Perceptual learning of view-independence in
visuo-haptic object representations. Experimental Brain Research 198:329–337.
Lacey, S., N. Tal, A. Amedi, and K. Sathian. 2009b. A putative model of multisensory object representation.
Brain Topography 21:269–274.
Lacey, S., P. Flueckiger, R. Stilla, M. Lava, and K. Sathian. 2010. Object familiarity modulates the relationship
between visual object imagery and haptic shape perception. NeuroImage 49:1977–1990.
Lawson, R. 2009. A comparison of the effects of depth rotation on visual and haptic three-dimensional object
recognition. Journal of Experimental Psychology. Human Perception and Performance 35:911–930.
Lederman, S.J., and R.L. Klatzky. 1987. Hand movements: A window into haptic object recognition. Cognitive
Psychology 19:342–368.
Lucan, J.N., J.J. Foxe, M. Gomez-Ramirez, K. Sathian, and S. Molholm. 2011. Tactile shape discrimination
recruits human lateral occipital complex during early perceptual processing. Human Brain Mapping
31:1813–1821.
Malach, R., J.B. Reppas, R.R. Benson et al. 1995. Object-related activity revealed by functional magnetic reso-
nance imaging in human occipital cortex. Proceedings of the National Academy of Sciences of the United
States of America 92:8135–8139.
Marmor, G.S., and L.A. Zaback. 1976. Mental rotation by the blind: Does mental rotation depend on visual
imagery? Journal of Experimental Psychology. Human Perception and Performance 2:515–521.
Motes, M.A., R. Malach, and M. Kozhevnikov. 2008. Object-processing neural efficiency differentiates object
from spatial visualizers. Neuroreport 19:1727–1731.
Newell, F.N., M.O. Ernst, B.S. Tjan, and H.H. Bülthoff. 2001. View dependence in visual and haptic object
recognition. Psychological Science 12:37–42.
Newell, F.N., A.T. Woods, M. Mernagh, and H.H. Bülthoff. 2005. Visual, haptic and crossmodal recognition of
scenes. Experimental Brain Research 161:233–242.
Newman, S.D., R.L. Klatzky, S.J. Lederman, and M.A. Just. 2005. Imagining material versus geometric prop-
erties of objects: An fMRI study. Cognitive Brain Research 23:235–246.
Noordzij, M.L., S. Zuidhoek, and A. Postma. 2007. The influence of visual experience on visual and spatial
imagery. Perception 36:101–112.
Palmer, S., E. Rosch, and P. Chase. 1981. Canonical perspective and the perception of objects. In Attention
and Performance IX, ed. J.B. Long and A.D. Baddeley, 135–151. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Pascual-Leone, A., and R.H. Hamilton. 2001. The metamodal organization of the brain. Progress in Brain
Research 134:427–445.
Pascual-Leone, A., A. Amedi, F. Fregni, and L.B. Merabet. 2005. The plastic human brain. Annual Review of
Neuroscience 28:377–401.
Peissig, J.J., and M.J. Tarr. 2007. Visual object recognition: Do we know more now than we did 20 years ago?
Annual Review of Psychology 58:75–96.
Peltier, S., R. Stilla, E. Mariola, S. LaConte, X. Hu, and K. Sathian. 2007. Activity and effective connectivity of
parietal and occipital cortical regions during haptic shape perception. Neuropsychologia 45:476–483.
Perrett, D.I., M.H. Harries, and S. Looker. 1992. Use of preferential inspection to define the viewing sphere and
characteristic views of an arbitrary machined tool part. Perception 21:497–515.
Pietrini, P., M.L. Furey, E. Ricciardi et al. 2004. Beyond sensory images: Object-based representation in the
human ventral pathway. Proceedings of the National Academy of Sciences of the United States of America
101:5658–5663.
Prather, S.C., J.R. Votaw, and K. Sathian. 2004. Task-specific recruitment of dorsal and ventral visual areas
during tactile perception. Neuropsychologia 42:1079–1087.
Reales, J.M., and S. Ballesteros. 1999. Implicit and explicit memory for visual and haptic objects: Cross-modal
priming depends on structural descriptions. Journal of Experimental Psychology. Learning, Memory, and
Cognition 25:644–663.
Reed, C.L., S. Shoham, and E. Halgren. 2004. Neural substrates of tactile object recognition: An fMRI study.
Human Brain Mapping 21:236–246.
Renier, L., O. Collignon, D. Tranduy et al. 2004. Visual cortex activation in early blind and sighted subjects
using an auditory visual substitution device to perceive depth. NeuroImage 22:S1.
Renier, L., O. Collignon, C. Poirier et al. 2005. Cross modal activation of visual cortex during depth perception
using auditory substitution of vision. NeuroImage 26:573–580.
Riesenhuber, M., and T. Poggio. 1999. Hierarchical models of object recognition in cortex. Nature Neuroscience
2:1019–1025.
Röder, B., and F. Rösler. 1998. Visual input does not facilitate the scanning of spatial images. Journal of Mental
Imagery 22:165–181.
Saito, D.N., T. Okada, Y. Morita, Y. Yonekura, and N. Sadato. 2003. Tactile–visual cross-modal shape match-
ing: A functional MRI study. Cognitive Brain Research 17:14–25.
Sathian, K. 2004. Modality, quo vadis? Comment. Behavioral and Brain Sciences 27:413–414.
Sathian, K. 2005. Visual cortical activity during tactile perception in the sighted and the visually deprived.
Developmental Psychobiology 46:279–286.
Sathian, K., and S. Lacey. 2007. Journeying beyond classical somatosensory cortex. Canadian Journal of
Experimental Psychology 61:254–264.
Sathian, K., A. Zangaladze, J.M. Hoffman, and S.T. Grafton. 1997. Feeling with the mind’s eye. Neuroreport
8:3877–3881.
Shepard, R.N., and J. Metzler. 1971. Mental rotation of three-dimensional objects. Science 171:701–703.
Shikata, E., A. McNamara, A. Sprenger et al. 2008. Localization of human intraparietal areas AIP, CIP, and
LIP using surface orientation and saccadic eye movement tasks. Human Brain Mapping 29:411–421.
Stilla, R., R. Hanna, X. Hu, E. Mariola, G. Deshpande, and K. Sathian. 2008. Neural processing underlying
tactile microspatial discrimination in the blind: A functional magnetic resonance imaging study. Journal
of Vision 8:1–19. doi:10.1167/8.10.13.
Stilla, R., and K. Sathian. 2008. Selective visuo-haptic processing of shape and texture. Human Brain Mapping
29:1123–1138.
Stoesz, M., M. Zhang, V.D. Weisser, S.C. Prather, H. Mao, and K. Sathian. 2003. Neural networks active during
tactile form perception: Common and differential activity during macrospatial and microspatial tasks.
International Journal of Psychophysiology 50:41–49.
Swisher, J.D., M.A. Halko, L.B. Merabet, S.A. McMains, and D.C. Somers. 2007. Visual topography of human
intraparietal sulcus. Journal of Neuroscience 27:5326–5337.
Ueda, Y., and J. Saiki. 2007. View independence in visual and haptic object recognition. Japanese Journal of
Psychonomic Science 26:11–19.
Woods, A.T., A. Moore, and F.N. Newell. 2008. Canonical views in haptic object representation. Perception
37:1867–1878.
Zhang, M., V.D. Weisser, R. Stilla, S.C. Prather, and K. Sathian. 2004. Multisensory cortical processing of object
shape and its relation to mental imagery. Cognitive, Affective & Behavioral Neuroscience 4:251–259.
Zhou, Y.-D., and J.M. Fuster. 1997. Neuronal activity of somatosensory cortex in a cross-modal (visuo-haptic)
memory task. Experimental Brain Research 116:551–555.
Section III
Combinatorial Principles and Modeling
11 Spatial and Temporal Features
of Multisensory Processes
Bridging Animal and Human Studies
Diana K. Sarko, Aaron R. Nidiffer, Albert R. Powers III,
Dipanwita Ghose, Andrea Hillock-Dunn,
Matthew C. Fister, Juliane Krueger, and Mark T. Wallace
CONTENTS
11.1 Introduction........................................................................................................................... 192
11.2 Neurophysiological Studies in Animal Models: Integrative Principles as a Foundation
for Understanding Multisensory Interactions........................................................................ 192
11.3 Neurophysiological Studies in Animal Models: New Insights into Interdependence of
Integrative Principles............................................................................................................. 193
11.3.1 Spatial Receptive Field Heterogeneity and Its Implications for Multisensory
Interactions................................................................................................................ 193
11.3.2 Spatiotemporal Dynamics of Multisensory Processing............................................ 197
11.4 Studying Multisensory Integration in an Awake and Behaving Setting: New Insights
into Utility of Multisensory Processes.................................................................................. 199
11.5 Human Behavioral and Perceptual Studies of Multisensory Processing: Building
Bridges between Neurophysiological and Behavioral and Perceptual Levels of Analysis.........201
11.5.1 Defining the “Temporal Window” of Multisensory Integration............................... 201
11.5.2 Stimulus-Dependent Effects on the Size of the Multisensory Temporal Window....202
11.5.3 Can “Higher-Order” Processes Affect Multisensory Temporal Window?............... 203
11.6 Adult Plasticity in Multisensory Temporal Processes: Psychophysical and
Neuroimaging Evidence........................................................................................................203
11.7 Developmental Plasticity in Multisensory Representations: Insights from Animal and
Human Studies....................................................................................................................... 205
11.7.1 Neurophysiological Studies into Development of Multisensory Circuits..................205
11.7.2 Development of Integrative Principles......................................................................206
11.7.3 Experientially Based Plasticity in Multisensory Circuits..........................................207
11.7.4 Development of Human Multisensory Temporal Perception....................................207
11.8 Conclusions and Future Directions........................................................................................209
References....................................................................................................................................... 210
11.1 INTRODUCTION
Multisensory processing is a pervasive and critical aspect of our behavioral and perceptual reper-
toires, facilitating and enriching a wealth of processes including target identification, signal detec-
tion, speech comprehension, spatial navigation, and flavor perception to name but a few. The adaptive
advantages that multisensory integration confers are critical to survival, with effective acquisition
and use of multisensory information enabling the generation of appropriate behavioral responses
under circumstances in which one sense is inadequate. In the behavioral domain, a number of
studies have illustrated the strong benefits conferred under multisensory circumstances, with the
most salient examples including enhanced orientation and discrimination (Stein et al. 1988, 1989),
improved target detection (Frassinetti et al. 2002; Lovelace et al. 2003), and speeded responses
(Hershenson 1962; Hughes et al. 1994; Frens et al. 1995; Harrington and Peck 1998; Corneil et al.
2002; Forster et al. 2002; Molholm et al. 2002; Amlot et al. 2003; Diederich et al. 2003; Calvert
and Thesen 2004).
Along with these behavioral examples, there are myriad perceptual illustrations of the power
of multisensory interactions. For example, the intensity of a light is perceived as greater when
presented with a sound (Stein et al. 1996) and judgments of stimulus features such as speed and
orientation are often more accurate when combined with information available from another sense
(Soto-Faraco et al. 2003; Manabe and Riquimaroux 2000; Clark and Graybiel 1966; Wade and Day
1968). One of the most compelling examples of multisensory-mediated perceptual gains can be
seen in the speech realm, where the intelligibility of a spoken signal can be greatly enhanced when
the listener can see the speaker’s face (Sumby and Pollack 1954). In fact, this bimodal gain may
be a principal factor in the improvements in speech comprehension seen in those with significant
hearing loss after visual training (Schorr et al. 2005; Rouger et al. 2007). Regardless of whether
the benefits are seen in the behavioral or perceptual domains, they typically exceed those that are
predicted on the basis of responses to each of the component unisensory stimuli (Hughes et al. 1994,
1998; Corneil and Munoz 1996; Harrington and Peck 1998). Such deviations from simple additive
models provide important insights into the neural bases for these multisensory interactions in that
they strongly argue for a convergence and active integration of the different sensory inputs within
the brain.
The spatial principle links the magnitude of multisensory interactions with the physical location of the paired stimuli, and illustrates the importance of spatial proximity
in driving the largest proportionate gains in response. Similarly, the temporal principle captures
the fact that the largest gains are typically seen when stimuli are presented close together in time,
and that the magnitude of the interaction declines as the stimuli become increasingly separated in
time. Finally, the principle of inverse effectiveness reflects the fact that the largest gains are gener-
ally seen to the pairing of two weakly effective stimuli. As individual stimuli become increasingly
effective in driving neuronal responses, the size of the interactions seen to the pairing declines.
Together, these principles have provided an essential predictive outline for understanding multisen-
sory integration at the neuronal level, as well as for understanding the behavioral and perceptual
consequences of multisensory pairings. However, it is important to point out that these principles,
although widely instructive, fail to capture the complete integrative profile of any individual neuron.
The reason for this is that space, time, and effectiveness are intimately intertwined in naturalistic
stimuli, and manipulating one has a consequent effect on the others. Recent studies, described in
the next section, have sought to better understand the strong interdependence between these fac-
tors, with the hope of better elucidating the complex spatiotemporal architecture of multisensory
interactions.
FIGURE 11.1 Construction of an SRF for an individual multisensory neuron. Each stimulus location tested
within receptive field generates a response that is then compiled into a single unit activity (SUA) plot. SUA
plot at one location is shown in detail to illustrate how spike density function (SDF) is derived. Finally, SDF/
SUA data are transformed into a pseudocolor SRF plot in which normalized evoked response is shown relative
to azimuth and elevation. Evoked responses are scaled to maximal response, with warmer colors representing
higher firing rates. (Adapted from Carriere, B.N. et al., J. Neurophysiol., 99, 2357–2368, 2008.)
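The SUA-to-SDF step in Figure 11.1 can be sketched in code: a spike density function is typically obtained by convolving the binned spike train with a smoothing kernel and rescaling to spikes per second. This is an illustrative sketch only; the Gaussian kernel width, bin size, and analysis window below are assumed values, not parameters taken from the study.

```python
import numpy as np

def spike_density_function(spike_times_ms, t_start=-200.0, t_stop=600.0,
                           bin_ms=1.0, sigma_ms=10.0):
    """Estimate firing rate (spikes/s) over [t_start, t_stop) by
    convolving a binned spike train with a unit-area Gaussian kernel.
    sigma_ms is an assumed smoothing width."""
    edges = np.arange(t_start, t_stop + bin_ms, bin_ms)
    counts, _ = np.histogram(spike_times_ms, bins=edges)
    # Gaussian kernel truncated at +/-4 sigma, normalized to sum to 1
    half = int(4 * sigma_ms / bin_ms)
    k_t = np.arange(-half, half + 1) * bin_ms
    kernel = np.exp(-k_t ** 2 / (2 * sigma_ms ** 2))
    kernel /= kernel.sum()
    rate = np.convolve(counts, kernel, mode="same") / (bin_ms / 1000.0)
    t = edges[:-1] + bin_ms / 2.0  # bin centers
    return t, rate
```

Repeating this at each tested azimuth and elevation, and normalizing each location's evoked rate to the maximum, yields a pseudocolor SRF matrix of the kind shown in the figure.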
Three competing predictions can be envisioned. The first is that spatial location takes precedence and that the resultant interactions
would be completely a function of the spatial disparity between the paired stimuli. In this scenario,
the largest interactions would be seen when the stimuli were presented at the same location, and
the magnitude of the interaction would decline as spatial disparity increased. Although this would
seem to be a strict interpretation of the spatial principle, in fact, even the early characterization of
this principle focused not on location or disparity, but rather on the presence or absence of stimuli
within the receptive field (Meredith and Stein 1986), hinting at the relative lack of importance of
absolute location. The second hypothesis is that stimulus effectiveness would be the dominant fac-
tor, and that the interaction would be dictated not by spatial location but rather by the magnitude of
the individual sensory responses (which would be modulated by changes in spatial location). The
final hypothesis is that there is an interaction between stimulus location and effectiveness, such that
both would play a role in shaping the resultant interaction. If this were the case, studies would seek
to identify the relative weighting of these two stimulus dimensions to gain a better mechanistic view
into these interactions.
The first foray into this question focused on cortical area AES (Carriere et al. 2008). Here, it
was found that SRF architecture played an essential deterministic role in the observed multisensory
interactions, and most intriguingly, in a manner consistent with the second hypothesis outlined
above. Thus, and as illustrated in Figure 11.2, SRF architecture resulted in changes in stimulus
effectiveness that formed the basis for the multisensory interaction. In the neuron shown, if the stim-
uli were presented in a region of strong response within the SRF, a response depression would result
(Figure 11.2b, left column). In contrast, if the stimuli were moved to a location of weak response,
their pairing resulted in a large enhancement (Figure 11.2b, center column). Intermediate regions
FIGURE 11.2 (See color insert.) Multisensory interactions in AES neurons differ based on location of paired stimuli. (a) Visual, auditory, and multisensory SRFs
are shown with highlighted locations (b, d) illustrating response suppression (left column), response enhancement (middle column), and no significant interaction (right
column). (c) Shaded areas depict classically defined receptive fields for visual (blue) and auditory (green) stimuli.
of response resulted in either weak or no interactions (Figure 11.2b, right column). In addition to
this traditional measure of multisensory gain (relative to the best unisensory response), these same
interactions can also be examined and quantified relative to the predicted summation of the unisen-
sory responses (Wallace et al. 1992; Wallace and Stein 1996; Stein and Wallace 1996; Stanford et
al. 2005; Royal et al. 2009; Carriere et al. 2008). In these comparisons, strongly effective pairings
typically result in subadditive interactions, weakly effective pairings result in superadditive inter-
actions, and intermediate pairings result in additive interactions. Visualization of these different
categories of interactions relative to additive models can be captured in pseudocolor representations
such as that shown in Figure 11.3, in which the actual multisensory SRF is contrasted against that
predicted on the basis of additive modeling. Together, these results clearly illustrate the primacy of
stimulus efficacy in dictating multisensory interactions and suggest that space per se plays a relatively
minor role in governing these integrative processes.
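The two quantifications described above, multisensory gain relative to the best unisensory response and comparison against the additive (V + A) prediction, can be sketched in a few lines. This is a minimal illustration: the function names and the fixed tolerance separating additive from non-additive responses are assumptions for clarity, not criteria taken from the cited studies, which rely on statistical tests.

```python
def interactive_index(best_unisensory, multisensory):
    """Multisensory gain (%) relative to the best unisensory response."""
    return 100.0 * (multisensory - best_unisensory) / best_unisensory


def classify_additivity(visual, auditory, multisensory, tol=0.05):
    """Classify an interaction against the additive (V + A) prediction.

    `tol` is an illustrative proportional criterion; published studies
    use statistical tests rather than a fixed tolerance.
    """
    predicted = visual + auditory
    if multisensory > predicted * (1.0 + tol):
        return "superadditive"
    if multisensory < predicted * (1.0 - tol):
        return "subadditive"
    return "additive"
```

For example, a weakly effective pairing with unisensory responses of 2 and 3 spikes/s and a multisensory response of 9 spikes/s would be classified as superadditive, whereas a strongly effective pairing of 20 and 30 spikes/s yielding 35 spikes/s would be subadditive.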
Parallel studies are now beginning to focus on the SC, and provide an excellent comparative
framework from which to view multisensory interactive mechanisms across brain structures. In
this work, Krueger et al. (2009) reported not only that the SRF architecture of multisensory neurons
in the SC is similar to that of cortical neurons, but also that stimulus effectiveness appears once
again to be the key factor in dictating the multisensory response. Thus, stimulus pairings within
regions of weak unisensory response often resulted in superadditive interactions (Figure 11.4b–c,
◼), whereas pairings at locations of strong unisensory responses typically exhibited subadditive
interactions (Figure 11.4b–c, ○). Overall, such an organization presumably boosts signals within
weakly effective regions of the unisensory SRFs during multisensory stimulus presentations and
yields more reliable activation for each stimulus presentation.
Although SRF architecture appears similar in both cortical and subcortical multisensory brain
regions, there are also subtle differences that may provide important insights into both the underly-
ing mechanistic operations and the different behavioral and perceptual roles of AES and SC. For
example, when the SRFs of a multisensory neuron in the SC are compared under different sensory
FIGURE 11.3 Multisensory interactions relative to additive prediction models. Visual, auditory, and multi-
sensory (VA) SRFs are shown for an individual multisensory neuron of AES. True multisensory responses can
be contrasted with those predicted by an additive model (V + A) and reveal a richer integrative microarchitec-
ture than predicted by simple linear summation of unisensory response profiles. (Adapted from Carriere, B.N.
et al., J. Neurophysiol., 99, 2357–2368, 2008.)
Spatial and Temporal Features of Multisensory Processes 197
FIGURE 11.4 Multisensory interactions in SC neurons differ based on location of paired stimuli. (a) Visual,
auditory, and multisensory SRFs are shown as a function of azimuth (x axis) and elevation (y axis). Specific
locations within receptive field (b) are illustrated in detail (c) to show evoked responses for visual, auditory,
and multisensory conditions. Weakly effective locations (square) result in response enhancement, whereas
conditions evoking a strong unisensory response (circle) result in response suppression.
conditions, there appears to be a global similarity in the structure of each SRF with respect to both
the number and location of hot spots. This might indicate that the overall structure of the SRF is
dependent on fixed anatomical and/or biophysical constraints such as the extent of dendritic arbors.
However, these characteristics are far less pronounced in cortical SRFs (Carriere et al. 2008), possi-
bly due to the respective differences in the inputs to these two structures (the cortex receiving more
heterogeneous inputs) and/or due to less spatiotopic order in the cortex. Future work will seek to
better clarify these intriguing differences across structures.
ms. Along with these unitary changes, recent work has also shown that the timing of sensory inputs
with respect to ongoing neural oscillations in the neocortex has a significant impact on whether
neuronal responses are enhanced or suppressed. For instance, in macaque primary auditory cortex,
properly timed somatosensory input has been found to reset ongoing oscillations to an optimal
excitability phase that enhances the response to temporally correlated auditory input. In contrast,
somatosensory input delivered during suboptimal, low-excitability oscillatory periods depresses the
auditory response (Lakatos et al. 2007).
Although clearly illustrating the importance of stimulus timing in shaping multisensory interac-
tions, these prior studies have yet to characterize the interactions between time, space, and effec-
tiveness in the generation of a multisensory response. To do this, recent studies from our laboratory
have extended the SRF analyses described above to include time, resulting in the creation of spa-
tiotemporal receptive field (STRF) plots. It is important to point out that such analyses are not
unique to multisensory systems, but rather stem from both spatiotemporal and spectrotemporal
receptive field studies within individual sensory systems (David et al. 2004; Machens et al. 2004;
Haider et al. 2010; Ye et al. 2010). The power of the STRF here is its application
to multisensory systems as a modeling framework from which important mechanistic insights can
be gained about the integrative process.
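Computationally, the contrast between a recorded multisensory STRF and its additive prediction, the VA - (V + A) comparison used in the figures of this chapter, reduces to element-wise operations on space-by-time response matrices. A minimal sketch, assuming NumPy arrays of trial-averaged firing rates (the array shapes and variable names are assumptions, not the published analysis pipeline):

```python
import numpy as np


def additive_contrast(strf_v, strf_a, strf_va):
    """Contrast a recorded multisensory STRF against the additive model.

    Each input is a (locations x time bins) array of trial-averaged firing
    rates. Positive values in the result mark superadditive epochs,
    negative values subadditive epochs, and values near zero epochs of
    simple linear summation.
    """
    predicted = strf_v + strf_a   # linear (V + A) prediction
    return strf_va - predicted    # VA - (V + A)
```

Plotting the returned matrix in pseudocolor against azimuth (or elevation) and time recreates the style of contrast shown for AES and SC neurons.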
The creation of STRFs for cortical multisensory neurons has revealed interesting features about
the temporal dynamics of multisensory interactions and the evolution of the multisensory response
(Royal et al. 2009). Most importantly, these analyses, when contrasted with simple additive mod-
els based on the temporal architecture of the unisensory responses, identified two critical epochs
in the multisensory response not readily captured by additive processes (Figure 11.5). The first of
these, presaged by the Rowland et al. study described above, revealed an early phase of superadditive
multisensory responses that manifests as a speeding of the response (i.e., reduced latency) under
FIGURE 11.5 Spatiotemporal response dynamics in multisensory AES neurons. A reduced response latency
and increased response duration characterized spatiotemporal dynamics of paired multisensory stimuli.
multisensory conditions. The second of these happens late in the response epoch, where the multi-
sensory response continues beyond the truncation of the unisensory responses, effectively increasing
response duration under multisensory circumstances. It has been postulated that these two distinct
epochs of multisensory integration may ultimately be linked to very different behavioral and/or
perceptual roles (Royal et al. 2009). Whereas reduced latencies may speed target detection and
identification, extended response duration may facilitate perceptual analysis of the object or area of
interest. One interesting hypothesis is that the early speeding of responses will be more prominent
in SC multisensory neurons given their important role in saccadic (and head) movements, and that
the extended duration will be seen more in cortical networks engaged in perceptual analyses. Future
work, now in progress in our laboratory (see below), will seek to clarify the behavioral/perceptual
roles of these integrative processes by directly examining the links at the neurophysiological and
behavioral levels.
FIGURE 11.6 (See color insert.) Representative STRF from awake (a) versus anesthetized (b) recordings from cat SC using simple audiovisual stimulus presentations
(an LED paired with broadband noise). In awake animals, superadditive interactions occurred over multiple time points in multisensory condition (VA) when compared
to what would be predicted based on a linear summation of unisensory responses (V + A; see contrast, VA – [V + A]). This differs from anesthetized recordings from
SC in which multisensory interactions are limited to earliest temporal phase of multisensory response.
Preliminary studies have already identified that multisensory neurons in the SC of the awake cat
demonstrate extended response durations, as well as superadditive interactions over multiple time
scales, when compared to anesthetized animals in which multisensory interactions are typically
limited to the early phases of the response (Figure 11.6; Krueger et al. 2008). These findings remain
to be tested in multisensory regions of the cortex, or extended beyond simple stimuli (LEDs paired
with white noise) to more complex, ethologically relevant cues that might better address multisen-
sory perceptual capabilities. Responses to naturalistic stimuli in cats have primarily been examined
in unisensory cortices, demonstrating that simplification of natural sounds (bird chirps) results in
significant alteration of neuronal responses (Bar-Yosef et al. 2002) and that firing rates differ for
natural versus time-reversed conspecific vocalizations (Qin et al. 2008) in the primary auditory cor-
tex. Furthermore, multisensory studies in primates have shown that multisensory enhancement in
the primary auditory cortex of awake monkeys was reduced when a mismatched pair of naturalistic
audiovisual stimuli was presented (Kayser et al. 2010).
this difference, take an audiovisual event happening at a distance of 1 m, where the incident ener-
gies will arrive at the retina almost instantaneously and at the cochlea about 3 ms later (the speed
of sound is approximately 330 m/s). Now, if we move that same audiovisual source to a distance of
20 m, the difference in arrival times expands to 60 ms. Hence, having a window of tolerance for
these audiovisual delays represents an effective means to continue to bind stimuli across modalities
even without absolute correspondence in their incident arrival times.
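The arithmetic behind these arrival-time differences is a simple distance-over-speed computation. A quick sketch (the constant and function names are ours; 330 m/s is the approximate speed of sound used in the text):

```python
SPEED_OF_SOUND_M_S = 330.0    # approximate value used in the text
SPEED_OF_LIGHT_M_S = 2.998e8  # light's travel time is effectively negligible


def audiovisual_lag_ms(distance_m):
    """Delay (ms) of the auditory relative to the visual arrival for an
    audiovisual source at `distance_m` meters."""
    return 1000.0 * distance_m * (1.0 / SPEED_OF_SOUND_M_S
                                  - 1.0 / SPEED_OF_LIGHT_M_S)
```

At 1 m this gives roughly 3 ms; at 20 m, roughly 61 ms, in line with the approximately 60 ms figure quoted above.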
Because of the importance of temporal factors for multisensory integration, a number of experi-
mental paradigms have been developed for use in human subjects as a way to systematically study
the temporal binding window and its associated dynamics. One of the most commonly used of these
is a simultaneity judgment task, in which paired visual and auditory stimuli are presented at various
SOAs and participants are asked to judge whether the stimuli occurred simultaneously or succes-
sively (Zampini et al. 2005a; Engel and Dougherty 1971; Stone et al. 2001; Stevenson et al. 2010).
A distribution of responses can then be created that plots the probability of simultaneity reports as
a function of SOA. This distribution not only yields the point of subjective simultaneity, defined as
the peak of the function (Stone et al. 2001; Zampini et al. 2005a), but, more importantly, can be used
to define a “window” of time within which simultaneity judgments are highly likely. A similar
approach is taken in paradigms designed to assess multisensory temporal order judgments, wherein
participants judge which modality's stimulus was presented first. Similar to
the simultaneity judgment task, the point of subjective simultaneity is the time point at which par-
ticipants judge either stimulus to have occurred first at a rate of 50% (Zampini et al. 2003; Spence
et al. 2001). Once again, this method can also be adapted to create response distributions that serve
as proxies for the temporal binding window. Although the point measures (i.e., point of subjective
simultaneity) derived from these studies tend to differ based on the paradigm chosen (Fujisaki et
al. 2004; Vroomen et al. 2004; Zampini et al. 2003, 2005a), the span of time over which there is a
high likelihood of reporting simultaneity is remarkably constant, ranging from about –100 ms to
250 ms, where negative values denote auditory-leading-visual conditions (Dixon and Spitz 1980;
Fujisaki et al. 2004; Vroomen et al. 2004; Zampini et al. 2003, 2005a). The larger window size on
the right side of these distributions—in which vision leads audition—appears in nearly all stud-
ies of audiovisual simultaneity perception, and has been proposed to arise from the inherent flex-
ibility needed to process real-world audiovisual events, given that the propagation speeds of light
and sound will result in SOAs only on the right side of these distributions (Dixon and Spitz 1980).
Indeed, very recent efforts to model the temporal binding window within a probabilistic framework
(Colonius and Diederich 2010a, 2010b) have described this asymmetry as arising from an asym-
metry in Bayesian priors across SOAs corresponding to the higher probability that visual-first pairs
were generated by the same external event.
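Given a response distribution of this kind, the width of the temporal binding window is typically read off at a criterion proportion of the peak simultaneity-report probability (a 75%-of-maximum criterion appears later in this chapter, in Figures 11.7 and 11.10). A sketch of that computation, assuming SOAs sampled on a grid with linear interpolation between samples (the function name and the interpolation choice are our assumptions):

```python
import numpy as np


def window_width_ms(soas_ms, p_simultaneous, criterion=0.75):
    """Width (ms) of the temporal binding window: the span of SOAs over
    which the probability of a 'simultaneous' report stays at or above
    `criterion` times its maximum. Crossings that fall between sampled
    SOAs are located by linear interpolation."""
    soas = np.asarray(soas_ms, dtype=float)
    p = np.asarray(p_simultaneous, dtype=float)
    level = criterion * p.max()
    above = np.flatnonzero(p >= level)
    i_lo, i_hi = above[0], above[-1]

    left = soas[i_lo]
    if i_lo > 0:  # interpolate the rising crossing
        x0, x1 = soas[i_lo - 1], soas[i_lo]
        y0, y1 = p[i_lo - 1], p[i_lo]
        left = x0 + (level - y0) * (x1 - x0) / (y1 - y0)

    right = soas[i_hi]
    if i_hi < len(soas) - 1:  # interpolate the falling crossing
        x0, x1 = soas[i_hi], soas[i_hi + 1]
        y0, y1 = p[i_hi], p[i_hi + 1]
        right = x0 + (level - y0) * (x1 - x0) / (y1 - y0)

    return right - left
```

For an idealized triangular distribution peaking at an SOA of 0 and falling to zero at ±200 ms, this criterion yields a window of 100 ms.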
by the duration of the elemental building blocks of the spoken language—phonemes (Crystal and
House 1981).
Other studies have focused on altering the statistics of multisensory temporal relations in an
effort to better characterize the malleability of these processes. For example, repeated exposure to
a 250-ms auditory-leading-visual asynchronous pair is capable of biasing participants’ simultaneity
judgments in the direction of that lag by about 25 ms, with effects lasting on the order of minutes
(Fujisaki et al. 2004; Vroomen et al. 2004). Similar recalibration effects have been noted after expo-
sure to asynchronous audiovisual speech, as well as to visual–tactile, audio–tactile, and sensory–
motor pairs (Hanson et al. 2008; Fajen 2007; Stetson et al. 2006; Navarra et al. 2005). Although the
exact mechanisms underlying these changes are unknown, they have been proposed to represent a
recalibration of sensory input consistent with Bayesian models of perception (Hanson et al. 2008;
Miyazaki et al. 2005, 2006).
FIGURE 11.7 Training on a two-alternative forced choice simultaneity judgment task. (a) An
estimate of temporal binding window is derived using a criterion set at 75% of maximum. In this representa-
tive individual case, window narrows from 321 to 115 ms after 5 days (1 h/day) of feedback training. (b) After
training, a significant decrease in probability of judging nonsimultaneous audiovisual pairs to be simultane-
ous was found (*P < .05). (c) Average window size dropped significantly after first day (1 h) of training, then
remained stable (*P < .05).
window, with a group average reduction of 40%. Further characterization revealed that the changes
in window size were very rapid (being seen after the first day of training), were durable (lasting
at least a week after the cessation of training), and were a direct result of the feedback provided
(control subjects passively exposed to the same stimulus set did not exhibit window narrowing).
Additionally, to rule out the possibility that this narrowing was the result of changes in cognitive
biases, a second experiment using a two-interval forced choice paradigm was undertaken in which
participants were instructed to identify the simultaneously presented audiovisual pair presented
within one of two intervals. The two-interval forced choice paradigm resulted in a narrowing that
was similar in both degree and dynamics to that using the two-alternative forced choice approach.
Overall, this result is the first to illustrate a marked experience-dependent malleability of the mul-
tisensory temporal binding window, a result that has potentially important implications for clinical
conditions such as autism and dyslexia in which there is emerging evidence for changes in multisen-
sory temporal function (Ciesielski et al. 1995; Laasonen et al. 2001, 2002; Kern 2002; Hairston et
al. 2005; Facoetti et al. 2010; Foss-Feig et al. 2010).
In an effort to better define the brain networks responsible for multisensory temporal perception
(and the demonstrable plasticity), our laboratory has conducted a follow-up neuroimaging study using
functional magnetic resonance imaging (fMRI) (Powers et al. 2010). The findings revealed marked
changes in one of the best-established multisensory cortical domains in humans, the posterior supe-
rior temporal sulcus (pSTS). The pSTS exhibited striking decreases in blood oxygen level dependent
(BOLD) activation after training, suggestive of an increased efficiency of processing. In addition to
these changes in pSTS were changes in regions of the auditory and visual cortex, along with marked
changes in functional coupling between these unisensory domains and the pSTS. Together, these
studies are beginning to reveal the cortical networks involved in multisensory temporal processing
and perception, as well as the dynamics of these networks that must be continually adjusted to cap-
ture the ever-changing sensory statistics of our natural world as well as their cognitive valence.
FIGURE 11.8 Development of multisensory neurons in SC (open circles) versus AES (closed circles) of cat.
Development of multisensory neurons is similar between SC and AES with exceptions of onset and overall
percentage of multisensory neurons. At 4 months postnatal life, percentages of multisensory neurons in both
AES and SC are at their mature levels, with SC having a higher percentage than AES.
The parallels between SC and AES in their multisensory developmental chronology likely
reflect the order of overall sensory development (Gottlieb 1971) rather than a dependence on
connectivity between the two regions, because the establishment of sensory profiles in the SC
precedes the functional maturation of connections between AES and the SC (Wallace and Stein 2000). Thus, a
gradual recruitment of sensory functions during development appears to produce neurons capable
of multisensory integration (Lewkowicz and Kraebel 2004; Lickliter and Bahrick 2004), and points
strongly to a powerful role for early experience in sculpting the final multisensory state of these
systems (see Section 11.7.3).
yet clear, but may have something to do with the fact that young animals are generally only con-
cerned with events in the immediate proximity to the body (and which would make an SOA close
to 0 of greatest utility). As the animal becomes increasingly interested in exploring space at greater
distances, an expansion in the temporal window would allow for the better encoding of these more
distant events. We will return to the issue of plasticity in the multisensory temporal window when
we return to the human studies (see Section 11.7.4).
FIGURE 11.9 Developmental manipulations of spatial and temporal relationships of audiovisual stimuli.
(a) Multisensory interaction is shown as a function of spatially disparate stimuli between normally reared
animals and animals reared with a 30° disparity between auditory and visual stimuli. Peak multisensory
interaction for disparately reared group falls by 30° from that of normally reared animals. (b) Multisensory
interaction as a function of SOA in animals reared normally versus animals reared in environments with 100
and 250 ms temporal disparities. As might be expected, peak multisensory interactions are offset by 100 ms
for normally reared versus the 100 ms disparate group. Interestingly, the 250 ms group loses the ability to
integrate audiovisual stimuli.
These studies strongly suggest that the maturation of multisensory temporal functioning extends
beyond the first decade of life. In the initial study, it was established that multisensory temporal
functioning was still not mature by 10 to 11 years of age (Hillock et al. 2010). Here, children were
assessed on a simultaneity judgment task in which flashes and tone pips were presented at SOAs
ranging from –450 to +450 ms (with positive values representing visual-leading stimulus trials and
FIGURE 11.10 Temporal window size decreases from childhood to adulthood. Each data point represents a
participant’s window size as determined by width at 75% of maximum probability of perceived simultaneity
using nonspeech stimuli. See Section 11.5.1. (Adapted from Hillock, A.R. et al., Binding of sights and sounds:
Age-related changes in audiovisual temporal processing, 2010, submitted for publication.)
negative values representing auditory-leading stimulus trials), allowing for the creation of a response
distribution identical to what has been done in adults and which serves as a proxy for the multi-
sensory temporal binding window (see Section 11.6). When compared with adults, the group mean
window size for these children was found to be approximately 38% larger (i.e., 413 vs. 299 ms). A
larger follow-up study then sought to detail the chronology of this maturational process from 6 years
of age until adulthood, and identified the closure of the binding window in mid to late adolescence
for these simple visual–auditory pairings (Figure 11.10; Hillock and Wallace 2011b). A final study
then sought to extend these analyses into the stimulus domain with which children likely have the
greatest experience—speech. Using the McGurk effect, which uses the pairing of discordant visual
and auditory speech stimuli (e.g., a visual /ga/ with an auditory /ba/), it is possible to index the
integrative process by looking at how often participants report fusions that represent a synthesis of
the visual and auditory cues (e.g., /da/ or /tha/). Furthermore, because this effect has been shown
to be temporally dependent, it can be used as a tool to study the multisensory temporal binding
window for speech-related stimuli. Surprisingly, when used with children (6–11 years), adolescents
(12–17 years), and adults (18–23 years), windows were found to be indistinguishable (Hillock and
Wallace 2011a). Together, these studies show a surprising dichotomy between the development of
multisensory temporal perception for nonspeech versus speech stimuli, a result that may reflect the
powerful imperative placed on speech in young children and that reinforces the importance of sensory
experience in the development of multisensory abilities.
Although we have made great strides in recent years in building a better understanding of multi-
sensory behavioral and perceptual processes and their neural correlates, we still have much to dis-
cover. Fundamental questions remain unanswered, providing both a sense of frustration and a sense
of great opportunity. One domain of great interest to our laboratory is creating a bridge between
the neural and the behavioral/perceptual in an effort to extend beyond the correlative analyses done
thus far. Paradigms developed in awake and behaving animals allow for a direct assessment of neu-
ral and behavioral responses during performance on the same task, and should more directly link
multisensory encoding processes to their striking behavioral benefits (e.g., see Chandrasekaran and
Ghazanfar 2009). However, even these experiments provide only correlative evidence, and future
work will seek to use powerful new methods such as optogenetic manipulation in animal models
(e.g., see Cardin et al. 2009) and transcranial magnetic stimulation in humans (e.g., see Romei et al.
2007; Beauchamp et al. 2010; Pasalar et al. 2010) to selectively deactivate specific circuit compo-
nents and then assess the causative impact on multisensory function.
REFERENCES
Amlot, R., R. Walker, J. Driver, and C. Spence. 2003. Multimodal visual–somatosensory integration in saccade
generation. Neuropsychologia, 41, 1–15.
Bar-Yosef, O., Y. Rotman, and I. Nelken. 2002. Responses of neurons in cat primary auditory cortex to bird
chirps: Effects of temporal and spectral context. Journal of Neuroscience, 22, 8619–8632.
Beauchamp, M.S., A.R. Nath, and S. Pasalar. 2010. fMRI-guided transcranial magnetic stimulation reveals
that the superior temporal sulcus is a cortical locus of the McGurk effect. Journal of Neuroscience, 30,
2414–2417.
Bell, A.H., M.A. Meredith, A.J. Van Opstal, and D.P. Munoz. 2005. Crossmodal integration in the primate
superior colliculus underlying the preparation and initiation of saccadic eye movements. Journal of
Neurophysiology, 93, 3659–3673.
Benedek, G., G. Eordegh, Z. Chadaide, and A. Nagy. 2004. Distributed population coding of multisensory spa-
tial information in the associative cortex. European Journal of Neuroscience, 20, 525–529.
Calvert, G.A., and T. Thesen. 2004. Multisensory integration: methodological approaches and emerging prin-
ciples in the human brain. Journal of Physiology, Paris, 98, 191–205.
Cardin, J.A., M. Carlen, K. Meletis, U. Knoblich, F. Zhang, K. Deisseroth, L.H. Tsai, and C.I. Moore. 2009.
Driving fast-spiking cells induces gamma rhythm and controls sensory responses. Nature, 459, 663–667.
Carriere, B.N., D.W. Royal, T.J. Perrault, S.P. Morrison, J.W. Vaughan, B.E. Stein, and M.T. Wallace. 2007.
Visual deprivation alters the development of cortical multisensory integration. Journal of Neurophysiology,
98, 2858–2867.
Carriere, B.N., D.W. Royal, and M.T. Wallace. 2008. Spatial heterogeneity of cortical receptive fields and its
impact on multisensory interactions. Journal of Neurophysiology, 99, 2357–2368.
Chandrasekaran, C., and A.A. Ghazanfar. 2009. Different neural frequency bands integrate faces and voices
differently in the superior temporal sulcus. Journal of Neurophysiology, 101, 773–788.
Ciesielski, K.T., J.E. Knight, R.J. Prince, R.J. Harris, and S.D. Handmaker. 1995. Event-related potentials in
cross-modal divided attention in autism. Neuropsychologia, 33, 225–246.
Clark, B., and A. Graybiel. 1966. Factors contributing to the delay in the perception of the oculogravic illusion.
American Journal of Psychology, 79, 377–388.
Colonius, H., and P. Arndt. 2001. A two-stage model for visual–auditory interaction in saccadic latencies.
Perception & Psychophysics, 63, 126–147.
Colonius, H., and A. Diederich. 2004. Multisensory interaction in saccadic reaction time: A time-window-of-
integration model. Journal of Cognitive Neuroscience, 16, 1000–1009.
Colonius, H., and A. Diederich. 2010a. The optimal time window of visual–auditory integration: A reaction
time analysis. Frontiers in Integrative Neuroscience, 4, 11.
Colonius, H., and A. Diederich. 2010b. Optimal time windows of integration. Abstract Presented at 2010
International Multisensory Research Forum.
Conrey, B., and D.B. Pisoni. 2006. Auditory–visual speech perception and synchrony detection for speech and
nonspeech signals. Journal of the Acoustical Society of America, 119, 4065–4073.
Corneil, B.D., and D.P. Munoz. 1996. The influence of auditory and visual distractors on human orienting gaze
shifts. Journal of Neuroscience, 16, 8193–8207.
Corneil, B.D., M. Van Wanrooij, D.P. Munoz, and A.J. Van Opstal. 2002. Auditory–visual interactions subserv-
ing goal-directed saccades in a complex scene. Journal of Neurophysiology, 88, 438–454.
Crystal, T.H., and A.S. House. 1981. Segmental durations in connected speech signals. Journal of the Acoustical
Society of America, 69, S82–S83.
David, S.V., W.E. Vinje, and J.L. Gallant. 2004. Natural stimulus statistics alter the receptive field structure of
v1 neurons. Journal of Neuroscience, 24, 6991–7006.
Dhamala, M., C.G. Assisi, V.K. Jirsa, F.L. Steinberg, and J.A. Kelso. 2007. Multisensory integration for timing
engages different brain networks. NeuroImage, 34, 764–773.
Diederich, A., H. Colonius, D. Bockhorst, and S. Tabeling. 2003. Visual–tactile spatial interaction in saccade
generation. Experimental Brain Research, 148, 328–337.
Dixon, N.F., and L. Spitz. 1980. The detection of auditory visual desynchrony. Perception, 9, 719–721.
Engel, G.R., and W.G. Dougherty. 1971. Visual–auditory distance constancy. Nature, 234, 308.
Facoetti, A., A.N. Trussardi, M. Ruffino, M.L. Lorusso, C. Cattaneo, R. Galli, M. Molteni, and M. Zorzi. 2010.
Multisensory spatial attention deficits are predictive of phonological decoding skills in developmental
dyslexia. Journal of Cognitive Neuroscience, 22, 1011–1025.
Fajen, B.R. 2007. Rapid recalibration based on optic flow in visually guided action. Experimental Brain
Research, 183, 61–74.
Forster, B., C. Cavina-Pratesi, S.M. Aglioti, and G. Berlucchi. 2002. Redundant target effect and intersensory
facilitation from visual–tactile interactions in simple reaction time. Experimental Brain Research, 143,
480–487.
Foss-Feig, J.H., L.D. Kwakye, C.J. Cascio, C.P. Burnette, H. Kadivar, W.L. Stone, and M.T. Wallace. 2010.
An extended multisensory temporal binding window in autism spectrum disorders. Experimental Brain
Research, 203, 381–389.
Frassinetti, F., N. Bolognini, and E. Ladavas. 2002. Enhancement of visual perception by crossmodal visuo-
auditory interaction. Experimental Brain Research, 147, 332–343.
Frens, M.A., and A.J. Van Opstal. 1998. Visual–auditory interactions modulate saccade-related activity in mon-
key superior colliculus. Brain Research Bulletin, 46, 211–224.
Frens, M.A., A.J. Van Opstal, and R.F. van der Willigen. 1995. Spatial and temporal factors determine audito-
ry–visual interactions in human saccadic eye movements. Perception & Psychophysics, 57, 802–816.
Fujisaki, W., S. Shimojo, M. Kashino, and S. Nishida. 2004. Recalibration of audiovisual simultaneity. Nature
Neuroscience, 7, 773–778.
Furukawa, S., and J.C. Middlebrooks. 2002. Cortical representation of auditory space: Information-bearing
features of spike patterns. Journal of Neurophysiology, 87, 1749–1762.
Ghazanfar, A.A., C. Chandrasekaran, and N.K. Logothetis. 2008. Interactions between the superior tempo-
ral sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys. Journal of
Neuroscience, 28, 4457–4469.
Ghazanfar, A.A., C. Chandrasekaran, and R.J. Morrill. 2010. Dynamic, rhythmic facial expressions and the
superior temporal sulcus of macaque monkeys: implications for the evolution of audiovisual speech.
European Journal of Neuroscience, 31, 1807–1817.
Gottlieb, G. 1971. Ontogenesis of sensory function in birds and mammals. In The biopsychology of develop-
ment, ed. E. Tobach, L.R. Aronson, and E. Shaw. New York: Academic Press.
Guest, S., C. Catmur, D. Lloyd, and C. Spence. 2002. Audiotactile interactions in roughness perception.
Experimental Brain Research, 146, 161–171.
Guitton, D., and D.P. Munoz. 1991. Control of orienting gaze shifts by the tectoreticulospinal system in the
head-free cat. I. Identification, localization, and effects of behavior on sensory responses. Journal of
Neurophysiology, 66, 1605–1623.
Haider, B., M.R. Krause, A. Duque, Y. Yu, J. Touryan, J.A. Mazer, and D.A. McCormick. 2010. Synaptic and
network mechanisms of sparse and reliable visual cortical activity during nonclassical receptive field
stimulation. Neuron, 65, 107–121.
Hairston, W.D., J.H. Burdette, D.L. Flowers, F.B. Wood, and M.T. Wallace. 2005. Altered temporal profile of
visual–auditory multisensory interactions in dyslexia. Experimental Brain Research, 166, 474–480.
Hall, W.C., and A.K. Moschovakis. 2004. The superior colliculus: New approaches for studying sensorimotor
integration. Boca Raton, FL: CRC Press.
Hanson, J.V., J. Heron, and D. Whitaker. 2008. Recalibration of perceived time across sensory modalities.
Experimental Brain Research, 185, 347–352.
Harrington, L.K., and C.K. Peck. 1998. Spatial disparity affects visual–auditory interactions in human senso-
rimotor processing. Experimental Brain Research, 122, 247–252.
212 The Neural Bases of Multisensory Processes
Meredith, M.A., J.W. Nemitz, and B.E. Stein. 1987. Determinants of multisensory integration in superior col-
liculus neurons. I. Temporal factors. Journal of Neuroscience, 7, 3215–3229.
Meredith, M.A., and B.E. Stein. 1983. Interactions among converging sensory inputs in the superior colliculus.
Science, 221, 389–391.
Meredith, M.A., and B.E. Stein. 1985. Descending efferents from the superior colliculus relay integrated mul-
tisensory information. Science, 227, 657–659.
Meredith, M.A., and B.E. Stein. 1986. Spatial factors determine the activity of multisensory neurons in cat
superior colliculus. Brain Research, 365, 350–354.
Middlebrooks, J.C., and E.I. Knudsen. 1984. A neural code for auditory space in the cat’s superior colliculus.
Journal of Neuroscience, 4, 2621–2634.
Middlebrooks, J.C., L. Xu, A.C. Eddins, and D.M. Green. 1998. Codes for sound-source location in nontono-
topic auditory cortex. Journal of Neurophysiology, 80, 863–881.
Miyazaki, M., D. Nozaki, and Y. Nakajima. 2005. Testing Bayesian models of human coincidence timing.
Journal of Neurophysiology, 94, 395–399.
Miyazaki, M., S. Yamamoto, S. Uchida, and S. Kitazawa. 2006. Bayesian calibration of simultaneity in tactile
temporal order judgment. Nature Neuroscience, 9, 875–877.
Molholm, S., W. Ritter, M.M. Murray, D.C. Javitt, C.E. Schroeder, and J.J. Foxe. 2002. Multisensory auditory–
visual interactions during early sensory processing in humans: A high-density electrical mapping study.
Brain Research. Cognitive Brain Research, 14, 115–128.
Munhall, K.G., P. Gribble, L. Sacco, and M. Ward. 1996. Temporal constraints on the McGurk effect. Perception
& Psychophysics, 58, 351–362.
Munoz, D.P., and D. Guitton. 1991. Control of orienting gaze shifts by the tectoreticulospinal system in the
head-free cat: II. Sustained discharges during motor preparation and fixation. Journal of Neurophysiology,
66, 1624–1641.
Munoz, D.P., D. Guitton, and D. Pelisson. 1991a. Control of orienting gaze shifts by the tectoreticulospinal
system in the head-free cat: III. Spatiotemporal characteristics of phasic motor discharges. Journal of
Neurophysiology, 66, 1642–1666.
Munoz, D.P., D. Pelisson, and D. Guitton. 1991b. Movement of neural activity on the superior colliculus motor
map during gaze shifts. Science, 251, 1358–1360.
Murray, M.M., C.M. Michel, R. Grave De Peralta, S. Ortigue, D. Brunet, S. Gonzalez Andino, and A. Schnider.
2004. Rapid discrimination of visual and multisensory memories revealed by electrical neuroimaging.
NeuroImage, 21, 125–135.
Murray, M.M., J.J. Foxe, and G.R. Wylie. 2005. The brain uses single-trial multisensory memories to discrimi-
nate without awareness. NeuroImage, 27, 473–478.
Nagy, A., G. Eordegh, and G. Benedek. 2003. Spatial and temporal visual properties of single neurons in the
feline anterior ectosylvian visual area. Experimental Brain Research, 151, 108–114.
Navarra, J., A. Vatakis, M. Zampini, S. Soto-Faraco, W. Humphreys, and C. Spence. 2005. Exposure to asyn-
chronous audiovisual speech extends the temporal window for audiovisual integration. Brain Research.
Cognitive Brain Research, 25, 499–507.
Noesselt, T., J.W. Rieger, M.A. Schoenfeld, M. Kanowski, H. Hinrichs, H.J. Heinze, and J. Driver. 2007.
Audiovisual temporal correspondence modulates human multisensory superior temporal sulcus plus pri-
mary sensory cortices. Journal of Neuroscience, 27, 11431–11441.
Pandey, P.C., H. Kunov, and S.M. Abel. 1986. Disruptive effects of auditory signal delay on speech perception
with lipreading. Journal of Auditory Research, 26, 27–41.
Pasalar, S., T. Ro, and M.S. Beauchamp. 2010. TMS of posterior parietal cortex disrupts visual tactile multisen-
sory integration. European Journal of Neuroscience, 31, 1783–1790.
Populin, L.C. 2005. Anesthetics change the excitation/inhibition balance that governs sensory processing in the
cat superior colliculus. Journal of Neuroscience, 25, 5903–5914.
Powers 3rd, A.R., A.R. Hillock, and M.T. Wallace. 2009. Perceptual training narrows the temporal window of
multisensory binding. Journal of Neuroscience, 29, 12265–12274.
Powers 3rd, A.R., M.A. Hevey, and M.T. Wallace. 2010. Neural correlates of multisensory perceptual learning.
In preparation.
Qin, L., J.Y. Wang, and Y. Sato. 2008. Representations of cat meows and human vowels in the primary auditory
cortex of awake cats. Journal of Neurophysiology, 99, 2305–2319.
Romei, V., M.M. Murray, L.B. Merabet, and G. Thut. 2007. Occipital transcranial magnetic stimulation has
opposing effects on visual and auditory stimulus detection: implications for multisensory interactions.
Journal of Neuroscience, 27, 11465–11472.
Rouger, J., S. Lagleyre, B. Fraysse, S. Deneve, O. Deguine, and P. Barone. 2007. Evidence that cochlear-
implanted deaf patients are better multisensory integrators. Proceedings of the National Academy of
Sciences of the United States of America, 104, 7295–7300.
Rowland, B.A., S. Quessy, T.R. Stanford, and B.E. Stein. 2007. Multisensory integration shortens physiological
response latencies. Journal of Neuroscience, 27, 5879–5884.
Royal, D.W., B.N. Carriere, and M.T. Wallace. 2009. Spatiotemporal architecture of cortical receptive fields
and its impact on multisensory interactions. Experimental Brain Research, 198, 127–136.
Schneider, T.R., A.K. Engel, and S. Debener. 2008. Multisensory identification of natural objects in a two-way
crossmodal priming paradigm. Experimental Psychology, 55, 121–132.
Schorr, E.A., N.A. Fox, V. van Wassenhove, and E.I. Knudsen. 2005. Auditory–visual fusion in speech percep-
tion in children with cochlear implants. Proceedings of the National Academy of Sciences of the United
States of America, 102, 18748–18750.
Seitz, A.R., R. Kim, and L. Shams. 2006. Sound facilitates visual learning. Current Biology, 16, 1422–1427.
Sekuler, R., A.B. Sekuler, and R. Lau. 1997. Sound alters visual motion perception. Nature, 385, 308.
Shams, L., Y. Kamitani, and S. Shimojo. 2000. Illusions: What you see is what you hear. Nature, 408, 788.
Shams, L., Y. Kamitani, and S. Shimojo. 2002. Visual illusion induced by sound. Brain Research. Cognitive
Brain Research, 14, 147–152.
Shore, D.I., C. Spence, and R.M. Klein. 2001. Visual prior entry. Psychological Science, 12, 205–212.
Soto-Faraco, S., and A. Alsius. 2009. Deconstructing the McGurk–MacDonald illusion. Journal of Experimental
Psychology. Human Perception and Performance, 35, 580–587.
Soto-Faraco, S., A. Kingstone, and C. Spence. 2003. Multisensory contributions to the perception of motion.
Neuropsychologia, 41, 1847–1862.
Sparks, D.L. 1986. Translation of sensory signals into commands for control of saccadic eye movements: Role
of primate superior colliculus. Physiological Reviews, 66, 118–171.
Sparks, D.L., and J.M. Groh. 1995. The superior colliculus: A window for viewing issues in integrative neuro-
science. In The Cognitive Sciences, ed. M.S. Gazzaniga. Cambridge, MA: MIT Press.
Spence, C., D.I. Shore, and R.M. Klein. 2001. Multisensory prior entry. Journal of Experimental Psychology.
General, 130, 799–832.
Stanford, T.R., S. Quessy, and B.E. Stein. 2005. Evaluating the operations underlying multisensory integration
in the cat superior colliculus. Journal of Neuroscience, 25, 6499–6508.
Stein, B.E., and M.A. Meredith. 1993. The Merging of the Senses. Cambridge, MA: MIT Press.
Stein, B.E., and M.T. Wallace. 1996. Comparisons of cross-modality integration in midbrain and cortex.
Progress in Brain Research, 112, 289–299.
Stein, B.E., E. Labos, and L. Kruger. 1973a. Determinants of response latency in neurons of superior colliculus
in kittens. Journal of Neurophysiology, 36, 680–689.
Stein, B.E., E. Labos, and L. Kruger. 1973b. Sequence of changes in properties of neurons of superior colliculus
of the kitten during maturation. Journal of Neurophysiology, 36, 667–679.
Stein, B.E., W.S. Huneycutt, and M.A. Meredith. 1988. Neurons and behavior: The same rules of multisensory
integration apply. Brain Research, 448, 355–358.
Stein, B.E., M.A. Meredith, W.S. Huneycutt, and L. McDade. 1989. Behavioral indices of multisensory inte-
gration: Orientation to visual cues is affected by auditory stimuli. Journal of Cognitive Neuroscience, 1,
12–24.
Stein, B.E., N. London, L.K. Wilkinson, and D.D. Price. 1996. Enhancement of perceived visual intensity by
auditory stimuli: A psychophysical analysis. Journal of Cognitive Neuroscience, 8, 497–506.
Stetson, C., X. Cui, P.R. Montague, and D.M. Eagleman. 2006. Motor-sensory recalibration leads to an illusory
reversal of action and sensation. Neuron, 51, 651–659.
Stevenson, R.A., N.A. Altieri, S. Kim, D.B. Pisoni, and T.W. James. 2010. Neural processing of asynchronous
audiovisual speech perception. NeuroImage, 49, 3308–3318.
Stone, J.V., N.M. Hunkin, J. Porrill, R. Wood, V. Keeler, M. Beanland, M. Port, and N.R. Porter. 2001. When
is now? Perception of simultaneity. Proceedings of the Royal Society of London. Series B. Biological
Sciences, 268, 31–38.
Sumby, W.H., and I. Pollack. 1954. Visual contribution to speech intelligibility in noise. Journal of the
Acoustical Society of America, 26, 212–215.
Ter-Mikaelian, M., D.H. Sanes, and M.N. Semple. 2007. Transformation of temporal properties between audi-
tory midbrain and cortex in the awake Mongolian gerbil. Journal of Neuroscience, 27, 6091–6102.
van Atteveldt, N.M., E. Formisano, L. Blomert, and R. Goebel. 2007. The effect of temporal asynchrony on the
multisensory integration of letters and speech sounds. Cerebral Cortex, 17, 962–974.
Spatial and Temporal Features of Multisensory Processes 215
van Wassenhove, V., K.W. Grant, and D. Poeppel. 2007. Temporal window of integration in auditory–visual
speech perception. Neuropsychologia, 45, 598–607.
van Wassenhove, V., D.V. Buonomano, S. Shimojo, and L. Shams. 2008. Distortions of subjective time percep-
tion within and across senses. PLoS One, 3, e1437.
Von Kriegstein, K., and A.L. Giraud. 2006. Implicit multisensory associations influence voice recognition.
PLoS Biology, 4, e326.
Vroomen, J., M. Keetels, B. De Gelder, and P. Bertelson. 2004. Recalibration of temporal order perception by
exposure to audio-visual asynchrony. Brain Research. Cognitive Brain Research, 22, 32–35.
Wade, N.J., and R.H. Day. 1968. Development and dissipation of a visual spatial aftereffect from prolonged
head tilt. Journal of Experimental Psychology, 76, 439–443.
Wallace, M.T., and B.E. Stein. 1996. Sensory organization of the superior colliculus in cat and monkey. Progress
in Brain Research, 112, 301–311.
Wallace, M.T., and B.E. Stein. 1997. Development of multisensory neurons and multisensory integration in cat
superior colliculus. Journal of Neuroscience, 17, 2429–2444.
Wallace, M.T., and B.E. Stein. 2000. Onset of cross-modal synthesis in the neonatal superior colliculus is gated
by the development of cortical influences. Journal of Neurophysiology, 83, 3578–3582.
Wallace, M.T., and B.E. Stein. 2001. Sensory and multisensory responses in the newborn monkey superior col-
liculus. Journal of Neuroscience, 21, 8886–8894.
Wallace, M.T., M.A. Meredith, and B.E. Stein. 1992. Integration of multiple sensory modalities in cat cortex.
Experimental Brain Research, 91, 484–488.
Wallace, M.T., L.K. Wilkinson, and B.E. Stein. 1996. Representation and integration of multiple sensory inputs
in primate superior colliculus. Journal of Neurophysiology, 76, 1246–1266.
Wallace, M.T., T.J. Perrault Jr., W.D. Hairston, and B.E. Stein. 2004. Visual experience is necessary for the
development of multisensory integration. Journal of Neuroscience, 24, 9580–9584.
Wallace, M.T., B.N. Carriere, T.J. Perrault Jr., J.W. Vaughan, and B.E. Stein. 2006. The development of cortical
multisensory integration. Journal of Neuroscience, 26, 11844–11849.
Wang, X., T. Lu, R.K. Snider, and L. Liang. 2005. Sustained firing in auditory cortex evoked by preferred
stimuli. Nature, 435, 341–346.
Xu, L., S. Furukawa, and J.C. Middlebrooks. 1999. Auditory cortical responses in the cat to sounds that pro-
duce spatial illusions. Nature, 399, 688–691.
Ye, C.Q., M.M. Poo, Y. Dan, and X.H. Zhang. 2010. Synaptic mechanisms of direction selectivity in primary
auditory cortex. Journal of Neuroscience, 30, 1861–1868.
Zampini, M., D.I. Shore, and C. Spence. 2003. Audiovisual temporal order judgments. Experimental Brain
Research, 152, 198–210.
Zampini, M., S. Guest, D.I. Shore, and C. Spence. 2005a. Audio-visual simultaneity judgments. Perception &
Psychophysics, 67, 531–544.
Zampini, M., D.I. Shore, and C. Spence. 2005b. Audiovisual prior entry. Neuroscience Letters, 381, 217–222.
12 Early Integration and
Bayesian Causal Inference
in Multisensory Perception
Ladan Shams
CONTENTS
12.1 Introduction........................................................................................................................... 217
12.2 Early Auditory–Visual Interactions in Human Brain............................................................ 218
12.3 Why Have Cross-Modal Interactions?................................................................................... 219
12.4 The Problem of Causal Inference.......................................................................................... 220
12.5 Spectrum of Multisensory Combinations.............................................................................. 220
12.6 Principles Governing Cross-Modal Interactions................................................................... 222
12.7 Causal Inference in Multisensory Perception........................................................................ 223
12.8 Hierarchical Bayesian Causal Inference Model.................................................................... 225
12.9 Relationship with Nonhierarchical Causal Inference Model................................................ 226
12.10 Hierarchical Causal Inference Model versus Human Data................................................. 226
12.11 Independence of Priors and Likelihoods............................................................................. 227
12.12 Conclusions.......................................................................................................................... 229
References....................................................................................................................................... 229
12.1 INTRODUCTION
Brain function in general, and perception in particular, has been viewed as highly modular for more
than a century. Although phrenology is considered obsolete, its general notion of the brain being
composed of compartments each devoted to a single function and independent of other functions has
been the dominant paradigm, especially in the context of perception (Pascual-Leone and Hamilton
2001). In the cerebral cortex, it is believed that the different sensory modalities are organized into
separate pathways that are independent of each other, and process information almost completely in
a self-contained manner until the “well digested” processed signals converge at some higher-order
level of processing in the polysensory association cortical areas, wherein the unified perception of
the environment is achieved. The notion of modularity of sensory modalities has been particularly
strong as related to visual perception. Vision has been considered to be highly self-contained and
independent of extramodal influences. This view owes to many sources. Humans are considered
"visual animals," a notion underscored in contemporary society by the ever-increasing importance
of text and images in our lives, along with the advent of electricity (and light at night). The notion
of visual dominance has been supported by classic and well-known studies of cross-modal
interactions in which a conflict was artificially imposed between vision and another modality, and
vision was found to override the conflicting modality. For example, in the ventriloquist illusion,
vision captures the location of a discrepant auditory stimulus (Howard and Templeton 1966).
Similarly, in the "visual capture" effect, vision captures the spatial location of a
tactile or proprioceptive stimulus (Rock and Victor 1964). In the McGurk effect, vision strongly and
qualitatively alters the perceived syllable (McGurk and MacDonald 1976). As a result, the influence
of vision on other modalities has been acknowledged for some time. However, the influence of other
modalities on vision has not been appreciated until very recently. There have been several reports
of vision being influenced by another modality; however, most of these have involved quantitative
effects (Gebhard and Mowbray 1959; Scheier et al. 1999; Walker and Scott 1981; McDonald et al.
2000; Spence and Driver 1997; Spence et al. 1998; Stein et al. 1996). Over the past few years, two
studies have reported radical alterations of visual perception by auditory modality. In one case, the
motion trajectory of two visual targets is sometimes changed from a streaming motion to a bounc-
ing motion by a brief sound occurring at the time of visual coincidence (Sekuler et al. 1997). In this
case, the motion of the visual stimuli is, in principle, ambiguous in the absence of sound, and one
could argue that the sound simply resolves this ambiguity. In another study, we found that the perceived
number of pulsations of a visual flash (for which there is no obvious ambiguity) is often increased
when paired with multiple beeps (Shams et al. 2000, 2002). This phenomenon demonstrates, in an
unequivocal fashion, that visual perception can be altered by a nonvisual signal. The effect is also
very robust and resistant to changes in the shape, pattern, intensity, and timing of the visual and
auditory stimuli (Shams et al. 2001, 2002; Watkins et al. 2006). For this reason, this illusion, known
as the "sound-induced flash illusion," appears to reflect a mainstream mechanism of auditory–visual
interaction in the brain as opposed to some aberration in neural processing. Thus, we used the
sound-induced flash illusion as an experimental paradigm for investigating auditory–visual interac-
tions in the human brain.
an illusory percept of two flashes (also referred to as a fission effect). We compared the illusion and
no-illusion trials, reasoning that given that the physical stimuli are identical in both of these post
hoc–defined conditions, the arousal level should also be equal. Contrasting illusion and nonillusion
trials revealed increased activity in V1 in the illusion condition (Watkins et al. 2006), indicating
that the perception of illusion is correlated with increased activity in V1. Although this contradicts
the attention hypothesis laid out earlier, one could still argue that sound may only increase arousal
in some trials and those trials happen to be the illusion trials. Although this argument confounds
attention with integration, we could nevertheless address it using another experiment in which
we included a 2flash1beep condition. On some trials of this condition, the two flashes are fused,
leading to an illusory percept of a single flash (also referred to as a fusion effect), whereas in other
trials, the observers correctly perceived two flashes. Contrasting the illusion and nonillusion tri-
als, we again found a significant difference in the activation level of V1; however, this time, the
perception of sound-induced visual illusion was correlated with decreased activity in V1 (Watkins
et al. 2007), therefore ruling out the role of attention or arousal. As mentioned above, the event-
related potential (ERP) study showed a similar temporal pattern of activity for the illusory and
physical second flash. Here, we found a similar degree of V1 activation for physical and illusory
double flash, and a similar degree of activation for the physical and illusory single flash (Watkins
et al. 2007). These results altogether establish clearly that activity in early visual cortical areas,
as early as in the primary visual cortex, is modulated by sound through cross-modal integration
processes.
What neural pathway could underlie these early auditory–visual interactions? Again, the last
decade has witnessed the overturning of another dogma: the dogma of no connectivity among
the sensory cortical areas. There has been mounting evidence for direct and indirect anatomical
connectivity among the sensory cortical areas (e.g., Clavagnier et al. 2004; Falchier et al. 2002;
Ghazanfar and Schroeder 2006; Rockland and Ojima 2003; Hackett et al. 2007). Of particular
interest here are the findings of extensive projections from the auditory core and parabelt areas and
from the multisensory superior temporal polysensory area to V1 and V2 in monkey (Falchier et
al. 2002; Rockland and Ojima 2003; Clavagnier et al. 2004). Intriguingly, these projections appear
to be extensive only for the peripheral representations in V1, and not for the foveal
representations (Falchier et al. 2002). This pattern is highly consistent with the much stronger behavioral and
physiological auditory modulation of vision in the periphery compared with the fovea that we have
observed (Shams et al. 2001). Interestingly, tactile modulation of visual processing also seems to
be stronger in the periphery (Diederich and Colonius 2007). Therefore, it seems likely that a direct
projection from A1 or a feedback projection from superior temporal sulcus (STS) could mediate the
modulations we have observed. We believe that the former may be more likely because although the
activation in V1 was found to correlate with the perception of flash, the activation of area STS was
always increased with the perception of illusion regardless of the type of illusion (single or
double-flash; Watkins et al. 2006, 2007). Therefore, these results are more readily consistent with a direct
modulation of V1 by projections from auditory areas.
Maximum likelihood estimation of an object property using two independent cues, for example, an auditory
estimate and a visual estimate, results in an estimate that is more reliable (more precise) than either one of
the individual estimates. Many studies of multisensory perception have confirmed that the human ner-
vous system integrates two cross-modal estimates in a similar fashion (e.g., Alais and Burr 2004; Ernst
and Banks 2002; van Beers et al. 1999; Ronsse et al. 2009). Therefore, integrating information across
modalities is always beneficial. Interestingly, recent studies using single-cell recordings and behavioral
measurements from macaque monkeys have provided a bridge between the behavioral manifestations of
multisensory integration and neural activity, showing that the activity of multisensory (visual–vestibular)
neurons is consistent with Bayesian cue integration (for a review, see Angelaki et al. 2009).
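The reliability-weighted combination rule behind these cue-integration studies can be sketched in a few lines of code. The stimulus values and noise levels below are illustrative assumptions, not data from any of the cited experiments.

```python
# Sketch of maximum-likelihood (reliability-weighted) cue combination.
# Parameter values are made up for illustration.

def mle_combine(s_a, sigma_a, s_v, sigma_v):
    """Combine an auditory and a visual estimate of the same property.

    Each unisensory estimate is assumed to carry independent Gaussian
    noise with standard deviation sigma_i. The optimal combined estimate
    weights each cue by its reliability (1/sigma^2), and its variance is
    smaller than either unisensory variance.
    """
    w_a = 1.0 / sigma_a ** 2
    w_v = 1.0 / sigma_v ** 2
    s_hat = (w_a * s_a + w_v * s_v) / (w_a + w_v)
    sigma_hat = (1.0 / (w_a + w_v)) ** 0.5
    return s_hat, sigma_hat

# Vision is more reliable here, so the combined estimate lies closer to
# the visual estimate and is more precise than either cue alone.
s_hat, sigma_hat = mle_combine(s_a=10.0, sigma_a=4.0, s_v=0.0, sigma_v=2.0)
```

Because the weights depend only on the cues' variances, the same rule predicts both the visual dominance seen with precise visual cues and auditory dominance when the visual signal is degraded.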
In the temporal numerosity experiment, a variable number of flashes were presented in the
periphery simultaneously with a variable number of beeps. The task of the observers was to judge
the number of flashes and beeps in each trial. In the spatial localization experiment, a Gabor patch
and/or a noise burst were briefly presented at one of several locations along a horizontal line and
the task of the subject was to judge the location of both the visual and auditory stimuli in each trial.
In both experiments, we observed a spectrum of interactions (Figure 12.1). When there was no dis-
crepancy between the auditory and visual stimuli, the two stimuli were fused (Figure 12.1a, left).
When the discrepancy was small between the two stimuli, they were again fused in a large frac-
tion of trials (Figure 12.1a, middle and right). These trials are those in which an illusion occurred.
For example, when one flash paired with two beeps was presented, in a large fraction of trials, the
observers reported seeing two flashes (sound-induced flash illusion) and hearing two beeps. The
reverse illusion occurred when two flashes paired with one beep were seen as a single flash in a
large fraction of trials. Similarly, in the localization experiment, when the spatial gap between the
flash and noise burst was small (5°), the flash captured the location of the sound in a large fraction of
trials (ventriloquist illusion). In the other extreme, when the discrepancy between the auditory and
visual stimuli was large, there was little interaction, if any, between the two. For example, in the
1flash4beep or 4flash1beep conditions in the numerosity judgment experiments, or in the conditions
in which the flash was all the way to the left and noise all the way to the right or vice versa in the
localization experiment, there was hardly any shift in the visual or auditory percepts relative to the
unisensory conditions. We refer to this lack of interaction as segregation (Figure 12.1c) because it
appears that the signals are kept separate from each other. Perhaps most interestingly, in conditions
in which there was a moderate discrepancy between the two stimuli, sometimes there was a partial
shift of the two modalities toward each other. We refer to this phenomenon as “partial integration”
(Figure 12.1b). For example, in the 1flash3beep condition, the observers sometimes reported seeing
two flashes and hearing three beeps. Or in the condition in which the flash is at –5° (left of fixa-
tion) and noise is at +5° (right of fixation), the observers sometimes reported hearing the noise at
0° and seeing the flash at –5°. In summary, in both experiments, we observed a
FIGURE 12.1 Range of cross-modal interactions. Horizontal axis in these panels represents a perceptual
dimension such as space, time, number, etc. Light bulb and loudspeaker icons represent visual stimulus
and auditory stimulus, respectively. Eye and ear icons represent visual and auditory percepts, respectively.
(a) Fusion. Three examples of conditions in which fusion often occurs. Left: when stimuli are congruent and
veridically perceived. Middle: when discrepancy between auditory and visual stimuli is small, and percept
corresponds to a point in between two stimuli. Right: when discrepancy between two stimuli is small, and one
modality (in this example, vision) captures the other modality. (b) Partial integration. Left: when discrepancy
between two stimuli is moderate, and the less reliable modality (in this example, vision) gets shifted toward
the other modality but does not converge. Right: when discrepancy is moderate and both modalities get shifted
toward each other but not enough to converge. (c) Segregation. When conflict between two stimuli is large, and
the two stimuli do not affect each other.
[Figure 12.2 plots: panel (a) shows % visual bias (20–50%) against number disparity (1–3); panel (b) shows % auditory bias (20–50%) against spatial disparity (5–20°).]
FIGURE 12.2 Interaction between auditory and visual modalities as a function of conflict. (a) Visual bias
(i.e., influence of sound on visual perception) as a function of the discrepancy between the number of flashes
and beeps in the temporal numerosity judgment task. (b) Auditory bias (i.e., influence of vision on auditory
perception) as a function of the spatial gap between flash and sound in the spatial localization task.
spectrum of interactions between the two modalities. When the discrepancy is zero or small, the
two modalities tend to get fused. When the conflict is moderate, partial integration may occur, and
when the conflict is large, the two signals tend to be segregated (Figure 12.1c). In both
experiments, the interaction between the two modalities gradually decreased as the discrepancy between
the two increased (Figure 12.2).
What would happen if we had more than two sensory signals, for example, a visual, an
auditory, and a tactile signal, as is most often the case in nature? We investigated this scenario using
the numerosity judgment task (Wozny et al. 2008). We presented a variable number of flashes paired
with a variable number of beeps and a variable number of taps, providing unisensory, bisensory,
and trisensory conditions pseudorandomly interleaved. The task of the participants was to judge the
number of flashes, beeps, and taps on each trial. This experiment provided a rich set of data that
replicated the sound-induced flash illusion (Shams et al. 2000) and the touch-induced flash illusion
(Violentyev et al. 2005), as well as many previously unreported illusions. In fact, in every condition
in which there was a small discrepancy between two or three modalities, we observed an illusion.
This finding demonstrates that the interaction among these modalities is the rule rather than the
exception, and that the sound-induced flash illusions previously reported are not "special" or out of
the ordinary; rather, they are consistent with a general pattern of cross-modal interactions that cuts
across modalities and stimulus conditions.
We wondered whether these changes in perceptual reports reflect a change in response criterion
as opposed to a change in perception per se. We calculated the sensitivity (d′) change between
bisensory and unisensory conditions (and between trisensory and bisensory conditions) and found
statistically significant changes in sensitivity as a result of the introduction of a second (or third)
sensory signal in most of the cases despite the very conservative statistical criterion used. In other
words, the observed illusions (both fission and fusion) reflect cross-modal integration processes, as
opposed to response bias.
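The criterion-free sensitivity comparison described above can be sketched as follows; the hit and false-alarm rates here are invented for illustration and are not the chapter's actual data.

```python
# Sketch of a signal-detection sensitivity (d') comparison:
# d' = z(hit rate) - z(false-alarm rate), independent of response criterion.
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity for discriminating two stimulus classes."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# E.g., discriminating "two flashes" from "one flash": if a discrepant
# beep changed only the response criterion, d' would stay the same,
# whereas a change in d' indicates a genuine perceptual change.
d_unisensory = d_prime(0.80, 0.20)  # flashes alone (hypothetical rates)
d_bisensory = d_prime(0.65, 0.35)   # flashes with a discrepant beep
```

A drop in d′ in the bisensory condition, as in this toy example, is what distinguishes a true cross-modal integration effect from a mere shift in response bias.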
we use both haptic and visual information to estimate the shape of the mug. It is also expected for
the bits of information to be fairly consistent with each other if they arise from the same object.
Therefore, it would make sense for the nervous system to fuse the sensory signals when there is
little or no discrepancy between the signals. Similarly, as discussed earlier, it is reasonable for the
nervous system not to combine the bits of information if they correspond to different objects. It is
also expected for the bits of information to be highly disparate if they stem from different objects.
Therefore, if we are holding a mug while watching TV, it would be best not to combine the visual
and haptic information. Segregation, then, also makes sense from a functional point of view.
How about partial integration? Is there a situation in which partial integration would be beneficial?
There is no intuitively obvious explanation for partial integration, as we do not encounter situations
wherein two signals are only partially caused by the same object. Therefore, the phenomenon of
partial integration is rather curious. Is there a single rule that can account for the entire range of
cross-modal interactions including partial integration?
FIGURE 12.3 Generative model of different models of cue combination. (a) Traditional model of cue combi-
nation, in which two signals are assumed to be caused by one source. (b) Causal inference model of cue combi-
nation, in which each signal has a respective cause, and causes may or may not be related. (c) Generalization of
model in (b) to three signals. (d) Hierarchical causal inference model of cue combination. There are two explicit
causal structures, one corresponding to common cause and one corresponding to independent causes, and
variable C chooses between the two. (b, Adapted from Shams, L. et al., Neuroreport, 16, 1923–1927, 2005b; c,
adapted from Wozny, D.R. et al., J. Vis., 8, 1–11, 2008; d, Körding, K. et al., PLoS ONE, 2, e943, 2007.)
224 The Neural Bases of Multisensory Processes
To come up with a general model that can account for the entire range of interactions, we aban-
doned the assumption of a single source, and allowed each of the sensory cues to have a respective
source. By allowing the two sources to be either dependent or independent, we allowed for both
conditions of a common cause and conditions of independent causes for the sensory signals (Figure
12.3b). We assume that the two sensory signals (xA and x V) are conditionally independent of each
other. This follows from the assumption that up to the point where the signals get integrated, the
sensory signals in different modalities are processed in separate pathways and thus are corrupted
by independent noise processes. As mentioned above, this is a common assumption. The additional
assumption made here is that the auditory signal is independent of the visual source (sV) given the
auditory source (sA), and likewise for visual signal. This is based on the observation that either the
two signals are caused by the same object, in which case, the dependence of auditory signal on
the visual source is entirely captured by its dependence on the auditory source, or they are caused
by different objects, in which case, the auditory signal is entirely independent of the visual source
(likewise for visual signal). In other words, this assumption follows from the observation that there
is either a common source or independent sources. This general model of bisensory perception
(Shams et al. 2005b) results in a very simple inference rule:
P(sA, sV | xA, xV) = P(xA | sA) P(xV | sV) P(sA, sV) / P(xA, xV)    (12.1)
where the probability of the auditory and visual sources, sA and sV, given the sensory signals xA
and x V is a normalized product of the auditory likelihood (i.e., the probability of getting a signal xA
given that there is a source sA out there) and visual likelihood (i.e., the probability of getting a signal
x V given that there is a source sV) and the prior probability of sources sA and s V occurring jointly.
The joint prior probability P(sA,s V) represents the implicit knowledge that the perceptual system has
accumulated over the course of a lifetime about the statistics of auditory–visual events in the envi-
ronment. In effect, it captures the coupling between the two modalities, and therefore, how much
the two modalities will interact in the process of inference. If the two signals (e.g., the number of
flashes and beeps) have always been consistent in one’s experience, then the expectation is that they
will be highly consistent in the future, and therefore the joint prior matrix would be diagonal (only
identical values of the number of flashes and beeps are allowed; the remaining entries are zero). On the
other hand, if in one’s experience, the number of flashes and beeps are completely independent of
each other, then P(sA,sV) would be factorizable (e.g., a uniform distribution or an isotropic Gaussian
distribution) indicating that the two events have nothing to do with each other, and can take on any
values independently of each other. Therefore, by having nonzero values for both sA = sV and sA ≠
sV in this joint probability distribution, both common cause and independent cause scenarios are
allowed, and the relative strength of these probabilities would determine the prior expectation of
a common cause versus independent causes. Other recent models of multisensory integration have
also used joint prior probabilities to capture the interaction between two modalities, for example, in
haptic–visual numerosity judgment tasks (Bresciani et al. 2006) and auditory–visual rate perception
(Roach et al. 2006).
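As an illustration, the inference rule of Equation 12.1 can be simulated over a small discrete space of numerosities. All numbers below (noise widths, prior coupling strength) are hypothetical choices for this sketch, not the fitted values from the cited studies.

```python
import numpy as np

n = np.arange(1, 5)  # candidate numerosities (1-4 flashes/beeps)

def likelihood(x, s, sigma):
    """P(x | s): probability of sensory signal x given source s (Gaussian noise)."""
    return np.exp(-(x - s) ** 2 / (2 * sigma ** 2))

# Joint prior P(sA, sV): extra mass on sA == sV encodes the learned coupling
# between the modalities; off-diagonal mass allows independent causes.
prior = np.full((4, 4), 0.02)
np.fill_diagonal(prior, 0.23)
prior /= prior.sum()

def posterior(xA, xV, sigmaA=0.4, sigmaV=0.8):
    """Equation 12.1: normalized product of the two likelihoods and the joint prior."""
    LA = likelihood(xA, n, sigmaA)[:, None]  # rows index sA
    LV = likelihood(xV, n, sigmaV)[None, :]  # columns index sV
    post = LA * LV * prior
    return post / post.sum()

# One flash paired with two beeps: with audition the more reliable modality for
# numerosity here, the posterior over the visual source shifts toward 2 (fission).
post = posterior(xA=2.0, xV=1.0)
visual_marginal = post.sum(axis=0)
```

With these illustrative settings, `visual_marginal` peaks at 2 rather than 1, mirroring the sound-induced flash illusion; replacing the coupled prior with a factorizable (uniform) one removes the interaction.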
The model of Equation 12.1 is simple, general, and readily extendable to more complex situa-
tions. For example, the inference rule for trisensory perception (Figure 12.3c) would be as follows:
P(sA, sV, sT | xA, xV, xT) = P(xA | sA) P(xV | sV) P(xT | sT) P(sA, sV, sT) / P(xA, xV, xT)    (12.2)
To test the trisensory perception model of Equation 12.2, we modeled the three-dimensional joint
prior P(sA,sV,sT) with a multivariate Gaussian function, and each of the likelihood functions with
a univariate Gaussian function. The means of the likelihoods were assumed to be unbiased (i.e., on
Early Integration and Bayesian Causal Inference in Multisensory Perception 225
average at the veridical number), and the standard deviation of the likelihoods was estimated using
data from unisensory conditions. It was also assumed that the mean and variance for the prior of
the three modalities were equal, and the three covariances (for three pairs of modalities) were also
equal.* This resulted in a total of three free parameters (mean, variance, and covariance of the
prior). These parameters were fitted to the data from the trisensory numerosity judgment experi-
ment discussed earlier. The model accounted for 95% of variance in the data (676 data points) using
only three free parameters. To test whether the three parameters rendered the model too powerful
and able to account for any data set, we scrambled the data and found that the model failed badly
to account for the arbitrary data (R2 < .01). In summary, the Bayesian model of Figure 12.3c could
provide a remarkable account of the myriad two-way and three-way interactions observed in
the data.
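The three-parameter prior described above can be sketched as follows; the numerical values are illustrative stand-ins, not the parameters actually fitted to the data.

```python
import numpy as np

# The three free parameters of the prior (illustrative values only).
mu, var, cov = 2.0, 1.5, 1.2

mean = np.full(3, mu)          # equal prior means for the A, V, T sources
Sigma = np.full((3, 3), cov)   # equal covariance for each pair of modalities
np.fill_diagonal(Sigma, var)   # equal variance for each modality

def prior(s):
    """Unnormalized trivariate Gaussian P(sA, sV, sT); the normalization is
    absorbed by the denominator of Equation 12.2."""
    d = np.asarray(s, dtype=float) - mean
    return np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))

# Consistent source triples receive more prior mass than discrepant ones,
# which is what couples the three modalities during inference.
consistent, discrepant = prior([2, 2, 2]), prior([1, 2, 3])
```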
p(C = 1 | xV, xA) = p(xV, xA | C = 1) p(C = 1) / p(xV, xA)    (12.3)
According to this rule, the probability of a common cause is simply a product of two factors. The
left term in the numerator—the likelihood that the two sensory signals occur if there is a common
cause—is a function of how similar the two sensory signals are. The more dissimilar the two sig-
nals, the lower this probability will be. The right term in the numerator is the a priori expectation
of a common cause, and is a function of prior experience (how often two signals are caused by the
same source in general). The denominator again is a normalization factor.
Given this probability of a common cause, the locations of the auditory and visual stimuli can
now be computed as follows:
ŝ = p(C = 1 | xV, xA) ŝC=1 + p(C = 2 | xV, xA) ŝC=2    (12.4)
where ŝ denotes the overall estimate of the location of the sound (or visual stimulus), and ŝC=1 and ŝC=2
denote the optimal estimates of location for the scenario of a common cause and the scenario of
independent causes, respectively. The inference rule is interesting because it is a weighted average of two
optimal estimates, and it is nonlinear in xA and x V.
What does this inference rule mean? Let us focus on auditory estimation of location for example,
and assume Gaussian functions for prior and likelihood functions over space. If the task of the
observer is to judge the location of sound, then if the observer knows for certain that the auditory
and visual stimuli were caused by two independent sources (e.g., a puppeteer talking and a puppet
moving), then the optimal estimate of the location of sound would be entirely based on the auditory
* These assumptions were made to minimize the number of free parameters and maximize the parsimony of the model.
However, the assumptions were verified by fitting a model with nine parameters (allowing different values for the mean,
variance, and covariance across modalities) to the data, and finding almost equal values for all three means, all three
variances, and all three covariances.
information and the prior: ŝA,C=2 = (xA/σA² + xP/σP²) / (1/σA² + 1/σP²), where σA and σP are the standard deviations
of the auditory likelihood and the prior, respectively. On the other hand, if the observer knows for
certain that the auditory and visual stimuli were caused by the same object (e.g., a puppet talking
and moving), then the optimal estimate of the location of sound would take visual information into
account: ŝA,C=1 = (xV/σV² + xA/σA² + xP/σP²) / (1/σV² + 1/σA² + 1/σP²). In nature, the observer is hardly ever certain about the
causal structure of the events in the environment, and in fact, it is the job of the nervous system
to solve that problem. Therefore, in general, the nervous system would have to take both of these
possibilities into account; thus, the overall optimal estimate of the location of the sound is
a weighted average of the two optimal estimates, each weighted by its respective probability as computed
in Equation 12.3. It can now be understood how partial integration could result from this optimal
scheme of multisensory perception.
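Putting Equations 12.3 and 12.4 together with the Gaussian assumptions, the auditory estimate can be sketched numerically. The parameter values and the grid integration are illustrative assumptions for this sketch (the prior mean is set to 0, so the xP term drops out of the structure-conditional estimates); they are not the fitted values of Körding et al. (2007).

```python
import numpy as np

# Illustrative parameters (not fitted values from the cited studies).
sigma_V, sigma_A, sigma_P, p_common = 2.0, 8.0, 15.0, 0.5
s = np.linspace(-60.0, 60.0, 2001)  # grid over spatial locations
ds = s[1] - s[0]

def gauss(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def auditory_estimate(xV, xA):
    prior = gauss(s, 0.0, sigma_P)
    # Equation 12.3: likelihood of the signals under each causal structure.
    like_c1 = np.sum(gauss(xV, s, sigma_V) * gauss(xA, s, sigma_A) * prior) * ds
    like_c2 = (np.sum(gauss(xV, s, sigma_V) * prior) * ds
               * np.sum(gauss(xA, s, sigma_A) * prior) * ds)
    p_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))
    # Structure-conditional optimal estimates (precision-weighted means;
    # the prior mean is 0 here, so its term vanishes from the numerators).
    s_c2 = (xA / sigma_A**2) / (1 / sigma_A**2 + 1 / sigma_P**2)
    s_c1 = ((xV / sigma_V**2 + xA / sigma_A**2)
            / (1 / sigma_V**2 + 1 / sigma_A**2 + 1 / sigma_P**2))
    # Equation 12.4: model averaging.
    return p_c1 * s_c1 + (1 - p_c1) * s_c2, p_c1

# A small audiovisual discrepancy yields near-fusion (partial integration);
# a large discrepancy yields near-segregation.
est_near, p_near = auditory_estimate(xV=5.0, xA=0.0)
est_far, p_far = auditory_estimate(xV=40.0, xA=0.0)
```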
It should be noted that Equation 12.4 is derived assuming a mean squared error cost function.
This is a common assumption, and roughly speaking, it means that the nervous system tries to
minimize the average magnitude of error. The mean squared error function is minimized if the
mean of the posterior distribution is selected as the estimate. The estimate shown in Equation 12.4
corresponds to the mean of the posterior distribution, and as it is a weighted average of the estimates
of the two causal structures (i.e., ŝA,C = 2 and ŝA,C = 1), it is referred to as “model averaging.” If, on the
other hand, the goal of the perceptual system is to minimize the number of times that an error is
made, then the maximum of the posterior distribution would be the optimal estimate. In this sce-
nario, the overall estimate of location would be the estimate corresponding to the causal structure
with the higher probability, and thus, this strategy is referred to as “model selection.” Although the
model averaging strategy of Equation 12.4 provides estimates that are never entirely consistent with
either one of the two possible scenarios (i.e., with what occurs in the environment), this strategy
does minimize the magnitude of error on average (the mean squared error) more than any other
strategy, and therefore, it is optimal given the cost function.
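The two strategies differ only in how the structure-conditional estimates are combined; a minimal sketch, with hypothetical numbers for the posterior probability and the two estimates:

```python
def model_averaging(p_c1, s_c1, s_c2):
    """Posterior mean: minimizes mean squared error (Equation 12.4)."""
    return p_c1 * s_c1 + (1 - p_c1) * s_c2

def model_selection(p_c1, s_c1, s_c2):
    """Commit to the more probable causal structure: minimizes error frequency."""
    return s_c1 if p_c1 > 0.5 else s_c2

# With p(C = 1 | x) = 0.6, averaging produces an intermediate estimate that
# matches neither scenario exactly (partial integration), whereas selection
# commits fully to the common-cause estimate.
avg = model_averaging(0.6, 4.6, 0.0)  # 2.76
sel = model_selection(0.6, 4.6, 0.0)  # 4.6
```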
space (i.e., the strength of the bias for center). Because the width of the Gaussian prior over space is
a free parameter, if there is no such bias for center position, the parameter will take on a large value,
practically rendering this distribution uniform, and thus, the bias largely nonexistent.
The model accounted for 97% of variance in human observer data (1225 data points) using only
four free parameters (Körding et al. 2007). This is a remarkable fit, and as before, is not due to the
degrees of freedom of the model, as the model cannot account for arbitrary data using the same
number of free parameters. Also, if we set the values of the four parameters using commonsense
values or published data from other studies, and compare the data with the predictions of
the resulting model with no free parameters, we can still account for the data similarly well.
We tested whether model averaging (Equation 12.4) or model selection (see above) better explains
the observers’ data, and found that observers’ responses were far more consistent with model
averaging than with model selection.
In our spatial localization experiment, we did not ask participants to report their perceived causal
structure on each trial. However, Wallace and colleagues did ask their subjects to report whether
they perceived a unified source for the auditory and visual stimuli on each trial (Wallace et al. 2004).
The hierarchical causal inference model can account for their published data, both the judgments
of unity and the spatial localizations and interactions between the two modalities
(Körding et al. 2007).
We compared this model with other models of cue combination on the spatial localization data
set. The causal inference model accounts for the data substantially better than the traditional forced
fusion model of integration, and better than two recent models of integration that do not assume
forced fusion (Körding et al. 2007). One of these models was a model developed by Bresciani et al.
(2006) that assumes a Gaussian ridge distribution as the joint prior, and the other one was a model
developed by Roach et al. (2006) that assumes the sum of a uniform distribution and a Gaussian
ridge as the joint prior.
We tested the hierarchical causal inference model on the numerosity judgment data described ear-
lier. The model accounts for 86% of variance in the data (576 data points) using only four free param-
eters (Beierholm 2007). We also compared auditory–visual interactions and visual–visual interactions
in the numerosity judgment task, and found that both cross-modal and within-modality interactions
could be explained using the causal inference model, with the main difference between the two being
in the a priori expectation of a common cause (i.e., Pcommon). The prior probability of a common
cause for visual–visual condition was higher than that of the auditory–visual condition (Beierholm
2007). Hospedales and Vijayakumar (2009) have also recently shown that an adaptation of the causal
inference model for an oddity detection task accounts well for both within-modality and cross-modal
oddity detection of observers. Consistent with our results, they found the prior probability of a com-
mon cause to be higher for the within-modality task compared with the cross-modality task.
In summary, we found that the causal inference model accounts well for two complementary
sets of data (spatial localization and numerosity judgment), it accounts well for data collected by
another group, it outperforms the traditional and other contemporary models of cue combination
(on the tested data set), and it provides a unifying account of within-modality and cross-modality
integration.
likelihoods and priors are independent of each other. It is quite possible that changing the likeli-
hoods would result in a change in priors or vice versa. Given that we are able to estimate likelihoods
and priors using the causal inference model, we can empirically investigate the question of inde-
pendence of likelihoods and priors. Furthermore, it is possible that the Bayes-optimal performance
is achieved without using Bayesian inference (Maloney and Mamassian 2009). For example, it has
been described that an observer using a table-lookup mechanism can achieve near-optimal perfor-
mance using reinforcement learning (Maloney and Mamassian 2009). Because the Bayes-optimal
performance can be achieved by using different processes, it has been argued that comparing human
observer performance with a Bayesian observer in one setting alone is not sufficient as evidence
for Bayesian inference as a process model of human perception. For these reasons, Maloney and
Mamassian (2009) have proposed transfer criteria as more powerful experimental tests of Bayesian
decision theory as a process model of perception. The transfer criterion is to test whether the change
in one component of decision process (i.e., likelihood, prior, or decision rule) leaves the other com-
ponents unchanged. The idea is that if the perceptual system indeed engages in Bayesian inference, a
change in likelihoods, for example, would not affect the priors. However, if the system uses another
process, such as a table lookup, then it would fail these kinds of transfer tests.
We asked whether priors are independent of likelihoods (Beierholm et al. 2009). To address this
question, we decided to induce a strong change in the likelihoods and examine whether this would
lead to a change in priors. To induce a change in likelihoods, we manipulated the visual stimulus.
We used the spatial localization task and tested participants under two visual conditions, one with a
high-contrast visual stimulus (Gabor patch), and one with a low-contrast visual stimulus. The task,
procedure, auditory stimulus, and all other variables were identical across the two conditions that
were tested in two separate sessions. The two sessions were held 1 week apart so that, if the observers
learned the statistics of the stimuli during the first session, the effect of this learning would disappear
by the time of the second session. The change in visual contrast was drastic enough to make
performance on visual-alone trials in the low-contrast condition as much as 41% lower than in the
high-contrast condition. Performance on auditory-alone trials did not change significantly because the auditory
stimuli were unchanged. The model accounts for both sets of data very well (R2 = .97 for the high-contrast
and R2 = .84 for the low-contrast session). Therefore, the performance of the participants appears
to be Bayes-optimal in both the high-contrast and low-contrast conditions. Considering that the
performances in the two sessions were drastically different (substantially worse in the low-contrast
condition), and considering that the priors were estimated from the behavioral responses, there is no
reason to believe that the priors in these two sessions would be equal (as they are derived from very
different sets of data). Therefore, if the estimated priors do turn out to be equal between the two
sessions, that would provide strong evidence for the independence of priors from likelihoods.
If the priors are equal, then swapping them between the two sessions should not hurt the good-
ness of fit to the data. We tested this using priors estimated from the low-contrast data to predict
high-contrast data, and the priors estimated from the high-contrast data to predict the low-contrast
data. The results were surprising: the goodness of fit remained almost as good (R2 = .97 and R2 =
.81) as when using priors from the same data set (Beierholm et al. 2009). Next, we directly compared the
estimated parameters of the likelihood and prior functions for the two sessions. The model was
fitted to each individual subject’s data, and the likelihood and prior parameters were estimated
for each subject for each of the two sessions separately. Comparing the parameters across subjects
(Figure 12.4) revealed a statistically significant (P < .0005) difference only for the visual likelihood
(showing a higher degree of noise for the low-contrast condition). No other parameters (neither the
auditory likelihood nor the two prior parameters) were statistically different between the two ses-
sions. Despite a large difference between the two visual likelihoods (by >10 standard deviations), no
change was detected in either the probability of a common cause or the prior over space. Therefore,
these results suggest that priors are encoded independently of the likelihoods (Beierholm et al.
2009). These findings are consistent with those of a previous study showing that a change in a
perceptual prior transfers qualitatively to other types of stimuli (Adams et al. 2004).
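The logic of the swap test can be sketched with a simple coefficient-of-determination helper; the response arrays below are hypothetical placeholders, not the actual data from the study.

```python
import numpy as np

def r_squared(data, prediction):
    """Coefficient of determination (R^2), the goodness-of-fit measure used above."""
    data, prediction = np.asarray(data, float), np.asarray(prediction, float)
    ss_res = np.sum((data - prediction) ** 2)
    ss_tot = np.sum((data - data.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical response data and model predictions for one session, generated
# once with that session's own fitted priors and once with the priors swapped
# in from the other session. If priors are independent of likelihoods, the
# swapped fit should remain nearly as good as the original fit.
data = np.array([0.10, 0.40, 0.80, 0.60, 0.20])
own_prior_fit = np.array([0.12, 0.38, 0.79, 0.58, 0.22])
swapped_prior_fit = np.array([0.15, 0.36, 0.75, 0.60, 0.25])
```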
(Bar plot of parameter values σV, σA, σP, and Pcommon, grouped under Likelihoods and Priors; right-hand axis: percentage common.)
FIGURE 12.4 Mean prior and likelihood parameter values across participants in two experimental sessions
differing only in contrast of visual stimulus. Black and gray denote values corresponding to session with
high-contrast and low-contrast visual stimulus, respectively. Error bars correspond to standard error of mean.
(From Beierholm, U. et al., J. Vis., 9, 1–9, 2009. With permission.)
12.12 CONCLUSIONS
Together with a wealth of other accumulating findings, our behavioral findings suggest that cross-
modal interactions are ubiquitous, strong, and robust in human perceptual processing. Even visual
perception, traditionally believed to be the dominant and highly self-contained modality,
can be strongly and radically influenced by cross-modal stimulation. Our ERP, MEG, and fMRI
findings consistently show that visual processing is affected by sound at the earliest levels of corti-
cal processing, namely at V1. This modulation reflects a cross-modal integration phenomenon as
opposed to attentional modulation. Therefore, multisensory integration can occur even at these early
stages of sensory processing, in areas that have been traditionally held to be unisensory.
Cross-modal interactions depend on a number of factors, namely the temporal, spatial, and struc-
tural consistency between the stimuli. Depending on the degree of consistency between the two
stimuli, a spectrum of interactions may result, ranging from complete integration, to partial integra-
tion, to complete segregation. The entire range of cross-modal interactions can be explained by a
Bayesian model of causal inference wherein the inferred causal structure of the events in the envi-
ronment depends on the degree of consistency between the signals as well as the prior knowledge/
bias about the causal structure. Indeed, given that humans are surrounded by multiple objects and
hence multiple sources of sensory stimulation, the problem of causal inference is a fundamental
problem at the core of perception. The nervous system appears to have implemented the optimal
solution to this problem as the perception of human observers appears to be Bayes-optimal in mul-
tiple tasks, and the Bayesian causal inference model of multisensory perception presented here can
account in a unified and coherent fashion for an entire range of interactions in a multitude of tasks.
Not only does the performance of observers appear to be Bayes-optimal in multiple tasks, but the priors
also appear to be independent of the likelihoods, consistent with the notion of priors encoding the
statistics of objects and events in the environment independently of sensory representations.
REFERENCES
Adams, W.J., E.W. Graf, and M.O. Ernst. 2004. Experience can change the ‘light-from-above’ prior. Nature
Neuroscience, 7, 1057–1058.
Alais, D., and D. Burr. 2004. The ventriloquist effect results from near-optimal bimodal integration. Current
Biology, 14, 257–262.
Angelaki, D.E., Y. Gu, and G.C. Deangelis. 2009. Multisensory integration: Psychophysics, neurophysiology,
and computation. Current Opinion in Neurobiology, 19, 452–458.
Beierholm, U. 2007. Bayesian modeling of sensory cue combinations. PhD Thesis, California Institute of
Technology.
Beierholm, U., S. Quartz, and L. Shams. 2009. Bayesian priors are encoded independently of likelihoods in
human multisensory perception. Journal of Vision, 9, 1–9.
Bhattacharya, J., L. Shams, and S. Shimojo. 2002. Sound-induced illusory flash perception: Role of gamma
band responses. Neuroreport, 13, 1727–1730.
Bresciani, J.P., F. Dammeier, and M.O. Ernst. 2006. Vision and touch are automatically integrated for the per-
ception of sequences of events. Journal of Vision, 6, 554–564.
Calvert, G., P.C. Hansen, S.D. Iversen, and M.J. Brammer. 2001. Detection of audio-visual integration sites in
humans by application of electro-physiological criteria to the BOLD effect. NeuroImage, 14, 427–438.
Clavagnier, S., A. Falchier, and H. Kennedy. 2004. Long-distance feedback projections to area V1: Implications
for multisensory integration, spatial awareness, and visual consciousness. Cognitive Affective Behavioral
Neuroscience, 4, 117–126.
Diederich, A., and H. Colonius. 2007. Modeling spatial effects in visual-tactile saccadic reaction time.
Perception & Psychophysics, 69, 56–67.
Ernst, M.O., and M.S. Banks. 2002. Humans integrate visual and haptic information in a statistically optimal
fashion. Nature, 415, 429–433.
Falchier, A., S. Clavagnier, P. Barone, and H. Kennedy. 2002. Anatomical evidence of multimodal integration
in primate striate cortex. Journal of Neuroscience, 22, 5749–5759.
Gebhard, J.W., and G.H. Mowbray. 1959. On discriminating the rate of visual flicker and auditory flutter.
American Journal of Psychology, 72, 521–528.
Ghahramani, Z. 1995. Computation and psychophysics of sensorimotor integration. Ph.D. Thesis, Massachusetts
Institute of Technology.
Ghazanfar, A., and C.E. Schroeder. 2006. Is neocortex essentially multisensory? Trends in Cognitive Sciences,
10, 278–285.
Hackett, T.A., J.F. Smiley, I. Ulbert, G. Karmos, P. Lakatos, L.A. De La Mothe, and C.E. Schroeder. 2007.
Sources of somatosensory input to the caudal belt areas of auditory cortex. Perception, 36, 1419–1430.
Hospedales, T., and S. Vijayakumar. 2009. Multisensory oddity detection as Bayesian inference. PLoS ONE,
4, e4205.
Howard, I.P., and W.B. Templeton. 1966. Human Spatial Orientation, London, Wiley.
Körding, K., U. Beierholm, W.J. Ma, J.M. Tenenbaum, S. Quartz, and L. Shams. 2007. Causal inference in
multisensory perception. PLoS ONE, 2, e943.
Landy, M.S., L.T. Maloney, E.B. Johnston, and M. Young. 1995. Measurement and modeling of depth cue
combination: In defense of weak fusion. Vision Research, 35, 389–412.
Maloney, L.T., and P. Mamassian. 2009. Bayesian decision theory as a model of human visual perception:
Testing Bayesian transfer. Visual Neuroscience, 26, 147–155.
McDonald, J.J., W.A. Teder-Sälejärvi, and S.A. Hillyard. 2000. Involuntary orienting to sound improves visual
perception. Nature, 407, 906–908.
McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature, 264, 746–748.
Pascual-Leone, A., and R. Hamilton. 2001. The metamodal organization of the brain. Progress in Brain
Research, 134, 427–445.
Roach, N., J. Heron, and P. McGraw. 2006. Resolving multisensory conflict: A strategy for balancing the costs
and benefits of audio-visual integration. Proceedings of the Royal Society B: Biological Sciences,
273, 2159–2168.
Rock, I., and J. Victor. 1964. Vision and touch: An experimentally created conflict between the two senses.
Science, 143, 594–596.
Rockland, K.S., and H. Ojima. 2003. Multisensory convergence in calcarine visual areas in macaque monkey.
International Journal of Psychophysiology, 50, 19–26.
Ronsse, R., C. Miall, and S.P. Swinnen. 2009. Multisensory integration in dynamical behaviors: Maximum
likelihood estimation across bimanual skill learning. Journal of Neuroscience, 29, 8419–8428.
Scheier, C.R., R. Nijhawan, and S. Shimojo. 1999. Sound alters visual temporal resolution.
Investigative Ophthalmology and Visual Science, 40, S4169.
Sekuler, R., A.B. Sekuler, and R. Lau. 1997. Sound alters visual motion perception. Nature, 385, 308.
Shams, L., Y. Kamitani, and S. Shimojo. 2000. What you see is what you hear. Nature, 408, 788.
Shams, L., Y. Kamitani, S. Thompson, and S. Shimojo. 2001. Sound alters visual evoked potentials in humans.
Neuroreport, 12, 3849–3852.
Shams, L., Y. Kamitani, and S. Shimojo. 2002. Visual illusion induced by sound. Cognitive Brain Research,
14, 147–152.
Shams, L., S. Iwaki, A. Chawla, and J. Bhattacharya. 2005a. Early modulation of visual cortex by sound: An
MEG study. Neuroscience Letters, 378, 76–81.
Shams, L., W.J. Ma, and U. Beierholm. 2005b. Sound-induced flash illusion as an optimal percept. Neuroreport,
16, 1923–1927.
Spence, C., and J. Driver. 1997. Audiovisual links in exogenous covert spatial orienting. Perception &
Psychophysics, 59, 1–22.
Spence, C., M.E. Nicholls, N. Gillespie, and J. Driver. 1998. Cross-modal links in exogenous covert spatial
orienting between touch, audition, and vision. Perception and Psychophysics, 60, 544–557.
Stein, B.E., N. London, L.K. Wilkinson, and D.D. Price. 1996. Enhancement of perceived visual intensity by
auditory stimuli: A psychophysical analysis. Journal of Cognitive Neuroscience, 8, 497–506.
Van Beers, R.J., A.C. Sittig, and J.J. Denier van der Gon. 1999. Integration of proprioceptive and visual position
information: An experimentally supported model. Journal of Neurophysiology, 81, 1355–1364.
Violentyev, A., S. Shimojo, and L. Shams. 2005. Touch-induced visual illusion. Neuroreport, 16, 1107–1110.
Walker, J.T., and K.J. Scott. 1981. Auditory–visual conflicts in the perceived duration of lights, tones, and gaps.
Journal of Experimental Psychology: Human Perception and Performance, 7, 1327–1339.
Wallace, M.T., G.H. Roberson, W.D. Hairston, B.E. Stein, J.W. Vaughan, and J.A. Schirillo. 2004. Unifying
multisensory signals across time and space. Experimental Brain Research, 158, 252–258.
Watkins, S., L. Shams, S. Tanaka, J.-D. Haynes, and G. Rees. 2006. Sound alters activity in human V1 in asso-
ciation with illusory visual perception. NeuroImage, 31, 1247–1256.
Watkins, S., L. Shams, O. Josephs, and G. Rees. 2007. Activity in human V1 follows multisensory perception.
NeuroImage, 37, 572–578.
Wozny, D.R., U.R. Beierholm, and L. Shams. 2008. Human trimodal perception follows optimal statistical
inference. Journal of Vision, 8, 1–11.
Yuille, A.L., and H.H. Bülthoff. 1996. Bayesian decision theory and psychophysics. In Perception as Bayesian
Inference, ed. D.C. Knill and W. Richards. Cambridge, UK: Cambridge Univ. Press.
13 Characterization of
Multisensory Integration
with fMRI
Experimental Design,
Statistical Analysis, and
Interpretation
Uta Noppeney
CONTENTS
13.1 Functional Specialization: Mass-Univariate Statistical Approaches....................................234
13.1.1 Conjunction Analyses................................................................................................234
13.1.2 Max and Mean Criteria............................................................................................. 236
13.1.3 Interaction Approaches.............................................................................................. 236
13.1.3.1 Classical Interaction Design: 2 × 2 Factorial Design Manipulating
Presence versus Absence of Sensory Inputs............................................... 236
13.1.3.2 Interaction Design: 2 × 2 Factorial Design Manipulating
Informativeness or Reliability of Sensory Inputs....................................... 238
13.1.3.3 Elaborate Interaction Design: m × n Factorial Design (i.e., More than
Two Levels).................................................................................................240
13.1.3.4 Interaction Analyses Constrained by Maximum Likelihood
Estimation Model........................................................................................ 242
13.1.3.5 Combining Interaction Analyses with Max Criterion................................ 242
13.1.4 Congruency Manipulations....................................................................................... 243
13.1.5 fMRI Adaptation (or Repetition Suppression)........................................................... 243
13.2 Multisensory Representations: Multivariate Decoding and Pattern Classifier Analyses......246
13.3 Functional Integration: Effective Connectivity Analyses..................................................... 247
13.3.1 Data-Driven Effective Connectivity Analysis: Psychophysiological Interactions
and Granger Causality............................................................................................... 247
13.3.2 Hypothesis-Driven Effective Connectivity Analysis: Dynamic Causal
Modeling....................................................................................................................248
13.4 Conclusions and Future Directions........................................................................................ 249
Acknowledgments........................................................................................................................... 249
References.......................................................................................................................................249
234 The Neural Bases of Multisensory Processes
This chapter reviews the potential and limitations of functional magnetic resonance imaging (fMRI)
in characterizing the neural processes underlying multisensory integration. The neural basis of mul-
tisensory integration can be characterized from two distinct perspectives. From the perspective of
functional specialization, we aim to identify regions where information from different senses con-
verges and/or is integrated. From the perspective of functional integration, we investigate how infor-
mation from multiple sensory regions is integrated via interactions among brain regions. Combining
these two perspectives, this chapter discusses experimental design, analysis approaches, and inter-
pretational limitations of fMRI results. The first section describes univariate statistical analyses of
fMRI data and emphasizes the interpretational ambiguities of various statistical criteria that are
commonly used for the identification of multisensory integration sites. The second section explores
the potential and limitations of multivariate and pattern classifier approaches in multisensory inte-
gration. The third section introduces effective connectivity analyses that investigate how multi-
sensory integration emerges from distinct interactions among brain regions. The complementary
strengths of data-driven and hypothesis-driven effective connectivity analyses will be discussed.
We conclude by emphasizing that the combined potentials of these various analysis approaches
may help us to overcome or at least ameliorate the interpretational ambiguities associated with each
analysis when applied in isolation.
FIGURE 13.1 Conjunction design and analysis. (a) Experimental design. (1) Auditory: environmental sounds;
(2) visual: pictures or video clips. Example stimuli are presented as visual images and corresponding sound
spectrograms. (b and c) Data analysis and interpretation. (b) A region responding to auditory “and” visual
inputs when presented in isolation is identified as multisensory in a conjunction analysis. (c) A region respond-
ing only to auditory but not visual inputs is identified as unisensory in a conjunction analysis. Therefore,
conjunction analyses cannot capture modulatory interactions in which one sensory (e.g., visual) input in itself
does not elicit a response, but significantly modulates response of another sensory input (e.g., auditory). Bar
graphs represent effect for auditory (black) and visual (darker gray) stimuli, and “multisensory” (lighter gray)
effect as defined by a conjunction.
would respond to unisensory inputs from multiple sensory modalities (e.g., AV neurons to A inputs
and V inputs). (2) In the case of pure regional convergence, the blood oxygen level dependent
(BOLD) response is generated by independent populations of either auditory neurons or visual
neurons (e.g., A neurons to A and V neurons to V inputs). Given the low spatial resolution of fMRI,
both cases produce a “conjunction” BOLD response profile, i.e., regional activation that is elicited
by unisensory inputs from multiple senses. Hence, conjunction analyses cannot unambiguously
identify multisensory integration.
From a statistical perspective, it is important to note that the term "conjunction analysis" has previously been used to refer to two distinct classes of statistical tests, which were later termed (1) "global null conjunction analysis" (Friston et al. 1999, 2005) and (2) "conjunction null conjunction analysis" (Nichols et al. 2005). (1) A global null conjunction analysis generalizes the one-sided t-test
to multiple dimensions (i.e., comparable to an F-test, but unidirectional) and enables inferences about
k or more effects being present. Previous analyses based on minimum statistics have typically used
the null hypothesis that k = 0. Hence, they tested whether one or more effects were present. In the
context of multisensory integration, this sort of global null conjunction analysis tests whether “at least
one” unisensory input significantly activates a particular region or voxel (with all unisensory inputs
eliciting an effect greater than a particular minimum t value). (2) The more stringent conjunction
null conjunction analysis (implemented in most software packages) explicitly tests whether a region
is significantly activated by both classes of unisensory inputs. Hence, a conjunction null conjunc-
tion analysis forms a logical “and” operation of the two statistical comparisons. This second type of
inference, i.e., a logical “and” operation, is needed when identifying multisensory convergence with
the help of conjunction analyses. Nevertheless, because conjunction analyses were used primarily
in the early stages of fMRI multisensory research, when this distinction was not yet clearly drawn,
most of the previous research is actually based on the more liberal and, in this context, inappropriate
global null conjunction analysis. For instance, initial studies identified integration sites of motion
information by performing a global null conjunction analysis on motion effects in the visual, tactile,
and auditory domains (Bremmer et al. 2001). Future studies are advised to use the more stringent
conjunction null conjunction approach to identify regional multisensory convergence.
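Both inferences can be computed from the minimum statistic over the unisensory t-maps; only the critical value differs. The sketch below is a minimal illustration with invented t-values, and the thresholds 3.1 and 1.7 are arbitrary placeholders rather than values derived from a null distribution:

```python
import numpy as np

def conjunction_null(t_maps, t_crit):
    """Logical-AND conjunction (Nichols et al. 2005): a voxel survives
    only if EVERY unisensory t-map exceeds the single-test threshold.
    Equivalent to thresholding the minimum statistic at t_crit."""
    t_maps = np.asarray(t_maps)          # shape: (n_contrasts, n_voxels)
    return t_maps.min(axis=0) > t_crit

def global_null(t_maps, t_crit_min):
    """Global-null conjunction (Friston et al. 1999): the same minimum
    statistic is compared against a more lenient cutoff derived from the
    joint null that no effect is present, i.e., evidence that at least
    one unisensory input activates the voxel."""
    t_maps = np.asarray(t_maps)
    return t_maps.min(axis=0) > t_crit_min

# toy t-maps for auditory and visual contrasts in five voxels
t_aud = np.array([4.2, 3.5, 0.8, 5.0, 2.1])
t_vis = np.array([3.9, 0.4, 4.1, 4.7, 2.2])

# conjunction null: both effects individually significant
print(conjunction_null([t_aud, t_vis], 3.1))   # only voxels 0 and 3 survive
# global null: "at least one effect" inference, more liberal
print(global_null([t_aud, t_vis], 1.7))
```

The key point of the sketch is that the same minimum statistic supports both inferences; the conjunction null simply demands the stricter, single-test critical value.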
FIGURE 13.2 Max and mean criteria. (a) Experimental design. (1) Auditory: environmental sounds;
(2) visual: pictures or video clips; (3) audiovisual: sounds + concurrent pictures. Example stimuli are presented
as visual images and corresponding sound spectrograms. (b–d) Data analysis and interpretation. (b) A region
where audiovisual response is equal to sum of auditory and visual responses is identified as potentially multi-
sensory. However, this activation profile could equally well emerge in a region with independent auditory and
visual neuronal populations. (c and d) A "unisensory" region responding equally to auditory and audiovisual inputs but not to visual inputs is identified as unisensory by the max criterion (c), but as multisensory by the mean criterion (d). Bar graphs represent effect for auditory (black), visual (darker gray), and audiovisual (lighter gray) stimuli, and "multisensory" (gray) effect as defined by max (multisensory enhancement) or mean criteria.
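The divergence between the two criteria in panels (c) and (d) can be made concrete with hypothetical condition estimates (the beta values below are invented for illustration, not taken from any study):

```python
# hypothetical condition estimates (betas) for one voxel: a region that
# responds to auditory and audiovisual input but not to visual input
beta_A, beta_V, beta_AV = 0.8, 0.0, 0.8

# max criterion: multisensory only if AV exceeds the BEST unisensory response
max_criterion = beta_AV > max(beta_A, beta_V)      # fails -> "unisensory"

# mean criterion: multisensory if AV exceeds the AVERAGE unisensory response
mean_criterion = beta_AV > (beta_A + beta_V) / 2   # passes -> "multisensory"

print(max_criterion, mean_criterion)
```

The same response profile is thus classified as unisensory under the max criterion and multisensory under the mean criterion, which is exactly the interpretational ambiguity the figure illustrates.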
(AV – V). For example, the interaction approach investigates whether the response to an auditory
stimulus depends on the presence versus the absence of a visual stimulus. To relate the interaction
approach to the classical neurophysiological criterion of superadditivity, we can rewrite this formula
as (AV – fixation) ≠ (A – fixation) + (V – fixation) ↔ (AV + fixation) ≠ (A + V). In other words,
the response to the bisensory stimulus is different from the sum of two unisensory stimuli when
presented alone (with each stimulus evoked response being normalized relative to, e.g., prestimulus
baseline activity; Stanford et al. 2005; Perrault et al. 2005). A positive interaction identifies regions
where the bisensory response exceeds the sum of the unisensory responses—hence referred to as
a superadditive response. Similarly, subadditive (and even suppressive) effects can be identified by
negative interactions. Although previous fMRI research has largely discarded subadditive interactions for methodological reasons (Beauchamp 2005), recent neurophysiological studies have clearly revealed the relevance of both superadditive and subadditive interaction profiles for multisensory integration (Stanford et al. 2005; Laurienti et al. 2005; Stanford and Stein 2007; Sugihara et al. 2006; Avillac et al. 2007). This emphasizes the need to develop methodological approaches in fMRI that enable the interpretation of subadditive interactions.
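As a sketch, the audiovisual interaction can be expressed as a contrast vector over the four condition estimates of the 2 × 2 design; the beta values below are hypothetical placeholders for estimates from a fitted general linear model:

```python
import numpy as np

# hypothetical GLM beta estimates per condition for one voxel,
# ordered [A, V, AV, Fix]
betas = np.array([0.5, 0.4, 1.2, 0.0])

# interaction contrast (AV + Fix) - (A + V);
# positive -> superadditive, negative -> subadditive
c_interaction = np.array([-1.0, -1.0, 1.0, 1.0])

msi = c_interaction @ betas
print(msi)   # positive, i.e., superadditive in this toy voxel
```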
A BOLD response profile consistent with a significant superadditive or subadditive interaction cannot be attributed to the summation of independent auditory and visual responses within a region and hence implicates a region in multisensory integration. Furthermore, in contradistinction to the conjunction analysis, the interaction approach does not necessitate that a multisensory
region responds to unisensory input from multiple sensory modalities. Therefore, it can also capture
the modulatory interactions in which auditory input modulates the processing of visual input even
though the auditory input does not elicit a response when presented alone. However, this classical
interaction design gives rise to four major drawbacks. First, by definition, the interaction term can
only identify nonlinear combinations of modality-specific inputs, leaving out additive multisensory
integration effects that have been observed at the single neuron level. Second, for the interaction
term to be valid and unbiased, the use of "fixation" (the absence of auditory and visual information) precludes subjects from performing a task on the stimuli (Beauchamp 2005). This is because task-related activations are absent during the "fixation" condition, leading to an overestimation of the summed unisensory relative to the bisensory fMRI responses in the interaction term. Yet, even
in the absence of a task, the interaction term may be unbalanced with respect to processes that
are induced by stimuli but not during the fixation condition. For instance, stimulus-induced exog-
enous attention is likely to be enhanced for (A + V) relative to (AV + fixation). Third, subadditive
interactions may arise from nonlinearities or ceiling effects not only in the neuronal but also
in the BOLD response—rendering the interpretation ambiguous. Fourth, during the recognition of
complex environmental stimuli such as speech, objects, or actions, multisensory interactions could
emerge at multiple processing levels, ranging from the integration of low-level spatiotemporal to
higher-level object-related perceptual information. These different types of integration processes
are all included in the statistical comparison (i.e., interaction) when using a “fixation” condition
(Werner and Noppeney 2010c). Hence, a selective dissociation of integration at multiple processing
stages such as spatiotemporal and object-related information is not possible (Figure 13.3).
FIGURE 13.3 Classical interaction design: 2 × 2 factorial design manipulating presence versus absence of
sensory inputs. (a) Experimental design: 2 × 2 factorial design with the factors (1) auditory: present versus
absent; (2) visual: present versus absent. Example stimuli are presented as visual images and correspond-
ing sound spectrograms. (b–d) Data analysis and interpretation. Three activation profiles are illustrated.
(b) Superadditive interaction as indexed by a positive MSI effect. (c) Subadditive interaction as indexed by
a negative interaction term in context of audiovisual enhancement. (d) Subadditive interaction as indexed by
a negative interaction term in context of audiovisual suppression. Please note that subadditive (yet not sup-
pressive) interactions can also result from nonlinearities in BOLD response. Bar graphs represent effect for
auditory (black), visual (darker gray), and audiovisual (lighter gray) stimuli, and “multisensory” (gray) effect
as defined by audiovisual interaction (AV + Fix) – (A + V). To facilitate understanding, two additional bars are
inserted indicating sums that enter into interaction, i.e., AV + Fix and A + V.
FIGURE 13.4 Interaction design: 2 × 2 factorial design manipulating reliability of sensory inputs.
(a) Experimental design. 2 × 2 factorial design with the factors (1) auditory: reliable versus unreliable;
(2) visual: reliable versus unreliable. Example stimuli are presented as visual images and corresponding sound
spectrograms. Please note that manipulating stimulus reliability rather than presence evades the problem of the fixation condition. (b) Data analysis and interpretation. One activation profile is illustrated as an example:
superadditive interaction as indexed by a positive MSI effect.
13.1.3.3 Elaborate Interaction Design: m × n Factorial Design (i.e., More than Two Levels)
The drawbacks of the classical interaction design can be ameliorated further if the factorial design
includes more than two levels. For instance, in a 3 × 3 factorial design, auditory and visual modali-
ties may include three levels of sensory input: (1) sensory intact = Vi or Ai, (2) sensory degraded =
Vd or Ad, or (3) sensory absent (Figure 13.5). This more elaborate interaction design enables the
dissociation of audiovisual integration at multiple stages of information processing (Werner and
Noppeney 2010b). The interaction approach can thus open up the potential for a fine-grained char-
acterization of the neural processes underlying the integration of different types of audiovisual
information. In addition to enabling the estimation of interactions, it also allows us to compare
interactions across different levels. For instance, in a 3 × 3 factorial design, we can investigate
whether an additive response combination for degraded stimuli turns into subadditive response
combinations for intact stimuli by comparing the superadditivity for degraded stimuli to the superadditivity for intact stimuli (formally: AdVd + fixation – Vd – Ad > AiVi + fixation – Vi – Ai → AdVd – Vd – Ad > AiVi – Vi – Ai).
Thus, an additive integration profile at one particular sensory input level becomes an interesting
finding when it is statistically different from the integration profile (e.g., subadditive) at a different
input level. In this way, the interaction approach that is initially predicated on response nonlineari-
ties is rendered sensitive to additive combinations of unisensory responses. Testing for changes in
superadditivity (or subadditivity) across different stimulus levels can also be used as a test for the
principle of inverse effectiveness. According to the principle of inverse effectiveness, superadditiv-
ity is expected to decrease with stimulus efficacy as defined by, for instance, stimulus intensity or
informativeness. A more superadditive or less subadditive integration profile would be expected
for weak signal intensities (Stein and Stanford 2008). Finally, it should be emphasized that this
FIGURE 13.5 “Elaborate” interaction design with more than two levels. (a) Experimental design: 3 × 3 fac-
torial design with factors (1) auditory: (i) auditory intact = Ai, (ii) auditory degraded = Ad, and (iii) auditory
absent Aa; (2) visual: (i) visual intact = Vi, (ii) visual degraded = Vd, and (iii) visual absent Va. Example stimuli
are presented as visual images and corresponding sound spectrograms. (b–d) Data analysis and interpretation.
This more elaborate design enables computation of (b) interaction for intact stimuli (MSIi), (c) interaction for degraded stimuli (MSId), and (d) inverse effectiveness contrast, i.e., MSId – MSIi = (AdVd – Vd – Ad) – (AiVi – Vi – Ai), which does not depend on the fixation condition.
more complex inverse effectiveness contrast does not depend on the “fixation” condition, as that is
included on both sides of the inequality (and eliminated from the contrast). Thus, the inverse effec-
tiveness contrast is an elegant way to circumvent the problems associated with the fixation condition
mentioned above (Stevenson et al. 2009; Stevenson and James 2009; Werner and Noppeney 2010b;
also, for a related approach in which audiovisual interactions are compared between intelligible and
nonintelligible stimuli, see Lee and Noppeney 2010).
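The cancellation of the fixation condition can be verified directly by subtracting the two level-specific interaction contrast vectors; the beta values below are again hypothetical placeholders:

```python
import numpy as np

# condition order: [Ai, Vi, AiVi, Ad, Vd, AdVd, Fix]  (hypothetical betas)
betas = np.array([0.9, 0.8, 1.5, 0.3, 0.2, 0.9, 0.0])

# superadditivity at each input level: (AV + Fix) - (A + V)
c_msi_intact   = np.array([-1, -1,  1,  0,  0,  0,  1.0])
c_msi_degraded = np.array([ 0,  0,  0, -1, -1,  1,  1.0])

# inverse effectiveness: MSI_degraded - MSI_intact; the fixation weights
# (+1 and +1) cancel, so the contrast no longer depends on the baseline
c_inv_eff = c_msi_degraded - c_msi_intact
print(c_inv_eff)            # fixation weight is exactly 0
print(c_inv_eff @ betas)    # positive -> more superadditive for degraded input
```

The zero weight on the fixation regressor makes explicit why the inverse effectiveness contrast sidesteps the baseline problems discussed above.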
subsequent stimuli with identical attributes. Repetition suppression has frequently been interpreted
as the fMRI analogue of neuronal response suppression, i.e., a decrease in neuronal firing rate as
recorded in nonhuman primates (Desimone 1996). Despite current uncertainties about its underly-
ing neural mechanisms, fMRI repetition suppression has been widely used as a tool for dissociating
and mapping the various stages of sensory and cognitive processing. These fMRI experiments are
based on the rationale that the sensitivity of a brain region to variations in stimulus attributes deter-
mines the degree of repetition suppression: the more a brain region is engaged in processing and
hence sensitive to a particular stimulus feature, the more it will adapt to stimuli that are identical
with respect to this feature—even though they might vary with respect to other dimensions (Grill-
Spector and Malach 2001; Grill-Spector et al. 2006). Repetition suppression can thus be used to
define the response selectivity and invariance of neuronal populations within a region. Initial fMRI
adaptation paradigms have used simple block designs, i.e., they presented alternating blocks of
“same (adaptation)” versus “different (no adaptation)” stimuli. However, arrangement of the stimuli
in blocks introduces a strong attentional confound that renders the interpretation of the adaptation
effect difficult (even when attempts are made to maintain attention in a control task). More recent
studies have therefore used randomized fMRI adaptation paradigms that reduce attentional top-
down modulation at least to a certain degree. In addition to attentional confounds, task effects (e.g.,
response priming) need to be very tightly controlled in adaptation paradigms (for further discussion,
see Henson and Rugg 2003; Henson 2003).
In the field of multisensory integration, fMRI adaptation may be used to identify “amodal” neu-
ral representations. Thus, despite the changes in sensory modality, a multisensory or amodal region
should show fMRI adaptation when presented with identical stimuli in different sensory modalities.
For instance, by presenting identical words successively in written and spoken format, cross-modal adaptation effects were used to identify amodal or multisensory phonological representations (Noppeney et al. 2008; Hasson et al. 2007). fMRI adaptation paradigms may also be combined with
the outlined interaction approach. Here, a 2 × 2 factorial design would manipulate the repetition
of (1) visual and (2) auditory features. A region that integrates visual and auditory features is then
expected to show an interaction between the auditory and visual repetition effects, e.g., an increased
visual adaptation, if the auditory feature is also repeated (Tal and Amedi 2009). This experimental
approach has recently been used to study form and motion integration within the visual domain
(Sarkheil et al. 2008). Most commonly, fMRI adaptation is used to provide insights into subvoxel
neuronal representation. This motivation is based on the so-called fatigue model that proposes that
the fMRI adaptation effect is attributable to a “fatigue” (as indexed by decreased activity) of the
neurons initially responding to a specific stimulus (Grill-Spector and Malach 2001). For instance,
let us assume that a voxel contains populations of A and B neurons and responds equally to stimuli A and B, so that a standard paradigm would not be able to reveal selectivity for stimulus A. Yet, repetitive presentation of stimulus A will only fatigue the A-responsive neurons. Therefore, subsequent presentation of stimulus B will lead to a rebound response of the "fresh" B neurons. Thus, it was argued that fMRI adaptation can increase the spatial resolution to a subvoxel level. Along
similar lines, fMRI adaptation could potentially be used to dissociate unisensory and multisensory
neuronal populations. In the case of independent populations of visual and auditory neurons (no
multisensory neurons), after adaptation to a specific visual stimulus, a rebound in activation should
be observed when the same stimulus is presented in the auditory modality. This activation increase
should be comparable to the rebound observed when presented with a new unrelated stimulus. In
contrast, if a region contains multisensory neurons, it will adapt when presented with the same
stimulus irrespective of sensory modality. Thus, within the fatigue framework, fMRI adaptation
may help us to dissociate unisensory and multisensory neuronal populations that evade standard
analyses. However, it is likely that voxels containing visual and auditory neurons will also include
audiovisual neurons. This mixture of multiple neuronal populations within a voxel may produce a
more complex adaptation profile than illustrated in our toy example. Furthermore, given the diver-
sity of multisensory enhancement and depression profiles for concurrently presented sensory inputs,
the adaptation profile for asynchronously presented inputs from multiple modalities is not yet well
characterized—it may depend on several factors such as the temporal relationship, stimulus inten-
sity, and a voxel’s responsiveness.
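The toy example above can be sketched as a small simulation under the fatigue model. The halving of a population's gain after each response is a simplifying assumption made purely for illustration:

```python
def voxel_response(stimulus_sequence, populations):
    """Toy fatigue model: each neuronal population responds to its
    preferred modalities, and its gain is halved (hypothetical fatigue
    rule) after every stimulus it responds to.  `populations` maps a
    label to the set of modalities that population responds to."""
    gain = {label: 1.0 for label in populations}
    responses = []
    for modality in stimulus_sequence:
        r = 0.0
        for label, prefs in populations.items():
            if modality in prefs:
                r += gain[label]
                gain[label] *= 0.5       # fatigue after responding
        responses.append(r)
    return responses

# the same stimulus presented auditorily twice, then switched to vision
seq = ["A", "A", "V"]

# voxel with independent unisensory populations: V neurons stay "fresh"
unisensory = voxel_response(seq, {"A": {"A"}, "V": {"V"}})

# voxel containing a genuinely audiovisual population: keeps adapting
multisensory = voxel_response(seq, {"AV": {"A", "V"}})

print(unisensory)    # rebound on the cross-modal switch
print(multisensory)  # continued adaptation across the switch
```

In this sketch the unisensory voxel rebounds to its initial response level when the modality switches, whereas the multisensory voxel continues to adapt, which is the dissociation described in the text.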
Even in the “simple” unisensory case, the interpretation of fMRI adaptation results is impeded
by our lack of understanding of the underlying neuronal mechanisms as well as the relationship
between the decreased BOLD activation and neuronal response suppression (for review and discus-
sion, see Henson and Rugg 2003; Henson 2003). In fact, multiple models and theories have been
advanced to explain repetition suppression. (1) According to the fMRI adaptation approach (the "fatigue" model mentioned above), the number of neurons that are important for stimulus representation and processing remains constant, but these neurons show reductions in their firing rates for repeated stimuli
(Grill-Spector and Malach 2001). (2) Repetition suppression has been attributed to a sharpening
FIGURE 13.6 Cross-modal fMRI adaptation paradigm and BOLD predictions. Figure illustrates BOLD
predictions for different stimulus pairs with (1) stimulus and/or (2) sensory modality being same or differ-
ent for the two presentations. Please note that this simplistic toy example serves only to explain fundamental
principles rather than characterizing the complexity of multisensory adaptation profiles (see text for further
discussion). (a) Same stimulus, same sensory modality: decreased BOLD response is expected in unisensory,
multisensory, and amodal areas. (b) Same stimulus, different sensory modality: decreased BOLD response
is expected for higher-order “amodal” regions and not for unisensory regions. Given the complex interaction
profiles for concurrently presented sensory inputs, prediction for multisensory regions is unclear. Different
stimulus, same sensory modality (c) and different stimulus, different sensory modality (d): no fMRI adaptation is expected in unisensory, multisensory, or amodal regions.
of the cortical stimulus representations, whereby neurons that are not essential for stimulus pro-
cessing respond less for successive stimulus presentations (Wiggs and Martin 1998). (3) In neural
network models, repetition suppression is thought to be mediated by synaptic changes that decrease
the settling time of an attractor neural network (Becker et al. 1997; Stark and McClelland 2000).
(4) Finally, hierarchical models of predictive coding have proposed that response suppression reflects reduced prediction error: as the brain learns to predict the stimulus attributes on successive exposures to identical stimuli, the firing rates of stimulus-evoked error units are suppressed by top-down predictions mediated by backward connections from higher-level cortical areas (Friston 2005). The
predictive coding model raises questions about the relationship between cross-modal congruency
and adaptation effects. Both fMRI adaptation and congruency designs manipulate the “congru-
ency” between two stimuli. The two approaches primarily differ in the (a)synchrony between the
two sensory inputs. For instance, spoken words and the corresponding facial movements would be
presented synchronously in a classical congruency paradigm and sequentially in an adaptation para-
digm. The different latencies of the sensory inputs may induce distinct neural mechanisms for con-
gruency and/or adaptation effects. Yet, events in the natural environment often produce temporal
asynchronies between sensory signals. For instance, facial movements usually precede the auditory
speech signal. Furthermore, the asynchrony between visual and auditory signals depends on the
distance between signal source and observer because of differences in velocity of light and sound.
Finally, the neural processing latencies for signals from different sensory modalities depend on the
particular brain regions and stimuli, which will lead, in turn, to variations in the width and asym-
metry of temporal integration windows as a function of stimulus and region. Collectively, the vari-
ability in latency and temporal integration window suggests a continuum between "synchronous"
congruency effects and “asynchronous” adaptation effects that may rely on distinct and shared
neural mechanisms (Figure 13.6).
signals (Roebroeck et al. 2009; David et al. 2008). As a primarily data-driven approach, the analysis
estimates the Granger causal influences of a seed region on all other voxels in the brain. Because
this analysis approach does not require an a priori selection of regions of interest, it may be very
useful to generate hypotheses that may then be further evaluated on new data in a more constrained
framework. Recently, Granger causality has been used to investigate and reveal top-down influ-
ences from the STS on auditory cortex/planum temporale in the context of letter–speech sound
congruency (multivariate autoregressive models; van Atteveldt et al. 2009) and temporal synchrony
manipulations (directed information transfer; Noesselt et al. 2007). For instance, van Atteveldt et al.
(2009) have suggested that activation increases for congruent relative to incongruent letter–sound
pairs may be mediated via increased connectivity from the STS. Similarly, Granger causality has
been used to investigate the influence of somatosensory areas on the lateral occipital complex dur-
ing shape discrimination (Deshpande et al. 2010; Peltier et al. 2007).
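For illustration, bivariate Granger causality reduces to comparing a restricted autoregressive model of the target time series against an unrestricted model that adds lagged values of the candidate source. The sketch below uses simulated time series rather than fMRI data, and the coupling coefficients are arbitrary:

```python
import numpy as np

def granger_f(y, x, p=2):
    """Does time series x Granger-cause y?  Compare a restricted AR(p)
    model of y on its own past with an unrestricted model that also
    includes p lags of x, via the standard F statistic."""
    T = len(y)
    Y = y[p:]
    # build lagged design matrices (one column per lag)
    lags_y = np.column_stack([y[p - k:T - k] for k in range(1, p + 1)])
    lags_x = np.column_stack([x[p - k:T - k] for k in range(1, p + 1)])
    ones = np.ones((T - p, 1))
    X_r = np.hstack([ones, lags_y])            # restricted: past of y only
    X_u = np.hstack([ones, lags_y, lags_x])    # unrestricted: + past of x
    rss = lambda X: np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)
    rss_r, rss_u = rss(X_r), rss(X_u)
    df_num, df_den = p, (T - p) - X_u.shape[1]
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den)

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.zeros(500)
for t in range(2, 500):                        # y is driven by the past of x
    y[t] = 0.4 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.standard_normal()

print(granger_f(y, x))   # large F: x Granger-causes y
print(granger_f(x, y))   # small F: y does not Granger-cause x
```

Seed-based analyses as described in the text repeat this model comparison between the seed time series and every other voxel, which is why no a priori target region needs to be specified.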
FIGURE 13.7 Candidate dynamic causal models. (a) “Direct” influence DCM: audiovisual costimulation
modulates direct connectivity between auditory and visual regions. (b) “Indirect” influence DCM: audiovisual
costimulation modulates indirect connectivity between auditory and visual regions. STG, superior temporal
gyrus; CaS, calcarine sulcus; A, auditory input; V, visual input; AV, audiovisual input.
indirect pathways via the STS. Partitioning the model space into "direct," "indirect," or "indirect +
direct” models suggested that visual input may influence auditory processing in the superior tem-
poral gyrus via direct and indirect connectivity from visual cortices (Lewis and Noppeney 2010;
Noppeney et al. 2010; Werner and Noppeney 2010a; Figure 13.7).
ACKNOWLEDGMENTS
We thank Sebastian Werner, Richard Lewis, and Johannes Tünnerhoff for helpful comments on a
previous version of this manuscript and JT for his enormous help with preparing the figures.
REFERENCES
Adam, R., and U. Noppeney. 2010. Prior auditory information shapes visual category-selectivity in ventral occipito-temporal cortex. NeuroImage 52:1592–1602.
Allman, B.L., L.P. Keniston, and M.A. Meredith. 2009. Not just for bimodal neurons anymore: The contribution of unimodal neurons to cortical multisensory processing. Brain Topography 21:157–167.
Avillac, M., S. Ben Hamed, and J.R. Duhamel. 2007. Multisensory integration in the ventral intraparietal area of the macaque monkey. Journal of Neuroscience 27:1922–1932.
Barraclough, N.E., D. Xiao, C.I. Baker, M.W. Oram, and D.I. Perrett. 2005. Integration of visual and auditory
information by superior temporal sulcus neurons responsive to the sight of actions. Journal of Cognitive
Neuroscience 17:377–391.
Beauchamp, M.S. 2005. Statistical criteria in FMRI studies of multisensory integration. Neuroinformatics
3:93–113.
Beauchamp, M.S., B.D. Argall, J. Bodurka, J.H. Duyn, and A. Martin. 2004. Unraveling multisensory integra-
tion: Patchy organization within human STS multisensory cortex. Nature Neuroscience 7:1190–1192.
Becker, S., M. Moscovitch, M. Behrmann, and S. Joordens. 1997. Long-term semantic priming: a computa-
tional account and empirical evidence. Journal of Experimental Psychology. Learning, Memory, and
Cognition 23:1059–1082.
Bonath, B., T. Noesselt, A. Martinez, J. Mishra, K. Schwiecker, H.J. Heinze, and S.A. Hillyard. 2007. Neural
basis of the ventriloquist illusion. Current Biology 17:1697–1703.
Bremmer, F., A. Schlack, N.J. Shah et al. 2001. Polymodal motion processing in posterior parietal and premo-
tor cortex: A human fMRI study strongly implies equivalencies between humans and monkeys. Neuron
29:287–296.
Busse, L., K.C. Roberts, R.E. Crist, D.H. Weissman, and M.G. Woldorff. 2005. The spread of attention across
modalities and space in a multisensory object. Proceedings of the National Academy of Sciences of the
United States of America 102:18751–18756.
Calvert, G.A. 2001. Crossmodal processing in the human brain: Insights from functional neuroimaging studies.
Cerebral Cortex 11:1110–1123.
Calvert, G.A., R. Campbell, and M.J. Brammer. 2000. Evidence from functional magnetic resonance imaging
of crossmodal binding in the human heteromodal cortex. Current Biology 10:649–657.
Calvert, G.A., P.C. Hansen, S.D. Iversen, and M.J. Brammer. 2001. Detection of audio-visual integration
sites in humans by application of electrophysiological criteria to the BOLD effect. NeuroImage 14:
427–438.
David, O., I. Guillemain, S. Saillet et al. 2008. Identifying neural drivers with functional MRI: An electrophysi-
ological validation. PLoS Biology 6:2683–2697.
Dehner, L.R., L.P. Keniston, H.R. Clemo, and M.A. Meredith 2004. Cross-modal circuitry between auditory
and somatosensory areas of the cat anterior ectosylvian sulcal cortex: A ‘new’ inhibitory form of multi-
sensory convergence. Cerebral Cortex 14:387–403.
Deshpande, G., X. Hu, S. Lacey, R. Stilla, and K. Sathian. 2010. Object familiarity modulates effective con-
nectivity during haptic shape perception. NeuroImage 49:1991–2000.
Desimone, R. 1996. Neural mechanisms for visual memory and their role in attention. Proceedings of the
National Academy of Sciences of the United States of America 93:13494–13499.
Doehrmann, O., and M.J. Naumer. 2008. Semantics and the multisensory brain: how meaning modulates pro-
cesses of audio-visual integration. Brain Research 1242:136–150.
Driver, J., and T. Noesselt 2008. Multisensory interplay reveals crossmodal influences on ‘sensory-specific’
brain regions, neural responses, and judgments. Neuron 57:11–23.
Ernst, M.O., and M.S. Banks. 2002. Humans integrate visual and haptic information in a statistically optimal
fashion. Nature 415:429–433.
Fairhall, S.L., and E. Macaluso. 2009. Spatial attention can modulate audiovisual integration at multiple corti-
cal and subcortical sites. European Journal of Neuroscience 29:1247–1257.
Friston, K. 2005. A theory of cortical responses. Philosophical Transactions of the Royal Society of London.
Series B, Biological Sciences 360:815–836.
Friston, K., C. Chu, J. Mourao-Miranda, O. Hulme, G. Rees, W. Penny, and J. Ashburner. 2008. Bayesian
decoding of brain images. NeuroImage 39:181–205.
Friston, K.J., C. Buechel, G.R. Fink, J. Morris, E. Rolls, and R.J. Dolan. 1997. Psychophysiological and modu-
latory interactions in neuroimaging. NeuroImage 6:218–229.
Friston, K.J., L. Harrison, and W. Penny. 2003. Dynamic causal modelling. NeuroImage 19:1273–1302.
Friston, K.J., A. Holmes, K.J. Worsley, J.B. Poline, C.D. Frith, and R. Frackowiak. 1995. Statistical parametric
mapping: A general linear approach. Human Brain Mapping 2:189–210.
Friston, K.J., A.P. Holmes, C.J. Price, C. Buchel, and K.J. Worsley. 1999. Multisubject fMRI studies and con-
junction analyses. NeuroImage 10:385–396.
Friston, K.J., W.D. Penny, and D.E. Glaser. 2005. Conjunction revisited. NeuroImage 25:661–667.
Goebel, R., A. Roebroeck, D.S. Kim, and E. Formisano. 2003. Investigating directed cortical interactions in
time-resolved fMRI data using vector autoregressive modeling and Granger causality mapping. Magnetic
Resonance Imaging 21:1251–1261.
Gosselin, F., and P.G. Schyns. 2003. Superstitious perceptions reveal properties of internal representations.
Psychological Science 14:505–509.
Grill-Spector, K., and R. Malach. 2001. fMR-adaptation: A tool for studying the functional properties of human
cortical neurons. Acta Psychologica 107:293–321.
Characterization of Multisensory Integration with fMRI 251
Grill-Spector, K., R. Henson, and A. Martin. 2006. Repetition and the brain: neural models of stimulus-specific
effects. Trends in Cognitive Sciences 10:14–23.
Harrison, L., W.D. Penny, and K. Friston. 2003. Multivariate autoregressive modeling of fMRI time series.
NeuroImage 19:1477–1491.
Hasson, U., J.I. Skipper, H.C. Nusbaum, and S.L. Small. 2007. Abstract coding of audiovisual speech: Beyond
sensory representation. Neuron 56:1116–1126.
Haynes, J.D., and G. Rees. 2006. Decoding mental states from brain activity in humans. Nature Reviews.
Neuroscience 7:523–534.
Hein, G., O. Doehrmann, N.G. Muller, J. Kaiser, L. Muckli, and M.J. Naumer. 2007. Object familiar-
ity and semantic congruency modulate responses in cortical audiovisual integration areas. Journal of
Neuroscience 27:7881–7887.
Helbig, H.B., M.O. Ernst, E. Ricciardi, P. Pietrini, A. Thielscher, K.M. Mayer, J. Schultz, and U. Noppeney.
2010. Reliability of visual information modulates tactile shape processing in primary somatosensory
cortices (Submitted for publication).
Henson, R.N. 2003. Neuroimaging studies of priming. Progress in Neurobiology 70:53–81.
Henson, R.N., and M.D. Rugg. 2003. Neural response suppression, haemodynamic repetition effects, and
behavioural priming. Neuropsychologia 41:263–270.
Hinrichs, H., H.J. Heinze, and M.A. Schoenfeld. 2006. Causal visual interactions as revealed by an information
theoretic measure and fMRI. NeuroImage 31:1051–1060.
Kayser, C., C.I. Petkov, and N.K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral
Cortex 18:1560–1574.
Knill, D.C., and J.A. Saunders. 2003. Do humans optimally integrate stereo and texture information for judg-
ments of surface slant? Vision Research 43:2539–2558.
Kriegeskorte, N., R. Goebel, and P. Bandettini. 2006. Information-based functional brain mapping. Proceedings
of the National Academy of Sciences of the United States of America 103:3863–3868.
Laurienti, P.J., T.J. Perrault, T.R. Stanford, M.T. Wallace, and B.E. Stein. 2005. On the use of superadditivity
as a metric for characterizing multisensory integration in functional neuroimaging studies. Experimental
Brain Research 166:289–297.
Lee, H., and U. Noppeney. Physical and perceptual factors shape the neural mechanisms that integrate audiovi-
sual signals in speech comprehension (submitted for publication).
Lewis, R., and U. Noppeney. 2010. Audiovisual synchrony improves motion discrimination via enhanced con-
nectivity between early visual and auditory areas. Journal of Neuroscience 30:12329–12339.
Macaluso, E., C.D. Frith, and J. Driver. 2000. Modulation of human visual cortex by crossmodal spatial atten-
tion. Science 289:1206–1208.
Meredith, M.A., and B.L. Allman. 2009. Subthreshold multisensory processing in cat auditory cortex.
Neuroreport 20:126–131.
Nandy, R.R., and D. Cordes. 2003. Novel nonparametric approach to canonical correlation analysis with appli-
cations to low CNR functional MRI data. Magnetic Resonance in Medicine 50:354–365.
Nichols, T., M. Brett, J. Andersson, T. Wager, and J.B. Poline. 2005 Valid conjunction inference with the mini-
mum statistic. NeuroImage 25:653–660.
Noesselt, T., J.W. Rieger, M.A. Schoenfeld et al. 2007. Audiovisual temporal correspondence modulates
human multisensory superior temporal sulcus plus primary sensory cortices. Journal of Neuroscience
27:11431–11441.
Noppeney, U., K. Friston, and C. Price. 2003. Effects of visual deprivation on the organisation of the semantic
system. Brain 126:1620–1627.
Noppeney, U., O. Josephs, J. Hocking, C.J. Price, and K.J. Friston. 2008. The effect of prior visual information
on recognition of speech and sounds. Cerebral Cortex 18:598–609.
Noppeney, U., D. Ostwald, S. Werner. 2010. Perceptual decisions formed by accumulation of audiovisual evi-
dence in prefrontal cortex. Journal of Neuroscience 30:7434–7446.
Peltier, S., R. Stilla, E. Mariola, S. LaConte, X. Hu, and K. Sathian. 2007. Activity and effective connectivity of
parietal and occipital cortical regions during haptic shape perception. Neuropsychologia 45:476–483.
Penny, W.D., K.E. Stephan, A. Mechelli, and K.J. Friston. 2004. Comparing dynamic causal models. NeuroImage
22:1157–1172.
Pereira, F., T. Mitchell, and M. Botvinick. 2009. Machine learning classifiers and fMRI: A tutorial overview.
NeuroImage 45:S199–S209.
Perrault Jr., T.J., J.W. Vaughan, B.E. Stein, and M.T. Wallace. 2005. Superior colliculus neurons use dis-
tinct operational modes in the integration of multisensory stimuli. Journal of Neurophysiology 93:
2575–2586.
252 The Neural Bases of Multisensory Processes
Price, C.J., R.J. Wise, and R.S. Frackowiak. 1996. Demonstrating the implicit processing of visually presented
words and pseudowords. Cerebral Cortex 6:62–70.
Roebroeck, A., E. Formisano, and R. Goebel. 2005. Mapping directed influence over the brain using Granger
causality and fMRI. NeuroImage 25:230–242.
Roebroeck, A., E. Formisano, and R. Goebel. 2009. The identification of interacting networks in the brain using
fMRI: Model selection, causality and deconvolution. NeuroImage.
Sadaghiani, S., J.X. Maier, and U. Noppeney. 2009. Natural, metaphoric, and linguistic auditory direction sig-
nals have distinct influences on visual motion processing. Journal of Neuroscience 29:6490–6499.
Sarkheil, P., Q.C. Vuong, H.H. Bulthoff, and U. Noppeney. 2008. The integration of higher order form and
motion by the human brain. NeuroImage 42:1529–1536.
Schroeder, C.E., and J. Foxe. 2005. Multisensory contributions to low-level, ‘unisensory’ processing. Current
Opinion in Neurobiology 15:454–458.
Seltzer, B., M.G. Cola, C. Gutierrez, M. Massee, C. Weldon, and C.G. Cusick. 1996. Overlapping and non-
overlapping cortical projections to cortex of the superior temporal sulcus in the rhesus monkey: Double
anterograde tracer studies. Journal of Comparative Neurology 370:173–190.
Stanford, T.R., and B.E. Stein. 2007. Superadditivity in multisensory integration: Putting the computation in
context. Neuroreport 18:787–792.
Stanford, T.R., S. Quessy, B.E. Stein. 2005. Evaluating the operations underlying multisensory integration in
the cat superior colliculus. Journal of Neuroscience 25:6499–6508.
Stark, C.E., and J.L. McClelland. 2000. Repetition priming of words, pseudowords, and nonwords. Journal of
Experimental Psychology. Learning, Memory, and Cognition 26:945–972.
Stein, B.E., and M.A. Meredith. 1993. The Merging of the Senses. Cambridge, MA: MIT Press.
Stein, B.E., and T.R. Stanford. 2008. Multisensory integration: current issues from the perspective of the single
neuron. Nature Reviews. Neuroscience 9:255–266.
Stein, B.E., T.R. Stanford, R. Ramachandran, T.J. Perrault Jr., and B.A. Rowland. 2009. Challenges in quantify-
ing multisensory integration: alternative criteria, models, and inverse effectiveness. Experimental Brain
Research 198(2–3):131–126.
Stephan, K.E., W.D. Penny, J. Daunizeau, R.J. Moran, and K.J. Friston. 2009. Bayesian model selection for
group studies. NeuroImage 46(4):1004–1017. Erratum in NeuroImage 48(1):311.
Stevenson, R.A., and T.W. James. 2009. Audiovisual integration in human superior temporal sulcus: Inverse
effectiveness and the neural processing of speech and object recognition. NeuroImage 44:1210–1223.
Stevenson, R.A., S. Kim, and T.W. James. 2009. An additive-factors design to disambiguate neuronal and areal
convergence: Measuring multisensory interactions between audio, visual, and haptic sensory streams
using fMRI. Experimental Brain Research 198(2–3):183–194
Sugihara, T., M.D. Diltz, B.B. Averbeck, and L.M. Romanski. 2006. Integration of auditory and visual
communication information in the primate ventrolateral prefrontal cortex. Journal of Neuroscience
26:11138–11147.
Tal, N., and A. Amedi. 2009. Multisensory visual-tactile object related network in humans: insights gained
using a novel crossmodal adaptation approach. Experimental Brain Research 198:165–182.
van Atteveldt, N., E. Formisano, R. Goebel, and L. Blomert. 2004. Integration of letters and speech sounds in
the human brain. Neuron 43:271–282.
van Atteveldt, N.M., E. Formisano, L. Blomert, and R. Goebel. 2007a. The effect of temporal asynchrony on
the multisensory integration of letters and speech sounds. Cerebral Cortex 17:962–974.
van Atteveldt, N.M., E. Formisano, R. Goebel, and L. Blomert. 2007b. Top-down task effects overrule automatic
multisensory responses to letter-sound pairs in auditory association cortex. NeuroImage 36:1345–1360.
van Atteveldt, N., A. Roebroeck, and R. Goebel. 2009. Interaction of speech and script in human auditory cor-
tex: Insights from neuro-imaging and effective connectivity. Hearing Research 258(1–2):152–164
Wallace, M.T., R. Ramachandran, and B.E. Stein. 2004. A revised view of sensory cortical parcellation.
Proceedings of the National Academy of Sciences of the United States of America 101:2167–2172.
Werner, S., and U. Noppeney. 2010a. Distinct functional contributions of primary sensory and association areas
to audiovisual integration in object categorization. Journal of Neuroscience 30:2662–2675.
Werner, S., and U Noppeney. 2010b. Superadditive responses in superior temporal sulcus predict audiovisual
benefits in object categorization. Cerebral Cortex 20:1829–1842.
Werner, S., and U. Noppeney. 2010c. The contributions of transient and sustained response codes to audiovisual
integration. Cerebral Cortex 21(4):920–931.
Wiggs, C.L., and A. Martin. 1998. Properties and mechanisms of perceptual priming. Current Opinion in
Neurobiology 8:227–233.
14 Modeling Multisensory Processes in Saccadic Responses: Time-Window-of-Integration Model

Adele Diederich and Hans Colonius
CONTENTS
14.1 Summary............................................................................................................................... 253
14.2 Multisensory Processes Measured through Response Time.................................................254
14.3 TWIN Modeling.................................................................................................................... 255
14.3.1 Basic Assumptions..................................................................................................... 255
14.3.2 Quantifying Multisensory Integration in the TWIN Model...................................... 257
14.3.3 Some General Predictions of TWIN......................................................................... 257
14.4 TWIN Models for Specific Paradigms: Assumptions and Predictions................................. 258
14.4.1 Measuring Cross-Modal Effects in Focused Attention and Redundant Target
Paradigms.................................................................................................................. 258
14.4.2 TWIN Model for the FAP......................................................................................... 259
14.4.2.1 TWIN Predictions for the FAP...................................................................260
14.4.3 TWIN Model for RTP............................................................................................... 263
14.4.3.1 TWIN Predictions for RTP.........................................................................264
14.4.4 Focused Attention versus RTP.................................................................................. 265
14.5 TWIN Model for Focused Attention: Including a Warning Mechanism..............................266
14.5.1 TWIN Predictions for FAP with Warning................................................................268
14.6 Conclusions: Open Questions and Future Directions............................................................ 270
Appendix A..................................................................................................................................... 271
A.1 Deriving the Probability of Interaction in TWIN................................................................. 271
A.1.1 Focused Attention Paradigm..................................................................................... 271
A.1.2 Redundant Target Paradigm...................................................................................... 272
A.1.3 Focused Attention and Warning................................................................................ 273
References....................................................................................................................................... 274
14.1 SUMMARY
Multisensory research within experimental psychology has led to the emergence of a number of
lawful relations between response speed and various empirical conditions of the experimental setup
(spatiotemporal stimulus configuration, intensity, number of modalities involved, type of instruc-
tion, and so forth). This chapter presents a conceptual framework to account for the effects of
cross-modal stimulation on response speed. Although our framework applies to measures of cross-modal response speed in general, here we focus on modeling saccadic reaction time as a measure of orientation performance toward cross-modal stimuli.
The central postulate is the existence of a critical “time-window-of-integration” (TWIN) con-
trolling the combination of information from different modalities. It is demonstrated that a few
basic assumptions about this timing mechanism imply a remarkable number of empirically testable
predictions. After introducing a general version of the TWIN model framework, we present various
specifications and extensions of the original model that are geared toward more specific experi-
mental paradigms. Our emphasis will be on predictions and empirical testability of these model
versions, but for experimental data, we refer the reader to the original literature.
those to auditory or somatosensory stimuli. Note also that, as the superior colliculus is an important site
of oculomotor control (e.g., Munoz and Wurtz 1995), measuring saccadic responses is an obvious
choice for studying the behavioral consequences of multisensory integration.
assumed to be stochastically independent random variables. This leads to the first postulate of the
TWIN model:
(B1) First Stage Assumption: The first stage consists in a (stochastically independent) race among the
peripheral processes in the visual, auditory, and/or somatosensory pathways triggered by a cross-modal
stimulus complex.
The existence of a critical “spatiotemporal window” for multisensory integration to occur has been
suggested by several authors, based on both neurophysiological and behavioral findings in humans,
monkey, and cat (e.g., Bell et al. 2005; Meredith 2002; Corneil et al. 2002; Meredith et al. 1987; see
Navarra et al. 2005 for a recent behavioral study). This integration may manifest itself in the form of
an increased firing rate of a multisensory neuron (relative to unimodal stimulation), an acceleration
of saccadic reaction time (Frens et al. 1995; Diederich et al. 2003), an effective audiovisual speech
integration (Van Wassenhove et al. 2007), or in an improved or degraded judgment of temporal
order of bimodal stimulus pairs (cf. Spence and Squire 2003).
One of the basic tenets of the TWIN framework, however, is the priority of temporal proximity
over any other type of proximity: rather than assuming a joint spatiotemporal window of integra-
tion permitting interaction to occur only for both spatially and temporally neighboring stimuli, the
TWIN model allows for cross-modal interaction to occur, for example, even for spatially rather
distant stimuli of different modalities as long as they fall within the time window.
(B2) TWIN Assumption: Multisensory integration occurs only if the peripheral processes of the first
stage all terminate within a given temporal interval, the TWIN.
In other words, a visual and an auditory stimulus may occur at the same spatial location, or the lip
movements of a speaker may be perfectly consistent with the utterance, but no intersensory interaction
effect will be possible if the data from the two sensory channels are registered too distant from
each other in time. Thus, the window acts like a filter determining whether afferent information
delivered from different sensory organs is registered close enough in time to allow for multisensory
integration. Note that passing the filter is a necessary, but not sufficient, condition for multisensory
integration to occur. The reason is that the amount of multisensory integration also depends on other
aspects of the stimulus set, such as the spatial configuration of the stimuli. For example, response
depression may occur with nearly simultaneous but distant stimuli, making it easier for the organ-
ism to focus attention on the more important event. In other cases, multisensory integration may fail
to occur—despite near-simultaneity of the unisensory events—because the a priori probability for
a cross-modal event is very small (e.g., Körding et al. 2007).
Although the priority of temporal proximity seems to afford more flexibility for an organism
in a complex environment, the next assumption delimits the role of temporal proximity to the first
processing stage:
(B3) Assumption of Temporal Separability: The amount of interaction manifesting itself in an
increase or decrease of second stage processing time is a function of cross-modal stimulus features, but
it does not depend on the presentation asynchrony (stimulus onset asynchrony, SOA) of the stimuli.
This assumption is based on a distinction between intra- and cross-modal stimulus properties,
where the properties may refer to both subjective and physical properties. Cross-modal properties
are defined when stimuli of more than one modality are present, such as spatial distance of target
to nontarget, or subjective similarity between stimuli of different modalities. Intramodal proper-
ties, on the other hand, refer to properties definable for a single stimulus, regardless of whether this
property is definable in all modalities (such as intensity) or in only one modality (such as wavelength
for color or frequency for pitch). Intramodal properties can affect the outcome of the race in the
first stage and, thereby, the probability of an interaction. Cross-modal properties may affect the
amount of cross-modal interaction occurring in the second stage. Note that cross-modal features
cannot influence first stage processing time because the stimuli are still being processed in separate
pathways.
(B4) Second Stage Assumption: The second stage comprises all processes after the first stage includ-
ing preparation and execution of a response.
The assumption of only two stages is certainly an oversimplification. Note, however, that the second
stage is defined here by default: it includes all subsequent, possibly overlapping, processes that are
not part of the peripheral processes in the first stage (for a similar approach, see Van Opstal and
Munoz 2004). Thus, the TWIN model retains the classic notion of a race mechanism as an explana-
tion for cross-modal interaction but restricts it to the very first stage of stimulus processing.
The total reaction time is the sum of the two stage durations, RT = S1 + S2, where S1 and S2 refer to first and second stage processing time, respectively (a base time would also be subsumed under S2). Let I denote the event that multisensory integration occurs, having probability P(I). For the expected reaction time in the cross-modal condition it then follows that

E[RTcrossmodal] = E[S1] + E[S2]
               = E[S1] + P(I) · E[S2 | I] + (1 − P(I)) · E[S2 | Ic],

where E[S2 | I] and E[S2 | Ic] denote the expected second stage processing time conditioned on interaction occurring (I) or not occurring (Ic), respectively. Putting Δ ≡ E[S2 | Ic] − E[S2 | I], this becomes

E[RTcrossmodal] = E[S1] + E[S2 | Ic] − P(I) · Δ. (14.2)
That is, mean RT to cross-modal stimuli is the sum of mean RT of the first stage processing time,
mean RT of the second stage processing when no interaction occurs, and the term P(I) · Δ, which
is a measure of the expected amount of intersensory interaction in the second stage with positive Δ
values corresponding to facilitation, and negative values corresponding to inhibition.
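The factorization into P(I) and Δ lends itself to a minimal numerical sketch; all timing values below are hypothetical and serve only to illustrate the decomposition:

```python
def expected_crossmodal_rt(e_s1, e_s2_no_int, p_int, delta):
    """TWIN decomposition of expected cross-modal reaction time:
    E[RT] = E[S1] + E[S2 | no interaction] - P(I) * Delta.
    Positive delta corresponds to facilitation, negative delta to inhibition."""
    return e_s1 + e_s2_no_int - p_int * delta

# Hypothetical values: 60 ms first (peripheral) stage, 140 ms second stage
# without interaction, 50% probability of integration, 40 ms facilitation.
rt = expected_crossmodal_rt(60.0, 140.0, 0.5, 40.0)
print(rt)  # 180.0, i.e., a 20 ms mean benefit over the 200 ms baseline
```

With P(I) = 0 the expression reduces to the no-interaction baseline, which is how the unimodal conditions are treated below.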
This factorization of expected intersensory interaction into the probability of interaction P(I)
and the amount and sign of interaction (Δ) is an important feature of the TWIN model. According
to Assumptions B1 to B4, the first factor, P(I), depends on the temporal configuration of the stimuli
(SOA), whereas the second factor, Δ, depends on nontemporal aspects, in particular their spatial
configuration. Note that this separation of temporal and nontemporal factors is in accordance with
the definition of the window of integration: the incidence of multisensory integration hinges on the
stimuli to occur in temporal proximity, whereas the amount and sign of interaction (Δ) is modulated
by nontemporal aspects, such as semantic congruity or spatial proximity reaching, in the latter case,
from enhancement for neighboring stimuli to possible inhibition for distant stimuli (cf. Diederich
and Colonius 2007b).
experimental cross-modal paradigms. Nonetheless, even at the general level of the framework intro-
duced thus far, a number of qualitative empirical predictions of TWIN are possible.
SOA effects. The amount of cross-modal interaction should depend on the SOA between the
stimuli because the probability of integration, P(I), changes with SOA. Let us assume that two stim-
uli from different modalities differ considerably in their peripheral processing times. If the faster
stimulus is delayed (in terms of SOA) so that the arrival times of both stimuli have a high probability
of falling into the window of integration, then the amount of cross-modal interaction should be larg-
est for that value of SOA (see, e.g., Frens et al. 1995; Colonius and Arndt 2001).
Intensity effects. Stimuli of high intensity have relatively fast peripheral processing times.
Therefore, for example, if a stimulus from one modality has a high intensity compared to a stimulus
from the other modality, the chance that both peripheral processes terminate within the time win-
dow will be small, assuming simultaneous stimulus presentations. The resulting low value of P(I) is
in line with the empirical observation that a very strong signal will effectively rule out any further
reduction of saccadic RT by adding a stimulus from another modality (e.g., Corneil et al. 2002).
Cross-modal effects. The amount of multisensory integration (Δ) and its sign (facilitation or inhi-
bition) occurring in the second stage depend on cross-modal features of the stimulus set, for exam-
ple, spatial disparity and laterality (laterality here refers to whether all stimuli appear in the same
hemisphere). Cross-modal features cannot have an influence on first stage processing time because
the modalities are being processed in separate pathways. Conversely, because the parameter Δ does not depend on SOA, it cannot change its sign as a function of SOA; the model therefore cannot simultaneously predict facilitation for some SOA values and inhibition for others. Some empirical evidence against this prediction has been observed (Diederich and Colonius 2008).
In the classic race model, the addition of a stimulus from a modality not yet present will increase
(or, at least, not decrease) the amount of response facilitation. This follows from the fact that—
even without assuming stochastic independence—the probability of the fastest of several processes
terminating processing before time t will increase with the number of “racers” (e.g., Colonius and
Vorberg 1994). In the case of TWIN, both facilitation and inhibition are possible under certain
conditions as follows:
Number of modalities effect. The addition of a stimulus from a modality not yet present will
increase (or, at least, not decrease) the expected amount of interaction if the added stimulus is not
“too fast” and the time window is not “too small.” The latter restrictions are meant to guarantee that
the added stimulus will fall into the time window, thereby increasing the probability of interaction
to occur.
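The classic statistical-facilitation argument can be checked by simulation. The exponential processing-time distributions and the rate values below are illustrative assumptions, not part of the TWIN specification:

```python
import random

random.seed(1)

def p_fastest_before(t, rates, n=100_000):
    """Monte Carlo estimate of P(the fastest of several independent
    exponential 'racers' finishes before time t)."""
    wins = 0
    for _ in range(n):
        finish = min(random.expovariate(r) for r in rates)
        if finish < t:
            wins += 1
    return wins / n

# Adding a racer can only increase the chance that the fastest finishes by t.
p_uni = p_fastest_before(100.0, [1 / 120])           # one modality
p_bi = p_fastest_before(100.0, [1 / 120, 1 / 150])   # add a second modality
print(p_uni < p_bi)  # True
```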
In the redundant target paradigm (RTP; also known as the divided attention paradigm), stimuli
from different modalities are presented simultaneously or with certain SOA, and the participant
is instructed to respond to the stimulus detected first. Typically, the time to respond in the cross-
modal condition is faster than in either of the unimodal conditions. In the focused attention para-
digm (FAP), cross-modal stimulus sets are presented in the same manner, but now participants are
instructed to respond only to the onset of a stimulus from a specifically defined target modality,
such as the visual, and to ignore the remaining nontarget stimulus (the tactile or the auditory). In
the latter setting, when a stimulus of a nontarget modality, for example, a tone, appears before the
visual target at some spatial disparity, there is no overt response to the tone if the participant is
following the task instructions. Nevertheless, the nontarget stimulus has been shown to modulate
the saccadic response to the target: depending on the exact spatiotemporal configuration of target
and nontarget, the effect can be a speedup or an inhibition of saccadic RT (see, e.g., Amlôt et al.
2003; Diederich and Colonius 2007b), and the saccadic trajectory can be affected as well (Doyle
and Walker 2002).
Some striking similarities to human data have been found in a detection task utilizing both
paradigms. Stein et al. (1988) trained cats to orient to visual or auditory stimuli, or both. In one
paradigm, the target was a visual stimulus (a dimly illuminated LED) and the animal learned that
although an auditory stimulus (a brief, low-intensity broadband noise) would be presented periodi-
cally, responses to it would never be rewarded, and the cats learned to “ignore” it (FAP). Visual–
auditory stimuli were always presented spatially coincident, but their location varied from trial to
trial. The weak visual stimulus was difficult to detect and the cats’ performance was <50% correct
detection. However, combining the visual stimulus with the neutral auditory stimulus markedly
enhanced performance, regardless of their position. A similar result was obtained when animals
learned that both stimuli were potential targets (RTP). In a separate experiment in which the visual
and the (neutral) auditory stimuli were spatially disparate, however, performance was significantly
worse than when the visual stimulus was presented alone (cf. Stein et al. 2004).
A common method to assess the amount of cross-modal interaction is to use a measure that
relates mean RT in cross-modal conditions to that in the unimodal condition. The following defini-
tions quantify the percentage of RT enhancement in analogy to a measure proposed for measuring
multisensory enhancement in neural responses (cf. Meredith and Stein 1986; Anastasio et al. 2000;
Colonius and Diederich 2002; Diederich and Colonius 2004a, 2004b). For visual, auditory, and
visual–auditory stimuli with observed mean (saccadic or manual) reaction time, RTV, RTA, and
RTVA, respectively, and SOA = τ, the multisensory response enhancement (MRE) for the redundant target task is defined as

MRERTP = [min(RTV, RTA) − RTAV,τ] / min(RTV, RTA) · 100, (14.3)

where RTAV,τ refers to observed mean RT to the bimodal stimulus with SOA = τ. For the focused attention task, MRE is defined as
MREFAP = (RTV − RTVA,τ) / RTV · 100. (14.4)
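The focused-attention measure in Equation 14.4 is a one-line computation; the reaction times below are hypothetical illustrative values:

```python
def mre_fap(rt_v, rt_va_tau):
    """Multisensory response enhancement for the focused attention task
    (Equation 14.4): percentage speedup of the bimodal condition relative
    to the unimodal visual condition."""
    return (rt_v - rt_va_tau) / rt_v * 100

# Hypothetical mean saccadic RTs in ms: 200 (visual alone) vs. 180
# (visual target plus auditory nontarget at some SOA).
print(mre_fap(200.0, 180.0))  # 10.0 percent enhancement
```

A bimodal RT slower than the unimodal one yields a negative MRE, i.e., inhibition.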
The idea here is that the winning nontarget will keep the saccadic system in a state of heightened
reactivity such that the upcoming target stimulus, if it falls into the time window, will trigger cross-
modal interaction. At the neural level, this may correspond to a gradual inhibition of fixation neu-
rons (in the superior colliculus) and/or omnipause neurons (in the midline pontine brain stem). In
the case of the target being the winner, no discernible effect on saccadic RT is predicted, such as in
the unimodal situation.
The race in the first stage of the model is made explicit by assigning statistically independent,
nonnegative random variables V and A to the peripheral processing times, for example, for a visual
target and an auditory nontarget stimulus, respectively. With τ as SOA value and ω as integration window width parameter, Assumption B2-FAP amounts to the event that multisensory integration occurs, IFAP, being

IFAP = {A + τ < V < A + τ + ω},

i.e., the nontarget wins the race and the target terminates before the window closes.
Thus, the probability of integration to occur, P(IFAP), is a function of both τ and ω, and it can be
determined numerically once the distribution functions of A and V have been specified.
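Once distributions are chosen, P(IFAP) can be estimated by Monte Carlo simulation. The sketch below assumes exponentially distributed peripheral processing times for illustration; the means, window width, and SOA values are hypothetical:

```python
import random

random.seed(7)

def p_integration_fap(tau, omega, mean_v=80.0, mean_a=60.0, n=200_000):
    """Monte Carlo estimate of P(I_FAP): the auditory nontarget (A, onset
    shifted by SOA tau) wins the race against the visual target (V), and V
    terminates before the integration window of width omega closes."""
    hits = 0
    for _ in range(n):
        v = random.expovariate(1 / mean_v)  # visual peripheral time
        a = random.expovariate(1 / mean_a)  # auditory peripheral time
        if a + tau < v < a + tau + omega:
            hits += 1
    return hits / n

# P(I_FAP) should be largest for midrange SOAs and fall off at the extremes.
probs = {tau: p_integration_fap(tau, omega=200.0) for tau in (-300, 0, 300)}
print(probs[0] > probs[-300] and probs[0] > probs[300])  # True
```

This reproduces the SOA prediction discussed below: a nontarget presented much too early or much too late rarely opens a window that the target can still fall into.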
Expected reaction time in the bimodal condition then is (cf. Equation 14.2)
E[RTVA,τ] = E[V] + E[S2 | IcFAP] − P(IFAP) · Δ. (14.5)
No interaction is possible in the unimodal condition. Thus, the expected reaction time for the visual
(target) stimulus condition is
E[RTV] = E[V] + E[S2 | IcFAP]. (14.6)
Note that in the focused attention task, the first stage duration is defined as the time it takes to process the (visual) target stimulus, E[V]. Cross-modal interaction (CI) is defined as the difference between mean RT to the unimodal and cross-modal stimuli, i.e.,

CI = E[RTV] − E[RTVA,τ] = P(IFAP) · Δ. (14.7)
Thus, the separation of temporal and nontemporal factors expressed in the above equation for the
observable CI is directly inherited from Assumptions B4 and B2-FAP.
The case of negative Δ values is taken up in the final part of this section. Unless specifically mentioned otherwise, we assume nonnegative Δ values in the following elaborations.
SOA effects. When the nontarget is presented very late relative to the target (large positive SOA),
its chance of winning the race against the target, and thus of opening the window of integration, becomes
very small. When it is presented rather early (large negative SOA), it is likely to win the race and
to open the window, but the window may be closed by the time the target arrives. Again, the prob-
ability of integration, P(IFAP), is small. Therefore, the largest probability of integration is expected
for some midrange SOA values. Although P(IFAP) is unobservable, it should leave its mark on a well-known observable measure, i.e., MRE. In fact, MREFAP, defined in Equation 14.4 as a function of SOA, should have the same form as P(IFAP), scaled only by some constant:

MREFAP = [(RTV − RTVA,τ) · 100] / RTV
= [P(IFAP) · Δ / RTV] · 100 (14.8)
= P(IFAP) · Δ · const.
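This prediction can be illustrated numerically. The sketch below (our helper names; illustrative parameter values) uses the closed-form P(IFAP) for exponential peripheral processing times derived in the Appendix:

```python
import math

def p_i_fap(lam_v, lam_a, tau, omega):
    """Closed-form P(I_FAP) for exponential peripheral processing times
    (three SOA cases; cf. Appendix A.1)."""
    if tau + omega <= 0:
        return lam_v / (lam_v + lam_a) * math.exp(lam_a * tau) * (math.exp(lam_a * omega) - 1)
    if tau < 0:
        return (lam_a * (1 - math.exp(-lam_v * (omega + tau)))
                + lam_v * (1 - math.exp(lam_a * tau))) / (lam_v + lam_a)
    return lam_a / (lam_v + lam_a) * (math.exp(-lam_v * tau) - math.exp(-lam_v * (tau + omega)))

def mre_fap(tau, lam_v=1/50, lam_a=1/30, omega=200, mu=100, delta=20):
    """MRE_FAP = P(I_FAP) * delta / RT_V * 100 (cf. Equation 14.8)."""
    rt_v = 1 / lam_v + mu                  # unimodal visual mean RT
    return p_i_fap(lam_v, lam_a, tau, omega) * delta / rt_v * 100

# MRE is predicted to peak at midrange (negative) SOAs and vanish at the extremes
curve = {tau: mre_fap(tau) for tau in (-400, -100, 0, 200)}
```

As predicted, the curve is largest at the midrange SOA and nearly zero at both extremes.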
Intensity effects. Increasing the intensity of the visual stimulus will speed up visual peripheral
processing (down to some minimum processing time), thereby increasing the chance for the visual target to win the
race. Thus, the probability that the window of integration opens decreases, predicting less multisensory integration. Increasing the intensity of the nontarget auditory stimulus, on the other hand, leads
to the opposite prediction: the auditory stimulus will have a better chance to win the race and to
open the window of integration, hence, predicting more multisensory integration to occur on aver-
age. Two further distinctions can be made. For large negative SOA, i.e., when the auditory nontarget
arrives very early, further increasing the auditory intensity makes it more likely for the TWIN to
close before the target arrives and therefore results in a lower P(IFAP) value. For smaller negative
SOA, however, i.e., when the nontarget is presented shortly before the target, increasing the auditory
intensity improves its chances to win against the target and to open the window. Given the com-
plexity of these intensity effects, however, more specific quantitative predictions will require some
distributional assumptions for the first stage processing times (see below). Alternatively, it may be
feasible to adapt the “double factorial paradigm” developed by Townsend and Nozawa (1995) to
analyze predictions when the effects of both targets and nontargets presented at two different intensity levels are observed.
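These qualitative intensity predictions can be checked by simulation. A sketch (our names; illustrative values), again assuming exponential peripheral processing times:

```python
import random

def p_i_fap_mc(lam_v, lam_a, tau, omega, n=300_000, seed=7):
    """Monte Carlo P(I_FAP) = P(A + tau < V < A + tau + omega)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        v = rng.expovariate(lam_v)
        a = rng.expovariate(lam_a) + tau
        hits += a < v < a + omega
    return hits / n

# Auditory intensity (1/lam_a = mean peripheral processing time in ms):
loud_late  = p_i_fap_mc(1/50, 1/10, tau=-50,  omega=200)  # loud, small negative SOA
soft_late  = p_i_fap_mc(1/50, 1/90, tau=-50,  omega=200)
loud_early = p_i_fap_mc(1/50, 1/10, tau=-400, omega=200)  # loud, large negative SOA
soft_early = p_i_fap_mc(1/50, 1/90, tau=-400, omega=200)
# Visual intensity:
bright = p_i_fap_mc(1/25, 1/30, tau=0, omega=200)
dim    = p_i_fap_mc(1/50, 1/30, tau=0, omega=200)
```

The simulation reproduces both regimes: a louder nontarget helps integration at small negative SOAs but hurts it at large negative SOAs (the window closes too early), while a more intense visual target lowers the probability of integration.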
Cross-modal effects. If target and nontarget are presented in two distinct cross-modal condi-
tions, one would expect parameter Δ to take on two different values. For example, for two spatial
conditions, ipsilateral and contralateral, the values could be Δi and Δc, respectively. Subtracting the
corresponding cross-modal interaction terms then gives (cf. Equation 14.7)

CIi − CIc = P(IFAP) · (Δi − Δc), (14.9)

an expression that should again yield the same qualitative behavior, as a function of SOA, as P(IFAP).
In a similar vein, one can capitalize on the factorization of expected cross-modal interaction if some
additional experimental factor affecting Δ, but not P(IFAP), is available. In Colonius et al. (2009), an
auditory background masker stimulus, presented at increasing intensity levels, was hypothesized to
simultaneously increase Δc and decrease Δi. The ratio of CIs in both configurations,

CIi / CIc = [P(IFAP) · Δi] / [P(IFAP) · Δc] = Δi / Δc, (14.10)

262 The Neural Bases of Multisensory Processes
should then remain invariant across SOA values, with a separate value for each level of the
masker.
Number of nontargets effects. For cross-modal interaction to occur in the focused attention task,
it is necessary that the nontarget process wins the race in the first stage. With two or more nontar-
gets entering the race, the probability of one of them winning against the target process increases
and, therefore, the probability of opening the window of integration increases with the number of
nontargets present. In this case, there are even two different ways of utilizing the factorization of CI,
both requiring the existence of two cross-modal conditions with two different Δ parameters (spatial
or other). The first test is analogous to the previous one. Because the number of nontargets affects
P(IFAP) only, the ratio in Equation 14.10 should be the same whether it is computed from conditions
with one or two nontargets. The second test results from taking the ratio of CI based on one nontarget, CI1, over CI based on two nontargets, CI2. Because Δ should not be affected by the number of
nontargets, the ratio

CI1 / CI2 = (P1 · Δ) / (P2 · Δ) = P1 / P2, (14.11)

where P1 and P2 refer to the probability of opening the window under one or two nontargets, respectively, should be the same, no matter from which one of the two cross-modal conditions it was
computed. In the study of Diederich and Colonius (2007a), neither of these tests revealed evidence
against these TWIN predictions.
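The effect of a second nontarget can be illustrated by a small simulation in which the window is opened by the fastest nontarget — a modeling assumption consistent with the race described above (helper names are ours; values illustrative):

```python
import random

def window_probs(lam_v, lam_nt_rates, tau, omega, n=300_000, seed=3):
    """Monte Carlo probabilities that (a) some nontarget wins the race, opening
    the window, and (b) the target additionally falls inside the window.
    lam_nt_rates holds one exponential rate per nontarget."""
    rng = random.Random(seed)
    opened = integrated = 0
    for _ in range(n):
        v = rng.expovariate(lam_v)
        first_nt = min(rng.expovariate(r) for r in lam_nt_rates) + tau
        if first_nt < v:
            opened += 1
            if v < first_nt + omega:
                integrated += 1
    return opened / n, integrated / n

one = window_probs(1/50, [1/30], tau=0, omega=200)        # one auditory nontarget
two = window_probs(1/50, [1/30, 1/30], tau=0, omega=200)  # two nontargets
```

With two nontargets, both the probability of opening the window and the probability of integration increase, as the model predicts.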
SOA and intensity effects predicted by a parametric TWIN version. Assuming exponential dis-
tributions for the peripheral processing times, the intensity parameter for the visual modality is set
to 1/λV = 50 (ms) and to 1/λA = 10, 30, 70, or 90 (ms) for the auditory nontarget. Quantitative predic-
tions of TWIN for focused attention are shown in the left of Figure 14.1. Panels 1 and 2 show mean
RT and P(IFAP) as a function of SOA for the various intensities of the auditory nontarget. Note that, relative to mean unimodal RT to the visual target, all bimodal conditions show some facilitation, its amount depending jointly on nontarget intensity and SOA. Here, the parameter for second stage processing time
when no integration occurs, μ, was set to 100 ms. The width of the time window of integration was set to ω = 200 ms. The parameter for
multisensory integration was set to Δi = 20 ms for bimodal stimuli presented ipsilaterally, implying
a facilitation effect. Note that neither λV nor μ are directly observable, but the sum of the peripheral
and central processing time for the visual target stimulus constitutes a prediction for unimodal
mean saccadic RT:

E[RTV] = 1/λV + μ,
which, for the present example, is 50 ms + 100 ms = 150 ms. The dashed line and the dotted line
show the bimodal RT predictions for the auditory nontargets with the highest and lowest intensity,
respectively.
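Under these parametric assumptions the predicted mean saccadic RTs can be computed directly; a sketch using the closed-form P(IFAP) from the Appendix (function names are ours; values as in the text):

```python
import math

def p_i_fap(lam_v, lam_a, tau, omega):
    """Closed-form P(I_FAP) for exponential peripheral times (Appendix A.1)."""
    if tau + omega <= 0:
        return lam_v / (lam_v + lam_a) * math.exp(lam_a * tau) * (math.exp(lam_a * omega) - 1)
    if tau < 0:
        return (lam_a * (1 - math.exp(-lam_v * (omega + tau)))
                + lam_v * (1 - math.exp(lam_a * tau))) / (lam_v + lam_a)
    return lam_a / (lam_v + lam_a) * (math.exp(-lam_v * tau) - math.exp(-lam_v * (tau + omega)))

def mean_rt_fap(tau, lam_v=1/50, lam_a=1/10, omega=200, mu=100, delta=20):
    """Bimodal prediction E[RT_VA,tau] = 1/lam_v + mu - P(I_FAP) * delta
    (cf. Equation 14.5); the unimodal prediction is 1/lam_v + mu = 150 ms here."""
    return 1 / lam_v + mu - p_i_fap(lam_v, lam_a, tau, omega) * delta
```

With a facilitatory Δ = 20 ms, every bimodal prediction lies at or below the 150-ms unimodal mean, approaching it when the SOA makes integration unlikely.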
No fits to empirical data sets are presented here, but good support of TWIN has been found thus
far (see, e.g., Diederich and Colonius 2007a, 2007b, 2008; Diederich et al. 2008). Close correspon-
dence between data and model prediction, however, is not the only aspect to consider. Importantly,
the pattern of parameter values estimated for a given experimental setting should suggest a mean-
ingful interpretation. For example, increasing stimulus intensities are reflected in a decrease of the
corresponding λ parameters, assuming higher intensities to lead to faster peripheral processing
times (at least, within certain limits). Furthermore, in the study with an auditory background masker
(Colonius et al. 2009), the cross-modal interaction parameter (Δ) was a decreasing or increasing
function of masker level for the contralateral or ipsilateral condition, respectively, as predicted.
Modeling Multisensory Processes in Saccadic Responses 263
FIGURE 14.1 TWIN predictions for FAP (left panels) and RTP (right panels). Parameters in both paradigms
were chosen to be identical. Mean RT for visual stimulus is 150 ms (1/λV = 50, μ = 100). Peripheral process-
ing times for auditory stimuli are 1/λA = 10 ms (dashed line), 1/λA = 30 ms (solid), 1/λA = 70 ms (dash-dotted),
and 1/λA = 90 ms (dotted). Interaction parameter is Δ = 20 ms.
To compute the probability of interaction in the RTP, P(IRTP), we assume that a visual and an
auditory stimulus are presented with an SOA equal to τ. Then, either the visual stimulus wins, V <
A + τ, or the auditory stimulus wins, A + τ < V; so, in either case, min(V, A + τ) < max(V, A + τ) and,
by Assumption B2-RTP, the event that multisensory integration occurs is

IRTP = {max(V, A + τ) < min(V, A + τ) + ω}.
Thus, the probability of integration to occur is a function of both τ and ω, as before. Expected reac-
tion time in the cross-modal condition is computed as (see Equation 14.2)

E[RTVA,τ] = E[min(V, A + τ)] + E[S2 | IcRTP] − P(IRTP) · Δ. (14.12)

In the RTP, first stage duration is determined by the termination time of the winner. This is an
important difference to the focused attention situation in which first stage duration is defined by
the time it takes to process the (visual) target stimulus. Even for a zero probability of interaction,
expected reaction time in the bimodal condition is smaller than, or equal to, either of the unimodal
stimulus conditions. These are

E[RTV] = E[V] + E[S2 | IcRTP] (14.13)

and

E[RTA] = E[A] + E[S2 | IcRTP], (14.14)

because in the redundant target version of TWIN, the race in the first stage produces a statistical
facilitation effect equivalent to the one in the classic race model. Thus, a possible cross-modal
enhancement observed in a redundant target task may be because of multisensory integration or sta-
tistical facilitation, or both. Moreover, a possible cross-modal inhibition effect may be weakened by
the simultaneous presence of statistical facilitation in the first stage. Predictions for the redundant
target case are less straightforward than for focused attention because the factorization of cross-
modal interaction (CI) in the latter is no longer valid. Nevertheless, some general predictions can be
made assuming, as before, a multisensory facilitation effect, i.e., Δ > 0.
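A hedged sketch of the RTP first stage (our names; exponential stage-1 times assumed; values illustrative) shows that statistical facilitation alone, i.e., with Δ = 0, already produces bimodal mean RTs below both unimodal means:

```python
import random

def mean_rt_rtp(lam_v, lam_a, tau, omega, mu, delta, n=300_000, seed=5):
    """Monte Carlo mean RT in the redundant target paradigm: stage 1 ends with
    the race winner, min(V, A + tau); if both peripheral processes terminate
    within omega of each other, stage 2 is shortened by delta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        v = rng.expovariate(lam_v)
        a = rng.expovariate(lam_a) + tau
        rt = min(v, a) + mu
        if abs(v - a) < omega:
            rt -= delta
        total += rt
    return total / n

# With delta = 0 (no integration at all), statistical facilitation alone makes the
# bimodal mean faster than both unimodal means (150 ms visual, 110 ms auditory here)
bimodal_race_only = mean_rt_rtp(1/50, 1/10, tau=0, omega=200, mu=100, delta=0)
```

This is exactly why observed redundant-target enhancement may reflect integration, statistical facilitation, or both.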
SOA and intensity effects predicted by a parametric TWIN version. Figure 14.1 (right panels)
shows the quantitative predictions of TWIN for SOA and intensity variations under exponential
distributions for the peripheral processing times. Parameters are the same as for the FAP predic-
tions (left panels). Panels 1 and 2 show mean RT and P(I) as a function of SOA for various intensity
levels (λ parameters) of the auditory stimulus. Both panels exhibit the predicted monotonicity in
SOA and intensity. The third panel, depicting MRE, reveals some nonmonotonic behavior in both
SOA and intensity.
Without going into numerical details, this nonmonotonicity of MRE can be seen to be because
of a subtle interaction between two mechanisms, both being involved in the generation of MRE:
(1) statistical facilitation occurring in the first stage and (2) opening of the time window. The former
is maximal if presentation of the stimulus processed faster is delayed by an SOA equal to the differ-
ence in mean RT in the unimodal stimulus conditions, that is when peripheral processing times are
in physiological synchrony; for example, if mean RT to an auditory stimulus is 110 ms and mean
RT to a visual stimulus is 150 ms, the maximal amount of statistical facilitation is expected when
the auditory stimulus is presented 150 ms – 110 ms = 40 ms after the visual stimulus. The SOA
value being “optimal” for statistical facilitation, however, need not be the one producing the highest
probability of opening the time window that was shown to be decreasing with SOA. Moreover, the
nonmonotonicity in intensity becomes plausible if one realizes that variation in intensity results in a
change in mean processing time analogous to an SOA effect: for example, lowering auditory stimu-
lus intensity has an effect on statistical facilitation and the probability of opening the time window
that is comparable to increasing SOA.
Comparing the two paradigms, the integration event under focused attention is contained in the integration event under redundant targets: any realization of the peripheral processing times that leads to an opening of the time
window under the focused attention instruction also leads to the same event under the redundant
target instruction. Thus, the probability of integration under redundant target instructions cannot
be smaller than that under focused attention instruction: P(IFAP) ≤ P(IRTP), given identical stimulus
conditions (see also Figure 14.1).
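The ordering P(IFAP) ≤ P(IRTP) can be verified on shared simulated trials, since every realization in the FAP integration event also lies in the RTP integration event (sketch with our names; illustrative parameters):

```python
import random

def integration_probs(lam_v, lam_a, tau, omega, n=300_000, seed=11):
    """Monte Carlo P(I_FAP) and P(I_RTP) on the same simulated trials.
    I_FAP = {A + tau < V < A + tau + omega}            (nontarget must win)
    I_RTP = {max(V, A+tau) - min(V, A+tau) < omega}    (either stimulus may win)
    I_FAP is contained in I_RTP, hence P(I_FAP) <= P(I_RTP)."""
    rng = random.Random(seed)
    fap = rtp = 0
    for _ in range(n):
        v = rng.expovariate(lam_v)
        a = rng.expovariate(lam_a) + tau
        in_fap = a < v < a + omega
        in_rtp = abs(v - a) < omega
        assert not in_fap or in_rtp        # event inclusion holds trial by trial
        fap += in_fap
        rtp += in_rtp
    return fap / n, rtp / n

p_fap, p_rtp = integration_probs(1/50, 1/30, tau=0, omega=200)
```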
Inverse effectiveness. It is instructive to consider the effect of varying stimulus intensity in both
paradigms when both stimuli are presented simultaneously (SOA = 0) and at intensity levels produc-
ing the same mean peripheral speed, i.e., with the same intensity parameters, λV = λA. Assuming
exponential distributions, Figure 14.2 depicts the probability of integration (upper panels) and MRE
(lower panels) as a function of time window width (ω) for both paradigms, with each curve representing a specific intensity level. The probability of integration increases monotonically from zero
(for ω = 0) toward 0.5 for the focused attention, and toward 1 for the RTP. For the former, the prob-
ability of integration cannot surpass 0.5 because, for any given window width, the target process has
the same chance of winning as the nontarget process under the given λ parameters. For both para-
digms, P(I), as a function of ω, is ordered with respect to intensity level: it increases monotonically
FIGURE 14.2 TWIN predictions for FAP (left panels) and RTP (right panels) as a function of time win-
dow width (ω) at SOA = 0. Upper panels depict probability of integration P(I), whereas lower panels show
MRE. Each curve corresponds to a specific intensity parameter of stimuli. Peripheral processing times for
auditory and visual stimuli are 1/λA = 1/λV equal to 30 ms (dashed line), 50 ms (solid), 70 ms (dash-dotted),
and 90 ms (black dotted). Mean second stage processing time is μ = 100 ms. Interaction parameter is
Δ = 20 ms.
with the mean processing time of both stimuli* (upper panels of Figure 14.2). The same ordering is
found for MRE in the FAP; somewhat surprisingly, however, the ordering is reversed for MRE in
the RTP: increasing intensity implies less enhancement, i.e., it exhibits the “inverse effectiveness”
property often reported in empirical studies (Stein and Meredith 1993; Rowland and Stein 2008).
Similar to the above discussion of intensity effects for RTP, this is because of an interaction gener-
ated by increasing intensity: it weakens statistical facilitation in first stage processing but simultane-
ously increases the probability of integration.
* This is because of a property of the exponential distribution: mean and SD are identical.
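The inverse effectiveness prediction for the RTP can be reproduced by simulation (SOA = 0, equal intensities, MRE computed against the common unimodal mean; helper names are ours, values illustrative):

```python
import random

def mre_rtp(lam, omega=100, mu=100, delta=20, n=300_000, seed=13):
    """Monte Carlo MRE in the redundant target paradigm at SOA = 0 with equally
    intense stimuli (common exponential rate lam for both modalities)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        v = rng.expovariate(lam)
        a = rng.expovariate(lam)
        total += min(v, a) + mu - (delta if abs(v - a) < omega else 0)
    rt_bimodal = total / n
    rt_unimodal = 1 / lam + mu             # both unimodal means are equal here
    return (rt_unimodal - rt_bimodal) / rt_unimodal * 100

mre_strong = mre_rtp(lam=1/30)   # more intense (faster) stimuli
mre_weak = mre_rtp(lam=1/90)     # less intense (slower) stimuli
```

Lower intensity yields the larger MRE, the inverse effectiveness pattern described in the text.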
and Colonius (2007a). This decrease, however, no longer depended on whether target and nontarget
appeared at ipsilateral or contralateral positions, thus supporting the hypothesis that the nontarget
plays the role of a spatially unspecific alerting cue, or warning signal, for the upcoming target when-
ever the SOA is large enough.
The hypothesis of increased cross-modal processing triggered by an alerting cue had already
been advanced by Nickerson (1973), who called it “preparation enhancement.” In the eye movement
literature, the effects of a warning signal have been studied primarily in the context of explaining
the “gap effect,” i.e., the latency to initiate a saccade to an eccentric target is reduced by extinguish-
ing the fixation stimulus approximately 200 ms before target onset (Reuter-Lorenz et al. 1991;
Klein and Kingstone 1993). An early study on the effect of auditory or visual warning signals on
saccade latency, but without considering multisensory integration effects, was conducted by Ross
and Ross (1981).
Here, the dual role of the nontarget—inducing multisensory integration that is governed by the
above-mentioned spatiotemporal rules, on the one hand, and acting as a spatially unspecific cross-
modal warning cue, on the other—will be taken into account by an extension of TWIN that yields
an estimate of the relative contribution of either mechanism for any specific SOA value.
(W) Assumption on warning mechanism: If the nontarget wins the processing race in the first stage
by a margin wide enough for the TWIN to be closed again before the arrival of the target, then subse-
quent processing will be facilitated or inhibited (“warning effect”) without dependence on the spatial
configuration of the stimuli.*
The time margin by which the nontarget may win against the target will be called head start denoted
as γ. The assumption stipulates that the head start is at least as large as the width of the time window
for a warning effect to occur. That is, the warning mechanism of the nontarget is triggered when-
ever the nontarget wins the race by a head start γ ≥ ω ≥ 0. Taking, for concreteness, the auditory as
nontarget modality, occurrence of a warning effect corresponds to the event:
W = {A + τ + γ < V}.
The probability of warning to occur, P(W), is a function of both τ and γ. Because γ ≥ ω ≥ 0, this
precludes the simultaneous occurrence of both warning and multisensory interaction within one
and the same trial and, therefore, P(I ∩ W) = 0 (because no confusion can arise, we write I for IFAP
throughout this section). The actual value of the head start criterion is a parameter to be estimated
in fitting the model under Assumption W.
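Assumption W can be sketched as follows; with γ ≥ ω the integration and warning events are disjoint by construction (exponential stage-1 times assumed; names and parameter values are illustrative):

```python
import random

def twin_warning_probs(lam_v, lam_a, tau, omega, gamma, n=300_000, seed=17):
    """Monte Carlo P(I), P(W), and P(I and W) in TWIN with warning:
    I = {A + tau < V < A + tau + omega}   (integration)
    W = {A + tau + gamma < V}             (nontarget wins by head start gamma)
    With gamma >= omega the two events cannot co-occur within a trial."""
    rng = random.Random(seed)
    c_i = c_w = c_both = 0
    for _ in range(n):
        v = rng.expovariate(lam_v)
        a = rng.expovariate(lam_a) + tau
        in_i = a < v < a + omega
        in_w = a + gamma < v
        c_i += in_i
        c_w += in_w
        c_both += in_i and in_w
    return c_i / n, c_w / n, c_both / n

p_i, p_w, p_both = twin_warning_probs(1/50, 1/30, tau=-200, omega=200, gamma=250)
```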
The expected saccadic reaction time in the cross-modal condition in the TWIN model with
warning assumption can then be shown to be
E[RTcross-modal] = E[S1] + E[S2]
= E[S1] + E[S2 | Ic ∩ Wc] − P(I) · {E[S2 | Ic ∩ Wc] − E[S2 | I]} − P(W) · {E[S2 | Ic ∩ Wc] − E[S2 | W]},
* In the study of Diederich and Colonius 2008, an alternative version of this assumption was considered as well (ver-
sion B). If the nontarget wins the processing race in the first stage by a wide enough margin, then subsequent processing
will in part be facilitated or inhibited without dependence on the spatial configuration of the stimuli. This version is less
restrictive: All that is needed for the nontarget to act as a warning signal is a “large enough” headstart against the target
in the race and P(I ∩ W) can be larger than 0. Assuming that the effects on RT of the two events I and W, integration and
warning, combine additively, it can then be shown that the cross-modal interaction prediction of this model version is
captured by the same equation as under the original version, i.e., Equation 14.17 below. The only difference is in the order
restriction for the parameters, γ ≥ ω. Up to now, no empirical evidence has been found in favor of one of the two versions
over the other.
where E[S2|I], E[S2|W], and E[S2|Ic ∩ Wc] denote the expected second stage processing time condi-
tioned on interaction occurring (I), warning occurring (W), or neither of them occurring (Ic ∩ Wc),
respectively (Ic, Wc stand for the complements of events I, W). Setting

Δ ≡ E[S2 | Ic ∩ Wc] − E[S2 | I],
κ ≡ E[S2 | Ic ∩ Wc] − E[S2 | W],

where κ denotes the amount of the warning effect (in milliseconds), this becomes

E[RTcross-modal] = E[S1] + E[S2 | Ic ∩ Wc] − P(I) · Δ − P(W) · κ.

In the unimodal condition, neither integration nor warning is possible. Thus,

E[RTunimodal] = E[S1] + E[S2 | Ic ∩ Wc],

and we arrive at a simple expression for the combined effect of multisensory integration and warning, cross-modal interaction (CI),

CI = E[RTunimodal] − E[RTcross-modal] = P(I) · Δ + P(W) · κ. (14.17)
Recall that the basic assumptions of TWIN imply that for a given spatial configuration and nontar-
get modality, there are no sign reversals or changes in magnitude of Δ across all SOA values. The
same holds for κ. Note, however, that Δ and κ can separately take on positive or negative values
(or zero) depending on whether multisensory integration and warning have a facilitative or inhibitory effect. Furthermore, like the probability of integration P(I), the probability of warning P(W) changes with SOA.
Subtracting CI between the ipsilateral and contralateral configurations cancels the spatially unspecific warning term and leaves P(I) · (Δi − Δc). This expression is an observable function of SOA and, because the factor Δi − Δc does not depend
on SOA by Assumption B3, it should exhibit the same functional form as P(I): increasing and then
decreasing (see Figure 14.1, middle left panel).
Context effects. The magnitude of the warning effect may be influenced by the experimental design.
Specifically, suppose nontargets from different modalities are presented in two distinct presentation modes, e.g., blocking or mixing the modality of the auditory and tactile nontargets within an experimental block of
trials, such that supposedly no changes in the expected amount of multisensory integration occur. Subtracting the corresponding CI values then yields, after canceling the integration effect terms,

CIblocked − CImixed = P(W) · (κblocked − κmixed),

a quantity that should decrease monotonically with SOA because P(W) does.
The extension of the model to include warning effects has been probed for both auditory and tac-
tile nontargets. Concerning the warning assumptions, no clear superiority of version A over version
FIGURE 14.3 TWIN predictions for FAP when only warning occurs (left panels) and when both integra-
tion and warning occur (right panels). Parameters are chosen as before: 1/λV = 50 and μ = 100, resulting in
a mean RT for visual stimulus of 150 ms. Peripheral processing times for auditory stimuli are 1/λA = 10 ms
(dashed line), 1/λA = 30 ms (solid), 1/λA = 70 ms (dash-dotted), and 1/λA = 90 ms (black dotted).
B was found in the data. For detailed results on all of the tests described above, we refer the reader
to Diederich and Colonius (2008).
SOA and intensity: quantitative predictions. To illustrate the predictions of TWIN with warning
for mean SRT, we choose the following set of parameters. As before, the intensity parameter for the
visual modality is set to 1/λV = 50 (ms) and to 1/λA = 10, 30, 70, or 90 (ms) for the (auditory) nontar-
get, the parameter for second stage processing time when no integration and no warning occurs, μ ≡
E[S2|Ic ∩ Wc], is set to 100 ms, and the window width ω to 200 ms. The parameter for multisensory integration
is set to Δi = 20 ms for bimodal stimuli presented ipsilaterally, and κ is set to 5 ms (Figure 14.3).
* Strictly speaking, this only holds for the focused attention version of TWIN; for the redundant target version, an estimate
of the amount of statistical facilitation is required and can be attained empirically (cf. Colonius and Diederich 2006).
One of the most intriguing neurophysiological findings has been the suppression of multisensory
integration ability of superior colliculus neurons by a temporary suspension of corticotectal inputs
from the anterior ectosylvian sulcus and the lateral suprasylvian sulcus (Clemo and Stein 1986; Jiang
et al. 2001). A concomitant effect on multisensory orientation behavior observed in the cat (Jiang
et al. 2002) suggests the existence of more general cortical influences on multisensory integration.
Currently, there is no explicit provision of a top-down mechanism in the TWIN framework. Note,
however, that the influence of task instruction (FAP vs. RTP) is implicitly incorporated in TWIN
because the probability of integration is supposed to be computed differently under otherwise iden-
tical stimulus conditions (cf. Section 14.4.4). It is a challenge for future development to demonstrate
that the explicit incorporation of top-down processes can be reconciled with the two-stage structure
of the TWIN framework.
APPENDIX A
A.1 DERIVING THE PROBABILITY OF INTERACTION IN TWIN
The peripheral processing times V for the visual and A for the auditory stimulus have an exponential
distribution with parameters λV and λA, respectively. That is,
fV (t ) = λV e − λV t ,
fA (t ) = λA e − λA t
for t ≥ 0, and f V(t) = fA(t) ≡ 0 for t < 0. The corresponding distribution functions are referred to as
FV(t) and FA(t).
P(IFAP) = P(A + τ < V < A + τ + ω)

= ∫_0^∞ fA(x) {FV(x + τ + ω) − FV(x + τ)} dx,
where τ denotes the SOA value and ω is the width of the integration window. Computing the integral
expression requires distinguishing three cases, depending on the signs of τ and τ + ω:
(1) τ + ω ≤ 0:

P(IFAP) = ∫_{−τ−ω}^{−τ} λA e^{−λA x} {1 − e^{−λV(x+τ+ω)}} dx + ∫_{−τ}^{∞} λA e^{−λA x} {e^{−λV(x+τ)} − e^{−λV(x+τ+ω)}} dx

= [λV / (λV + λA)] · e^{λA τ} (e^{λA ω} − 1);
(2) −ω ≤ τ < 0:

P(IFAP) = ∫_0^{−τ} λA e^{−λA x} {1 − e^{−λV(x+τ+ω)}} dx + ∫_{−τ}^{∞} λA e^{−λA x} {e^{−λV(x+τ)} − e^{−λV(x+τ+ω)}} dx

= [1 / (λV + λA)] · {λA (1 − e^{−λV(ω+τ)}) + λV (1 − e^{λA τ})};
(3) τ ≥ 0:

P(IFAP) = ∫_0^{∞} λA e^{−λA x} {e^{−λV(x+τ)} − e^{−λV(x+τ+ω)}} dx

= [λA / (λV + λA)] · {e^{−λV τ} − e^{−λV(ω+τ)}}.
The mean RT in the bimodal (focused attention) condition then is

E[RTVA,τ] = 1/λV + μ − P(IFAP) · Δ,

and the mean RT for the visual target is

E[RTV] = 1/λV + μ,

where 1/λV, the mean of the exponential distribution, is the mean RT of the first stage and μ is the
mean RT of the second stage when no interaction occurs.
For the redundant target paradigm, P(IRTP) is the sum of the probability that the visual stimulus wins the race and integration occurs, P(IRTP,V) = P(V < A + τ < V + ω), and the corresponding probability for the auditory stimulus, P(IRTP,A). For τ ≥ 0, two cases arise for P(IRTP,V):

(1) 0 ≤ τ ≤ ω:

P(IRTP,V) = ∫_0^{τ} λV e^{−λV x} (1 − e^{−λA(x+ω−τ)}) dx + ∫_{τ}^{∞} λV e^{−λV x} {(1 − e^{−λA(x+ω−τ)}) − (1 − e^{−λA(x−τ)})} dx

= [1 / (λV + λA)] · {λV (1 − e^{−λA(ω−τ)}) + λA (1 − e^{−λV τ})};
(2) 0 < ω ≤ τ:

P(IRTP,V) = ∫_{τ−ω}^{τ} λV e^{−λV x} (1 − e^{−λA(x+ω−τ)}) dx + ∫_{τ}^{∞} λV e^{−λV x} {(1 − e^{−λA(x+ω−τ)}) − (1 − e^{−λA(x−τ)})} dx

= [λA / (λV + λA)] · e^{−λV τ} (e^{λV ω} − 1).
For the auditory stimulus winning the race (τ ≥ 0),

P(IRTP,A) = ∫_0^{∞} λA e^{−λA x} {e^{−λV(x+τ)} − e^{−λV(x+τ+ω)}} dx

= [λA / (λV + λA)] · {e^{−λV τ} − e^{−λV(ω+τ)}}.
The probability of integration is then P(IRTP) = P(IRTP,V) + P(IRTP,A). Because first stage duration is the processing time of the race winner, with E[min(V, A + τ)] = 1/λV − e^{−λV τ} · (1/λV − 1/(λV + λA)) for τ ≥ 0, the expected bimodal reaction time is

E[RTVA,τ] = 1/λV − e^{−λV τ} · (1/λV − 1/(λV + λA)) + μ − P(IRTP) · Δ,

and the unimodal mean RT for the auditory stimulus is

E[RTA] = 1/λA + μ.
For the warning mechanism,

P(W) = P(A + τ + γA < V)

= ∫_0^∞ fA(x) {1 − FV(x + τ + γA)} dx

= 1 − ∫_0^∞ fA(x) FV(x + τ + γA) dx.

Two cases arise:
(1) τ + γA < 0:

P(W) = 1 − ∫_{−τ−γA}^{∞} λA e^{−λA a} {1 − e^{−λV(a+τ+γA)}} da

= 1 − [λV / (λV + λA)] · e^{λA(τ+γA)};
(2) τ + γA ≥ 0:

P(W) = 1 − ∫_0^{∞} λA e^{−λA a} {1 − e^{−λV(a+τ+γA)}} da

= [λA / (λV + λA)] · e^{−λV(τ+γA)}.
The expected cross-modal reaction time under both integration and warning then is

E[RTVA,τ] = 1/λV + μ − P(IFAP) · Δ − P(W) · κ,

where 1/λV is the mean RT of the first stage, μ is the mean RT of the second stage when no interaction occurs, P(IFAP) · Δ is the expected amount of intersensory interaction, and P(W) · κ is the
expected amount of warning.
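The closed-form expressions for P(W) can be cross-checked against simulation; a sketch (our function names; illustrative parameters):

```python
import math
import random

def p_w_closed(lam_v, lam_a, tau, gamma):
    """Closed-form P(W) = P(A + tau + gamma < V) for exponential V, A (Appendix)."""
    s = tau + gamma
    if s < 0:
        return 1 - lam_v / (lam_v + lam_a) * math.exp(lam_a * s)
    return lam_a / (lam_v + lam_a) * math.exp(-lam_v * s)

def p_w_mc(lam_v, lam_a, tau, gamma, n=300_000, seed=19):
    """Monte Carlo check of the same probability."""
    rng = random.Random(seed)
    hits = sum(rng.expovariate(lam_a) + tau + gamma < rng.expovariate(lam_v)
               for _ in range(n))
    return hits / n
```

Both branches agree at τ + γA = 0, and the simulated values match the closed forms across parameter settings.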
REFERENCES
Amlôt, R., R. Walker, J. Driver, and C. Spence. 2003. Multimodal visual-somatosensory integration in saccade
generation. Neuropsychologia 41:1–15.
Anastasio, T.J., P.E. Patton, and K. Belkacem-Boussaid. 2000. Using Bayes’ rule to model multisensory
enhancement in the superior colliculus. Neural Computation 12:1165–1187.
Arndt, A., and H. Colonius. 2003. Two separate stages in crossmodal saccadic integration: Evidence from vary-
ing intensity of an auditory accessory stimulus. Experimental Brain Research 150:417–426.
Bell, A.H., A. Meredith, A.J. Van Opstal, and D.P. Munoz. 2005. Crossmodal integration in the primate
superior colliculus underlying the preparation and initiation of saccadic eye movements. Journal of
Neurophysiology 93:3659–3673.
Clemo, H.R., and B.E. Stein. 1986. Effects of cooling somatosensory corticotectal influences in cat. Journal of
Neurophysiology 55:1352–1368.
Colonius, H., and P. Arndt. 2001. A two-stage model for visual-auditory interaction in saccadic latencies.
Perception & Psychophysics, 63:126–147.
Colonius, H., and A. Diederich. 2002. A maximum-likelihood approach to modeling multisensory enhance-
ment. In Advances in Neural Information Processing Systems 14, T.G. Ditterich, S. Becker, and Z.
Ghahramani (eds.). Cambridge, MA: MIT Press.
Colonius, H., and A. Diederich. 2004. Multisensory interaction in saccadic reaction time: A time-window-of-
integration model. Journal of Cognitive Neuroscience 16:1000–1009.
Colonius, H., and A. Diederich. 2006. Race model inequality: Interpreting a geometric measure of the amount
of violation. Psychological Review 113(1):148–154.
Colonius, H., and A. Diederich. 2010. The optimal time window of visual–auditory integration: A reaction time
analysis. Frontiers in Integrative Neuroscience, 4:11. doi:10.3389/fnint.2010.00011.
Colonius, H., and D. Vorberg. 1994. Distribution inequalities for parallel models with unlimited capacity.
Journal of Mathematical Psychology 38:35–58.
Colonius, H., A. Diederich, and R. Steenken. 2009. Time-window-of-integration (TWIN) model for saccadic
reaction time: Effect of auditory masker level on visual-auditory spatial interaction in elevation. Brain
Topography 21:177–184.
Corneil, B.D., and D.P. Munoz. 1996. The influence of auditory and visual distractors on human orienting gaze
shifts. Journal of Neuroscience 16:8193–8207.
Corneil, B.D., M. Van Wanrooij, D.P. Munoz, A.J. Van Opstal. 2002. Auditory-visual interactions subserving
goal-directed saccades in a complex scene. Journal of Neurophysiology 88:438–454.
Diederich, A. 1995. Intersensory facilitation of reaction time: Evaluation of counter and diffusion coactivation
models. Journal of Mathematical Psychology 39:197–215.
Diederich, A. 2008. A further test on sequential sampling models accounting for payoff effects on response bias
in perceptual decision tasks. Perception & Psychophysics 70(2):229–256.
Diederich, A., and H. Colonius. 2004a. Modeling the time course of multisensory interaction in manual and
saccadic responses. In Handbook of multisensory processes, ed. G. Calvert, C. Spence, and B.E. Stein,
395–408. Cambridge, MA: MIT Press.
Diederich, A., and H. Colonius. 2004b. Bimodal and trimodal multisensory enhancement: Effects of stimulus
onset and intensity on reaction time. Perception & Psychophysics 66(8):1388–1404.
Diederich, A., and H. Colonius. 2007a. Why two “distractors” are better than one: Modeling the effect of
nontarget auditory and tactile stimuli on visual saccadic reaction time. Experimental Brain Research
179:43–54.
Diederich, A., and H. Colonius. 2007b. Modeling spatial effects in visual–tactile saccadic reaction time.
Perception & Psychophysics 69(1):56–67.
Diederich, A., and H. Colonius. 2008. Crossmodal interaction in saccadic reaction time: Separating multisensory
from warning effects in the time window of integration model. Experimental Brain Research 186:1–22.
Diederich, A., H. Colonius, D. Bockhorst, and S. Tabeling. 2003. Visual–tactile spatial interaction in saccade
generation. Experimental Brain Research 148:328–337.
Diederich, A., H. Colonius, and A. Schomburg. 2008. Assessing age-related multisensory enhancement with
the time-window-of-integration model. Neuropsychologia 46:2556–2562.
Doyle, M.C., and R. Walker. 2002. Multisensory interactions in saccade target selection: Curved saccade trajectories. Experimental Brain Research 142:116–130.
Driver, J., and C. Spence. 2004. Crossmodal spatial attention: Evidence from human performance. In Crossmodal
space and crossmodal attention, ed. C. Spence and J. Driver, 179–220. Oxford: Oxford Univ. Press.
Eimer, M. 2001. Crossmodal links in spatial attention between vision, audition, and touch: Evidence from
event-related brain potentials. Neuropsychologia 39:1292–1303.
Frens, M.A., A.J. Van Opstal, and R.F. Van der Willigen. 1995. Spatial and temporal factors determine auditory–
visual interactions in human saccadic eye movements. Perception & Psychophysics 57:802–816.
Harrington, L.K., and C.K. Peck. 1998. Spatial disparity affects visual–auditory interactions in human senso-
rimotor processing. Experimental Brain Research 122:247–252.
Hershenson, M. 1962. Reaction time as a measure of intersensory facilitation. Journal of Experimental
Psychology 63:289–293.
Hughes, H.C., P.-A. Reuter-Lorenz, G. Nozawa, and R. Fendrich. 1994. Visual–auditory interactions in sen-
sorimotor processing: Saccades versus manual responses. Journal of Experimental Psychology: Human
Perception and Performance 20:131–153.
Hughes, H.C., M.D. Nelson, and D.M. Aronchick. 1998. Spatial characteristics of visual–auditory summation
in human saccades. Vision Research 38:3955–3963.
Jiang, W., M.T. Wallace, H. Jiang, J.W. Vaughan, and B.E. Stein. 2001. Two cortical areas mediate multisensory
integration in superior colliculus neurons. Journal of Neurophysiology 85:506–522.
Jiang, W., H. Jiang, and B.E. Stein. 2002. Two cortical areas facilitate multisensory orientation behaviour.
Journal of Cognitive Neuroscience 14:1240–1255.
Körding, K.P., U. Beierholm, W.J. Ma, S. Quartz, J.B. Tenenbaum et al. 2007. Causal inference in multisensory
perception. PLoS ONE 2(9):e943, doi:10.1371/journal.pone.0000943.
Klein, R., and A. Kingstone. 1993. Why do visual offsets reduce saccadic latencies? Behavioral and Brain
Sciences 16(3):583–584.
Luce, R.D. 1986. Response times: Their role in inferring elementary mental organization. New York: Oxford
Univ. Press.
Meredith, M.A. 2002. On the neural basis for multisensory convergence: A brief overview. Cognitive Brain
Research 14:31–40.
276 The Neural Bases of Multisensory Processes
Meredith, M.A., and B.E. Stein. 1986. Visual, auditory, and somatosensory convergence on cells in superior
colliculus results in multisensory integration. Journal of Neurophysiology 56:640–662.
Meredith, M.A., J.W. Nemitz, and B.E. Stein. 1987. Determinants of multisensory integration in superior col-
liculus neurons. I. Temporal factors. Journal of Neuroscience 7:3215–3229.
Miller, J.O. 1982. Divided attention: Evidence for coactivation with redundant signals. Cognitive Psychology
14:247–279.
Munoz, D.P., and R.H. Wurtz. 1995. Saccade-related activity in monkey superior colliculus. I. Characteristics
of burst and buildup cells. Journal of Neurophysiology 73:2313–2333.
Navarra, J., A. Vatakis, M. Zampini, S. Soto-Faraco, W. Humphreys, and C. Spence. 2005. Exposure to asyn-
chronous audiovisual speech extends the temporal window for audiovisual integration. Cognitive Brain
Research 25:499–507.
Nickerson, R.S. 1973. Intersensory facilitation of reaction time: Energy summation or preparation enhance-
ment. Psychological Review 80:489–509.
Raab, D.H. 1962. Statistical facilitation of simple reaction times. Transactions of the New York Academy of
Sciences 24:574–590.
Reuter-Lorenz, P.A., H.C. Hughes, and R. Fendrich. 1991. The reduction of saccadic latency by prior offset of
the fixation point: An analysis of the gap effect. Perception & Psychophysics 49(2):167–175.
Ross, S.M., and L.E. Ross. 1981. Saccade latency and warning signals: Effects of auditory and visual stimulus
onset and offset. Perception & Psychophysics 29(5):429–437.
Rowland, B.A., and B.E. Stein. 2008. Temporal profiles of response enhancement in multisensory integration.
Frontiers in Neuroscience 2:218–224.
Schweickert, R., D.L. Fisher, and K. Sung. Discovering cognitive architecture by selectively influencing
mental processes. London: World Scientific Publishing (in press).
Sinclair, C., and G.R. Hammond. 2009. Excitatory and inhibitory processes in primary motor cortex during
the foreperiod of a warned reaction time task are unrelated to response expectancy. Experimental Brain
Research 194:103–113.
Spence, C., and S. Squire. 2003. Multisensory integration: Maintaining the perception of synchrony. Current
Biology 13:R519–R521.
Stein, B.E., and Meredith, M.A. 1993. The Merging of the Senses. Cambridge, MA: MIT Press.
Stein, B.E., W.S. Huneycutt, and M.A. Meredith. 1988. Neurons and behavior: The same rules of multisensory
integration apply. Brain Research 448:355–358.
Stein, B.E., W. Jiang, and T.R. Stanford. 2004. Multisensory integration in single neurons in the midbrain. In
Handbook of multisensory processes, ed. G. Calvert, C. Spence, and B.E. Stein, 243–264. Cambridge,
MA: MIT Press.
Sternberg, S. 2001. Separate modifiability, mental modules, and the use of pure and composite measures to
reveal them. Acta Psychologica 106:147–246.
Todd, J.W. 1912. Reaction to multiple stimuli, in Archives of Psychology, No. 25. Columbia contributions to
philosophy and psychology, ed. R.S. Woodworth, Vol. XXI, No. 8, New York: The Science Press.
Townsend, J.T., and G. Nozawa. 1995. Spatio-temporal properties of elementary perception: An investigation
of parallel, serial, and coactive theories. Journal of Mathematical Psychology 39:321–359.
Van Opstal, A.J., and D.P. Munoz. 2004. Auditory–visual interactions subserving primate gaze orienting. In
Handbook of multisensory processes, ed. G. Calvert, C. Spence, and B.E. Stein, 373–393. Cambridge,
MA: MIT Press.
Van Wassenhove, V., K.W. Grant, and D. Poeppel. 2007. Temporal window of integration in auditory–visual
speech perception. Neuropsychologia 45:598–607.
Van Zandt, T. 2002. Analysis of response time distributions. In Stevens’ handbook of experimental psychology,
vol. 4, 3rd edn, ed. H. Pashler. New York: Wiley & Sons, Inc.
Whitchurch, E.A., and T.T. Takahashi. 2006. Combined auditory and visual stimuli facilitate head saccades in
the barn owl (Tyto alba). Journal of Neurophysiology 96:730–745.
Section IV
Development and Plasticity
15 The Organization and Plasticity of Multisensory Integration in the Midbrain
Thomas J. Perrault Jr., Benjamin A. Rowland, and Barry E. Stein
CONTENTS
15.1 Impact of Multisensory Integration....................................................................................... 279
15.2 Organization of Multisensory Integration in Adult SC......................................................280
15.3 SC Multisensory Integration Depends on Influences from Cortex....................................... 287
15.4 Ontogeny of SC Multisensory Integration............................................................................. 288
15.4.1 Impact of Developing in Absence of Visual–Nonvisual Experience........................ 289
15.4.2 Altering Early Experience with Cross-Modal Cues by Changing Their Spatial Relationships...................................................................................................... 291
15.4.3 Role of Cortical Inputs during Maturation................................................................ 291
15.4.4 Ontogeny of Multisensory Integration in Cortex...................................................... 292
15.4.5 Ontogeny of SC Multisensory Integration in a Primate............................................ 292
Acknowledgments........................................................................................................................... 294
References....................................................................................................................................... 294
A great deal of attention has been paid to the physiological processes through which the brain
integrates information from different senses. This reflects the substantial impact of this process on
perception, cognitive decisions, and overt behavior. Yet far less attention has been given to the postnatal development, organization, and plasticity of this process. In the present chapter we examine what is known about the normal development of multisensory integration and how alterations in early postnatal experience can disrupt and dramatically reshape its fundamental properties. The focus here is on the multisensory layers of the cat superior
colliculus (SC), a system that has served as an excellent model for understanding multisensory inte-
gration at the level of the single neuron and at the level of overt orientation behavior. Before discuss-
ing this structure’s normal development and its capacity to change, it is important to examine what
has been learned about multisensory integration and the functional role of the SC in this process.
and Munoz 1996; Frens et al. 1995b; Ghazanfar et al. 2005; Ghazanfar and Schroeder 2006; Grant
et al. 2000; Hughes et al. 1994; King and Palmer 1985; Lakatos et al. 2007; Liotti et al. 1998; Marks
2004; Massaro 2004; Newell 2004; Partan 2004; Recanzone 1998; Sathian 2000, 2005; Sathian
et al. 2004; Schroeder and Foxe 2004; Senkowski et al. 2007; Shams et al. 2004; Stein et al. 1989;
Sugihara et al. 2006; Sumby and Pollack 1954; Talsma et al. 2006, 2007; Wallace et al. 1996;
Weisser et al. 2005; Woldorff et al. 2004; Woods and Recanzone 2004a, 2004b; Zangaladze et al.
1999). The facilitation of these capabilities has enormous survival value, so its retention and elabo-
ration in all extant species is no surprise. What is surprising is that despite the frequent discussion
of this phenomenon in adults (see Calvert et al. 2004b; Ghazanfar and Schroeder 2006; Spence and
Driver 2004; Stein and Meredith 1993), much less effort has been directed toward understanding how this process develops and how it adapts to the environment in which it will be used.
The multisensory neuron in the cat SC is an excellent model system to explore the organization
and plasticity of multisensory integration. This is because it is not only a primary site of converging inputs from different senses (Fuentes-Santamaria et al. 2008; Stein et al. 1993; Wallace et al. 1993), but is also involved in well-defined behaviors (orientation and localization), thereby
providing an opportunity to relate physiology to behavior. Furthermore, we already know a good
deal about the normal development of the unisensory properties of SC neurons (Kao et al. 1994;
Stein 1984) and SC neurons have been one of the richest sources of information about the ontogeny
and organization of multisensory integration (Barth and Brett-Green 2004; Calvert et al. 2004b;
Groh and Sparks 1996a, 1996b; Gutfreund and Knudsen 2004; Jay et al. 1987a, 1987b; King et al.
2004; Lakatos et al. 2007; Peck 1987b; Sathian et al. 2004; Senkowski et al. 2007; Stein 1984; Stein
and Arigbede 1972; Stein and Clamann 1981; Stein and Meredith 1993; Stein et al. 1973, 1976,
1993; Wallace 2004; Woods et al. 2004a).
Of most interest in the present context are two experimental observations. The first is that influences from the cortex are critical for the maturation of SC multisensory integration; the second is that experience during early postnatal life guides the nature of that integrative process. These
are likely to be interrelated observations given the well-known plasticity of neonatal cortex. One
reasonable possibility is that experience is coded in the cortex and in the morphology and functional
properties of its connections with the SC.
FIGURE 15.1 Correspondence of visual, auditory, and somatosensory representations in SC. Horizontal
and vertical meridians of different sensory representations in SC suggest a common coordinate system rep-
resenting multisensory space. (From Stein, B.E., and Meredith, M.A., The merging of the senses, MIT Press,
Cambridge, 1993. With permission.)
these receptive fields are in spatial coincidence with each other (King et al. 1996; Meredith and Stein
1990; Meredith et al. 1991, 1992). Cross-modal stimuli that are in spatial and temporal coincidence
with one another and fall within the excitatory receptive fields of a given neuron function synergisti-
cally. They elicit more vigorous responses (more impulses) than are evoked by the strongest of them
individually. This is called “multisensory enhancement” and is illustrated in Figure 15.2. However,
when these same stimuli are disparate in space, such that one falls within its excitatory receptive
FIGURE 15.2 Multisensory enhancement and depression. Middle: visual (dark gray) and auditory (light
gray) receptive fields (RF) of this SC neuron are plotted on hemispheres representing visual and auditory
space. Each concentric circle represents 10° of space with right caudal aspect of auditory space represented
by the half hemisphere. White bar labeled V represents a moving visual stimulus, whereas speakers labeled
A0 and Ai represent auditory stimuli. Left: response enhancement occurred when visual and auditory stimuli
were placed in spatial congruence (VAi). Note, in plot to the left, multisensory response exceeded sum of
visual and auditory responses (horizontal dotted line) and was 94% greater than response to the most effec-
tive component stimulus (visual). Right: response depression occurred when visual and auditory stimuli were
spatially disparate (VAo) so that multisensory response was 47% less than response to visual stimulus.
field and the other falls within the inhibitory portion of its receptive field, the result is “multisen-
sory depression.” Now the response consists of fewer impulses than that evoked by the most effec-
tive individual component stimulus. This ubiquitous phenomenon of enhancement and depression
has been described in the SC and cortex for a number of organisms ranging from the rat to the
human (Barth and Brett-Green 2004; Calvert et al. 2004b; DeGelder et al. 2004; Fort and Giard
2004; Ghazanfar and Schroeder 2006; King and Palmer 1985; Lakatos et al. 2007; Laurienti et al.
2002; Lovelace et al. 2003; Macaluso and Driver 2004; Meredith and Stein 1983, 1986a, 1986b,
1996; Morgan et al. 2008; Romanski 2007; Sathian et al. 2004; Schroeder et al. 2001; Schroeder and
Foxe 2002, 2004; Wallace and Stein 1994; Wallace et al. 1992, 1993, 1998, 2004b).
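The enhancement and depression figures quoted in this literature (e.g., the +94% and −47% values in Figure 15.2) follow the standard interactive index: the percent change of the multisensory response relative to the most effective unisensory response. A minimal sketch of that arithmetic; the impulse counts below are hypothetical, chosen only to reproduce those two percentages:

```python
def interactive_index(multisensory, best_unisensory):
    """Percent change of the multisensory response relative to the most
    effective unisensory response. Positive values indicate multisensory
    enhancement; negative values indicate multisensory depression."""
    return 100.0 * (multisensory - best_unisensory) / best_unisensory

# Hypothetical mean impulse counts per trial
print(round(interactive_index(7.76, 4.0)))  # enhancement: 94
print(round(interactive_index(2.12, 4.0)))  # depression: -47
```

The index is always referenced to the best unisensory response, which is why enhancement can coexist with a multisensory response that merely equals, or even falls short of, the sum of the two unisensory responses.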
The clearest indicator that a neuron can engage in multisensory integration is its ability to show
multisensory enhancement because multisensory depression occurs only in a subset of neurons that
show multisensory enhancement (Kadunce et al. 2001). The magnitude of response enhancement
will vary dramatically, both among neurons across the population as well as within a particular
neuron throughout its dynamic range. This variation is in part due to differences in responses to dif-
ferent cross-modal stimulus combinations. When spatiotemporally aligned cross-modal stimuli are
poorly effective, multisensory response enhancement magnitudes are often proportionately greater
than those elicited when stimuli are robustly effective. Recordings from single neurons have demonstrated that multisensory responses can exceed predictions based on the simple addition of the two
unisensory responses. These superadditive interactions generally occur at the lower end of a given
neuron’s dynamic range and as stimulus effectiveness increases, multisensory responses tend to
exhibit more additive or subadditive interactions (Alvarado et al. 2007b; Perrault et al. 2003, 2005;
Stanford and Stein 2007; Stanford et al. 2005). This series of transitions is consistent with the concept of “inverse effectiveness” (Meredith and Stein 1986b), in which the product of an enhanced multisensory interaction is proportionately largest when the effectiveness of the cross-modal stimuli is weakest. Consequently, the proportionate benefits that accrue to performance based on this
neural process will also be greatest.
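The distinction between superadditive, additive, and subadditive interactions can be made concrete by comparing the multisensory response with the additive prediction, the sum of the two unisensory responses. The sketch below is illustrative only: the fixed 5% tolerance band stands in for the statistical tests actually used in these studies, and all impulse counts are hypothetical:

```python
def classify_interaction(multi, uni_a, uni_b, tol=0.05):
    """Compare a multisensory response with the additive prediction (the sum
    of the two unisensory responses), within a tolerance band that stands in
    for a proper statistical test."""
    predicted = uni_a + uni_b
    if multi > predicted * (1 + tol):
        return "superadditive"
    if multi < predicted * (1 - tol):
        return "subadditive"
    return "additive"

# Inverse effectiveness, illustrated with hypothetical impulse counts:
# weakly effective stimuli -> a proportionately large, superadditive product
print(classify_interaction(4.0, 1.0, 1.2))     # superadditive
# highly effective stimuli -> additive or subadditive products
print(classify_interaction(21.0, 10.0, 12.0))  # additive
print(classify_interaction(18.0, 10.0, 12.0))  # subadditive
```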
This makes intuitive sense because highly effective cues are generally easiest to detect, locate,
and identify. Using the same logic, the enhanced magnitude of a multisensory response is likely
to be proportionately largest at its onset, because it is at this point that the individual component responses are just beginning and are thus weakest. Recent data suggest this is indeed the case
(Rowland et al. 2007a, 2007b; see Figure 15.3). This is of substantial interest because it means that
individual responses often, if not always, involve multiple underlying computations: superadditivity
at their onset and additivity (and perhaps subadditivity) as the response evolves. In short, the super-
additive multisensory computation may be far more common than previously thought, rendering the
initial portion of the response of far greater impact than would otherwise be the case and markedly
increasing its likely role in the detection and localization of an event.
Regarding computational modes, one should be cautious when interpreting multisensory response
enhancements from pooled samples of neurons. As noted earlier, the underlying computation varies
among neurons as a result of their inherent properties and the specific features of the cross-modal
stimuli with which they are evaluated. Many of the studies cited above yielded significant population
enhancements that appear “additive,” yet one cannot conclude from these data that this was their
default computation (e.g., Alvarado et al. 2007b; Perrault et al. 2005; Stanford et al. 2005). This is
because they were examined with a battery of stimuli whose individual efficacies were dispropor-
tionately high. Because of inverse effectiveness, combinations of such stimuli would, of course, be
expected to produce less robust enhancement and a high incidence of additivity (Stanford and Stein
2007). Had those same neurons been tested exclusively with minimally effective stimuli, the incidence of superadditivity would have been much higher. Furthermore, most neurons, regardless of the computation that best describes their averaged response, exhibit superadditive computations at their onset,
when activity is weakest (Rowland and Stein 2007). It is important to consider that this initial portion
of a multisensory response may have the greatest impact on behavior (Rowland et al. 2007a).
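A qsum-style measure of the kind shown in Figure 15.3 is, at bottom, a running count of stimulus-driven impulses; comparing the multisensory cumulative count with the sum of the two unisensory counts at each time point exposes the early superadditive phase. A toy sketch, with fabricated spike times (in ms) and no spontaneous-rate subtraction:

```python
import numpy as np

def cumulative_count(spike_times_ms, t_grid_ms):
    """Cumulative impulse count at each time point -- a qsum-style measure
    (the spontaneous-rate subtraction used in the actual analysis is omitted
    for simplicity)."""
    spikes = np.sort(np.asarray(spike_times_ms, dtype=float))
    return np.searchsorted(spikes, t_grid_ms, side="right")

# Fabricated spike times: the multisensory response starts earlier and is
# initially denser than either unisensory response
v  = [120, 150, 180, 210]
a  = [130, 170, 220]
va = [90, 100, 115, 130, 150, 165]

t = np.arange(0, 301, 10)
multi_cum = cumulative_count(va, t)
uni_sum = cumulative_count(v, t) + cumulative_count(a, t)
early_bins = t[multi_cum > uni_sum]  # bins where the count is superadditive
```

With these fabricated trains the multisensory count exceeds the unisensory sum only in the early bins, mirroring the observation that superadditivity is concentrated at response onset.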
FIGURE 15.3 Temporal profile of multisensory enhancement. Left: impulse rasters illustrating responses
of a multisensory SC neuron to visual (V), auditory (A), and combined visual–auditory (VA) stimulation.
Right: two different measures of response show the same basic principle of “initial response enhancement.”
Multisensory responses are enhanced from their very onset and have shorter latencies than either of indi-
vidual unisensory responses. Upper right: measure is mean stimulus-driven cumulative impulse count (qsum),
reflecting temporal evolution of enhanced response. Bottom right: an instantaneous measure of response effi-
cacy using event estimates. Event estimates use an appropriate kernel function that convolves impulse spike
trains into spike density functions that differentiate spontaneous activity from stimulus-driven activity using
a mutual information measure. Spontaneous activity was then subtracted from stimulus-driven activity and a
temporal profile of multisensory integration was observed. (From Rowland, B.A., and Stein, B.E., Frontiers
in Neuroscience, 2, 218–224, 2008. With permission.)
This process of integrating information from different senses is computationally distinct from
the integration of information within a sense. This is likely to be the case, in large part, because the
multiple cues in the former provide independent estimates of the same initiating event whereas the
multiple cues in the latter contain substantial noise covariance (Ernst and Banks 2002). Using this
logic, one would predict that a pair of within-modal stimuli would not yield the same response
enhancement obtained with a pair of cross-modal stimuli even if both stimulus pairs were posi-
tioned at the same receptive field locations. On the other hand, one might argue that equivalent
results would be likely because, in both cases, the effect reflects the amount of environmental
energy. This latter argument posits that multiple, redundant stimuli explain the effect, rather than
some unique underlying computation (Gondan et al. 2005; Leo et al. 2008; Lippert et al. 2007;
Miller 1982; Sinnett et al. 2008).
The experimental results obtained by Alvarado and colleagues (Figure 15.4) argue for the former
explanation. The integration of cross-modal cues produced significantly greater response products
than did the integration of within-modal cues. The two integration products also reflected very dif-
ferent underlying neural computations, with the latter most frequently reflecting subadditivity—a
computation that was rarely observed with cross-modal cues (Alvarado et al. 2007b). Gingras et al. (2009) tested the same assumption and came to the same conclusion using an overt behavioral measure in which cats performed a detection and localization task in response to cross-modal (visual–auditory) and within-modal (visual–visual or auditory–auditory) stimulus combinations (Figure 15.5).
FIGURE 15.4 Physiological comparisons of multisensory and unisensory integration. (a) Magnitude of
response evoked by a cross-modal stimulus (y-axis) is plotted against magnitude of largest response evoked
by component unisensory stimuli (x-axis). Most of observations show multisensory enhancement (positive
deviation from solid line of unity). (b) The same cannot be said for response magnitudes evoked by two within-
modal stimuli. Here, typical evoked response is not statistically better than that evoked by largest response to a
component stimulus. Within-modal responses are similar in both multisensory and unisensory neurons (insets
on right). (From Alvarado, J.C. et al., Journal of Neurophysiology 97, 3193–205, 2007b. With permission.)
Because the SC is a site at which modality-specific inputs from the different senses converge (Meredith and Stein 1986b; Stein and Meredith 1993; Wallace et al. 1993), it is a primary site of their integration; its multisensory responses are generated locally rather than merely reflecting multisensory integration elsewhere in the brain. The
many unisensory structures from which these inputs are derived have been well-described (e.g., see
Edwards et al. 1979; Huerta and Harting 1984; Stein and Meredith 1993; Wallace et al. 1993). Most
multisensory SC neurons send their axons out of the structure to target motor areas of the brain-
stem and spinal cord. It is primarily via this descending route that the multisensory responses of
SC neurons effect orientation behaviors (Moschovakis and Karabelas 1985; Peck 1987a; Stein and
Meredith 1993; Stein et al. 1993). Thus, it is perhaps no surprise that the principles governing multisensory integration at the level of the individual SC neuron also govern SC-mediated
overt behavior (Burnett et al. 2004, 2007; Jiang et al. 2002, 2007; Stein et al. 1989; Wilkinson et
al. 1996).
FIGURE 15.5 Multisensory integration was distinct from unisensory visual–visual integration. (a) At every spatial location, multisensory integration produced sub-
stantial performance enhancements (94–168%; mean, 137%), whereas unisensory visual integration produced comparatively modest enhancements (31–79%; mean,
49%). Asterisks indicate comparisons that were significantly different (χ2 test; P < 0.05). (b) Pie charts to left show performance in response to modality-specific auditory
(A1) and visual (V1 and V2 are identical) stimuli. Figures within the bordered region show performance to cross-modal (V1A1) and within-modal (V1V2) stimulus combi-
nations. No-Go errors (NG; gray) and Wrong Localization errors (W; white) were significantly decreased as a result of multisensory integration, but only No-Go errors
were significantly reduced as a result of unisensory integration. (c) Differential effect of multisensory and unisensory integration was reasonably constant, regardless of
effectiveness of best component stimulus, and both showed an inverse relationship, wherein benefits were greatest when effectiveness of component stimuli was lowest.
V, visual; A, auditory; C, correct. (From Gingras, G. et al., Journal of Neuroscience, 29, 4897–902, 2009. With permission.)
FIGURE 15.6 SC multisensory integration depends on influences from association cortex. SC responses to
auditory (A), visual (V), and multisensory (AV) stimuli were recorded before (left) and after (right) deactiva-
tion of association cortex. Visual stimulus was presented at multiple (five) levels of effectiveness. At the top of
the figure are individual stimulus traces, impulse rasters, and peristimulus time histograms for each response.
Graphs at bottom summarize these data showing mean response levels (lines) and percentage of multisensory
enhancement (bars) observed for each of stimulus pairings. Before cortical deactivation, enhanced responses
showed characteristic “inverse effectiveness” profile with larger unisensory responses associated with smaller
multisensory enhancements. However, after cortical deactivation (shaded region of inset), multisensory
enhancements were eliminated at each of stimulus effectiveness levels tested so that multisensory and unisen-
sory responses were no longer significantly different. (From Jiang, W. et al., Journal of Neurophysiology, 85,
506–22, 2001. With permission.)
FIGURE 15.7 (See color insert.) SC neurons receive converging input from different sensory subregions of
anterior ectosylvian (association) cortex. Fluorescent tracers were deposited in auditory (FAES; green) and somatosensory (SIV; red) subregions. Axons of these cortical neurons often had boutons in contact with SC neurons,
and sometimes could be seen converging onto the same target neurons. Presumptive contact points are indicated
by arrows. (From Fuentes-Santamaria, V. et al., Cerebral Cortex, 18, 1640–52, 2008. With permission.)
Rowland et al. (2007b) used these convergence patterns as the basis for an explanatory model in which AES inputs and other inputs have different convergence patterns on the dendrites of their SC target neurons (Figure 15.7). The model assumes N-methyl-d-aspartate (NMDA) and 2-amino-3-(5-methyl-3-oxo-1,2-oxazol-4-yl)propanoic acid (AMPA) receptors at every dendritic region, making nonlinear interactions possible between inputs that cluster in the same region. The clustered inputs are selectively those from AES, and they terminate preferentially on proximal dendrites. The currents they introduce affect one another and produce a nonlinear amplification through the NMDA receptors, something that inputs from non-AES areas cannot do because they are more computationally segregated from one another. All inputs also contact a population of inhibitory interneurons, which in turn contact SC multisensory neurons, so that the output of an SC neuron depends on the relative balance of excitation from the direct projections and shunting inhibition via the interneurons.
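That excitation–inhibition arithmetic can be caricatured numerically. The sketch below is not the published model: the sigmoid standing in for the voltage-dependent NMDA amplification, the multiplicative treatment of shunting inhibition, and every parameter value are invented for illustration. Its only point is that inputs clustering on the same dendritic region recruit the nonlinearity together, whereas segregated inputs recruit it one at a time:

```python
import math

def nmda_gain(drive):
    # Sigmoidal stand-in for the voltage-dependent NMDA amplification;
    # midpoint and slope are arbitrary illustrative values
    return 1.0 / (1.0 + math.exp(-(drive - 0.5) / 0.1))

def sc_output(aes_inputs, other_inputs, inhibition=0.2):
    """Toy SC multisensory neuron: clustered AES inputs are amplified
    cooperatively, segregated inputs only individually, and shunting
    inhibition (here a simple multiplicative scale-down) gates the result."""
    s = sum(aes_inputs)
    clustered = s * (1.0 + nmda_gain(s))        # inputs amplify one another
    segregated = sum(x * (1.0 + nmda_gain(x))   # each input amplified alone
                     for x in other_inputs)
    excitation = clustered + segregated
    return max(0.0, excitation * (1.0 - inhibition))

# Two weak inputs arriving on the AES cluster interact cooperatively;
# the same two inputs on segregated dendrites do not
together = sc_output([0.3, 0.3], [])
apart = sc_output([], [0.3, 0.3])
# together > apart: clustering yields the larger, superadditive-like product
```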
FIGURE 15.8 Developmental chronology of SC multisensory neurons. (a) Percentage of multisensory neu-
rons as a proportion of sensory-responsive neurons in deep SC is shown as a function of postnatal age. Each
closed circle represents a single age, and increasing proportion of such neurons is also shown on pie charts.
(b) Rapid decrease in size of different receptive fields (as a percentage of mean adult value) of multisensory
neurons is shown as a function of postnatal age. (c) Decrease in response latencies of multisensory neurons to
each modality-specific stimulus is shown as a function of postnatal age. (From Wallace, M.T., and Stein, B.E.,
Journal of Neuroscience, 17, 2429–44, 1997. With permission.)
in darkness), and also in situations in which the spatial cues associated with common events were
perturbed. The first experimental condition tests the notion that in the absence of such experience,
multisensory integration would not develop, and the second tests the possibility that the specific
features of experience guide the formation of the principles governing multisensory integration.
[Figure 15.9 data panels: (a) dark-reared exemplar neuron: no change with the combined stimulus (+8%). (b) Disparity-reared exemplar neuron: no change with stimuli coincident in the visual RF (+16%) or in the auditory RF (−3%), but significant enhancement (+144%) with spatially disparate stimuli.]
FIGURE 15.9 Early experience influences receptive field and response properties of SC multisensory neu-
rons. Impact of dark rearing (a) and disparity rearing (b) on properties of adult multisensory neurons are
shown using two exemplar neurons. Rearing in absence of visual experience was characterized by large visual
and auditory receptive fields (a) that were more characteristic of neonates than adults. This neuron was typi-
cal of population of neurons from dark-reared animals. It was responsive to visual and auditory stimuli, but
its inexperience with visual–auditory stimuli was evident in its inability to integrate those cross-modal stimuli to produce an enhanced response. Responses from neuron depicted in panel (b) were characteristic
of those affected by a rearing environment in which visual and auditory stimuli were always spatially dispa-
rate. Its visual and auditory receptive fields did not develop normal spatial register, but were completely out
of alignment. It was also incapable of “normal” multisensory integration as indicated by absence of enhanced
responses to spatiotemporally aligned cross-modal stimuli (B1 and B2). Nevertheless, it did show multisen-
sory enhancement to spatially disparate stimuli (B3), revealing that its multisensory integrative properties had
been crafted to adapt them to presumptive environment in which they would be used. (Adapted from Wallace,
M.T. et al., Journal of Neuroscience, 24, 9580–4, 2004a; Wallace, M.T. et al., Proceedings of the National
Academy of Sciences of the United States of America, 101, 2167–72, 2004b; Wallace, M.T., and Stein, B.E.,
Journal of Neurophysiology, 97, 921–6, 2007.)
The Organization and Plasticity of Multisensory Integration in the Midbrain 291
were more characteristic of a neonate than of an adult animal. These neurons were also unable to
integrate their multiple sensory inputs as evidenced by the absence of visual–auditory integration
(Figure 15.9a). This too made them appear more like neonatal animals, or like adults that have had association cortex removed, than like normal adult animals (Jiang et al. 2006). These observations are consistent with
the idea that experience with cross-modal cues is necessary for integrating those cues.
If early experience does indeed craft the principles governing multisensory integration, changes in
those experiences should produce corresponding changes in those principles. Under normal circum-
stances, cross-modal events provide cues that have a high degree of spatial and temporal fidelity.
In short, the different sensory cues come from the same event, so they come from about the same
place at about the same time. Presumably, with extensive experience, the brain links stimuli from
the two senses by their temporal and spatial relationships. In that way, similar concordances among
cross-modal stimuli that are later encountered facilitate the detection, localization, and identifica-
tion of those initiating events.
Given those assumptions, any experimental changes in the physical relationships of the cross-
modal stimuli that are experienced during early life should be reflected in adaptations in the prin-
ciples governing multisensory integration. In short, they should be appropriate for that “atypical”
environment and inappropriate for the normal environment. To examine this expectation, a group
of cats was reared in a darkroom from birth to 6 months of age, and was periodically presented
with visual and auditory cues that were simultaneous, but derived from different locations in space
(Wallace and Stein 2007). This was accomplished by fixing speakers and light-emitting diodes to
different locations on the wall of the cages.
When SC neurons were then examined, many had developed visual–auditory responsiveness.
Most of them looked similar to those found in animals reared in the dark. They had very large recep-
tive fields, and were unable to integrate their visual–auditory inputs. The retention of these neonatal
properties was not surprising in light of the fact that these stimuli presented in an otherwise dark
room required no response, and were not associated with any consequence. However, there were a
substantial number of SC neurons in these animals that did appear to reflect their visual–auditory
experience. Their visual–auditory receptive fields had contracted, as would be expected with sensory experience, but they had also developed poor alignment. In a number of neurons, the two fields had no overlap at all (see Figure 15.9b), a relationship almost never seen in animals reared in illuminated
conditions or in animals reared in the dark. However, it did reflect their unique rearing condition.
Most significant in the present context is that these neurons could engage in multisensory integration. However, because their receptive fields were misaligned, only cross-modal stimuli that were disparate in space could fall simultaneously within the respective visual and auditory receptive fields. In this case, the magnitude of the response to the cross-modal stimulus was significantly enhanced, just as in normally reared animals presented with spatially aligned visual–auditory stimuli. Conversely, spatially coincident cross-modal stimulus configurations failed to fall within both receptive fields of the neuron, producing response depression or no integration (see Kadunce et al. 2001; Meredith and Stein 1996). These observations are consistent with the prediction above, and
reveal that early experience with the simple temporal coincidence of the two cross-modal stimuli
was sufficient for the brain to link them, and initiate multisensory integration.
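The percent change (MSI) values quoted in Figure 15.9 follow the standard index of multisensory enhancement used throughout this literature (Meredith and Stein 1983): the percent difference between the response to the cross-modal stimulus and the most effective unisensory response. A minimal sketch in Python; the response values below are hypothetical, chosen only to reproduce a +144% case:

```python
def multisensory_enhancement(cm_response, v_response, a_response):
    """Percent change (MSI): cross-modal (VA) response relative to the
    most effective unisensory (V or A) response."""
    best_unisensory = max(v_response, a_response)
    return 100.0 * (cm_response - best_unisensory) / best_unisensory

# Hypothetical mean impulses/trial: 5.0 to V, 4.0 to A, 12.2 to VA
print(round(multisensory_enhancement(12.2, 5.0, 4.0)))  # prints 144
```

A cross-modal response equal to the best unisensory response yields 0% (no integration), and a response below it yields a negative value (response depression).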
for this role. To test this idea, Rowland and colleagues (Stein and Rowland 2007) reversibly deacti-
vated both AES and rLS during the period (25–81 days postnatal) in which multisensory integration
normally develops (see Wallace and Stein 1997), so that their neurons were unable to participate in
these sensory experiences. This was accomplished by implanting a drug-infused polymer over these
cortical areas. The polymer would gradually release its store of muscimol, a gamma-aminobutyric acid type A (GABA-A) receptor agonist that blocked neuronal activity. Once the stores of muscimol were
depleted over many weeks, or the polymer was physically removed, these cortical areas would once
again become active and responsive to external stimulation. As predicted, SC neurons in these ani-
mals were unable to integrate their visual and auditory inputs to enhance their responses. Rather,
their responses were no greater to the cross-modal combination of stimuli than they were to the
most effective of its component stimuli. Furthermore, comparable deficits were apparent in overt
behavior. Animals were no better at localizing a cross-modal stimulus than they were at local-
izing the most effective of its individual component stimuli. Although these data do not prove the
point, they do suggest that the cortical component of the SC multisensory circuit is a critical site
for incorporating the early sensory experiences required for the development of SC multisensory
integration.
spatially aligned and spatially disparate cross-modal stimuli. Although there may seem to be no a
priori reason to assume that their maturation would depend on different factors than those of the
cat, the monkey, unlike the cat, is a precocial species. Its SC neurons have comparatively more time
to develop in utero than do those of the cat. Of course, they must also do so in the dark, making one wonder whether the late in utero visual-free experience of the monkey has some similarity to the visual-free environment of the dark-reared cat.
Wallace and Stein (2001) examined the multisensory properties of the newborn monkey SC and
found that, unlike the SC of the newborn cat, there were already multisensory neurons present (Wallace
et al. 2001; Figure 15.10). However, as in the cat SC, these multisensory neurons were unable to inte-
grate visual–nonvisual inputs. Their responses to combinations of coincident visual and auditory or
somatosensory cues were no better than their responses to the most effective of these component stimuli individually. Although there are no data regarding when they develop this capacity, and whether
dark-rearing would preclude its appearance, it seems highly likely that the monkey shares the same
developmental antecedents for the maturation of multisensory integration as the cat.
Recent reports in humans suggest that this may be a general mammalian plan. People who have
experienced early visual deprivation due to dense congenital cataracts were examined many years
after surgery to remove those cataracts. The observations are consistent with predictions that would
be made from the animal studies. Specifically, their vision appeared to be normal, but their ability
to integrate visual–nonvisual information was significantly less well developed than in normal sub-
jects. This ability was compromised in a variety of tasks including those that involved speech and
those that did not (Putzar et al. 2007).
Whether neurons in the human SC, like those in the SC of cat and monkey, are incapable of
multisensory integration is not yet known. However, human infants do poorly on tasks requiring
the integration of visual and auditory information to localize events before 8 months of age (Neil
[Figure 15.10 data: in the newborn monkey SC, modality-specific neurons made up 85.3% of sensory-responsive neurons (visual 49.5%, somatosensory 23.2%, auditory 12.6%) and multisensory neurons 14.7%; in the adult SC (inset), modality-specific neurons made up 72% (visual 37.0%, auditory 17.6%, somatosensory 17.6%) and multisensory neurons 28% (including VA 11.1%, VS 6.5%, AS 0.9%).]
FIGURE 15.10 Modality convergence patterns in the SC of the newborn and adult (inset) monkey. Pie charts show the distributions of all recorded sensory-responsive neurons in the multisensory laminae (IV–VII) of the SC. (From Wallace, M.T., and Stein, B.E., Journal of Neuroscience, 21, 8886–94, 2001. With permission.)
294 The Neural Bases of Multisensory Processes
et al. 2006), and do poorly on tasks requiring the integration of visual and haptic information before
8 years of age (Gori et al. 2008). These data indicate that multisensory capabilities develop over
far longer periods in the human brain than in the cat brain, an observation consistent with the long
period of postnatal life devoted to human brain maturation. These observations, coupled with those
indicating that early sensory deprivation has a negative effect on multisensory integration even far
later in life, suggest that early experience with cross-modal cues is essential for normal multisensory development in all higher-order species. If so, we can only wonder how well the human brain
can adapt its multisensory capabilities to the introduction of visual or auditory input later in life
via prosthetic devices. Many people who had congenital hearing impairments, and later received
cochlear implants, have shown remarkable accommodation to them. They learn to use their newly
found auditory capabilities with far greater precision than one might have imagined when the implants were first introduced. Nevertheless, it is not yet known whether recipients can use these new inputs in concert with other
sensory systems. Although the population of people with retinal implants is much smaller, there are
very encouraging reports among them as well. However, the same questions apply: Are they able
to acquire the ability to engage in some forms of multisensory integration after experience with
visual–auditory cues later in life and, if so, how much experience and what kinds of experiences are
necessary for them to develop this capability? These questions remain to be answered.
ACKNOWLEDGMENTS
The research described here was supported in part by NIH grants NS36916 and EY016716.
REFERENCES
Alvarado, J.C., T.R. Stanford, J.W. Vaughan, and B.E. Stein. 2007a. Cortex mediates multisensory but not
unisensory integration in superior colliculus. Journal of Neuroscience 27:12775–86.
Alvarado, J.C., J.W. Vaughan, T.R. Stanford, and B.E. Stein. 2007b. Multisensory versus unisensory integra-
tion: Contrasting modes in the superior colliculus. Journal of Neurophysiology 97:3193–205.
Barth, D.S., and B. Brett-Green. 2004. Multisensory-evoked potentials in rat cortex. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 357–70. Cambridge, MA: MIT Press.
Bernstein, L.E., E.T. Auer Jr., and J.K. Moore. 2004. Audiovisual speech binding: Convergence or association. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 203–23. Cambridge, MA: MIT Press.
Burnett, L.R., B.E. Stein, D. Chaponis, and M.T. Wallace. 2004. Superior colliculus lesions preferentially dis-
rupt multisensory orientation. Neuroscience 124:535–47.
Burnett, L.R., B.E. Stein, T.J. Perrault Jr., and M.T. Wallace. 2007. Excitotoxic lesions of the superior colliculus
preferentially impact multisensory neurons and multisensory integration. Experimental Brain Research
179:325–38.
Busse, L., K.C. Roberts, R.E. Crist, D.H. Weissman, and M.G. Woldorff. 2005. The spread of attention across
modalities and space in a multisensory object. Proceedings of the National Academy of Sciences of the
United States of America 102:18751–6.
Calvert, G., C. Spence, and B.E. Stein. 2004a. The handbook of multisensory processes. Cambridge, MA: MIT
Press.
Calvert, G.A., and J.W. Lewis. 2004b. Hemodynamic studies of audiovisual interactions. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 483–502. Cambridge, MA: MIT Press.
Carriere, B.N., D.W. Royal, T.J. Perrault et al. 2007. Visual deprivation alters the development of cortical mul-
tisensory integration. Journal of Neurophysiology 98:2858–67.
Corneil, B.D., and D.P. Munoz. 1996. The influence of auditory and visual distractors on human orienting gaze
shifts. Journal of Neuroscience 16:8193–207.
DeGelder, B., J. Vroomen, and G. Pourtois. 2004. Multisensory perception of emotion, its time course, and its neural basis. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 581–96. Cambridge, MA: MIT Press.
Edwards, S.B., C.L. Ginsburgh, C.K. Henkel, and B.E. Stein. 1979. Sources of subcortical projections to the
superior colliculus in the cat. Journal of Comparative Neurology 184:309–29.
Ernst, M.O., and M.S. Banks. 2002. Humans integrate visual and haptic information in a statistically optimal
fashion. Nature 415:429–33.
Fort, A., and M.-H. Giard. 2004. Multiple electrophysiological mechanisms of audiovisual integration in human perception. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 503–13. Cambridge, MA: MIT Press.
Frens, M.A., and A.J. Van Opstal. 1995a. A quantitative study of auditory-evoked saccadic eye movements in
two dimensions. Experimental Brain Research 107:103–17.
Frens, M.A., A.J. Van Opstal, and R.F. Van der Willigen. 1995b. Spatial and temporal factors determine auditory-
visual interactions in human saccadic eye movements. Perception & Psychophysics 57:802–16.
Fuentes-Santamaria, V., J.C. Alvarado, B.E. Stein, and J.G. McHaffie. 2008. Cortex contacts both output
neurons and nitrergic interneurons in the superior colliculus: Direct and indirect routes for multisensory
integration. Cerebral Cortex 18:1640–52.
Ghazanfar, A.A., and C.E. Schroeder. 2006. Is neocortex essentially multisensory? Trends in Cognitive Sciences
10:278–285.
Ghazanfar, A.A., J.X. Maier, K.L. Hoffman, and N.K. Logothetis. 2005. Multisensory integration of dynamic
faces and voices in rhesus monkey auditory cortex. Journal of Neuroscience 25:5004–12.
Gingras, G., B.A. Rowland, and B.E. Stein. 2009. The differing impact of multisensory and unisensory integra-
tion on behavior. Journal of Neuroscience 29:4897–902.
Gondan, M., B. Niederhaus, F. Rösler, and B. Röder. 2005. Multisensory processing in the redundant-target
effect: A behavioral and event-related potential study. Perception & Psychophysics 67:713–26.
Gori, M., M. Del Viva, G. Sandini, and D.C. Burr. 2008. Young children do not integrate visual and haptic form
information. Current Biology 18:694–8.
Grant, A.C., M.C. Thiagarajah, and K. Sathian. 2000. Tactile perception in blind Braille readers: A psy-
chophysical study of acuity and hyperacuity using gratings and dot patterns. Perception & Psychophysics
62:301–12.
Grantyn, A., and R. Grantyn. 1982. Axonal patterns and sites of termination of cat superior colliculus neurons
projecting in the tecto-bulbo-spinal tract. Experimental Brain Research 46:243–56.
Groh, J.M., and D.L. Sparks. 1996a. Saccades to somatosensory targets: II. Motor convergence in primate
superior colliculus. Journal of Neurophysiology 75:428–38.
Groh, J.M., and D.L. Sparks. 1996b. Saccades to somatosensory targets: III. Eye-position-dependent soma-
tosensory activity in primate superior colliculus. Journal of Neurophysiology 75:439–53.
Guitton, D., and D.P. Munoz. 1991. Control of orienting gaze shifts by the tectoreticulospinal system in the
head-free cat: I. Identification, localization, and effects of behavior on sensory responses. Journal of
Neurophysiology 66:1605–23.
Gutfreund, Y., and E.I. Knudsen. 2004. Visual instruction of the auditory space map in the midbrain. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 613–24. Cambridge, MA: MIT Press.
Harris, L.R. 1980. The superior colliculus and movements of the head and eyes in cats. Journal of Physiology
300:367–91.
Huerta, M.F., and J.K. Harting. 1984. The mammalian superior colliculus: Studies of its morphology and con-
nections. In Comparative neurology of the optic tectum, ed. H. Vanegas, 687–773. New York: Plenum
Publishing Corporation.
Hughes, H.C., P.A. Reuter-Lorenz, G. Nozawa, and R. Fendrich. 1994. Visual–auditory interactions in senso-
rimotor processing: Saccades versus manual responses. Journal of Experimental Psychology. Human
Perception and Performance 20:131–53.
Jay, M.F., and D.L. Sparks. 1984. Auditory receptive fields in primate superior colliculus shift with changes in
eye position. Nature 309:345–7.
Jay, M.F., and D.L. Sparks. 1987a. Sensorimotor integration in the primate superior colliculus: I. Motor conver-
gence. Journal of Neurophysiology 57:22–34.
Jay, M.F., and D.L. Sparks. 1987b. Sensorimotor integration in the primate superior colliculus: II. Coordinates
of auditory signals. Journal of Neurophysiology 57:35–55.
Jiang, W., and B.E. Stein. 2003. Cortex controls multisensory depression in superior colliculus. Journal of
Neurophysiology 90:2123–35.
Jiang, W., M.T. Wallace, H. Jiang, J.W. Vaughan, and B.E. Stein. 2001. Two cortical areas mediate multisensory
integration in superior colliculus neurons. Journal of Neurophysiology 85:506–22.
Jiang, W., H. Jiang, and B.E. Stein. 2002. Two corticotectal areas facilitate multisensory orientation behavior.
Journal of Cognitive Neuroscience 14:1240–55.
Jiang, H., B.E. Stein, and J.G. McHaffie. 2003. Opposing basal ganglia processes shape midbrain visuomotor
activity bilaterally. Nature 423:982–6.
Jiang, W., H. Jiang, B.A. Rowland, and B.E. Stein. 2007. Multisensory orientation behavior is disrupted by
neonatal cortical ablation. Journal of Neurophysiology 97:557–62.
Jiang, W., H. Jiang, and B.E. Stein. 2006. Neonatal cortical ablation disrupts multisensory development in
superior colliculus. Journal of Neurophysiology 95:1380–96.
Kadunce, D.C., J.W. Vaughan, M.T. Wallace, and B.E. Stein. 2001. The influence of visual and auditory
receptive field organization on multisensory integration in the superior colliculus. Experimental Brain
Research 139:303–10.
Kao, C.Q., B.E. Stein, and D.A. Coulter. 1994. Postnatal development of excitatory synaptic function in deep layers of SC. Society for Neuroscience Abstracts.
King, A.J., T.P. Doubell, and I. Skaliora. 2004. Epigenetic factors that align visual and auditory maps in the
ferret midbrain. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein,
599–612. Cambridge, MA: MIT Press.
King, A.J., and A.R. Palmer. 1985. Integration of visual and auditory information in bimodal neurones in the
guinea-pig superior colliculus. Experimental Brain Research. 60:492–500.
King, A.J., J.W. Schnupp, S. Carlile, A.L. Smith, and I.D. Thompson. 1996. The development of topographi-
cally-aligned maps of visual and auditory space in the superior colliculus. Progress in Brain Research
112:335–50.
Lakatos, P., C.M. Chen, M.N. O’Connell, A. Mills, and C.E. Schroeder. 2007. Neuronal oscillations and multi-
sensory interaction in primary auditory cortex. Neuron 53:279–92.
Larson, M.A., and B.E. Stein. 1984. The use of tactile and olfactory cues in neonatal orientation and localiza-
tion of the nipple. Developmental Psychobiology 17:423–36.
Laurienti, P.J., J.H. Burdette, M.T. Wallace et al. 2002. Deactivation of sensory-specific cortex by cross-modal
stimuli. Journal of Cognitive Neuroscience 14:420–9.
Leo, F., N. Bolognini, C. Passamonti, B.E. Stein, and E. Ladavas. 2008. Cross-modal localization in hemiano-
pia: New insights on multisensory integration. Brain 131: 855–65.
Liotti, M., K. Ryder, and M.G. Woldorff. 1998. Auditory attention in the congenitally blind: Where, when and
what gets reorganized? Neuroreport 9:1007–12.
Lippert, M., N.K. Logothetis, and C. Kayser. 2007. Improvement of visual contrast detection by a simultaneous
sound. Brain Research 1173:102–9.
Lovelace, C.T., B.E. Stein, and M.T. Wallace. 2003. An irrelevant light enhances auditory detection in humans:
A psychophysical analysis of multisensory integration in stimulus detection. Cognitive Brain Research
17:447–453.
Macaluso, E., and J. Driver. 2004. Functional imaging evidence for multisensory spatial representations and
cross-modal attentional interactions in the human brain. In The handbook of multisensory processes, ed.
G.A. Calvert, C. Spence, and B.E. Stein, 529–48. Cambridge, MA: MIT Press.
Marks, L.E. 2004. Cross-modal interactions in speeded classification. In The handbook of multisensory pro-
cesses, ed. G.A. Calvert, C. Spence, and B.E. Stein, 85–106. Cambridge, MA: MIT Press.
Massaro, D.W. 2004. From multisensory integration to talking heads and language learning. In The handbook of
multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 153–76. Cambridge, MA: MIT Press.
Meredith, M.A., and B.E. Stein. 1983. Interactions among converging sensory inputs in the superior colliculus.
Science 221:389–91.
Meredith, M.A., and B.E. Stein. 1986a. Spatial factors determine the activity of multisensory neurons in cat
superior colliculus. Brain Research 365:350–4.
Meredith, M.A., and B.E. Stein. 1986b. Visual, auditory, and somatosensory convergence on cells in superior
colliculus results in multisensory integration. Journal of Neurophysiology 56:640–62.
Meredith, M.A., and B.E. Stein. 1990. The visuotopic component of the multisensory map in the deep laminae
of the cat superior colliculus. Journal of Neuroscience 10:3727–42.
Meredith, M.A., and B.E. Stein. 1996. Spatial determinants of multisensory integration in cat superior collicu-
lus neurons. Journal of Neurophysiology 75:1843–57.
Meredith, M.A., H.R. Clemo, and B.E. Stein. 1991. Somatotopic component of the multisensory map in the
deep laminae of the cat superior colliculus. Journal of Comparative Neurology 312:353–70.
Meredith, M.A., M.T. Wallace, and B.E. Stein. 1992. Visual, auditory and somatosensory convergence in output
neurons of the cat superior colliculus: Multisensory properties of the tecto-reticulo-spinal projection.
Experimental Brain Research 88:181–6.
Middlebrooks, J.C., and E.I. Knudsen. 1984. A neural code for auditory space in the cat’s superior colliculus.
Journal of Neuroscience 4:2621–34.
Miller, J. 1982. Divided attention: Evidence for coactivation with redundant signals. Cognitive Psychology
14:247–79.
Morgan, M.L., G.C. Deangelis, and D.E. Angelaki. 2008. Multisensory integration in macaque visual cortex
depends on cue reliability. Neuron 59:662–73.
Moschovakis, A.K., and A.B. Karabelas. 1985. Observations on the somatodendritic morphology and axonal
trajectory of intracellularly HRP-labeled efferent neurons located in the deeper layers of the superior col-
liculus of the cat. Journal of Comparative Neurology 239:276–308.
Munoz, D.P., and R.H. Wurtz. 1993a. Fixation cells in monkey superior colliculus. I. Characteristics of cell
discharge. Journal of Neurophysiology 70:559–75.
Munoz, D.P., and R.H. Wurtz. 1993b. Fixation cells in monkey superior colliculus: II. Reversible activation and
deactivation. Journal of Neurophysiology 70:576–89.
Neil, P.A., C. Chee-Ruiter, C. Scheier, D.J. Lewkowicz, and S. Shimojo. 2006. Development of multisensory
spatial integration and perception in humans. Developmental Science 9:454–64.
Newell, F.N. 2004. Cross-modal object recognition. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 123–39. Cambridge, MA: MIT Press.
Partan, S.R. 2004. Multisensory animal communication. In The handbook of multisensory processes, ed. G.A.
Calvert, C. Spence, and B.E. Stein, 225–40. Cambridge, MA: MIT Press.
Peck, C.K. 1987a. Saccade-related burst neurons in cat superior colliculus. Brain Research 408:329–33.
Peck, C.K. 1987b. Visual–auditory interactions in cat superior colliculus: Their role in the control of gaze.
Brain Research 420:162–6.
Perrault Jr., T.J., J.W. Vaughan, B.E. Stein, and M.T. Wallace. 2003. Neuron-specific response characteristics
predict the magnitude of multisensory integration. Journal of Neurophysiology 90:4022–6.
Perrault Jr., T.J., J.W. Vaughan, B.E. Stein, and M.T. Wallace. 2005. Superior colliculus neurons use distinct
operational modes in the integration of multisensory stimuli. Journal of Neurophysiology 93:2575–86.
Putzar, L., I. Goerendt, K. Lange, F. Rösler, and B. Röder. 2007. Early visual deprivation impairs multisensory interactions in humans. Nature Neuroscience 10:1243–5.
Recanzone, G.H. 1998. Rapidly induced auditory plasticity: The ventriloquism aftereffect. Proceedings of the
National Academy of Sciences of the United States of America 95:869–75.
Romanski, L.M. 2007. Representation and integration of auditory and visual stimuli in the primate ventral
lateral prefrontal cortex. Cerebral Cortex 17(Suppl 1):i61–9.
Rowland, B.A., and B.E. Stein. 2007. Multisensory integration produces an initial response enhancement.
Frontiers in Integrative Neuroscience 1:4.
Rowland, B.A., and B.E. Stein. 2008. Temporal profiles of response enhancement in multisensory integration.
Frontiers in Neuroscience 2:218–24.
Rowland, B.A., S. Quessy, T.R. Stanford, and B.E. Stein. 2007a. Multisensory integration shortens physiologi-
cal response latencies. Journal of Neuroscience 27:5879–84.
Rowland, B.A., T.R. Stanford, and B.E. Stein. 2007b. A model of the neural mechanisms underlying multisen-
sory integration in the superior colliculus. Perception 36:1431–43.
Royal, D.W., B.N. Carriere, and M.T. Wallace. 2009. Spatiotemporal architecture of cortical receptive fields
and its impact on multisensory interactions. Experimental Brain Research 198:127–36.
Sathian, K. 2000. Practice makes perfect: Sharper tactile perception in the blind. Neurology 54:2203–4.
Sathian, K. 2005. Visual cortical activity during tactile perception in the sighted and the visually deprived.
Developmental Psychobiology 46:279–86.
Sathian, K., S.C. Prather, and M. Zhang. 2004. Visual cortical involvement in normal tactile perception. In The
handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 703–9. Cambridge,
MA: MIT Press.
Schroeder, C.E., and J.J. Foxe. 2002. The timing and laminar profile of converging inputs to multisensory areas
of the macaque neocortex. Brain Research. Cognitive Brain Research 14:187–98.
Schroeder, C. E., and J.J. Foxe. 2004. Multisensory convergence in early cortical processing. In The handbook
of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 295–309. Cambridge, MA: MIT
Press.
Schroeder, C.E., R.W. Lindsley, C. Specht et al. 2001. Somatosensory input to auditory association cortex in
the macaque monkey. Journal of Neurophysiology 85:1322–7.
Senkowski, D., D. Talsma, M. Grigutsch, C.S. Herrmann, and M.G. Woldorff. 2007. Good times for multisen-
sory integration: Effects of the precision of temporal synchrony as revealed by gamma-band oscillations.
Neuropsychologia 45:561–71.
Shams, L., Y. Kamitani, and S. Shimojo. 2004. Modulations of visual perception by sound. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 27–33. Cambridge, MA: MIT Press.
Sinnett, S., S. Soto-Faraco, and C. Spence. 2008. The co-occurrence of multisensory competition and facilita-
tion. Acta Psychologica 128:153–61.
Sparks, D.L. 1986. Translation of sensory signals into commands for control of saccadic eye movements: Role
of primate superior colliculus. Physiological Reviews 66:118–71.
Sparks, D.L., and J.S. Nelson. 1987. Sensory and motor maps in the mammalian superior colliculus. Trends in
Neuroscience 10:312–7.
Spence, C., and J. Driver. 2004. Crossmodal space and crossmodal attention. Oxford: Oxford Univ. Press.
Stanford, T.R., and B.E. Stein. 2007. Superadditivity in multisensory integration: Putting the computation in
context. Neuroreport 18:787–92.
Stanford, T.R., S. Quessy, and B.E. Stein. 2005. Evaluating the operations underlying multisensory integration
in the cat superior colliculus. Journal of Neuroscience 25:6499–508.
Stein, B.E. 1984. Development of the superior colliculus. Annual Review of Neuroscience 7:95–125.
Stein, B.E., and M.O. Arigbede. 1972. Unimodal and multimodal response properties of neurons in the cat’s
superior colliculus. Experimental Neurology 36:179–96.
Stein, B.E., and H.P. Clamann. 1981. Control of pinna movements and sensorimotor register in cat superior
colliculus. Brain, Behavior and Evolution 19:180–92.
Stein, B.E., and H.L. Gallagher. 1981. Maturation of cortical control over superior colliculus cells in cat. Brain
Research 223:429–35.
Stein, B.E., and M.A. Meredith. 1993. The merging of the senses. Cambridge, MA: MIT Press.
Stein, B.E., and B.A. Rowland. 2007. The critical role of cortico-collicular interactions in the development of
multisensory integration. Paper presented at the Society for Neuroscience.
Stein, B.E., E. Labos, and L. Kruger. 1973. Sequence of changes in properties of neurons of superior colliculus
of the kitten during maturation. Journal of Neurophysiology 36:667–79.
Stein, B.E., B. Magalhaes-Castro, and L. Kruger. 1976. Relationship between visual and tactile representations
in cat superior colliculus. Journal of Neurophysiology 39:401–19.
Stein, B.E., R.F. Spencer, and S.B. Edwards. 1984. Efferent projections of the neonatal cat superior colliculus:
Facial and cerebellum-related brainstem structures. Journal of Comparative Neurology 230:47–54.
Stein, B.E., M.A. Meredith, W.S. Huneycutt, and L. McDade. 1989. Behavioral indices of multisensory inte-
gration: Orientation to visual cues is affected by auditory stimuli. Journal of Cognitive Neuroscience
1:12–24.
Stein, B.E., M.A. Meredith, and M.T. Wallace. 1993. The visually responsive neuron and beyond: Multisensory
integration in cat and monkey. Progress in Brain Research 95:79–90.
Stein, B.E., M.T. Wallace, T.R. Stanford, and W. Jiang. 2002. Cortex governs multisensory integration in the
midbrain. Neuroscientist 8:306–14.
Sugihara, T., M.D. Diltz, B.B. Averbeck, and L.M. Romanski. 2006. Integration of auditory and visual communi-
cation information in the primate ventrolateral prefrontal cortex. Journal of Neuroscience 26:11138–47.
Sumby, W.H., and I. Pollack. 1954. Visual contribution to speech intelligibillity in noise. Journal of the
Acoustical Society of America 26:212–5.
Talsma, D., T.J. Doty, R. Strowd, and M.G. Woldorff. 2006. Attentional capacity for processing concurrent
stimuli is larger across sensory modalities than within a modality. Psychophysiology 43:541–9.
Talsma, D., T.J. Doty, and M.G. Woldorff. 2007. Selective attention and audiovisual integration: Is attending to
both modalities a prerequisite for early integration? Cerebral Cortex 17:679–90.
Wallace, M.T. 2004. The development of multisensory integration. In The handbook of multisensory processes,
ed. G.A. Calvert, C. Spence, and B.E. Stein, 625–42. Cambridge, MA: MIT Press.
Wallace, M.T., and B.E. Stein. 1994. Cross-modal synthesis in the midbrain depends on input from cortex.
Journal of Neurophysiology 71:429–32.
Wallace, M.T., and B.E. Stein. 1997. Development of multisensory neurons and multisensory integration in cat
superior colliculus. Journal of Neuroscience 17:2429–44.
Wallace, M.T., and B.E. Stein. 2000. Onset of cross-modal synthesis in the neonatal superior colliculus is gated
by the development of cortical influences. Journal of Neurophysiology 83:3578–82.
Wallace, M.T., and B.E. Stein. 2001. Sensory and multisensory responses in the newborn monkey superior col-
liculus. Journal of Neuroscience 21:8886–94.
Wallace, M.T., and B.E. Stein. 2007. Early experience determines how the senses will interact. Journal of
Neurophysiology 97:921–6.
Wallace, M.T., M.A. Meredith, and B.E. Stein. 1992. Integration of multiple sensory modalities in cat cortex.
Experimental Brain Research 91:484–8.
Wallace, M.T., M.A. Meredith, and B.E. Stein. 1993. Converging influences from visual, auditory, and
somatosensory cortices onto output neurons of the superior colliculus. Journal of Neurophysiology
69:1797–809.
The Organization and Plasticity of Multisensory Integration in the Midbrain 299
Wallace, M.T., L.K. Wilkinson, and B.E. Stein. 1996. Representation and integration of multiple sensory inputs
in primate superior colliculus. Journal of Neurophysiology 76:1246–66.
Wallace, M.T., M.A. Meredith, and B.E. Stein. 1998. Multisensory integration in the superior colliculus of the
alert cat. Journal of Neurophysiology 80:1006–10.
Wallace, M.T., W.D. Hairston, and B.E. Stein. 2001. Long-term effects of dark-rearing on multisensory pro-
cessing. Paper presented at the Society for Neuroscience.
Wallace, M.T., T.J. Perrault Jr., W.D., Hairston, and B.E. Stein. 2004a. Visual experience is necessary for the
development of multisensory integration. Journal of Neuroscience 24:9580–4.
Wallace, M.T., R. Ramachandran, and B.E. Stein. 2004b. A revised view of sensory cortical parcellation.
Proceedings of the National Academy of Sciences of the United States of America 101:2167–72.
Wallace, M.T., B.N. Carriere, T.J. Perrault Jr., J.W. Vaughan, and B.E. Stein. 2006. The development of cortical
multisensory integration. Journal of Neuroscience 26:11844–9.
Weisser, V., R. Stilla, S. Peltier, X. Hu, and K. Sathian. 2005. Short-term visual deprivation alters neural pro-
cessing of tactile form. Experimental Brain Research 166:572–82.
Wilkinson, L.K., M.A. Meredith, and B.E. Stein. 1996. The role of anterior ectosylvian cortex in cross-modality
orientation and approach behavior. Experimental Brain Research 112:1–10.
Woldorff, M.G., C.J. Hazlett, H.M. Fichtenholtz et al. 2004. Functional parcellation of attentional control
regions of the brain. Journal of Cognitive Neuroscience 16:149–65.
Woods, T.M., and G.H. Recanzone. 2004a. Cross-modal interactions evidenced by the ventriloquism effect in
humans and monkeys. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E.
Stein, 35–48. Cambridge, MA: MIT Press.
Woods, T.M., and G.H. Recanzone. 2004b. Visually induced plasticity of auditory spatial perception in
macaques. Current Biology 14:1559–64.
Wurtz, R.H., and J.E. Albano. 1980. Visual–motor function of the primate superior colliculus. Annual Review
of Neuroscience 3:189–226.
Wurtz, R.H., and M.E. Goldberg. 1971. Superior colliculus cell responses related to eye movements in awake
monkeys. Science 171:82–4.
Zangaladze, A., C.M. Epstein, S.T. Grafton, and K. Sathian. 1999. Involvement of visual cortex in tactile dis-
crimination of orientation. Nature 401:587–90.
16 Effects of Prolonged Exposure to Audiovisual Stimuli with Fixed Stimulus Onset Asynchrony on Interaction Dynamics between Primary Auditory and Primary Visual Cortex
Antje Fillbrandt and Frank W. Ohl
CONTENTS
16.1 Introduction...........................................................................................................................302
16.1.1 Speed of Signal Transmission Is Modality-Specific.................................................. 303
16.1.2 Simultaneity Constancy............................................................................................. 303
16.1.3 Temporal Recalibration............................................................................................. 303
16.1.4 Mechanisms of Temporal Recalibration....................................................................304
16.1.4.1 Are There Any Indications for Recalibration at Early Levels of
Stimulus Processing?..................................................................................304
16.1.4.2 To What Extent Does Temporal Recalibration Need Attentional
Resources?..................................................................................................304
16.1.4.3 Is Recalibration Stimulus-Specific?............................................................ 305
16.1.4.4 Is Recalibration Modality-Specific?........................................................... 305
16.1.4.5 Does Recalibration Occur at Decision Level?............................................ 305
16.1.5 Outlook on Experiments............................................................................................ 305
16.2 Methods.................................................................................................................................306
16.2.1 Animals.....................................................................................................................306
16.2.2 Electrodes..................................................................................................................306
16.2.3 Animal Preparation and Recording...........................................................................306
16.2.4 Stimuli.......................................................................................................................306
16.2.5 Experimental Protocol...............................................................................................306
16.2.6 Data Preprocessing....................................................................................................307
16.2.7 DTF: Mathematical Definition.................................................................................. 307
16.2.8 Estimation of Autoregressive Models........................................................................ 308
16.2.9 Normalization of DTF...............................................................................................309
16.2.10 Statistical Testing.....................................................................................................309
Temporal congruity between auditory and visual stimuli has frequently been shown to be an impor-
tant factor in audiovisual integration, but information about temporal congruity is blurred by the
different speeds of transmission in the two sensory modalities. Compensating for the differences in
transmission times is challenging for the brain because at each step of transmission, from the pro-
duction of the signal to its arrival at higher cortical areas, the speed of transmission can be affected
in various ways. One way to deal with this complexity could be that the compensation mechanisms remain plastic throughout the organism's lifetime so that they can flexibly adapt to the typical transmission
delays of new types of stimuli. Temporal recalibration to new values of stimulus asynchronies has
been demonstrated in several behavioral studies. This study seeks to explore the potential mecha-
nisms underlying such recalibration at the cortical level. Toward this aim, tone and light stimuli
were presented repeatedly to awake, passively listening, Mongolian gerbils at the same constant
lag. During stimulation, the local field potential was recorded from electrodes implanted into the
auditory and visual cortices. The interaction dynamics between the auditory and visual cortices
were examined using the directed transfer function (DTF; Kaminski and Blinowska 1991). With an
increasing number of stimulus repetitions, the amplitude of the DTF showed characteristic changes
at specific time points between and after the stimuli. Our findings support the view that repeated
presentation of audiovisual stimuli at a constant delay alters the interactions between the auditory
and visual cortices.
16.1 INTRODUCTION
Listening to a concert is also enjoyable while watching the musicians play. Under normal circum-
stances, we are not confused by seeing the drumstick movement or the lip movement of the singer
after hearing the beat and the vocals. That the senses appear united in our conscious experience of the world seems to imply that stimulus processing in the different modalities must reach consciousness at about the same time.
Apparently, the task of judging which stimuli appeared simultaneously is quite challenging
for the brain: during the past decade, an increasing number of studies have been published indicat-
ing that temporal perception remains plastic throughout the lifetime. These studies demonstrated
that when stimuli from different sensory modalities are presented repeatedly at a small constant
temporal onset asynchrony, after a while, their temporal disparity is perceived as being diminished
in the conscious experience. This chapter describes the electrophysiological results of interaction
processes between the auditory and visual cortices during constant asynchronous presentation of
audiovisual stimuli in a rodent preparation designed to mimic relevant aspects of classic experi-
ments in humans on the recalibration of temporal order judgment.
The existence of temporal recalibration to new stimuli has been demonstrated in several stud-
ies (Fujisaki et al. 2004; Vroomen et al. 2004; Navarra et al. 2005; Heron et al. 2007; Keetels and
Vroomen 2007). In these studies, experimental paradigms typically start with an adaptation phase
with auditory and visual stimuli being presented repeatedly over several minutes, and consistently
at a slight onset asynchrony of about 0 to 250 ms. In a subsequent behavioral testing phase, auditory
and visual stimuli are presented at various temporal delays and their perceived temporal distance
is usually assessed by a simultaneity judgment task (subjects have to indicate whether the stimuli
are simultaneous or not) or a temporal order judgment task (subjects have to indicate which of the
stimuli they perceived first).
Using these procedures, temporal recalibration could be demonstrated repeatedly: the average
time one stimulus had to lead the other for the two to be judged as occurring simultaneously, the
point of subjective simultaneity (PSS), was shifted in the direction of the lag used in the adapta-
tion phase (Fujisaki et al. 2004; Vroomen et al. 2004). For example, if sound was presented before
light in the adaptation phase, in the testing phase, the sound stimulus had to be presented earlier
in time than before the adaptation to be regarded as having occurred simultaneously with the light
stimulus.
In addition, several studies observed an increase in the just noticeable difference (JND), the smallest temporal interval between two stimuli that participants in a temporal order task need in order to judge correctly, in 75% of the trials, which of the stimuli was presented first (Fujisaki et al. 2004; Navarra et al. 2005).
The mechanisms of recalibration and attention might be independent: Fujisaki and colleagues found no interaction
between the shift in the PSS caused by attention and the shift in PSS caused by adaptation in a
recalibration experiment (Fujisaki et al. 2004).
In the present study, we investigated how the interaction dynamics between the auditory and visual cortices change during the course of continuous asynchronous presentation of auditory
and visual stimuli. There is accumulating evidence that the synchronization dynamics between
brain areas might reflect their mode of interaction (Bressler 1995, 1996). We examined directional
influences between auditory and visual cortices by analyzing the local field potential data using the
DTF (Kaminski and Blinowska 1991).
16.2 METHODS
16.2.1 Animals
Data were obtained from eight adult male Mongolian gerbils (Meriones unguiculatus). All ani-
mal experiments were surveyed and approved by the animal care committee of the Land Sachsen-
Anhalt.
16.2.2 Electrodes
Electrodes were made of stainless steel wire (diameter, 185 µm) and were deinsulated only at the tip.
The tip of the reference electrodes was bent into a small loop (diameter, 0.6 mm). The impedance of
the recording electrodes was 1.5 MΩ (at 1 kHz).
16.2.4 Stimuli
Auditory and visual stimuli were presented at a constant intermodal stimulus onset asynchrony
of 200 ms. The duration of both the auditory and the visual stimuli was 50 ms and the intertrial
interval varied randomly between 1 and 2 s with a rectangular distribution of intervals in that range.
Acoustic stimuli were tones presented from a loudspeaker located 30 cm above the animal. The tone
frequency was chosen for each individual animal to match the frequency that evoked, in preparatory
experiments, the strongest amplitude of local field potential at the recording site within the tono-
topic map of primary auditory cortex (Ohl et al. 2000, 2001). The frequencies used ranged from 250
Hz to 4 kHz, with the peak level of the tone stimuli varying between 60 dB (low frequencies) and 48 dB (high frequencies), measured with a Brüel & Kjær sound level meter. Visual stimuli were
flashes presented from an LED lamp (9.6 cd/m2) located at the height of the eyes of the animal.
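The timing scheme above can be sketched in a few lines. This is a minimal illustration, not the original experimental software; all variable names are ours, and the intertrial interval is measured onset-to-onset for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the stimulus schedule: fixed 200 ms audiovisual onset asynchrony,
# 50 ms stimuli, intertrial intervals drawn from a rectangular distribution
# on [1, 2] s (names and onset-to-onset convention are our assumptions).
soa, stim_dur, n_trials = 0.200, 0.050, 5

iti = rng.uniform(1.0, 2.0, size=n_trials)   # rectangular distribution
first_onsets = np.cumsum(iti)                # onset of the leading stimulus
first_offsets = first_onsets + stim_dur      # 50 ms stimulus duration
second_onsets = first_onsets + soa           # onset of the lagging stimulus
```

Every trial keeps the same 200 ms lag between the two modalities, while the spacing between trials varies randomly.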
X_1(t) = \sum_{j=1}^{p} A_{1\to 1}(j)\,X_1(t-j) + \sum_{j=1}^{p} A_{2\to 1}(j)\,X_2(t-j) + E \quad (16.1)

X_2(t) = \sum_{j=1}^{p} A_{1\to 2}(j)\,X_1(t-j) + \sum_{j=1}^{p} A_{2\to 2}(j)\,X_2(t-j) + E \quad (16.2)
Here, the A(j) are the autoregressive coefficients at time lag j, p is the order of the autoregressive
model, and E the prediction error. According to the concept of Granger causality, in Equation 16.1,
the channel X2 is said to have a causal influence on channel X1 if the prediction error E can be
reduced by including past measurements of channel X2 (for the influence of the channel X1 on the
channel X2, see Equation 16.2).
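The Granger logic of Equations 16.1 and 16.2 can be illustrated with a small simulation: a least-squares autoregressive fit in which including the past of a driving channel reduces the prediction error. This is a sketch on synthetic data under our own assumptions, not the model-estimation pipeline used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-channel data in which X2 drives X1 at a lag of one sample
# (illustrative only; the chapter's LFP recordings are not reproduced here).
n = 2000
x2 = rng.standard_normal(n)
x1 = np.zeros(n)
for t in range(1, n):
    x1[t] = 0.5 * x1[t - 1] + 0.8 * x2[t - 1] + 0.1 * rng.standard_normal()

def ar_residual_var(target, sources, p=2):
    """Least-squares AR fit of target(t) from p past values of each source;
    returns the variance of the prediction error E."""
    y = target[p:]
    cols = [s[p - lag : len(s) - lag] for s in sources for lag in range(1, p + 1)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.var(y - X @ coef)

# Granger sense of Equation 16.1: X2 is said to influence X1 if including
# the past of X2 reduces the prediction error of X1.
e_full = ar_residual_var(x1, [x1, x2])
e_restricted = ar_residual_var(x1, [x1])
```

With the driving term included, the residual variance shrinks to roughly the injected noise level, which is the error reduction Granger causality is built on.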
To investigate the spectral characteristics of interchannel interaction, the autoregressive coef-
ficients in Equation 16.1 were Fourier-transformed; the transfer matrix was then obtained by matrix
inversion:
\begin{pmatrix} A_{1\to 1}(f) & A_{2\to 1}(f) \\ A_{1\to 2}(f) & A_{2\to 2}(f) \end{pmatrix}^{-1} = \begin{pmatrix} H_{1\to 1}(f) & H_{2\to 1}(f) \\ H_{1\to 2}(f) & H_{2\to 2}(f) \end{pmatrix} \quad (16.3)
A_{l\to m}(f) = 1 - \sum_{j=1}^{p} A_{l\to m}(j)\,e^{-i 2\pi f j} \quad \text{when } l = m \quad (16.4)

with l being the number of the transmitting channel and m the number of the receiving channel, and

A_{l\to m}(f) = 0 - \sum_{j=1}^{p} A_{l\to m}(j)\,e^{-i 2\pi f j} \quad \text{otherwise.} \quad (16.5)
The DTF for the influence from a selectable channel 1 to a selectable channel 2, DTF1→2, is defined as

DTF_{1\to 2}(f) = \left|H_{1\to 2}(f)\right|^2 \quad (16.6)
In the case of only two channels, the DTF measures the predictability of the frequency response of
a first channel from a second channel measured earlier in time. If, for example, X1 describes the local field potential from the auditory cortex and X2 the local field potential from the visual cortex, then high amplitude values of the nDTF1→2 in the beta band mean that the beta response of the visual cortex can be predicted from the beta response of the auditory cortex measured earlier in time. There are several possible situations of cross-cortical interaction that might underlie the
modulation of DTF amplitudes (see, e.g., Kaminski et al. 2001; Cassidy and Brown 2003; Eichler
2006). See Section 16.4 for more details.
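As a concrete sketch of Equations 16.3 through 16.5 and the normalization of Equation 16.7, the following toy computation builds A(f) from hypothetical bivariate AR coefficients, inverts it to obtain the transfer matrix H(f), and forms |H(f)|² and its sender-normalized version. The indexing convention and the frequency scaling by the sampling rate are our assumptions, not taken from the original analysis code.

```python
import numpy as np

def dtf_spectrum(ar_coeffs, freqs, fs):
    """Sketch of Equations 16.3 through 16.5: build A(f) from the AR
    coefficients, invert it to obtain H(f), and return |H(f)|^2.
    ar_coeffs[j] is the 2x2 matrix A(j+1), with entry [m, l] coupling
    sender channel l to receiver channel m; the f/fs scaling is assumed."""
    p = len(ar_coeffs)
    out = np.empty((len(freqs), 2, 2))
    for i, f in enumerate(freqs):
        A_f = np.eye(2, dtype=complex)
        for lag in range(1, p + 1):
            A_f -= ar_coeffs[lag - 1] * np.exp(-2j * np.pi * f * lag / fs)
        H = np.linalg.inv(A_f)
        out[i] = np.abs(H) ** 2
    return out

# Toy AR(1) model in which channel 0 drives channel 1 (hypothetical values)
A1 = np.array([[0.5, 0.0],
               [0.8, 0.3]])
spec = dtf_spectrum([A1], freqs=np.arange(1.0, 100.0), fs=500.0)

# Normalization in the spirit of Equation 16.7: for each receiving channel,
# divide by the sum over all sending channels M.
ndtf = spec / spec.sum(axis=2, keepdims=True)
```

In this toy model the 1 → 0 influence is exactly zero because no such AR term exists, while the 0 → 1 entry of |H(f)|² is nonzero across frequencies.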
Using a high sampling rate ensures that the number of data points contained in the small time
windows is sufficient for model estimation. For example, when we used a sampling rate of 500 Hz
instead of 1000 Hz to estimate models from our time windows of 100 ms, the covariance of the
residuals increased, signaling that the estimation had become worse (the autocovariance of the residuals of the auditory and visual channels at 1000 Hz was about 10% of the auto- and crosscovariance of the auditory and visual channels at 500 Hz). Importantly, when inspecting the spectra
visually, they seemed to be quite alike, indicating that AR models were robust, to an extent, to a
change in sampling rate. When using a data window of 200 ms with the same sampling rate of 500
Hz, the model estimation improved (the covariance of the residuals was 20–40% of the covariance
of a model with a window of 100 ms), but at the expense of the temporal resolution.
nDTF_{A\to V}(f) = \frac{\left|H_{A\to V}(f)\right|^2}{\sum_{M=1}^{k} \left|H_{M\to V}(f)\right|^2} \quad (16.7)

where k is the number of channels.
In the two-channel case, the DTFA→V is divided by the sum of itself and the spectral autocovariance of the visual channel. Thus, when using this normalization, the amplitude of the nDTFA→V depends on the influence of the visual channel on itself and, reciprocally, the amplitude of the nDTFV→A depends on the influence of the auditory channel on itself. This is problematic in two ways: first, we cannot tell whether differences between the amplitude of the nDTFA→V and the amplitude of the nDTFV→A are due to differences in normalization or to differences in the strengths of cross-cortical influences. Second, analysis of our data has shown that the auditory and the visual stimuli
influenced both the amplitude of the local field potential and the spectral autocovariance of both
auditory and visual channels. Thus, it is not clear whether changes in the amplitude of the nDTF
after stimulation signal changes in the crosscortical interaction or changes in spectral autocovari-
ance of the single channels.
As the nonnormalized DTF is difficult to handle because of large differences in the amplitudes
at different frequencies, we normalized the DTF in the following way:
nDTF_{A\to V}(f) = \frac{DTF_{A\to V}(f)}{\dfrac{1}{n\_windows \cdot n\_trials \cdot n\_session} \sum_{s=1}^{n\_session} \sum_{t=1}^{n\_trials} \sum_{w=1}^{n\_windows} DTF_{A\to V}(f; s, t, w)} \quad (16.8)
with n_windows being the number of time windows of the prestimulus interval per trial, n_trials the
number of trials per session, and n_session the number of sessions.
Hence, the amplitude of the DTF estimated for each single time window of the single trials was
divided by the average of the DTF of all time windows taken from the 1 s prestimulus interval of
the single trials of all sessions.
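The baseline normalization of Equation 16.8 can be sketched compactly; the array shapes below are hypothetical placeholders for the recorded data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of Equation 16.8 (shapes are hypothetical): divide every
# single-window DTF amplitude by the average DTF over all prestimulus
# windows of all trials of all sessions, separately for each frequency.
dtf_amp = rng.random((3, 100, 20, 50)) + 0.5   # sessions x trials x windows x freqs
n_pre = 10                                     # windows covering the 1 s prestimulus interval

baseline = dtf_amp[:, :, :n_pre, :].mean(axis=(0, 1, 2))   # one value per frequency
ndtf_amp = dtf_amp / baseline
```

By construction, the normalized prestimulus amplitude averages to 1 at every frequency, so poststimulus values can be read directly as relative deviations from baseline.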
16.2.10 Statistical Testing
Bootstrap methods were used because little is known about the empirical statistical error distribution of the nDTF (but see Eichler 2006, for an investigation of the
statistical properties of the DTF). The general procedure was as follows: first, bootstrap samples
were drawn from real data under the assumption that the null hypothesis was true. Then for each
bootstrap sample, a chosen test statistic was computed. The values of the test statistic from all boot-
strap samples formed a distribution of values of the test statistic under the assumption of the null
hypothesis. Next, we determined from the bootstrap distribution of the test statistic the probability
of finding values equal to or larger than the empirically observed one by chance. If this value was
less than the preselected significance level, the null hypothesis was rejected.
More specifically, in our first bootstrap test, we wanted to test whether the nDTF has higher amplitude values in the poststimulus interval than in the prestimulus interval.
Under the assumption of the null hypothesis, the nDTF amplitude values of the prestimulus and
the poststimulus interval should not be different from each other. Thus, pairs of bootstrap samples
were generated by taking single-trial nDTF amplitude values at random but with replacement from
the prestimulus and from the poststimulus interval. For each of the sample pairs, the amplitudes
were averaged across trials and the difference between the averages was computed separately for
each pair. This procedure of drawing samples was repeated 1000 times, yielding a distribution of differences between the average amplitudes. The resulting bootstrap distribution was then used to
determine the probability of the real amplitude difference of the averages between the prestimulus
and the poststimulus interval under the assumption of the null hypothesis.
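The pre/poststimulus test above can be sketched as follows. Pooling the two intervals is one common way to impose the null hypothesis before resampling, and all numbers here are illustrative rather than taken from the recordings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative single-trial nDTF amplitudes (all numbers hypothetical)
pre = rng.normal(1.0, 0.3, size=500)    # prestimulus interval
post = rng.normal(1.15, 0.3, size=500)  # poststimulus interval
observed = post.mean() - pre.mean()

# Impose the null hypothesis by pooling both intervals, then draw bootstrap
# samples with replacement and recompute the difference of the averages.
pooled = np.concatenate([pre, post])
diffs = np.empty(1000)
for b in range(diffs.size):
    pre_b = rng.choice(pooled, size=pre.size, replace=True)
    post_b = rng.choice(pooled, size=post.size, replace=True)
    diffs[b] = post_b.mean() - pre_b.mean()

# Probability of a difference at least as large as the observed one by chance
p_value = np.mean(diffs >= observed)
```

If the observed difference sits far in the tail of the bootstrap distribution, the null hypothesis of equal pre- and poststimulus amplitudes is rejected.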
In a second bootstrap test, we assessed the significance of the slope of a line fitted to the data by
linear regression analysis. We used the null hypothesis that the predictor variable (here, the number
of stimulus presentations) and the response variable (here, the nDTF amplitude) are independent
from each other. We generated bootstrap samples by randomly re-pairing the values of the predictor and response variables. For each of these samples, a line was fitted by linear regression analysis and the slope was computed, yielding a distribution of slope values under the null hypothesis.
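The slope test can be sketched in the same way: random re-pairing of predictor and response values yields the null distribution of slopes. The numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def slope(x, y):
    # Slope of a least-squares regression line
    return np.polyfit(x, y, 1)[0]

# Hypothetical peak amplitudes growing with the number of stimulus repetitions
x = np.arange(8, dtype=float)                  # trial-interval index
y = 1.0 + 0.05 * x + rng.normal(0, 0.02, 8)    # nDTF peak amplitude

observed = slope(x, y)

# Null hypothesis: predictor and response are independent, so randomly
# re-pairing y with x should destroy any real trend.
null_slopes = np.array([slope(x, rng.permutation(y)) for _ in range(2000)])
p_value = np.mean(np.abs(null_slopes) >= abs(observed))
```

A clear monotone trend in the data produces an observed slope that almost no random re-pairing can match, giving a small p value.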
16.3 RESULTS
16.3.1 Stimulus-Induced Changes in Single-Trial nDTF, Averaged across All Trials from All Sessions
For a first inspection of the effect the audiovisual stimulation had on the nDTF, from the auditory
to the visual cortex (nDTFA→V) and from the visual to the auditory cortex (nDTFV→A), we averaged
nDTF amplitudes across all single trials of all sessions, separately for each time window from 1 s
before to 1 s after the first stimulus. Figure 16.1 shows time-frequency plots of the nDTFA→V (left),
which describes the predictability of the frequency response of the visual cortex based on the fre-
quency response of the auditory cortex, and the nDTF V→A (right), which describes the predictability
of the frequency response of the auditory cortex based on the frequency response of the visual cor-
tex. Results from animals receiving the light stimulus first are presented in the upper two graphs and
results from animals receiving the tone stimulus first are shown in the lower two graphs. Data from
200 ms before the first stimulus to 1 s after the first stimulus are shown here. Note that the abscissa
indicates the start of a time window (window duration: 100 ms), so the data from time windows at
100 ms before the first stimulus are already influenced by effects occurring after the presentation
of the first stimulus.
The significance of the observed changes in the nDTF amplitude was assessed separately for
each animal using Student’s t-test based on the bootstrap technique (see Methods). More precisely,
we tested whether the amplitudes of the nDTF averaged across trials at different time points after
the presentation of the first stimulus were significantly different from the nDTF amplitude of the
prestimulus interval, averaged across trials and time from –1000 to 100 ms before the first stimulus.
To compare the relative amplitudes of the nDTFA→V and the nDTFV→A, we tested whether the dif-
ference of the amplitudes of nDTFA→V and nDTFV→A averaged across trials at different time points
FIGURE 16.1 (a and b) nDTFA→V (left) and nDTF V→A (right), averaged across all trials from all sessions,
separately for time windows from –0.2 to 0.9 s after start of first stimulus. (a) Animal receiving light first.
(b) Animal receiving tone first. (c) Difference between averages (nDTFA→V – nDTFV→A). Animal receiving
light first (left). Animal receiving tone first (right).
after the presentation of the first stimulus were significantly different from the difference of the
amplitudes of nDTFA→V and nDTFV→A of the prestimulus interval. In the following we will describe
only peaks of the amplitudes of nDTF, which deviated significantly (P < 0.01) from the average
amplitude of prestimulus interval.
Even though the temporal development and the frequency spectra were roughly similar in the
nDTFA→V and the nDTFV→A, there were small but important differences.
First, there were stimulus-evoked differences in the amplitudes of the nDTFA→V and the nDTF V→A
(Figure 16.1c, left, and the line plots in Figure 16.2, top). After the visual stimulus, the nDTF ampli-
tude was significantly higher in the nDTFV→A than in the nDTFA→V, whereas after the auditory
stimulus, the nDTFA→V reached higher values, but only at frequencies exceeding 30 Hz.
Second, even though the peaks could be found at all frequency bands in the nDTF V→A, the first
peak was strongest at a frequency of 1 Hz and at about 32 Hz, and the second peak at frequencies of
1 Hz and at about 40 Hz. In the nDTFA→V, the highest amplitude values after the first peak could be
observed at 1 Hz and at about 35 Hz and after the second peak at 1 Hz and at about 45 Hz.
FIGURE 16.2 Top: representative nDTFV→A (dashed) and nDTFA→V (solid), averaged across all trials from
all sessions, separately for all time windows from –200 to 900 ms after start of first stimulus, from an animal
receiving light first, followed by tone stimulus. Bottom: data from an animal receiving tone first, followed by
light stimulus.
In the AV animals, the first peak occurred between the tone and the light stimulus, at about –40 ms. The second and the third peaks occurred after
the light stimulus at about 170 ms and 330 ms, respectively. And as in the VA animals, after the
auditory stimulus (here the first stimulus), the amplitude of the nDTFA→V significantly exceeded the
amplitude of the nDTF V→A for frequencies above 20 Hz in the AV animals, whereas after the visual
stimulus, amplitudes were significantly higher in the nDTFV→A (Figure 16.1c, right). Thus, the sign
of the difference between the nDTFA→V and the nDTFV→A depended on the type of the stimulus
(auditory or visual) and not on the order of stimulus presentation.
The peaks ran through all frequencies from 0 to 100 Hz. The first peak of the nDTFA→V was most
pronounced at 1 Hz and at about 42 Hz, the second peak at 1 Hz, at about 32 Hz, and at 100 Hz. The
first peak of the nDTFV→A reached its highest values at 1 Hz and at 35 Hz, the second peak had
its highest amplitude at 1 Hz and at 28 Hz. For the third peak, the amplitude was most prominent
at 1 Hz.
16.3.2.1 VA-Animals
In Figure 16.3a and b, the development of the nDTF amplitude of the first and the second peaks
within the sessions is depicted and averaged across all four animals that received the light stimulus
first. Most of the effects could roughly be observed over the whole range of frequencies tested (in
Figure 16.3, we selected nDTF peaks at a frequency of 40 Hz for illustration). Nevertheless, effects
did not always reach significance at all frequencies tested (see Tables 16.1 and 16.2 for more detailed
information on the development of peaks at other frequencies).
After the first (visual) stimulus, the amplitude of the first peak increased in the nDTFA→V and
decreased in the nDTFV→A (Figure 16.3a, left). At the beginning of the session, the amplitude was
higher in the nDTFV→A than in the nDTFA→V, thus the amplitude difference between the nDTFA→V
and the nDTFV→A decreased significantly over the session (Figure 16.3a, right).
After the second (auditory) stimulus, the amplitude of the second peak increased both in the
nDTFA→V and the nDTFV→A (Figure 16.3b, left). Importantly, the increase in the nDTFA→V exceeded
the increase in the nDTFV→A, gradually increasing the difference between the nDTFA→V and the
nDTFV→A (Figure 16.3b, right).
16.3.2.2 AV-Animals
Similar to the nDTF development in VA-animals after the second (auditory) stimulus, in the AV-
animals after the first (auditory) stimulus, the amplitude increased both in the nDTFA→V and the
nDTFV→A (Figure 16.3c, left). The increase was more pronounced in nDTFA→V, further increasing
the difference between the nDTFA→V and the nDTFV→A (Figure 16.3c, right).
Interestingly, after the second (visual) stimulus, the behavior of the nDTF in the AV-animals
did not resemble the behavior of the nDTF after the first (visual) stimulus in the VA-animals. In
the AV-animals, the amplitude of the nDTFV→A increased after the visual stimulus, whereas the amplitude of the nDTFA→V decreased slightly in some animals and increased in others (Figure 16.3d, left; Table 16.1). After the visual stimulus, the amplitude of the
nDTFV→A was already higher than the amplitude of the nDTFA→V at the beginning of the sessions,
FIGURE 16.3 Development of nDTF peaks at 40 Hz within sessions averaged across nonoverlapping win-
dows of 125 trials stepped through all sessions. (a and b) Animals receiving light first. (c and d) Animals
receiving tone first. Left: development of average amplitude peak after first stimulus in nDTFA→V and nDTF V→A
(a and c). Development of average amplitude peak after second stimulus in nDTFA→V and nDTFV→A (b and
d). Right: amplitude of nDTFV→A peak subtracted from amplitude of nDTFA→V peak shown in left. Error bars
denote standard error of mean, averaged across animals.
the difference between the nDTFA→V and the nDTFV→A further increased during the course of the
sessions (Figure 16.3d, right).
TABLE 16.1
P Values of Slope of a Regression Line Fitted to Peak Amplitudes of nDTF Averaged across
Nonoverlapping Windows of 125 Trials Stepped through All Sessions
A→V nDTF peak 1 V→A nDTF peak 1
Animals 1 Hz 20 Hz 40 Hz 60 Hz 80 Hz 1 Hz 20 Hz 40 Hz 60 Hz 80 Hz
AV090 <0.001a 0.003a 0.00a 0.006a 0.003a <0.001b <0.001b <0.001b >0.05c >0.001a
AV091 0.003a 0.001a 0.001a 0.004a 0.002a 0.001a 0.001a 0.01a >0.05c >0.05c
AV106 0.02a 0.001a <0.001a <0.001a <0.001a 0.01a <0.001a 0.002a 0.05a >0.05c
AV125 0.0a <0.001a 0.04a 0.01a 0.001a 0.02a 0.02a 0.03a >0.05c >0.05c
VA099 <0.001a 0.001b 0.001b >0.05c 0.03a <0.001b <0.001b <0.001b <0.001b 0.001b
VA100 0.02a 0.01a 0.04a 0.001a 0.001a 0.02b <0.001b 0.001b 0.002b 0.01b
VA107 0.004a 0.001b 0.01b 0.01a 0.001a >0.05c 0.004b >0.05c >0.05c 0.01a
VA124 0.03a <0.001a <0.001a 0.01a >0.05c 0.01a <0.001b <0.001b 0.01b 0.01b
A→V nDTF peak 2 V→A nDTF peak 2
Animals 1 Hz 20 Hz 40 Hz 60 Hz 80 Hz 1 Hz 20 Hz 40 Hz 60 Hz 80 Hz
AV090 <0.001a 0.05 >0.05c >0.05c >0.05c 0.01a 0.03a >0.05c <0.001a <0.001a
AV091 0.001a 0.001a 0.002a 0.001a <0.001a <0.001a 0.001a 0.002a 0.006a 0.01b
AV106 <0.001a >0.05c 0.002a <0.05c 0.001a <0.001a <0.001a 0.002a <0.001a 0.004a
AV125 <0.001a <0.001a 0.001a 0.003a <0.05c 0.03a <0.001a <0.001a >0.05c >0.05c
VA099 0.001a <0.001a <0.001a <0.001a <0.001a 0.02a 0.03a 0.001a >0.05c >0.05c
VA100 0.001a 0.001a 0.001a 0.001a 0.001a >0.05c 0.002a >0.05c >0.05c 0.001a
VA107 0.001a 0.001a 0.001a 0.001a 0.001a >0.05c 0.001a >0.05c >0.05c >0.05c
VA124 >0.05c 0.01b 0.001b 0.01a >0.05c 0.01a 0.02a 0.001a 0.001a >0.05c
Note: Upper table: results from the nDTF peak after the first stimulus. Bottom table, results from the nDTF peak after the
second stimulus. Animal notation: AV, animals receiving tone first; VA, animals receiving the light first.
a Slope is positive.
b Slope is negative.
c Nonsignificant results.
The development of the nDTF peak amplitudes was examined by linear regression analysis and the significance of the slope was tested using
the bootstrap technique. In the following, effects are reported for a chosen significance level
of 0.05.
Even though some significant trends could be observed, results were not consistent among
animals. In the VA-group, one animal showed a decrease in the amplitude of the nDTFA→V at the
beginning of the first stimulus, but an increase could be found only 20 ms after the beginning
of the first stimulus. In a second animal, there was an increase in the amplitude of the nDTFA→V
after the second stimulus. In the amplitude of the nDTF V→A of two VA-animals, decreases could
be observed after the first and second stimulus, whereas in a third animal, an increase was found
after the second stimulus. All these results could be observed for the majority of examined
frequencies.
In the nDTFA→V of the AV-animals, at many frequencies, no clear developmental trend could be
observed, but at frequencies less than 10 Hz, there was an increase in amplitude both after the first
and second stimulus in two animals, whereas in one animal, a decrease could be found after both
stimuli. In the amplitude of the nDTFV→A, increases could be observed at various frequencies and
time points after stimulation.
316 The Neural Bases of Multisensory Processes
TABLE 16.2
P Values of Slope of a Regression Line Fitted to Difference of Peak Amplitudes of nDTFV→A
and nDTFA→V Averaged in Nonoverlapping Windows of 125 Trials Stepped through All
Sessions
Difference (A→V minus V→A): peak 1 Difference (A→V minus V→A): peak 2
Animals 1 Hz 20 Hz 40 Hz 60 Hz 80 Hz 1 Hz 20 Hz 40 Hz 60 Hz 80 Hz
AV090 0.03a 0.002a 0.004a 0.006a 0.009a 0.01b 0.02b 0.02b >0.05c >0.05c
AV091 0.01a 0.006a 0.007a 0.004a 0.009a 0.01b 0.04b 0.02b >0.05c 0.01a
AV106 0.008a 0.03a 0.04a 0.03a 0.02a 0.02b >0.05c >0.05c >0.05c >0.05c
AV125 >0.05c >0.05c 0.04a 0.005a 0.06a 0.02b 0.01b 0.02b 0.03b 0.01b
VA099 0.002a 0.005a 0.002a <0.001a 0.002a 0.002a 0.001a 0.002a 0.002a 0.002a
VA100 0.04a 0.009a 0.01a 0.008a 0.04a 0.03a 0.004a 0.001a 0.001a 0.001a
VA107 0.01a >0.05c 0.04a 0.02a 0.04a 0.01a 0.06c >0.05c >0.05c >0.05c
Note: Left, first nDTF peak. Right, second nDTF peak. Animal notation: AV, animals receiving tone first; VA, animals
receiving the light first.
a Slope is positive.
b Slope is negative.
c Nonsignificant results.
16.4 DISCUSSION
The repeated presentation of pairs of auditory and visual stimuli, with random intervals between
stimulus pairs but constant audiovisual stimulus onset asynchrony within each pair, led to robust
changes in the interaction dynamics between the primary auditory and the primary visual cortex.
Independent of the stimulus order, when an auditory stimulus was presented, the amplitude of the
nDTFA→V exceeded the amplitude of the nDTFV→A, whereas after the visual stimulus, the amplitude
of the nDTFV→A reached higher values. Moreover, within adaptation sessions, some of the observed
changes in nDTF amplitudes showed clear dynamic trends, whereas across adaptation sessions, no
coherent development could be observed. In the following we will discuss which processes might
be evoked by the repeated asynchronous presentation of audiovisual stimuli and whether they might
offer suitable explanations for the amplitude changes in the nDTF we observed.
As paired-stimulus adaptation protocols, similar to the one used in the present study, have been
shown to induce recalibration of temporal order judgment in humans (e.g., Fujisaki et al. 2004;
Vroomen et al. 2004), we want to discuss whether some of the described effects on the directed
information transfer could possibly underlie such recalibration functions. To prepare the discussion,
some general considerations of the interpretation of nDTF amplitudes seem appropriate.
However, as Cassidy and Brown (2003) have demonstrated in a series of simulation studies, there is no straightforward way to infer cross-cortical interactions from the information provided by the DTF. Specifically, from DTF amplitudes alone, we cannot tell whether the information flow
is unidirectional, bidirectional, or even multidirectional, including additional brain areas.
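As background for this discussion: the normalized DTF of Kaminski and Blinowska (1991) is derived from the transfer matrix H(f) of a multivariate autoregressive (MVAR) model fitted to the signals, with the squared magnitudes in each row normalized by the total inflow to that channel. The sketch below computes the nDTF for a hand-picked two-channel model; the coefficients and frequency grid are toy assumptions, and in practice the MVAR model would be estimated from the recorded local field potentials.

```python
import numpy as np

def ndtf(A, freqs, fs):
    """Normalized directed transfer function from MVAR coefficients.

    A: array of shape (p, n, n) such that X[t] = sum_k A[k] @ X[t-k-1] + noise.
    Returns gamma2[f, i, j], the normalized influence of channel j on
    channel i at each frequency (each row sums to 1 over sources j).
    """
    p, n, _ = A.shape
    gamma2 = np.empty((len(freqs), n, n))
    for fi, f in enumerate(freqs):
        # A(f) = I - sum_k A_k exp(-2*pi*i*f*k/fs);  H(f) = A(f)^-1
        Af = np.eye(n, dtype=complex)
        for k in range(p):
            Af -= A[k] * np.exp(-2j * np.pi * f * (k + 1) / fs)
        H = np.linalg.inv(Af)
        P = np.abs(H) ** 2
        gamma2[fi] = P / P.sum(axis=1, keepdims=True)  # normalize inflows
    return gamma2

# Toy example: channel 1 is driven by channel 0 with a one-sample lag,
# so the 0 -> 1 nDTF should dominate the 1 -> 0 nDTF.
A = np.zeros((1, 2, 2))
A[0] = [[0.5, 0.0],
        [0.7, 0.2]]
g = ndtf(A, freqs=np.linspace(1, 80, 80), fs=1000.0)
```

Because channel 1 receives lagged input from channel 0 while nothing flows the other way, g[:, 1, 0] exceeds g[:, 0, 1] at every frequency; asymmetries of this kind are what the amplitude comparisons in the text rest on.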
Let us consider the situation after the presentation of the auditory stimulus when the amplitude of
the nDTFA→V attains higher values than the amplitude of the nDTFV→A. First, this result might indi-
cate that there is unidirectional influence from the auditory to the visual cortex, with the size of the
amplitude difference positively correlating with the delay in the information transfer. Second, this
finding could also reflect a reciprocal influence between the auditory and visual cortices, but with
the influence from the auditory cortex either larger in amplitude or lagged relative to the influence
from the visual cortex. Third, additional unobserved structures might be involved, sending input
slightly earlier to the auditory cortex than to the visual cortex.
The cross-cortical interaction between auditory and visual cortices reflected in the peaks of the
nDTF could simply be an indication that information is spread among the sensory cortices dur-
ing the course of stimulus processing. However, we also have to take into account that the nDTF
amplitudes increased within the sessions, signaling that the interaction between the auditory and
the visual cortex intensified. In addition, after the visual stimulus, the behavior of the DTF differed
strongly with the order of stimulus presentation. Each of these observations might be a sign that the
auditory and the visual information became associated. This hypothesis is in accordance with the
unity assumption (e.g., Bedford 2001; Welch 1999; Welch and Warren 1980), which states that two
stimuli from different sensory modalities are more likely to be regarded as deriving from the same
event when they are presented, for example, in close temporal congruence.
The increase in the nDTF after the second stimulus might indicate that stimuli are integrated
after the second stimulus has been presented. The increase in the nDTF before the second stimulus
might indicate the expectation of the second stimulus. Several other studies have demonstrated
increases in coherent activity associated with anticipatory processing (e.g., Roelfsema et al. 1998;
Von Stein et al. 2000; Fries et al. 2001; Liang et al. 2002). On the other hand, our results on the development of the nDTF after the first stimulus varied strongly with stimulus order, and it would be surprising if the expectation of an auditory stimulus affected the nDTF quite differently from the expectation of a visual stimulus.
To clarify whether the observed changes might have something to do with stimulus associa-
tion or expectation processes, the repetition of this experiment with anesthetized animals might be
helpful. To explore whether the nDTF amplitude is influenced by anticipatory processing, it might
also be interesting to vary the likelihood with which a stimulus of a first modality is followed by
a stimulus of a second modality (see Sutton et al. 1965, for an experiment examining the effect of
stimulus uncertainty on local field potentials).
effects. In a similar way, a specific stimulus onset asynchrony between the stimuli does not seem
to be required, speaking against a dominant role for lag-specific detection processes underlying the
recalibration effect.
16.5 CONCLUSIONS
The repeated presentation of paired auditory and visual stimuli with constant intermodal onset
asynchrony is known to recalibrate audiovisual temporal order judgment in humans. The aim of
this study was to identify potential neural mechanisms that could underlie this recalibration in an
animal model amenable to detailed electrophysiological analysis of neural mass activity. Using
Mongolian gerbils, we found that prolonged presentation of paired auditory and visual stimuli
caused characteristic changes in the neuronal interaction dynamics between the primary auditory
cortex and the primary visual cortex, as evidenced by changes in the amplitude of the nDTF esti-
mated from local field potentials recorded in both cortices. Specifically, changes in both the DTF
from auditory to visual cortex (nDTFA→V) and from visual to auditory cortex (nDTFV→A) dynami-
cally developed over the course of the adaptation trials. We discussed three types of processes
that might have been induced by the repeated stimulation: stimulus association processes, lag
detection processes, and changes in the speed of stimulus processing. Although all three processes
could potentially have contributed to the observed changes in nDTF amplitudes, their relative roles
for mediating psychophysical recalibration of temporal order judgment must remain speculative.
Further clarification of this issue would require a behavioral test of the recalibration of temporal
order judgment in combination with the electrophysiological analysis.
REFERENCES
Akaike, H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19:716–723.
Alais, D. and S. Carlile. 2005. Synchronizing to real events: Subjective audiovisual alignment scales with per-
ceived auditory depth and speed of sound. Proceedings of the National Academy of Sciences of the United
States of America 102(6):2244–2247.
Arnold, D.H., A. Johnston, and S. Nishida. 2005. Timing sight and sound. Vision Research 45:1275–1284.
Astolfi, L., F. Cincotti, D. Mattia, M.G. Marciani, L.A. Baccala, F. de Vico Fallani, S. Salinari, M. Ursino, M.
Zavaglia, L. Ding, J.C. Edgar, G.A. Miller, B. He, and F. Babiloni. 2007. Comparison of different cortical
connectivity estimators for high-resolution EEG recordings. Human Brain Mapping 28:143–157.
Baylor, D.A., B.J. Nunn, and J.L. Schnapf. 1984. The photocurrent, noise and spectral sensitivity of rods of the
monkey Macaca fascicularis. Journal of Physiology 357:575–607.
Baylor, D.A., B.J. Nunn, and J.L. Schnapf. 1987. Spectral sensitivity of cones of the monkey Macaca fascicu-
laris. Journal of Physiology 390:124–160.
Bedford, F.L. 2001. Toward a general law of numerical/object identity. Current Psychology of Cognition
20(3–4):113–175.
Bizley, J.K., F.R. Nodal, V.M. Bajo, I. Nelken, and A.J. King. 2007. Physiological and anatomical evidence for
multisensory interactions in auditory cortex. Cerebral Cortex 17:2172–2198.
Bressler, S.L. 1995. Large scale cortical networks and cognition. Brain Research Reviews 20:288–304.
Bressler, S.L. 1996. Interareal synchronization in the visual cortex. Behavioral Brain Research 76:37–49.
Bressler, S.L., R. Coppola, and R. Nakamura. 1993. Episodic multiregional cortical coherence at multiple fre-
quencies during visual task performance. Nature 366:153–156.
Brosch, M., E. Selezneva, and H. Scheich. 2005. Nonauditory events of a behavioral procedure activate audi-
tory cortex of highly trained monkeys. Journal of Neuroscience 25(29):6796–6806.
Bushara, K.O., J. Grafman, and M. Hallett. 2001. Neural correlates of audio-visual stimulus onset asynchrony
detection. The Journal of Neuroscience 21(1):300–304.
Cahill, L., F.W. Ohl, and H. Scheich. 1996. Alternation of auditory cortex activity with a visual stimulus through
conditioning: A 2-deoxyglucose analysis. Neurobiology of Learning and Memory 65(3):213–222.
Cassidy, M., and P. Brown. 2003. Spectral phase estimates in the setting of multidirectional coupling. Journal
of Neuroscience Methods 127:95–103.
Cobbs, E.H., and E.N. Pugh Jr. 1987. Kinetics and components of the flash photocurrent of isolated retinal rods
of the larval salamander, Ambystoma tigrinum. Journal of Physiology 394:529–572.
Corey, D.P., and A.J. Hudspeth. 1979. Response latency of vertebrate hair cells. Biophysical Journal 26:499–506.
Corey, D.P., and A.J. Hudspeth. 1983. Analysis of the microphonic potential of the bullfrog’s sacculus. Journal
of Neuroscience 3:942–961.
Crawford, A.C., and R. Fettiplace. 1985. The mechanical properties of ciliary bundles of turtle cochlear hair
cells. Journal of Physiology 364:359–379.
Crawford, A.C., M.G. Evans, and R. Fettiplace. 1991. The actions of calcium on the mechanoelectrical trans-
ducer current of turtle hair cells. Journal of Physiology 491:405–434.
Ding, M., S.L. Bressler, W. Yang, and H. Liang. 2000. Short-window spectral analysis of cortical event-related
potentials by adaptive autoregressive modelling: Data preprocessing, model validation, variability assess-
ment. Biological Cybernetics 83:35–45.
Effects of Prolonged Exposure to Audiovisual Stimuli with Fixed Stimulus Onset Asynchrony 321
Eichler, M. 2006. On the evaluation of information flow in multivariate systems by the directed transfer func-
tion. Biological Cybernetics 94:469–482.
Efron, B., and R.J. Tibshirani. 1993. An Introduction to the Bootstrap. Boca Raton, FL: Chapman and Hall/CRC.
Engel, G.R., and W.G. Dougherty. 1971. Visual-auditory distance constancy. Nature 234:308.
Fain, G.L. 2003. Sensory Transduction. Sunderland: Sinauer Associates.
Franaszczuk, P.J., and G.K. Bergey. 1998. Application of the directed transfer function method to mesial and
lateral onset temporal lobe seizures. Brain Topography 11:13–21.
Freeman, W.J. 2000. Neurodynamics: An Exploration in Mesoscopic Brain Dynamics. London: Springer-Verlag.
Fries, P., J.H. Reynolds, A.E. Rorie, and R. Desimone. 2001. Modulation of oscillatory neuronal synchroniza-
tion by selective visual attention. Science 291:1560–1563.
Fujisaki, W., and S. Nishida. 2005. Temporal frequency characteristics of synchrony-asynchrony discrimina-
tion of audio-visual signals. Experimental Brain Research 166:455–464.
Fujisaki, W., and S. Nishida. 2008. Top-down feature based selection of matching feature for audio-visual syn-
chrony discrimination. Neuroscience Letters 433:225–230.
Fujisaki, W., S. Shimojo, M. Kashino, and S. Nishida. 2004. Recalibration of audiovisual simultaneity. Nature Neuroscience 7(7):773–778.
Fujisaki, W., A. Koene, D. Arnold, A. Johnston and S. Nishida. 2006. Visual search for a target changing in
synchrony with an auditory signal. Proceedings of the Royal Society of London. Series B. Biological
Sciences 273:865–874.
Hanson, J.V.M., J. Heron, and D. Whitaker. 2008. Recalibration of perceived time across sensory modalities. Experimental Brain Research 185:347–352.
Harrar, V., and L.R. Harris. 2005. Simultaneity constancy: Detecting events with touch and vision. Experimental Brain Research 166:465–473.
Harrar, V., and L.R. Harris. 2008. The effects of exposure to asynchronous audio, visual, and tactile stimulus combination on the perception of simultaneity. Experimental Brain Research 186:517–524.
Heron, J., D. Whitaker, P. McGraw, and K.V. Horoshenkov. 2007. Adaptation minimizes distance-related audio-
visual delays. Journal of Vision 7(13):1–8.
Hestrin, S., and J.I. Korenbrot. 1990. Activation kinetics of retinal cones and rods: Response to intense flashes
of light. Journal of Neuroscience 10:1967–1973.
Kaminski, M., and K.J. Blinowska. 1991. A new method for the description of the information flow in the brain
structures. Biological Cybernetics 65:203–210.
Kaminski, M., K.J. Blinowska, and W. Szelenberger. 1997. Topographic analysis of coherence and propaga-
tion of EEG activity during sleep and wakefulness. Electroencephalography and Clinical Neurophysiology 102:216–227.
Kaminski, M., M. Ding, W.A. Truccolo, and S.L. Bressler. 2001. Evaluating causal relations in neural sys-
tems: Granger causality, directed transfer function and statistical assessment of significance. Biological
Cybernetics 85:145–157.
Kay, S.M. 1987. Modern Spectral Estimation. Englewood Cliffs, NJ: Prentice Hall.
Kayser, C., C. Petkov, and N.K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral
Cortex 18:1560–1574.
Keetels, M., and J. Vroomen. 2007. No effect of auditory–visual spatial disparity on temporal recalibration.
Experimental Brain Research 182:559–565.
Kopinska, A., and L.R. Harris. 2004. Simultaneity constancy. Perception 33:1049–1060.
Korzeniewska, A., S. Kasicki, M. Kaminski, and K.J. Blinowska. 1997. Information flow between hippocam-
pus and related structures during various types of rat’s behavior. Journal of Neuroscience Methods
73:49–60.
Korzeniewska, A., M. Manczak, M. Kaminski, K.J. Blinowska, and S. Kasicki. 2003. Determination of infor-
mation flow direction among brain structures by a modified directed transfer function (dDTF) method.
Journal of Neuroscience Methods 125:195–207.
Kus, R., M. Kaminski, and K.J. Blinowska. 2004. Determination of EEG activity propagation: Pairwise versus
multichannel estimate. IEEE Transactions on Bio-Medical Engineering 51:1501–1510.
Lewald, J., and R. Guski. 2004. Auditory-visual temporal integration as a function of distance: No compensa-
tion for sound-transmission time in human perception. Neuroscience Letters 357:119–122.
Liang, H., M. Ding, R. Nakamura, and S.L. Bressler. 2000. Causal influences in primate cerebral cortex.
Neuroreport 11(13):2875–2880.
Liang, H., S.L. Bressler, M. Ding, W.A. Truccolo, and R. Nakamura. 2002. Synchronized activity in prefrontal
cortex during anticipation of visuomotor processing. Neuroreport 13(16):2011–2015.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis, 2nd ed. Berlin: Springer.
Marple, S.L. 1987. Digital Spectral Analysis with Applications. Englewood Cliffs, NJ: Prentice Hall.
Medvedev, A., and J.O. Willoughby. 1999. Autoregressive modeling of the EEG in systemic kainic acid-induced
epileptogenesis. International Journal of Neuroscience 97:149–167.
Meredith, M.A., J.W. Nemitz, and B.E. Stein. 1987. Determinants of multisensory integration in superior col-
liculus neurons. I. Temporal factors. The Journal of Neuroscience 7(10):3212–3229.
Miyazaki, M., S. Yamamoto, S. Uchida, and S. Kitazawa. 2006. Bayesian calibration of simultaneity in tactile
temporal order judgment. Nature Neuroscience 9:875–877.
Musacchia, G., and C.E. Schroeder. 2009. Neural mechanisms, response dynamics and perceptual functions of
multisensory interactions in auditory cortex. Hearing Research 258:72–79.
Navarra, J., A. Vatakis, M. Zampini, S. Soto-Faraco, W. Humphreys, and C. Spence. 2005. Exposure to asyn-
chronous audiovisual speech extends the temporal window for audiovisual integration. Cognitive Brain
Research 25:499–507.
Navarra, J., S. Soto-Faraco, and C. Spence. 2006. Adaptation to audiovisual asynchrony. Neuroscience Letters
431:72–76.
Nickalls, R.W.D. 1996. The influence of target angular velocity on visual latency difference determined using the rotating Pulfrich effect. Vision Research 36:2865–2872.
Ohl, F.W., H. Scheich, and W.J. Freeman. 2000. Topographic analysis of epidural pure-tone-evoked potentials
in gerbil auditory cortex. Journal of Neurophysiology 83:3123–3132.
Ohl, F.W., H. Scheich, and W.J. Freeman. 2001. Change in pattern of ongoing cortical activity with auditory
learning. Nature 412:733–736.
Posner, M.I., C.R.R. Snyder, and B.J. Davidson. 1980. Attention and the detection of signals. Journal of
Experimental Psychology: General 109(2):160–174.
Robson, J.G., S.M. Saszik, J. Ahmed, and L.J. Frishman. 2003. Rod and cone contributions to the a-wave of the
electroretinogram of the macaque. Journal of Physiology 547:509–530.
Rodriguez, E., N. George, J.P. Lachaux, J. Martinerie, B. Renault, and F.J. Varela. 1999. Perception’s shadow: Long-distance synchronization of human brain activity. Nature 397:430–433.
Roelfsema, P.R., A.K. Engel, P. König, and W. Singer. 1997. Visuomotor integration is associated with zero
time-lag synchronization among cortical areas. Nature 385:157–161.
Roelfsema, P.R., V.A.F. Lamme, and H. Spekreijse. 1998. Object-based attention in the primary visual cortex of the macaque monkey. Nature 395:376–381.
Schlögl, A. 2006. A comparison of multivariate autoregressive estimators. Signal Processing 86:2426–2429.
Senkowski, D., D. Talsma, M. Grigutsch, C.S. Herrmann, and M.G. Woldorff. 2007. Good times for multisen-
sory integration: Effects of the precision of temporal synchrony as revealed by gamma band oscillations.
Neuropsychologia 45:561–571.
Stone, J.V. 2001. When is now? Perception of simultaneity. Proceedings of the Royal Society of London. Series B. Biological Sciences 268:31–38.
Sugita, Y., and Y. Suzuki. 2003. Audiovisual perception: Implicit estimation of sound-arrival time. Nature 421:911.
Sutton, S., M. Braren, J. Zubin, and E.R. John. 1965. Evoked potential correlates of stimulus uncertainty. Science 150:1187–1188.
Varela, F., J.P. Lachaux, E. Rodriguez, and J. Martinerie. 2001. The brainweb: Phase synchronization and large-scale integration. Nature Reviews Neuroscience 2:229–239.
Vatakis, A., and C. Spence. 2007. Crossmodal binding: Evaluating the influence of the ‘unity assumption’ using
audiovisual speech stimuli. Perception & Psychophysics 69(5):744–756.
Vatakis, A., and C. Spence. 2008. Evaluating the influence of the ‘unity assumption’ on the temporal perception
of realistic audiovisual stimuli. Acta Psychologica 127:12–23.
Vatakis, A., J. Navarra, S. Soto-Faraco, and C. Spence. 2007. Temporal recalibration during asynchronous
audiovisual speech perception. Experimental Brain Research 181:173–181.
Vatakis, A., J. Navarra, S. Soto-Faraco, and C. Spence. 2008. Audiovisual temporal adaptation of speech:
Temporal order versus simultaneity judgments. Experimental Brain Research 185:521–529.
Von Békésy, G. 1963. Interaction of paired sensory stimuli and conduction of peripheral nerves. Journal of
Applied Physiology 18:1276–1284.
Von Stein, A., C. Chiang, and P. König. 2000. Top-down processing mediated by interarea synchronization.
Proceedings of the National Academy of Sciences of the United States of America 97:14748–14753.
Vroomen, J., M. Keetels, B. de Gelder, and P. Bertelson. 2004. Recalibration of temporal order perception by
exposure to audio-visual asynchrony. Cognitive Brain Research 22:32–35.
Welch, R.B. 1999. Meaning, attention and the unity assumption in the intersensory bias of spatial and tem-
poral perceptions. In Cognitive contributions to the perception of spatial and temporal events, ed. G. Aschersleben, T. Bachmann, and J. Müsseler, 371–387. Amsterdam: Elsevier.
Welch, R.B., and D.H. Warren. 1980. Immediate perceptual response to intersensory discrepancy. Psychological
Bulletin 88:638–667.
Wilson, J.A., and S.M. Anstis. 1969. Visual delay as a function of luminance. American Journal of Psychology 82:350–358.
17 Development of Multisensory
Temporal Perception
David J. Lewkowicz
CONTENTS
17.1 Introduction........................................................................................................................... 325
17.2 Perception of Multisensory Temporal Information and Its Coherence................................. 326
17.3 Developmental Emergence of Multisensory Perception: General Patterns and Effects
of Experience......................................................................................................................... 327
17.4 Perception of Temporal Information in Infancy.................................................................... 330
17.5 Perception of A–V Temporal Synchrony............................................................................... 331
17.5.1 A–V Temporal Synchrony Threshold........................................................................ 331
17.5.2 Perception of A–V Speech Synchrony and Effects of Experience............................ 332
17.5.3 Binding of Nonnative Faces and Vocalizations......................................................... 334
17.6 Perception of Multisensory Temporal Sequences in Infancy................................................ 336
17.7 Speculations on Neural Mechanisms Underlying the Development of Multisensory
Perception.............................................................................................................................. 338
References....................................................................................................................................... 339
17.1 INTRODUCTION
The objects and events in our external environment provide us with a constant flow of multisensory
information. Such an unrelenting flow of information might prove confusing if no mechanisms were available for its integration. Fortunately, however, sophisticated multisensory integra-
tion* mechanisms have evolved across the animal kingdom to solve this problem (Calvert et al.
2004; Ghazanfar and Schroeder 2006; Maier and Schneirla 1964; Marks 1978; Partan and Marler
1999; Rowe 1999; Stein and Meredith 1993; Stein and Stanford 2008; Welch and Warren 1980).
These mechanisms enable mature organisms to integrate multisensory inputs and, in the process,
make it possible for them to perceive the coherent nature of their multisensory world.
The other chapters in this volume discuss the structural and functional characteristics of multi-
sensory processing and integration mechanisms in adults. Here, I address the developmental ques-
tion by asking (1) when do multisensory response mechanisms begin to emerge in development,
and (2) what specific processes underlie their emergence? To answer these questions, I discuss our
work on the development of multisensory processing of temporal information and focus primar-
ily on human infants. I show that processing of multisensory temporal information, as well as the
* Historically, the term “integration,” when used in the context of work on multisensory processing, has been used to refer
to different processes by different researchers (Stein et al. 2010). For some, this term is reserved for cases in which sen-
sory input in one modality changes the qualitative experience that one has in response to stimulation in another modality,
as is the case in the McGurk effect (McGurk and MacDonald 1976). For others, it has come to be associated with neural
and behavioral responsiveness to near-threshold stimulation in one modality either being enhanced or suppressed by
stimulation in another modality (Stein and Stanford 2008). Finally, for some investigators, integration has simply meant
the process that enables perceivers to detect and respond to the relational nature of multisensory stimulation and no
assumptions were made about underlying perceptual or neural mechanisms. It is this last meaning that is used here.
processing of other types of multisensory information, emerges gradually during the first year of
life, argue that the rudimentary multisensory processing abilities found at the beginning of life
reflect neural/behavioral immaturities and the relative lack of perceptual and sensorimotor experi-
ence, and provide evidence that the gradual improvement in multisensory processing ability reflects
the interaction between behavioral and (implied) neural maturation and perceptual experience.
* There is a functionally important distinction between intersensory cues such as duration, tempo, and rhythm, on the one
hand, and intersensory temporal synchrony cues, on the other. The former are all amodal stimulus attributes because they
can be specified independently in different modalities and, as a result, can be perceived even in the absence of temporal
synchrony cues (e.g., even if the auditory and visual attributes of a speech utterance are not presented together, their
equal duration can be perceived). In contrast, temporal synchrony is not an amodal perceptual cue because it cannot be
specified independently in a single sensory modality; an observer must have access to the concurrent information in the
different modalities to perceive it. Moreover, and especially important for developmental studies, infants might be able to
perceive intersensory synchrony relations without being able to perceive the amodal cues that characterize the multisen-
sory attributes (e.g., an infant might be able to perceive that a talking face and the vocalizations that it produces belong
together but may not be able to detect the equal duration of the visible and audible articulations).
example, when adults see a single flash and hear two tones, they report two flashes even though they
know that there is only a single flash (Shams et al. 2000). Similarly, when two identical objects are
seen moving toward and then through each other and a brief sound is presented at the point of their
coincidence, adults report that the objects bounce against each other rather than pass through one
another (Sekuler et al. 1997). This “bounce” illusion emerges in infancy in that starting at 6 months
of age, infants begin to exhibit evidence that they experience it as well (Scheier et al. 2003).
Even though the various amodal and invariant temporal attributes are natural candidates for
the perception of multisensory coherence, there are good a priori theoretical reasons to expect that
intersensory temporal synchrony might play a particularly important role during the earliest stages of
development (Gibson 1969; Lewkowicz 2000a; Thelen and Smith 1994) and that young infants may
not perceive the kinds of higher-level amodal invariants mentioned earlier. One reason for this may be
the fact that, unlike in the case of the detection of higher-level amodal invariants, it is relatively easy to
detect multisensory temporal synchrony relations. All that is required is the detection of the concur-
rent onsets and offsets of stimulus energy across modalities. In contrast, the detection of amodal cues
requires the ability to perceive the equivalence of some of the higher-level types of correlated patterns
of information discussed earlier. Moreover, observers are sometimes required to detect such patterns when they are not concurrently available, and they can do so (Kamachi et al. 2003). Infants also
exhibit this ability but, thus far, evidence indicates that they can do so only starting at 6 months of age
(Pons et al. 2009) and no studies have shown that they can perform this kind of task earlier.
Although young infants’ presumed inability to perceive amodal cues might seem like a serious
limitation, it has been argued by some that developmental limitations actually serve an important
function (Oppenheim 1981). With specific regard to multisensory functions, Turkewitz has argued
that sensory limitations help infants organize their perceptual world in an orderly fashion while at
the same time not overwhelming their system (Turkewitz 1994; Turkewitz and Kenny 1982). From
this perspective, the ability to detect temporal synchrony cues very early in life makes it possible
for young, immature, and inexperienced infants to first discover that multisensory inputs cohere, albeit at a very low level. This, in turn, gives them an entrée into a multisensory world
composed not only of the various higher-level amodal invariants mentioned earlier but other higher-
level nontemporal multisensory attributes such as gender, affect, and identity. Most theorists agree
that the general processes that mediate this gradual improvement in multisensory processing ability
are perceptual learning and differentiation in concert with infants’ everyday experience and senso-
rimotor interactions with their multisensory world.
Extant empirical findings are generally consistent with the theoretical developmental pattern
described above. For instance, young infants can detect the synchronous onsets of inanimate visual
and auditory stimuli (Lewkowicz 1992a, 1992b, 1996) and rely on synchrony cues to perceive the
amodal property of duration (Lewkowicz 1986). Likewise, starting at birth and thereafter, infants
can detect the synchronous relationship between the audible and visible attributes of vocalizing faces
(Lewkowicz 2000b, 2010; Lewkowicz and Ghazanfar 2006; Lewkowicz et al. 2010). Interestingly,
however, when the multisensory temporal task is too complex (i.e., when it requires infants to detect
which of two objects that are moving at different tempos corresponds to a synchronous sound)
synchrony cues are not sufficient for the perception of multisensory coherence (Lewkowicz 1992a,
1994). Similarly, when the relationship between two moving objects and a sound that occurs during
their coincidence is ambiguous (as is the case in the bounce illusion), 6- and 8-month-old infants
perceive this relationship but 4-month-olds do not.
do these findings reflect a general developmental pattern? The answer is that the same pattern holds
for infant perception of other types of multisensory perceptual cues. To make theoretical sense of
the overall body of findings on the developmental emergence of multisensory perceptual abilities in
infancy, it is helpful to first ask what the key theoretical questions are in this area. If, as indicated
earlier, infants’ initial immaturity and relative lack of experience imposes serious limitations on
their ability to integrate the myriad inputs that constantly bombard their perceptual systems, how
do they go about integrating those inputs and how does this process get bootstrapped at the start of
postnatal life? As already suggested, one possible mechanism is a synchrony detection mechanism
that simply detects synchronous stimulus onsets and offsets across different modalities. This, in
turn, presumably provides developing infants with the opportunity to gradually discover increas-
ingly more complex multisensory coherence cues.
Although the detection of multisensory synchrony is one possible specific mechanism that can
mediate developmental change, other more general processes probably contribute to developmental
change as well. Historically, these more general processes have been proposed in what appear to be
two diametrically opposed theoretical views concerning the development of multisensory functions.
One of these views holds that developmental differentiation is the process underlying developmental
change, whereas the other holds that developmental integration is the key process. More specifi-
cally, the first, known as the developmental differentiation view, holds that infants come into the
world prepared to detect certain amodal invariants and that this ability improves and broadens in
scope as they grow (Gibson 1969; Thelen and Smith 1994; Werner 1973). According to the principal
proponent of this theoretical view (Gibson 1969), the improvement and broadening is mediated by
perceptual differentiation, learning, and the emergence of increasingly better stimulus detection
abilities. The second, known as the developmental integration view, holds that infants come into the
world with their different sensory systems essentially disconnected and that the senses gradually
become functionally connected as a result of children’s active interaction with their world (Birch
and Lefford 1963, 1967; Piaget 1952). One of the most interesting and important features of each of
these theoretical views is that both assign central importance to developmental experience.
A great deal of empirical evidence has been amassed since the time that the two principal theo-
retical views on the development of multisensory functions were proposed. It turns out that some of
this evidence can be interpreted as consistent with the developmental differentiation view whereas
some of it can be interpreted as consistent with the developmental integration view. Overall, then,
it seems that both processes play a role in the developmental emergence of multisensory functions.
The evidence that is consistent with the developmental differentiation view comes from studies
showing that, despite the immaturity of the infant nervous system and infants' lack of perceptual experience, infants exhibit some multisensory perceptual abilities
from birth onward (Gardner et al. 1986; Lewkowicz et al. 2010; Lewkowicz and Turkewitz 1980,
1981; Slater et al. 1997, 1999). Importantly, however, and as indicated earlier, these abilities are
relatively rudimentary. For example, newborns can detect multisensory synchrony cues and do so
by detecting nothing more than stimulus energy onsets and offsets (Lewkowicz et al. 2010). In
addition, newborns are able to detect audiovisual (A–V) intensity equivalence (Lewkowicz and
Turkewitz 1980) and can associate arbitrary auditory and visual object attributes on the basis of
their synchronous occurrence (Slater et al. 1997, 1999). Although impressive, these kinds of findings
are not surprising given that there are ample opportunities for intersensory interactions—especially
those involving the co-occurrence of sensations in different modalities—during fetal life and that
these interactions are likely to provide the foundation for the kinds of rudimentary multisensory
perceptual abilities found at birth (Turkewitz 1994).
Other evidence from the body of empirical work amassed to date is consistent with the develop-
mental integration view by indicating that multisensory perceptual abilities improve as infants grow
and acquire perceptual experience (Bremner et al. 2008; Lewkowicz 1994, 2000a, 2002; Lickliter
and Bahrick 2000; Walker-Andrews 1997). This evidence shows that older infants possess more
sophisticated multisensory processing abilities than do younger infants. For example, young infants
can perceive multisensory synchrony cues (Bahrick 1983; Bahrick and Lickliter 2000; Lewkowicz
1992a,b, 1996, 2000b, 2003, 2010), amodal intensity (Lewkowicz and Turkewitz 1980), amodal
duration (Lewkowicz 1986), and the multisensory invariance of isolated audible and visible pho-
nemes (Brookes et al. 2001; Kuhl and Meltzoff 1982, 1984; Patterson and Werker 2003). In contrast,
only older infants (roughly beyond 6 months of age)
exhibit the ability to perceive amodal affects produced by strangers (Walker-Andrews 1986) and
amodal gender (Patterson and Werker 2002; Walker-Andrews et al. 1991), bind arbitrary modality-
specific cues (Bahrick 1994; Reardon and Bushnell 1988), integrate auditory and visual spatial cues
in an adult-like manner (Neil et al. 2006), and integrate multisensory spatial bodily and external
cues (Bremner et al. 2008). Considered together, this latter body of findings clearly shows that mul-
tisensory perceptual abilities improve over the first year of life. Thus, when all the extant empirical
evidence is considered together, it is clear that developmental differentiation and developmental
integration processes operate side-by-side in early human development and that both contribute to
the emergence of multisensory perceptual abilities in infancy and probably beyond.
If developmental differentiation and integration both contribute to the development of multisen-
sory perception, what role might experience play in this process? As might be expected (Gibson
1969), evidence from studies of human infants indicates that experience plays a critical role in the
development of multisensory functions. Until recently, however, very little direct evidence for the effects
of early experience has been available at the human level, except for two studies that together demon-
strated that infant response to amodal affect information depends on the familiarity of the informa-
tion. Thus, in the first study, Walker-Andrews (1986) found that 7-month-olds but not 5-month-olds
detected amodal affect when the affect was produced by a stranger. In the second study, Kahana-
Kalman and Walker-Andrews (2001) found that when the affect was produced by the infant’s own
mother, infants as young as 3.5 months of age detected it.
More recently, my colleagues and I have discovered a particularly intriguing and seemingly
paradoxical effect of experience on the development of multisensory responsiveness. We have dis-
covered that some multisensory perceptual functions are initially present early in life and then
decline as infants age. This multisensory perceptual narrowing phenomenon was not predicted by
either the developmental differentiation or the developmental integration view. In these recent stud-
ies, we have found that infants between birth and 6 months of age can match monkey faces and the
vocalizations that they produce but that older infants no longer do so (Lewkowicz and Ghazanfar
2006; Lewkowicz et al. 2008, 2010). In addition, we have found that 6-month-old infants can match
visible and audible phonemes regardless of whether these phonemes are functionally relevant in
their own language or in other languages (Pons et al. 2009). Specifically, we found that 6-month-old
Spanish-learning infants can match a visible /ba/ to an audible /ba/ and a visible /va/ to an audible
/va/, whereas 11-month-old Spanish-learning infants no longer do so. In contrast, English-learning
infants can make such matches at both ages. The failure of the older Spanish-learning infants to
make the matches is correlated with the fact that the /ba/ – /va/ phonetic distinction is not phonemi-
cally functional in Spanish. This means that when older Spanish-learning infants have to choose
between a face mouthing a /ba/ and a face mouthing a /va/ after having listened to one of these
phonemes, they cannot choose the matching face because the phonemes are no longer distinct for
them. Together, our findings on multisensory perceptual narrowing indicate that as infants grow
and gain experience with vocalizing human faces and with native language audiovisual phonology,
their ability to perceive cross-species and cross-language multisensory coherence declines because
nonnative multisensory information is not relevant for everyday functioning.
We have also explored the possible evolutionary origins of multisensory perceptual narrowing
and, thus far, have found that it seems to be restricted to the human species. We tested young vervet
monkeys, at ages when they are old enough to be past the point of narrowing, with the same vocal-
izing rhesus monkey faces that we presented in our initial infant studies and found that vervets
do not exhibit multisensory perceptual narrowing (Zangenehpour et al. 2009). That is, the vervets
matched rhesus monkey visible and audible vocalizations even though they were past the point when
narrowing should have occurred. We interpreted this finding as reflecting the fact that monkey
brains mature four times as fast as human brains do and that, as a result, young vervets are less open
to the effects of early experience than are human infants. This interpretation suggests that experi-
ence interacts with the speed of neural growth and differentiation and that slower brain growth and
differentiation is highly advantageous because it provides for greater developmental plasticity.
The vervet monkey study demonstrates that the rate of neural growth plays an important role
in the development of behavioral functions and provides yet another example illustrating this key
developmental principle (Turkewitz and Kenny 1982). What about neural and experiential immatu-
rity, especially at the beginning of postnatal and/or posthatching life? Do other organisms, besides
humans, manifest relatively poor and immature multisensory processing functions? The answer
is that they do. A number of studies have found that the kinds of immaturities and developmental
changes observed in human infants are also found in the young of other species. Together, these
studies have found that rats, cats, and monkeys exhibit relatively poor multisensory responsiveness
early in life, that its emergence follows a pattern of gradual improvement, and that early experience
plays a critical role in this process. For example, Wallace and Stein (1997, 2001) have found that
multisensory cells in the superior colliculus of cats and rhesus monkeys, which normally integrate
auditory and visual spatial cues in the adult, do not integrate in newborn cats and monkeys, and that
integration only emerges gradually over the first weeks of life. Moreover, Wallace et al. (2006) have
found that the appropriate alignment of the auditory and visual maps in the superior colliculus of
the rat depends on their normal spatial coregistration. The same kinds of effects have been found in
barn owls and ferrets, in which calibration of the precise spatial tuning of the neural map of auditory
space depends critically on concurrent visual input (King et al. 1988; Knudsen and Brainard 1991).
Finally, in bobwhite quail hatchlings, the ability to respond to the audible and visible attributes of
the maternal hen after hatching depends on prehatching and posthatching experience with the audi-
tory, visual, and tactile stimulation arising from the embryo’s own vocalizations, the maternal hen,
and broodmates (Lickliter and Bahrick 1994; Lickliter et al. 1996).
Taken together, the human and animal data indicate that the general developmental pattern con-
sists of an initial emergence of low-level multisensory abilities, a subsequent experience-dependent
improvement of emerging abilities, and finally, the emergence of higher-level multisensory abilities.
This developmental pattern, especially in humans, appears to be due to the operation of develop-
mental differentiation and developmental integration processes. Moreover, and most intriguing, our
recent discovery of multisensory perceptual narrowing indicates that even though young infants
possess relatively crude and low-level types of multisensory perceptual abilities (i.e., sensitivity to
A–V synchrony relations), these abilities imbue them with much broader multisensory perceptual
tuning than is the case in older infants. As indicated earlier, the distinct advantage of this kind of
tuning is that it provides young infants with a way of bootstrapping their multisensory perceptual
abilities at a time when they are too immature and inexperienced to extract higher-level amodal
attributes.
In the remainder of this chapter, I review results from our studies on infant response to multi-
sensory temporal information as an example of the gradual emergence of multisensory functions.
Moreover, I review additional evidence of the role of developmental differentiation and integration
processes as well as of early experience in the emergence of multisensory responsiveness. Finally,
I speculate on the neural mechanisms that might underlie the developmental emergence of multi-
sensory perception and highlight the importance of studying the interaction between neural and
behavioral growth and experience.
and cognitively meaningful multisensory experiences. This, of course, assumes that they are sensi-
tive to the temporal flow of information in each modality. Indeed, evidence indicates that infants are
sensitive to temporal information at both the unisensory and multisensory levels. For example, it has
been found that infants as young as 3 months of age can predict the occurrence of a visual stimulus
at a particular location based on their prior experience with a temporally predictable pattern of spa-
tiotemporally alternating visual stimuli (Canfield and Haith 1991; Canfield et al. 1997). Similarly,
it has been found that 4-month-old infants can quickly learn to detect a “missing” visual stimulus
after adaptation to a regular and predictable visual stimulus regimen (Colombo and Richman 2002).
In the auditory modality, studies have shown that newborn infants (1) exhibit evidence of temporal
anticipation when they hear a tone that is not followed by glucose—after the tone (CS) and the glu-
cose (UCS) were paired during an initial conditioning phase (Clifton 1974) and (2) can distinguish
between different classes of linguistic input on the basis of the rhythmic attributes of the auditory
input (Nazzi and Ramus 2003). Finally, in the audiovisual domain, it has been found that 7-month-
old infants can anticipate the impending presentation of an audiovisual event when they first hear a
white noise stimulus that has previously reliably predicted the occurrence of the audiovisual event
(Donohue and Berg 1991), and that infants’ duration discrimination improves between 6 and 10
months of age (Brannon et al. 2007). Together, these findings indicate that infants are generally
sensitive to temporal information in the auditory and visual modalities.
impact sound was presented 150, 250, and 350 ms before the object’s visible bounce (sound-first
group) or 250, 350, or 450 ms after the visible bounce (sound-second group). Infants in the sound-
first group detected the 350 ms asynchrony, whereas infants in the sound-second group detected the
450 ms asynchrony (no age effects were found). Adults, who were tested in a similar task and with
the same stimuli, detected an asynchrony of 80 ms in the sound-first condition and 112 ms in the
sound-second condition. When these results are conceptualized in terms of an intersensory temporal
contiguity window (ITCW), they indicate that the ITCW is wider in infants than in adults and that it
decreases in size during development.
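The ITCW logic can be made concrete with a minimal sketch that classifies a given audiovisual asynchrony against the detection thresholds reported above. The function name, the dictionary layout, and the interpretation that sub-threshold asynchronies fall inside the window are illustrative assumptions, not the authors' analysis code:

```python
# Illustrative sketch (not the authors' method): asynchronies below the
# smallest reliably detected value fall inside the intersensory temporal
# contiguity window (ITCW) and are perceived as synchronous.
# Thresholds (ms) are taken from the results reported in the text.
DETECTION_THRESHOLD_MS = {
    ("infant", "sound_first"): 350,
    ("infant", "sound_second"): 450,
    ("adult", "sound_first"): 80,
    ("adult", "sound_second"): 112,
}

def inside_itcw(asynchrony_ms: float, group: str, order: str) -> bool:
    """True if the asynchrony is below the detection threshold for this
    group and presentation order, i.e., inside the ITCW."""
    return asynchrony_ms < DETECTION_THRESHOLD_MS[(group, order)]

# A 250 ms sound-first lag is inside the infant window but outside the
# adult one, illustrating the developmental narrowing of the ITCW.
print(inside_itcw(250, "infant", "sound_first"))  # True
print(inside_itcw(250, "adult", "sound_first"))   # False
```

The same comparison run over a range of asynchronies reproduces the qualitative result that the window shrinks with development.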
FIGURE 17.1 Mean duration of looking during test trials in response to each of three different A–V temporal asynchronies (Nov-366 ms, Nov-500 ms, Nov-666 ms) after habituation to a synchronous audiovisual syllable (Fam-0 ms). Error bars indicate standard error of the mean, and the asterisk indicates that response recovery in that particular test trial was significantly higher than the response obtained in the familiar test trial (Fam-0 ms).
of a unity assumption, short-term exposure to an asynchronous multisensory event does not cause
infants to treat it as synchronous but rather focuses their attention on the event’s temporal attributes
and, in the process, sharpens their perception of A–V temporal relations.
Finally, to investigate the mechanisms underlying A–V asynchrony detection, in Experiment
3, we habituated infants to a synchronous audiovisual syllable and then tested them again for the
detection of asynchrony with audiovisual asynchronies of 366, 500, and 666 ms. This time, how-
ever, the test stimuli consisted of a visible syllable and a 400 Hz tone rather than the audible syllable.
FIGURE 17.2 Mean duration of looking during test trials in response to each of three different A–V temporal asynchronies (Nov-500 ms, Nov-366 ms, Nov-0 ms) after habituation to an asynchronous audiovisual syllable (Fam-666 ms). Error bars indicate standard error of the mean, and the asterisks indicate that response recovery in those particular test trials was significantly higher than the response obtained in the familiar test trial (Fam-666 ms).
FIGURE 17.3 Mean duration of looking during test trials in response to each of three different A–V temporal asynchronies (Nov-366 ms, Nov-500 ms, Nov-666 ms) after habituation to an audiovisual stimulus consisting of a visible syllable and a synchronous tone (Fam-0 ms). Error bars indicate standard error of the mean, and the asterisk indicates that response recovery in that particular test trial was significantly higher than the response obtained in the familiar test trial (Fam-0 ms).
Substituting the tone for the acoustic part of the syllable was done to determine whether the dynamic
variations in the spectral energy inherent in the acoustic part of the audiovisual speech signal and/
or their correlation with the dynamic variations in gestural information contribute to infant detec-
tion of A–V speech synchrony relations. Once again, infants detected the 666 ms asynchrony but
not the two shorter ones (see Figure 17.3). The fact that these findings replicated those of
Experiment 1 indicates that infants rely neither on acoustic spectral energy nor on its correlation
with the dynamic variations in the gestural information to detect A–V speech synchrony relations.
Rather, it appears that infants attend primarily to energy onsets and offsets when processing A–V
speech synchrony relations, suggesting that detection of such relations is not likely to require the
operation of higher-level neural mechanisms.
synchrony relations to perceive even nonnative facial gestures and accompanying vocalizations as
coherent entities. The older infants no longer do so for two related reasons. First, they gradually
shift their attention to higher-level perceptual features as a function of increasing neural growth,
maturation of their perceptual systems, and increasing perceptual experience all acting together
to make it possible for them to extract such features. Second, their exclusive and massive experi-
ence with human faces and vocalizations narrows their perceptual expertise to ecologically relevant
signals. In other words, as infants grow and as they acquire experience with vocalizing faces, they
learn to extract more complex features (e.g., gender, affect, and identity), rendering low-level syn-
chrony relations much less relevant. In addition, as infants grow, they acquire exclusive experience
with human faces and vocalizations and, as a result, become increasingly more specialized. As they
specialize, they stop responding to the faces and vocalizations of other species.
Because the matching faces and vocalizations corresponded not only in terms of onset and offset
synchrony but in terms of duration as well, the obvious question is whether amodal duration might
have contributed to multisensory matching. To investigate this question, we repeated the Lewkowicz
and Ghazanfar (2006) procedures in a subsequent study (Lewkowicz et al. 2008), except that this
time, we presented the monkey audible calls out of synchrony with respect to both visible calls. This
meant that the corresponding visible and audible calls were now only related in terms of their dura-
tion. Results yielded no matching in either the 4- to 6-month-old or the 8- to 10-month-old infants,
indicating that A–V temporal synchrony mediated successful matching in the younger infants. That
the younger infants did not match in this study, even though the corresponding faces and
vocalizations had equal durations, shows that duration did not mediate matching in
the original study. This is consistent with previous findings that infants do not match equal-duration
auditory and visual inputs unless they are also synchronous (Lewkowicz 1986).
If A–V temporal synchrony mediates intersensory matching in young infants, and if responsive-
ness to this multisensory cue depends on a basic and relatively low-level process, then it is possible
that cross-species multisensory matching emerges very early in development. To determine if that
is the case, we asked whether newborns also might be able to match monkey faces and vocaliza-
tions (Lewkowicz et al. 2010). In Experiment 1 of this study, we used the identical stimulus mate-
rials and testing procedures used by Lewkowicz and Ghazanfar (2006), and found that newborns
also matched visible and audible monkey calls. We then investigated whether successful matching
reflected matching of the synchronous onsets and offsets of the audible and visible calls. If so,
then newborns should be able to make the matches even when some of the identity information is
removed. Thus, we repeated Experiment 1, except that rather than present the natural call, we pre-
sented a complex tone in Experiment 2. To preserve the critical temporal features of the audible call,
we ensured that the tone had the same duration as the natural call and that its onsets and offsets were
synchronous with the matching visible call. Despite the absence of acoustic identity information and
the absence of a correlation between the dynamic variations in facial gesture information and the
amplitude and formant structure inherent in the natural audible call, newborns still performed suc-
cessful intersensory matching. This indicates that newborns’ ability to make cross-species matches
in Experiment 1 was based on their sensitivity to the temporally synchronous onsets and offsets of
the matching faces and vocalizations and that it was not based on identity information nor on the
dynamic correlation between the visible and audible call features.
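The claim that newborns match on nothing more than synchronous energy onsets and offsets can be illustrated with a short sketch. Everything here (the function, the 100 ms tolerance, and the event times) is a hypothetical illustration of the principle, not the stimuli or analysis from the studies:

```python
# Hedged sketch: if matching relies only on energy onsets/offsets, deciding
# whether an audible and a visible call "go together" reduces to checking
# that their energy transitions co-occur within some tolerance. The 100 ms
# tolerance and the event times below are illustrative assumptions.

def onsets_offsets_synchronous(audio_events_ms, visual_events_ms, tol_ms=100):
    """True if each auditory energy transition has a visual counterpart
    within tol_ms; identity and spectral content play no role."""
    if len(audio_events_ms) != len(visual_events_ms):
        return False
    return all(abs(a - v) <= tol_ms
               for a, v in zip(sorted(audio_events_ms), sorted(visual_events_ms)))

# A complex tone sharing the visible call's onset and offset still "matches"
# (as in Experiment 2), because only the transition times are compared.
visible_call = [0, 400]   # onset and offset of the visible gesture (ms)
tone = [20, 410]          # tone substituted for the natural audible call
print(onsets_offsets_synchronous(tone, visible_call))  # True
```

On this account, removing acoustic identity information leaves matching intact, which is exactly the pattern the newborn data show.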
Together, the positive findings of cross-species intersensory matching in newborns and 4- to
6-month-old infants demonstrate that young infants are sensitive to a basic feature of their percep-
tual world, namely, stimulus energy onsets and offsets. This basic perceptual sensitivity bootstraps
newborns’ entry into the world of multisensory objects and events and enables them to perceive
them as coherent entities, regardless of their specific identity. This sensitivity is especially potent
when the visual information is dynamic. When it is not, infants do not begin to bind the auditory
and visual attributes of multisensory objects, such as color/shape and pitch, or color and taste until
the second half of the first year of life. The pervasive and fundamental role that A–V temporal syn-
chrony plays in infant perceptual response to multisensory attributes suggests that sensitivity to this
intersensory perceptual cue reflects the operation of a fundamental early perceptual mechanism.
That is, as indicated earlier, even though sensitivity to A–V temporal synchrony is mediated by rela-
tively basic and low-level processing mechanisms, it provides infants with a powerful initial percep-
tual tool for gradually discovering that multisensory objects are characterized by many other forms
of intersensory invariance. For example, once infants start to bind the audible and visible attributes
of talking faces, they are in a position to discover that faces and the vocalizations that accompany
them could also be specified by common duration, tempo, and rhythm, as well as by higher-level
amodal and invariant attributes such as affect, gender, and identity.
made an impact sound, turned to the right, and moved off to the side and disappeared. This cycle
was repeated for the duration of each habituation trial. After habituation, infants were given test
trials during which the order of sequence elements was changed in some way and the question was
whether they detected the change.
In an initial study (Lewkowicz 2004), we asked whether infants can learn a sequence composed of
three moving/impacting objects and, if so, what aspects of that sequence they encoded. Results indi-
cated that 4-month-old infants detected serial order changes only when the changes were specified
concurrently by audible and visible attributes during the learning as well as the test phase and only
when the impact part of the event—a local event feature that was not informative about sequential
order—was blocked from view. In contrast, 8-month-old infants detected order changes regardless
of whether they were specified by unisensory or bisensory attributes and whether they could see the
impact or not. In sum, younger infants required multisensory redundancy to detect the serial order
changes whereas older infants did not. A follow-up study (Lewkowicz 2008) replicated the earlier
findings, ruled out primacy effects, and extended them by showing that even 3-month-old
infants can perceive and discriminate three-element dynamic audiovisual sequences and that they,
too, rely on multisensory redundancy for successful learning and discrimination. In addition, this
study showed that object motion plays an important role in that infants exhibited less robust respon-
siveness to audiovisual sequences consisting of looming rather than explicitly moving objects.
Because the changes in our two initial studies involved changes in the order of a particular object/
impact sound as well as its statistical relations vis-à-vis the other sequence elements, we investigated
the separate role of each of these sequential attributes in our most recent work (Lewkowicz and
Berent 2009). Here, we investigated directly whether 4-month-old infants could track the statis-
tical relations among specific sequence elements (e.g., AB, BC), and/or whether they could also
encode abstract ordinal position information (e.g., that B is the second element in a sequence such
as ABCD). Thus, across three experiments, we habituated infants to sequences of four moving/
sounding objects in which three of the objects and their sounds varied in their ordinal position but
in which the position of one target object/sound remained invariant (e.g., ABCD, CBDA). Figure
17.4 shows an example of one of these sequences and how they moved. We then tested whether the
infants detected a change in the target’s position. We found that infants detected an ordinal position
change only when it disrupted the statistical relations between adjacent elements, but not when the
statistical relations were controlled. Together, these findings indicate that 4-month-old infants learn
the order of sequence elements by tracking their statistical relations but not their invariant ordinal
position. When these findings are combined with the previously reviewed findings on sequence
FIGURE 17.4 One of three different sequences presented during the habituation phase of the sequence
learning experiment (actual objects presented are shown). Each object made a distinct impact sound when it
came in contact with the black ramp. Across three different sequences, the triangle was the target stimulus
and, thus, for one group of infants, the target remained in the second ordinal position during the habituation phase
and then changed to the third ordinal position in the test trials.
learning in infancy, they show that different and increasingly more complex temporal sequence
learning abilities emerge during infancy. For example, they suggest that the ability to perceive and
learn the invariant ordinal position of a sequence element emerges sometime after 4 months of age.
When it emerges and what mediates its emergence are currently open questions, as are questions
about the emergence of the other, more complex sequence perception and learning skills.
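The distinction between tracking adjacent-pair statistics and encoding invariant ordinal position can be made concrete computationally. The sequences and function below are hypothetical stand-ins for the four-object sequences described above, not the actual stimuli or analysis:

```python
# Illustrative sketch of "tracking statistical relations between adjacent
# elements": transitional probabilities over adjacent pairs, estimated from
# a set of habituation sequences. Sequences here are hypothetical.
from collections import Counter

def transitional_probabilities(sequences):
    """Estimate P(next | current) from adjacent element pairs."""
    pair_counts = Counter()
    first_counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

# Hypothetical habituation sequences in which B stays in the second ordinal
# position while the other elements vary (cf. ABCD, CBDA in the text).
habituation = ["ABCD", "CBDA", "DBAC"]
tp = transitional_probabilities(habituation)
# Adjacent-pair statistics vary across pairs even though B's ordinal
# position is invariant, so a learner tracking only these probabilities
# need not encode "B is second."
print(tp[("A", "B")])  # 0.5
```

A learner sensitive only to such pairwise statistics would detect a positional change exactly when it disrupts these probabilities, which is the pattern the 4-month-olds showed.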
If multisensory interactions begin to occur right after the sensory input stage and before sensory
elaboration has occurred, and if such interactions continue to occur as the information ascends
the neural pathways to the traditional association areas of the cortex, then this resolves a critical
problem. From the standpoint of the adult brain, it solves the problem of having to wait until the
higher-order cortical areas can extract the various types of relations inherent in multisensory input.
This way, the observer can begin to perform a veridical scene analysis and arrive at a coherent mul-
tisensory experience shortly after input arrives at the sensory organs (Foxe and Schroeder 2005).
From the standpoint of the immature infant brain, the adult findings raise some interesting possibili-
ties. For example, because these early neural interactions are of a relatively low level, they are likely
to occur very early in human development and can interact with any other low-level subcortical
integration mechanisms. Whether this scenario is correct is currently unknown and awaits further
investigation. As shown here, behavioral findings from human infants support these conjectures
in that starting at birth, human infants are capable of multisensory perception. Thus, the question
is no longer whether such mechanisms operate but rather what is their nature and where in the
brain are such mechanisms operational. Another interesting question is whether the heterochronous
emergence of heterogeneous multisensory perceptual skills that has been found in behavioral infant
studies (Lewkowicz 2002) is reflected in the operation of distinct neural mechanisms emerging at
different times and in different regions of the brain.
The neural mechanisms underlying multisensory processing are likely to be quite rudimentary in
early human development. The central nervous system and the different sensory systems are
immature, and young infants are perceptually and cognitively inexperienced. This is
the case despite the fact that the tactile, vestibular, chemical, and auditory modalities begin to function
before birth (Gottlieb 1971) and despite the fact that this provides fetuses with some sensory experience
and some opportunity for intersensory interaction (Turkewitz 1994). Consequently, newborn infants are
relatively unprepared for the onslaught of new multisensory input that also, for the first time, includes
visual information. In addition, newborns are greatly limited by the immature nature of their different
sensory systems (Kellman and Arterberry 1998). That is, their visual limitations include poor spatial
and temporal resolution and poor sensitivity to contrast, orientation, motion, depth, and color. Their
auditory limitations include much higher thresholds than those of adults, including higher absolute,
frequency resolution, and temporal resolution thresholds. Obviously, these basic sensory
functions improve rapidly over the first months of life, but there is little doubt that they initially impose
limitations on infant perception and probably account for some of the changes observed in the
development of multisensory responsiveness. The question for future studies is: How do infants
overcome these limitations? The work reviewed here suggests that the answer lies in the complex inter-
actions between neural and behavioral levels of organization and in the daily experiences that infants
have in their normal ecological setting. Because developmental change is driven by such interactions
(Gottlieb et al. 2006), the challenge for future studies is to explicate these interactions.
REFERENCES
Bahrick, L.E. 1983. Infants’ perception of substance and temporal synchrony in multimodal events. Infant
Behavior & Development 6:429–51.
Bahrick, L.E. 1994. The development of infants’ sensitivity to arbitrary intermodal relations. Ecological
Psychology 6:111–23.
Bahrick, L.E., and R. Lickliter. 2000. Intersensory redundancy guides attentional selectivity and perceptual
learning in infancy. Developmental Psychology 36:190–201.
Bahrick, L.E., R. Lickliter, and R. Flom. 2004. Intersensory redundancy guides the development of selective
attention, perception, and cognition in infancy. Current Directions in Psychological Science 13:99–102.
Birch, H.G., and A. Lefford. 1963. Intersensory development in children. Monographs of the Society for
Research in Child Development 25.
Birch, H.G., and A. Lefford. 1967. Visual differentiation, intersensory integration, and voluntary motor control.
Monographs of the Society for Research in Child Development 32:1–87.
340 The Neural Bases of Multisensory Processes
Bizley, J.K., F.R. Nodal, V.M. Bajo, I. Nelken, and A.J. King. 2007. Physiological and anatomical evidence for
multisensory interactions in auditory cortex. Cerebral Cortex 17:2172–89.
Brannon, E.M., S. Suanda, and K. Libertus. 2007. Temporal discrimination increases in precision over develop-
ment and parallels the development of numerosity discrimination. Developmental Science 10:770–7.
Bremner, A.J., N.P. Holmes, and C. Spence. 2008. Infants lost in (peripersonal) space? Trends in Cognitive
Sciences 12:298–305.
Brookes, H., A. Slater, P.C. Quinn et al. 2001. Three-month-old infants learn arbitrary auditory-visual pairings
between voices and faces. Infant & Child Development 10:75–82.
Bushara, K.O., J. Grafman, and M. Hallett. 2001. Neural correlates of auditory-visual stimulus onset asyn-
chrony detection. Journal of Neuroscience 21:300–4.
Calvert, G.A., M.J. Brammer, E.T. Bullmore et al. 1999. Response amplification in sensory-specific corti-
ces during crossmodal binding. Neuroreport: For Rapid Communication of Neuroscience Research
10:2619–23.
Calvert, G.A., C. Spence, and B. Stein (eds.). 2004. The Handbook of Multisensory Processes. Cambridge,
MA: MIT Press.
Canfield, R.L., and M.M. Haith. 1991. Young infants’ visual expectations for symmetric and asymmetric stimu-
lus sequences. Developmental Psychology 27:198–208.
Canfield, R.L., E.G. Smith, M.P. Brezsnyak, and K.L. Snow. 1997. Information processing through the first
year of life: A longitudinal study using the visual expectation paradigm. Monographs of the Society for
Research in Child Development 62:v–vi, 1–145.
Clifton, R.K. 1974. Heart rate conditioning in the newborn infant. Journal of Experimental Child Psychology
18:9–21.
Colombo, J., and W.A. Richman. 2002. Infant timekeeping: Attention and temporal estimation in 4-month-olds.
Psychological Science 13:475–9.
Donohue, R.L., and W.K. Berg. 1991. Infant heart-rate responses to temporally predictable and unpredictable
events. Developmental Psychology 27:59–66.
Fiser, J., and R.N. Aslin. 2002. Statistical learning of new visual feature combinations by infants. Proceedings
of the National Academy of Sciences of the United States of America 99:15822–6.
Foxe, J.J., and C.E. Schroeder. 2005. The case for feedforward multisensory convergence during early cortical
processing. Neuroreport 16:419–23.
Fraisse, P. 1982. The adaptation of the child to time. In W.J. Friedman (ed.), The developmental psychology of
time, 113–40. New York: Academic Press.
Frank, M.C., J.A. Slemmer, G.F. Marcus, and S.P. Johnson. 2009. Information from multiple modalities helps
5-month-olds learn abstract rules. Developmental Science 12:504–9.
Fujisaki, W., S. Shimojo, M. Kashino, and S.Y. Nishida. 2004. Recalibration of audiovisual simultaneity.
Nature Neuroscience 7:773–8.
Gardner, J.M., D.J. Lewkowicz, S.A. Rose, and B.Z. Karmel. 1986. Effects of visual and auditory stimula-
tion on subsequent visual preferences in neonates. International Journal of Behavioral Development
9:251–63.
Gebhard, J.W., and G.H. Mowbray. 1959. On discriminating the rate of visual flicker and auditory flutter.
American Journal of Psychology 72:521–9.
Gerken, L. 2006. Decisions, decisions: Infant language learning when multiple generalizations are possible.
Cognition 98:B67–74.
Ghazanfar, A.A., and C.E. Schroeder. 2006. Is neocortex essentially multisensory? Trends in Cognitive Sciences
10:278–85.
Ghazanfar, A.A., J.X. Maier, K.L. Hoffman, and N.K. Logothetis. 2005. Multisensory integration of dynamic
faces and voices in rhesus monkey auditory cortex. Journal of Neuroscience 25:5004–12.
Giard, M.H., and F. Peronnet. 1999. Auditory–visual integration during multimodal object recognition in
humans: A behavioral and electrophysiological study. Journal of Cognitive Neuroscience 11:473–90.
Gibson, J.J. 1966. The senses considered as perceptual systems. Boston: Houghton-Mifflin.
Gibson, E.J. 1969. Principles of perceptual learning and development. New York: Appleton.
Gómez, R.L., and J. Maye. 2005. The developmental trajectory of nonadjacent dependency learning. Infancy
7:183–206.
Gottlieb, G. 1971. Ontogenesis of sensory function in birds and mammals. In The biopsychology of develop-
ment, ed. E. Tobach, L.R. Aronson, and E. Shaw, 67–128. New York: Academic Press.
Gottlieb, G., D. Wahlsten, and R. Lickliter. 2006. The significance of biology for human development: A devel-
opmental psychobiological systems view. In Handbook of child psychology, ed. R. Lerner, 210–57. New
York: Wiley.
Development of Multisensory Temporal Perception 341
Greenfield, P.M. 1991. Language, tools and brain: The ontogeny and phylogeny of hierarchically organized
sequential behavior. Behavioral and Brain Sciences 14:531–95.
Gulya, M., and M. Colombo. 2004. The ontogeny of serial-order behavior in humans (Homo sapiens):
Representation of a list. Journal of Comparative Psychology 118:71–81.
Handel, S., and L. Buffardi. 1969. Using several modalities to perceive one temporal pattern. Quarterly Journal
of Experimental Psychology 21:256–66.
Johnson, S.P., K.J. Fernandes, M.C. Frank et al. 2009. Abstract rule learning for visual sequences in 8- and
11-month-olds. Infancy 14:2–18.
Kahana-Kalman, R., and A.S. Walker-Andrews. 2001. The role of person familiarity in young infants’ percep-
tion of emotional expressions. Child Development 72:352–69.
Kamachi, M., H. Hill, K. Lander, and E. Vatikiotis-Bateson. 2003. Putting the face to the voice: Matching
identity across modality. Current Biology 13:1709–14.
Kellman, P.J., and M.E. Arterberry. 1998. The cradle of knowledge: Development of perception in infancy.
Cambridge, MA: MIT Press.
King, A.J., M.E. Hutchings, D.R. Moore, and C. Blakemore. 1988. Developmental plasticity in the visual and
auditory representations in the mammalian superior colliculus. Nature 332:73–6.
Kirkham, N.Z., J.A. Slemmer, and S.P. Johnson. 2002. Visual statistical learning in infancy: Evidence for a
domain general learning mechanism. Cognition 83:B35–42.
Knudsen, E.I., and M.S. Brainard. 1991. Visual instruction of the neural map of auditory space in the develop-
ing optic tectum. Science 253:85–7.
Kuhl, P.K., and A.N. Meltzoff. 1982. The bimodal perception of speech in infancy. Science 218:1138–41.
Kuhl, P.K., and A.N. Meltzoff. 1984. The intermodal representation of speech in infants. Infant Behavior &
Development 7:361–81.
Lashley, K.S. 1951. The problem of serial order in behavior. In Cerebral mechanisms in behavior: The Hixon
symposium, ed. L.A. Jeffress, 123–47. New York: Wiley.
Lewkowicz, D.J. 1986. Developmental changes in infants’ bisensory response to synchronous durations. Infant
Behavior & Development 9:335–53.
Lewkowicz, D.J. 1992a. Infants’ response to temporally based intersensory equivalence: The effect of synchro-
nous sounds on visual preferences for moving stimuli. Infant Behavior & Development 15:297–324.
Lewkowicz, D.J. 1992b. Infants’ responsiveness to the auditory and visual attributes of a sounding/moving
stimulus. Perception & Psychophysics 52:519–28.
Lewkowicz, D.J. 1994. Limitations on infants’ response to rate-based auditory-visual relations. Developmental
Psychology 30:880–92.
Lewkowicz, D.J. 1996. Perception of auditory-visual temporal synchrony in human infants. Journal of
Experimental Psychology: Human Perception & Performance 22:1094–106.
Lewkowicz, D.J. 2000a. The development of intersensory temporal perception: An epigenetic systems/limita-
tions view. Psychological Bulletin 126:281–308.
Lewkowicz, D.J. 2000b. Infants’ perception of the audible, visible and bimodal attributes of multimodal syl-
lables. Child Development 71:1241–57.
Lewkowicz, D.J. 2002. Heterogeneity and heterochrony in the development of intersensory perception.
Cognitive Brain Research 14:41–63.
Lewkowicz, D.J. 2003. Learning and discrimination of audiovisual events in human infants: The hierarchical
relation between intersensory temporal synchrony and rhythmic pattern cues. Developmental Psychology
39:795–804.
Lewkowicz, D.J. 2004. Perception of serial order in infants. Developmental Science 7:175–84.
Lewkowicz, D.J. 2008. Perception of dynamic and static audiovisual sequences in 3- and 4-month-old infants.
Child Development 79:1538–54.
Lewkowicz, D.J. 2010. Infant perception of audio-visual speech synchrony. Developmental Psychology 46:66–77.
Lewkowicz, D.J., and I. Berent. 2009. Sequence learning in 4-month-old infants: Do infants represent ordinal
information? Child Development 80:1811–23.
Lewkowicz, D.J., and K. Kraebel. 2004. The value of multisensory redundancy in the development of intersen-
sory perception. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein,
655–78. Cambridge, MA: MIT Press.
Lewkowicz, D.J., and A.A. Ghazanfar. 2006. The decline of cross-species intersensory perception in human
infants. Proceedings of the National Academy of Sciences of the United States of America 103:6771–4.
Lewkowicz, D.J., and G. Turkewitz. 1980. Cross-modal equivalence in early infancy: Auditory–visual intensity
matching. Developmental Psychology 16:597–607.
Lewkowicz, D.J., and G. Turkewitz. 1981. Intersensory interaction in newborns: Modification of visual prefer-
ences following exposure to sound. Child Development 52:827–32.
Lewkowicz, D.J., R. Sowinski, and S. Place. 2008. The decline of cross-species intersensory perception in human
infants: Underlying mechanisms and its developmental persistence. Brain Research 1242:291–302.
Lewkowicz, D.J., I. Leo, and F. Simion. 2010. Intersensory perception at birth: Newborns match non-human
primate faces and voices. Infancy 15:46–60.
Lickliter, R., and L.E. Bahrick. 2000. The development of infant intersensory perception: Advantages of a
comparative convergent-operations approach. Psychological Bulletin 126:260–80.
Lickliter, R., and H. Banker. 1994. Prenatal components of intersensory development in precocial birds. In
Development of intersensory perception: Comparative perspectives, ed. D.J. Lewkowicz and R. Lickliter,
59–80. Norwood, NJ: Lawrence Erlbaum Associates, Inc.
Lickliter, R., D.J. Lewkowicz, and R.F. Columbus. 1996. Intersensory experience and early perceptual devel-
opment: The role of spatial contiguity in bobwhite quail chicks’ responsiveness to multimodal maternal
cues. Developmental Psychobiology 29:403–16.
Maier, N.R.F., and T.C. Schneirla. 1964. Principles of animal psychology. New York: Dover Publications.
Marcovitch, S., and D.J. Lewkowicz. 2009. Sequence learning in infancy: The independent contributions of
conditional probability and pair frequency information. Developmental Science 12:1020–5.
Marcus, G.F., S. Vijayan, S. Rao, and P. Vishton. 1999. Rule learning by seven-month-old infants. Science
283:77–80.
Marcus, G.F., K.J. Fernandes, and S.P. Johnson. 2007. Infant rule learning facilitated by speech. Psychological
Science 18:387–91.
Marks, L. 1978. The unity of the senses. New York: Academic Press.
Martin, J.G. 1972. Rhythmic (hierarchical) versus serial structure in speech and other behavior. Psychological
Review 79:487–509.
McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature 264:746–8.
Molholm, S., W. Ritter, M.M. Murray et al. 2002. Multisensory auditory–visual interactions during early
sensory processing in humans: A high-density electrical mapping study. Cognitive Brain Research
14:115–28.
Munhall, K.G., and E. Vatikiotis-Bateson. 2004. Spatial and temporal constraints on audiovisual speech percep-
tion. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 177–88.
Cambridge, MA: MIT Press.
Myers, A.K., B. Cotton, and H.A. Hilp. 1981. Matching the rate of concurrent tone bursts and light flashes as a
function of flash surround luminance. Perception & Psychophysics 30(1):33–8.
Navarra, J., A. Vatakis, M. Zampini et al. 2005. Exposure to asynchronous audiovisual speech extends the tem-
poral window for audiovisual integration. Cognitive Brain Research 25:499–507.
Nazzi, T., and F. Ramus. 2003. Perception and acquisition of linguistic rhythm by infants. Speech Communication
41:233–43.
Neil, P.A., C. Chee-Ruiter, C. Scheier, D.J. Lewkowicz, and S. Shimojo. 2006. Development of multisensory
spatial integration and perception in humans. Developmental Science 9:454–64.
Nelson, K. 1986. Event knowledge: Structure and function in development. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Nelson, K. 2007. Young minds in social worlds. Cambridge, MA: Harvard Univ. Press.
Oppenheim, R.W. 1981. Ontogenetic adaptations and retrogressive processes in the development of the nervous
system and behavior: A neuroembryological perspective. In Maturation and development: Biological and
psychological perspectives, ed. K.J. Connolly and H.F.R. Prechtl, 73–109. Philadelphia, PA: Lippincott.
Partan, S., and P. Marler. 1999. Communication goes multimodal. Science 283:1272–3.
Patterson, M.L., and J.F. Werker. 2002. Infants’ ability to match dynamic phonetic and gender information in
the face and voice. Journal of Experimental Child Psychology 81:93–115.
Patterson, M.L., and J.F. Werker. 2003. Two-month-old infants match phonetic information in lips and voice.
Developmental Science 6(2):191–6.
Piaget, J. 1952. The origins of intelligence in children. New York: International Universities Press.
Pons, F., D.J. Lewkowicz, S. Soto-Faraco, and N. Sebastián-Gallés. 2009. Narrowing of intersensory speech
perception in infancy. Proceedings of the National Academy of Sciences of the United States of America
106:10598–602.
Reardon, P., and E.W. Bushnell. 1988. Infants’ sensitivity to arbitrary pairings of color and taste. Infant Behavior
and Development 11:245–50.
Romei, V., M.M. Murray, C. Cappe, and G. Thut. 2009. Preperceptual and stimulus-selective enhancement of
low-level human visual cortex excitability by sounds. Current Biology 19:1799–805.
Rowe, C. 1999. Receiver psychology and the evolution of multicomponent signals. Animal Behaviour
58:921–31.
Saffran, J.R., R.N. Aslin, and E.L. Newport. 1996. Statistical learning by 8-month-old infants. Science
274:1926–8.
Scheier, C., D.J. Lewkowicz, and S. Shimojo. 2003. Sound induces perceptual reorganization of an ambiguous
motion display in human infants. Developmental Science 6:233–44.
Sekuler, R., A.B. Sekuler, and R. Lau. 1997. Sound alters visual motion perception. Nature 385:308.
Shams, L., Y. Kamitani, and S. Shimojo. 2000. What you see is what you hear. Nature 408(6814):788.
Shipley, T. 1964. Auditory flutter-driving of visual flicker. Science 145:1328–30.
Slater, A., E. Brown, and M. Badenoch. 1997. Intermodal perception at birth: Newborn infants’ memory for
arbitrary auditory–visual pairings. Early Development & Parenting 6:99–104.
Slater, A., P.C. Quinn, E. Brown, and R. Hayes. 1999. Intermodal perception at birth: Intersensory redundancy
guides newborn infants’ learning of arbitrary auditory–visual pairings. Developmental Science 2:333–8.
Slutsky, D.A., and G.H. Recanzone. 2001. Temporal and spatial dependency of the ventriloquism effect.
Neuroreport 12:7–10.
Stein, B.E., and M.A. Meredith. 1993. The merging of the senses. Cambridge, MA: MIT Press.
Stein, B.E., and T.R. Stanford. 2008. Multisensory integration: Current issues from the perspective of the single
neuron. Nature Reviews. Neuroscience 9:255–66.
Stein, B.E., D. Burr, C. Constantinidis et al. 2010. Semantic confusion regarding the development of multisen-
sory integration: A practical solution. European Journal of Neuroscience 31:1713–20.
Thelen, E., and L.B. Smith. 1994. A dynamic systems approach to the development of cognition and action.
Cambridge, MA: MIT Press.
Thomas, K.M., and C.A. Nelson. 2001. Serial reaction time learning in preschool- and school-age children.
Journal of Experimental Child Psychology 79:364–87.
Turkewitz, G. 1994. Sources of order for intersensory functioning. In The development of intersensory percep-
tion: Comparative perspectives, ed. D.J. Lewkowicz and R. Lickliter, 3–17. Hillsdale, NJ: Lawrence
Erlbaum Associates.
Turkewitz, G., and P.A. Kenny. 1982. Limitations on input as a basis for neural organization and perceptual
development: A preliminary theoretical statement. Developmental Psychobiology 15:357–68.
Vroomen, J., M. Keetels, B. de Gelder, and P. Bertelson. 2004. Recalibration of temporal order perception by
exposure to audio-visual asynchrony. Cognitive Brain Research 22:32–5.
Walker-Andrews, A.S. 1986. Intermodal perception of expressive behaviors: Relation of eye and voice?
Developmental Psychology 22:373–7.
Walker-Andrews, A.S. 1997. Infants’ perception of expressive behaviors: Differentiation of multimodal infor-
mation. Psychological Bulletin 121:437–56.
Walker-Andrews, A.S., L.E. Bahrick, S.S. Raglioni, and I. Diaz. 1991. Infants’ bimodal perception of gender.
Ecological Psychology 3:55–75.
Wallace, M.T., and B.E. Stein. 1997. Development of multisensory neurons and multisensory integration in cat
superior colliculus. Journal of Neuroscience 17:2429–44.
Wallace, M.T., and B.E. Stein. 2001. Sensory and multisensory responses in the newborn monkey superior col-
liculus. Journal of Neuroscience 21:8886–94.
Wallace, M.T., R. Ramachandran, and B.E. Stein. 2004. A revised view of sensory cortical parcellation.
Proceedings of the National Academy of Sciences of the United States of America 101:2167–72.
Wallace, M.T., B.E. Stein, and R. Ramachandran. 2006. Early experience determines how the senses will inter-
act: A revised view of sensory cortical parcellation. Journal of Neurophysiology 101:2167–72.
Welch, R.B., and D.H. Warren. 1980. Immediate perceptual response to intersensory discrepancy. Psychological
Bulletin 88:638–67.
Welch, R.B., L.D. Duttenhurt, and D.H. Warren. 1986. Contributions of audition and vision to temporal rate
perception. Perception & Psychophysics 39:294–300.
Werner, H. 1973. Comparative psychology of mental development. New York: International Universities
Press.
Yehia, H., P. Rubin, and E. Vatikiotis-Bateson. 1998. Quantitative association of vocal-tract and facial behavior.
Speech Communication 26:23–43.
Zangenehpour, S., A.A. Ghazanfar, D.J. Lewkowicz, and R.J. Zatorre. 2009. Heterochrony and cross-species
intersensory matching by infant vervet monkeys. PLoS ONE 4:e4302.
18 Multisensory Integration
Develops Late in Humans
David Burr and Monica Gori
CONTENTS
18.1 Development of Multimodal Perception in Infancy and Childhood..................................... 345
18.2 Neurophysiological Evidence for Development of Multimodal Integration.......................... 347
18.3 Development of Cue Integration in Spatial Navigation......................................................... 348
18.4 Development of Audiovisual Cue Integration....................................................................... 349
18.5 Sensory Experience and Deprivation Influence Development of Multisensory
Integration.............................................................................................................................. 350
18.6 Development of Visuo-Haptic Integration............................................................................. 351
18.7 Calibration by Cross-Modal Comparison?............................................................................ 355
18.8 Haptic Discrimination in Blind and Low-Vision Children: Disruption of Cross-
Sensory Calibration?.............................................................................................................. 356
18.9 Concluding Remarks: Evidence of Late Multisensory Development.................................... 357
Acknowledgment............................................................................................................................ 358
References....................................................................................................................................... 358
neural reorganization lasting well into early adolescence (Paus 2005). A further complication is that
different senses develop at different rates: first touch, followed by vestibular, chemical, and auditory
(all beginning to function before birth), and finally vision (Gottlieb 1971). These differences in
developmental rates could exacerbate the challenges of cross-modal integration and calibration,
which must take into account growing limbs, increasing eye length, changing interocular distance, and so on.
Some sensory properties, like contrast sensitivity, visual acuity, binocular vision, color percep-
tion, and some kinds of visual motion perception mature rapidly to reach near adult-like levels
within 8 to 12 months of age (for a review, see Atkinson 2000). Similarly, young infants can explore,
manipulate, and discriminate the form of objects haptically, analyzing and coding tactile and weight
information, during a period when their hands are undergoing rapid changes (Streri 2003; Streri
et al. 2000, 2004; Striano and Bushnell 2005).
On the other hand, not all perceptual skills develop early. For example, auditory frequency dis-
crimination (Olsho 1984; Olsho et al. 1988), temporal discrimination (Trehub et al. 1995), and basic
speech abilities all improve during infancy (Jusczyk et al. 1998). Also, projective size and shape
are not noticed or understood until at least 7 years of age, and evidence suggests that even visual
acuity and contrast sensitivity continue to improve slightly up until 5 to 6 years of age (Brown et
al. 1987). Other attributes, such as the use of binocular cues to control prehensile movements (Watt
et al. 2003) and the development of complex form and motion perception (Del Viva et al. 2006;
Ellemberg et al. 1999, 2004; Kovács et al. 1999; Lewis et al. 2004) continue until 8 to 14 years of
age. Object manipulation also continues to improve until 8 to 14 years (Rentschler et al. 2004),
and tactile object recognition in blind and sighted children does not develop until 5 to 6 years
(Morrongiello et al. 1994). Many other complex and experience-dependent capacities, such as facili-
tation of speech perception in noise (e.g., Elliott 1979; Johnson 2000), have been reported to be
immature throughout childhood.
All these studies suggest that there is a difference not only in the developmental rates of different
sensory systems, but also in the development of different aspects within each sensory system, all of
which are potential obstacles to the development of cue integration. The development of multimodal percep-
tual abilities in human infants has been studied with various techniques, such as habituation and
preferential looking. Many studies suggest that some multisensory processes, such as cross-modal
facilitation, cross-modal transfer, and multisensory matching are present to some degree at an early
age (e.g., Streri 2003; Lewkowicz 2000, for review). Young infants can match signals between dif-
ferent sensory modalities (Dodd 1979; Lewkowicz and Turkewitz 1981) and detect equivalence in
the amodal properties of objects across the senses (e.g., Patterson and Werker 2002; Rose 1981). For
example, they can match faces with voices (Bahrick 2001) and visual and auditory motion signals
(Lewkowicz 1992) on the basis of their synchrony. By 3 to 5 months of age, they can discriminate
audiovisual changes in tempo and rhythm (Bahrick et al. 2002; Bahrick and Lickliter 2000); from
4 months of age, they can match visual and tactile form properties (Rose and Ruff 1987); and at
about 6 months of age, they can do duration-based matches (Lewkowicz 1986).
Young infants seem to be able to benefit from multimodal redundancy of information across
senses (Bahrick and Lickliter 2000, 2004; Bahrick et al. 2002; Lewkowicz 1988a, 1996; Neil et al.
2006). There is also evidence for cross-modal facilitation, in which stimulation in one modality increases
the responsiveness to stimuli in other modalities (Lewkowicz and Lickliter 1994; Lickliter et al.
1996; Morrongiello et al. 1998). However, not all forms of facilitation develop early. Infants do not
exhibit multisensory facilitation of reflexive head and eye movements for spatial localization until
about 8 months of age (Neil et al. 2006), and multisensory coactivation during a simple audiovisual
detection task does not occur until 8 years of age in most children (Barutchu et al. 2009, 2010).
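The coactivation criterion used in studies such as Barutchu et al.'s is usually tested against Miller's race-model inequality. A minimal sketch of such a test follows; the function name and the reaction-time values are our own illustrative inventions, not data or code from the studies cited.

```python
import numpy as np

def race_model_violation(rt_a, rt_v, rt_av, probs=None):
    """Compare bimodal RTs against Miller's race-model inequality.

    The race model predicts P(RT_AV <= t) <= P(RT_A <= t) + P(RT_V <= t)
    for every time t; exceeding that bound is the usual diagnostic for
    coactivation. Returns the signed excess at each probed quantile of
    the bimodal distribution (positive values indicate a violation).
    """
    if probs is None:
        probs = np.arange(0.05, 1.0, 0.05)
    t = np.quantile(rt_av, probs)  # probe times taken from the bimodal RTs

    def cdf(rts):
        # Empirical CDF of the sample `rts`, evaluated at each probe time.
        return np.searchsorted(np.sort(rts), t, side="right") / len(rts)

    bound = np.minimum(cdf(rt_a) + cdf(rt_v), 1.0)
    return cdf(rt_av) - bound

# Fabricated reaction times (ms), purely for illustration:
rt_a = np.linspace(250, 400, 200)   # auditory-only trials
rt_v = np.linspace(260, 420, 200)   # visual-only trials
rt_av = np.linspace(180, 300, 200)  # bimodal trials, faster than either alone
print(np.any(race_model_violation(rt_a, rt_v, rt_av) > 0))  # True
```

With the fabricated bimodal distribution faster than either unimodal one, the bound is exceeded at the early quantiles, which is the pattern adults (but, on this account, not young children) typically show.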
Recent studies suggest that human infants can transfer information gleaned from one sense to
another (e.g., Streri 2003; Streri et al. 2004). For example, 1-month-old infants can visually recog-
nize an object they have previously explored orally (Gibson and Walker 1984; Meltzoff and Borton
1979) and 2-month-old infants can visually recognize an object they have previously felt (Rose
1981; Streri et al. 2008). However, many of these studies show an asymmetry in the transfer (Sann
Multisensory Integration Develops Late in Humans 347
and Streri 2007; Streri 2003; Streri et al. 2008) or a partial dominance of one modality over another
(Lewkowicz 1988a, 1988b), supporting the idea that, even when multimodal skills are present, they
are not necessarily fully mature. Recent results (Bremner et al. 2008a, 2008b) on the representation
of peripersonal space support the presence of two distinct mechanisms in sensory integration with
different developmental trends: the first, relying principally on visual information, is present dur-
ing the first 6 months; the second, which combines hand and body posture information with vision,
develops only after 6.5 months of age.
Over the past several years, the majority of multisensory studies in infants and children have investi-
gated the development of multisensory matching, transfer, and facilitation abilities, whereas few
have investigated multisensory integration itself. Those few that did investigate
multisensory integration in school-age children point to unimodal dominance rather than integra-
tion abilities (Hatwell 1987; Klein 1966; McGurk and Power 1980; Misceo et al. 1999).
FIGURE 18.1 (See color insert.) Use of multiple cues for navigation in adults and children. (a) Representation
of room in which subject performed the task in nonconflictual condition. Starting from “start,” subject picked up
three numbered objects in sequence. Three visual landmarks (a “moon,” a “lightning bolt,” and a “star”) were also
present in the room. (b) Representation of room in which subject performed the task in conflict condition. Here,
landmarks were rotated around the subject by 15° (from white to colored positions). (c) Mean standard deviation (SD)
of participant responses for three different conditions. (d) Curves report the means of functions that predict mean
standard deviation (SD ±1 SE) from integration model (in green) or alternation model (in pink) for different age
groups. (Reproduced from Nardini, M. et al., Curr. Biol., 18, 689–693, 2008. With permission.)
disoriented—and with both cues present (SM + LM). Figure 18.1c shows a clear developmental
trend in the unimodal performance, with mean mislocalization thresholds decreasing with age, sug-
gesting that navigation improves during development.
More interestingly, whereas adults take advantage of multiple cue integration, the children do
not. SM + LM thresholds were higher than LM thresholds for children in both age groups, whereas
the adults showed lower thresholds in the two-cue condition (evidence of cross-sensory fusion).
Nardini et al. (2008) also measured navigation in a conflict condition (Figure 18.1b), in which land-
marks were rotated by 15° after the participants had collected the objects. They considered two
models, one in which the cues were weighted by the inverse of variance and integrated (green line in
Figure 18.1d), and one in which subjects alternate between the two cues (pink line in Figure 18.1d).
Although the integration model predicted adult performance in the conflict condition, 4- to 5- and
7- to 8-year-olds followed the alternation model rather than the integration model.
Thus, although adults clearly integrate multiple navigation cues optimally, young children do not,
alternating between cues from trial to trial. These results suggest that the two individual spatial
representations develop before they are integrated within a common reference frame, and that
optimal multisensory integration of spatial cues for short-range
navigation occurs late during development.
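The two models Nardini et al. compared can be sketched numerically. The standard deviations, choice probability, and conflict size below are invented placeholders for illustration, not the values measured in that study.

```python
# Hypothetical unimodal precisions (SDs, in cm) for one age group:
sd_sm, sd_lm = 50.0, 40.0   # self-motion (SM) and landmark (LM) cues

# Integration model: cues are weighted by inverse variance, so the
# combined estimate is always at least as precise as the better cue.
w_sm = sd_sm**-2 / (sd_sm**-2 + sd_lm**-2)          # weight on self-motion
sd_integration = (sd_sm**-2 + sd_lm**-2) ** -0.5

# Alternation model: each trial follows one cue or the other. Under a
# cue conflict separating the two single-cue means by `delta`, responses
# form a mixture whose variance gains a between-cue term p*(1-p)*delta**2.
p, delta = 0.5, 30.0        # choice probability and conflict size (cm), invented
sd_alternation = (p * sd_sm**2 + (1 - p) * sd_lm**2
                  + p * (1 - p) * delta**2) ** 0.5

print(round(w_sm, 2), round(sd_integration, 1), round(sd_alternation, 1))
# 0.39 31.2 47.7
```

Note that the predicted integration SD (31.2 cm) beats both single cues, whereas the alternation SD (47.7 cm) is worse than the better cue alone, which is why the two models are distinguishable from children's conflict-condition variability.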
FIGURE 18.2 McGurk effect in children with cochlear implants compared with age-matched controls.
Phoneme /pa/ was played to subjects while they observed a video of lips pronouncing /ka/, and reported the pho-
neme they perceived. Black bars show the percentage of each group reporting the fused phoneme (/ta/) on at
least 70% of trials; light gray bars show auditory dominance (/pa/), and dark gray bars show visual dominance
(/ka/). For controls, more than half showed bimodal fusion (the McGurk effect), and of those who did not, most
showed auditory dominance. Similarly, for children with early cochlear implants (before 30 months of age), the
majority showed fusion, but those who did not showed visual dominance. For children with later implants,
almost all showed visual dominance.
auditory dominance. Among the group who had implants at an early age (before 30 months), a similar proportion (56%) perceived the fused phoneme, suggesting that bimodal fusion was occurring. However, the majority of those who did not perceive the fused phoneme perceived the visual /ka/ rather than the auditory /pa/ that the control children perceived. For late implants, however, only one child showed cross-modal fusion; all the others showed visual dominance.
These results suggest that cross-modal fusion is not innate, but needs to be learned. The group
of hearing-restored children who received the implant after 30 months of age showed no evidence
of cross-modal fusion, with the visual phoneme dominating perception. Those with early implants
demonstrate a remarkable plasticity in acquiring bimodal fusion, suggesting that there is a sensitive
period for the development of bimodal integration of speech.
It is interesting that in normal-hearing children sound dominates the multimodal percept, whereas vision dominates in all the cochlea-implanted children, with both early and late implants. It is
possible that the dominance can be explained by reliability-based integration. Speech is a complex
temporal task in audition and a spatiotemporal task in vision. Although performance has not yet been measured (to our knowledge), it is reasonable to suppose that in normal-hearing children auditory perception is more precise, explaining the dominance. What about the cochlea-implanted children? Is their auditory precision worse than their visual precision, so that visual dominance is the result of ideal fusion? Or is auditory perception actually better than visual perception at this task, so that visual dominance is not the optimal solution? In the latter case, it may be that vision remains the most robust sense, even if not the most precise. This would be interesting to investigate, perhaps in a simplified situation, as has been done for visuo-haptic judgments (see following section).
where ŜVH is the combined visual–haptic estimate, and ŜV and ŜH are the independent visual and haptic estimates. The weights wV and wH sum to unity and are inversely proportional to the variance (σ²) of the presumed underlying noise distribution:

wV = σV⁻² / (σH⁻² + σV⁻²),  wH = σH⁻² / (σH⁻² + σV⁻²)  (18.2)

σVH⁻² = σV⁻² + σH⁻²  (18.3)

where σV and σH are the visual and haptic unimodal thresholds. The improvement is greatest (√2) when σV = σH.
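The reliability-weighted scheme of Equations 18.1 through 18.3 can be sketched numerically (a minimal illustration; the function name and the example values are ours, not taken from the study):

```python
def mle_combine(s_v, s_h, sigma_v, sigma_h):
    """Maximum-likelihood (reliability-weighted) combination of a visual
    estimate s_v and a haptic estimate s_h with noise sigma_v, sigma_h."""
    w_v = sigma_v**-2 / (sigma_h**-2 + sigma_v**-2)   # Equation 18.2
    w_h = sigma_h**-2 / (sigma_h**-2 + sigma_v**-2)
    s_vh = w_v * s_v + w_h * s_h                      # Equation 18.1
    sigma_vh = (sigma_v**-2 + sigma_h**-2) ** -0.5    # Equation 18.3
    return s_vh, sigma_vh, w_v, w_h

# Equal unimodal thresholds: weights are 0.5 each and the combined
# threshold improves by the maximum factor of sqrt(2)
s, sig, w_v, w_h = mle_combine(55.0, 55.0, sigma_v=3.0, sigma_h=3.0)
print(s, sig, w_v, w_h)   # 55.0, ~2.12 (= 3/sqrt(2)), 0.5, 0.5
```

Note that the combined threshold can never be worse than the better single-modality threshold, which is what makes optimal fusion advantageous.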
This model has been spectacularly successful in predicting human multimodal integration for
various tasks, including visuo-haptic size judgments (Ernst and Banks 2002), audiovisual position
352 The Neural Bases of Multisensory Processes
judgments (Alais and Burr 2004), and visual–tactile integration of sequences of events (Bresciani
and Ernst 2007). Gori et al. (2008) adapted the technique to study the development of reliability-
based cross-sensory integration of two aspects of form perception: size and orientation discrimina-
tion. The size discrimination task (top left icon of Figure 18.3) was a low-technology, child-friendly
adaptation of Ernst and Banks’ technique (Ernst and Banks 2002), where visual and haptic informa-
tion were placed in conflict with each other to investigate which dominates perception under vari-
ous degrees of visual degradation. The stimuli were physical blocks of variable height, displayed in
[Figure 18.3: panels (a) and (d), setup illustrations; sample psychometric functions plot proportion "taller" (size; 10-year-old and 5-year-old) and proportion "steeper" (orientation; 8-year-old and 5-year-old) against probe value.]
FIGURE 18.3 (See color insert.) Development of cross-modal integration for size and orientation discrimi-
nation. Illustration of experimental setup for size (a) and orientation (d) discrimination. Sample psychometric
functions for four children, with varying degrees of cross-modal conflict. (b and c) Size discriminations: SB
age 10.2 (b); DV age 5.5 (c); (e and f) orientation discrimination: AR age 8.7 (e); GF age 5.7 (f). Lower color-
coded arrows show MLE predictions, calculated from threshold measurements (Equation 18.1). Black-dashed
horizontal lines show 50% performance point, intersecting with curves at their PSE (shown by short vertical
bars). Upper color-coded arrows indicate size of haptic standard in size condition (b and c) and orientation of
visual standard in orientation condition (e and f). Older children generally follow the adult pattern, whereas
5-year-olds were dominated by haptic information for size task, and visual information for orientation task.
For size judgment, amount of conflict was 0 for red symbols, +3 mm (where plus means vision was larger) for
blue symbols, and –3 mm for green symbols. For orientation, same colors refer to 0° and ±4°.
front of an occluding screen for visual judgments, behind the screen for haptic judgments, or both
in front and behind for bimodal judgments.
All trials involved a two-alternative forced-choice task in which the subject judged whether a
standard block seemed taller or shorter than a probe of variable height. For the single-modality tri-
als, one stimulus was the standard, always 55 mm high, the other the probe, of variable height. The
proportion of trials in which the probe was judged taller than the standard was computed for each
probe height, yielding psychometric functions. The crucial condition was the dual-modality condi-
tion, in which visual and haptic sizes of the standard were in conflict, with the visual block 55 +
Δ mm and the haptic block 55 – Δ mm (Δ = 0 or ±3 mm). The probe was composed of congruent
visual and haptic stimuli of variable heights (48–62 mm).
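Extracting the PSE and threshold from such two-alternative forced-choice data amounts to fitting a cumulative Gaussian to the proportion of "taller" responses. A minimal sketch (scipy is assumed; the probe heights and the generating PSE and threshold are illustrative values, not data from the study):

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def cum_gauss(height, pse, sigma):
    """Cumulative-Gaussian psychometric function: probability of judging
    a probe of the given height "taller" than the standard."""
    return norm.cdf(height, loc=pse, scale=sigma)

# Probe heights (mm) spanning the 48-62 mm range used in the task,
# with noise-free response proportions generated for illustration
heights = np.arange(48.0, 63.0, 2.0)
p_taller = cum_gauss(heights, pse=56.0, sigma=2.5)

# The fitted median estimates the PSE; the spread estimates the threshold
(pse_fit, sigma_fit), _ = curve_fit(cum_gauss, heights, p_taller, p0=[55.0, 3.0])
```

With real, noisy response counts the same fit applies; only the recovered parameters carry uncertainty.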
After validating the technique with adults, demonstrating that optimal cross-modal integration
also occurred under these conditions, we measured haptic, visual, and bimodal visuo-haptic size dis-
crimination in 5- to 10-year-old children. Figure 18.3 shows sample psychometric functions for the
dual-modality measurements, fitted with cumulative Gaussian functions whose median estimates
the point of subjective equality (PSE) between the probe and standard. The pattern of results for the
10-year-old (Figure 18.3b) was very much like those for the adult: negative values of Δ caused the
curves to shift leftward, positive values caused them to shift rightward. That is to say, the curves followed the visual standard, suggesting that visual information was dominating the match, as the MLE model suggests it should, since the visual thresholds were lower than the haptic thresholds. This
is consistent with the MLE model (indicated by color-coded arrows below the abscissa): the visual
judgment was more precise, and should therefore dominate.
For the 5-year-olds (Figure 18.3c), however, the results were completely different: the psycho-
metric functions shifted in the direction opposite to that of the 10-year-olds, following the bias of
the haptic stimulus. The predictions (color-coded arrows under the abscissa) are similar for both the
5- and 10-year-olds, as for both groups of children, visual thresholds were much lower than haptic
thresholds, so the visual stimuli should dominate: but for the 5-year-olds, the reverse holds, with the
haptic standard dominating the match.
These data show that for size judgments, touch dominates vision. But is this universally true? We repeated the experiments with another basic spatial task, orientation discrimination, which could, in principle, be computed by the neural hardware of primary visual cortex (Hubel and Wiesel 1968). Subjects were required to discriminate which bar of a dual presentation (standard and probe) was rotated more counterclockwise. As with the size discriminations, we first measured thresholds in each separate modality, then visuo-haptically, with varying degrees of conflict (Δ = 0 or ±4°). Figure 18.3e and f show sample psychometric functions for the
dual-modality measurements for a 5- and 8-year-old child. As with the size judgments, the pattern
of results for the 8-year-old was very much like those for the adult, with the functions of the three
different conflicts (Figure 18.3e) falling very much together, as predicted from the single modality
thresholds by the MLE model (arrows under the abscissa). Again, however, the pattern of results
for the 5-year-old was quite different (Figure 18.3f). Although the MLE model predicts similar
curves for the three conflict conditions, the psychometric functions very closely followed the visual
standards (indicated by the arrows above the graphs), the exact opposite pattern to that observed for
size discrimination.
Figure 18.4 reports PSEs for children of all ages for the three conflict conditions, plotted as a function of the MLE predictions from single-modality discrimination thresholds. If the MLE prediction held, the data should fall along the black-dotted equality line (as in the bottom graph, which reports the adults' results). For adults this was so, for both size and orientation. At 5 years of age, however, the story was quite different. For the size discriminations (Figure 18.4a), not only did the measured PSEs fail to follow the MLE predictions, they varied inversely with Δ (following the haptic standard), lining up almost orthogonally to the equality line. Similarly, the data for the 6-year-olds do not follow the prediction, although they tend to be scattered rather than ordered orthogonally to the prediction line. By 8 years of age, the data begin
[Figure 18.4: PSE measured plotted against prediction from thresholds (mm for size, deg for orientation); panels for 6-year-olds, 10-year-olds, and adults.]
FIGURE 18.4 (See color insert.) Summary data showing PSEs for all subjects for all conflict conditions,
plotted against predictions, for size (a) and orientation (b) discriminations. Different colors refer to different
subjects within each age group. Symbol shapes refer to level of cross-sensory conflict (Δ): squares, 3 mm or
4°; circles, –3 mm or –4°; upright triangles, 0; diamonds, 2 mm; inverted triangles, –2 mm. Closed symbols
refer to no-blur condition for size judgments, and vertical orientation judgments; open symbols to modest blur
(screen at 19 cm) or oblique orientations; cross in symbols to heavy blur (screen at 39 cm).
to follow the prediction, and by age 10, the data fall along it well, similar to the adult pattern of results.
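The MLE prediction for the conflict conditions can be written in closed form: with a visual standard of 55 + Δ and a haptic standard of 55 − Δ, the combined percept is 55 + (wV − wH)Δ, so the predicted PSE shift is (wV − wH)Δ. A minimal sketch (the thresholds are illustrative, not measured values):

```python
def predicted_pse_shift(delta, sigma_v, sigma_h):
    """MLE-predicted shift of the PSE (relative to the 55 mm standard)
    when the visual standard is 55 + delta and the haptic 55 - delta."""
    w_v = sigma_v**-2 / (sigma_v**-2 + sigma_h**-2)
    w_h = 1.0 - w_v
    return (w_v - w_h) * delta

# Vision twice as precise as touch: the match follows the visual standard
shift = predicted_pse_shift(delta=3.0, sigma_v=2.0, sigma_h=4.0)
print(shift)   # 1.8 mm toward the visual standard (w_v = 0.8)
```

A shift near +Δ thus indicates visual dominance, a shift near −Δ haptic dominance, which is the pattern the 5-year-olds showed for size.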
Figure 18.5a shows how thresholds vary with age for the various conditions. For both tasks, visual and haptic thresholds decreased steadily up to 10 years of age (orientation more so than size). The light-blue symbols show the thresholds predicted from the MLE model (Equation 18.3). For the adults, the predicted improvement was close to the best single-modality threshold, and indeed, the dual-modality thresholds were never worse than the best single-modality threshold. For the 5-year-old children, the results were quite different, with the dual-modality thresholds following the worst thresholds. For the size judgment, they followed the haptic thresholds, which were not only much higher than the MLE predictions but twice the best single-modality (visual) thresholds. This shows that integration was not merely suboptimal; it was not even a close approximation such as "winner take all." Indeed, it amounts to a "loser take all" strategy. This reinforces the PSE data in showing that these young children do not integrate cross-modally in a way that benefits perceptual discrimination.
Figure 18.5b plots the development of theoretical (violet symbols) and observed (black symbols)
visual and haptic weights. For both size and orientation judgments, the theoretical haptic weights
(calculated from thresholds) were fairly constant over age, 0.2 to 0.3 for size and 0.3 to 0.4 for
[Figure 18.5: (a) haptic, visual, cross-modal, and MLE thresholds (mm and deg) against age (3–10 years, adult, plus blur condition); (b) haptic and visual weights, from thresholds and from PSEs, against age (y).]
FIGURE 18.5 (See color insert.) Development of thresholds and visuo-haptic weights. Average thresholds
(geometric means) for haptic (red symbols), visual (green), and visuo-haptic (dark blue) size and orientation
discrimination, together with average MLE predictions (light blue), as a function of age. Predictions were cal-
culated individually for each subject and then averaged. Tick-labeled “blur” shows thresholds for visual stimuli
blurred by a translucent screen 19 cm from blocks. Error bars are ±1 SEM. Haptic and visual weights for size
and orientation discrimination, derived from thresholds via MLE model (violet circles) or from PSE values
(black squares). Weights were calculated individually for each subject, and then averaged. After 8 to 10 years,
the two estimates converged, suggesting that the system then integrates in a statistically optimal manner.
orientation. However, the haptic weights necessary to predict the 5-year-olds' PSE size data are 0.6 to 0.8, far greater than the prediction, implying that these young children give far more weight to touch for size judgments than is optimal. Similarly, the haptic weights necessary to predict the orientation judgments are around 0, far less than the prediction, suggesting that these children base orientation judgments almost entirely on visual information. In neither case does anything like optimal cue combination occur.
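The weights "necessary to predict" the data can be obtained by inverting the PSE relation: a measured shift of (wV − wH)Δ = (1 − 2wH)Δ implies wH = (1 − shift/Δ)/2. A sketch (the numerical shifts below are illustrative, not measured values):

```python
def haptic_weight_from_pse(pse_shift, delta):
    """Empirical haptic weight implied by a measured PSE shift under a
    cross-modal conflict of +/- delta, assuming w_v + w_h = 1."""
    # pse_shift = (w_v - w_h) * delta = (1 - 2 * w_h) * delta
    return 0.5 * (1.0 - pse_shift / delta)

# A match that follows the haptic standard (shift close to -delta):
print(haptic_weight_from_pse(pse_shift=-2.4, delta=3.0))   # 0.9
# A match that follows the visual standard (shift close to +delta):
print(haptic_weight_from_pse(pse_shift=1.8, delta=3.0))    # 0.2
```

Comparing these empirical weights with those computed from thresholds (Equation 18.2) is exactly the test of optimality plotted in Figure 18.5b.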
FIGURE 18.6 Accuracy and precision. Accuracy is defined as the closeness of a measurement to its true physical value (its veracity), whereas precision is the degree of reproducibility or repeatability between measurements, usually measured as the standard deviation of the distribution. The "target analogy" shows high precision but poor accuracy (left), and good average accuracy but poor precision (right). The archer would correct his or her aim by calibrating the sights of the bow. Similarly, perceptual systems can correct for a bias by cross-calibration between senses.
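The distinction between accuracy and precision can be made concrete with a toy computation (the sample values are ours, chosen to mimic the two targets in the analogy):

```python
import statistics

def accuracy_and_precision(measurements, true_value):
    """Accuracy as the bias of the mean from the true value;
    precision as the standard deviation across measurements."""
    bias = statistics.mean(measurements) - true_value
    spread = statistics.stdev(measurements)
    return bias, spread

# High precision, poor accuracy (tight cluster, off target) ...
bias1, sd1 = accuracy_and_precision([58.1, 58.0, 57.9, 58.0], true_value=55.0)
# ... versus good average accuracy, poor precision (on target, scattered)
bias2, sd2 = accuracy_and_precision([52.0, 58.0, 53.5, 56.5], true_value=55.0)
```

The first sample has a large bias but tiny spread; the second has essentially zero bias but a large spread, mirroring the left and right targets of Figure 18.6.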
children are effectively "learning to see," calibration may be expected to be more important. It is during these years that the limbs are growing rapidly and eye length and eye separation are increasing, all necessitating constant recalibration between sight and touch. Indeed, many studies suggest that the first 8 years of life correspond to the critical period of plasticity in humans for many properties, such as binocular vision (Banks et al. 1975) and acquiring accent-free language (Doupe and Kuhl 1999).
So before 8 years of age, calibration may be more important than integration. The advantages of fusing sensory information are probably more than offset by those of keeping the evolving system calibrated, and using one system to calibrate another precludes the fusion of the two. Accepting Berkeley's idea that vision must be calibrated by touch might therefore explain why size discrimination thresholds are dominated by touch, even though touch is less precise than vision. But why are orientation thresholds dominated by vision? Perhaps Berkeley was not quite right: touch does not always calibrate vision; rather, the more robust sense for a particular task is the calibrator. In the same way that the more precise sense receives the highest weight in sensory fusion, perhaps the more accurate sense is used for calibration. The more accurate sense need not be the more precise, but is
probably the more robust. Accuracy is defined in absolute terms, as the distance from physical real-
ity, whereas precision is a relative measure, related to the reliability or repeatability of the results
(see Figure 18.6). It is therefore reasonable that for size, touch will be more accurate, as vision
cannot code it directly, but only by a complex calculation of retinal size and estimate of distance.
Orientation, on the other hand, is coded directly by primary visual cortex (Hubel and Wiesel 1968),
and calculated from touch only indirectly via complex coordinate transforms.
[Figure 18.7: scatter plot of normalized orientation thresholds and normalized size thresholds (log axes, 0.3–10).]
FIGURE 18.7 Thresholds for orientation discrimination, normalized by age-matched controls, plotted
against normalized size thresholds, for 17 unsighted or low-vision children aged between 5 and 18 years. Most
points lie in lower-right quadrant, implying better size and poorer orientation discrimination. Arrows refer
to group averages: 2.2 ± 0.3 for orientation and 0.8 ± 0.06 for size. The star in the lower-left quadrant is the child with acquired low vision. (Reprinted from Gori, M. et al., Curr. Biol., 20, 223–5, 2010. With permission.)
Orientation discrimination thresholds were all worse than the age-matched controls (>1), on aver-
age twice as high, whereas size discrimination thresholds were generally better than the controls
(<1). Interestingly, one child with an acquired visual impairment (star symbol) showed a completely
different pattern of results, with no orientation deficit. Although we have only one such subject, we
presume that his fine orientation thresholds result from the early visual experience (before 2½ years
of age), which may have been sufficient for the visual system to calibrate touch.
Many previous studies have examined haptic perception in the visually impaired, with seem-
ingly contradictory results: some studies show the performance of blind and low-vision subjects
to be as good or better than normally sighted controls, in tasks such as size discrimination with a
cane (Sunanto and Nakata 1998), haptic object exploration and recognition, and tactile recognition
of two-dimensional angles and gratings (Morrongiello et al. 1994); whereas other tasks including
haptic orientation discrimination (Alary et al. 2009; Postma et al. 2008), visual spatial imagination
(Noordzij et al. 2007), and representation and updating of spatial information (Pasqualotto and
Newell 2007) have shown impairments. Visually impaired children had particular difficulties with
rotated object arrays (Ungar et al. 1995). Most recently, Bülthoff and colleagues have shown that
congenitally blind subjects are worse than both blindfolded sighted and acquired-blind subjects at
haptic recognition of faces (Dopjans et al. 2009). It is possible that the key to understanding the dis-
crepancy in the literature is whether the haptic task may have required an early cross-modal visual
calibration. However, early exposure to vision seems to be sufficient to calibrate the developing
haptic system, suggesting that the sensitive period for damage is shorter than that for normal devel-
opment. This is consistent with other evidence for multiple sensitive periods, such as global motion
perception (Lewis and Maurer 2005).
The suggestion that specific perceptual tasks may require cross-modal calibration during devel-
opment could have practical implications, possibly leading to improvements in rehabilitation pro-
grams. Where cross-sensory calibration has been compromised, for example by blindness, it may
be possible to train people to use some form of “internal” calibration, or to calibrate by another
modality such as sound.
of the individual estimates. In the past few years, great interest has emerged in when and how these functions develop in children and young animals.
Many studies, both in children and animal models, suggest that multisensory integration does not
occur at birth, but develops over time. Some basic forms of integration, such as reflexive orienting
toward an audiovisual signal, develop quite early (Neil et al. 2006); some others, such as integration
of visual-haptic signals for orientation and size (Gori et al. 2008), and self-generated cues during
navigation (Nardini et al. 2008), develop only after 8 years of age. Similarly, whereas orienting reflexes benefit from cue integration by 8 months (Neil et al. 2006), nonreflexive motor responses to bimodal stimuli continue to develop throughout childhood (Barutchu et al. 2009, 2010).
Some have suggested that this late development occurs because multisensory integration requires that higher-order cognitive processes, including attention, reach a certain level of maturity, or alternatively that all motor processes reach maturity (Barutchu et al. 2010), which does not occur until late adolescence (e.g., Betts et al. 2006; Kanaka et al. 2008; Smith and Chatterjee 2008).
However, it is far from clear what complex cognitive processes are involved in simple size and ori-
entation discriminations, and processes such as attention have been shown to operate at very low
levels, including V1 and A1 (Gandhi et al. 1999; Woldorff et al. 1993).
We suggest that anatomical and physiological differences in maturation rates could pose particu-
lar challenges for development, as could the need for the senses to continually recalibrate, to take
into account growing limbs, eye length, interocular distances, etc. If cross-sensory calibration were
more fundamental during development than for mature individuals, this would explain the lack of
integration, as the use of one sense to calibrate the other necessarily precludes the integration of
redundant information. Calibration does not always occur in the same direction (such as touch edu-
cating vision) but, in general, the more robust sense for a particular task calibrates the less robust.
The haptic system, which has the more immediate information about size, seems to calibrate vision,
which has no absolute size information and must scale for distance. On the other hand, for orienta-
tion discrimination, the visual system, which has specialized detectors tuned for orientation, seems
to calibrate touch. Indeed, congenitally blind and low-vision children show a strong deficit in haptic orientation judgments, consistent with the possibility that the deficit results from an early failure to calibrate.
Cross-sensory calibration can explain many curious results, such as the fact that before integra-
tion, the dominance is task dependent, visual for orientation, haptic for size. Similar results have
been observed with audiovisual integration: audiovisual speech illusions do not seem to develop
until 10 years of age, whereas illusions not involving speech are mature by age 5 (Tremblay et al.
2007). Along the same lines, it can also explain the asymmetries in task performance in subjects with different sensory deficits (Gori et al. 2010; Putzar et al. 2007; Schorr et al. 2005).
All these results suggest that whereas the different sensory systems of infants and children are clearly interconnected, multimodal perception may not be fully developed until quite late. Cross-sensory calibration may be a useful strategy that allows the brain to take into account the dramatic anatomical and sensorial changes of early life, as well as keeping our senses robustly calibrated through life's trials and tribulations.
ACKNOWLEDGMENT
This research was supported by the Italian Ministry of Universities and Research, EC project
“STANIB” (FP7 ERC), EC project “RobotCub” (FP6-4270), and Istituto David Chiossone Onlus.
REFERENCES
Alais, D., and D. Burr. 2004. The ventriloquist effect results from near-optimal bimodal integration. Current
Biology 14:257–62.
Alary, F., M. Duquette, R. Goldstein, C. Elaine Chapman, P. Voss, V. La Buissonniere-Ariza, and F. Lepore.
2009. Tactile acuity in the blind: A closer look reveals superiority over the sighted in some but not all
cutaneous tasks. Neuropsychologia 47:2037–43.
Atkinson, J. 2000. The developing visual brain. New York: Oxford Univ. Press.
Bahrick, L.E. 2001. Increasing specificity in perceptual development: Infants’ detection of nested levels of
multimodal stimulation. Journal of Experimental Child Psychology 79:253–70.
Bahrick, L.E., R. Flom, and R. Lickliter. 2002. Intersensory redundancy facilitates discrimination of tempo in
3-month-old infants. Developmental Psychobiology 41:352–63.
Bahrick, L.E., and R. Lickliter. 2000. Intersensory redundancy guides attentional selectivity and perceptual
learning in infancy. Developmental Psychology 36:190–201.
Bahrick, L.E., and R. Lickliter. 2004. Infants’ perception of rhythm and tempo in unimodal and multimodal
stimulation: a developmental test of the intersensory redundancy hypothesis. Cognitive, Affective &
Behavioral Neuroscience 4:137–47.
Banks, M.S., R.N. Aslin, and R.D. Letson. 1975. Sensitive period for the development of human binocular
vision. Science 190:675–7.
Barutchu, A., D.P. Crewther, and S.G. Crewther. 2009. The race that precedes coactivation: development of
multisensory facilitation in children. Developmental Science 12:464–73.
Barutchu, A., J. Danaher, S.G. Crewther, H. Innes-Brown, M.N. Shivdasani, and A.G. Paolini. 2010. Audiovisual
integration in noise by children and adults. Journal of Experimental Child Psychology 105:38–50.
Bergeson, T.R., and D.B. Pisoni. 2003. Audiovisual speech perception in deaf adults and children following
cochlear implantation. In Handbook of multisensory integration, ed. G. Calvert, C. Spence, and B.E.
Stein, 749–772. Cambridge, MA: MIT Press.
Berkeley, G. 1709. An essay towards a new theory of vision. 1963. Indianapolis, IN: Bobbs-Merrill.
Betts, J., J. McKay, P. Maruff, and V. Anderson. 2006. The development of sustained attention in children: The
effect of age and task load. Child Neuropsychology 12:205–21.
Bremner, A.J., N.P. Holmes, and C. Spence. 2008a. Infants lost in (peripersonal) space? Trends in Cognitive
Sciences 12:298–305.
Bremner, A.J., D. Mareschal, S. Lloyd-Fox, and C. Spence. 2008b. Spatial localization of touch in the first year
of life: Early influence of a visual spatial code and the development of remapping across changes in limb
position. Journal of Experimental Psychology. General 137:149–62.
Bresciani, J.P., and M.O. Ernst. 2007. Signal reliability modulates auditory–tactile integration for event count-
ing. Neuroreport 18:1157–61.
Brown, A.M., V. Dobson, and J. Maier. 1987. Visual acuity of human infants at scotopic, mesopic and photopic
luminances. Vision Research 27:1845–58.
Del Viva, M.M., R. Igliozzi, R. Tancredi, and D. Brizzolara. 2006. Spatial and motion integration in children
with autism. Vision Research 46:1242–52.
Dodd, B. 1979. Lip reading in infants: Attention to speech presented in- and out-of-synchrony. Cognitive
Psychology 11:478–84.
Dopjans, L., C. Wallraven, and H.H. Bülthoff. 2009. Visual experience supports haptic face recognition:
Evidence from the early- and late-blind. 10th International Multisensory Research Forum (IMRF), New
York City, The City College of New York.
Doupe, A.J., and P.K. Kuhl. 1999. Birdsong and human speech: Common themes and mechanisms. Annual
Review of Neuroscience 22:567–631.
Ellemberg, D., T.L. Lewis, D. Maurer, C.H. Lui, and H.P. Brent. 1999. Spatial and temporal vision in patients
treated for bilateral congenital cataracts. Vision Research 39:3480–9.
Ellemberg, D., T.L. Lewis, M. Dirks, D. Maurer, T. Ledgeway, J.P. Guillemot, and F. Lepore. 2004. Putting
order into the development of sensitivity to global motion. Vision Research 44:2403–11.
Elliott, L.L. 1979. Performance of children aged 9 to 17 years on a test of speech intelligibility in noise using
sentence material with controlled word predictability. Journal of the Acoustical Society of America
66:651–3.
Ernst, M.O., and M.S. Banks. 2002. Humans integrate visual and haptic information in a statistically optimal
fashion. Nature 415:429–33.
Gandhi, S.P., D.J. Heeger, and G.M. Boynton. 1999. Spatial attention affects brain activity in human pri-
mary visual cortex. Proceedings of the National Academy of Sciences of the United States of America
96:3314–9.
Gibson, E.J., and A.S. Walker. 1984. Development of knowledge of visual-tactual affordances of substance.
Child Development 55:453–60.
Gori, M., M.M. Del Viva, G. Sandini, and D.C. Burr. 2008. Young children do not integrate visual and haptic
form information. Current Biology 18:694–8.
Gori, M., G. Sandini, C. Martinoli, and D. Burr. 2010. Poor haptic orientation discrimination in nonsighted
children may reflect disruption of cross-sensory calibration. Current Biology 20:223–5.
Gottlieb, G. 1971. Development of species identification in birds: An inquiry into the prenatal determinants of
perception. Chicago: Univ. of Chicago Press.
Hatwell, Y. 1987. Motor and cognitive functions of the hand in infancy and childhood. International Journal of
Behavioural Development 10:509–26.
Hotting, K., and B. Roder. 2009. Auditory and auditory–tactile processing in congenitally blind humans.
Hearing Research 258:165–74.
Hubel, D.H., and T.N. Wiesel. 1968. Receptive fields and functional architecture of monkey striate cortex.
Journal of Physiology 195:215–43.
Jiang, W., and B.E. Stein. 2003. Cortex controls multisensory depression in superior colliculus. Journal of
Neurophysiology 90:2123–35.
Johnson, C.E. 2000. Children’s phoneme identification in reverberation and noise. Journal of Speech, Language,
and Hearing Research 43:144–57.
Jusczyk, P., D. Houston, and M. Goodman. 1998. Speech perception during the first year. In Perceptual devel-
opment: Visual, Auditory, and Speech Perception in Infancy, ed. A. Slater. Psychology Press.
Kanaka, N., T. Matsuda, Y. Tomimoto, Y. Noda, E. Matsushima, M. Matsuura, and T. Kojima. 2008.
Measurement of development of cognitive and attention functions in children using continuous perfor-
mance test. Psychiatry and Clinical Neurosciences 62:135–41.
Klein, R.E. 1966. A developmental study of perception under condition of conflicting cues. Dissertation abstract.
Kovács, I., P. Kozma, A. Fehér, and G. Benedek. 1999. Late maturation of visual spatial integration in humans.
Proceedings of the National Academy of Sciences of the United States of America 96:12204–9.
Lewis, T.L., and D. Maurer. 2005. Multiple sensitive periods in human visual development: Evidence from
visually deprived children. Developmental Psychobiology 46:163–83.
19 Phonetic Recalibration
in Audiovisual Speech
Jean Vroomen and Martijn Baart
CONTENTS
19.1 Introduction........................................................................................................................... 363
19.2 A Short Historical Background on Audiovisual Speech Aftereffects...................................364
19.3 Seminal Study on Lip-Read–Induced Recalibration............................................................. 365
19.4 Other Differences between Recalibration and Selective Speech Adaptation....................... 367
19.4.1 Buildup...................................................................................................................... 367
19.4.2 Dissipation................................................................................................................. 368
19.4.3 Recalibration in “Speech” versus “Nonspeech” Mode.............................................. 368
19.5 Stability of Recalibration over Time..................................................................................... 369
19.5.1 Basic Phenomenon of Lexically Induced Recalibration............................................ 369
19.5.2 Lip-Read–Induced versus Lexically Induced Recalibration..................................... 370
19.6 Developmental Aspects......................................................................................................... 372
19.7 Computational Mechanisms.................................................................................................. 373
19.8 Neural Mechanisms............................................................................................................... 374
19.9 Conclusion............................................................................................................................. 376
Acknowledgments........................................................................................................................... 376
References....................................................................................................................................... 376
19.1 INTRODUCTION
In the literature on cross-modal perception, there are two important findings that most researchers
in this area will know about, although only a few have ever made a connection between the two. The
first is that perceiving speech is not solely an auditory, but rather a multisensory phenomenon. As
many readers know by now, seeing a speaker deliver a statement can help decode the spoken message.
The most famous experimental demonstration of the multisensory nature of speech is the so-called
McGurk illusion: when perceivers are presented with an auditory syllable /ba/ dubbed onto a face
articulating /ga/, they report “hearing” /da/ (McGurk and MacDonald 1976). The second finding
goes back more than 100 years, to Stratton (1896). He performed experiments with goggles and
prisms that radically changed his visual field, thereby creating a conflict between vision and
proprioception. He found that after wearing the prisms for a couple of days, he adapted to the
upside-down visual world and learned to move about in it quite well. According to Stratton, the
visual world itself had changed, as it sometimes appeared to him to be “right side up,” although others
such as Held (1965) later argued that it was rather the sensory–motor system that had adapted.
What these two seemingly different phenomena have in common is that in both cases an artificial
conflict is created between the senses about an event that should yield congruent data under
normal circumstances. Thus, in the McGurk illusion, there is a conflict between the auditory system
that hears the syllable /ba/ and the visual system that sees the face of a speaker saying /ga/; in the
prism case, there is a conflict between proprioception that may feel the hand going upward and the
visual system that sees the same hand going downward. In 2003, the commonality between these
two phenomena led us (Bertelson et al. 2003) to question whether one might also observe long-
term adaptation effects with audiovisual speech as reported by Stratton for prism adaptation. To be
more specific, to the best of our knowledge, nobody had ever examined whether auditory speech
perception would adapt as a consequence of exposure to the audiovisual conflict present in McGurk
stimuli. This was rather surprising given that the original paper by McGurk and MacDonald is one
of the most widely cited papers in this research area (more than 1500 citations by January 2009).
Admittedly though, at first sight it may look like a somewhat exotic enterprise to examine whether
listeners’ perception of speech sounds adapts after exposure to an audiovisual conflict. After all, why
would adaptation to a video of an artificially dubbed speaker be of importance? Experimental psy-
chologists should rather spend their time on fundamental aspects of perception and cognition that
remain constant across individuals, cultures, and time, and not on matters that are flexible and
adjustable. And, indeed, the dominant approach in speech research did just that by focusing on the
information available in the speech signal, the idea being that there must be acoustic invariants in
the signal that are extracted during perception. On second thought though, it has turned out to be
extremely difficult to find a set of acoustic invariant parameters that work for all contexts, cultures,
and speakers, and the question we addressed might open an alternative view: Rather than searching
for acoustic invariants, it might be equally fruitful to examine whether and how listeners adjust their
phoneme boundaries so as to accommodate the variation they hear.
In 2003, we (Bertelson et al. 2003) reported that phonetic recalibration induced by McGurk-like
stimuli can indeed be observed. We termed the phenomenon “recalibration” in analogy with the
much better known “spatial recalibration,” as we considered it a readjustment or a fine-tuning of an
already existing phonetic representation. In the same year, and completely independently, Norris
et al. (2003) reported a very similar phenomenon they named “perceptual learning in speech.” The
basic procedure in both studies was very similar: Listeners were presented with a phonetically
ambiguous speech sound and another source of contextual information that disambiguated that
sound. In our study, we presented listeners a sound halfway between /b/ and /d/ with as context
the video of a synchronized face that articulated /b/ or /d/ (in short, lip-read information), whereas
in the study of Norris et al. (2003), an ambiguous /s/-/f/ sound was heard embedded in the context
of an f- or s-biasing word (e.g., “witlo-s/f” was an f-biasing context because “witlof” is a word in
Dutch meaning “chicory,” but “witlos” is not a Dutch word). Recalibration (or perceptual learning)
was subsequently measured in an auditory-only identification test in which participants identified
members of a speech continuum. Recalibration manifested itself as a shift in phonetic categoriza-
tion toward the contextually defined speech environment. Listeners thus increased their report of
sounds consistent with the context they had received before, so more /b/ responses after exposure to
lip-read /b/ rather than lip-read /d/, and more /f/ responses after exposure to f-biasing words rather
than s-biasing words. Presumably, this shift reflected an adjustment of the phoneme boundary that
had helped listeners to understand speech better in the prevailing input environment.
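The boundary shift that defines recalibration can be pictured with a toy psychometric model (our own illustrative sketch, not a model taken from any of the studies cited here; all names and parameter values are hypothetical): categorization along a /b/–/d/ continuum is a logistic function of the distance between a token and the current phoneme boundary, and recalibration amounts to moving that boundary.

```python
import math

def p_b_response(token, boundary, slope=2.0):
    """Hypothetical psychometric function: probability of a /b/ response
    to a token on a /b/-/d/ continuum (larger token = more /d/-like),
    modeled as a logistic curve around the current phoneme boundary."""
    return 1.0 / (1.0 + math.exp(slope * (token - boundary)))

# The ambiguous midpoint token A? sits at 0 on this arbitrary scale.
ambiguous = 0.0
before = p_b_response(ambiguous, boundary=0.0)  # exactly 0.5: truly ambiguous

# Recalibration after exposure to lip-read /b/ is modeled as the boundary
# shifting toward the /d/ end, so more of the continuum is labeled /b/.
after_exposure_to_Vb = p_b_response(ambiguous, boundary=0.5)

print(round(before, 2), round(after_exposure_to_Vb, 2))  # prints: 0.5 0.73
```

On this sketch, the increase in /b/ responses to the very same ambiguous token is exactly what an auditory-only post-test would pick up as recalibration.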
After these seminal reports, there have been a number of studies that examined phonetic recali-
bration in more detail (Baart and Vroomen 2010a, 2010b; Cutler et al. 2008; Eisner and McQueen
2005, 2006; Jesse and McQueen 2007; Kraljic et al. 2008a, 2008b; Kraljic and Samuel 2005, 2006,
2007; McQueen et al. 2006a, 2006b; Sjerps and McQueen 2010; Stevens 2007; van Linden and
Vroomen 2007, 2008; Vroomen and Baart 2009a, 2009b; Vroomen et al. 2004, 2007). In what fol-
lows, we will provide an overview of this literature and, given the topic of this book, we will focus
on the audiovisual case.
appropriate dubbings, can change the auditory percept (McGurk and MacDonald 1976). More
recently, audiovisual speech has served in functional magnetic resonance imaging (fMRI) stud-
ies as an ideal stimulus for studying the neural substrates of multisensory integration (Calvert and
Campbell 2003). Surprisingly though, until 2003 there were only three studies that had focused on
auditory aftereffects as a consequence of exposure to audiovisual speech, despite the fact that
aftereffects were extensively studied in the late 1970s and have recently attracted renewed interest.
Roberts and Summerfield (1981) were the first to study the aftereffects of audiovisual speech,
although they were not searching for recalibration, but for “selective speech adaptation,” which is
basically a contrastive effect. The main question of their study was whether selective speech adapta-
tion takes place at a phonetic level of processing, as originally proposed by Eimas and Corbit (1973),
or at a more peripheral acoustic level. Selective speech adaptation differs from recalibration in that
it does not depend on an (intersensory) conflict, but rather on the repeated presentation of an acous-
tically nonambiguous sound that reduces report of sounds similar to the repeating one. For example,
hearing /ba/ many times reduces subsequent report of /ba/ on a /ba/–/da/ test continuum. Eimas
and Corbit (1973) argued that selective speech adaptation reflects the neural fatigue of hypothetical
“linguistic feature detectors,” but this viewpoint was challenged by others, who claimed that
it reflects a mere shift in criterion (Diehl 1981; Diehl et al. 1978, 1980), a combination of both
(Samuel 1986), or possibly that qualitatively different levels of analysis are involved
(Samuel and Kat 1996). Still others (Sawusch 1977) showed that the size of selective speech
adaptation depends on the degree of spectral overlap between the adapter and test sound, and that most—
although not all—of the effect is acoustic rather than phonetic.
Roberts and Summerfield (1981) found a clever way to disentangle the acoustic from the phonetic
contribution using McGurk-like stimuli. They dubbed a canonical auditory /b/ (a “good” acoustic
example) onto the video of lip-read /b/ to create an audiovisual congruent adapter and also dubbed
the auditory /b/ onto a lip-read /g/ to create a compound stimulus intended to be perceived as /d/.
Results showed that repeated exposure to the congruent audiovisual adapter induced similar con-
trastive aftereffects on a /b/–/d/ test continuum (i.e., fewer /b/ responses) as the incongruent adapter
AbVg, even though the two adapters were perceived differently. This led the authors to conclude
that selective speech adaptation mainly depends on the acoustic quality of the stimulus, and not on its
perceived or lip-read identity.
Saldaña and Rosenblum (1994) and Shigeno (2002) later replicated these results with different
adapters. Saldaña and Rosenblum compared auditory-only adapters with audiovisual ones (auditory
/b/ paired with visual /v/, a compound stimulus perceived mostly as /v/), and found, as Roberts and
Summerfield did, that the two adapters again behaved similarly, as in both cases fewer /b/ responses
were obtained at the test. Similar results were also found by Shigeno (2002) using AbVg as adapter,
and by us (unpublished data), demonstrating that selective speech adaptation depends, to a large extent,
on repeated exposure to nonambiguous sounds.
prism (Welch and Warren 1986)—and they all showed that exposure to spatially conflicting inputs
recalibrates processing in the respective modalities in a way that reduces the conflict.
Despite the fact that immediate biases and recalibration effects had been demonstrated for spa-
tial conflict situations, the existing evidence was less complete for conflicts regarding audiovisual
speech. Here, immediate biases were well known (the McGurk effect) as well as selective speech
adaptation, but recalibration had not been demonstrated. Bertelson et al. (2003) hypothesized that
a slight variation in the paradigm introduced by Roberts and Summerfield (1981) might neverthe-
less produce these effects, thus revealing recalibration. The key factor was the ambiguity of the
adapter sound. Rather than using a conventional McGurk-like stimulus containing a canonical (and
incongruent) sound, Bertelson et al. (2003) used an ambiguous sound. They created a synthetic
sound halfway between /aba/ and /ada/ (henceforth A? for auditory ambiguous) and dubbed it onto
the corresponding video of a speaker pronouncing /aba/ or /ada/ (A?Vb and A?Vd, respectively).
Participants were briefly exposed to either A?Vb or A?Vd, and then tested on identification of A?,
and the two neighbor tokens on the auditory continuum A? −1 and A? +1. Each exposure block con-
tained eight adapters (either A?Vb or A?Vd) immediately followed by six test trials. These exposure-
test blocks were repeated many times, and participants were thus biased toward both /b/ and /d/ in
randomly ordered blocks (a within-subjects factor). Results showed that listeners quickly learned
to label the ambiguous sound in accordance with the lip-read information they were exposed to
shortly before. Listeners thus gave more /aba/ responses after exposure to A?Vb than after exposure
to A?Vd, and this was taken as the major sign of recalibration (see Figure 19.1, left panel).
In a crucial control experiment, Bertelson et al. (2003) extended these findings by incorporat-
ing audiovisual congruent adapters AbVb and AdVd. These adapters were not expected to induce
recalibration because there was no conflict between sound and vision. Rather, they were expected
to induce selective speech adaptation due to the nonambiguous nature of the sound. As shown in
Figure 19.1, right panel, these adapters indeed induced selective speech adaptation, and there were
thus fewer /aba/ responses after exposure to AbVb than AdVd, an effect in the opposite direction of
recalibration.
The attractiveness of these control stimuli was that participants could not distinguish them from
the ones with an ambiguous sound that induced recalibration. This was confirmed in an identifi-
cation test in which A?Vb and AbVb were perceived as /b/, and A?Vd and AdVd as /d/ on nearly
FIGURE 19.1 Percentage of /aba/ responses as a function of auditory test token. Left panel: After exposure
to audiovisual adapters with ambiguous sounds, A?Vaba or A?Vada, there were more responses consistent
with the adapter (recalibration). Right panel: After exposure to audiovisual adapters with non-ambiguous
sounds, AVaba or AVada, there were fewer responses consistent with the adapter (selective speech adaptation).
(Results on auditory tests adapted from Bertelson, P. et al., Psychol. Sci., 14, 6, 592–597, 2003; Exp. 2.)
100% of the trials. Moreover, even when participants were explicitly asked to discriminate AbVb from
A?Vb, and AdVd from A?Vd, they performed at chance level because there was a strong immedi-
ate bias by the lip-read information that captured the identity of the sound (Vroomen et al. 2004).
These findings imply that the difference in aftereffects induced by adapters with ambiguous ver-
sus nonambiguous sounds cannot be ascribed to some (unknown) explicit strategy of the listeners,
because listeners simply could not know whether they were actually hearing adapters with ambigu-
ous sounds (causing recalibration) or nonambiguous sounds (causing selective speech adaptation).
This confirms the sensory, rather than strategic, nature of the phenomenon.
Lip-read–induced recalibration of speech was thus demonstrated, and appeared to be contingent
upon exposure to an ambiguous sound and another source of information that disambiguated that
sound. Selective speech adaptation, on the other hand, occurred in the absence of an intersensory
conflict, and mainly depended on repeated presentation of an acoustically clear sound. These two
forms of aftereffects had been studied before in other perceptual domains, but always in isolation.
Recalibration was earlier demonstrated for the ventriloquist situation and analogous intramodal
conflicts such as between different cues to visual depth (see reviews by Epstein 1975 and Wallach
1968), whereas contrastive aftereffects were already well known for color, curvature (Gibson 1933),
size (Blakemore and Sutton 1969), and motion (Anstis 1986; Anstis et al. 1998).
19.4.1 Buildup
To examine the buildup of recalibration and selective speech adaptation, Vroomen et al. (2007)
presented the four previously used audiovisual adapters (A?Vb, A?Vd, AbVb, and AdVd) in a con-
tinuous series of exposure trials, and inserted test trials after 1, 2, 4, 8, 16, 32, 64, 128, and 256
exposures. The aftereffects of adapters with ambiguous sounds (A?Vb and A?Vd) were already
at ceiling after only eight exposure trials (the level of exposure used in the original study) and
then, surprisingly, fell off beyond 32 exposure trials with prolonged exposure (128 and 256 trials).
Aftereffects of adapters with nonambiguous sounds AbVb and AdVd were again contrastive and
the effect linearly increased with the (log-)number of exposure trials. The latter fitted well with the
idea that selective speech adaptation reflects an accumulative process, but there was no apparent
reason why a learning effect such as recalibration would reverse at some point. The authors sug-
gested that two processes might be involved here: selective speech adaptation running in parallel
with recalibration and eventually taking over. Recalibration would then dominate the observed
aftereffects in the early stages of exposure, whereas selective speech adaptation would become
manifest later on.
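The suggested time course can be made concrete with a toy two-process model (our own illustrative sketch; the functional forms and parameter values are arbitrary assumptions, not estimates from the data): recalibration saturates after a handful of exposures, selective speech adaptation accumulates with the log of the number of exposures, and the observed aftereffect is their difference, which changes sign under prolonged exposure.

```python
import math

def net_aftereffect(n, recal_max=0.25, recal_rate=0.5, adapt_slope=0.06):
    """Toy two-process account: a quickly saturating recalibration term
    minus a log-linearly accumulating selective-adaptation term.
    All parameter values are made up, chosen only for illustration."""
    recalibration = recal_max * (1.0 - math.exp(-recal_rate * n))
    adaptation = adapt_slope * math.log(1.0 + n)
    return recalibration - adaptation

# Positive values: recalibration dominates the observed aftereffect;
# negative values: selective adaptation has taken over.
for n in (1, 2, 4, 8, 16, 32, 64, 128, 256):
    print(n, round(net_aftereffect(n), 3))
```

With these made-up parameters, the net effect peaks within the first handful of exposures and reverses sign somewhere past 32 to 64 exposures, qualitatively matching the pattern described above.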
Such a phenomenon was indeed observed when data of an “early” study (i.e., one before the
initial reports on phonetic recalibration) by Samuel (2001) were reanalyzed. Samuel exposed his
participants to massively repeated presentations of an ambiguous /s/–/∫/ sound in the context of either
an /s/-final word (e.g., /bronchiti?/, from bronchitis), or a /∫/-final one (e.g., /demoli?/, from demolish).
In this situation, one might expect recalibration to take place. However, in post-tests involving iden-
tification of the ambiguous /s/–/∫/ sound, Samuel obtained contrastive aftereffects indicative of
selective speech adaptation, that is, fewer /s/ responses after exposure to /bronchiti?/ than /demoli?/ (and
thus an effect in the direction opposite to that later reported by Norris et al. 2003). This made him conclude
that a lexically restored phoneme produces selective speech adaptation similar to a nonambiguous
sound. Others, though—including Samuel—would report in later years recalibration effects using
the same kinds of stimuli (Kraljic and Samuel 2005; Norris et al. 2003; van Linden and Vroomen
2007). To examine this potential conflict in more detail, Samuel allowed us to reanalyze the data
from his 2001 study as a function of the number of exposure blocks (Vroomen et al. 2007). His
experiment consisted of 24 exposure blocks, each containing 32 adapters. Contrastive aftereffects were
indeed observed for the majority of blocks following block 3, showing the reported dominant role
of selective speech adaptation. Crucially, though, a significant recalibration effect was obtained (so
more /s/ responses after exposure to /bronchiti?/ than /demoli?/) in the first block of 32 exposure
trials, which, in the overall analyses, was swamped by selective adaptation in later blocks. Thus,
the same succession of aftereffects dominated early by recalibration and later by selective adapta-
tion was already present in Samuel’s data. The same pattern may therefore occur generally during
prolonged exposure to various sorts of conflict situations involving ambiguous sounds.
19.4.2 Dissipation
A study by Vroomen et al. (2004) focused on how long recalibration and selective speech adaptation
effects last over time. Participants were again exposed to A?Vb, A?Vd, AdVd, or AbVb, but rather
than using multiple blocks of eight adapters and six test trials in a within-subject design (as in the
original study), participants were now exposed to only one of the four adapters (a between-subject
factor) in three similar blocks consisting of 50 exposure trials followed by 60 test trials. The
recalibration effect turned out to be very short-lived and lasted only about six test trials, whereas the
selective speech adaptation effect was observed even after 60 test trials. The results again confirmed
that the two phenomena were different from each other. Surprisingly, though, lip-read–induced
recalibration turned out to be rather short-lived, a finding to which we will return later.
FIGURE 19.2 Curves represent mean proportion of /onso/ responses as a function of auditory test tokens
of continuum after exposure to auditory ambiguous adapters A?Vonso and A?Vomso (left panels), and audi-
tory non-ambiguous adapters AonsoVonso and AomsoVomso (right panels). Upper panels show performance
of speech group; lower panels show performance of non-speech group. Error bars = 1 SEM. (Adapted from
Vroomen, J., and Baart, M., Cognition, 110, 2, 254–259, 2009a.)
As an example, listeners can infer that an ambiguous sound somewhere in between /b/ and /d/ in the
context of “?utter” is more likely to be /b/ than /d/ because “butter” is a word in English, but
not “dutter.” There is also, as for lip-reading, an immediate lexical bias in phoneme identification
known as the Ganong effect (Ganong 1980). For example, an ambiguous /g/-/k/ sound is “heard” as
/g/ when followed by “ift” and as /k/ when followed by “iss” because “gift” and “kiss” are words,
but “kift” and “giss” are not.
The corresponding aftereffect that results from exposure to such lexically biased phonemes was
first reported by Norris et al. (2003). They exposed listeners to a sound halfway between /s/ and /f/
in the context of an f- or s-biasing word, and listeners were then tested on an /es/-/ef/ continuum.
As in the lip-reading case, the authors observed recalibration (or, in their terminology,
perceptual learning), so more /f/ responses after an f-biasing context, and more /s/ responses after
an s-biasing context.
Later studies confirmed the original finding and additionally suggested that the effect is speaker-
specific (Eisner and McQueen 2005), or possibly, token-specific (Kraljic and Samuel 2006, 2007),
that it generalizes to words outside the original training set (McQueen et al. 2006a) and across syl-
labic positions (Jesse and McQueen 2007), and that it arises automatically as a consequence of hear-
ing the ambiguous pronunciations in words (McQueen et al. 2006b). Although Jesse and McQueen
(2007) demonstrated that lexical recalibration can generalize to word-onset positions, there was no
lexical learning in that same study when listeners were exposed to words with ambiguous onsets.
However, Cutler et al. (2008) showed that legal word-onset phonotactic information can induce reca-
libration, presumably because this type of information can be used immediately, whereas lexical
knowledge about the word is not yet available when one hears the ambiguous onset. Moreover, lexical
retuning is not restricted to a listener’s native language as the English fricative theta ([θ] as in “bath”)
presented in a Dutch f- or s-biasing context induced lexical learning (Sjerps and McQueen 2010).
expected because lip-reading has in general a much stronger impact on sound processing than lexi-
cal information does (Brancazio 2004). Most important, though, both aftereffects dissipated equally
fast, and thus there was no sign that lexical recalibration by itself was more robust than lip-read–
induced recalibration.
The same study also explored whether recalibration would become more stable if a contrast phoneme from the opposite category was included in the set of exposure items. Studies reporting long-lasting lexical aftereffects presented, during exposure, not only words with ambiguous sounds,
but also filler words with nonambiguous sounds taken from the opposite side of the phoneme con-
tinuum. For example, in the exposure phase of Norris et al. (2003) in which an ambiguous s/f sound
was biased toward /f/, there were not only exposure stimuli such as “witlo?” that supposedly drive
recalibration, but also contrast stimuli containing the nonambiguous sound /s/ (e.g., naaldbos). Such
contrast stimuli might serve as an anchor or comparison standard for other stimuli, and aftereffects thought to reflect recalibration might in this way be boosted because listeners set the criterion
for the phoneme boundary in between the ambiguous token and the extreme one. The obtained
aftereffect may then reflect the contribution of two distinct processes: one related to recalibration
proper (i.e., a shift in the phoneme boundary meant to reduce the conflict between the sound and
the context), the other to a strategic and long-lasting criterion setting operation that depends on the
presence of an ambiguous phoneme and a contrast phoneme from the opposing category. Our results
showed that aftereffects did indeed become substantially bigger if a contrast stimulus was included
in the exposure set but crucially, aftereffects did not become more stable. Contrast stimuli thus
boosted the effect, but did not explain why sometimes long-lasting aftereffects were obtained.
Another factor that was further explored was whether participants were biased in consecutive
exposure phases toward only one or both phoneme categories. One can imagine that if listeners
are biased toward both a t-word and p-word (as was standard in lip-read studies, but not the lexical
ones), the boundary setting that listeners adopt may become fragile. However, this did not turn out
to be critical: exposure to only one or to both contexts did not change the size or stability of the aftereffect.
Of note is that lip-read and lexical recalibration effects did not vanish if a 3-min silent interval
separated the exposure phase from test. The latter finding indicates that recalibration as such is not
fragile, but that other factors possibly related to the test itself may explain why aftereffects dissipate
quickly during testing. One such possibility might be that listeners adjust their response criterion
in the course of testing such that the two response alternatives are chosen about equally often.
However, although this seems reasonable, it does not explain why, in the same test, selective speech
adaptation effects remained stable over the course of testing (Vroomen et al. 2004).
Still another possibility is that recalibration needs time to consolidate, and sleep might be a factor in this. Eisner and McQueen (2006) explored this possibility and observed equal amounts of lexically induced aftereffects after 12 h, regardless of whether listeners had slept. Vroomen and Baart
(2009b) conducted a similar study on lip-read–induced recalibration, including contrast phonemes
to boost the aftereffect, and tested participants twice: immediately after the lip-read exposure phase
(as standard) and after a 24-h period during which participants had slept. The authors found large
recalibration effects in the beginning of the test (the first six test trials), but they again quickly dis-
sipated with prolonged testing (within 12 trials), and did not reappear after a 24-h delay.
It may also be the case that the dissipation rate of recalibration depends on the acoustic nature
of the stimuli. The studies that found quick dissipation used intervocalic and syllable-final stops
that varied in place of articulation (/aba/-/ada/ and /p/-/t/), whereas others used fricatives (/f-s/ and
/s-∫/; Eisner and McQueen 2006; Kraljic et al. 2008b; Kraljic and Samuel 2005) or syllable-initial
voiced–voiceless stop consonants (/d-t/ and /b/-/p/; Kraljic and Samuel 2006). If the stability of the
phenomenon depends on the acoustic nature of the cues (e.g., place cues might be more vulnerable),
one may expect aftereffects to differ in this respect as well.
Another variable that may play a role is whether the same ambiguous sound is used during the
exposure phase, or whether the token varies from trial to trial. Stevens (2007, Chapter 3) examined
token variability in lexical recalibration using similar procedures as those used by Norris et al.
(2003), but listeners were either exposed to the same or different versions of an ambiguous s/f sound
embedded in s- and f-biasing words. His design also included contrast phonemes from the opposite
phoneme category that should have boosted the effect. When the ambiguous token was constant,
as in the original study by Norris et al., the learning effect was quite substantial on the first test
trials, but quickly dissipated with prolonged testing, and in the last block (test trials 36–42), lexical
recalibration had disappeared completely, akin to lip-read–induced recalibration (van Linden and
Vroomen 2007; Vroomen and Baart 2009b; Vroomen et al. 2004). When the sound varied from trial
to trial, the overall learning effect was much smaller and restricted to the f-bias condition, but the
effect lasted longer.
Another aspect that may play a role is the use of filler items. Studies reporting short-lived after-
effects tended to use massed trials of adapters with either no filler items separating the critical
items, or only a few contrast stimuli. Others, reporting long-lasting effects, used many filler items
separating the critical items (Eisner and McQueen 2006; Kraljic and Samuel 2005, 2006; Norris
et al. 2003). Typically, about 20 critical items containing the ambiguous phoneme were interspersed
among 180 filler items. A classic learning principle is that massed trials produce weaker learning
effects than spaced trials (e.g., Hintzman 1974). At present, it remains to be explored whether recalibration is sensitive to this variable as well and whether it follows the same principle. One other
factor that might prove to be valuable in the discussion regarding short- versus long-lasting effects
is that extensive testing may override, or wash out, the learning effects (e.g., Stevens 2007) because
during the test, listeners might “relearn” their initial phoneme boundary. Typically, in the Bertelson
et al. (2003) paradigm, more test trials are used than in the Norris et al. (2003) paradigm, possibly
influencing the time course of the observed effects. For the time being, though, the critical differ-
ence between the short- and long-lasting recalibration effects remains elusive.
paired with lip-read /ba/, and in the other group all auditory tokens were paired with lip-read /da/. In
the latter two groups, lip-read information thus did not inform the infant how to divide the sounds
from the continuum into two categories. A preference procedure revealed that infants in the former,
but not in the two latter groups learned to discriminate the tokens from the /ba/–/da/ continuum.
These results suggest that infants can use lip-read information to adjust the phoneme boundary of
an auditory speech continuum. Further testing, however, is clearly needed to understand what
critical experience is required and how it relates to lip-read–induced recalibration in detail.
bias—namely, in the case where SWS stimuli were perceived as nonspeech—there was also no
recalibration. Immediate bias and recalibration thus usually go hand in hand, and in order to claim
that they are distinct, one would like to see empirical evidence in the form of a dissociation.
Phonetic Recalibration in Audiovisual Speech
FIGURE 19.3 Grand-averaged waveforms of standard, deviant, and MMN at electrode Fz for the t-word condition (left panel) and p-word condition (middle panel). Right panel: MMNs and their scalp topographies for both conditions; voltage map ranges in μV are displayed below each map. The y-axis marks the onset of the acoustic deviation between /?/ and /t/. (Adapted from Vroomen, J. et al., Neuropsychologia, 45, 3, 572–577, 2007.)
basis, the subjects’ percepts of ambiguous sounds to be tested about 10 s later. The functional interpretation of these areas remains to be explored, but the activation changes may reflect trial-by-trial
variations in subjects’ processing of the audiovisual stimuli, which in turn influence recalibration
and later auditory perception. For instance, variations in recruitment of attentional mechanisms
and/or involvement of working memory might be of importance, although the latter seems to be
unlikely (Baart and Vroomen 2010b).
19.9 CONCLUSION
We reviewed literature that demonstrates that listeners adjust their phoneme boundaries to the pre-
vailing speech context. Phonetic recalibration can be induced by lip-read and lexical context. Both
yield converging data, although the stability of the effect varies quite substantially between studies
for as yet unknown reasons. One reason could be that aftereffects as measured during tests reflect
the contribution of both recalibration and selective speech adaptation that run in parallel but with
different contributions over time. Several computational mechanisms have been proposed that can
account for phonetic recalibration, but critical data that distinguish between these alternatives—in
particular, about the generalization to new tokens—have not yet been collected. Phonetic recalibra-
tion leaves traces in the brain that can be examined with brain imaging techniques. Initial studies
suggest that a recalibrated sound behaves like an acoustically real sound from that category, and
possible loci (e.g., middle and inferior frontal gyrus, parietal cortex) that subserve recalibration have
been identified. Further testing, however, is needed to examine this in more detail. Involvement of
the parietal cortex could indicate that (verbal) short-term memory plays a role in phonetic recalibration, although a recent study conducted by our group indicates that phonetic recalibration is
not affected if subjects are involved in a difficult verbal or spatial short-term memory task (Baart
and Vroomen 2010b). Moreover, auditory speech has also been shown to shift the interpretation of
lip-read speech categories, just as lip-read information can recalibrate auditory speech, so the
effect is genuinely bidirectional (Baart and Vroomen 2010a). On this view,
audiovisual speech is like other cross-modal learning effects (e.g., the ventriloquist illusion) where
bidirectional effects have been demonstrated.
ACKNOWLEDGMENTS
We would like to thank Arthur Samuel and James McQueen for insightful comments on an earlier
version of this manuscript.
REFERENCES
Anstis, S. 1986. Motion perception in the frontal plane: Sensory aspects. In Handbook of perception and human
performance, Vol. 2, Chap. 27, ed. K. R. Boff, L. Kaufman, and J. P. Thomas. New York: Wiley.
Anstis, S., F. A. J. Verstraten, and G. Mather. 1998. The motion aftereffect. Trends in Cognitive Sciences 2:
111–117.
Baart, M., and J. Vroomen. 2010a. Do you see what you are hearing?: Crossmodal effects of speech sounds on
lipreading. Neuroscience Letters 471: 100–103.
Baart, M., and J. Vroomen. 2010b. Phonetic recalibration does not depend on working memory. Experimental
Brain Research 203: 575–582.
Bermant, R. I., and R. B. Welch. 1976. Effect of degree of separation of visual–auditory stimulus and eye posi-
tion upon spatial interaction of vision and audition. Perceptual and Motor Skills 42(43): 487–493.
Bertelson, P., and G. Aschersleben. 1998. Automatic visual bias of perceived auditory location. Psychonomic
Bulletin and Review 5(3): 482–489.
Bertelson, P., I. Frissen, J. Vroomen, B. De Gelder et al. 2006. The aftereffects of ventriloquism: Patterns of
spatial generalization. Perception and Psychophysics 68(3): 428–436.
Bertelson, P., and M. Radeau. 1981. Cross-modal bias and perceptual fusion with auditory–visual spatial dis-
cordance. Perception and Psychophysics 29(6): 578–584.
Bertelson, P., J. Vroomen, and B. De Gelder. 2003. Visual recalibration of auditory speech identification: A
McGurk aftereffect. Psychological Science 14(6): 592–597.
Blakemore, C., and P. Sutton. 1969. Size adaptation: A new aftereffect. Science 166(902): 245–247.
Brancazio, L. 2004. Lexical influences in audiovisual speech perception. Journal of Experimental Psychology:
Human Perception and Performance 30(3): 445–463.
Callan, D. E. et al. 2003. Neural processes underlying perceptual enhancement by visual speech gestures.
Neuroreport 14(17): 2213–2218.
Calvert, G. A., E. T. Bullmore, M. J. Brammer et al. 1997. Activation of auditory cortex during silent lipreading.
Science 276(5312): 593–596.
Calvert, G. A., and R. Campbell. 2003. Reading speech from still and moving faces: The neural substrates of
visible speech. Journal of Cognitive Neuroscience 15(1): 57–70.
Campbell, R. 2008. The processing of audio-visual speech: Empirical and neural bases. Philosophical
Transactions of the Royal Society of London. Series B, Biological Sciences 363(1493): 1001–1010.
Colin, C., M. Radeau, A. Soquet, D. Demolin, F. Colin, and P. Deltenre. 2002. Mismatch negativity evoked
by the McGurk-MacDonald effect: A phonetic representation within short-term memory. Clinical
Neurophysiology 113(4): 495–506.
Cutler, A., J. M. McQueen, S. Butterfield, and D. Norris. 2008. Prelexically-driven perceptual retuning of pho-
neme boundaries. Proceedings of Interspeech 2008, Brisbane, Australia.
Desjardins, R. N., and J. F. Werker. 2004. Is the integration of heard and seen speech mandatory for infants?
Developmental Psychobiology 45: 187–203.
Diehl, R. L. 1981. Feature detectors for speech: a critical reappraisal. Psychological Bulletin 89(1): 1–18.
Diehl, R. L., J. L. Elman, and S. B. McCusker. 1978. Contrast effects on stop consonant identification. Journal
of Experimental Psychology: Human Perception and Performance 4(4): 599–609.
Diehl, R. L., M. Lang, and E. M. Parker. 1980. A further parallel between selective adaptation and contrast.
Journal of Experimental Psychology: Human Perception and Performance 6(1): 24–44.
Eimas, P. D., and J. D. Corbit. 1973. Selective adaptation of linguistic feature detectors. Cognitive Psychology
4: 99–109.
Eisner, F., and J. M. McQueen. 2005. The specificity of perceptual learning in speech processing. Perception
and Psychophysics 67(2): 224–238.
Eisner, F., and J. M. McQueen. 2006. Perceptual learning in speech: Stability over time. Journal of the Acoustical
Society of America 119(4): 1950–1953.
Epstein, W. 1975. Recalibration by pairing: A process of perceptual learning. Perception 4: 59–72.
Formisano, E., F. De Martino, M. Bonte, and R. Goebel. 2008. “Who” is saying “what”? Brain-based decoding
of human voice and speech. Science 322(5903): 970–973.
Ganong, W. F. 1980. Phonetic categorization in auditory word perception. Journal of Experimental Psychology:
Human Perception and Performance 6(1): 110–125.
Gibson, J. J. 1933. Adaptation, after-effects and contrast in the perception of curved lines. Journal of
Experimental Psychology 18: 1–31.
Held, R. 1965. Plasticity in sensory–motor systems. Scientific American 213(5): 84–94.
Hintzman, D. L. 1974. Theoretical implications of the spacing effect. In Theories in cognitive psychology: The
Loyola symposium, ed. R. L. Solso, 77–99. Potomac, MD: Erlbaum.
Jesse, A., and J. M. McQueen. 2007. Prelexical adjustments to speaker idiosyncracies: Are they position-
specific? In Proceedings of Interspeech 2007, ed. H. V. Hamme and R. V. Son, 1597–1600. Antwerpen,
Belgium: Causal Productions (DVD).
Kilian-Hütten, N. J., J. Vroomen, and E. Formisano. 2008. One sound, two percepts: Predicting future speech
perception from brain activation during audiovisual exposure. [Abstract]. Neuroimage 41, Supplement
1: S112.
Klemm, O. 1909. Localisation von Sinneneindrücken bei disparaten Nebenreizen. Psychologische Studien 5:
73–161.
Klucharev, V., R. Möttönen, and M. Sams. 2003. Electrophysiological indicators of phonetic and non-phonetic
multisensory interactions during audiovisual speech perception. Brain Research, Cognitive Brain
Research 18(1): 65–75.
Kraljic, T., S. E. Brennan, and A. G. Samuel. 2008a. Accommodating variation: dialects, idiolects, and speech
processing. Cognition 107(1): 54–81.
Kraljic, T., and A. G. Samuel. 2005. Perceptual learning for speech: Is there a return to normal? Cognitive
Psychology 51(2): 141–178.
Kraljic, T., and A. G. Samuel. 2006. Generalization in perceptual learning for speech. Psychonomic Bulletin
and Review 13(2): 262–268.
Kraljic, T., and A. G. Samuel. 2007. Perceptual adjustments to multiple speakers. Journal of Memory and
Language 56: 1–15.
Kraljic, T., A. G. Samuel, and S. E. Brennan. 2008b. First impressions and last resorts: How listeners adjust to
speaker variability. Psychological Science 19(4): 332–338.
Kuhl, P. K., and A. N. Meltzoff. 1982. The bimodal perception of speech in infancy. Science 218: 1138–1141.
Lang, H., T. Nyrke, M. Ek, O. Aaltonen, I. Raimo, and R. Näätänen. 1990. Pitch discrimination performance
and auditory event-related potentials. In Psychophysiological Brain Research, vol. 1, ed. C. M. H. Brunia,
A. W. K. Gaillard, A. Kok, G. Mulder, and M. N. Verbaten, 294–298. Tilburg: Tilburg University Press.
Massaro, D. W. 1984. Children’s perception of visual and auditory speech. Child Development 55:
1777–1788.
McClelland, J. L., and J. L. Elman. 1986. The TRACE model of speech perception. Cognitive Psychology
18(1): 1–86.
McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature 264: 746–748.
McQueen, J. M., A. Cutler, and D. Norris. 2006a. Phonological abstraction in the mental lexicon. Cognitive
Science 30: 1113–1126.
McQueen, J. M., A. Jesse, and D. Norris. 2009. No lexical–prelexical feedback during speech perception or: Is
it time to stop playing those Christmas tapes? Journal of Memory and Language 61: 1–18.
McQueen, J. M., D. Norris, and A. Cutler. 2006b. The dynamic nature of speech perception. Language and
Speech 49(1): 101–112.
Mirman, D., J. L. McClelland, and L. L. Holt. 2006. An interactive Hebbian account of lexically guided tuning
of speech perception. Psychonomic Bulletin and Review 13(6): 958–965.
Norris, D., J. M. McQueen, and A. Cutler. 2000. Merging information in speech recognition: Feedback is never
necessary. Behavioral and Brain Sciences 23(3): 299–325; discussion 325–370.
Norris, D., J. M. McQueen, and A. Cutler. 2003. Perceptual learning in speech. Cognitive Psychology 47(2):
204–238.
Näätänen, R. 1992. Attention and brain function. Hillsdale: Erlbaum.
Näätänen, R. 2001. The perception of speech sounds by the human brain as reflected by the mismatch negativ-
ity (MMN) and its magnetic equivalent. Psychophysiology 38: 1–21.
Näätänen, R., A. W. K. Gaillard, and S. Mäntysalo. 1978. Early selective-attention effect in evoked potential
reinterpreted. Acta Psychologica 42: 313–329.
Näätänen, R., P. Paavilainen, H. Tiitinen, D. Jiang, and K. Alho. 1993. Attention and mismatch negativity.
Psychophysiology 30: 436–450.
Patterson, M., and J. F. Werker. 1999. Matching phonetic information in lips and voice is robust in 4.5-month-
old infants. Infant Behavior and Development 22: 237–247.
Patterson, M. L., and J. F. Werker. 2003. Two-month-old infants match phonetic information in lips and voice.
Developmental Science 6(2): 191–196.
Radeau, M., and P. Bertelson. 1974. The after-effects of ventriloquism. The Quarterly Journal of Experimental
Psychology 26(1): 63–71.
Radeau, M., and P. Bertelson. 1976. The effect of a textured visual field on modality dominance in a ventrilo-
quism situation. Perception and Psychophysics 20: 227–235.
Radeau, M., and P. Bertelson. 1977. Adaptation to auditory–visual discordance and ventriloquism in semireal-
istic situations. Perception and Psychophysics 22(2): 137–146.
Remez, R. E., P. E. Rubin, D. B. Pisoni, and T. D. Carrell. 1981. Speech perception without traditional speech
cues. Science 212: 947–949.
Roberts, M., and Q. Summerfield. 1981. Audiovisual presentation demonstrates that selective adaptation in
speech perception is purely auditory. Perception and Psychophysics 30(4): 309–314.
Rosenblum, L. D., M. A. Schmuckler, and J. A. Johnson. 1997. The McGurk effect in infants. Perception and
Psychophysics 59: 347–357.
Saldaña, H. M., and L. D. Rosenblum. 1994. Selective adaptation in speech perception using a compelling
audiovisual adaptor. Journal of the Acoustical Society of America 95(6): 3658–3661.
Sams, M., R. Aulanko, M. Hämäläinen et al. 1991. Seeing speech: Visual information from lip movements
modifies activity in the human auditory cortex. Neuroscience Letters 127(1): 141–145.
Samuel, A. G. 1986. Red herring detectors and speech perception: in defense of selective adaptation. Cognitive
Psychology 18(4): 452–499.
Samuel, A. G. 2001. Knowing a word affects the fundamental perception of the sounds within it. Psychological
Science 12(4): 348–351.
Samuel, A. G., and D. Kat. 1996. Early levels of analysis of speech. Journal of Experimental Psychology:
Human Perception and Performance 22(3): 676–694.
Sawusch, J. R. 1977. Peripheral and central processes in selective adaptation of place of articulation in stop
consonants. Journal of the Acoustical Society of America 62(3): 738–750.
Shigeno, S. 2002. Anchoring effects in audiovisual speech perception. Journal of the Acoustical Society of
America 111(6): 2853–2861.
Sjerps, M. J., and J. M. McQueen. 2010. The bounds on flexibility in speech perception. Journal of Experimental
Psychology: Human Perception and Performance 36: 195–211.
Stekelenburg, J. J., and J. Vroomen. 2007. Neural correlates of multisensory integration of ecologically valid
audiovisual events. Journal of Cognitive Neuroscience 19(12): 1964–1973.
Stevens, M. 2007. Perceptual adaptation to phonological differences between language varieties. Ph.D. thesis,
University of Gent, Gent.
Stratton, G. M. 1896. Some preliminary experiments on vision without inversion of the retinal image.
Psychological Review 611–617.
Sumby, W. H., and I. Pollack. 1954. Visual contribution to speech intelligibility in noise. Journal of the
Acoustical Society of America 26: 212–215.
Teinonen, T., R. N. Aslin, P. Alku, and G. Csibra. 2008. Visual speech contributes to phonetic learning in
6-month-old infants. Cognition 108(3): 850–855.
Tuomainen, J., T. S. Andersen, K. Tiippana, and M. Sams. 2005. Audio-visual speech perception is special.
Cognition 96(1): B13–B22.
van Linden, S., J. J. Stekelenburg, J. Tuomainen, and J. Vroomen. 2007. Lexical effects on auditory speech
perception: An electrophysiological study. Neuroscience Letters 420(1): 49–52.
van Linden, S., and J. Vroomen. 2007. Recalibration of phonetic categories by lipread speech versus lexi-
cal information. Journal of Experimental Psychology: Human Perception and Performance 33(6):
1483–1494.
van Linden, S., and J. Vroomen. 2008. Audiovisual speech recalibration in children. Journal of Child Language
35(4): 809–822.
Vroomen, J., and M. Baart. 2009a. Phonetic recalibration only occurs in speech mode. Cognition 110(2):
254–259.
Vroomen, J., and M. Baart. 2009b. Recalibration of phonetic categories by lipread speech: Measuring afteref-
fects after a twenty-four hours delay. Language and Speech 52: 341–350.
Vroomen, J., S. van Linden, B. de Gelder, and P. Bertelson. 2007. Visual recalibration and selective adaptation
in auditory–visual speech perception: Contrasting build-up courses. Neuropsychologia 45(3): 572–577.
Vroomen, J., S. van Linden, M. Keetels, B. de Gelder, and P. Bertelson. 2004. Selective adaptation and recali-
bration of auditory speech by lipread information: Dissipation. Speech Communication 44: 55–61.
Wallach, H. 1968. Informational discrepancy as a basis of perceptual adaptation. In The neuropsychology of
spatially oriented behaviour, ed. S. J. Freeman, 209–230. Homewood, IL: Dorsey.
Welch, R. B., and D. H. Warren. 1986. In Handbook of perception and human performance, ed. K. R. Boff,
L. Kaufman, and J. P. Thomas, 1–36. New York: Wiley.
Winkler, I., T. Kujala, and Y. Shtyrov. 1999. Brain responses reveal the learning of foreign language phonemes.
Psychophysiology 36: 638–642.
20 Multisensory Integration
and Aging
Jennifer L. Mozolic, Christina E. Hugenschmidt,
Ann M. Peiffer, and Paul J. Laurienti
CONTENTS
20.1 General Cognitive Slowing.................................................................................................... 383
20.2 Inverse Effectiveness............................................................................................................. 383
20.3 Larger Time Window of Integration...................................................................................... 385
20.4 Deficits in Attentional Control.............................................................................................. 385
20.5 An Alternative Explanation: Increased Noise at Baseline.................................................... 387
20.6 Summary and Conclusions.................................................................................................... 388
References....................................................................................................................................... 389
Effective processing of multisensory stimuli relies on both the peripheral sensory organs and cen-
tral processing in subcortical and cortical structures. As we age, there are significant changes in all
sensory systems and a variety of cognitive functions. Visual acuity tends to decrease and hearing
thresholds generally increase (Kalina 1997; Liu and Yan 2007), whereas performance levels on tasks
of motor speed, executive function, and memory typically decline (Rapp and Heindel 1994; Birren
and Fisher 1995; Rhodes 2004). There are also widespread changes in the aging brain, including
reductions in gray and white matter volume (Good et al. 2001; Salat et al. 2009), alterations in
neurotransmitter systems (Muir 1997; Backman et al. 2006), regional hypoperfusion (Martin et al.
1991; Bertsch et al. 2009), and altered patterns of functional activity during cognitive tasks (Cabeza
et al. 2004; Grady 2008). Given the extent of age-related alterations in sensation, perception, and
cognition, as well as in the anatomy and physiology of the brain, it is not surprising that multisen-
sory integration also changes with age.
Several early studies provided mixed results on the differences between multisensory process-
ing in older and younger adults (Stine et al. 1990; Helfer 1998; Strupp et al. 1999; Cienkowski
and Carney 2002; Sommers et al. 2005). For example, Stine and colleagues (1990) reported that
although younger adults’ memory for news events was better after audiovisual presentation than
after auditory information alone, older adults did not show improvement during the multisensory
conditions. In contrast, Cienkowski and Carney (2002) demonstrated that audiovisual integration
on the McGurk illusion was similar for older and younger adults, and that in some conditions,
older adults were even more likely to report the fusion of visual and auditory information than
their young counterparts. Similarly, in a study examining the contribution of somatosensory input
to participants’ perception of visuospatial orientation, Strupp et al. (1999) reported an age-related
increase in the integration of somatosensory information into the multisensory representation of
body orientation.
Despite providing a good indication that multisensory processing is somehow altered in aging,
the results of these studies are somewhat difficult to interpret due to their use of complex cog-
nitive tasks and illusions, and to the variability in analysis methods. Several newer studies that
have attempted to address these factors more clearly demonstrate that multisensory integration is
enhanced in older adults (Laurienti et al. 2006; Peiffer et al. 2007; Diederich et al. 2008).
On a two-choice audiovisual discrimination task, Laurienti and colleagues (2006) showed that
response time (RT) benefits for multisensory versus unisensory targets were larger for older adults
than for younger adults (Figure 20.1). That is, older adults’ responses during audiovisual conditions
were speeded more than younger adults’, when compared with their respective responses during
unisensory conditions. Multisensory gains in older adults remained significantly larger than those
observed in younger adults, even after controlling for the presence of two targets in the multisensory
condition (redundant target effect; Miller 1982, 1986; Laurienti et al. 2006).
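The race-model control referred to above (Miller 1982) compares the cumulative distribution of multisensory response times against the summed probability of the two unisensory distributions; responses faster than that bound indicate genuine integration rather than statistical facilitation. As an illustration only (not the authors' analysis code; all data below are simulated and the function names are our own), the comparison can be sketched as follows:

```python
import numpy as np

def ecdf(rts, t_grid):
    """Empirical cumulative distribution of response times evaluated on a time grid."""
    rts = np.sort(np.asarray(rts, dtype=float))
    return np.searchsorted(rts, t_grid, side="right") / len(rts)

def race_model_violation(rt_a, rt_v, rt_av, t_grid):
    """Miller's (1982) race-model inequality test.

    The race model predicts P(RT <= t | AV) <= P(RT <= t | A) + P(RT <= t | V).
    Positive values of the returned difference mark time points at which
    multisensory responses are faster than independent unisensory processing
    allows, i.e., evidence for multisensory integration.
    """
    f_a = ecdf(rt_a, t_grid)
    f_v = ecdf(rt_v, t_grid)
    f_av = ecdf(rt_av, t_grid)
    race_bound = np.minimum(f_a + f_v, 1.0)  # summed probability, capped at 1
    return f_av - race_bound

# Simulated data: audiovisual RTs faster than either unisensory condition.
rng = np.random.default_rng(0)
rt_a = rng.normal(480, 60, 200)   # auditory-only RTs (ms)
rt_v = rng.normal(500, 60, 200)   # visual-only RTs (ms)
rt_av = rng.normal(420, 55, 200)  # audiovisual RTs (ms)
t_grid = np.arange(250, 1600, 10)
violation = race_model_violation(rt_a, rt_v, rt_av, t_grid)
print(f"max race-model violation: {violation.max():.3f}")
```

Plotting the violation curve against the time grid yields difference functions of the kind shown in Figure 20.1; comparing the positive area of such curves between groups is one way the age-group contrast can be quantified.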
Using similar analysis methods, Peiffer et al. (2007) also reported increased multisensory gains
in older adults. On a simple RT task, where average unisensory RTs were equivalent in younger
and older adults, older adults actually responded faster than younger adults on multisensory trials
because of their enhanced multisensory integration (Peiffer et al. 2007). Diederich and colleagues
(2008) have also shown that older adults exhibit greater speeding of responses to multisensory
targets than younger adults on a saccadic RT task. The analysis methods used in this experiment
indicate a slowing of peripheral sensory processing, as well as a wider time window over which
integration of auditory and visual stimuli can occur (Diederich et al. 2008).
These experiments highlight several possible explanations that could help answer a critical ques-
tion about multisensory processing in aging: Why do older adults exhibit greater integration of
multisensory stimuli than younger adults? Potential sources of enhanced integration in older adults
include age-related cognitive slowing not specific to multisensory processing, inverse effectiveness
[Figure 20.1 here: probability difference (%) as a function of response time (ms), plotted separately for young and elderly participants.]
FIGURE 20.1 Multisensory performance enhancements are significantly larger in older adults than in
younger adults on a two-choice audiovisual discrimination paradigm. These curves illustrate multisensory-
mediated gains relative to race model, which is the summed probability of unisensory responses. Each curve
is the difference between the cumulative distribution of response times for the multisensory condition and the
race model cumulative distribution function. Thus, positive deflections in these curves represent responses to
multisensory stimuli that were faster than would be predicted by independent processing of the auditory and
visual stimulus components (i.e., multisensory integration). Significant multisensory facilitation was observed
in younger adults 340–550 ms after stimulus onset, and the maximum benefit achieved was approximately
8.3%. Older adults exhibited significant multisensory gains over a broader temporal window (330–690 and
730–740 ms after stimulus onset), and had performance gains of about 13.5%. Thus, both younger and older
participants demonstrated speeding of responses to multisensory stimuli that exceeded gains predicted by
the race model; however, older adults benefited more from the multisensory stimulus presentation than did
younger adults. (Adapted from Laurienti, P.J. et al., Neurobiol Aging, 27, 1155–1163, 2006, with permission
from Elsevier.)
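The race-model comparison described in the caption can be sketched in a few lines of code. This is a minimal illustration under simplifying assumptions, not the authors' analysis code: the function names are ours, and the race-model prediction is taken as the summed unisensory cumulative probabilities capped at 1 (Miller 1982).

```python
import numpy as np

def ecdf(rts, t_grid):
    """Empirical cumulative distribution function of response times,
    evaluated at each time point in t_grid."""
    rts = np.sort(np.asarray(rts, dtype=float))
    return np.searchsorted(rts, t_grid, side="right") / rts.size

def race_model_violation(rt_aud, rt_vis, rt_av, t_grid):
    """Difference between the multisensory RT CDF and the race-model
    prediction, min(1, F_A(t) + F_V(t)). Positive values mark time bins
    where responses are faster than independent processing of the two
    unisensory channels would allow (i.e., multisensory integration)."""
    race = np.minimum(1.0, ecdf(rt_aud, t_grid) + ecdf(rt_vis, t_grid))
    return ecdf(rt_av, t_grid) - race
```

Applied to the paradigm in Figure 20.1, each curve is this difference evaluated on a grid of response times, and positive deflections identify the bins where multisensory facilitation exceeds the race model.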
Multisensory Integration and Aging 383
associated with sensory deficits, alterations in the temporal parameters of integration, and ineffi-
cient top–down modulation of sensory processing. In the following sections we will investigate each
of these possible explanations in greater detail and offer some alternative hypotheses for the basis
of enhanced multisensory integration in older adults.
cells, and changes in cutaneous receptors and the olfactory epithelium (Kovács 2004; Liu and Yan
2007; Shaffer and Harrison 2007; Charman 2008), and to age-related alterations in how the central
nervous system processes sensory information (Schmolesky et al. 2000; Cerf-Ducastel and Murphy
2003; Ostroff et al. 2003; Quiton et al. 2007).
Reduced sensitivity or acuity in the individual sensory systems is another potential explanation
for increased multisensory benefits for older adults, attributable to a governing principle of mul-
tisensory integration known as inverse effectiveness. According to this principle, decreasing the
effectiveness of individual sensory stimuli increases the magnitude of multisensory enhancements
(Meredith and Stein 1983, 1986). In other words, when an auditory or visual stimulus is presented
just above threshold level, the gains produced by bimodal audiovisual presentation are larger than
when the individual stimuli are highly salient. Early demonstrations of inverse effectiveness in the
cat superior colliculus (Meredith and Stein 1983, 1986) have been extended to cat and monkey cor-
tex (Wallace et al. 1992; Kayser et al. 2005) as well as both neural and behavioral data in humans
(Hairston et al. 2003; Stevenson and James 2009). For example, Hairston and colleagues (2003)
demonstrated that young participants with normal vision were able to localize unimodal visual
and bimodal audiovisual targets equally well; however, when participants’ vision was artificially
degraded, their localization abilities were significantly enhanced during audiovisual conditions
relative to performance on visual targets alone.
The evidence for inverse effectiveness as a source of enhanced multisensory integration in
older adults is not yet clear. In the study performed by Peiffer and colleagues (2007; mentioned
above), RTs on unisensory trials were similar for younger and older adults, yet the older adults still
showed larger multisensory gains than the younger group. This finding suggests that other mech-
anisms beyond inverse effectiveness may be required to explain the age-related enhancements.
The paradigm used in this study, however, matched the performance between populations using
superthreshold stimuli and did not specifically investigate the consequence of degrading stimulus
effectiveness.
In a population composed exclusively of older adults, Tye-Murray et al. (2007) demonstrated
that integration levels in an audiovisual speech perception task did not differ for older adults with
mild-to-moderate hearing loss and older adults without hearing impairment. However, all test-
ing in this experiment was conducted in the presence of background auditory noise (multitalker
“babble”), and the level of this noise was adjusted for each participant so that the performance of
the two groups was matched in the unisensory auditory condition. This design makes it difficult
to address the interesting question of whether reduced stimulus effectiveness due to age-related
hearing loss would increase multisensory integration in hearing-impaired versus normal-hearing
older adults.
Results from a study conducted by Cienkowski and Carney (2002) provide some clues on the
effects of hearing loss on age-related integration enhancements. This experiment tested three groups
of participants on the McGurk illusion: (1) young adults with normal hearing, (2) older adults with
mild, but age-appropriate hearing loss, and (3) a control group of young adults with hearing thresh-
olds artificially shifted to match the older adults. Both the older adults and the threshold-shifted
controls were more likely to integrate the visual and auditory information than young, normal hear-
ing participants in one experimental condition (Cienkowski and Carney 2002). In this condition, the
participants viewed the McGurk illusion presented by a male talker. Interestingly, integration did
not differ between the three groups when the illusion was presented by a female talker. Although the
response patterns of the threshold-shifted controls closely matched that of the older adults with mild
hearing loss, the level of integration experienced by each group across the different experimental
conditions did not have a clear inverse relationship with successful unisensory target identification.
For example, in an auditory-only condition, all groups were better at identifying syllables presented
by the male talker than the female talker, yet levels of audiovisual integration were higher for all
groups in the male-talker condition (Cienkowski and Carney 2002). If increased integration in this
task were due simply to increased ambiguity in the auditory signals for older adults and control
subjects (whose hearing thresholds were shifted by noise), then we would expect the highest levels
of integration under conditions where unisensory performance was poorest. Clearly, more studies
that carefully modulate signal intensities and compare the multisensory gains in younger and older
adults will be needed to further characterize the role of inverse effectiveness in age-related multi-
sensory enhancements.
In young, healthy adults, dividing attention across multiple sensory modalities appears to be
critical for multisensory integration, whereas restricting attention to a single sensory modality can
abolish behavioral and neural enhancements associated with multisensory stimuli (Alsius et al.
2005; Talsma and Woldorff 2005; Talsma et al. 2007; Mozolic et al. 2008a). Many studies have
demonstrated that older adults have deficits in attention and are more distracted by stimuli within
and across sensory modalities (Dywan et al. 1998; Alain and Woods 1999; West and Alain 2000;
Milham et al. 2002; Andres et al. 2006; Poliakoff et al. 2006; Yang and Hasher 2007; Healey et al.
2008). For example, Andres and colleagues (2006) reported that older adults were more distracted
by irrelevant sounds than younger adults on an auditory–visual oddball paradigm. It would seem
possible then, that increased multisensory integration in older adults could result from deficits in
top–down attentional control that allow more cross-modal information to be processed.
This apparently simple account of age-related increases in distractibility is complicated by the
fact that there is also strong evidence suggesting that older adults can, in fact, successfully engage
selective attention on a variety of tasks (Groth and Allen 2000; Verhaeghen and Cerella 2002;
Madden et al. 2004; Townsend et al. 2006; Ballesteros et al. 2008; Hugenschmidt et al. 2009a;
Hugenschmidt et al. 2009c). In a recent study, Hugenschmidt and colleagues (2009a) used a cued
multisensory discrimination paradigm to demonstrate that older adults can reduce multisensory
integration by attending to a single sensory modality, in a similar manner to that observed in
young adults (Mozolic et al. 2008a). However, multisensory integration was still enhanced in older
adults relative to young because the levels of integration in older adults were significantly higher at
baseline, in the absence of modality-specific attentional modulation (Figure 20.2). These results indi-
cate that enhanced integration in older adults is not due to deficits in engaging top–down selective
attention mechanisms, but could instead result from age-related increases in baseline cross-modal
interactions. This alternative explanation may also help to account for the seemingly contradictory
evidence that older adults are both more distractible than younger adults and equally able to engage
selective attention.
FIGURE 20.2 Selective attention reduces multisensory integration in younger and older adults. As in Figure
20.1, each curve represents the difference between the cumulative distribution for multisensory responses and
the race model, and thus, positive deflections show time bins where multisensory integration was observed.
In this cued, two-choice discrimination paradigm, multisensory and unisensory targets were presented under
three different attention conditions: divided attention, selective auditory attention, and selective visual atten-
tion. Younger adults exhibited integration only during divided attention conditions (peak facilitation ≈ 5%);
selective attention abolished multisensory gains (a). Older adults were also able to reduce multisensory inte-
gration during selective attention; however, due to higher levels of integration during the baseline divided
attention condition (peak facilitation ≈ 10%), older adults still exhibited significant multisensory gains during
selective attention (b). These data demonstrate that older adults are able to engage selective attention and
modulate multisensory integration, yet have a general increase in the level of integration relative to younger
adults that is independent of attention condition. (Adapted from Hugenschmidt, C.E. et al., Neuroreport, 20,
349–353, 2009a, with permission from Wolters Kluwer Health.)
their baseline levels of sensory processing are elevated, they are still more distracted than younger
adults when incoming sensory streams contain irrelevant or conflicting information. However, if
the extraneous sensory information becomes task relevant, older adults will exhibit larger gains
than younger adults, as information that was previously interfering with task performance becomes
helpful in completing the task.
Additional illustrations of the costs and benefits that older adults experience as a consequence of
increased baseline sensory processing can be seen in unisensory distraction tasks (Rowe et al. 2006;
Yang and Hasher 2007; Healey et al. 2008). In one example, Yang and Hasher (2007) demonstrated
that older adults were more distracted by irrelevant pictures than young in a task that required par-
ticipants to make semantic judgments about words that appeared superimposed on the pictures. In a
very similar paradigm that modified task demands, however, older adults had an advantage (Rowe
et al. 2006). In this experiment, younger and older adults were required to make same/different
judgments about the pictures that appeared beneath an overlay containing irrelevant words. On a
subsequent test of implicit memory for the irrelevant words, older adults actually showed better
memory, indicating that they had indeed processed more “noise” or irrelevant background infor-
mation than younger adults (Rowe et al. 2006). These studies support the notion that older adults
are more distractible than younger adults because they do not adequately filter sensory noise, but
when to-be-ignored information becomes relevant, older adults can actually benefit from increased
background sensory processing.
In spite of the accumulating evidence that baseline sensory processing changes with age, there is
no clear evidence for an underlying neural mechanism. One potential source of age-related changes
in baseline filtering parameters is dysregulation of the default mode network (DMN), an anatomi-
cally and physiologically defined system of structures thought to be involved in monitoring inter-
nal thoughts and the external environment at rest (Raichle et al. 2001; Greicius and Menon 2004;
Buckner et al. 2008). Composed of regions such as the anterior cingulate, posterior cingulate/pre-
cuneus region, and the parietal cortex, the default mode network is most active during rest and
becomes less active during most goal-directed behaviors (Raichle et al. 2001; Greicius and Menon
2004; Buckner et al. 2008). Several studies have reported that the DMN is not suppressed as effec-
tively during external tasks in older adults as in young (Lustig et al. 2003; Grady et al. 2006; Persson
et al. 2007). Failure to suppress default mode network activity has also been implicated in reduced
stimulus processing during attentional lapses, increased frequency of task-unrelated thoughts, and
increased error rates (McKiernan et al. 2006; Weissman et al. 2006; Li et al. 2007). A recent study
by Stevens and colleagues (2008) directly linked increased background activity in auditory cortex
during a visual task to DMN activity. In this functional MRI (fMRI) study, older and younger adults
were asked to complete a visual working memory task in a noisy MRI scanner environment. When
older adults made errors on this task, they had increased activity in the auditory cortex. In younger
adults, however, error trials were not associated with increased auditory activation. This suggests
that older adults were processing more background information than younger adults and that the
increased processing was related to distraction by irrelevant auditory stimulation. Furthermore,
increased auditory activity was associated with increased DMN activity, indicating that older adults’
vulnerability to distraction may be linked to age-related differences in suppression of the DMN
(Stevens et al. 2008). It seems likely, therefore, that further characterization of the default mode
network in aging may be important for understanding the neural basis of altered baseline sensory
processing and enhanced multisensory integration in older adults.
general cognitive slowing (Laurienti et al. 2006) or use paradigms that equate unisensory RTs for
younger and older adults (Peiffer et al. 2007) demonstrate that multisensory gains are still larger for
older participants. A large portion of the behavioral changes that older adults exhibit in these para-
digms must therefore be specific to multisensory processing, rather than be attributed to the general
effects of sensorimotor and cognitive slowing.
Similarly, older adults’ broad time window of integration does not seem to be the source of their
multisensory processing enhancements. The analysis methods used by Diederich and colleagues
(2008) clearly show that older adults have a larger time interval over which multisensory integration
can occur; however, this is the result of slowed peripheral sensory processing and does not appear to
compensate for a decreased probability that the processing of multiple unisensory stimuli will overlap
in time. This decreased probability of interaction between unisensory stimuli arises because older
adults’ unisensory processing times are slow and highly variable, and therefore two independent
stimuli are less likely to be available for processing and integration at the same time. Yet if the two
stimuli are integrated, the older adults are speeded more than younger adults (Diederich et al. 2008).
Thus, older adults’ wider time window of integration, a consequence of increased RT and variability,
does not provide an explanation as to why integration is stronger in older adults when it does occur.
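The reasoning above — that slower, more variable unisensory processing reduces the chance that two stimuli are simultaneously available for integration, even when the integration window widens — can be illustrated with a toy Monte Carlo simulation. All distributions and parameter values below are illustrative assumptions of our own, not those of Diederich et al. (2008).

```python
import random

def overlap_probability(mean_a, sd_a, mean_v, sd_v, window, n=100_000, seed=1):
    """Estimate the probability that auditory and visual peripheral
    processing finish within `window` ms of each other, assuming
    (hypothetically) Gaussian unisensory processing-time distributions."""
    rng = random.Random(seed)
    hits = sum(
        abs(rng.gauss(mean_a, sd_a) - rng.gauss(mean_v, sd_v)) <= window
        for _ in range(n)
    )
    return hits / n

# Illustrative parameters: despite a wider integration window (80 vs.
# 50 ms), the slower and more variable "older" processing times
# coincide less often than the faster, tighter "younger" ones.
p_young = overlap_probability(mean_a=100, sd_a=15, mean_v=110, sd_v=15, window=50)
p_older = overlap_probability(mean_a=140, sd_a=40, mean_v=155, sd_v=40, window=80)
```

Under these assumed parameters the simulation yields a lower overlap probability for the "older" settings, consistent with the argument that a wider window does not fully offset increased latency and variability.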
Another logical hypothesis is that older adults show enhanced multisensory integration because
they are unable to use selective attention to filter incoming sensory information; however, age-related
deficits in attentional control fail to adequately explain integration enhancements. Hugenschmidt
et al. (2009c) have confirmed that older adults can successfully instantiate modality-specific selec-
tive attention and have further demonstrated that there is no age-related difference in the magnitude
of multisensory integration reduction during selective attention (Hugenschmidt et al. 2009a). Rather
than implicating selective attention deficits as the source of underlying increases in multisensory
integration, data suggest that older adults differ from younger adults in the amount of baseline
sensory processing. Findings from an MRI study of CBF support this notion, showing that audi-
tory cortex CBF associated with task-irrelevant scanner noise is increased in older adults relative to
young, both during rest and during a visual task (Hugenschmidt et al. 2009b). Increased activity in
brain structures that comprise the default mode network has been implicated in the level of back-
ground sensory processing in older adults, and further investigation of the DMN may yield critical
information about the nature of age-related changes in baseline sensory processing that can inform
our understanding of multisensory integration in aging.
Another potential mechanism for age-related increases in multisensory benefits that cannot be
discounted is inverse effectiveness. To our knowledge, there have been no conclusive studies on the
relationship between stimulus salience and multisensory gains in older adults. A recent fMRI study
in younger adults, performed by Stevenson and colleagues (2009), demonstrated inverse effective-
ness in the patterns of cortical activity during audiovisual presentations of speech and object stimuli.
As the intensity of the auditory and visual stimulus components decreased, activation gains in the
superior temporal sulcus during multisensory stimuli increased. In other words, highly effective
sensory stimuli resulted in smaller activity changes in multisensory cortex compared to degraded
stimuli. A similar experimental paradigm could be used to investigate the relationship between stim-
ulus effectiveness and multisensory enhancements at the cortical level in younger and older adults.
Over the past several years, we have learned a great deal about how multisensory processing
changes with age; however, the mechanisms underlying age-related enhancements in multisensory
integration are not yet clear. Further exploration of the connections between baseline sensory pro-
cessing, stimulus salience, and multisensory gains should provide insight into the advantages and
impairments older adults can experience from changes in multisensory integration.
REFERENCES
Alain, C., and D. L. Woods. 1999. Age-related changes in processing auditory stimuli during visual attention:
Evidence for deficits in inhibitory control and sensory memory. Psychol Aging 14:507–519.
390 The Neural Bases of Multisensory Processes
Alsius, A., J. Navarra, R. Campbell, and S. Soto-Faraco. 2005. Audiovisual integration of speech falters under
high attention demands. Curr Biol 15:839–843.
Andres, P., F. B. Parmentier, and C. Escera. 2006. The effect of age on involuntary capture of attention by irrel-
evant sounds: A test of the frontal hypothesis of aging. Neuropsychologia 44:2564–2568.
Backman, L., L. Nyberg, U. Lindenberger, S. C. Li, and L. Farde. 2006. The correlative triad among aging,
dopamine, and cognition: Current status and future prospects. Neurosci Biobehav Rev 30:791–807.
Ballesteros, S., J. M. Reales, J. Mayas, and M. A. Heller. 2008. Selective attention modulates visual and haptic
repetition priming: Effects in aging and Alzheimer’s disease. Exp Brain Res 189:473–483.
Bertsch, K., D. Hagemann, M. Hermes, C. Walter, R. Khan, and E. Naumann. 2009. Resting cerebral blood
flow, attention, and aging. Brain Res 1267:77–88.
Birren, J. E., and L. M. Fisher. 1995. Aging and speed of behavior: Possible consequences for psychological
functioning. Ann Rev Psychol 46:329–353.
Buckner, R. L., J. R. Andrews-Hanna, and D. L. Schacter. 2008. The brain’s default network: Anatomy, func-
tion, and relevance to disease. Ann NY Acad Sci 1124:1–38.
Cabeza, R., S. M. Daselaar, F. Dolcos, S. E. Prince, M. Budde, and L. Nyberg. 2004. Task-independent and
task-specific age effects on brain activity during working memory, visual attention and episodic retrieval.
Cereb Cortex 14:364–375.
Cerella, J. 1985. Information processing rates in the elderly. Psychol Bull 98:67–83.
Cerf-Ducastel, B., and C. Murphy. 2003. FMRI brain activation in response to odors is reduced in primary
olfactory areas of elderly subjects. Brain Res 986:39–53.
Charman, W. N. 2008. The eye in focus: Accommodation and presbyopia. Clinical and Experimental Optometry
91:207–225.
Cienkowski, K. M., and A. E. Carney. 2002. Auditory–visual speech perception and aging. Ear Hear
23:439–449.
Colonius, H., and A. Diederich. 2004. Multisensory interaction in saccadic reaction time: A time-window-of-
integration model. J Cogn Neurosci 16:1000–1009.
Corbetta, M., F. M. Miezin, S. Dobmeyer, G. L. Shulman, and S. E. Petersen. 1990. Attentional modulation of
neural processing of shape, color, and velocity in humans. Science 248:1556–1559.
Cornelissen, F. W., and A. C. Kooijman. 2000. Does age change the distribution of visual attention? A comment
on McCalley, Bouwhuis, and Juola (1995). J Gerontol B Psychol Sci Soc Sci 55:187–190.
Diederich, A., H. Colonius, and A. Schomburg. 2008. Assessing age-related multisensory enhancement with
the time-window-of-integration model. Neuropsychologia 46:2556–2562.
Dywan, J., S. J. Segalowitz, and L. Webster. 1998. Source monitoring: ERP evidence for greater reactivity to
nontarget information in older adults. Brain Cogn 36:390–430.
Ghatan, P. H., J. C. Hsieh, K. M. Petersson, S. Stone-Elander, and M. Ingvar. 1998. Coexistence of attention-
based facilitation and inhibition in the human cortex. Neuroimage 7:23–29.
Good, C. D., I. S. Johnsrude, J. Ashburner, R. N. Henson, K. J. Friston, and R. S. Frackowiak. 2001. A voxel-
based morphometric study of ageing in 465 normal adult human brains. Neuroimage 14:21–36.
Grady, C. L. 2008. Cognitive neuroscience of aging. Ann NY Acad Sci 1124:127–144.
Grady, C. L., M. V. Springer, D. Hongwanishkul, A. R. McIntosh, and G. Winocur. 2006. Age-related changes
in brain activity across the adult lifespan. J Cogn Neurosci 18:227–241.
Greicius, M. D., and V. Menon. 2004. Default-mode activity during a passive sensory task: Uncoupled from
deactivation but impacting activation. J Cogn Neurosci 16:1484–1492.
Groth, K. E., and P. A. Allen. 2000. Visual attention and aging. Front Biosci 5:D284.
Hairston, W. D., P. J. Laurienti, G. Mishra, J. H. Burdette, and M. T. Wallace. 2003. Multisensory enhancement
of localization under conditions of induced myopia. Exp Brain Res 152:404–408.
Hale, S., J. Myerson, G. A. Smith, and L. W. Poon. 1988. Age, variability, and speed: Between-subjects diver-
sity. Psychology and Aging 3:407.
Healey, M. K., K. L. Campbell, and L. Hasher. 2008. Cognitive aging and increased distractibility: Costs and
potential benefits (Chapter 22). Prog Brain Res 169:353–363.
Helfer, K. S. 1998. Auditory and auditory–visual recognition of clear and conversational speech by older adults.
J Am Acad Audiol 9:234.
Hugenschmidt, C. E., J. L. Mozolic, and P. J. Laurienti. 2009a. Suppression of multisensory integration by
modality-specific attention in aging. Neuroreport 20:349–353.
Hugenschmidt, C. E., J. L. Mozolic, H. Tan, R. A. Kraft, and P. J. Laurienti. 2009b. Age-related increase in
cross-sensory noise in resting and steady-state cerebral perfusion. Brain Topography 20:241–251.
Hugenschmidt, C. E., A. M. Peiffer, T. P. McCoy, S. Hayasaka, and P. J. Laurienti. 2009c. Preservation of cross-
modal selective attention in healthy aging. Exp Brain Res 198:273–285.
Hultsch, D. F., S. W. MacDonald, and R. A. Dixon. 2002. Variability in reaction time performance of younger
and older adults. J Gerontol B Psychol Sci Soc Sci 57:101–115.
Johnson, J. A., and R. J. Zatorre. 2006. Neural substrates for dividing and focusing attention between simulta-
neous auditory and visual events. Neuroimage 31:1673–1681.
Kalina, R. E. 1997. Seeing into the future. Vision and aging. West J Med 167:253–257.
Kastner, S., and L. G. Ungerleider. 2000. Mechanisms of visual attention in the human cortex. Annu Rev
Neurosci 23:315–341.
Kawashima, R., B. T. O’Sullivan, and P. E. Roland. 1995. Positron-emission tomography studies of cross-
modality inhibition in selective attentional tasks: Closing the “mind’s eye.” Proc Natl Acad Sci U S A
92:5969–5972.
Kayser, C., C. I. Petkov, M. Augath, and N. K. Logothetis. 2005. Integration of touch and sound in auditory
cortex. Neuron 48:373–384.
Kovács, T. 2004. Mechanisms of olfactory dysfunction in aging and neurodegenerative disorders. Ageing Res
Rev 3:215.
Laurienti, P. J., J. H. Burdette, J. A. Maldjian, and M. T. Wallace. 2006. Enhanced multisensory integration in
older adults. Neurobiol Aging 27:1155–1163.
Laurienti, P. J., R. A. Kraft, J. A. Maldjian, J. H. Burdette, and M. T. Wallace. 2004. Semantic congruence is a
critical factor in multisensory behavioral performance. Exp Brain Res 158:405–414.
Li, C. S., P. Yan, K. L. Bergquist, and R. Sinha. 2007. Greater activation of the “default” brain regions predicts
stop signal errors. Neuroimage 38:640–648.
Liu, X., and D. Yan. 2007. Ageing and hearing loss. J Pathol 211:188–197.
Lustig, C., A. Z. Snyder, M. Bhakta et al. 2003. Functional deactivations: Change with age and dementia of the
Alzheimer type. Proc Natl Acad Sci U S A 100:14504.
Madden, D. J., W. L. Whiting, R. Cabeza, and S. A. Huettel. 2004. Age-related preservation of top-down atten-
tional guidance during visual search. Psychol Aging 19:304.
Martin, A. J., K. J. Friston, J. G. Colebatch, and R. S. Frackowiak. 1991. Decreases in regional cerebral blood
flow with normal aging. J Cereb Blood Flow Metab 11:684–689.
McKiernan, K. A., B. R. D’Angelo, J. N. Kaufman, and J. R. Binder. 2006. Interrupting the “stream of con-
sciousness”: An fMRI investigation. Neuroimage 29:1185–1191.
Meredith, M. A., and B. E. Stein. 1983. Interactions among converging sensory inputs in the superior colliculus.
Science 221:389–391.
Meredith, M. A., and B. E. Stein. 1986. Visual, auditory, and somatosensory convergence on cells in superior
colliculus results in multisensory integration. J Neurophysiol 56:640–662.
Milham, M. P., K. I. Erickson, M. T. Banich et al. 2002. Attentional control in the aging brain: Insights from an
fMRI study of the stroop task. Brain Cogn 49:277.
Miller, J. 1982. Divided attention: Evidence for coactivation with redundant signals. Cogn Psychol
14:247–279.
Miller, J. 1986. Time course of coactivation in bimodal divided attention. Percept Psychophys 40:331–343.
Morse, C. K. 1993. Does variability increase with age? An archival study of cognitive measures. Psychol Aging
8:156–164.
Mozolic, J. L., C. E. Hugenschmidt, A. M. Peiffer, and P. J. Laurienti. 2008a. Modality-specific selective atten-
tion attenuates multisensory integration. Exp Brain Res 184:39–52.
Mozolic, J. L., D. Joyner, C. E. Hugenschmidt et al. 2008b. Cross-modal deactivations during modality-specific
selective attention. BMC Neurol 8:35.
Muir, J. L. 1997. Acetylcholine, aging, and Alzheimer’s disease. Pharmacol Biochem Behav 56:687–696.
Ostroff, J. M., K. L. McDonald, B. A. Schneider, and C. Alain. 2003. Aging and the processing of sound dura-
tion in human auditory cortex. Hear Res 181:1–7.
Peiffer, A. M., J. L. Mozolic, C. E. Hugenschmidt, and P. J. Laurienti. 2007. Age-related multisensory enhance-
ment in a simple audiovisual detection task. Neuroreport 18:1077–1081.
Persson, J., C. Lustig, J. K. Nelson, and P. A. Reuter-Lorenz. 2007. Age differences in deactivation: A link to
cognitive control? J Cogn Neurosci 19:1021–1032.
Poliakoff, E., S. Ashworth, C. Lowe, and C. Spence. 2006. Vision and touch in ageing: Crossmodal selective
attention and visuotactile spatial interactions. Neuropsychologia 44:507–517.
Posner, M. I., and J. Driver. 1992. The neurobiology of selective attention. Curr Opin Neurobiol 2:165–169.
Quiton, R. L., S. R. Roys, J. Zhuo, M. L. Keaser, R. P. Gullapalli, and J. D. Greenspan. 2007. Age-related
changes in nociceptive processing in the human brain. Ann NY Acad Sci 1097:175–178.
Raichle, M. E., A. M. MacLeod, A. Z. Snyder, W. J. Powers, D. A. Gusnard, and G. L. Shulman. 2001. A default
mode of brain function. Proc Natl Acad Sci U S A 98:676–682.
Rapp, P. R., and W. C. Heindel. 1994. Memory systems in normal and pathological aging. Curr Opin Neurol
7:294–298.
Rhodes, M. G. 2004. Age-related differences in performance on the Wisconsin card sorting test: A meta-ana-
lytic review. Psychol Aging 19:482–494.
Rowe, G., S. Valderrama, L. Hasher, and A. Lenartowicz. 2006. Attentional disregulation: A benefit for implicit
memory. Psychol Aging 21:826–830.
Salat, D. H., D. N. Greve, J. L. Pacheco et al. 2009. Regional white matter volume differences in nondemented
aging and Alzheimer’s disease. Neuroimage 44:1247–1258.
Salthouse, T. A. 1988. The complexity of age × complexity functions: Comment on Charness and Campbell
(1988). J Exp Psychol Gen 117:425.
Salthouse, T. A. 2000. Aging and measures of processing speed. Biol Psychol 54:35–54.
Schmolesky, M. T., Y. Wang, M. Pu, and A. G. Leventhal. 2000. Degradation of stimulus selectivity of visual
cortical cells in senescent rhesus monkeys. Nat Neurosci 3:384–390.
Shaffer, S. W., and A. L. Harrison. 2007. Aging of the somatosensory system: A translational perspective. Phys
Ther 87:193–207.
Sommers, M. S., N. Tye-Murray, and B. Spehar. 2005. Auditory–visual speech perception and auditory–visual
enhancement in normal-hearing younger and older adults. Ear Hear 26:263–275.
Spence, C., and J. Driver. 1997. On measuring selective attention to an expected sensory modality. Percept
Psychophys 59:389–403.
Spence, C., M. E. Nicholls, and J. Driver. 2001. The cost of expecting events in the wrong sensory modality.
Percept Psychophys 63:330–336.
Stevens, W. D., L. Hasher, K. S. Chiew, and C. L. Grady. 2008. A neural mechanism underlying memory failure
in older adults. J Neurosci 28:12820–12824.
Stevenson, R. A., and T. W. James. 2009. Audiovisual integration in human superior temporal sulcus: Inverse
effectiveness and the neural processing of speech and object recognition. Neuroimage 44:1210.
Stine, E. A., A. Wingfield, and S. D. Myers. 1990. Age differences in processing information from television
news: The effects of bisensory augmentation. J Gerontol 45:1–8.
Strupp, M., V. Arbusow, C. Borges Pereira, M. Dieterich, and T. Brandt. 1999. Subjective straight-ahead during
neck muscle vibration: Effects of ageing. Neuroreport 10:3191–3194.
Talsma, D., and M. G. Woldorff. 2005. Selective attention and multisensory integration: Multiple phases of
effects on the evoked brain activity. J Cogn Neurosci 17:1098–1114.
Talsma, D., T. J. Doty, and M. Woldorff. 2007. Selective attention and audiovisual integration: Is attending to
both modalities a prerequisite for early integration? Cereb Cortex 17:679–690.
Townsend, J., M. Adamo, and F. Haist. 2006. Changing channels: An fMRI study of aging and cross-modal
attention shifts. Neuroimage.
Tye-Murray, N., M.S. Sommers, and B. Spehar. 2007. Audiovisual integration and lipreading abilities of older
adults with normal and impaired hearing. Ear Hear 28:656–668.
Verhaeghen, P., and L. De Meersman. 1998. Aging and the Stroop effect: A meta-analysis. Psychol Aging
13:120–126.
Verhaeghen, P., and J. Cerella. 2002. Aging, executive control, and attention: A review of meta-analyses.
Neurosci Biobehav Rev 26:849–857.
Wallace, M. T., M. A. Meredith, and B. E. Stein. 1992. Integration of multiple sensory modalities in cat cortex.
Exp Brain Res 91:484–488.
Weissman, D. H., K. C. Roberts, K. M. Visscher, and M. G. Woldorff. 2006. The neural bases of momentary
lapses in attention. Nat Neurosci 9:971–978.
West, R., and C. Alain. 2000. Age-related decline in inhibitory control contributes to the increased Stroop effect
observed in older adults. Psychophysiology 37:179.
Yang, L., and L. Hasher. 2007. The enhanced effects of pictorial distraction in older adults. J Gerontol B
Psychol Sci Soc Sci 62:230–233.
Yordanova, J., V. Kolev, J. Hohnsbein, and M. Falkenstein. 2004. Sensorimotor slowing with ageing is medi-
ated by a functional dysregulation of motor-generation processes: Evidence from high-resolution event-
related potentials. Brain 127:351–362.
Section V
Clinical Manifestations
21 Neurophysiological
Mechanisms Underlying Plastic
Changes and Rehabilitation
following Sensory Loss in
Blindness and Deafness
Ella Striem-Amit, Andreja Bubic, and Amir Amedi
CONTENTS
21.1 Introduction........................................................................................................................... 395
21.2 Rehabilitation following Sensory Loss.................................................................................. 397
21.2.1 Sensory Substitution Devices.................................................................................... 397
21.2.2 Sensory Restoration Approaches...............................................................................400
21.2.3 Functional Visual Rehabilitation...............................................................................402
21.3 Neural and Cognitive Consequences of Sensory Loss..........................................................403
21.3.1 Evidence for Robust Plasticity Promoted by Sensory Loss.......................................404
21.3.2 Principles Guiding Reorganization following Sensory Loss.....................................406
21.3.3 Plasticity following Sensory Loss across the Lifespan.............................................407
21.3.4 Neurophysiologic Mechanisms Underlying Plastic Changes in the Blind................409
21.4 Rehabilitation-Induced Plasticity.......................................................................................... 412
21.4.1 Plasticity after SSD Use and Its Theoretical Implications........................................ 412
21.5 Concluding Remarks and Future Directions......................................................................... 414
References....................................................................................................................................... 415
21.1 INTRODUCTION
We live in a society built around vision. Visual information is used for orienting in our environment, identifying objects in our surroundings, alerting us to important events that require our attention, engaging in social interactions, and many other functions necessary for efficient everyday life. Similarly, audition is used for communication and for guiding our attention to potentially important or even dangerous events (e.g., the sound of an approaching car). Thus, the loss of either of these modalities decreases the quality of life and poses a severe challenge to efficient functioning for tens of millions of individuals worldwide (World Health Organization, Fact Sheet no. 282, May 2009). It also has a significant economic impact on society.
It is therefore not surprising that numerous approaches and potential solutions designed to overcome these difficulties have been put forward to help the sensory-impaired. Compensation devices for the auditorily impaired, for example, highly sensitive hearing aids, volume-enhancing devices, and medical–technological solutions such as cochlear implants, are comparatively successful; compensation and technological aids for the visually impaired, the focus of this chapter, are currently much less effective. At this point, the most commonly used
rehabilitation techniques for blindness are sensory aids such as the Braille reading system, mobil-
ity aids such as canes, or more contemporary devices such as obstacle detectors, laser canes, or
ultrasonic echolocating devices. All of these devices derive from the premise that the blind are
deprived of numerous important types of information typically acquired through vision and attempt
to supply such information through other sensory systems. Typically, these attempts utilize the nor-
mal perceptual processing of the system they exploit for communicating the relevant information.
In contrast, the new generation of sensory aids goes one step further, as it aims to deliver
pure visual information to the brains of the blind, either by surgically or medically restoring the
missing functionality of the eyes and brain areas typically exploited for visual processing (as is
already done in audition to some extent, mainly for successful perception of auditory single speaker
communication, using cochlear implants; Fallon et al. 2008; Spelman 2006; Geers 2006) or by
“teaching” these regions to take over visual functions after introducing them to visual information
transmitted through nonvisual modalities. The first group of such techniques, neuroprosthetic medical solutions, is invasive, requiring surgical intervention, and currently extremely expensive. These approaches have shown promising results, but only to a limited extent and only in very restricted populations of the blind. However, once the technological (and neuroscientific, i.e.,
the ability of the brain to make sense of the restored input; see below) obstacles are resolved, these
may hold great future promise for restoring natural vision to many blind individuals, similar to
the enormous progress in the treatment of deafness that has been made since the development of
cochlear implants. Similarly, novel medical approaches for replacing the damaged sensory receptor
cells via stem cell transplantation (which will be discussed briefly), may be very promising in the more distant future, but are currently at relatively preliminary research stages. The second group of rehabilitation approaches includes sensory substitution devices (SSDs), which are noninvasive, cheap, and relatively accessible. These devices are specifically designed to deliver visual information to the blind through their remaining, fully functioning sensory modalities, in the hope that the brains of such individuals will learn to exploit this information, similar to the
way the sighted use equivalent information transmitted through the visual pathway. Although this
hope may appear counterintuitive or even unrealistic, the most recent SSDs are currently showing
remarkable behavioral outcomes. Such efficiency, combined with their low cost and broad applicability across different types and onset ages of sensory loss, makes them highly attractive sensory aids. This is
especially important in blindness, given that 87% of the blind are located in developing countries
and therefore need cheap and widely applicable solutions (World Health Organization, Fact Sheet
no. 282, May 2009).
In order to capture the “magic” of these rehabilitation approaches and illustrate how surprisingly
efficient they might be if proper training is applied, we will begin this chapter by presenting some
of these exciting new solutions and briefly discuss the rehabilitation outcomes currently associ-
ated with them. To better understand the mechanisms mediating such outcomes and appreciate the
remaining challenges that need to be overcome, in the second part of the chapter we provide a more
theoretical illustration of the neuroplastic changes associated with the use of these devices. In particular, we show that these changes are neither “magic” nor in any way restricted to the use of the presented rehabilitation techniques. On the contrary, these techniques are designed to exploit and
channel the brain’s natural potential for change. This potential is present in all individuals, but may
become somewhat more accentuated in the brains of the sensory-impaired, as the lack of one sen-
sory modality leaves vast cortical regions free of their typical input and triggers a reorganization of
such cortices and their integration into other brain networks. This reorganization is constrained and
channeled by the individual’s own activity, information available from the environment, as well as
intrinsic properties of the neural system promoting or limiting such changes during different periods
in life. Importantly, such restructuring is crucial for enabling the cognitive changes that also occur
after sensory loss, allowing the sensory-impaired individuals to efficiently function in their environ-
ment. Specifically, successfully dealing with sensory impairment often results in collateral benefits,
which include better differentiation and higher efficiency of nonvisual sensory or other cognitive
functions. Many of the neural and cognitive changes triggered by sensory loss will be reviewed in
the second part of the chapter, illustrating how they rely on the same mechanisms as those underly-
ing the successful outcomes of novel rehabilitation techniques, which will now be presented.
(Kay and Kay 1983) typically scan the environment acoustically (ultrasonically) or optically (laser
light), and transmit spatial information on obstacles and objects in the surroundings via vibrotactile
or auditory signals.
In contrast to devices that are typically designed for a limited purpose and are successful in
substituting for only certain functional aspects of vision, more sophisticated techniques that replace
vision through tactile or auditory information have been developed over the past few decades (see
Figure 21.1a). The first targeted modality for substituting vision was touch, because of the simplicity
and ease of transforming visual into tactile signals that are both characterized by two-dimensional
(2-D) spatial representations (retina in vision and skin surface in touch). Pioneering work in this
field was done in the 1970s by Paul Bach-y-Rita, who devised a tactile display that mapped images
from a video camera to a vibrotactile device worn on the subject’s back. This device (Bach-y-Rita
2004; Bach-y-Rita et al. 1969; Bach-y-Rita and Kercel 2003), dubbed the Tactile Vision Substitution
System, provided tactile transformation of black-and-white images at a resolution of 20 × 20 pixels
and enabled the blind to perform sufficiently well in some visual tasks. However, it was extremely
large and immobile, which motivated the development of smaller, mobile tactile devices placed on
the tongue and forehead (for a review, see Bach-y-Rita 2004) that are also characterized by better
spatial somatosensory resolution. One of these, the Tongue Display Unit (TDU) (Bach-y-Rita et
al. 1968, 1998), an electrotactile device composed of a 12 × 12 matrix of stimulators (measuring
approximately 3 cm2) placed on the subject’s tongue, provides blind individuals with an initial
“visual” acuity (tested by the Snellen E chart) comparable to 20/860 (Sampaio et al. 2001; the
numerator refers to the distance in feet from which a person can reliably distinguish a pair of
objects, whereas the denominator is the distance from which a person with standard visual acuity
would be able to distinguish them; in North America and most of Europe, legal blindness is defined
as visual acuity of 20/200 or poorer), which might improve after training. Other studies investigat-
ing this device suggest that at least a subgroup of early-onset blind individuals may particularly
benefit from its use (Chebat et al. 2007).
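The Snellen fraction cited above can be converted to the decimal and logMAR values often used to compare acuities. A minimal sketch (the function and variable names are ours, for illustration only):

```python
import math

def snellen_to_acuity(numerator, denominator):
    """Convert a Snellen fraction (e.g., 20/860) to decimal acuity and
    logMAR. Decimal acuity = numerator / denominator; logMAR is the
    log10 of its reciprocal (20/20 vision gives logMAR 0.0)."""
    decimal = numerator / denominator
    logmar = math.log10(1.0 / decimal)
    return decimal, logmar

tdu_decimal, tdu_logmar = snellen_to_acuity(20, 860)   # TDU-aided acuity
legal_decimal, _ = snellen_to_acuity(20, 200)          # legal-blindness cutoff
# 20/860 is roughly 0.023 in decimal notation, far below the 0.1 cutoff.
```

On this scale, the TDU-mediated acuity of 20/860 sits more than a factor of four below the 20/200 legal-blindness threshold, which puts the reported training gains in perspective.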
Audition was the second candidate to substitute for vision. The development of auditory-based
devices was triggered by certain limitations of tactile SSDs: their price, the inherent limits of tactile spatial resolution, and the relatively low information content imposed by a cap on the number of electrodes. The first auditory SSD was The vOICe system (Meijer
1992), which currently uses a default resolution of 176 × 64 sampling points. This mobile and inex-
pensive device uses a video camera, which provides the visual input, a small computer running
the conversion program, and stereo headphones that provide the resulting sound patterns to the
user. Given the fact that 87% of the world’s visually impaired live in developing countries (WHO
report 2009, Fact Sheet no. 282), the importance of providing solutions that are not just high-resolution but also cheap and accessible cannot be overstated. To some extent, visual-to-auditory SSDs
fulfill all of these criteria. However, these devices still pose great challenges both to the developers
and the brains of blind individuals using them, as they rely on conversion algorithms that are much
less intuitive than those employed by visual-to-tactile SSDs. For example, in the visual-to-auditory
The vOICe SSD (Meijer 1992), the conversion program transforms visual into auditory information
(‘soundscapes’) based on three simple rules: the vertical axis (i.e., elevation of the object) is repre-
sented by frequency, the horizontal axis by time and stereo panning, and the brightness of the image
is encoded by loudness. Although these conversion rules appear relatively simple, explicit and quite
extensive training is required to learn how to interpret even simple shapes. Similar but not identi-
cal transformations are implemented in two more recently developed auditory SSDs: the Prosthesis
Substituting Vision with Audition (PSVA; Capelle et al. 1998) and SmartSight (Cronly-Dillon et al.
1999, 2000). PSVA uses different tones to provide horizontal location directly, whereas SmartSight
presents the vertical location information in terms of musical notes. PSVA can break down the
“visual sound” into components of vertically and horizontally oriented edges. Additionally, PSVA
applies a magnification to the center of the image to simulate the better resolution (magnification
factor) of the human fovea.
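The three conversion rules above can be sketched in code. The following is a simplified, mono illustration of a vOICe-style sweep; all names and parameter values are ours, not from Meijer's implementation, and the stereo panning that encodes the horizontal axis in the real device is omitted:

```python
import math

def image_to_soundscape(image, duration_s=1.0, sample_rate=8000,
                        f_min=500.0, f_max=5000.0):
    """Toy vOICe-style conversion of a grayscale image (list of rows,
    brightness in [0, 1], row 0 = top) into mono audio samples.

    The three rules from the text:
      * vertical position   -> tone frequency (higher rows = higher pitch)
      * horizontal position -> time (left-to-right column sweep)
      * brightness          -> loudness (sinusoid amplitude)
    """
    n_rows, n_cols = len(image), len(image[0])
    samples_per_col = int(duration_s * sample_rate / n_cols)
    # One frequency per row, log-spaced so equal steps sound equal.
    span = max(n_rows - 1, 1)
    freqs = [f_min * (f_max / f_min) ** (1 - r / span) for r in range(n_rows)]
    out = []
    for col in range(n_cols):                    # horizontal axis -> time
        for _ in range(samples_per_col):
            t = len(out) / sample_rate
            s = sum(image[r][col] * math.sin(2 * math.pi * freqs[r] * t)
                    for r in range(n_rows))
            out.append(s / n_rows)               # keep samples in [-1, 1]
    return out
```

A 2 × 2 test image then yields a sweep whose first half carries the top-left pixel's high tone and whose second half carries the bottom-right pixel's low tone, which conveys why even simple shapes demand explicit training to interpret.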
[Figure 21.1 image: panels (a) visual objects; (b) tactile and SSD objects (n = 7; p = 0.005, p = 0.05 corrected; IPS, LOtv); (c) haptic objects (LOtv).]
FIGURE 21.1 Sensory substitution devices: general concept of sensory substitution (SSD) and use of SSDs
in studying brain plasticity, perception, and multisensory integration. (a) SSDs typically include a visual cap-
turing device (e.g., camera glasses), a computational device transforming visual input into either a tactile or
auditory display using a simple, known transformation algorithm, and an output device transmitting this information to the user. Right: example of an auditory SSD (e.g., The vOICe; Meijer 1992) transmitting sensory-transformed information using headphones. Left: example of a tactile device that can transmit tactile information via an electrode array targeting the tongue (e.g., TDU; Bach-y-Rita et al. 1998) or another skin surface,
in this case worn on the neck. (With kind permission from Springer Science+Business Media: Multisensory
Object Perception in the Primate Brain, part 4, 2010, 351–380, Bubic, A. et al., figure number 18.2.) (b) A conjunction analysis for shape perception across modalities and experimental conditions in a group of seven expert users of The vOICe SSD (five sighted, one late blind, and one congenitally blind), testing for common areas of activation between object recognition using soundscapes (i.e., using The vOICe SSD to extract shape information) and by touch, but not by the typical sounds objects make (which convey no shape information) or by corresponding sensory controls. The contrast (random-effects GLM, corrected for multiple comparisons) showed bilateral LO activation with weaker responses in the right hemisphere, indicating that the lateral occipital complex (LOC) is a multimodal operator for shape. (Modified and adapted from
Amedi, A. et al., Nat Neurosci, 10, 687–689, 2007.) (c) Object-related regions in visual and haptic modalities
shown on an inflated right hemisphere (top: lateral view; bottom: ventral view). Visual object selectivity is
relative to scrambled visual images; haptic object selectivity is relative to haptic textures. Visuo-haptic object
selectivity in LOC is found within lateral occipito-temporal sulcus (delineating LOtv), similar to location of
multimodal object related area shown in panel b. (Modified and adapted from Amedi, A. et al., Nat Neurosci,
4, 324–330, 2001; and Lacey, S. et al., Brain Topogr, 21, 269–274, 2009.)
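The conjunction analysis described in the caption can be illustrated with a toy minimum-statistic sketch: a voxel counts as commonly activated only if it passes threshold in every contrast. This is our simplification for illustration only; the actual study used a random-effects GLM with correction for multiple comparisons:

```python
def conjunction_map(stat_maps, threshold):
    """Minimum-statistic conjunction over several statistic maps
    (lists of per-voxel values): a voxel survives only if its value
    exceeds `threshold` in EVERY map, i.e., if the voxelwise minimum
    across maps exceeds `threshold`."""
    n_vox = len(stat_maps[0])
    minima = [min(m[v] for m in stat_maps) for v in range(n_vox)]
    return [t > threshold for t in minima]

# Toy t-values for two contrasts (e.g., soundscape objects vs. control
# and haptic objects vs. control) over five voxels:
sound = [4.1, 0.5, 3.2, 2.9, 0.1]
touch = [3.8, 4.0, 0.7, 3.1, 0.2]
shared = conjunction_map([sound, touch], threshold=2.5)
# Only voxels active in BOTH contrasts survive: [True, False, False, True, False]
```

Voxels strongly active in only one contrast (here, voxels 1 and 2) are excluded, which is what licenses the caption's claim that the surviving LOC cluster is multimodal rather than driven by a single modality.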
Although extremely different, both auditory and tactile SSDs can potentially be very useful for
the blind. Recent tests show that blind and/or blindfolded sighted individuals can, especially after
training or prolonged use of the device (Poirier et al. 2006b), learn to interpret the transmitted infor-
mation and use it in simple visual discrimination and recognition (Arno et al. 1999, 2001; Sampaio
et al. 2001; Poirier et al. 2006b) as well as more complex tasks in which acquiring the knowledge of
spatial locations of objects (Auvray et al. 2007; Proulx et al. 2008) or constructing mental images
of more complex environments (Cronly-Dillon et al. 2000) is required. More anecdotal reports that
have not yet been explored in formal research suggest that, following extended use of such devices, behavioral abilities using SSDs may be even more promising, as they can be used to identify facial
expressions and read simple words (Amedi, Striem-Amit, and Reich, unpublished observation; see,
e.g., http://brain.huji.ac.il/press.asp) as well as to orient and navigate in everyday life (see, e.g.,
the reports of a late-onset blind individual of her experiences with the vOICe SSD in http://www
.seeingwithsound.com/users.htm).
Although the sensory-transformed information may partly occupy an available sensory chan-
nel or at least add to its attentional load (e.g., provide vOICe information in addition to naturally
occurring environmental sounds), after training such percepts should not significantly interfere with
normal sensory perception. However, it needs to be emphasized that training is crucial in obtain-
ing optimal results in this regard, as initial usage of SSDs may be confusing and overwhelming for
the sensory impaired. Because of the multisensory nature of perception, the human brain can be
expected to successfully process these percepts in a parallel manner, similarly to processing several
types of visual parameters, allocating attention to the most relevant visual feature at the time, and
similarly to perceiving an auditory conversation above other environmental noises. Naturally, however, if an individual uses the SSD at very high volume or if the environmental sounds are near perceptual threshold, there might be a significant cost of SSD use to perception in the intact sensory channel.
Future studies on dividing attention between SSD input and the natural sensory input are needed to
fully assess the possible interference in such cases.
Overall, although there is still a great deal of work to be done in this area, initial experi-
ences with SSDs show more than promising results. These devices truly offer new hope for the
sensory-impaired in a somewhat nonintuitive, but “brain-friendly” manner, as they use normal
neural resources and functioning modalities for transmitting previously unavailable informa-
tion. Furthermore, in order to fully appreciate the value of sensory substitution, it might be use-
ful to imagine how exciting it would be to have infrared vision or hear ultrasound frequencies.
Interestingly, future, second-generation SSDs might just make these “superhuman” abilities pos-
sible: just as visual information can be transmitted and used by the blind through their functioning
auditory or tactile modality, so could infrared or ultrasound frequencies be perceived by anyone
using functioning vision or audition. For the blind, the efficient use of vision or visual information
transferred via such SSDs represents exactly this type of ability or an even greater accomplish-
ment, as they need to function in an environment optimally designed for the majority of the popu-
lation, that is, the sighted.
and Humayun 2008), or identify very simple patterns, shapes and even letters (Brelen et al. 2005;
Dobelle 2000; Weiland and Humayun 2008). However, there are still several major issues currently
preventing these electrical and biological approaches from becoming true clinical solutions. First
of all, their invasive nature makes them prone to surgical risks such as inflammation, hemorrhage, increased patient mortality, and, in the case of cortical prostheses, focal seizures induced by direct cortical stimulation, as well as to risks of immune rejection of the implanted cells in the case of cell transplantation solutions.
Moreover, retinal prostheses (and retinal molecular approaches such as cell transplantation ther-
apy, detailed above), which currently appear more promising as future solutions for blindness, are
not applicable to all populations of the blind, as they require the existence of residual functional
retinal ganglion cells. Additionally, these techniques are expensive, making them unavailable to the
majority of the blind, who reside in developing countries. In addition to these drawbacks, visual
prostheses have severe technical limitations including relatively low resolution, narrow field of view,
and the need for complicated image processing algorithms compensating for the visual processing
taking place in the retina itself. Functionally, these devices typically do not take advantage of eye movements (an exception is the system developed by Palanker et al. 2005) and require large, slow head movements to scan entire visual patterns (Brelen et al. 2005; Veraart et al. 2003; Chen et al. 2007). Therefore, visual prostheses (which are not yet available except in
preliminary clinical trials) do not yet provide sight that resembles natural vision, and a key milestone in this field, namely generating truly useful and functional vision at affordable cost, has yet to be reached. Finally, like cochlear implants (or even more so), visual prostheses require
extensive training in order to achieve reasonable performance even for very simple stimuli. This
will be discussed in the next section. If, however, visual prosthesis research, and even more so bio-
logical methods replacing the actual retinal cells, can overcome these obstacles, these approaches
could provide a real visual experience and not just the “visual” information or orientation provided
by SSDs.
surgical procedure is coupled with specific additional rehabilitation strategies that modulate brain
processing, enabling it to extract relevant and functionally meaningful information from neuropros-
thetic inputs that should gradually lead to restoration or development of visual functions. Thus, in
contrast to the encouraging behavioral outcomes of some cochlear implant patients, it is unrealistic
to expect that such successful sensory restoration can easily generalize to different subpopulations
of sensory impaired, such as the visually impaired. More research and development of behavioral
rehabilitation may be needed to achieve functional sensory ability in those who once suffered from
sensory loss. To fulfill this goal, we will have to overcome more than just surgical or technical chal-
lenges that will enable safer medical procedures or more advanced sensory substitution algorithms.
Although necessary, such advancements will have to be complemented by knowledge pertaining to
brain mechanisms and cognitive functions we want to change or develop using the available reha-
bilitation techniques. Thus, achieving full sensory restoration will only be possible if we take into
account the specificities of cognitive and neural functioning of the sensory impaired, a topic that
will be presented in the next part of the chapter.
of the blind and deaf do not degenerate. Rather, they undergo extensive plasticity resulting in sig-
nificantly changed neural responsiveness as well as functional involvement in nonvisual/nonauditory
cognitive functions. Significant, although typically less extensive, plastic changes may also occur in
populations suffering from noncongenital sensory loss. This neuroplasticity is evident both in atypi-
cal brain activation in the blind when compared with that of the sighted, as well as in behavioral
manifestations, for example, sensory hyperacuity and specific cognitive skills.
Consequently, the blind have problems recognizing potentially useful information needed to per-
form the mentioned tasks and lack the benefits that could arise from simultaneously available vision.
For example, concurrent visual input could facilitate recognition and learning of helpful auditory
or somatosensory features given that the existence of redundant or overlapping information from
more than one modality is generally associated with guiding attention and enhanced learning of
amodal stimulus features (Lickliter and Bahrick 2004). Nevertheless, such recognition of useful
cues or calibration of auditory and tactile space is eventually possible even in the absence of vision,
as it may be achieved using different cues, for example, those stemming from self-motion (Ashmead
et al. 1989, 1998). Importantly, although it may require relatively long training to reach a stage in
which the missing sensory input is replaced and compensated for by equivalent information from
other modalities, spatial representations that are finally generated on the basis of haptic and auditory
input of the blind seem to be equivalent to the visually based ones in the sighted (Röder and Rösler
1998; Vanlierde and Wanet-Defalque 2004). Overall, the findings indicate that the blind, once they
learn to deal with the available sensory modalities, can show comparable or superior performance
in many tasks when compared to the sighted. This advantage can even be compromised by the pres-
ence of visual information, as indicated by inferior performance of the partially blind (Lessard et
al. 1998). Thus, the available evidence tends to counter the notion that sensory loss leads to general
maladjustment and dysfunction in functions outside the missing modality. Quite the contrary, this
general-loss hypothesis should be abandoned in favor of the alternative, compensatory hypothesis
suggesting that sensory loss leads to the superior development of the remaining senses (Pascual-
Leone et al. 2005).
In the past decades, neural correlates of reported impairment-induced changes in cognitive func-
tions and strategies have been thoroughly studied, providing a wealth of information regarding the
brain’s abilities to change. Studies investigating neural processing of congenitally blind and deaf
individuals, as well as more invasive animal models of these conditions, show that the brain is
capable of robust plasticity reflected in profoundly modified functioning of entire brain networks.
Important evidence pertaining to the altered cognitive processing and the functional status of the
occipital cortex in the blind stems from electrophysiological studies that investigated nonvisual
sensory functions of the blind. These yielded shorter latencies of event-related potentials (ERPs) in auditory and somatosensory tasks in the blind compared with the sighted, suggesting more efficient processing in these tasks in this population (Niemeyer and Starlinger 1981;
Röder et al. 2000). Furthermore, different topographies of the elicited ERP components in the
sighted and the blind provided first indications of reorganized processing in the blind, such as to
include the engagement of their occipital cortex in nonvisual tasks (Kujala et al. 1992; Leclerc et al.
2000; Rösler et al. 1993; Uhl et al. 1991). Functional neuroimaging studies have corroborated and
extended these findings by showing functional engagement of the occipital lobe (visual cortex) of
congenitally blind individuals in perception in other modalities (i.e., audition and touch; Gougoux
et al. 2005; Kujala et al. 2005; Sathian 2005; Stilla et al. 2008; for a recent review of these findings,
see Noppeney 2007), tactile Braille reading (Büchel et al. 1998; Burton et al. 2002; Gizewski et al.
2003; Sadato et al. 1996, 1998), verbal processing (Burton et al. 2002, 2003; Ofan and Zohary 2006;
Röder et al. 2002), and memory tasks (Amedi et al. 2003; Raz et al. 2005). Importantly, the reported
activations reflect functionally relevant contributions to these tasks, as indicated by studies in which
processing within the occipital cortex was transiently disrupted using transcranial magnetic stimu-
lation (TMS) during auditory (Collignon et al. 2007), tactile processing including Braille reading
(Cohen et al. 1997; Merabet et al. 2004) as well as linguistic functions (Amedi et al. 2004). Akin
to the findings in the blind, it has been shown that the auditory cortex of the congenitally deaf is
activated by visual stimuli (Finney et al. 2001), particularly varieties of visual movement (Campbell
and MacSweeney 2004).
It is important to realize that involvement of unisensory brain regions in cross-modal perception
is not only limited to individuals with sensory impairments, but can under certain circumstances
also be identified in the majority of the population (Sathian et al. 1997; Zangaladze et al. 1999;
Amedi et al. 2001, 2005b; Merabet et al. 2004; Sathian 2005), consistent with reports in experi-
mental animals of nonvisual inputs into visual cortex and nonauditory inputs into auditory cortex
(Falchier et al. 2002; Rockland and Ojima 2003; Schroeder et al. 2003; Lakatos et al. 2007). In the
blind and deaf this involvement is much stronger, because sensory areas deprived of their customary sensory input become functionally reintegrated into different circuits, leading to profound changes in the affected modality and the system as a whole.
properties of its “host” and does not hold on to its genetic predisposition (Schlaggar and O’Leary
1991). This implies that the cross-modal plasticity observed in the blind is most probably subserved
by altered connectivity patterns, as will be further discussed in the next section.
Supramodal plasticity refers to changes encompassing areas and brain functions that are typically considered nonsensory. Evidence for such plasticity has been revealed in studies showing
involvement of the occipital cortex in memory or language (verb generation or semantic judgments)
processing in the blind (Amedi et al. 2003, 2004; Burton et al. 2002, 2003; Ofan and Zohary 2006;
Raz et al. 2005; Röder et al. 2000, 2002). This type of plasticity is comparable to cross-modal
plasticity and is enabled by altered connectivity patterns between the visual cortex and other supra-
modal brain regions.
When describing and systematizing different types of plastic changes, we want to once again
emphasize that these are not mutually independent. They often occur in synchrony and it may
occasionally be difficult to categorize a certain type of change as belonging to one of the suggested
types. Furthermore, all types of large-scale plasticity depend on or reflect anatomical and functional
changes in neural networks and may therefore rely on similar neurophysiological mechanisms.
Before describing these mechanisms in more detail and illustrating how they could underlie differ-
ent types of plastic changes, we will present another important element that needs to be considered
with respect to compensating for sensory impairments. Specifically, we will now focus on the fact
that all of the mentioned changes in neural networks show large variability between individuals,
resulting in corresponding variability in compensatory cognitive and behavioral skills. It is impor-
tant to consider some of the main sources of this variability, not just so that we can better understand
the reorganization following sensory loss in different populations of the blind, but also because this
variability has important implications for the potential for successful rehabilitation.
environmental inputs or injuries (Wiesel and Hubel 1963). Thus, injuries affecting different stages
of development, even when they occur at roughly similar ages, may trigger distinct patterns of
compensatory neuroplastic changes and lead to different levels of recovery. Specifically, early stud-
ies of recovery after visual loss (Wiesel and Hubel 1963, 1965) suggested that vision is particularly
sensitive to receiving natural input during early development, and that visual deprivation even for
short durations, but at an early developmental stage, may irreversibly damage the ability for normal
visual perception at older ages. Conversely, evidence of sparse visual recovery after early-onset
blindness (Gregory and Wallace 1963; Fine et al. 2003) demonstrates that this may not necessarily
apply in all cases, and some (although not all) visual abilities may be regained later in life.
The potential for neuroplasticity after puberty is considered to be either much lower than in
childhood or even impossible, except in cases of pathological states and neural overstimulation
(Shaw and McEachern 2000). However, recovery following different types of pathological states
occurring in adulthood (Brown 2006; Chen et al. 2002), changes in neuronal counts and compensa-
tory increases in the number of synapses in aging (Kolb 1995), and the profound changes following
short periods of blindfolding (Pitskel et al. 2007; Pascual-Leone et al. 2005; Amedi et al. 2006)
suggest otherwise. In reconciling these seemingly contradictory conclusions, it is useful to take into
account the multifaceted nature of plasticity that includes different forms of changes occurring at
different timescales and on different levels of neural functioning. For example, synaptic changes
occurring in aging develop over an extended period and in synergy with altered experiences and
needs characteristic of later periods in life. The robust, short-term plasticity occurring after blind-
folding may arise from the recruitment of already existing, but commonly unused, inhibited, or
masked pathways that become available once the source or reason for such masking (e.g., avail-
ability of visual input in those who have been blindfolded) is removed. Therefore, some forms of
adult plasticity do not reflect “plasticity de novo,” which is characterized by the creation of new
connectivity patterns (Burton 2003). In contrast, in pathological states, injuries, or late sensory
loss, both of these types of changes can occur. Rapid changes reflecting the unmasking of existing
connections occurring in the first phase promote and enable subsequent slow, but more permanent
structural changes (Amedi et al. 2005a; Pascual-Leone et al. 2005). This suggests that potentially
similar functional outcomes may be mediated by different neural mechanisms whose availability
depends on the developmental stage in which they occur.
All of these general principles and differences in neuroplasticity across the lifespan can be applied
to the more specific case of plasticity following sensory loss. Given that the most extensive plasticity
is seen in the congenitally or early-onset blind, it has been suggested that processing advantages
and large-scale cortical reorganization might be limited to the congenitally and early blind, with
the performance of the late blind more closely resembling that of the sighted (Fine 2008). Similarly,
Cohen et
al. (1999) suggested that the critical period of susceptibility for significant cross-modal plasticity
would end at puberty. However, findings showing a high degree of modifiability of cortical maps
even in adulthood (Kaas 1991) as well as those indicating significant reorganization in the occipi-
tal cortex of the late blind (Büchel et al. 1998; Burton et al. 2002; Voss et al. 2004) argue against
this restriction. They are in line with the previous suggestion that significant potential for plastic
changes exists throughout the lifespan, but may differ in the extent and the underlying neurophysi-
ological mechanisms available in different periods of development.
Specifically, the experience of vision, especially if still available after puberty, shapes both cogni-
tion and the brain, and this influence persists even after vision is lost. Although the late blind need
to reorganize information processing in order to compensate for the lack of visual input, they can
draw on previously learned visual strategies, such as visual imagery, which remains available after
visual loss (Büchel et al. 1998), to a much greater extent than the early blind. They also benefit greatly from
fully developed multisensory systems, which may explain differences in multisensory plasticity
encountered across the populations of the congenitally, early, and late blind. Equivalent benefits
and cross-modal connections encountered in the late blind cannot be expected to occur in those
who lack the experience of concurrent, often redundant or complementary input from different
Neurophysiological Mechanisms Underlying Plastic Changes 409
sensory modalities. Although the potential for multisensory integration can primarily be seen as a
phenomenon that develops through integration of unisensory inputs (Wallace 2004b), it is important
to emphasize that this does not imply a serial process in which fully developed individual modalities
somehow merge in order to produce multisensory percepts. On the contrary, although some level of
development of unisensory processing may be needed for the emergence of multisensory neurons,
unisensory and multisensory perception start developing in a highly interdependent manner soon
after this initial phase. Furthermore, although multisensory percepts may develop as a consequence
of concurrent and correlated inputs from different modalities, they in turn also influence or channel
the development and differentiation of single modalities (Lickliter and Bahrick 2004). Specifically,
recent findings (Putzar et al. 2007) indicate that humans deprived of patterned visual input during the
first months of life, who later had their patterned vision restored, show reduced audiovisual interac-
tions. This indicates that adequate multisensory input during early development is indeed necessary
for the full development of cross-modal interactions. Similar evidence of abnormal cross-modal
integration has been reported in cochlear implant patients (Schorr et al. 2005).
Overall, findings indicate substantial differences in all types of plasticity across congenitally,
early, and late blind individuals. These between-group differences are not necessarily the same
across all types of plastic changes and brain areas (networks) affected by them, because they depend
to a great extent on the interaction between the onset of blindness and the exact stage of devel-
opment at the time of blindness, which may differ in different brain systems. For example, it is
plausible to assume that the ventral and dorsal pathways within the visual systems would be dif-
ferently influenced by loss of vision at different developmental stages. Thus, systems dedicated to
dynamically shifting relations between locations, objects, and events (including the dorsal visual
pathway) may develop earlier and therefore be prone to a different pattern of developmental defi-
cits (Neville and Bavelier 2000), comparable to specific findings showing that motion perception
develops earlier than object perception (Fine et al. 2003). Finally, although some of the described,
more “extreme” examples of plasticity may take years to develop, several studies suggest that with-
holding visual information for short periods, even a week, may have dramatic results: subjects who
were blindfolded for only a week showed posterior occipital lobe activation during Braille reading
(Amedi et al. 2006), and during tactile discrimination tasks (Merabet et al. 2008b). This activation
was reduced when the blindfold was removed. Hence, not all cross-modal changes require long-
term sensory deprivation, or slowly developing altered connectivity patterns; some may result from
the previously mentioned unmasking of existing connectivity between the visual and other cortices,
which are dormant (or actively inhibited) in normal conditions. It is likely that at least some of the
plastic changes require extended periods of sensory deprivation, possibly occurring in the critical
or sensitive periods in development. Such dependence may have important implications concern-
ing the ability to restore sight and regain functional vision, as well as for understanding the neural
mechanisms explaining the plastic changes evident both in early-onset as well as late-onset blind.
of plasticity (i.e., cross-modal or multisensory plasticity), the plasticity of different areas, or, as
previously described, plastic changes that occur at different onsets of vision loss. Evidence has been
provided in support of each such model, suggesting that the individual models may capture different
phenomena of relevance and that their combination may offer the full specification of the changes
encountered after sensory loss. We will now present models that emphasize subcortical and
cortical connectivity changes to different extents, and briefly review theories that aim to explain
the general trends in long-range plasticity changes triggered by sensory loss.
Subcortical models of connectivity are based mostly on findings in animal models of plasticity
following sensory loss. Such studies in mammals, similar to the studies of rewiring sensory input
(Sharma et al. 2000; von Melchner et al. 2000), suggest that the visual cortex may receive sensory
input from subcortical sensory stations, which may cause a reorganization of the visual cortex,
enabling it to process stimuli from other modalities. Specifically, several studies have shown that
congenital blindness (caused by early enucleation) causes a rewiring of tactile and auditory inputs
from the thalamic and other brainstem stations in the sensory pathways to the visual cortex (Chabot
et al. 2007; Izraeli et al. 2002; Karlen et al. 2006; Laemle et al. 2006; Piche et al. 2007). This rewir-
ing is evident both in the neural connectivity (indicated by the use of anterograde and retrograde
tracers) and in the functional properties of the “visual” cortex (examined by electrophysiologi-
cal recordings), which now starts to exhibit auditory or tactile responses. This type of model may
explain the engagement of the visual cortex of blind humans in “low-level” sensory tasks seen in
many studies (Kujala et al. 2005; Gougoux et al. 2005), which constitutes cross-modal, or intermodal,
plasticity. However, despite the evidence for spontaneous occurrence of such connectivity in
mammals, no definite support for such a model has been established in humans as of yet.
Corticocortical models of connectivity are currently better grounded to account for the large-
scale plasticity observed in the blind. Although it was previously assumed that there are no direct
connections between sensory modalities, recent anatomical studies in primates indicate the exis-
tence of projections from the auditory to the visual cortex and multisensory feedback connections to
primary visual areas (Falchier et al. 2002; Rockland and Ojima 2003). Supporting this connectivity
in humans, increased functional connectivity between the primary somatosensory cortex and pri-
mary visual cortex was found in the early-onset blind (Wittenberg et al. 2004). Although direct
connectivity from auditory or somatosensory cortex to primary visual cortex may explain some of
the perceptual properties of the reorganized visual cortex, it may not account for all of them. In
addition, such a model may not explain the “high
cognitive” component of the compensatory plasticity, as reflected in, for example, the involvement
of the visual cortex in verbal memory and language. In order to account for these findings, models
of corticocortical connectivity have to be further refined. Specifically, these cannot remain limited
only to illustrating the presence or lack of white matter fibers between different regions, but need
to address the dynamics of information transfer. In one such model, the so-called inverted hierar-
chy model (Amedi et al. 2003; Büchel 2003), feedback connectivity is considered to play a crucial
role in cross-modal (and supramodal) plasticity. Specifically, connections stemming from temporal,
parietal, and frontal lobes may, in the absence of visual input and visual pathway connectivity com-
petition, be responsible for providing nonvisual input to the occipital lobe, enabling its engagement
in nonvisual processing. This is particularly true for areas involved in multisensory processing even
in the sighted, such as regions within the lateral occipital complex (LOC) that are naturally active
both during tactile and visual object recognition (Amedi et al. 2001, 2002). Such areas retain some
of their original sensory input following the loss of one modality and may consequently preserve
their original functions (i.e., tactile shape recognition, including Braille reading), corresponding to
multimodal or multisensory plasticity. The feedback connectivity from these regions to earlier sta-
tions in the visual pathways, such as the primary visual cortex, may further expand this network.
Since these stations are now even further away from direct sensory input (a similar distance from
the sensory receptors as the frontal cortex, as measured by the number of synapses), the model
posits they may now begin to engage in even higher cognitive functions, similar to the frontal
cortex. In support of this hypothesis, it was demonstrated that the functional connectivity of the
visual cortex with frontal language regions is increased in the blind (Liu et al. 2007). Such changes
in connectivity could account for the altered pattern of inputs reaching the occipital cortex, which
may in the end determine the morphological and physiological features of this area and enable its
functional reassignment to nonsensory tasks. It is still too early to speculate about all implications
of the inverted hierarchy approach, particularly in relation to those areas that might be at the top
of the postulated hierarchy. On a less speculative front, recent studies have provided evidence
supporting some claims of the hypothesis suggesting increased feedback corticocortical informa-
tion transfer following sensory loss. For example, it has been shown that the area involved in (visual
and auditory) motion processing in the sighted is involved in auditory (Poirier et al. 2006a) as well
as tactile (Ricciardi et al. 2007) motion processing in the blind. Similar conclusions can be drawn
from findings showing the engagement of the ventral visual pathway typically involved in process-
ing information related to the identification of objects and faces (Ungerleider and Mishkin 1982)
in auditorily mediated object recognition, but only if detailed shape information is provided and
efficiently extracted (Poirier et al. 2006c; Amedi et al. 2007). All of these results are congruent
with the more general notion that cross-modal plasticity occurs in situations where the information
originally processed within a certain area is similar, regardless of the input being rerouted into it
(Grafman 2000). This implies that each cortical area may operate in a metamodal fashion (Pascual-
Leone and Hamilton 2001), being specialized in a particular type of computation rather than being
tied to a specific input modality. However, this type of broad generalization should be treated with caution, as it
is still not clear how such metamodal computations would develop, especially in the case of signifi-
cantly altered inputs during development such as in the case of congenital blindness. On one hand,
the metamodal theory suggests that, in blindness, visual deafferentation may lead to a strengthening
of the corresponding input signal from other modalities to the “visual” areas, which will maintain
the original cortical operation. This hypothesis predicts that the classical hierarchy (i.e., low-level
basic feature analysis in early visual areas, high level object recognition in LOC) is maintained in
the blind, who now utilize the tactile (and auditory) modalities. By contrast, the inverted hierarchy
theory suggests that, because of the dysfunctional main bottom-up geniculostriatal pathway in the
blind, the retinotopic areas (especially V1) will be much farther (in terms of the number of synapses)
from the remaining functional sense organs (in the tactile or auditory modalities). This, in turn,
would lead to V1 resembling more the prefrontal cortex (which is similarly remote from any direct
sensory input), rather than becoming a primary sensory area in the blind.
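The inverted hierarchy argument turns on counting synapses along the shortest remaining route into V1. This can be made concrete with a toy breadth-first search over a connectivity graph; every area name and link below is an illustrative placeholder chosen for the sketch, not an anatomical claim.

```python
from collections import deque

def synaptic_distance(edges, source, target):
    """Fewest synapses (directed edges) from source to target, or None."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nxt in edges.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None

# Invented connectivity: a short geniculostriate route into V1 plus a
# long corticocortical feedback route via multisensory cortex.
sighted = {
    "retina": ["LGN"], "LGN": ["V1"],
    "cochlea": ["MGN"], "MGN": ["A1"],
    "A1": ["STS"], "STS": ["parietal"], "parietal": ["V1"],
}
# Congenital blindness: no retinal input, so the shortest route into V1
# now starts at the remaining sense organs.
blind = {area: targets for area, targets in sighted.items() if area != "retina"}

print(synaptic_distance(sighted, "retina", "V1"))  # 2 synapses
print(synaptic_distance(blind, "cochlea", "V1"))   # 5 synapses
```

Under this toy count, V1 in the blind sits as many synapses from the nearest receptor surface as high-order cortex does, which is the sense in which the model predicts that V1 may come to resemble prefrontal cortex rather than a primary sensory area.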
These two theories may, however, be reconciled by considering the connectivity of the reorganized
visual cortex of the blind and the onset of visual loss. The development of computations charac-
teristic of a certain region is strongly dependent on the input that it originally receives. Therefore,
in cases of congenital sensory loss, primary sensory areas are less likely to develop computations
similar to the ones performed in the typical brain. These differences in connectivity may lead to
more extensive developmental changes, causing cortical regions to assume very different computa-
tions than their natural roles. The visual cortex of the congenitally blind may correspond to early
stations of sensory processing due to auditory or tactile subcortical (or even cortical, see Wittenberg
et al. 2004) connectivity as seen in animal models of blindness, or to higher stations in the hierarchy
(as predicted by the inverted hierarchy model) if most of the connectivity is indeed from high-order
(multisensory) cortical regions. Currently, evidence for both types of plasticity can be found, as the
visual cortex of the congenitally blind is activated by simple perceptual tasks as well as by
mnemonic and semantic tasks, but there appear to be differences in the preference for perceptual vs.
high-level cognitive functions in different areas of the occipital cortex. Specifically, there is growing
evidence that as one moves anteriorly in the ventral visual stream, activation shifts toward the
perceptual tasks, whereas posteriorly, in and around the calcarine sulcus (V1), there is a clear
preference for the higher-order verbal memory and language tasks (Raz et
al. 2005). However, this issue is not yet resolved, and it will greatly benefit from future anatomical
connectivity findings in humans. In the case of late-onset blindness, the connectivity of the visual
cortex and its development are more typical of the sighted brain (as previously described), and
412 The Neural Bases of Multisensory Processes
reorganization is more likely to be of the high-order corticocortical type, along with some unmask-
ing of subcortical connectivity, which is also apparent in blindfolded normally sighted individuals
(Pitskel et al. 2007; Pascual-Leone et al. 2005; Amedi et al. 2006).
regardless of whether it is transmitted in the visual, tactile, or auditory modality (Amedi et al.
2007; Poirier et al. 2006c) in sighted as well as blind individuals (see Figure 21.1b, c). Interestingly,
applying TMS to this region can disrupt shape identification using an auditory SSD (Merabet et al.
2008a). In the same way, studies conducted using PSVA in the sighted show that auditorily mediated
face perception can activate the visual fusiform face area (Plaza et al. 2009), whereas depth percep-
tion activates the occipito-parietal and occipito-temporal regions (Renier et al. 2005).
Studying the use of SSDs in a longitudinal fashion also provides a good opportunity to monitor
in real time how newly acquired information is learned, and investigate the accompanying cogni-
tive and neural changes. For example, several studies have looked into differential activation before
and after learning how to use a specific SSD. One study showed that shape discrimination using
the TDU SSD generated activation of the occipital cortex following short training only in early-
onset blind individuals (but not in the sighted; Ptito et al. 2005), and that TDU training enables TMS to
induce spatially organized tactile sensations on the tongue (Kupers et al. 2006). These studies sug-
gest that the occipital lobe of the blind may be more prone to plasticity or to cross-modal process-
ing even in adulthood when compared to that of the sighted. Cross-modal activation of the visual
cortex of sighted subjects was also demonstrated following training on the PSVA SSD (Poirier et al.
2007a). Although such behavioral and imaging findings have been reported for the early- (Arno et
al. 2001) and late-onset blind (Cronly-Dillon et al. 1999) as well as sighted individuals (Poirier et
al. 2007a), it has recently been claimed that the recruitment of occipital areas during
the use of SSDs could be mediated by different processes or mechanisms in different populations.
Specifically, although the early blind might exhibit real bottom-up activation of occipital cortex for
tactile or auditory perception, in the late blind and sighted this activation might reflect top-down
visual imagery mechanisms (Poirier et al. 2007b). This suggestion is not surprising, given that we
have previously made a similar claim with regard to the mechanisms underlying plastic changes
following sensory loss itself. Importantly, recent evidence of multisensory integration for object rec-
ognition, demonstrated using a novel cross-modal adaptation paradigm (Tal and Amedi 2009), may
imply that the sighted could share some bottom-up mechanisms of tactile and visual integration in
visual cortex. Nevertheless, in addition to relying on different neurophysiological mechanisms, the
behavioral potential of SSDs may also vary between subpopulations of the blind, as the
late-onset blind can better associate the cross-modal input with the properties of vision as they knew it
(e.g., they have better knowledge of the 2-D representation of visual pictures, which is useful in most
current 2-D SSDs), whereas early blind individuals lack such understanding of the visual world,
but may have more highly developed auditory and tactile cross-modal networks and plasticity. This
difference in utilizing visual rehabilitation between the two blind groups may be even more pronounced in
the case of sensory restoration. Importantly, this differentiation between early and late-onset blind
in SSD use also highlights the potential of introducing such devices as early as possible in develop-
ment, while the brain is still in its prime with respect to plasticity. Similar to the improved outcomes
of cochlear implantation in early childhood (Harrison et al. 2005), it may be of particular interest
to attempt to teach young blind children to utilize such devices. Several early attempts to teach
blind infants to use the Sonicguide (Kay and Kay 1983) showed some promise, as younger subjects
developed sensitivity to the spatial information provided by the device more rapidly (Aitken and Bower
1982, 1983) (although with highly variable results; for a discussion, see Warren 1994). However, to
our knowledge, only a few preliminary later attempts (Segond et al. 2007; Amedi et al., unpublished
observations) have been made to adapt the use of SSDs to children. The training of infants on SSD
use may also lead to a more “natural” perception of the sensorily transformed information, perhaps
even to the level of synesthesia (Proulx and Stoerig 2006), a condition in which one type of sensory
stimulation evokes the sensation of another, commonly in another modality or submodality: for
instance, color is associated with letters or numbers, sounds with vision or other sensory combina-
tions. This type of synesthesia may create visual experiences or even visual qualia with regard to
the SSD percept. In a recent study (Ward and Meijer 2009) describing the phenomenology of two
blind users of the vOICe SSD, some evidence for its feasibility can be seen in the reports of a late-
blind vOICe user, who reported synesthetic percepts of vision while using the device, and even
synesthetic percepts of color, which is not conveyed by the device but is “filled in” by her mind’s
eye. Some of her descriptions of her subjective experience illustrate the synesthetic nature of the
SSD percept: “the soundscapes seem to trigger a sense of vision for me. . . . It does not matter to me
that my ears are causing the sight to occur in my mind” (see more on the vOICe website, http://www
.seeingwithsound.com/users.htm).
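As an illustration of the kind of visual-to-auditory mapping the vOICe uses (the image is scanned left to right over about a second, vertical position maps to pitch, and brightness maps to loudness), the sketch below implements such an encoder. The function name, frequency range, and sample rate are illustrative choices, not parameters of the actual device.

```python
import numpy as np

def image_to_soundscape(image, duration=1.0, sample_rate=22050,
                        f_min=500.0, f_max=5000.0):
    """Encode a 2-D grayscale image (values in [0, 1]) as mono audio:
    columns are scanned left to right over `duration` seconds, each pixel
    row gets a sine tone whose frequency rises with height in the image,
    and pixel brightness sets that tone's amplitude."""
    n_rows, n_cols = image.shape
    # One frequency per image row; the top row gets the highest pitch.
    freqs = np.geomspace(f_min, f_max, n_rows)[::-1]
    samples_per_col = int(duration * sample_rate / n_cols)
    t = np.arange(samples_per_col) / sample_rate
    chunks = []
    for col in range(n_cols):
        # Sum the row sinusoids for this column, weighted by brightness.
        tones = np.sin(2 * np.pi * freqs[:, None] * t[None, :])
        chunks.append((image[:, col, None] * tones).sum(axis=0))
    signal = np.concatenate(chunks)
    peak = np.abs(signal).max()
    return signal / peak if peak > 0 else signal
```

Played back, a bright dot high in the image becomes a brief high-pitched tone whose timing within the scan indicates its horizontal position; with training, users learn to decode such soundscapes into shape, which is what makes the synesthesia-like reports above plausible.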
In summary, observing the outcomes of sensory restoration and substitution offers a unique
opportunity to address and potentially answer numerous theoretical questions about the funda-
mental principles of brain organization, neuroplasticity, unisensory processing, and multisensory
integration, in addition to their obvious clinical use. Research in these fields may also provide use-
ful insights that can be applied in clinical settings, such as the suggested use of SSDs and sensory
recovery at an early developmental stage.
current challenge is to understand the principles that guide, the mechanisms that underlie, and the
factors that influence such changes, so that this knowledge can be channeled toward practical
rehabilitation.
REFERENCES
Aitken, S., and T. G. Bower. 1982. Intersensory substitution in the blind. J Exp Child Psychol 33: 309–323.
Aitken, S., and T. G. Bower. 1983. Developmental aspects of sensory substitution. Int J Neurosci 19: 13–91.
Amedi, A., J. Camprodon, L. Merabet et al. 2006. Highly transient activation of primary visual cortex (V1) for
tactile object recognition in sighted following 5 days of blindfolding. Paper presented at the 7th Annual
Meeting of the International Multisensory Research Forum, University of Dublin.
Amedi, A., A. Floel, S. Knecht, E. Zohary, and L. G. Cohen. 2004. Transcranial magnetic stimulation of the
occipital pole interferes with verbal processing in blind subjects. Nat Neurosci 7: 1266–1270.
Amedi, A., G. Jacobson, T. Hendler, R. Malach, and E. Zohary. 2002. Convergence of visual and tactile shape
processing in the human lateral occipital complex. Cereb Cortex 12: 1202–1212.
Amedi, A., R. Malach, R. Hendler, S. Peled, and E. Zohary. 2001. Visuo-haptic object-related activation in the
ventral visual pathway. Nat Neurosci 4: 324–330.
Amedi, A., L. B. Merabet, F. Bermpohl, and A. Pascual-Leone. 2005a. The occipital cortex in the blind:
Lessons about plasticity and vision. Curr Dir Psychol Sci 14: 306–311.
Amedi, A., N. Raz, P. Pianka, R. Malach, and E. Zohary. 2003. Early ‘visual’ cortex activation correlates with
superior verbal memory performance in the blind. Nat Neurosci 6: 758–766.
Amedi, A., W. M. Stern, J. A. Camprodon et al. 2007. Shape conveyed by visual-to-auditory sensory substitu-
tion activates the lateral occipital complex. Nat Neurosci 10: 687–689.
Amedi, A., K. Von Kriegstein, N. M. Van Atteveldt, M. S. Beauchamp, and M. J. Naumer. 2005b. Functional
imaging of human crossmodal identification and object recognition. Exp Brain Res 166: 559–571.
Arno, P., C. Capelle, M. C. Wanet-Defalque, M. Catalan-Ahumada, and C. Veraart. 1999. Auditory coding of
visual patterns for the blind. Perception 28: 1013–1029.
Arno, P., A. G. De Volder, A. Vanlierde et al. 2001. Occipital activation by pattern recognition in the early blind
using auditory substitution for vision. Neuroimage 13: 632–645.
Ashmead, D. H., E. W. Hill, and C. R. Talor. 1989. Obstacle perception by congenitally blind children. Percept
Psychophys 46: 425–433.
Ashmead, D. H., R. S. Wall, K. A. Ebinger, S. B. Eaton, M. M. Snook-Hill, and X. Yang. 1998. Spatial hearing
in children with visual disabilities. Perception 27: 105–122.
Auvray, M., S. Hanneton, and J. K. O’Regan. 2007. Learning to perceive with a visuo-auditory substitution
system: Localisation and object recognition with ‘The vOICe’. Perception 36: 416–430.
Bach-y-Rita, P. 2004. Tactile sensory substitution studies. Ann N Y Acad Sci 1013: 83–91.
Bach-y-Rita, P., C. C. Collins, F. A. Saunders, B. White, and L. Scadden. 1969. Vision substitution by tactile
image projection. Nature 221: 963–964.
Bach-y-Rita, P., K. A. Kaczmarek, M. E. Tyler, and J. Garcia-Lara. 1998. Form perception with a 49-point
electrotactile stimulus array on the tongue: A technical note. J Rehabil Res Dev 35: 427–430.
Bach-y-Rita, P., and S. W. Kercel. 2003. Sensory substitution and the human–machine interface. Trends Cogn
Sci 7: 541–546.
Bavelier, D., M. W. Dye, and P. C. Hauser. 2006. Do deaf individuals see better? Trends Cogn Sci 10: 512–518.
Bavelier, D., and H. J. Neville. 2002. Cross-modal plasticity: Where and how? Nat Rev Neurosci 3: 443–452.
Beauchamp, M. S., B. D. Argall, J. Bodurka, J. H. Duyn, and A. Martin. 2004. Unraveling multisensory integra-
tion: Patchy organization within human STS multisensory cortex. Nat Neurosci 7: 1190–1192.
Brelen, M. E., F. Duret, B. Gerard, J. Delbeke, and C. Veraart. 2005. Creating a meaningful visual perception
in blind volunteers by optic nerve stimulation. J Neural Eng 2: S22–S28.
Brown, J. A. 2006. Recovery of motor function after stroke. Prog Brain Res 157: 223–228.
Bubic, A., E. Striem-Amit, and A. Amedi. 2010. Large-scale brain plasticity following blindness and the use of
sensory substitution devices. In Multisensory Object Perception in the Primate Brain, ed. J. Kaiser and
M. Naumer, part 4, 351–380.
Büchel, C. 2003. Cortical hierarchy turned on its head. Nat Neurosci 6: 657–658.
Büchel, C., C. Price, R. S. Frackowiak, and K. Friston. 1998. Different activation patterns in the visual cortex
of late and congenitally blind subjects. Brain 121(Pt 3): 409–419.
Bull, N. D., and K. R. Martin. 2009. Using stem cells to mend the retina in ocular disease. Regen Med 4:
855–864.
Buonomano, D. V., and H. A. Johnson. 2009. Cortical plasticity and learning: Mechanisms and models. In
Encyclopedia of neuroscience, ed. L. R. Squire. London: Academic Press.
Burton, H., A. Z. Snyder, J. B. Diamond, and M. E. Raichle. 2002. Adaptive changes in early and late blind: A
FMRI study of verb generation to heard nouns. J Neurophysiol 88: 3359–3371.
Burton, H. 2003. Visual cortex activity in early and late blind people. J Neurosci 23: 4005–4011.
Burton, H., J. B. Diamond, and K. B. McDermott. 2003. Dissociating cortical regions activated by semantic and
phonological tasks: A FMRI study in blind and sighted people. J Neurophysiol 90: 1965–1982.
Calvert, G. A. 2001. Crossmodal processing in the human brain: Insights from functional neuroimaging studies.
Cereb Cortex 11: 1110–1123.
Campbell, R., and M. MacSweeney. 2004. Neuroimaging studies of cross-modal plasticity and language pro-
cessing in deaf people. In The handbook of multisensory processes, ed. G. Calvert, C. Spence, and B. E.
Stein. Cambridge, MA: MIT Press.
Capelle, C., C. Trullemans, P. Arno, and C. Veraart. 1998. A real-time experimental prototype for enhancement
of vision rehabilitation using auditory substitution. IEEE Trans Biomed Eng 45: 1279–1293.
Cappe, C., and P. Barone. 2005. Heteromodal connections supporting multisensory integration at low levels of
cortical processing in the monkey. Eur J Neurosci 22: 2886–2902.
Chabot, N., S. Robert, R. Tremblay, D. Miceli, D. Boire, and G. Bronchti. 2007. Audition differently activates
the visual system in neonatally enucleated mice compared with anophthalmic mutants. Eur J Neurosci
26: 2334–2348.
Champoux, F., F. Lepore, J. P. Gagne, and H. Theoret. 2009. Visual stimuli can impair auditory processing in
cochlear implant users. Neuropsychologia 47: 17–22.
Chechik, G., I. Meilijson, and E. Ruppin. 1999. Neuronal regulation: A mechanism for synaptic pruning during
brain maturation. Neural Comput 11: 2061–2080.
Chebat, D. R., C. Rainville, R. Kupers, and M. Ptito. 2007. Tactile–‘visual’ acuity of the tongue in early blind
individuals. Neuroreport 18: 1901–1904.
Chen, R., L. G. Cohen, and M. Hallett. 2002. Nervous system reorganization following injury. Neuroscience
111: 761–773.
Chen, S. C., L. E. Hallum, G. J. Suaning, and N. H. Lovell. 2007. A quantitative analysis of head movement
behaviour during visual acuity assessment under prosthetic vision simulation. J Neural Eng 4: S108.
Clavagnier, S., A. Falchier, and H. Kennedy. 2004. Long-distance feedback projections to area V1: Implications
for multisensory integration, spatial awareness, and visual consciousness. Cogn Affect Behav Neurosci
4: 117–126.
Cohen, L. G., P. Celnik, A. Pascual-Leone, B. Corwell, L. Falz, J. Dambrosia, M. Honda, N. Sadato, C. Gerloff,
M. D. Catala, and M. Hallett. 1997. Functional relevance of cross-modal plasticity in blind humans.
Nature 389: 180–183.
Cohen, L. G., R. A. Weeks, N. Sadato, P. Celnik, K. Ishii, and M. Hallett. 1999. Period of susceptibility for
cross-modal plasticity in the blind. Ann Neurol 45: 451–460.
Collignon, O., L. Renier, R. Bruyer, D. Tranduy, and C. Veraart. 2006. Improved selective and divided spatial
attention in early blind subjects. Brain Res 1075: 175–182.
Collignon, O., M. Lassonde, F. Lepore, D. Bastien, and C. Veraart. 2007. Functional cerebral reorganization
for auditory spatial processing and auditory substitution of vision in early blind subjects. Cereb Cortex
17: 457–465.
Cronin, T., T. Leveillard, and J. A. Sahel. 2007. Retinal degenerations: From cell signaling to cell therapy; pre-
clinical and clinical issues. Curr Gene Ther 7: 121–129.
Cronly-Dillon, J., K. Persaud, and R. P. Gregory. 1999. The perception of visual images encoded in musical
form: A study in cross-modality information transfer. Proc Biol Sci 266: 2427–2433.
Cronly-Dillon, J., K. C. Persaud, and R. Blore. 2000. Blind subjects construct conscious mental images of
visual scenes encoded in musical form. Proc Biol Sci 267: 2231–2238.
D’Angiulli, A., and P. Waraich. 2002. Enhanced tactile encoding and memory recognition in congenital blindness. Int J Rehabil Res 25: 143–145.
Dagnelie, G. 2008. Psychophysical evaluation for visual prosthesis. Annu Rev Biomed Eng 10: 339–368.
Delbeke, J., M. C. Wanet-Defalque, B. Gerard, M. Troosters, G. Michaux, and C. Veraart. 2002. The microsys-
tems based visual prosthesis for optic nerve stimulation. Artif Organs 26: 232–234.
De Volder, A. G., A. Bol, J. Blin, A. Robert, P. Arno, C. Grandin, C. Michel, and C. Veraart. 1997. Brain energy
metabolism in early blind subjects: Neural activity in the visual cortex. Brain Res 750: 235–244.
Dobelle, W. H. 2000. Artificial vision for the blind by connecting a television camera to the visual cortex.
ASAIO J 46: 3–9.
Neurophysiological Mechanisms Underlying Plastic Changes 417
Doucet, M. E., F. Bergeron, M. Lassonde, P. Ferron, and F. Lepore. 2006. Cross-modal reorganization and
speech perception in cochlear implant users. Brain 129: 3376–3383.
Doucet, M. E., J. P. Guillemot, M. Lassonde, J. P. Gagne, C. Leclerc, and F. Lepore. 2005. Blind subjects pro-
cess auditory spectral cues more efficiently than sighted individuals. Exp Brain Res 160: 194–202.
Dowling, J. 2008. Current and future prospects for optoelectronic retinal prostheses. Eye 23: 1999–2005.
Falchier, A., S. Clavagnier, P. Barone, and H. Kennedy. 2002. Anatomical evidence of multimodal integration
in primate striate cortex. J Neurosci 22: 5749–5759.
Fallon, J. B., D. R. Irvine, and R. K. Shepherd. 2008. Cochlear implants and brain plasticity. Hearing Res 238:
110–117.
Fernandez, E., P. Ahnelt, P. Rabischong, C. Botella, and F. Garcia-De Quiros. 2002. Towards a cortical visual
neuroprosthesis for the blind. IFMBE Proc 3(2): 1690–1691.
Fieger, A., B. Röder, W. Teder-Salejarvi, S. A. Hillyard, and H. J. Neville. 2006. Auditory spatial tuning in late-
onset blindness in humans. J Cogn Neurosci 18: 149–157.
Fine, I. 2008. The behavioral and neurophysiological effects of sensory deprivation. In Blindness and brain
plasticity in navigation and object perception, ed. J. J. Rieser, D. H. Ashmead, F. F. Ebner, and A. L.
Corn. New York: Taylor and Francis.
Fine, I., A. R. Wade, A. A. Brewer et al. 2003. Long-term deprivation affects visual perception and cortex. Nat
Neurosci 6: 915–916.
Finney, E. M., I. Fine, and K. R. Dobkins. 2001. Visual stimuli activate auditory cortex in the deaf. Nat Neurosci
4: 1171–1173.
Geers, A. E. 2006. Factors influencing spoken language outcomes in children following early cochlear implan-
tation. Adv Otorhinolaryngol 64: 50–65.
Gizewski, E. R., T. Gasser, A. de Greiff, A. Boehm, and M. Forsting. 2003. Cross-modal plasticity for sensory
and motor activation patterns in blind subjects. Neuroimage 19: 968–975.
Goldish, L. H., and H. E. Taylor. 1974. The Optacon: A valuable device for blind persons. New Outlook Blind
68: 49–56.
Goldreich, D., and I. M. Kanics. 2003. Tactile acuity is enhanced in blindness. J Neurosci 23: 3439–3445.
Goldreich, D., and I. M. Kanics. 2006. Performance of blind and sighted humans on a tactile grating detection
task. Percept Psychophys 68: 1363–1371.
Gougoux, F., R. J. Zatorre, M. Lassonde, P. Voss, and F. Lepore. 2005. A functional neuroimaging study of sound
localization: Visual cortex activity predicts performance in early-blind individuals. PLoS Biol 3: e27.
Grafman, J. 2000. Conceptualizing functional neuroplasticity. J Commun Disord 33: 345–355; quiz 355–356.
Grant, A. C., M. C. Thiagarajah, and K. Sathian. 2000. Tactile perception in blind Braille readers: A psy-
chophysical study of acuity and hyperacuity using gratings and dot patterns. Percept Psychophys 62:
301–312.
Gregory, R. L., and J. G. Wallace. 1963. Recovery from early blindness: A case study. In Experimental
Psychology Society, Monograph Supplement. 2nd ed. Cambridge, MA: Heffers.
Haddock, J. N., and L. Berlin. 1950. Transsynaptic degeneration in the visual system; report of a case. Arch
Neurol Psychiatry 64: 66–73.
Harrison, R. V., K. A. Gordon, and R. J. Mount. 2005. Is there a critical period for cochlear implantation in
congenitally deaf children? Analyses of hearing and speech perception performance after implantation.
Dev Psychobiol 46: 252–261.
Heyes, A. D. 1984. The Sonic Pathfinder: A new electronic travel aid. J Vis Impair Blind 78: 200–202.
Hugdahl, K., M. Ek, F. Takio et al. 2004. Blind individuals show enhanced perceptual and attentional sensitivity
for identification of speech sounds. Brain Res Cogn Brain Res 19: 28–32.
Hull, T., and H. Mason. 1995. Performance of blind-children on digit-span tests. J Vis Impair Blind 89:
166–169.
Izraeli, R., G. Koay, M. Lamish, A. J. Heicklen-Klein, H. E. Heffner, R. S. Heffner, and Z. Wollberg. 2002.
Cross-modal neuroplasticity in neonatally enucleated hamsters: Structure, electrophysiology and behav-
iour. Eur J Neurosci 15: 693–712.
Kaas, J. H. 1991. Plasticity of sensory and motor maps in adult mammals. Annu Rev Neurosci 14: 137–167.
Kaas, J. H. 2000. The reorganization of somatosensory and motor cortex after peripheral nerve or spinal cord
injury in primates. Prog Brain Res 128: 173–179.
Karlen, S. J., D. M. Kahn, and L. Krubitzer. 2006. Early blindness results in abnormal corticocortical and thal-
amocortical connections. Neuroscience 142: 843–858.
Kay, L., and N. Kay. 1983. An ultrasonic spatial sensor’s role as a developmental aid for blind children. Trans
Ophthalmol Soc N Z 35: 38–42.
Kleiner, A., and R. C. Kurzweil. 1977. A description of the Kurzweil reading machine and a status report on its
testing and dissemination. Bull Prosthet Res 10: 72–81.
Knudsen, E. I. 2004. Sensitive periods in the development of the brain and behavior. J Cogn Neurosci 16:
1412–1425.
Kolb, B. 1995. Brain plasticity and behavior. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Korte, M., and J. P. Rauschecker. 1993. Auditory spatial tuning of cortical neurons is sharpened in cats with
early blindness. J Neurophysiol 70: 1717–1721.
Kral, A., J. Tillein, S. Heid, R. Klinke, and R. Hartmann. 2006. Cochlear implants: Cortical plasticity in con-
genital deprivation. Prog Brain Res 157: 283–313.
Kujala, T., K. Alho, P. Paavilainen, H. Summala, and R. Naatanen. 1992. Neural plasticity in processing of sound loca-
tion by the early blind: An event-related potential study. Electroencephalogr Clin Neurophysiol 84: 469–472.
Kujala, T., M. J. Palva, O. Salonen et al. 2005. The role of blind humans’ visual cortex in auditory change detec-
tion. Neurosci Lett 379: 127–131.
Kupers, R., A. Fumal, A. M. De Noordhout, A. Gjedde, J. Schoenen, and M. Ptito. 2006. Transcranial magnetic
stimulation of the visual cortex induces somatotopically organized qualia in blind subjects. Proc Natl
Acad Sci U S A 103: 13256–13260.
Lacey, S., N. Tal, A. Amedi, and K. Sathian. 2009. A putative model of multisensory object representation.
Brain Topogr 21: 269–274.
Laemle, L. K., N. L. Strominger, and D. O. Carpenter. 2006. Cross-modal innervation of primary visual cortex
by auditory fibers in congenitally anophthalmic mice. Neurosci Lett 396: 108–112.
Lakatos, P., C. M. Chen, M. N. O’Connell, A. Mills, and C. E. Schroeder. 2007. Neuronal oscillations and
multisensory interaction in primary auditory cortex. Neuron 53: 279–292.
Leclerc, C., D. Saint-Amour, M. E. Lavoie, M. Lassonde, and F. Lepore. 2000. Brain functional reorganization
in early blind humans revealed by auditory event-related potentials. Neuroreport 11: 545–550.
Lamba, D., M. Karl, and T. Reh. 2008. Neural regeneration and cell replacement: A view from the eye. Cell
Stem Cell 2: 538–549.
Lamba, D. A., M. O. Karl, and T. A. Reh. 2009. Strategies for retinal repair: Cell replacement and regeneration.
Prog Brain Res 175: 23–31.
Lee, D. S., J. S. Lee, S. H. Oh, S. K. Kim, J. W. Kim, J. K. Chung, M. C. Lee, and C. S. Kim. 2001. Cross-modal
plasticity and cochlear implants. Nature 409: 149–150.
Lessard, N., M. Pare, F. Lepore, and M. Lassonde. 1998. Early-blind human subjects localize sound sources
better than sighted subjects. Nature 395: 278–280.
Lickliter, R., and L. E. Bahrick. 2004. Perceptual development and the origins of multisensory responsiveness.
In The handbook of multisensory processes, ed. G. Calvert, C. Spence, and B. E. Stein. Cambridge, MA:
MIT Press.
Linvill, J. G., and J. C. Bliss. 1966. A direct translation reading aid for the blind. Proc IEEE 54: 40–51.
Liu, Y., C. Yu, M. Liang et al. 2007. Whole brain functional connectivity in the early blind. Brain 130:
2085–2096.
Locker, M., C. Borday, and M. Perron. 2009. Stemness or not stemness? Current status and perspectives of
adult retinal stem cells. Curr Stem Cell Res Ther 4: 118–130.
MacLaren, R. E., and R. A. Pearson. 2007. Stem cell therapy and the retina. Eye (London) 21: 1352–1359.
Marr, D. 1982. Vision. San Francisco: W. H. Freeman.
Meijer, P. B. 1992. An experimental system for auditory image representations. IEEE Trans Biomed Eng 39:
112–121.
Merabet, L., G. Thut, B. Murray, J. Andrews, S. Hsiao, and A. Pascual-Leone. 2004. Feeling by sight or seeing
by touch? Neuron 42: 173–179.
Merabet, L. B., J. F. Rizzo, A. Amedi, D. C. Somers, and A. Pascual-Leone. 2005. What blindness can tell us
about seeing again: Merging neuroplasticity and neuroprostheses. Nat Rev Neurosci 6: 71–77.
Merabet, L. B., L. Battelli, S. Obretenova, S. Maguire, P. Meijer, and A. Pascual-Leone. 2008a. Functional recruit-
ment of visual cortex for sound encoded object identification in the blind. Neuroreport 20: 132–138.
Merabet, L. B., R. Hamilton, G. Schlaug et al. 2008b. Rapid and reversible recruitment of early visual cortex
for touch. PLoS ONE 3: e3046.
Michel, G. F., and A. N. Tyler. 2005. Critical period: A history of the transition from questions of when, to
what, to how. Dev Psychobiol 46: 156–162.
Millar, S. 1981. Cross-modal and intersensory perception and the blind. In Intersensory perception and sensory
integration, ed. R. D. Walk and H. L. J. Pick. New York: Plenum Press.
Murphy, C., and W. S. Cain. 1986. Odor identification: The blind are better. Physiol Behav 37: 177–180.
Neville, H. J., and D. Bavelier. 2000. Specificity of developmental neuroplasticity in humans: Evidence from
sensory deprivation and altered language experience. In Toward a theory of neuroplasticity, ed. C. A.
Shaw and J. C. McEachern. New York: Taylor and Francis.
Newman, N. M., R. A. Stevens, and J. R. Heckenlively. 1987. Nerve fibre layer loss in diseases of the outer
retinal layer. Br J Ophthalmol 71: 21–26.
Niemeyer, W., and I. Starlinger. 1981. Do the blind hear better? Investigations on auditory processing in con-
genital or early acquired blindness: II. Central functions. Audiology 20: 510–515.
Noordzij, M. L., S. Zuidhoek, and A. Postma. 2006. The influence of visual experience on the ability to form
spatial mental models based on route and survey descriptions. Cognition 100: 321–342.
Noppeney, U. 2007. The effects of visual deprivation on functional and structural organization of the human
brain. Neurosci Biobehav Rev 31: 1169–1180.
Noppeney, U., K. J. Friston, J. Ashburner, R. Frackowiak, and C. J. Price. 2005. Early visual deprivation
induces structural plasticity in gray and white matter. Curr Biol 15: R488–R490.
Ofan, R. H., and E. Zohary. 2006. Visual cortex activation in bilingual blind individuals during use of native and
second language. Cereb Cortex 17: 1249–1259.
Ostrovsky, Y., A. Andalman, and P. Sinha. 2006. Vision following extended congenital blindness. Psychol Sci
17: 1009–1014.
Ostrovsky, Y., E. Meyers, S. Ganesh, U. Mathur, and P. Sinha. 2009. Visual parsing after recovery from blind-
ness. Psychol Sci 20: 1484–1491.
Palanker, D., A. Vankov, P. Huie, and S. Baccus. 2005. Design of a high-resolution optoelectronic retinal pros-
thesis. J Neural Eng 2: S105–S120.
Pan, W. J., G. Wu, C. X. Li, F. Lin, J. Sun, and H. Lei. 2007. Progressive atrophy in the optic pathway and visual
cortex of early blind Chinese adults: A voxel-based morphometry magnetic resonance imaging study.
Neuroimage 37: 212–220.
Pascual-Leone, A., A. Amedi, F. Fregni, and L. B. Merabet. 2005. The plastic human brain cortex. Annu Rev
Neurosci 28: 377–401.
Pascual-Leone, A., and R. Hamilton. 2001. The metamodal organization of the brain. Prog Brain Res 134:
427–445.
Penfield, W., and T. Rasmussen. 1950. The cerebral cortex of man: A clinical study of localization of function.
New York: Macmillan.
Pezaris, J. S., and R. C. Reid. 2005. Microstimulation in LGN produces focal visual percepts: Proof of concept
for a visual prosthesis. J Vis 5: 367.
Pezaris, J. S., and R. C. Reid. 2009. Simulations of electrode placement for a thalamic visual prosthesis. IEEE
Trans Biomed Eng 56: 172–178.
Piche, M., N. Chabot, G. Bronchti, D. Miceli, F. Lepore, and J. P. Guillemot. 2007. Auditory responses in the
visual cortex of neonatally enucleated rats. Neuroscience 145: 1144–1156.
Pitskel, N. B., L. B. Merabet, C. Ramos-Estebanez, T. Kauffman, and A. Pascual-Leone. 2007. Time-dependent
changes in cortical excitability after prolonged visual deprivation. Neuroreport 18: 1703–1707.
Plaza, P., I. Cuevas, O. Collignon, C. Grandin, A. G. De Volder, and I. Renier. 2009. Perceiving faces using
auditory substitution of vision activates the fusiform face area. Belgian Society for Fundamental and
Clinical Physiology and Pharmacology, Spring Meeting 2009. Acta Physiologica 195: S670.
Poirier, C., O. Collignon, C. Scheiber et al. 2006a. Auditory motion perception activates visual motion areas in
early blind subjects. Neuroimage 31: 279–285.
Poirier, C., A. De Volder, D. Tranduy, and C. Scheiber. 2007a. Pattern recognition using a device substituting
audition for vision in blindfolded sighted subjects. Neuropsychologia 45: 1108–1121.
Poirier, C., A. G. De Volder, and C. Scheiber. 2007b. What neuroimaging tells us about sensory substitution.
Neurosci Biobehav Rev 31: 1064–1070.
Poirier, C., M. A. Richard, D. T. Duy, and C. Veraart. 2006b. Assessment of sensory substitution prosthesis
potentialities in minimalist conditions of learning. Appl Cogn Psychol 20: 447–460.
Poirier, C. C., A. G. De Volder, D. Tranduy, and C. Scheiber. 2006c. Neural changes in the ventral and dorsal
visual streams during pattern recognition learning. Neurobiol Learn Mem 85: 36–43.
Pozar, L. 1982. Effect of long-term sensory deprivation on recall of verbal material. Stud Psychol 24: 311–311.
Pring, L. 1988. The ‘reverse-generation’ effect: A comparison of memory performance between blind and
sighted children. Br J Psychol 79 (Pt 3): 387–400.
Proulx, M. J., and P. Stoerig. 2006. Seeing sounds and tingling tongues: Qualia in synaesthesia and sensory
substitution. Anthropol Philos 7: 135–151.
Proulx, M. J., P. Stoerig, E. Ludowig, and I. Knoll. 2008. Seeing ‘where’ through the ears: Effects of learning-
by-doing and long-term sensory deprivation on localization based on image-to-sound substitution. PLoS
ONE 3: e1840.
Ptito, M., S. M. Moesgaard, A. Gjedde, and R. Kupers. 2005. Cross-modal plasticity revealed by electrotactile
stimulation of the tongue in the congenitally blind. Brain 128: 606–614.
Putzar, L., I. Goerendt, K. Lange, F. Rösler, and B. Röder. 2007. Early visual deprivation impairs multisensory
interactions in humans. Nat Neurosci 10: 1243–1245.
Rauschecker, J. P. 2000. Developmental neuroplasticity during brain development. In Toward a theory of neu-
roplasticity, ed. C. A. Shaw and J. C. McEachern. New York: Taylor and Francis.
Rauschecker, J. P. 2008. Plasticity of cortical maps in visual deprivation. In Blindness and brain plasticity in
navigation and object perception, ed. J. J. Rieser, D. H. Ashmead, F. F. Ebner, and A. L. Corn. New York:
Taylor and Francis.
Rauschecker, J. P., and M. Korte. 1993. Auditory compensation for early blindness in cat cerebral cortex. J
Neurosci 13: 4538–4548.
Raz, N., A. Amedi, and E. Zohary. 2005. V1 activation in congenitally blind humans is associated with episodic
retrieval. Cereb Cortex 15: 1459–1468.
Raz, N., E. Striem, G. Pundak, T. Orlov, and E. Zohary. 2007. Superior serial memory in the blind: A case of
cognitive compensatory adjustment. Curr Biol 17: 1129–1133.
Recanzone, G. H., M. M. Merzenich, W. M. Jenkins, K. A. Grajski, and H. R. Dinse. 1992. Topographic
reorganization of the hand representation in cortical area 3b of owl monkeys trained in a frequency-
discrimination task. J Neurophysiol 67: 1031–1056.
Renier, L., O. Collignon, C. Poirier et al. 2005. Cross-modal activation of visual cortex during depth perception
using auditory substitution of vision. J Vis 5: 902.
Ricciardi, E., N. Vanello, L. Sani et al. 2007. The effect of visual experience on the development of functional
architecture in hMT+. Cereb Cortex 17: 2933–2939.
Rizzo, J. F., L. Snebold, and M. Kenney. 2007. Development of a visual prosthesis. In Visual Prosthesis and
Ophthalmic Devices: New Hope in Sight, ed. J. Tombran-Tink, C. J. Barnstable, and J. F. Rizzo, 71–93.
Totowa, NJ: Humana Press.
Rockland, K. S., and H. Ojima. 2003. Multisensory convergence in calcarine visual areas in macaque monkey.
Int J Psychophysiol 50: 19–26.
Röder, B., and F. Rösler. 2004. Compensatory plasticity as consequence of sensory loss. In The handbook of multi-
sensory processes. ed. G. Calvert, C. Spence, and B. E. Stein. Cambridge, MA: Bradford Books, MIT Press.
Röder, B., and F. Rösler. 1998. Visual input does not facilitate the scanning of spatial images. J Ment Imagery
22: 127–144.
Röder, B., F. Rösler, and H. J. Neville. 2000. Event-related potentials during auditory language processing in
congenitally blind and sighted people. Neuropsychologia 38: 1482–1502.
Röder, B., F. Rösler, and H. J. Neville. 2001. Auditory memory in congenitally blind adults: A behavioral–
electrophysiological investigation. Brain Res Cogn Brain Res 11: 289–303.
Röder, B., W. Teder-Salejarvi, A. Sterr, F. Rösler, S. A. Hillyard, and H. J. Neville. 1999. Improved auditory
spatial tuning in blind humans. Nature 400: 162–166.
Röder, B., O. Stock, S. Bien, H. Neville, and F. Rösler. 2002. Speech processing activates visual cortex in con-
genitally blind humans. Eur J Neurosci 16: 930–936.
Rösler, F., B. Röder, M. Heil, and E. Hennighausen. 1993. Topographic differences of slow event-related brain
potentials in blind and sighted adult human subjects during haptic mental rotation. Brain Res Cogn Brain
Res 1: 145–159.
Sadato, N. 2005. How the blind “see” Braille: Lessons from functional magnetic resonance imaging.
Neuroscientist 11: 577–582.
Sadato, N., A. Pascual-Leone, J. Grafman, M. P. Deiber, V. Ibanez, and M. Hallett. 1998. Neural networks for
Braille reading by the blind. Brain 121: 1213–1229.
Sadato, N., A. Pascual-Leone, J. Grafman, V. Ibanez, M. P. Deiber, G. Dold, and M. Hallett. 1996. Activation of
the primary visual cortex by Braille reading in blind subjects. Nature 380: 526–528.
Sampaio, E., S. Maris, and P. Bach-y-Rita. 2001. Brain plasticity: ‘Visual’ acuity of blind persons via the
tongue. Brain Res 908: 204–207.
Sathian, K. 2000. Practice makes perfect: Sharper tactile perception in the blind. Neurology 54: 2203–2204.
Sathian, K. 2005. Visual cortical activity during tactile perception in the sighted and the visually deprived. Dev
Psychobiol 46: 279–286.
Sathian, K., A. Zangaladze, J. M. Hoffman, and S. T. Grafton. 1997. Feeling with the mind’s eye. Neuroreport
8: 3877–3881.
Schlaggar, B. L., and D. D. O’Leary. 1991. Potential of visual cortex to develop an array of functional units
unique to somatosensory cortex. Science 252: 1556–1560.
Schmidt, E. M., M. J. Bak, F. T. Hambrecht, C. V. Kufta, D. K. O’Rourke, and P. Vallabhanath. 1996. Feasibility
of a visual prosthesis for the blind based on intracortical microstimulation of the visual cortex. Brain
119(Pt 2): 507–522.
Schorr, E. A., N. A. Fox, V. van Wassenhove, and E. I. Knudsen. 2005. Auditory–visual fusion in speech percep-
tion in children with cochlear implants. Proc Natl Acad Sci U S A 102: 18748–18750.
Schroeder, C. E., J. Smiley, K. G. Fu, T. McGinnis, M. N. O’Connell, and T. A. Hackett. 2003. Anatomical
mechanisms and functional implications of multisensory convergence in early cortical processing. Int J
Psychophysiol 50: 5–17.
Schroeder, C. E., and J. Foxe. 2005. Multisensory contributions to low-level, ‘unisensory’ processing. Curr
Opin Neurobiol 15: 454–458.
Segond, H., D. Weiss, and E. Sampaio. 2007. A proposed tactile vision–substitution system for infants who are
blind tested on sighted infants. J Vis Impair Blind 101: 32–43.
Sharma, J., A. Angelucci, and M. Sur. 2000. Induction of visual orientation modules in auditory cortex. Nature
404: 841–847.
Shaw, C. A., and J. C. McEachern. 2000. Transversing levels of organization: A theory of neuronal stability and
plasticity. In Toward a theory of neuroplasticity, ed. C. A. Shaw and J. C. McEachern. New York: Taylor
and Francis.
Shimony, J. S., H. Burton, A. A. Epstein, D. G. McLaren, S. W. Sun, and A. Z. Snyder. 2006. Diffusion tensor
imaging reveals white matter reorganization in early blind humans. Cereb Cortex 16: 1653–1661.
Sinha, P. 2003. Face classification following long-term visual deprivation. J Vis 3: 104.
Smith, M., E. A. Franz, S. M. Joy, and K. Whitehead. 2005. Superior performance of blind compared with
sighted individuals on bimanual estimations of object size. Psychol Sci 16: 11–14.
Smits, B., and M. J. C. Mommers. 1976. Differences between blind and sighted children on WISC Verbal
Subtests. New Outlook Blind 70: 240–246.
Spelman, F. A. 2006. Cochlear electrode arrays: Past, present and future. Audiol Neurootol 11: 77–85.
Stilla, R., R. Hanna, X. Hu, E. Mariola, G. Deshpande, and K. Sathian. 2008. Neural processing underlying
tactile microspatial discrimination in the blind: A functional magnetic resonance imaging study. J Vis
8(10): 13.1–19.
Tal, N., and A. Amedi. 2009. Multisensory visual–tactile object-related network in humans: Insights from a
novel crossmodal adaptation approach. Exp Brain Res 198: 165–182.
Tillman, M. H., and W. L. Bashaw. 1968. Multivariate analysis of the WISC scales for blind and sighted chil-
dren. Psychol Rep 23: 523–526.
Troyk, P., M. Bak, J. Berg et al. 2003. A model for intracortical visual prosthesis research. Artif Organs 27:
1005–1015.
Uhl, F., P. Franzen, G. Lindinger, W. Lang, and L. Deecke. 1991. On the functionality of the visually deprived
occipital cortex in early blind persons. Neurosci Lett 124: 256–259.
Ungerleider, L. G., and M. Mishkin. 1982. Two cortical visual systems. In Analysis of visual behavior, ed. D. J.
Ingle, M. A. Goodale, and R. J. W. Mansfield. Boston: MIT Press.
Van Atteveldt, N., E. Formisano, R. Goebel, and L. Blomert. 2004. Integration of letters and speech sounds in
the human brain. Neuron 43: 271–282.
Vanlierde, A., and M. C. Wanet-Defalque. 2004. Abilities and strategies of blind and sighted subjects in visuo-
spatial imagery. Acta Psychol (Amst) 116: 205–222.
Veraart, C., M. C. Wanet-Defalque, B. Gerard, A. Vanlierde, and J. Delbeke. 2003. Pattern recognition with the
optic nerve visual prosthesis. Artif Organs 27: 996–1004.
Von Melchner, L., S. L. Pallas, and M. Sur. 2000. Visual behaviour mediated by retinal projections directed to
the auditory pathway. Nature 404: 871–876.
Voss, P., M. Lassonde, F. Gougoux, M. Fortin, J. P. Guillemot, and F. Lepore. 2004. Early- and late-onset blind
individuals show supra-normal auditory abilities in far-space. Curr Biol 14: 1734–1738.
Wakefield, C. E., J. Homewood, and A. J. Taylor. 2004. Cognitive compensations for blindness in children: An
investigation using odour naming. Perception 33: 429–442.
Wallace, M. 2004a. The development of multisensory processes. Cogn Process 5: 69–83.
Wallace, M. T. 2004b. The development of multisensory integration. In The handbook of multisensory pro-
cesses, ed. G. Calvert, C. Spence, and B. E. Stein. Cambridge, MA: MIT Press.
Ward, J., and P. Meijer. 2009. Visual experiences in the blind induced by an auditory sensory substitution
device. Conscious Cogn 19: 492–500.
Warren, D. H. 1994. Blindness and children: An individual differences approach. New York: Cambridge Univ.
Press.
Weiland, J. D., W. Liu, and M. S. Humayun. 2005. Retinal prosthesis. Annu Rev Biomed Eng 7: 361–401.
Weiland, J. D., and M. S. Humayun. 2008. Visual prosthesis. Proc IEEE 96: 1076–1084.
West, E. L., R. A. Pearson, R. E. MacLaren, J. C. Sowden, and R. R. Ali. 2009. Cell transplantation strategies
for retinal repair. Prog Brain Res 175: 3–21.
Wiesel, T. N., and D. H. Hubel. 1963. Single-cell responses in striate cortex of kittens deprived of vision in one
eye. J Neurophysiol 26: 1003–1017.
Wiesel, T. N., and D. H. Hubel. 1965. Comparison of the effects of unilateral and bilateral eye closure on corti-
cal unit responses in kittens. J Neurophysiol 28: 1029–1040.
Wittenberg, G. F., K. J. Werhahn, E. M. Wassermann, P. Herscovitch, and L. G. Cohen. 2004. Functional connec-
tivity between somatosensory and visual cortex in early blind humans. Eur J Neurosci 20: 1923–1927.
Yu, C., Y. Liu, J. Li et al. 2008. Altered functional connectivity of primary visual cortex in early blindness.
Hum Brain Mapp 29(5): 533–543.
Zangaladze, A., C. M. Epstein, S. T. Grafton, and K. Sathian. 1999. Involvement of visual cortex in tactile
discrimination of orientation. Nature 401: 587–590.
Zwiers, M. P., A. J. Van Opstal, and J. R. Cruysberg. 2001. A spatial hearing deficit in early-blind humans.
J Neurosci 21: RC142, 1–5.
22 Visual Abilities in Individuals
with Profound Deafness
A Critical Review
Francesco Pavani and Davide Bottari
CONTENTS
22.1 Visual Abilities in Profound Deafness: An Open Challenge for Cross-Modal Plasticity
Research................................................................................................................................. 423
22.1.1 Multiple Operational Definitions............................................................................... 425
22.1.2 Making Sense of Heterogeneity................................................................................ 426
22.2 A Task-Oriented Review of Empirical Evidence................................................................... 427
22.2.1 Perceptual Thresholds Tasks..................................................................................... 427
22.2.2 Simple Detection and Lateralization Tasks............................................................... 430
22.2.3 Visual Search Tasks................................................................................................... 432
22.2.4 Visual Discrimination and Identification Tasks........................................................ 434
22.2.4.1 Visual Discrimination with Flanker Interference..................................... 436
22.2.5 Visual Tasks of Higher Complexity........................................................................... 438
22.3 A Transversal View on Literature.........................................................................................440
22.3.1 Enhanced Reactivity Rather than Enhanced Perceptual Processing........................440
22.3.2 Role of Deaf Sample Characteristics and Visual Stimulus Characteristics Are
Relevant but Not Critical........................................................................................... 441
22.3.3 Role of Target Eccentricity and Selective Visual Attention Is Critical but
Underspecified........................................................................................................... 441
22.4 Conclusions and Future Directions........................................................................................ 443
Acknowledgments...........................................................................................................................444
References.......................................................................................................................................444
very far in space. In particular, hearing can provide a good estimate of the most likely location in
space of the nonvisible stimulus (see Heffner and Heffner 1992 for a cross-species evaluation of the
relationship between the ability to localize a sound and the width of the field of best vision). In addi-
tion, hearing constantly models the acoustic regularity in the environment and reacts to violations
of such regularity, regardless of the individual’s current behavioral goal (Näätänen 1992). Audition thus
constitutes a fundamental guide for reorienting our exploratory behavior. Efficient integration of
sensory inputs from audition and vision is therefore essential for successful exploration
of the surrounding environment.
The way our cognitive system perceives the multisensory environment in which we live leads to
a fundamental question that has long been debated among scientists and philosophers: What are the
consequences of the absence of one sensory modality for cognition and multisensory perception?
For instance, what are the consequences of long-term auditory deprivation due to profound deaf-
ness for the remaining sensory modalities, mainly vision and touch? Interest in this issue can
be traced back at least to the seventeenth century (for historical reviews, see Hartmann 1933; Jordan
1961), and two opposing hypotheses have traditionally been put forward to account for the impact
of sensory deprivation (i.e., deafness or blindness) on the remaining senses. The first hypothesis is
that a substantial deficit in one sensory modality could affect the development and organization of
the other sensory systems. We will refer to this first perspective as the perceptual deficit hypothesis.
When applied to the case of profound deafness, the perceptual deficit hypothesis predicts poorer
visual and tactile perceptual performance in deaf individuals, as compared to the age-matched hear-
ing controls (e.g., Myklebust 1964). This hypothesis was based on the assumption that auditory defi-
ciency can have a direct impact on the development of the other senses. In addition, it assumed that
any language impairments resulting from profound deafness would limit hearing-impaired children
in their interaction with the world, and result in a cognitive development lag in perceptual and cog-
nitive tasks (Furth 1966). The second hypothesis is that a deficit in one sensory system would make
the other modalities more sensitive, vicariously compensating for the loss of one sensory channel
(e.g., Gibson 1969). We will refer to this second perspective as the sensory compensation hypoth-
esis. When applied to the case of profound deafness, the sensory compensation hypothesis predicts
that the visual and tactile modalities will show enhanced sensitivity. The latter prediction is often
stated both in terms of behavioral consequences of deafness, and in terms of its neural outcomes.
Specifically, the neural implications of the sensory compensation hypothesis are that the brain areas
serving the impaired sensory modality may develop the ability to process perceptual inputs from
one or more of the intact sensory systems (functional reallocation account), or alternatively that
brain areas of the remaining senses may acquire enhanced functional and processing capabilities
(remaining senses hypertrophy account).
After more than 30 years of systematic research conducted mainly on the visual abilities of pro-
foundly deaf individuals, it is apparent that the long-standing debate as to whether perceptual and
cognitive functions of deaf individuals are deficient or supranormal is far from being settled. Several
reviews of this literature (e.g., Parasnis 1983; Bavelier et al. 2006; Mitchell and Maslin 2007) clearly
indicate that deaf and hearing individuals perform comparably on a number of perceptual tasks. As
we shall see later (see Section 22.2.1), this conclusion is strongly supported by tasks involving basic
perceptual thresholds. By contrast, other studies have revealed differential performance in the two
groups, either in the direction of deficient abilities in deaf as compared to hearing participants (e.g., Quittner
et al. 2004; Parasnis et al. 2003), or in the direction of supranormal abilities for the deaf population
(e.g., Bottari et al. 2010; Loke and Song 1991; Neville and Lawson 1987). In this context, it should
perhaps be emphasized that in the absence of clear behavioral differences between deaf and hear-
ing participants, even the most striking differences between the two groups observed at the neural
level cannot disentangle the perceptual deficit hypothesis from the sensory compensation
hypothesis. For instance, much of the renewed interest in the study of visual abilities in deaf indi-
viduals has been motivated by the seminal work of Neville et al. (1983). In that study, visual evoked
potentials (VEPs) recorded from the scalp of eight congenitally deaf adults were significantly larger
Visual Abilities in Individuals with Profound Deafness 425
over both auditory and visual cortices, with respect to those of eight hearing controls, specifically
for visual stimuli occurring in the periphery of the visual field (8.3°). Although this pioneering work
implies that the lack of auditory experience from an early age can influence the organization of the
human brain for visual processing [a finding that was later confirmed and extended by many other
studies using different methodologies for the recording of brain responses; e.g., electroencephalo-
gram (EEG): Neville and Lawson 1987; magnetoencephalography: Finney et al. 2003; functional
magnetic resonance imaging: Bavelier et al. 2000, 2001], in the absence of a behavioral difference
between the two groups it remains potentially ambiguous whether modifications at the neural level
are an index of deficiency or compensation. In other words, even if one assumes that larger visual
evoked components (e.g., Neville et al. 1983; Neville and Lawson 1987) or stronger BOLD responses
(e.g., Bavelier et al. 2000; 2001) indicate enhanced processing of the incoming input, if this is not
accompanied by behavioral enhancement it is difficult to conclude that it really serves some adap-
tive functional role. Unfortunately, the current evidence in the literature lacks this explanatory power.
With the sole exception of the work by Neville and Lawson (1987), all other neuroimaging studies
focused on measures of brain response alone, instead of combined measures of brain response and
behavior. Furthermore, conclusive evidence that cortical reorganization serves a functional role can
only originate from the observation that interfering with the reorganized brain response [e.g., using
transcranial magnetic stimulation (TMS)] impairs the supranormal behavioral performance in the
sensory-deprived participants (e.g., see Cohen et al. 1997 for an example of abolished supranormal
tactile discrimination in the blind, following disruption of occipital lobe function using TMS).
of perceptual events. Finally, we conclude with a section on visual tasks of higher complexity that
extended the operational definition to include the contribution of visual working memory and dual
task performance.
studies have provided general support to the hypothesis that peripheral regions of the visual field
have a different status for deaf individuals with respect to hearing controls. However, the actual
visual eccentricities associated with the terms “central,” “perifoveal,” and “peripheral” consider-
ably varied across the different studies. Researchers have referred to stimulus location as “central”
both when the stimulus was presented directly at fixation (e.g., Poizner and Tallal 1987) and when
it was perifoveal (e.g., Neville and Lawson 1987). More critically, the term “peripheral” has been
applied to locations in the visual field ranging from 3° of eccentricity (e.g., Chen et al. 2006) to 20°
or more (e.g., Colmenero et al. 2004; Loke and Song 1991; Stevens and Neville 2006). As pointed
out by Reynolds (1993), this ambiguity in the adopted terminology originates from the fact that the
boundaries of the foveal region (up to 1.5° from fixation) are well defined by anatomical structures,
whereas the distinction between perifoveal and peripheral visual field is not.
Finally, most researchers have suggested that spatial selective attention plays a key role in modulat-
ing visual responses in deaf individuals (e.g., Bavelier et al. 2006; Dye et al. 2008; Loke and Song 1991;
Neville and Lawson 1987; Parasnis and Samar 1985; Sladen et al. 2005). This suggestion originated
from the studies that examined attention orienting in deaf and hearing participants (e.g., Colmenero et
al. 2004; Parasnis and Samar 1985) and found that deaf individuals pay less of a cost when detecting a
target occurring at invalidly cued locations. Furthermore, a potential difference in selective attention
has been proposed by those studies that examined the interference of flankers on target discrimination
(Proksch and Bavelier 2002; Sladen et al. 2005) and found that deaf individuals were more suscep-
tible to peripheral flankers than hearing controls. Finally, the suggestion that employment of selective
attention resources is the key requisite for revealing differences between deaf and hearing participants
has emerged from the empirical observation that differences between deaf individuals and hearing
controls have sometimes emerged specifically when attention was endogenously directed to the target
(e.g., Bavelier et al. 2000; Neville and Lawson 1987; but see Bottari et al. 2008).
However, whether all aspects of visual enhancement in deaf individuals are necessarily linked to
allocation of selective attention in space is still a matter of debate. Furthermore, it is well acknowl-
edged that selective spatial attention is not a unitary mechanism, and at least two functionally and
anatomically distinct mechanisms of spatial attention have been identified (Corbetta and Shulman
2002; Jonides 1981; Mayer et al. 2004; Posner 1980). Visual attention can be oriented to an object
or a location in a bottom-up fashion, because an abrupt change in visual luminance at the retinal
level has occurred in a specific region of the visual field. This type of attention orienting is entirely
automatic and has typically been referred to as exogenous orienting. Alternatively, visual attention
can be summoned to an object or a location because of its relevance for the behavioral goal of the
individual. This type of top-down attention orienting is voluntary and strategic, and has typically
been referred to as endogenous orienting. Whether one or both of the components of selective atten-
tion are changed as a consequence of deafness remains an open question. Thus, whenever the claim
that “early deafness results in a redistribution of attentional resources to the periphery” is made
(e.g., Dye et al. 2008, p. 75), one should also ask which aspect of selective attention (endogenous,
exogenous, or both) is changed by profound deafness.
In sum, four distinct cross-cutting aspects may help explain the heterogeneity of the
empirical results in the different behavioral tasks: diversity in the deaf sample characteristics, visual
characteristics of the target stimulus, target eccentricity, and role of selective spatial attention. The
second aim of the present review is to reevaluate the empirical evidence in support of these four
different (but possibly interrelated) aspects in modulating visual abilities in deaf individuals.
(11 years old on average) for two circular patches of white light presented at 4.8° of eccentricity,
on opposite sides with respect to the participant’s body midline. Initially, the just noticeable dif-
ference (JND) between the two patches was measured for each participant. Then, brightness for
one of the two stimuli (variable) was set to 0.75 JND units above or equal to the other (standard),
and participants were instructed to indicate whether the variable stimulus was brighter or equal in
apparent brightness with respect to the standard. In the latter task, the probability that the variable
stimulus was brighter than the standard changed between blocks, from less likely (0.25), to equal
(0.50), to more likely (0.75). Deaf and hearing participants showed comparable JNDs for brightness
discrimination. However, the sensitivity of deaf participants in the forced-choice task was better than
that of hearing controls, as measured by d′. Intriguingly, deaf performance was entirely unaffected by the probability
manipulation (i.e., deaf participants maintained a stable criterion, as measured by β), unlike hearing
controls who became more liberal in their criterion as stimulus probability increased. However, the
same two groups of participants showed comparable sensitivity (d′) when retested in a second study
with largely comparable methods (Bross 1979b). In addition, in one further study adapting the same
paradigm for visual-flicker thresholds, no difference between deaf and hearing controls emerged in
terms of d′ or β (Bross and Sauerwein 1980). This led Bross and colleagues (Bross 1979a, 1979b;
Bross and Sauerwein 1980) to conclude that no enhanced sensory sensitivity is observed in deaf
children, in disagreement with the sensory compensation hypothesis.
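The signal detection measures used by Bross and colleagues, sensitivity (d′) and response criterion (β), can be derived from hit and false-alarm rates. A minimal sketch with illustrative rates (not Bross's actual data):

```python
from statistics import NormalDist

def dprime_beta(hit_rate, fa_rate):
    """Signal detection measures from a yes/no task: sensitivity (d') is the
    distance between the z-transformed hit and false-alarm rates; the
    criterion (beta) is the likelihood ratio of the standard normal
    densities at those two z-values."""
    std = NormalDist()
    z_hit, z_fa = std.inv_cdf(hit_rate), std.inv_cdf(fa_rate)
    d_prime = z_hit - z_fa
    beta = std.pdf(z_hit) / std.pdf(z_fa)
    return d_prime, beta

# Illustrative rates: 80% hits, 20% false alarms
d, b = dprime_beta(0.80, 0.20)
print(round(d, 2))  # 1.68
print(round(b, 2))  # 1.0 (symmetric rates imply an unbiased criterion)
```

A stable β across probability blocks, as Bross observed in deaf children, would show up here as an unchanged likelihood ratio despite shifting stimulus probabilities.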
Finney and Dobkins (2001) reached a similar conclusion when measuring contrast sensitivity
to moving stimuli in 13 congenital or early deaf adult participants (all signers), 14 hearing subjects
with no signing experience, and 7 hearing subjects who signed from birth [Hearing Offspring of
Deaf parents (HOD)]. Stimuli were black and white moving sinusoidal gratings presented for 300
ms to the left or to the right of one visual marker, and the participant’s task was to report whether the
stimulus appeared to the left or to the right of the marker. Five markers were visible throughout the
task (the central fixation cross and four dots located at 15° of eccentricity with respect to fixation).
The stimulus could appear next to any of the five markers, thus forcing participants to distribute
their visual attention across several visual locations. The luminance contrast required to yield 75%
correct performance was measured for each participant across a range of 15 different combina-
tions of spatial and temporal frequency of the stimulus. Regardless of all these manipulations,
deaf, hearing, and HOD participants performed comparably on both central and peripheral stimuli,
leading to the conclusion that neither deafness nor sign-language use leads to overall increases or
decreases in absolute contrast sensitivity (Finney and Dobkins 2001, p. 175). Stevens and Neville
(2006) expanded this finding by showing that contrast sensitivity was comparable in 17 congenitally
deaf and 17 hearing individuals, even for stimuli delivered within the macular region, at 2°
around visual fixation (see also Bavelier et al. 2000, 2001, for further evidence of comparable lumi-
nance change detection in deaf and hearing individuals). Interestingly, a between-group difference
was instead documented when the task was changed to unspeeded detection of a small (1 mm) white
light, moving from the periphery to the center of the visual field. In this kinetic perimetry task,
deaf participants showed an enlarged field of view (about 196 cm²) with respect to hearing controls
(180 cm²), regardless of stimulus brightness.
The latter finding suggests that perceptual thresholds may differ for deaf and hearing individuals
when motion stimuli are employed. However, three further investigations (Bosworth and Dobkins
1999, 2002a; Brozinsky and Bavelier 2004) that examined the performance of deaf and hearing
participants in motion discrimination tasks indicate that this is not always the case. Bosworth and
Dobkins (1999) tested 9 congenital or early deaf (all signers) and 15 hearing (nonsigner) adults in
a motion direction–discrimination task. The stimulus consisted of a field of white dots presented
within a circular aperture, in which a proportion of dots (i.e., signal dots) moved in a coherent direc-
tion (either left or right), whereas the remaining dots (i.e., noise dots) moved in a random fashion.
Similar to the study of Finney and Dobkins (2001), stimuli were either presented at central fixa-
tion, or 15° to the left or to the right of fixation. Participants were instructed to report the direction
of motion with a key press, and the proportion of coherent motion signal yielding 75% correct
performance was measured for each participant. Mean thresholds did not differ between deaf and
hearing controls, regardless of stimulus eccentricity (central or peripheral), stimulus duration (250,
400, or 600 ms) and vertical location of the lateralized stimuli (upper or lower visual field). The
only between-group difference concerned the performance across the two visual hemifields. Deaf
participants exhibited a right visual field (RVF) advantage, whereas hearing controls exhibited a
slight left visual field (LVF) advantage. The latter finding, however, reflected the signing experience
rather than auditory deprivation, and resulted from the temporal coincidence between visual and
linguistic input in the left hemisphere of experienced signers, as subsequently shown by the same
authors (Bosworth and Dobkins 2002b). A convergent pattern of results emerged from the study by
Bosworth and Dobkins (2002a), in which 16 deaf signers (12 congenital), 10 hearing signers, and 15
hearing controls were asked to detect, within a circular aperture, the direction of motion of a pro-
portion of dots moving coherently (leftward or rightward), whereas the remaining dots moved in a
random fashion. The proportion of dots moving coherently varied across trials, to obtain a threshold
for the proportion of coherently moving dots necessary to yield 75% correct discriminations. The
results showed that all groups of participants performed comparably in terms of thresholds, suggest-
ing that deafness does not modulate the motion threshold.
Convergent findings also emerged from a study by Brozinsky and Bavelier (2004), in which 13
congenitally deaf (signers) and 13 hearing (nonsigner) adults were asked to detect velocity increases
in a ring of radially moving dots. On each trial, dots accelerated in one quadrant and participants
indicated the location of this velocity change in a four-alternative forced choice. Across experi-
ments, the field of dots extended between 0.5° and 8°, or between 0.4° and 2° (central field), or
between 12° and 15° (peripheral field). The temporal duration of the velocity change yielding
79% correct was measured for each participant. Regardless of whether the dots moved centrally or
peripherally, velocity thresholds were equivalent for deaf and hearing individuals. Similar to the
study by Bosworth and Dobkins (1999), deaf signers displayed better performance in the RVF than
the LVF, again as a possible result of their fluency in sign language.
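The 79% correct level measured by Brozinsky and Bavelier (2004) is the convergence point of a standard 3-down-1-up adaptive staircase, which tracks approximately 79.4% correct (Levitt's rule: the tracked p satisfies p³ = 0.5). The source does not describe their exact procedure; the following is a generic sketch of the rule applied to a fixed, illustrative response sequence:

```python
def staircase_update(start_level, responses, step):
    """3-down-1-up rule: after 3 consecutive correct responses the stimulus
    level decreases by `step`; after any error it increases by `step`.
    Returns the level in force on each trial."""
    level, run, track = start_level, 0, []
    for correct in responses:
        track.append(level)
        if correct:
            run += 1
            if run == 3:       # three in a row: make the task harder
                level -= step
                run = 0
        else:                  # any error: make the task easier
            level += step
            run = 0
    return track

# Fixed illustrative response sequence (1 = correct, 0 = error)
levels = staircase_update(20.0, [1, 1, 1, 1, 1, 1, 0, 1, 1, 1], 2.0)
print(levels)  # [20.0, 20.0, 20.0, 18.0, 18.0, 18.0, 16.0, 18.0, 18.0, 18.0]
```

In practice the threshold estimate is taken as the mean level at the staircase's reversal points.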
Equivalent performance in deaf and hearing individuals has also been documented when
assessing temporal perceptual thresholds (e.g., Bross and Sauerwein 1980; Poizner and Tallal
1987; Nava et al. 2008; but see Heming and Brown 2005). Poizner and Tallal (1987) conducted a
series of experiments to test temporal processing abilities in 10 congenitally deaf and 12 hearing
adults. Two experiments examined flicker fusion thresholds for a single circle flickering on and
off at different frequencies, or for two circles presented in sequence with variable interstimulus
interval (ISI) (Poizner and Tallal 1987; Experiments 1 and 2). One additional experiment tested
temporal order judgment abilities for pairs or triplets of visual targets presented in sequence
(Poizner and Tallal 1987; Experiment 3). All visual targets appeared from the same central spatial
location on the computer screen and participants were asked to report the correct order of target
appearance. No difference between deaf and hearing participants emerged across these tasks.
More recently, Nava et al. (2008) tested 10 congenital or early deaf adults (all signers), 10 hearing
controls who were auditorily deprived during testing, and 12 hearing controls not subjected to
any deprivation procedure, in a temporal order judgment task for pairs of visual stimuli presented at
perifoveal (3°) or peripheral (8°) visual eccentricities. Regardless of stimulus eccentricity, tem-
poral order thresholds (i.e., JNDs) and points of subjective simultaneity did not differ between
groups. Notably, however, faster discrimination responses were systematically observed in deaf
than hearing participants, especially when the first of the two stimuli appeared at peripheral loca-
tions (Nava et al. 2008).
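The temporal order measures reported by Nava et al., the JND and the point of subjective simultaneity (PSS), are conventionally extracted by fitting a cumulative Gaussian to the proportion of "second stimulus first" responses as a function of SOA. A hedged sketch (not the authors' actual fitting routine) using a probit linearization with noise-free synthetic data:

```python
from statistics import NormalDist

def toj_fit(soas, p_first):
    """Probit linearization: z(p) is linear in SOA with slope 1/sigma and
    intercept -PSS/sigma. A least-squares line recovers the PSS (50% point)
    and the JND (taken here as 0.6745 * sigma, the SOA range between the
    50% and 75% points of the fitted curve)."""
    z = [NormalDist().inv_cdf(p) for p in p_first]
    n = len(soas)
    mx, mz = sum(soas) / n, sum(z) / n
    slope = sum((x - mx) * (y - mz) for x, y in zip(soas, z)) / \
            sum((x - mx) ** 2 for x in soas)
    intercept = mz - slope * mx
    sigma = 1.0 / slope
    return -intercept / slope, 0.6745 * sigma  # (PSS, JND)

# Synthetic responses generated from a model with PSS = 10 ms, sigma = 40 ms
model = NormalDist(mu=10, sigma=40)
soas = [-60, -30, 0, 30, 60]
p = [model.cdf(s) for s in soas]
pss, jnd = toj_fit(soas, p)
print(round(pss, 1), round(jnd, 1))  # 10.0 27.0  (0.6745 * 40 ≈ 27)
```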
Finally, one study testing perceptual threshold for frequency discrimination in the tactile modal-
ity also confirmed the conclusion of comparable perceptual thresholds in deaf and hearing individu-
als (Levanen and Hamdorf 2001). Six congenitally deaf (all signers) and six hearing (nonsigners)
adults were asked to decide whether the frequency difference between a reference stimulus (at 200
Hz) and a test stimulus (changing in interval between 160 and 250 Hz) was “rising” or “falling.”
The frequency difference between the two stimuli that yielded 75% correct responses was measured
for each participant. Although the frequency difference threshold was numerically smaller for deaf
than hearing participants, no statistically significant difference emerged.
In sum, the studies that have adopted perceptual thresholds to investigate the consequences of
deafness on vision and touch (i.e., used an operational definition of better performance in terms of
better low-level sensitivity to the stimulus) overall documented an entirely comparable performance
between deaf and hearing individuals. Importantly, these findings emerged regardless of whether
hearing-impaired participants were congenitally deaf born from deaf parents or early deaf. One
clear example of this is the comparison between the study by Poizner and Tallal (1987) and Nava
et al. (2008), which tested genetically deaf versus early deaf participants on a comparable temporal order judgment
task, and converged to the same conclusion. The absence of a difference at the perceptual level also
emerged regardless of stimulus feature and eccentricity, i.e., regardless of whether target stimuli
were static (e.g., Bross 1979a, 1979b) or moving (e.g., Bosworth and Dobkins 1999; Brozinsky and
Bavelier 2004), and regardless of whether they appeared at central (e.g., Bosworth and Dobkins
1999; Brozinsky and Bavelier 2004; Poizner and Tallal 1987; Stevens and Neville 2006) or periph-
eral locations (e.g., Bosworth and Dobkins 1999; Brozinsky and Bavelier 2004; Nava et al. 2008).
Finally, making the stimulus location entirely predictable (Bross 1979a; Poizner and Tallal 1987) or
entirely unpredictable (e.g., Bosworth and Dobkins 1999; Brozinsky and Bavelier 2004) also had no
effect, indicating that comparable performance of deaf and hearing participants was not modulated
by the direction of selective visual attention in the scene. The only notable discrepancy with respect
to this very consistent pattern of results is the observation of Stevens and Neville (2006) that deaf
individuals possess a larger field of view with respect to hearing controls in the kinetic perimetry
task. It would be interesting to examine whether this finding can also be replicated with stationary
targets at the extreme visual periphery.
of the responding finger (the purpose of the simultaneous bilateral response was to balance hemi-
spheric motoric activity in the task). Perifoveal targets consisted of six simple shapes (e.g., circle,
square, triangle, diamond) that could be presented alone or simultaneously with task-irrelevant
shapes of increasing complexity (from basic shapes to human faces or letters) delivered at fixation.
Immediately after stimulus detection, participants were also required to identify the shape of the
peripheral stimulus. Two results are noteworthy: first, simple detection of the foveal circle (baseline
task) was faster for deaf than hearing participants (70 ms on average); second, simple detection and
subsequent discrimination of the peripheral shapes also confirmed faster RTs for deaf than hearing
participants (56 ms), but failed to show any between-group difference in identification accuracy (see
Section 22.2.4 for further discussion of this study).
More recently, Bottari et al. (in preparation) asked 11 congenital or early deaf (all signers) and 11
hearing adults (nonsigners) to press the space bar of the computer keyboard in response to the appearance
of a small black circle, delivered for 48 ms on the computer screen at 3° or 8° of eccentricity. The results
showed that deaf participants were faster than hearing controls (56 ms on average) at detecting the onset of the
visual target, regardless of whether it appeared at 3° or 8°. Similarly, Bottari et al. (2010) asked a
different group of 11 congenital or early deaf (all signers) and 11 hearing controls (nonsigners) to
detect a circle open on the left or right side, presented for 48 ms at 3° or 8° from central fixa-
tion. Stimuli were now corrected in size as a function of their eccentricity, and trials per condition
were increased from 24 to 96 to increase statistical power. The results of this second study entirely
supported those of Bottari et al. (in preparation), and showed a response time advantage for deaf
over hearing participants (44 ms on average) that again was not spatially selective, i.e., it emerged
regardless of target location instead of appearing only for peripheral targets (cf. Loke and Song 1991).
One further finding of the study by Bottari and colleagues (2010) was that the overall RT advantage
for deaf participants emerged together with differential response time ratios in the two groups
as a function of target location. Hearing controls paid a significant RT cost when responding to
peripheral compared to central targets, whereas deaf individuals performed comparably across the two target
locations. This suggests that advantages in reactivity and advantages in peripheral processing may
be two dissociable aspects of enhanced visual processing in deaf individuals (see Section 22.3.3 for
further discussion of this point).
Other studies measuring speeded simple detection or speeded target lateralization in deaf people
also manipulated the direction of attention before target onset, typically adapting the cue–target
paradigm developed by Posner (1980). The first study to adopt this manipulation was conducted by
Parasnis and Samar in 1985. They tested 20 hearing and 20 congenitally deaf college students (all
signers and born from deaf parents) in a task requiring a speeded bimanual response (see Reynolds
1993) to indicate the side of a black unfilled circle, presented for 100 ms at 2.2° from central fixation.
The stimulus was preceded by an arrow indicating the correct target side 80% of the times, or by a
neutral cross signaling equal probability of the target on either side. In addition, across blocks, the
peripheral target was presented with concurrent stimulation at fixation (five black crosses; i.e., foveal
load condition) or alone (no load condition). Unlike the simple detection studies described above, the
results of this experiment showed no overall RT advantage for deaf over hearing participants (in fact,
there was even a trend for slower RTs in deaf than hearing participants overall). Furthermore, all participants
showed RT benefits and costs, with respect to the neutral trials, when the target appeared at the cued
or the uncued location, respectively. However, deaf participants paid a smaller cost than hearing controls
when responding to targets at the uncued locations under the foveal load condition. Parasnis and
Samar (1985) interpreted this finding as evidence of more efficient “redirecting of attention from one
part of the visual field to another in the presence of interfering foveal stimulation,” and concluded
that “developmental experience involving a visual–spatial language and/or a predominantly visual
(as contrasted with visual plus auditory) perception of the world leads to selective and ecologically
useful alterations in attentional control of perceptual processes” (Parasnis and Samar 1985, p. 321).
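The costs and benefits analyzed in cue–target paradigms of this kind are conventionally defined relative to the neutral-cue baseline: a benefit is the speed-up on validly cued trials, a cost the slow-down on invalidly cued trials. A minimal illustration with hypothetical RTs (not data from Parasnis and Samar 1985):

```python
def cue_effects(rt_valid, rt_neutral, rt_invalid):
    """Posner cueing effects relative to the neutral baseline (all in ms):
    benefit = neutral - valid (faster when the cue was correct),
    cost    = invalid - neutral (slower when attention must be redirected)."""
    return {"benefit": rt_neutral - rt_valid, "cost": rt_invalid - rt_neutral}

# Hypothetical mean RTs for valid, neutral, and invalid trials
print(cue_effects(310, 340, 385))  # {'benefit': 30, 'cost': 45}
```

On this scheme, the reduced invalidity cost reported for deaf participants under foveal load corresponds to a smaller `cost` term at comparable `benefit`.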
The results and conclusions of the classic study by Parasnis and Samar (1985) created the basis
for the widespread notion that attention reorienting is more efficient in deaf than hearing individuals.
However, two further contributions that also examined simple detection of visual stimuli in the
presence of attentional cues suggest a more complex framework. Colmenero et al. (2004) asked 17
deaf (all signers with prelingual deafness) and 27 hearing adults to press a key whenever an “O”
appeared on the computer screen. The target appeared for 150 ms, at 20° of eccentricity to the left
or the right of central fixation, and was preceded by a vertical mark delivered at the exact target
location (valid condition, 53% of the trials), on the opposite side with respect to the target (invalid
condition, 13% of the trials) or on both sides (neutral condition, 33% of the trials). Stimulus onset
asynchrony (SOA) between cue and target ranged between 125 and 250 ms. Note that the use of
peripheral informative cues in this paradigm inevitably mixed exogenous and endogenous cueing of
attention within the same task. Deaf participants were faster than hearing controls at detecting the
target (43 ms on average). Furthermore, the analysis of RT costs and benefits, for invalid and valid
cues, respectively, revealed that both attentional effects were larger in hearing than deaf partici-
pants. In a second experiment, Colmenero and colleagues (2004) examined whether performance
in the two groups differed when the SOA between the lateralized cue and the target was extended to
350 or 850 ms. With such long SOAs, hearing individuals typically show a cost at detecting targets
occurring at the cued location, which is interpreted as inhibition against reexploring locations where atten-
tion has been previously oriented [i.e., inhibition of return (IOR); Klein 2000]. The results of this
second experiment revealed less enduring IOR in deaf than in hearing participants, again suggest-
ing a different role of attention orienting in the hearing-deprived population.
Chen et al. (2006) asked 16 congenitally deaf and 22 hearing adults to detect the occasional
appearance of a dot, presented at perifoveal locations (3°; see also Section 22.2.4 for a full descrip-
tion of the design of this study). The dot appeared with equal probability to the right or to the left
of fixation and was preceded by a valid or invalid exogenous cue. As in the study of Colmenero et
al. (2004), the SOA between the lateralized cue and the target was in the typical range for IOR (i.e.,
900 ms). Although IOR effects were again observed, these did not differ between the two groups.
However, the results revealed that detection of perifoveal targets was systematically faster in deaf
than in hearing participants (59 ms on average) regardless of the attention condition (i.e., valid or
invalid; Chen et al. 2006, Experiment 1).
In sum, two relevant aspects emerge from the studies that adopted an operational definition of bet-
ter visual performance in deaf individuals in terms of enhanced reactivity to the stimulus. First, all
reports (with the sole exception of the speeded lateralization study by Parasnis and Samar 1985) docu-
mented a response speed advantage for deaf over hearing individuals. Figure 22.1 summarizes this
result graphically, by plotting the percentage difference in RTs between hearing and deaf participants
with respect to the mean RT of the hearing group, in the different studies and as a function of stimulus
eccentricity. With the sole exception of point [3] corresponding to the study by Parasnis and Samar
(1985), all data points are above zero, indicating that deaf participants were faster than the hearing
controls (on average, 13% faster with respect to the hearing group; see legend to Figure 22.1 for exact
RT differences in milliseconds). Importantly, this response advantage in deaf participants emerged
regardless of whether the target appeared directly at fixation or at locations further toward the periph-
ery. This supranormal performance of deaf individuals in terms of response speed was also uninflu-
enced by the preceding attention cueing condition (e.g., Colmenero et al. 2004; Chen et al. 2006).
The second relevant aspect concerns the effect of attentional instructions on the performance of
deaf people. Deaf participants can benefit from valid cueing of spatial selective attention (Parasnis
and Samar 1985), but at the same time there is evidence that their performance may be less suscep-
tible to invalid attention orienting (e.g., Parasnis and Samar 1985; Colmenero et al. 2004) or IOR
(Colmenero et al. 2004; but see Chen et al. 2006) than hearing controls.
FIGURE 22.1 Difference in RT between hearing and deaf individuals (expressed as a percentage of mean
RT of hearing group) across different studies, as a function of target eccentricity (in degrees). Multiple data
points from the same study (e.g., see point [2]) refer to targets at different eccentricities. Positive values on
Y-axis indicate faster response time in deaf than in hearing controls. Foveal (up to 1.5°), perifoveal (from 1.5°
to 5°), and peripheral eccentricities (beyond 5°) are indicated in plot by shadings of different hues of gray.
However, note that only boundaries of foveal visual field are clearly specified by anatomical landmarks; thus,
the distinction between perifoveal and peripheral regions is instead conventional (we adopted here the distinction proposed by Reynolds 1993; see Section 22.1.2). Actual RT differences are as follows: [1] Reynolds (1993):
70 ms at 0°, 56 ms at 4°; [2] Loke and Song (1991): 38 ms at 0.5°, 85 ms at 25°; [3] Parasnis and Samar (1985):
−58 ms at 2.2°; [4] Chen et al. (2006): 59 ms at 3°; [5] Colmenero et al. (2004): 43 ms at 20°; [6] Bottari et al.
(in preparation): 52 ms at 3°, 59 ms at 8°; [7] Bottari et al. (2010): 54 ms at 3°, 59 ms at 8°.
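The measure plotted in Figure 22.1 is simply the RT difference normalized by the hearing group's mean RT. The sketch below illustrates the computation; the group means are hypothetical and not taken from any of the reviewed studies, chosen only to show how a ~54 ms advantage maps onto a ~13% difference.

```python
def rt_advantage_percent(rt_hearing_ms: float, rt_deaf_ms: float) -> float:
    """Percentage RT difference, expressed relative to the hearing group's mean RT.

    Positive values mean the deaf group responded faster.
    """
    return 100.0 * (rt_hearing_ms - rt_deaf_ms) / rt_hearing_ms

# Hypothetical group means (not from any of the reviewed studies):
print(round(rt_advantage_percent(415.0, 361.0), 1))  # → 13.0
```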
perception literature, visual search tasks have classically been employed to distinguish perceptual
processes requiring attention from perceptual processes occurring preattentively. When response
time is unaffected by the number of distractors in the array, the search is typically described as
preattentive (i.e., it does not require an attention shift to the target in order to produce the response). By
contrast, when response time increases as a function of the number of distractors in the array, the
search is assumed to require serial attention shifts to the various items (Treisman 1982).
Henderson and Henderson (1973) were the first to compare the abilities of deaf and hearing
children (12.5 to 16.5 years old) in a visual search task that required searching for a target letter
in a letter array containing capital and lowercase letters. Although they found that the two groups
did not differ in the visual search task, it should be noted that the high similarity between the
target and the distractors inevitably imposed a serial search in both groups. Several years later,
Stivalet and colleagues (1998) also adopted a visual search task to examine visual processing in
congenitally deaf and hearing adults. Unlike Henderson and Henderson (1973), they manipulated
the complexity of the search by asking participants to detect the presence or absence of a Q among
Os (easier search, because the target contains a single identifying feature) or of an O among Qs (harder search, because the target lacks a feature present in the distractors). Moreover,
to obtain a measure of visual processing time, which could be separate from the time required
for motor program retrieval and response initiation/execution, all stimuli were masked after a
variable interval and the dependent variable was the duration of the interval between stimuli and
mask sufficient to reach 90% correct. Notably, all stimuli were presented within the perifoveal
region, at an eccentricity ranging between 4.1° and 4.9°. When searching for Q among Os (easier
search), both groups performed a parallel search that was unaffected by the number of distractors
(4, 10, or 16). By contrast, when searching for an O among Qs (harder search), deaf adults proved
substantially more efficient than hearing controls, with their visual search time (9 ms/letter) fall-
ing within the range of parallel processing (Enns and Rensink 1991), unlike hearing participants
(22 ms/letter).
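The parallel-versus-serial criterion described above amounts to estimating the slope of the RT-by-set-size function. A minimal sketch, using stdlib Python only; the RT values are hypothetical, chosen purely to reproduce the 9 and 22 ms/letter slopes reported by Stivalet et al. (1998).

```python
def search_slope_ms_per_item(set_sizes, mean_rts_ms):
    """Least-squares slope of mean RT against display set size (ms per item)."""
    n = len(set_sizes)
    mx = sum(set_sizes) / n
    my = sum(mean_rts_ms) / n
    num = sum((x - mx) * (y - my) for x, y in zip(set_sizes, mean_rts_ms))
    den = sum((x - mx) ** 2 for x in set_sizes)
    return num / den

# Hypothetical mean RTs at set sizes 4, 10, 16 (as in Stivalet et al. 1998).
# A slope near 9 ms/item falls within the parallel-search range, whereas
# ~22 ms/item indicates serial attention shifts.
sizes = [4, 10, 16]
deaf_like = [620, 674, 728]      # slope: 9 ms/item
hearing_like = [640, 772, 904]   # slope: 22 ms/item
print(search_slope_ms_per_item(sizes, deaf_like))     # → 9.0
print(search_slope_ms_per_item(sizes, hearing_like))  # → 22.0
```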
Further evidence along the same direction came from a visual search study by Rettenbach and
colleagues (1999). They tested eight deaf and eight hearing adults in a series of visual search tasks of
different complexity. Unlike the study by Stivalet and colleagues (1998), the stimuli covered a wide
visual angle, both vertically (20°) and horizontally (26°), thus spanning from central to peripheral
locations. The results revealed more efficient visual search in deaf than hearing adults. Interestingly,
when the same study was repeated in children and adolescents, deaf participants systematically underperformed with respect to age-matched hearing controls (see also Marendaz et al. 1997), suggesting that the different visual search abilities of deaf individuals may follow a distinct developmental trajectory.
In sum, the studies that evaluated visual search abilities in deaf and hearing controls indicate that the range for parallel processing is broader in deaf individuals than in hearing controls (Stivalet et al. 1998;
Rettenbach et al. 1999). Furthermore, this enhanced visual ability appears to be independent of the
spatial location of the stimuli, as it emerged for perifoveal (Stivalet et al. 1998) as well as periph-
eral stimuli (Rettenbach et al. 1999). However, the reconciliation of visual search findings with the
observation of less susceptibility of deaf participants to invalid cueing or IOR (e.g., Parasnis and
Samar 1985; Colmenero et al. 2004) is not straightforward. As we shall discuss later (see Section
22.3.3), assuming that both visual search and cueing effects can be accounted for by faster reorient-
ing of attention implies a description of better visual search in deaf individuals in terms of faster and
more efficient movement of the attention spotlight in space. This interpretation, however, is at odds
with the description of better search as being the result of preattentive processing.
unfamiliar Greek trigrams, suggesting that any discrimination difference between groups reflected
linguistic rather than perceptual difficulties.
A seminal work that adopted a visual discrimination task was conducted by Neville and Lawson in 1987. In that study, behavioral and ERP responses were recorded while 12 congenitally deaf adults (all signers, with at least one deaf parent) and 12 age-matched hearing controls performed a
discrimination of direction of motion for suprathreshold visual stimuli. Visual stimuli were white
squares presented at central (just above fixation) or peripheral locations (18° to the right or to the left
of central fixation), with an ISI from trial onset ranging randomly between 280 and 480 ms. On 80%
of the trials (termed “standards”), a single square appeared at one of these predetermined locations
for 33 ms. On the remaining 20% of the trials (termed “deviants”), the square jumped slightly to
one of eight possible immediately adjacent locations after the first 33 ms. The participant’s task con-
sisted in discriminating the direction of this moving square in deviant trials. Importantly, although
participants fixated centrally throughout the experimental session, they were also requested to ori-
ent their attention to one of the three possible target locations (center, left, or right) across blocks.
In terms of behavioral performance, deaf participants were faster than hearing controls (by 70 ms on average) at discriminating moving targets at the peripheral locations; by contrast, no between-group difference in RT emerged for targets occurring at central locations. The two groups performed comparably in terms of overall sensitivity (d′), although hemifield asymmetries reversed: hearing individuals showed better discrimination ability in the RVF than the LVF, whereas deaf participants showed the opposite pattern. In terms of EEG response,
three main findings were reported. First, the visual evoked component, termed P1 (i.e., positivity
peaking at about 100 ms after the stimulus presentation), was comparable between groups regard-
less of whether the stimulus was standard or deviant, and regardless of stimulus location and atten-
tion condition. Second, a larger amplitude in the N1 component emerged in deaf than in hearing
controls, when standard or deviant targets appeared at attended peripheral locations. These greater
increases in cortical response due to attentional engagement in deaf than hearing controls were
recorded over the occipital electrodes and in the left parietal and temporal regions. Third, the over-
all amplitude of N1 was larger over the right than left hemisphere in hearing controls, but larger
over the left than right hemisphere in deaf individuals. VEPs in response to central standards and
targets were instead comparable between groups. In summary, the results of the study by Neville and Lawson (1987) suggested that deaf individuals can outperform hearing individuals in terms of reactivity (but not sensitivity) when discriminating the direction of motion for targets presented at peripheral locations. In addition, because VEP differences emerged in response to both static and moving stimuli
(i.e., standard and targets, respectively) specifically in the condition of attentional engagement to
peripheral locations, Neville and Lawson (1987) concluded that deafness modulates the neural sys-
tem that mediates spatial attention. However, later empirical evidence has shown that a similar N1
modulation can also be documented for targets monitored under distributed attention (Armstrong et al. 2002), thus challenging the conclusion that differences between deaf and hearing controls emerge selectively under conditions of focused attention.
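The sensitivity measure (d′) that dissociates discrimination ability from response speed in studies like Neville and Lawson (1987) is computed from hit and false-alarm rates. A minimal sketch with hypothetical rates (not data from any reviewed study):

```python
from statistics import NormalDist

def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index: d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical rates: two groups can have equal d' yet differ in RT,
# which is the reactivity/sensitivity dissociation discussed above.
print(round(d_prime(0.84, 0.16), 2))  # → 1.99
```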
Another study that evaluated discrimination performance in deaf and hearing participants
adopting moving stimuli was conducted by Bosworth and Dobkins (2002a; see also Bosworth and
Dobkins 2002b). These authors evaluated 16 profoundly deaf signers (12 congenital), 10 hearing
signers, and 15 hearing nonsigners in a direction-of-motion discrimination task. Participants were
required to discriminate the direction of motion of coherent moving dots presented among random
moving dots, within a single or multiple displays appearing in one or all the quadrants of the moni-
tor. The coherent motion threshold for each participant was the number of coherently moving dots
that yielded 75% correct discriminations. In addition to the number of presented displays, two other
conditions were manipulated: the presence or absence of endogenous cueing (a 100% predictive
spatial cue, delivered before display presentation) and stimulus duration (200 or 600 ms). Results
showed no overall better performance in deaf than hearing participants when discriminating direc-
tion of motion. Intriguingly, deaf individuals tended to be faster yet less accurate than the other
groups, suggesting a possible speed–accuracy trade-off in deaf but not hearing participants. The
analyses also revealed that direction-of-motion thresholds were less affected by cueing of attention
in deaf individuals than in hearing controls (regardless of signing abilities). Furthermore, when the
stimuli lasted for 600 ms, performance for the deaf group paradoxically improved with multiple
rather than single displays, unlike hearing participants. Both these findings may indicate better
capture of attention by a discontinuity in a complex visual scene in deaf than hearing participants,
given enough time for the perceptual analysis.
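The coherent-motion threshold described above is the stimulus level at which accuracy reaches a fixed criterion (75% correct in Bosworth and Dobkins 2002a). A simple linear-interpolation sketch over hypothetical psychometric data; actual studies typically fit a full psychometric function or use an adaptive staircase.

```python
def threshold_at(levels, prop_correct, criterion=0.75):
    """Linearly interpolate the stimulus level yielding the criterion accuracy."""
    for i in range(len(levels) - 1):
        c0, c1 = levels[i], levels[i + 1]
        p0, p1 = prop_correct[i], prop_correct[i + 1]
        if p0 <= criterion <= p1:
            return c0 + (criterion - p0) * (c1 - c0) / (p1 - p0)
    raise ValueError("criterion accuracy not spanned by the measured points")

# Hypothetical proportion-correct values at increasing motion-coherence
# levels (fraction of dots moving coherently).
coh = [0.05, 0.10, 0.20, 0.40]
acc = [0.55, 0.65, 0.85, 0.98]
print(round(threshold_at(coh, acc), 3))  # → 0.15
```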
Finally, in a recent study conducted in our laboratory (Bottari et al. 2010), we asked 11 congenitally or early deaf participants and 11 hearing controls to perform a speeded shape discrimination for visual targets
presented at one of eight possible locations (at 3° or 8° from central fixation). Targets were open circles
lasting for 48 ms and participants were required to discriminate whether the circle was open on the left
or on the right side. The results of this study showed comparable performance between deaf and hearing individuals in terms of the RT measure, although deaf participants showed numerically faster RTs. Interestingly, deaf individuals performed worse than hearing controls in terms of accuracy, suggesting a different speed–accuracy trade-off in the deaf group (see also Bosworth and Dobkins 2002a).
In sum, the tasks requiring perceptual discrimination of suprathreshold stimuli did not provide consistent evidence in support of the notion of enhanced abilities in deaf compared with hearing controls. With static stimuli, better accuracy in deaf individuals than in hearing controls has been documented only for the discrimination of color changes (Suchman 1966). The studies that required shape discrimination for static visual events instead failed to show any enhanced abilities in deaf individuals (Hartung 1970; Bottari et al. 2010). When adopting moving stimuli,
faster RTs in deaf subjects than in hearing participants have been documented only by Neville
and Lawson (1987), selectively for events at peripheral locations. Instead, Bosworth and Dobkins
(2002a) showed an overall comparable performance between deaf and hearing controls when dis-
criminating coherence of motion.
(e.g., target: diamond; distractor: square), or else a neutral shape. Finally, a variable number (0, 1,
3, or 5) of filler shapes was introduced in the empty circular frames of the target ring to manipulate
perceptual load across trials. Participants were instructed to identify the target as quickly as possible, while ignoring all other distracting shapes. Overall, target identification took longer for deaf than for hearing participants (Experiment 1: 824 vs. 765 ms, respectively; Experiment 3: 814 vs. 703 ms). All experiments consistently revealed the interfering effect of perceptual load and lateralized
distractors on RT performance. Critically, however, peripheral distractors proved more distracting
for deaf individuals, whereas central ones were more distracting for hearing controls (regardless
of whether they were signers). This led Proksch and Bavelier (2002) to conclude that “the spatial
distribution of visual attention is biased toward the peripheral field after early auditory deprivation”
(p. 699).
A related study was conducted by Sladen and colleagues (2005), using the classic flanker interference task developed by Eriksen and Eriksen (1974). Ten early deaf adults (onset before 2 years of age, all signers) and 10 hearing adults were asked to perform a speeded identification of a letter (H or N) presented
either in isolation (baseline) or surrounded by four response-compatible letters (two on either side;
e.g., HHHHH) or response-incompatible letters (e.g., NNHNN). Letters were presented 0.05°, 1°, or
3° apart from each other. The results showed that letter discrimination was faster in hearing than in
deaf participants in each of the experimental conditions including the baseline (e.g., between 50 and
81 ms difference, for incompatible stimuli), but this was accompanied by more errors in the hearing
group during incompatible trials. Interestingly, the two groups also differed in their performance
with the 1° spacing between target and flankers: the incongruent flanker cost emerged for both
groups, but was larger in deaf than in hearing participants. Again, this finding is compatible with
the notion that deaf individuals may have learned to “focus their visual attention in front of them
in addition to keeping visual resources allocated further out in the periphery” (Sladen et al. 2005,
p. 1536).
The study by Chen et al. (2006), described in Section 22.2.2, also adopted a flanker interference paradigm. On each trial, participants were presented with a row of three horizontally aligned boxes, of which the central one contained the target and the side ones (located 3° on either side)
contained the distractors. The task required a speeded discrimination among four different colors.
Two colors were mapped onto the same response button, whereas the other two colors were mapped
onto a different response button. Simultaneously with target presentation, a flanker appeared in one of
the lateral boxes. The flanker was either identical to the target (thus leading to no perceptual conflict
and no motor response conflict), or different in color with respect to the target but mapped onto the
same motor response (thus leading only to a perceptual conflict) or different in color with respect
to the target and mapped onto a different response than the target (thus leading to perceptual and
motor conflict). Finally, spatial attention to the flanker was modulated exogenously by changing
the thickness and brightness of one of the lateral boxes at the beginning of each trial. Because the
time interval between this lateralized cue and the target was 900 ms, this attentional manipulation
created an IOR effect (see also Colmenero et al. 2004). Overall, color discrimination was compa-
rable between groups in terms of reaction times (see also Heider and Heider 1940). However, the
interference determined by the flankers emerged at different levels (perceptual vs. motor response)
in deaf and hearing participants, regardless of the cueing condition. Hearing participants displayed flanker interference effects at both the perceptual and the response levels. In contrast, deaf participants showed flanker interference effects at the response level, but not at the
perceptual level.
Finally, Dye et al. (2007) asked 17 congenitally deaf and 16 hearing adults to perform a speeded
discrimination about the direction of a central arrow (pointing left or right) presented 1.5° above or
below central fixation and flanked by peripheral distractors (other arrows with congruent or incon-
gruent pointing directions, or neutral lines without arrowheads). A cue consisting of one or two
asterisks presented 400 ms before the onset of the arrows oriented attention to central fixation, to
the exact upcoming arrow location, or to both potential arrow locations (thus alerting for stimulus
appearance without indicating the exact target location). The findings showed comparable effects
of orienting spatial cues in hearing and deaf individuals, as well as comparable alerting benefits.
Interestingly, when the number of flanker arrows was reduced to 2 and their relative distance from
the central arrow was increased to 1°, 2°, or 3° of visual angle, deaf participants displayed stronger
flanker interference effects in RTs compared to hearing controls.
In sum, the studies that measured allocation of attentional resources in the visual scene using
flanker interference tasks showed larger interference from distractors in deaf than in hearing partic-
ipants (Proksch and Bavelier 2002; Sladen et al. 2005; Chen et al. 2006; Dye et al. 2007). However,
although Proksch and Bavelier (2002) showed enhanced distractor processing in deaf than in hear-
ing adults at 4.2° from central fixation, Sladen et al. (2005) showed enhanced distractor processing
at 1° from central fixation, but comparable distractor processing at 3°. Finally, Dye et al. (2007)
showed increased flanker interference in deaf than in hearing controls regardless of whether the two
distracting items were located at 1°, 2°, or 3° from fixation. These mixed results suggest that some
characteristics of the visual scene and task, other than just the peripheral location of the distractors,
could play a role. These additional characteristics might include the degree of perceptual load, the
amount of crowding, or the relative magnification of the stimuli.
at 8°. On 50% of the trials, the second scene was entirely identical to the first (i.e., no change
occurred), whereas on the other 50% of the trials one drawing in the first scene changed into a dif-
ferent one in the second scene. The participant’s task was to detect whether the change was present
or absent. When comparing two alternating visual scenes, any change is typically detected without
effort because it constitutes a local transient that readily attracts exogenous attention to the location
where the change has occurred (O’Regan et al. 1999, 2000; Turatto and Bridgeman 2005). However,
if a blank image is interposed between the two alternating scenes (as in the adopted paradigm),
any single part of the new scene changes with respect to the previous blank image, resulting in a
global rather than local transient. The consequence of this manipulation is that attention is no longer
exogenously captured to the location of change, and the change is noticed only through a strategic
(endogenous) scan of the visual scene (the so-called “change blindness” effect; Rensink 2001). Thus,
the peculiarity of this design was the fact that all local transients related to target change or target
onset were entirely removed. This produced an entirely endogenous experimental setting, which
had never been adopted in previous visual tasks with deaf people (see Bottari et al. 2008 for further
discussion of this point). The result of two studies (Bottari et al. 2008, in preparation) revealed that
sensitivity to the change in deaf and hearing adults was comparable, regardless of change in location
(center or periphery), suggesting that the sensitivity to changes in an entirely transient-free context
is not modulated by deafness. Furthermore, this conclusion was also confirmed when the direction of endogenous attention was systematically manipulated between blocks by asking participants either to focus attention on specific regions of the visual field (at 3° or 8°) or to distribute spatial attention across the whole visual scene (Bottari et al. 2008). In sum, even visual tasks tapping
on multiple stages of nonlinguistic visual processing (and particularly visual working memory)
do not reveal enhanced processing in deaf than in hearing controls. Once again, the absence of
supranormal performance was documented regardless of the eccentricity of the visual stimulation.
Furthermore, the results of Bottari et al. (2008) indicate that focusing endogenous attention is not
sufficient to determine a between-group difference. It remains to be ascertained whether the latter result (which is at odds with the behavioral observation of Neville and Lawson 1987 and with the neural observation of Bavelier et al. 2000) might be the consequence of having removed from the
scene all target-related transients that could exogenously capture the participant’s attention.
A different class of complex visual tasks in which deaf individuals were compared to hearing
controls evaluated speech-reading abilities (also termed lip-reading). Initial studies on speech-reading suggested that this ability was considerably limited in hearing controls (30% of words or fewer correct in sentences, according to Rönnberg 1995) and that “the best totally deaf and hearing-impaired
subject often perform only as well as the best subjects with normal hearing” (Summerfield 1991,
p. 123; see also Rönnberg 1995). However, two later contributions challenged this view and clearly
showed that deaf individuals can outperform hearing controls in speech-reading tasks. Bernstein et
al. (2001) asked 72 deaf individuals and 96 hearing controls to identify consonant–vowel nonsense
syllables, isolated monosyllabic words and sentences presented through silent video recordings of a
speaker. The results showed that deaf individuals were more accurate than hearing controls, regard-
less of the type of the verbal material. In agreement with this conclusion, Auer and Bernstein (2007)
showed a similar pattern of results in a study that evaluated identification of visually presented sentences in even larger samples of deaf individuals and hearing controls (112 and 220, respectively). It is important to note that neither study included deaf individuals who used sign language as their preferential communication mode, thus linking these enhanced lip-reading skills to the extensive training that deaf individuals receive throughout their lives.
For the purpose of the present review, it is important to note that speechreading is a competence
that links linguistic and nonlinguistic abilities. Mohammed and colleagues (2005) replicated the
observation that deaf individuals outperform hearing controls in lip-reading skills. Furthermore,
they showed that the lip-reading performance of deaf individuals (but not hearing controls) correlated with the performance obtained in a classical motion coherence test (see also Bosworth and Dobkins 1999; Finney and Dobkins 2001), even though the overall visual motion thresholds were
entirely comparable between the two groups (in agreement with what we reported in Section 22.2.1).
In sum, lip-reading is a visual skill that has systematically been found to be enhanced in deaf individuals compared to hearing controls. Intriguingly, in deaf individuals this skill appears to be strongly interconnected with the ability to perceive motion in general, supporting the notion that visual motion perception has a special role in this sensory-deprived population.
associated channel enhancements with endogenous attention selection, but channel selection with
exogenous attention capture (see also Section 22.3.3).
participants will differ in their performance. Better performance in deaf than in hearing participants
has been documented with both central and peripheral stimuli (e.g., see Section 22.2.2). Conversely,
threshold tasks proved ineffective in showing between-group differences, regardless of whether
stimuli were delivered centrally or peripherally. Thus, the question of what exactly is special in the
representation of peripheral stimuli in deaf individuals has not yet been resolved.
One observation relevant to this problem may be the recent finding from our group that the differential processing of central and peripheral locations in deaf and hearing people emerges independently of the orienting of attention. Bottari et al. (2010) showed no RT cost for processing peripheral compared with central items in deaf participants, unlike hearing controls. Importantly, this
occurred in a task (simple detection) that requires no selective allocation of attentional resources
(Bravo and Nakayama 1992). This implies a functional enhancement for peripheral portions of the
visual field that cannot be reduced to the differential allocation of attentional resources alone (see
also Stevens and Neville 2006 for related evidence). Because the cost of peripheral relative to central processing in hearing controls is classically attributed to the greater number of visual neurons devoted to the analysis of central rather than peripheral portions of the visual field (e.g., Marzi and Di Stefano 1981; Chelazzi et al. 1988), it can be hypothesized that profound deafness modifies the relative proportion of neurons devoted to peripheral processing or their baseline activity. Note that assuming a different neural representation of the peripheral field also has implications for studies that examined the
effects of peripheral flankers on central targets (e.g., Proksch and Bavelier 2002; Sladen et al. 2005),
that is, it suggests that the larger interference from peripheral flankers in deaf individuals could at
least partially result from enhanced sensory processing of these stimuli, rather than attentional bias
to the periphery (similar to what would be obtained in hearing controls by simply changing the size
or the saliency of the peripheral flanker).
The final important aspect to consider is the role of selective attention in enhanced visual abili-
ties of deaf individuals. Our review of the literature concurs with the general hypothesis that deaf-
ness somehow modulates selective visual attention (e.g., Parasnis 1983; Neville and Lawson 1987;
Bavelier et al. 2006; Mitchell and Maslin 2007). However, it also indicates that any further devel-
opment of this theoretical assumption requires a better definition of which aspects of selective
attention are changed in this context of cross-modal plasticity. To date, even the basic distinction between exogenous and endogenous processes has largely been neglected. If this minimal distinction is applied, it appears that endogenous orienting alone does not necessarily lead to better behavioral performance in deaf than in hearing controls. This is, first of all, illustrated by the fact that
endogenous cueing of spatial attention (e.g., using a central arrow, as Parasnis and Samar 1985 have
done) can produce similar validity effects in deaf and hearing individuals. Furthermore, a recent
study by Bottari et al. (2008), which examined endogenous orienting of attention in the absence of
the exogenous captures induced by target onset, revealed no difference whatsoever between deaf
and hearing participants, regardless of whether attention was focused to the center, focused to the
periphery, or distributed across the entire visual scene. By contrast, several lines of evidence suggest
that the exogenous component of selective attention may be more prominent in deaf than in hearing
people. First, studies that adopted the cue–target paradigm have shown more efficient detection in deaf than in hearing adults when the target occurs in a location of the visual field that has been made unattended (i.e., invalid; see Parasnis and Samar 1985; Colmenero et al. 2004, Experiment 1; Bosworth and Dobkins 2002a). Second, paradigms that adopted an SOA between cue and target that can lead to IOR also revealed that deaf participants are less susceptible to this attention manipulation, responding more efficiently than controls to targets appearing at the supposedly inhibited location (e.g., Colmenero et al. 2004, Experiment 2). Finally, deaf participants appear
to be more distracted than hearing controls by lateralized flankers that compete with a (relatively)
more central target (Dye et al. 2008; Proksch and Bavelier 2002; Sladen et al. 2005), as if the flanker
onset in the periphery of the visual field can capture exogenous attention more easily.
In the literature on visual attention in deaf individuals, the latter three findings have been inter-
preted within the spotlight metaphor for selective attention (Posner 1980), assuming faster shifts of
visual attention (i.e., faster reorienting) in deaf than in hearing participants. However, this is not the
only way in which attention can be conceptualized. A well-known alternative to the spotlight meta-
phor of attention is the so-called gradient metaphor (Downing and Pinker 1985), which assumes a
peak of processing resources at the location selected (as a result of bottom-up or top-down signals)
as well as a gradual decrease of processing resources as the distance from the selected location
increases. Within this alternative perspective, the different performance in deaf participants during
the attention tasks (i.e., enhanced response to targets at the invalid locations, or more interference
from lateralized flankers) could reflect a less steep gradient of processing resources in the profoundly
deaf. Although it is premature to conclude in favor of one or the other metaphor of selective atten-
tion, we believe it is important to consider the implications of assuming one instead of the other. For
instance, the gradient metaphor could provide a more neurally plausible model of selective atten-
tion. If one assumes that reciprocal patterns of facilitation and inhibition in the visual cortex can
lead to the emergence of a saliency map that can contribute to the early filtering of bottom-up inputs
(e.g., Li 2002), the different distribution of exogenous selective attention in deaf individuals could
represent a modulation occurring at the level of this early saliency map. Furthermore, assuming a
gradient hypothesis may perhaps better reconcile the results obtained in the studies that adopted the
cue–target and flanker paradigms in deaf individuals, with the results showing more efficient visual
search pattern in this population. Within the gradient perspective, better visual search for simple
features or faster detection of targets at invalidly cued locations could both relate to more resources
for preattentive detection of discontinuities in deaf individuals.
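The contrast between the two metaphors can be made concrete with a toy gradient model: processing resources peak at the selected location and fall off with distance, with the steepness of the fall-off as the key parameter. Everything below (the Gaussian profile, the sigma values, the "deaf-like" label) is purely illustrative, not a fitted model of any data in the reviewed studies.

```python
import math

def resources(distance_deg: float, peak: float = 1.0, sigma_deg: float = 4.0) -> float:
    """Toy gradient model: processing resources decrease with distance
    from the attended location (Gaussian profile; sigma sets steepness)."""
    return peak * math.exp(-(distance_deg ** 2) / (2 * sigma_deg ** 2))

# A shallower gradient (larger sigma, labelled "deaf-like" purely for
# illustration) leaves more resources at an invalidly cued location 8 deg away.
steep = resources(8.0, sigma_deg=4.0)    # "hearing-like" gradient
shallow = resources(8.0, sigma_deg=8.0)  # "deaf-like" gradient
print(shallow > steep)  # → True
```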
at risk of remaining tautological redefinitions of the empirical findings. As discussed above for the
example of selective attention, even a minimal description of which aspects of selective attention
may be changed by profound deafness, or a basic discussion of the theoretical assumptions
underlying the notion of selective attention can already contribute to the generation of novel predic-
tions for empirical research.
ACKNOWLEDGMENTS
We thank two anonymous reviewers for helpful comments and suggestions on an earlier version of
this manuscript. We are also grateful to Elena Nava for helpful comments and discussion. This work
was supported by a PRIN 2006 grant (Prot. 2006118540_004) from MIUR (Italy), a grant from
Comune di Rovereto (Italy), and a PAT-CRS grant from University of Trento (Italy).
REFERENCES
Armstrong, B., S. A. Hillyard, H. J. Neville, and T. V. Mitchell. 2002. Auditory deprivation affects processing
of motion, but not colour. Brain Research: Cognitive Brain Research 14: 422–434.
Auer, E. T., Jr., and L. E. Bernstein. 2007. Enhanced visual speech perception in individuals with early-onset
hearing impairment. Journal of Speech Language and Hearing Research 50(5):1157–1165.
Bavelier, D., C. Brozinsky, A. Tomann, T. Mitchell, H. Neville, and G. H. Liu. 2001. Impact of early deafness
and early exposure to sign language on the cerebral organization for motion processing. Journal of
Neuroscience 21: 8931–8942.
Bavelier, D., M. W. G. Dye, and P. C. Hauser. 2006. Do deaf individuals see better? Trends in Cognitive Science
10: 512–518.
Bavelier, D., A. Tomann, C. Hutton, T. V. Mitchell, D. P. Corina, G. Liu, and H. J. Neville. 2000. Visual atten-
tion to the periphery is enhanced in congenitally deaf individuals. Journal of Neuroscience 20: 1–6.
Bernstein, L. E., M. E. Demorest, and P. E. Tucker. 2000. Speech perception without hearing. Perception &
Psychophysics 62: 233–252.
Bernstein, L. E., E. T. Auer Jr., and P. E. Tucker. 2001. Enhanced speechreading in deaf adults: Can short-term
training/practice close the gap for hearing adults? Journal of Speech, Language, and Hearing Research
44: 5–18.
Bosworth, R. G., and K. R. Dobkins. 1999. Left-hemisphere dominance for motion processing in deaf signers.
Psychological Science 10: 256–262.
Bosworth, R. G., and K. R. Dobkins. 2002a. The effect of spatial attention on motion processing in deaf signers,
hearing signers, and hearing nonsigners. Brain and Cognition 49: 152–169.
Bosworth, R. G., and K. R. Dobkins. 2002b. Visual field asymmetries for motion processing in deaf and hearing
signers. Brain and Cognition 49: 152–169.
Bottari, D., M. Turatto, F. Bonfioli, C. Abbadessa, S. Selmi, M. A. Beltrame, and F. Pavani. 2008. Change
blindness in profoundly deaf individuals and cochlear implant recipients. Brain Research 1242: 209–218.
Bottari, D., E. Nava, P. Ley, and F. Pavani. 2010. Enhanced reactivity to visual stimuli in deaf individuals.
Restorative Neurology and Neuroscience 28: 167–179.
Bottari, D., M. Turatto, and F. Pavani. In preparation. Visual change perception and speeded simple detection
in profound deafness.
Bravo, M. Y., and K. Nakayama. 1992. The role of attention in different visual search tasks. Perception &
Psychophysics 51: 465–472.
Bross, M. 1979a. Residual sensory capacities of the deaf: A signal detection analysis of a visual discrimination
task. Perceptual and Motor Skills 1: 187–194.
Bross, M. 1979b. Response bias in deaf and hearing subjects as a function of motivational factors. Perceptual
and Motor Skills 3: 779–782.
Bross, M., and H. Sauerwein. 1980. Signal detection analysis of visual flicker in deaf and hearing individuals.
Perceptual and Motor Skills 51: 839–843.
Brozinsky, C. J., and D. Bavelier. 2004. Motion velocity thresholds in deaf signers: Changes in lateralization
but not in overall sensitivity. Brain Research: Cognitive Brain Research 21: 1–10.
Chelazzi, L., C. A. Marzi, G. Panozzo, N. Pasqualini, G. Tassinari, and L. Tomazzoli. 1988. Hemiretinal
differences in speed of light detection in esotropic amblyopes. Vision Research 28(1): 95–104.
Visual Abilities in Individuals with Profound Deafness 445
Chen, Q., M. Zhang, and X. Zhou. 2006. Effects of spatial distribution of attention during inhibition of return
(IOR) on flanker interference in hearing and congenitally deaf people. Brain Research 1109: 117–127.
Cohen, L. G., P. Celnik, A. Pascual-Leone, B. Corwell, L. Falz, J. Dambrosia, et al. 1997. Functional relevance
of cross-modal plasticity in blind humans. Nature 389: 180–183.
Colmenero, J. M., A. Catena, L. J. Fuentes, and M. M. Ramos. 2004. Mechanisms of visuo-spatial orienting in
deafness. European Journal of Cognitive Psychology 16: 791–805.
Corbetta, M., and G. L. Shulman. 2002. Control of goal-directed and stimulus-driven attention in the brain.
Nature Reviews Neuroscience 3: 201–215.
Doehring, D. G., and J. Rosenstein. 1969. Speed of visual perception in deaf children. Journal of Speech and
Hearing Research 12:118–125.
Downing, C. J., and S. Pinker. 1985. The spatial structure of visual attention. In Attention and performance,
Vol. XI, ed. M. I. Posner and O. S. M. Marin, 171–187. Hillsdale, NJ: Erlbaum.
Dye, M. W., P. C. Hauser, and D. Bavelier. 2008. Visual skills and cross-modal plasticity in deaf readers:
Possible implications for acquiring meaning from print. Annals of the New York Academy of Science
1145: 71–82.
Dye, M. W. G., D. E. Baril, and D. Bavelier. 2007. Which aspects of visual attention are changed by deafness?
The case of the Attentional Network Test. Neuropsychologia 45: 1801–1811.
Enns, J. T., and R. A. Rensink. 1991. Preattentive recovery of three-dimensional orientation from line-draw-
ings. Psychological Review 98: 335–351.
Eriksen, B. A., and C. W. Eriksen. 1974. Effects of noise letters upon the identification of a target letter in a
nonsearch task. Perception & Psychophysics 16: 143–149.
Fine, I., E. M. Finney, G. M. Boynton, and K. R. Dobkins. 2005. Comparing the effects of auditory depriva-
tion and sign language within the auditory and visual cortex. Journal of Cognitive Neuroscience 17:
1621–1637.
Finney, E. M., and K. R. Dobkins. 2001. Visual contrast sensitivity in deaf versus hearing populations: explor-
ing the perceptual consequences of auditory deprivation and experience with a visual language. Cognitive
Brain Research 11(1): 171–183.
Finney, E. M., I. Fine, and K. R. Dobkins. 2001. Visual stimuli activate auditory cortex in the deaf. Nature
Neuroscience 4(12): 1171–1173.
Finney, E. M., B. A. Clementz, G. Hickok, and K. R. Dobkins. 2003. Visual stimuli activate auditory cortex in
deaf subjects: Evidence from MEG. Neuroreport 11: 1425–1427.
Furth, H. 1966. Thinking without language: Psychological implications of deafness. New York: Free
Press.
Gibson, E. 1969. Principles of perceptual learning and development. New York: Meredith.
Green, D. M., and J. A. Swets. 1966. Signal detection theory and psychophysics. New York: Wiley.
Harrington, D. O. 1971. The visual fields. St. Louis, MO: CV Mosby.
Hartmann, G. W. 1933. Changes in visual acuity through simultaneous stimulation of other sense organs.
Journal of Experimental Psychology 16:393–407.
Hartung, J. E. 1970. Visual perceptual skill, reading ability, and the young deaf child. Exceptional Children
36(8): 603–638.
Hauser, P. C., M. W. G. Dye, M. Boutla, C. S. Gree, and D. Bavelier. 2007. Deafness and visual enumeration:
Not all aspects of attention are modified by deafness. Brain Research 1153: 178–187.
Heider, F., and G. Heider. 1940. Studies in the psychology of the deaf. Psychological Monographs 52: 6–22.
Heffner, R. S., and H. E. Heffner. 1992. Visual factors in sound localization in mammals. Journal of Comparative
Neurology 317(3): 219–232.
Heming, J. E., and L. N. Brown. 2005. Sensory temporal processing in adults with early hearing loss. Brain
and Cognition 59: 173–182.
Henderson, S. E., and L. Henderson. 1973. Levels of visual-information processing in deaf and hearing chil-
dren. American Journal of Psychology 86(3): 507–521.
Hoemann, H. 1978. Perception by the deaf. In Handbook of perception: Perceptual ecology, vol. 10, ed.
E. Carterette and M. Friedman, 43–64. New York: Academic Press.
Jonides, J. 1981. Voluntary versus automatic control over the mind's eye's movement. In Attention and performance,
Vol. IX, ed. J. B. Long and A. D. Baddeley, 187–203. Hillsdale, NJ: Erlbaum.
Jordan, T. E. 1961. Historical notes on early study of the deaf. Journal of Speech Hearing Disorders 26:118–121.
Klein, R. M. 2000. Inhibition of return. Trends in Cognitive Science 4: 138–147.
446 The Neural Bases of Multisensory Processes
Levanen, S., and D. Hamdorf. 2001. Feeling vibrations: Enhanced tactile sensitivity in congenitally deaf
humans. Neuroscience Letters 301: 75–77.
Li, Z. 2002. A saliency map in primary visual cortex. Trends in Cognitive Science 6: 9–16.
Loke, W. H., and S. Song. 1991. Central and peripheral visual processing in hearing and nonhearing individu-
als. Bulletin of the Psychonomic Society 29: 437–440.
Marendaz, C., C. Robert, and F. Bonthoux. 1997. Deafness and attentional visual search: A developmental
study. Perception A: 26.
Marzi, C. A., and M. Di Stefano. 1981. Hemiretinal differences in visual perception. Documenta Ophthalmologica
Proceedings Series 30: 273–278.
Mayer, A. R., J. M. Dorflinger, S. M. Rao, and M. Seidenberg. 2004. Neural networks underlying endogenous
and exogenous visual–spatial orienting. Neuroimage 23: 534–541.
Milner, A. D., and M. A. Goodale. 1995. The visual brain in action. Oxford, UK: Oxford Univ. Press.
Mitchell, R. E., and M. A. Karchmer. 2002. Demographics of deaf education: More students in more places.
American Annals of the Deaf 151(2): 95–104.
Mitchell, T., and M. T. Maslin. 2007. How vision matters for individuals with hearing loss. International
Journal of Audiology 46(9): 500–511.
Mohammed T., R. Campbell, M. MacSweeney, E. Milne, P. Hansen, and M. Coleman. 2005. Speechreading
skill and visual movement sensitivity are related in deaf speechreaders. Perception 34: 205–216.
Myklebust, H. 1964. The psychology of deafness. New York: Grune and Stratton.
Näätänen, R. 1992. Attention and brain function. Hillsdale, NJ: Erlbaum.
Nava, E., D. Bottari, M. Zampini, and F. Pavani. 2008. Visual temporal order judgment in profoundly deaf
individuals. Experimental Brain Research 190(2): 179–188.
Neville, H. J., and D. S. Lawson. 1987. Attention to central and peripheral visual space in a movement detection
task: An event-related potential and behavioral study. II. Congenitally deaf adults. Brain Research
405: 268–283.
Neville, H. J., and D. Bavelier. 2002. Human brain plasticity: Evidence from sensory deprivation and altered
language experience. Progress in Brain Research 138: 177–188.
Neville, H. J., A. Schmidt, and M. Kutas. 1983. Altered visual-evoked potentials in congenitally deaf adults.
Brain Research 266(1): 127–132.
O’Regan, J. K., H. Deubel, J. J. Clark, and R. A. Rensink. 2000. Picture changes during blinks: Looking with-
out seeing and seeing without looking. Visual Cognition 7: 191–212.
O’Regan, J. K., R. A. Rensink, and J. J. Clark. 1999. Change-blindness as a result of “mudsplashes.” Nature
398: 34.
Olson, J. R. 1967. A factor analytic study of the relation between the speed of visual perception and the lan-
guage abilities of deaf adolescents. Journal of Speech and Hearing Research 10(2): 354–360.
Parasnis, I. 1983. Visual perceptual skills and deafness: A research review. Journal of the Academy of
Rehabilitative Audiology 16: 148–160.
Parasnis, I., and V. J. Samar. 1985. Parafoveal attention in congenitally deaf and hearing young adults. Brain
and Cognition 4: 313–327.
Parasnis, I., V. J. Samar, and G. P. Berent. 2003. Deaf adults without attention deficit hyperactivity disorder
display reduced perceptual sensitivity and elevated impulsivity on the Test of Variables of Attention
(T.O.V.A.). Journal of Speech Language and Hearing Research 46: 1166–1183.
Poizner, H., and P. Tallal. 1987. Temporal processing in deaf signers. Brain and Language 30: 52–62.
Posner, M. 1980. Orienting of attention. The Quarterly Journal of Experimental Psychology 32: 3–25.
Prinzmetal, W., C. McCool, and S. Park. 2005. Attention: Reaction time and accuracy reveal different mecha-
nisms. Journal of Experimental Psychology: General 134: 73–92.
Prinzmetal, W., A. Zvinyatskovskiy, P. Gutierrez, and L. Dilem. 2009. Voluntary and involuntary attention
have different consequences: The effect of perceptual difficulty. The Quarterly Journal of Experimental
Psychology 62: 352–369.
Proksch, J., and D. Bavelier. 2002. Changes in the spatial distribution of visual attention after early deafness.
Journal of Cognitive Neuroscience 14: 687–701.
Pylyshyn, Z.W. 1989. The role of location indexes in spatial perception: A sketch of the FINST spatial-index
model. Cognition 32: 65–97.
Quittner, A. L., P. Leibach, and K. Marciel. 2004. The impact of cochlear implants on young deaf children:
New methods to assess cognitive and behavioral development. Archives of Otolaryngology Head Neck
and Surgery 5: 547–554.
Rensink, R. A. 2001. Change blindness: Implications for the nature of attention. In Vision and attention, ed.
M. R. Jenkin and L. R. Harris, 169–188. New York: Springer.
Rettenbach, R., G. Diller, and R. Sireteanu. 1999. Do deaf people see better? Texture segmentation and visual
search compensate in adult but not in juvenile subjects. Journal of Cognitive Neuroscience 11: 560–583.
Reynolds, H. 1993. Effects of foveal stimulation on peripheral visual processing and laterality in deaf and hear-
ing subjects. American Journal of Psychology 106(4): 523–540.
Rönnberg, J. 1995. Perceptual compensation in the deaf and blind: Myth or reality? In Compensating for psy-
chological deficits and declines, ed. R. A. Dixon and L. Backman, 251–274. Mahwah, NJ: Erlbaum.
Sagi, D., and B. Julesz. 1984. Detection versus discrimination of visual orientation. Perception 13(5):
619–628.
Sladen, D., A. M. Tharpe, D. H. Ashmead, D. W. Grantham, and M. M. Chun. 2005. Visual attention in deaf
and normal hearing adults: effects of stimulus compatibility. Journal of Speech Language and Hearing
Research 48: 1–9.
Stevens, C., and H. Neville. 2006. Neuroplasticity as a double-edged sword: Deaf enhancements and dyslexic
deficits in motion processing. Journal of Cognitive Neuroscience 18: 701–714.
Stivalet, P., Y. Moreno, J. Richard, P. A. Barraud, and C. Raphael. 1998. Differences in visual search tasks
between congenitally deaf and normally hearing adults. Brain Research: Cognitive Brain Research 6:
227–232.
Suchman, R. G. 1966. Color–form preference, discriminative accuracy and learning of deaf and hearing children.
Child Development 37(2): 439–451.
Summerfield, Q. 1991. Visual perception of phonetic gestures. In Modularity and the motor theory of speech
perception, ed. I. G. Mattingly and M. Studdert-Kennedy, 117–137. Hillsdale, NJ: Erlbaum.
Treisman, A. 1982. Perceptual grouping and attention in visual search for features and for objects. Journal of
Experimental Psychology: Human Perception and Performance 8(2): 194–214.
Turatto, M., and B. Bridgeman. 2005. Change perception using visual transients: Object substitution and deletion.
Experimental Brain Research 167: 595–608.
Turatto, M., M. Valsecchi, L. Tamè, and E. Betta. 2007. Microsaccades distinguish between global and local
visual processing. Neuroreport 18:1015–1018.
23 A Multisensory Interface for
Peripersonal Space
Body–Object Interactions
Claudio Brozzoli, Tamar R. Makin, Lucilla Cardinali,
Nicholas P. Holmes, and Alessandro Farnè
CONTENTS
23.1 Multisensory and Motor Representations of Peripersonal Space.......................................... 449
23.1.1 Multisensory Features of Peripersonal Space: Visuo-Tactile Interaction around
the Body.....................................................................................................................449
23.1.1.1 Premotor Visuo-Tactile Interactions........................................................... 450
23.1.1.2 Parietal Visuo-Tactile Interactions.............................................................. 450
23.1.1.3 Subcortical Visuo-Tactile Interaction......................................................... 451
23.1.1.4 A Visuo-Tactile Network............................................................................ 452
23.1.1.5 Dynamic Features of PpS Representation.................................................. 452
23.1.2 Motor Features of PpS: Visuo-Tactile Interaction around the Acting Body............. 452
23.1.3 A Multisensory–Motor Network for Body–Object Interactions in PpS.................... 454
23.2 Multisensory-Based PpS Representation in Humans............................................................ 455
23.2.1 PpS Representation in Humans................................................................................. 455
23.2.1.1 PpS Representation in Neuropsychological Patients.................................. 455
23.2.1.2 PpS Representation in Neurotypical Participants....................................... 456
23.2.2 A Multisensory Interface for Body–Objects Interactions......................................... 458
23.3 Conclusion.............................................................................................................................460
Acknowledgments...........................................................................................................................460
References.......................................................................................................................................460
purposes, is that, in addition to responding to both visual and tactile stimulation (referred to here
as visuo-tactile), their visually evoked responses are modulated by the distance between the visual
object and the tactile receptive field (RF). This allows for the coding of visual information that is
dependent, or centered, on the body part that contains the tactile RF.
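A minimal sketch of what such body-part-centered coding amounts to computationally (the coordinate frames, the ~10-cm depth scale, and the Gaussian distance gating are illustrative assumptions, not a model of the recorded neurons):

```python
import math

def hand_centered(stim_xyz, hand_xyz):
    """Re-express a stimulus position (given in some world- or eye-centered
    frame) in hand-centered coordinates by subtracting the hand's position."""
    return tuple(s - h for s, h in zip(stim_xyz, hand_xyz))

def visual_response(stim_xyz, hand_xyz, rf_depth_cm=10.0):
    """Toy firing rate of a bimodal neuron: falls off with the distance of
    the visual stimulus from the tactile RF on the hand (scale illustrative)."""
    rel = hand_centered(stim_xyz, hand_xyz)
    dist = math.sqrt(sum(c * c for c in rel))
    return math.exp(-(dist ** 2) / (2 * (rf_depth_cm / 2) ** 2))

# Same stimulus, two hand positions: the response follows the hand,
# as in the premotor recordings illustrated in Figure 23.1.
stim = (10.0, 0.0, 0.0)  # cm, arbitrary world frame
near = visual_response(stim, hand_xyz=(8.0, 0.0, 0.0))    # stimulus near hand
far = visual_response(stim, hand_xyz=(-20.0, 0.0, 0.0))   # stimulus far from hand
assert near > far
```

The key point the sketch captures is that the response depends only on the stimulus position *relative to the hand*, so moving the hand moves the effective visual RF with it.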
* A possibly earlier report can be attributed to Sakata and colleagues (1973, p. 100). In this study about the functional
organization of area 5, the authors stated: “Even the relatively rare neurons which we could activate visually were more
powerfully driven by somatosensory stimuli.” However, no further detail or discussion was offered concerning the limi-
tation in depth of the visual RF.
Peripersonal Space 451
[Figure 23.1 graphic: panels (a) and (b) show the tactile RF and the four stimulus trajectories (1–4) for the two arm positions; panel (c) plots the neuron's response to each trajectory, with separate curves for arm right and arm left.]
FIGURE 23.1 Representation of visual stimuli in hand-based coordinates. Visual responses of a typical premo-
tor neuron with a tactile RF (hatched) on forearm and hand, and a visual RF within 10 cm of tactile RF. On each
trial, the arm contralateral to neuron was fixed in one of two positions: (a) on the right (light gray symbols and
lines) or (b) on the left (dark gray symbols and lines) and visual stimulus was advanced along one of four trajec-
tories (numbered 1–4). (c) Responses of neuron to four stimulus trajectories when the arm was visible to the mon-
key were recorded for both positions. When the arm was fixed on the right, response was maximal for trajectory
3, which was approaching the neuron’s tactile RF. When the arm was fixed on the left, maximal response shifted
with the hand to trajectory 2, which was now approaching the tactile RF. This example shows that neurons in the
monkey’s premotor cortex represent visual information with respect to the tactile RF. (Modified from Graziano,
M. S. A., Proceedings of the National Academy of Sciences of the United States of America, 1999.)
(Leinonen et al. 1979). That is, these neurons’ activation was shown to be dependent on the distance
of the effective visual stimulus from the body part. Most of these neurons responded to visual
stimuli moving toward the monkey, within about 10 cm of the tactile RF (although in some cases,
stimulation presented further away, but still within a reachable distance, was also effective).
Multisensory neurons have also been found in the monkey area VIP, in the fundus of the intrapa-
rietal sulcus (Avillac et al. 2005; Colby and Duhamel 1991; Colby et al. 1993; Duhamel et al. 1998).
VIP neurons respond to tactile and visual stimulation presented within a few centimeters of the
tactile RF. Unlike area 7b neurons, tactile RFs in VIP are primarily located on the face and head,
and visual RFs are anchored to a region of space around the face (Colby et al. 1993).
responsive to visual stimuli, as long as they are presented close to the tactile RF. A large portion
(82%) of face neurons responds best to visual stimuli presented in a region of space within 10–20 cm
from the tactile RF. Neurons with tactile RFs on the arm and hand have even shallower visual
RFs around the hand (up to 5 cm; Graziano and Gross 1993).
23.1.2 Motor Features of PpS: Visuo-Tactile Interaction around the Acting Body
Why should the brain maintain a representation of the space around the body separate from a
representation of far extrapersonal space? One possibility is that this dichotomy stems purely from
perceptual aims, giving a “greater” perceptual salience to visual events occurring in the vicinity of
the body. Following this idea, the parieto-frontal network, together with the putamen, would code
visual space with individual body parts as its reference. This is suggested by the sensory properties
of this set of neurons, responding selectively for visual information close to the body. However, we
believe that this interpretation does not fully describe the potential functional applications of this
system, since it does not correspond with some of the evidence described above. First, it may be dif-
ficult to interpret the complex tactile RFs of some of these neurons (e.g., single neurons in area F4
that represent both the hand and face, as reported by Rizzolatti et al. 1981a, 1981b). Second, it does
not account for the dynamic changes in their visual RFs, as observed in cases of objects approach-
ing the body (Fogassi et al. 1996). More critically, a purely perceptual account does not fit with the
presence of such bimodal neurons in a predominantly “motor” area, such as the premotor cortex.
Numerous visuo-tactile neurons in inferior area 6 (Gentilucci et al. 1988; Rizzolatti et al. 1981c,
1987, 1988, 1997; Rizzolatti and Gentilucci 1988), parietal areas 7b (Hyvärinen 1981; Hyvärinen
and Poranen 1974; Hyvärinen and Shelepin 1979; Leinonen 1980; Leinonen et al. 1979; Leinonen
and Nyman 1979; Robinson et al. 1978), and the putamen (Crutcher and DeLong 1984) respond not
only to passive visual and tactile stimulation, but also during motor activity.
These findings raise the more compelling possibility that the multisensory representation of PpS
serves some motor function. Objects in the vicinity of the body are indeed more relevant by virtue
of the possible interactions our body can establish with them (Graziano et al. 1993; Rizzolatti et
al. 1997, 1998). Therefore, hand-centered representation of PpS provides us with extremely valu-
able information regarding the spatial position of objects with respect to our hands. Here follows a
description of the motor aspects associated with PpS brain areas, as revealed by electrophysiologi-
cal studies in macaque monkeys.
The premotor cortex has both direct (Martino and Strick 1987) and indirect (Godschalk et al.
1984; Matsumura and Kubota 1979; Muakkassa and Strick 1979; Pandya and Vignolo 1971) access
to the control of upper limb movements, via projections to the spinal cord and the primary motor
cortex, respectively. The motor properties of neurons in the inferior premotor cortex support a
role for this structure in a perception–action interface. In particular, the visual responses of some
neurons within this area are enhanced when a reaching movement is performed toward an object
(Godschalk et al. 1985), as well as during reaching and grasping movements of the arm and hand
(Godschalk et al. 1981, 1985; Kurata et al. 1985; Kurata and Tanji 1986; Rizzolatti and Gentilucci
1988) and mouth (Rizzolatti et al. 1981c). Moreover, neurons in this area show a rather precise degree
of motor representation. Proximal and distal movements are represented separately (in areas F4/F1
and area F5, respectively), with the proximal neurons mostly activated for arm and face movements
(Gentilucci et al. 1988; Kurata and Tanji 1986; Murata et al. 1997; Raos et al. 2006; Rizzolatti et al.
1987, 1988; Rizzolatti and Gentilucci 1988). Crucially, the passive RFs and the active movements
appear to share related functional roles: neurons with visuo-tactile RFs on the face also discharged
during arm reaching movements toward the upper part of space that corresponds to their visual RFs.
This suggests that the sensory and motor responses are expressed in a common reference frame for
locating objects in the space close to the body and for guiding movements toward them. We believe
that such a complex motor mechanism cannot subserve a purely perceptual function.
Parietal area 7b also has motor properties. As in the premotor cortex, parietal motor functions
seem to be related to approaching movements of a body part toward an object (Gardner et al. 2007;
Lacquaniti and Caminiti 1998; Rizzolatti et al. 1997). Indeed, the posterior parietal cortex is part of
the dorsal stream of action-oriented visual processing (Milner and Goodale 1995), and both inferior
and superior parietal lobules are interconnected with the premotor cortex (see above).
Ablation and reversible inactivation studies in monkeys have shown a direct relationship between
the PpS network and motor responses. These studies tested for the behavioral consequences of a
lesion within premotor and posterior parietal areas, where visuo-tactile neurons have been found.
Interestingly, lesions to either the anterior or the posterior parts of this network seem to produce very
similar patterns of motor impairments, most of which affect, in particular, the execution of visually
guided reaching actions (Battaglini et al. 2002; Deuel and Regan 1985; Ettlinger and Kalsbeck 1962;
Faugier-Grimaud et al. 1978; Gallese et al. 1994; Halsband and Passingham 1982; Moll and Kuypers
1977; Rizzolatti et al. 1983). After premotor ablation, for instance, the monkeys were unable to reach
with the contralesional arm when the movement required them to avoid an obstacle.
Arm movements were executed without correctly taking into account visual information within
PpS (Battaglini et al. 2002; Moll and Kuypers 1977). Similarly, removal of postarcuate regions
in the premotor cortex where the mouth is represented (presumably in area F4) caused a severe
impairment in grasping with the mouth (Rizzolatti et al. 1983). Attentional deficits have also been
reported after selective damage to visuo-tactile parietal and premotor regions (Rizzolatti et al. 1983)
in the form of spatial hemineglect and extinction. The monkeys appeared to be unaware of visual
(or tactile) stimuli presented in the contralesional space. Crucially, this deficit was selective for the
space around the body.
Subregion F5 of the inferior area 6 is also characterized by the presence of “mirror” neurons, a
special class of motor neurons with visual properties. These neurons are selective for the execution
of a specific motor act, such as precision grasping. They also discharge when the monkey observes
another monkey or a human executing the same action (di Pellegrino et al. 1992; Gallese et al. 1996;
Rizzolatti et al. 1996).* Relevant for this chapter is a recent study that showed selectivity in certain
mirror neurons for actions performed within the observer’s PpS rather than in its extrapersonal
space (peripersonal mirror neurons, Caggiano et al. 2009). A different subpopulation of mirror neu-
rons showed the opposite preference (i.e., selectivity for actions performed in extrapersonal space,
rather than PpS). Moreover, peripersonal and extrapersonal space appeared to be defined according
to a functional criterion: When accessibility to PpS was limited (e.g., by placing a screen in front of
the monkey), the responses of several peripersonal mirror neurons were reduced during observation
of actions performed in the inaccessible portion of the space. That is, when PpS was inaccessible
for action, it was represented as farther, extrapersonal space. Indeed, in such circumstances,
extrapersonal mirror neurons started to respond to observation of actions performed in the inacces-
sible PpS.
* A first report of neurons responding while the monkey was watching an action performed by another individual is
already present in an early electrophysiological study over the parietal area 7b (Leinonen 1980, p. 305) : “[…] two cells
discharged when the monkey grasped an object […] or when the monkey saw an investigator grasp an object.”
mostly correlated with reaching and grasping movements (see Section 23.1.2). The two hypotheses
(involuntary and voluntary object-oriented actions) are not mutually exclusive and one could specu-
late that a fine-grained and sophisticated function could have developed from a more primordial
defensive machinery, using the same visuo-tactile spatial coding of the PpS (see the “neuronal
recycling hypothesis,” as proposed by Dehaene 2005). This hypothetical evolutionary advancement
could lead to the involvement of the PpS mechanisms in the control of the execution of voluntary
actions toward objects. Some comparative data showed, for instance, that prosimian sensory areas
corresponding to the monkeys’ parietal areas already present some approximate motor activity. The
most represented movements are very stereotyped limb retractions that are associated with avoid-
ance movements (Fogassi et al. 1994).
FIGURE 23.2 Peripersonal space representation. Head- and hand-centered peripersonal space (dark gray
areas) with respect to reaching space (light gray region). (Modified from Cardinali, L. et al., In Encyclopedia
of Behavioral Neuroscience, 2009b.)
1994). A number of studies have shown that extinction can emerge when concurrent stimuli are presented
in different sensory modalities: A visual stimulus presented near the ipsilesional hand can
extinguish a touch delivered on the contralesional hand (di Pellegrino et al. 1997; see also Costantini
et al. 2007, for an example of cross-modal extinction within a hemispace). Crucially, such cross-
modal visuo-tactile extinction appears to be stronger when visual stimuli are presented in near
as compared to far space, thus providing neuropsychological support for the idea that the human
brain represents PpS through an integrated visuo-tactile system. Moreover, in accordance with the
findings from the electrophysiological studies described in the previous section, visual responses
to stimuli presented near the patient’s hand remain anchored to the hand when it is moved to the
opposite hemispace. This evidence suggests that PpS in humans is also coded in a hand-centered
reference frame (di Pellegrino et al. 1997; Farnè et al. 2003). A converging line of evidence suggests
that the space near the human face is also represented by a multisensory mechanism. We demon-
strated that visuo-tactile extinction can occur by applying visual and tactile stimuli on the patient’s
face (Farnè et al. 2005b). Interestingly, the extinction was strongest when the homologous body part
was being stimulated (i.e., left and right cheeks, rather than left hand and right cheek), suggesting
that different spatial regions, adjacent to different body parts, are represented separately (Farnè et
al. 2005b). In a further study, we presented four extinction patients with visual stimuli near and far
from the experimenter's right hand, as well as from their own right hand (Farnè et al., unpublished
data). Although a visual stimulus presented near the patients' own hand successfully extinguished the
touch on the patients' left hand, no cross-modal extinction was found for stimuli presented near the
experimenter's hand, thus failing to support a possible body-matching property of the human PpS
system. This discrepancy with the evidence reported
in the electrophysiological literature might stem from the fact that we used a more radical change
in orientation between the observer’s own and the observed hands (more than 35°; see Section
23.1.1). Finally, we have shown that the human PpS also features plastic properties, akin to those
demonstrated in the monkey: Visual stimuli presented in far space induced stronger cross-modal
extinction after the use of a 38-cm rake to retrieve (or act upon) distant objects (Farnè and Làdavas
2000; see also Berti and Frassinetti 2000; Bonifazi et al. 2007; Farnè et al. 2005c, 2007; Maravita
and Iriki 2004). The patients’ performance was evaluated before tool use, immediately after a 5-min
period of tool use, and after a further 5- to 10-min resting period. Far visual stimuli were found to
induce more severe contralesional extinction immediately after tool use, compared with before tool
use. These results demonstrate that, although near and far spaces are separately represented, this
spatial division is not defined a priori. Instead, the definition of near and far space may be derived
functionally, depending on movements that allow the body to interact with objects in space.*
* We have recently studied the effects of tool use on the body schema (Cardinali et al. 2009c). We found that the repre-
sentation of the body is dynamically updated by tool use. This dynamic updating of the body schema
during action execution may serve as a sort of skeleton for PpS representation (for a critical review of the relationship
between human PpS and body schema representations, see Cardinali et al. 2009a).
Peripersonal Space 457
2007b, 2008), with some differences in results as compared to studies conducted in neurological
patients, as described above (see also Maravita et al. 2002).
Evidence for the existence of multisensory PpS is now accumulating from neuroimaging studies
in healthy humans. These new studies provide further support for the homologies between some
of the electrophysiological evidence reviewed above and the PpS neural mechanisms in the human
brain. Specifically, brain areas that represent visual and tactile information on and near to the hand
and face in body-centered coordinates have been reported to be the anterior section of the intrapa-
rietal sulcus and the ventral premotor cortex (Bremmer et al. 2001; Makin et al. 2007; Sereno and
Huang 2006). These findings correspond nicely with the anatomical locations of the monkey visuo-
tactile network. Moreover, recent studies have identified the superior parietal occipital junction as
a potential site for representing near-face and near-hand visual space (Gallivan et al. 2009; Quinlan
and Culham 2007). This new evidence extends our current knowledge of the PpS neural network,
and may guide further electrophysiological studies to come.
Although using functional brain imaging enabled us to demonstrate that multiple brain areas in
both sensory and motor cortices modulate their responses to visual stimuli based on their distance
from the hand and face, it did not allow us to determine the direct involvement of such representa-
tions in motor processing. In a series of experiments inspired by the macaque neurophysiological
literature, we recently examined the reference frames underlying rapid motor responses to real,
three-dimensional objects approaching the hand (Makin et al. 2009). We asked subjects to make a
simple motor response to a visual “Go” signal while they were simultaneously presented with a task-
irrelevant distractor ball, rapidly approaching a location either near to or far from their responding
hand. To assess the effects of these rapidly approaching distractor stimuli on the excitability of the
human motor system, we used single pulse transcranial magnetic stimulation, applied to the pri-
mary motor cortex, eliciting motor evoked potentials (MEPs) in the responding hand. As expected,
and across several experiments, we found that motor excitability was modulated as a function of the
distance of approaching balls from the hand: MEP amplitude was selectively reduced when the ball
approached near the hand, both when the hand was on the left and on the right of the midline. This
suppression likely reflects the proactive inhibition of a possible avoidance response that is elicited
by the approaching ball (see Makin et al. 2009). Strikingly, this hand-centered suppression occurred
as early as 70 ms after ball appearance, and was not modified by the location of visual fixation rela-
tive to the hand. Furthermore, it was selective for approaching balls, since static visual distractors
did not modulate MEP amplitude. Together with additional behavioral measurements, this new
series of experiments provides direct and converging evidence for automatic hand-centered coding
of visual space in the human motor system. These results strengthen our interpretation of PpS as a
mechanism for translating potentially relevant visual information into a rapid motor response.
Together, the behavioral and imaging studies reviewed above confirm the existence of brain
mechanisms in humans that are specialized for representing visual information selectively when it
arises from near the hand. As highlighted in the previous section on monkey research, a strong bind-
ing mechanism of visual and tactile inputs has repeatedly been shown also in humans. Importantly,
these converging results have refined and extended our understanding of the neural processes under-
lying multisensory representation of PpS, namely, by identifying various cortical areas that are
involved in different sensory–motor aspects of PpS representation, and the time course of hand-
centered processing.
The tight relationship between motor and visual representation of near space in the human brain
led us most recently to an intriguing question: Would the loss of a hand through amputation (and
therefore the inability of the brain to represent visual information with respect to it) lead to changes
in visual perception? We recently discovered that hand amputation is indeed associated with a mild
visual “neglect” of the amputated side: Participants with an amputated hand favored their intact side
when comparing distances in a landmark position-judgment task (Makin et al. 2010). Importantly,
this bias was absent when the exact same task was repeated with the targets placed in far space.
These results thus suggest that the possibility for action within near space shapes the actor’s spatial
perception, and emphasize the unique role that PpS mechanisms may play as a medium for interac-
tions between the hands and the world.
[Figure 23.3, panel (a): three plots of CCE (ms), 0–100, as a function of Y (mm), 500–0; asterisks mark significant differences.]
FIGURE 23.3 (See color insert.) Grasping actions remap peripersonal space. (a) Action induces a reweight-
ing of multisensory processing, as shown by a stronger VTI at action Onset (55 ms) compared to the Static
condition (22 ms). The increase is even larger (79 ms) when stimulation occurs in the early Execution phase
(200 ms after action starts). (b) Dynamics of free hand grasping; schematic representation of estimated posi-
tion of the hand in the instant when stimulation occurred, for the static condition (blue panel), exactly at onset
of movement (yellow panel) or during the early execution phase (light blue panel). Wrist displacement (green
trajectory) and grip evolution (pink trajectories) are shown in each panel. (Modified from Brozzoli, C. et al.,
NeuroReport, 20, 913–917, 2009b.)
To investigate more deeply the relationship between PpS remapping and the motor characteris-
tics of the action, we tested whether different multisensory interactions might arise as a function of
the required sensory–motor transformations. We would expect action-dependent multisensory
remapping to be greater whenever the action requires relatively more complex sensory–motor
transformations.
In a more recent study (Brozzoli et al. 2009a), we asked a group of healthy participants to per-
form either grasping movements (as in Brozzoli et al. 2009b) or pointing movements. For both
movements, the interaction between task-irrelevant visual information on the object and the tactile
information delivered on the acting hand increased in the early component of the action (as reflected
in a higher VTI), thus replicating our previous findings. However, a differential updating of the
VTI took place during the execution phase of the two action types. Although the VTI magnitude
was further increased during the execution phase of the grasping action (with respect to movement
onset), this was not the case in the pointing action. In other words, when the hand approached the
object, the grasping movement triggered a stronger visuo-tactile interaction than pointing did. Thus, not
only does a continuous updating of PpS occur during action execution, but this remapping also varies with
the characteristics of the given motor act. If (part of) the remapping of PpS is already effective at
the onset of the motor program, the perceptual modulation will be kept unchanged. But in the case
of relatively complex object-oriented interactions such as grasping, the remapping of PpS will be
dynamically updated with respect to the motor command.
23.3 CONCLUSION
The studies reviewed in this chapter uncover the multisensory mechanisms our brain uses
to directly link visual information available outside our body with tactile information on
our body. In particular, electrophysiological studies in monkeys revealed that the brain builds a
body parts–centered representation of the space around the body, through a network of visuo-tactile
areas. We also reviewed later evidence suggesting a functionally homologous representation of PpS
in humans, which serves as a multisensory interface for interactions with objects in the external
world. Moreover, the action-related properties of PpS representation feature a basic aspect that
might be crucial for rapid and automatic avoidance reactions, that is, a hand-centered representa-
tion of objects in near space. We also showed that PpS representation is dynamically remapped
during action execution, as a function of the sensory–motor transformations required by the action
kinematics. We therefore suggested that PpS representation may also play a major role in the execution
of voluntary actions on nearby objects. These two hypotheses (involuntary and voluntary object-oriented
actions) are not mutually exclusive: one could speculate that, from a more primordial defensive
function of this machinery, a more fine-grained and sophisticated function developed using the same,
relatively basic, visuo-tactile spatial computations, eventually supporting the control of voluntary
actions toward objects.
ACKNOWLEDGMENTS
This work was supported by European Mobility Fellowship, ANR grants no. JCJC06_133960 and
RPV08085CSA, and INSERM AVENIR grant no. R05265CS.
REFERENCES
Avillac, M., S. Denève, E. Olivier, A. Pouget, and J. R. Duhamel. 2005. Reference frames for representing
visual and tactile locations in parietal cortex. Nature Neuroscience 8: 941–949.
Battaglini, P. P., A. Muzur, C. Galletti, M. Skrap, A. Brovelli, and P. Fattori. 2002. Effects of lesions to area
V6A in monkeys. Experimental Brain Research 144: 419–422.
Bender, M. 1952. Disorders in perception. Springfield, IL: Thomas.
Berti, A., and F. Frassinetti. 2000. When far becomes near: Remapping of space by tool use. Journal of Cognitive
Neuroscience 12: 415–420.
Bremmer, F. 2005. Navigation in space—the role of the macaque ventral intraparietal area. Journal of Physiology
566: 29–35.
Bremmer, F. et al. 2001. Polymodal motion processing in posterior parietal and premotor cortex: A human
fMRI study strongly implies equivalencies between humans and monkeys. Neuron 29: 287–296.
Brozzoli, C., L. Cardinali, F. Pavani, and A. Farnè. 2009a. Action specific remapping of peripersonal space.
Neuropsychologia, in press.
Brozzoli, C., M. L. Demattè, F. Pavani, F. Frassinetti, and A. Farnè. 2006. Neglect and extinction: Within and
between sensory modalities. Restorative Neurology and Neuroscience 24: 217–232.
Brozzoli, C., F. Pavani, C. Urquizar, L. Cardinali, and A. Farnè. 2009b. Grasping actions remap peripersonal
space. NeuroReport 20: 913–917.
Bonifazi, S., A. Farnè, L. Rinaldesi, and E. Ladavas. 2007. Dynamic size-change of peri-hand space through
tool-use: Spatial extension or shift of the multi-sensory area. Journal of Neuropsychology 1: 101–114.
Caggiano, V., L. Fogassi, G. Rizzolatti, P. Thier, and A. Casile. 2009. Mirror neurons differentially encode the
peripersonal and extrapersonal space of monkeys. Science 324: 403–406.
Cardinali, L., C. Brozzoli, and A. Farnè. 2009a. Peripersonal space and body schema: Two labels for the same
concept? Brain Topography 21: 252–260
Cardinali, L., C. Brozzoli, and A. Farnè. 2009b. Peripersonal space and body schema. In Encyclopedia of
Behavioral Neuroscience, ed. G. F. Koob, M. Le Moal, and R. R. Thompson, 40, Elsevier Science
Ltd.
Cardinali, L., F. Frassinetti, C. Brozzoli, C. Urquizar, A. Roy, and A. Farnè. 2009c. Tool-use induces morpho-
logical up-dating of the body schema. Current Biology 19: R478–R479.
Colby, C. L., and J. R. Duhamel. 1991. Heterogeneity of extrastriate visual areas and multiple parietal areas in
the macaque monkey. Neuropsychologia 29: 517–537.
Colby, C. L., J. R. Duhamel, and M. E. Goldberg, 1993. Ventral intraparietal area of the macaque: Anatomic
location and visual response properties. Journal of Neurophysiology 69: 902–914.
Cooke, D. F., and M. S. Graziano. 2003. Defensive movements evoked by air puff in monkeys. Journal of
Neurophysiology 90: 3317–3329.
Cooke, D. F., and M. S. Graziano. 2004. Sensorimotor integration in the precentral gyrus: Polysensory neurons
and defensive movements. Journal of Neurophysiology 91: 1648–1660.
Costantini, M., D. Bueti, M. Pazzaglia, and S. M. Aglioti. 2007. Temporal dynamics of visuo-tactile extinction
within and between hemispaces. Neuropsychology 21: 242–250.
Crutcher, M. D., and M. R. DeLong. 1984. Single cell studies of the primate putamen: II. Relations to direction
of movement and pattern of muscular activity. Experimental Brain Research 53: 244–258.
Dehaene, S. 2005. Evolution of human cortical circuits for reading and arithmetic: The “neuronal recycling”
hypothesis. In From Monkey Brain to Human Brain, ed. S. Dehaene, J. R. Duhamel, M. Hauser, and G.
Rizzolatti, 133–157. Cambridge, MA: MIT Press.
Deuel, R. K., and D. J. Regan. 1985. Parietal hemineglect and motor deficits in the monkey. Neuropsychologia
23: 305–314.
di Pellegrino, G., and De Renzi, E. 1995. An experimental investigation on the nature of extinction.
Neuropsychologia, 33: 153–170.
di Pellegrino, G., L. Fadiga, L. Fogassi, V. Gallese, and G. Rizzolatti. 1992. Understanding motor events: A
neurophysiological study. Experimental Brain Research 91: 176–180.
di Pellegrino, G., E. Ladavas, and A. Farnè. 1997. Seeing where your hands are. Nature 388: 730.
Driver, J. 1998. The neuropsychology of spatial attention. In Attention, ed. H. Pashler, 297–340. Hove:
Psychology Press.
Duhamel, J. R., C. L. Colby, and M. E. Goldberg. 1998. Ventral Intraparietal area of the macaque: Congruent
visual and somatic response properties. Journal of Neurophysiology 79: 126–136.
Ettlinger, G., and J. E. Kalsbeck. 1962. Changes in tactile discrimination and in visual reaching after successive
and simultaneous bilateral posterior parietal ablations in the monkey. Journal of Neurology, Neurosurgery
and Psychiatry 25: 256–268.
Farnè, A. et al. 2003. Visuo-motor control of the ipsilateral hand: Evidence from right brain–damaged patients.
Neuropsychologia 41: 739–757.
Farnè, A., S. Bonifazi, and E. Ladavas. 2005a. The role played by tool-use and tool-length on the plastic elonga-
tion of peri-hand space: A single case study. Cognitive Neuropsychology 22: 408–418.
Farnè, A., Demattè, M., and Ladavas, E. 2003. Beyond the window: Multisensory representation of periper-
sonal space across a transparent barrier. Journal of Physiology Paris 50: 51–61.
Farnè, A., M. L. Demattè, and E. Ladavas. 2005b. Neuropsychological evidence of modular organization of the
near peripersonal space. Neurology 13: 1754–1758.
Farnè, A., A. Iriki, and E. Ladavas. 2005c. Shaping multisensory action-space with tools: Evidence from
patients with cross-modal extinction. Neuropsychologia 43: 238–248.
Farnè, A., and E. Ladavas. 2000. Dynamic size-change of hand peripersonal space following tool use.
NeuroReport 11: 1645–1649.
Farnè, A., A. Serino, and E. Ladavas. 2007. Dynamic size-change of peri-hand space following tool-use:
Determinants and spatial characteristics revealed through cross-modal extinction. Cortex 43: 436–443.
Faugier-Grimaud, S., C. Frenois, and D. G. Stein. 1978. Effects of posterior parietal lesions on visually guided
behavior in monkeys. Neuropsychologia 16: 151–168.
Fogassi, L. et al. 1992. Space coding by premotor cortex. Experimental Brain Research 89: 686–690.
Fogassi, L. et al. 1996. Coding of peripersonal space in inferior premotor cortex (area F4). Journal of
Neurophysiology 76: 141–157.
Fogassi, L., V. Gallese, M. Gentilucci, G. Luppino, M. Matelli, and G. Rizzolatti. 1994. The fronto-parietal
cortex of the prosimian Galago: Patterns of cytochrome oxidase activity and motor maps. Behavioral
Brain Research 60: 91–113.
Fogassi, L., and G. Luppino. 2005. Motor functions of the parietal lobe. Current Opinion in Neurobiology 15:
626–631.
Fogassi, L., V. Raos, G. Franchi, V. Gallese, G. Luppino, and M. Matelli. 1999. Visual responses in the dorsal
premotor area F2 of the macaque monkey. Experimental Brain Research 128: 194–199.
Gallese, V., L. Fadiga, L. Fogassi, and G. Rizzolatti. 1996. Action recognition in the premotor cortex. Brain
119: 593–609.
Gallese, V., A. Murata, M. Kaseda, N. Niki, and H. Sakata. 1994. Deficit of hand preshaping after muscimol
injection in monkey parietal cortex. NeuroReport 5: 1525–1529.
Gallivan, J. P., C. Cavina-Pratesi, and J. C. Culham. 2009. Is that within reach? fMRI reveals that the human
superior parieto-occipital cortex encodes objects reachable by the hand. Journal of Neuroscience 29:
4381–4391.
Gardner, E. P. et al. 2007. Neurophysiology of prehension: I. Posterior parietal cortex and object-oriented hand
behaviors. Journal of Neurophysiology 97: 387–406.
Gentilucci, M. et al. 1988. Somatotopic representation in inferior area 6 of the macaque monkey. Experimental
Brain Research 71: 475–490.
Gentilucci, M., C. Scandolara, I. N. Pigarev, and G. Rizzolatti. 1983. Visual responses in the postarcuate cortex
(area 6) of the monkey that are independent of eye position. Experimental Brain Research 50: 464–468.
Godschalk, M., R. N. Lemon, H. G. Nijs, and H. G. Kuypers. 1981. Behaviour of neurons in monkey peri-
arcuate and precentral cortex before and during visually guided arm and hand movements. Experimental
Brain Research 44: 113–116.
Godschalk, M., R. N. Lemon, H. G. Kuypers, and R. K. Ronday. 1984. Cortical afferents and efferents of
monkey postarcuate area: An anatomical and electrophysiological study. Experimental Brain Research
56: 410–424.
Godschalk, M., R. N. Lemon, H. G. Kuypers, and J. van der Steen. 1985. The involvement of monkey premo-
tor cortex neurones in preparation of visually cued arm movements. Behavioral Brain Research 18:
143–157.
Graziano, M. S. A. 1999. Where is my arm? The relative role of vision and proprioception in the neuronal
representation of limb position. Proceedings of the National Academy of Sciences of the United States of
America 96: 10418–10421.
Graziano, M. S. A. 2001. A system of multimodal areas in the primate brain. Neuron 29: 4–6.
Graziano, M. S. A., and D. F. Cooke. 2006. Parieto-frontal interactions, personal space, and defensive behavior.
Neuropsychologia 44: 2621–2635.
Graziano, M. S. A., and C. G. Gross. 1993. A bimodal map of space: Somatosensory receptive fields in the
macaque putamen with corresponding visual receptive fields. Experimental Brain Research 97: 96–109.
Graziano, M. S. A., and C. G. Gross. 1994. Multiple pathways for processing visual space. In Attention and
Performance XV, ed. C. Umiltà and M. Moscovitch, 181–207. Oxford: Oxford Univ. Press.
Graziano, M. S. A., and C. G. Gross. 1995. The representation of extrapersonal space: A possible role for
bimodal, visuo-tactile neurons. In The Cognitive Neurosciences, ed. M. Gazzaniga, 1021–1034. MIT
Press.
Graziano, M. S., X. T. Hu, and C. G. Gross. 1997. Visuospatial properties of ventral premotor cortex. Journal
of Neurophysiology 77: 2268–2292.
Graziano, M. S., C. S. Taylor, T. Moore, and D. F. Cooke. 2002. The cortical control of movement revisited.
Neuron 36: 349–362.
Graziano, M. S., G. S. Yap, and C. G. Gross. 1994. Coding of visual space by premotor neurons. Science 266:
1054–1057.
Halsband, U., and R. Passingham. 1982. The role of premotor and parietal cortex in the direction of action.
Brain Research 240: 368–372.
Holmes, N. P., G. A. Calvert, and C. Spence. 2004. Extending or projecting peripersonal space with tools? Multisensory
interactions highlight only the distal and proximal ends of tools. Neuroscience Letters 372: 62–67.
Holmes, N. P., D. Sanabria, G. A. Calvert, and C. Spence. 2007a. Tool-use: Capturing multisensory spatial
attention or extending multisensory peripersonal space? Cortex 43: 469–489. Erratum in: Cortex 43:
575.
Holmes, N. P., and C. Spence. 2004. The body schema and multisensory representations of peripersonal space.
Cognitive Processing 5: 94–105.
Holmes, N. P., G. A. Calvert, and C. Spence. 2007b. Tool use changes multisensory interactions in seconds:
Evidence from the crossmodal congruency task. Experimental Brain Research 183: 465–476.
Holmes, N. P., C. Spence, P. C. Hansen, C. E. Mackay, and G. A. Calvert. 2008. The multisensory attentional
consequences of tool use: A functional magnetic resonance imaging study. PLoS One 3: e3502.
Hyvärinen, J. 1981. Regional distribution of functions in parietal association area 7 of the monkey. Brain
Research 206: 287–303.
Hyvärinen, J., and A. Poranen. 1974. Function of the parietal associative area 7 as revealed from cellular dis-
charges in alert monkeys. Brain 97: 673–692.
Hyvärinen, J., and Y. Shelepin. 1979. Distribution of visual and somatic functions in the parietal associative
area 7 of the monkey. Brain Research 169: 561–564.
Iriki, A., M. Tanaka, and Y. Iwamura. 1996. Coding of modified body schema during tool use by macaque
postcentral neurons. NeuroReport 7: 2325–2330.
Ishida, H., K. Nakajima, M. Inase, and A. Murata. 2009. Shared mapping of own and others’ bodies in visuo-
tactile bimodal area of monkey parietal cortex. Journal of Cognitive Neuroscience 1–14.
Jeannerod, M. 1988. Motor control: Concepts and issues. New York: Wiley.
Jones, E. G., and T. P. Powell. 1970. An anatomical study of converging sensory pathways within the cerebral
cortex of the monkey. Brain 93: 793–820.
Kurata, K., and J. Tanji. 1986. Premotor cortex neurons in macaques: Activity before distal and proximal fore-
limb movements. Journal of Neuroscience 6: 403–411.
Kurata, K., K. Okano, and J. Tanji. 1985. Distribution of neurons related to a hindlimb as opposed to forelimb
movement in the monkey premotor cortex. Experimental Brain Research 60: 188–191.
Lacquaniti, F., and R. Caminiti. 1998. Visuo-motor transformations for arm reaching. European Journal of
Neuroscience 10: 195–203. Review. Erratum in: European Journal of Neuroscience, 1998, 10: 810.
Ladavas, E. 2002. Functional and dynamic properties of visual peripersonal space. Trends in Cognitive Sciences
6: 17–22.
Ladavas, E., and A. Farnè. 2004. Visuo-tactile representation of near-the-body space. Journal of Physiology
Paris 98: 161–170.
Legrand, D., C. Brozzoli, Y. Rossetti, and A. Farnè. 2007. Close to me: Multisensory space representations
for action and pre-reflexive consciousness of oneself-in-the-world. Consciousness and Cognition 16:
687–699.
Leinonen, L. 1980. Functional properties of neurones in the posterior part of area 7 in awake monkey. Acta
Physiologica Scandinava 108: 301–308.
Leinonen, L., J. Hyvärinen, G. Nyman, and I. Linnankoski. 1979. I. Functional properties of neurons in lateral
part of associative area 7 in awake monkeys. Experimental Brain Research, 34: 299–320.
Leinonen, L., and G. Nyman. 1979. II. Functional properties of cells in anterolateral part of area 7 associative
face area of awake monkeys. Experimental Brain Research 34: 321–333.
Luppino, G., A. Murata, P. Govoni, and M. Matelli. 1999. Largely segregated parietofrontal connections link-
ing rostral intraparietal cortex (areas AIP and VIP) and the ventral premotor cortex (areas F5 and F4).
Experimental Brain Research 128: 181–187.
Lynch, J. C., V. B. Mountcastle, W. H. Talbot, and T. C. T. Yin. 1977. Parietal lobe mechanisms for directed
visual attention. Journal of Neurophysiology 140: 462–489.
Makin, T. R., N. P. Holmes, C. Brozzoli, Y. Rossetti, and A. Farnè. 2009. Coding of visual space during motor
preparation: Approaching objects rapidly modulate corticospinal excitability in hand-centered coordi-
nates. Journal of Neuroscience 29: 11841–11851.
Makin, T. R., N. P. Holmes, and H. H. Ehrsson. 2008. On the other hand: Dummy hands and peripersonal space.
Behavioral Brain Research 191: 1–10.
Makin, T. R., N. P. Holmes, and E. Zohary. 2007. Is that near my hand? Multisensory representation of peri
personal space in human intraparietal sulcus. Journal of Neuroscience 27: 731–740.
Makin, T. R., M. Wilf, I. Schwartz, and E. Zohary. 2010. Amputees “neglect” the space near their missing hand.
Psychological Science, in press.
Maravita, A. 2006. From body in the brain, to body in space: Sensory and intentional aspects of body represen-
tation. In The human body: Perception from the inside out, ed. G. Knoblich, M. Shiffrar, and M. Grosjean,
65–88. Oxford Univ. Press.
Maravita, A., and A. Iriki. 2004. Tools for the body (schema). Trends in Cognitive Science 8: 79–86.
Maravita, A., C. Spence, and J. Driver. 2003. Multisensory integration and the body schema: Close to hand and
within reach. Current Biology 13: R531–R539.
Maravita, A., C. Spence, S. Kennett, and J. Driver. 2002. Tool-use changes multimodal spatial interactions
between vision and touch in normal humans. Cognition 83: B25–B34.
Martino, A. M., and P. L. Strick. 1987. Corticospinal projections originate from the arcuate premotor area.
Brain Research 404: 307–312.
Matelli, M., R. Camarda, M. Glickstein, and G. Rizzolatti. 1984a. Interconnections within the postarcuate
cortex (area 6) of the macaque monkey. Brain Research 310: 388–392.
Matelli, M., R. Camarda, M. Glickstein, and G. Rizzolatti. 1984b. Afferent and efferent projections of the infe-
rior area 6 in the macaque monkey. Journal of Comparative Neurology 251: 281–298.
Rizzolatti, G., and M. Matelli. 2003. Two different streams form the dorsal visual system: Anatomy and func-
tions. Experimental Brain Research 153: 146–157.
Rizzolatti, G., C. Scandolara, M. Gentilucci, and R. Camarda. 1981a. Response properties and behavioral
modulation of “mouth” neurons of the postarcuate cortex (area 6) in macaque monkeys. Brain Research
225: 421–424.
Rizzolatti, G., C. Scandolara, M. Matelli, and M. Gentilucci. 1981b. Afferent properties of periarcuate neurons
in macaque monkeys: I. Somatosensory responses. Behavioral Brain Research 2: 125–146.
Rizzolatti, G., C. Scandolara, M. Matelli, and M. Gentilucci. 1981c. Afferent properties of periarcuate neurons
in macaque monkeys: II. Visual responses. Behavioral Brain Research 2: 147–163.
Robinson, D. L., M. E. Goldberg, and G. B. Stanton. 1978. Parietal association cortex in the primate: Sensory
mechanisms and behavioral modulations. Journal of Neurophysiology 41: 910–932.
Robinson, C. J., and H. Burton. 1980a. Organization of somatosensory receptive fields in cortical areas 7b,
retroinsula, postauditory and granular insula of M. fascicularis. Journal of Comparative Neurology 192:
69–92.
Robinson, C. J., and H. Burton. 1980b. Somatic submodality distribution within the second somatosensory
(SII), 7b, retroinsular, postauditory, and granular insular cortical areas of M. fascicularis. Journal of
Comparative Neurology 192: 93–108.
Sakata, H., Y. Takaoka, A. Kawarasaki, and H. Shibutani. 1973. Somatosensory properties of neurons in the
superior parietal cortex (area 5) of the rhesus monkey. Brain Research 64: 85–102.
Seltzer, B., and D. N. Pandya. 1980. Converging visual and somatic sensory cortical input to the intraparietal
sulcus of the rhesus monkey. Brain Research 192: 339–351.
Sereno, M. I., and R. S. Huang. 2006. A human parietal face area contains aligned head-centered visual and
tactile maps. Nature Neuroscience 9: 1337–1343.
Shimazu, H., M. A. Maier, G. Cerri, P. A. Kirkwood, and R. N. Lemon. 2004. Macaque ventral premotor cortex
exerts powerful facilitation of motor cortex outputs to upper limb motoneurons. Journal of Neuroscience
24: 1200–1211.
Spence, C., F. Pavani, and J. Driver. 2004a. Spatial constraints on visual–tactile cross-modal distractor congru-
ency effects. Cognitive Affective and Behavioral Neuroscience 4: 148–169.
Spence, C., F. Pavani, A. Maravita, and N. Holmes. 2004b. Multisensory contributions to the 3-D representation
of visuotactile peripersonal space in humans: Evidence from the crossmodal congruency task. Journal of
Physiology Paris 98: 171–189.
Spence, C., F. Pavani, A. Maravita, and N. P. Holmes. 2008. Multisensory interactions. In Haptic rendering:
Foundations, algorithms, and applications, ed. M. C. Lin and M. A. Otaduy, 21–52. Wellesley, MA:
A. K. Peters Ltd.
Stein, B. E., and M. A. Meredith. 1993. The merging of the senses. Cambridge, MA: MIT Press.
Strick, P. L., and C. C. Kim. 1978. Input to primate motor cortex from posterior parietal cortex (area 5):
I. Demonstration by retrograde transport. Brain Research 157: 325–330.
Strick, P. L. 2002. Stimulating research on motor cortex. Nature Neuroscience 5: 714–715.
Ungerleider, L. G., and Desimone, R. 1986. Cortical connections of visual area MT in the macaque. Journal of
Comparative Neurology, 248: 190–222.
Wallace, M. T., and Stein, B. E. 2007. Early experience determines how the senses will interact. Journal of
Neurophysiology 97: 921–926.
Wang, Y., S. Celebrini, Y. Trotter, and P. Barone. 2008. Visuo-auditory interactions in the primary visual cortex
of the behaving monkey: Electrophysiological evidence. BMC Neuroscience 9: 79.
Ward, R., S. Goodrich, and J. Driver. 1994. Grouping reduces visual extinction: Neuropsychological evidence
for weight-linkage in visual selection. Visual Cognition 1: 101–129.
24 Multisensory Perception and
Bodily Self-Consciousness
From Out-of-Body to
Inside-Body Experience
Jane E. Aspell, Bigna Lenggenhager, and Olaf Blanke
CONTENTS
24.1 Introduction........................................................................................................................... 467
24.2 Multisensory Disintegration in Out-of-Body and Related Experiences of Neurological
Origin.....................................................................................................................................468
24.3 Using Multisensory Conflicts to Investigate Bodily Self in Healthy Subjects...................... 470
24.3.1 Body Part Studies: Rubber Hand Illusion.................................................................. 470
24.3.2 Full Body Studies...................................................................................................... 471
24.3.3 Mislocalization of Touch during FBIs....................................................................... 475
24.3.4 Multisensory First-Person Perspective...................................................................... 477
24.4 Conclusion............................................................................................................................. 478
References....................................................................................................................................... 478
24.1 INTRODUCTION
The most basic foundations of the self arguably lie in those brain systems that represent the body
(Blanke and Metzinger 2009; Damasio 2000; Gallagher 2005; Jeannerod 2006; Knoblich 2002;
Metzinger et al. 2007). The representation of the body is complex, involving the encoding and inte-
gration of a wide range of multisensory (somatosensory, visual, auditory, vestibular, visceral) and
motor signals (Damasio 2000; Gallagher 2005; Metzinger 2003). One’s own body is thus possibly
the most multisensory “object” in the world. Importantly, whereas external objects of perception
come and go, multisensory bodily inputs are continuously present, and have thus been proposed
as the basis for bodily self-consciousness—the nonconceptual and prereflective representation of
body-related information (Gallagher 2000; Haggard et al. 2003; Jeannerod 2007; Metzinger et al.
2007; Pacherie 2008).
Despite the apparent unitary, global character of bodily self-consciousness, experimental manip-
ulations have mainly focused on subglobal aspects, such as the sense of ownership and agency for
one’s hand and its movements (Botvinick and Cohen 1998; Ehrsson et al. 2004; Jeannerod 2006,
2007; Knoblich 2002; Pavani et al. 2000; Tsakiris and Haggard 2005; Tsakiris et al. 2007). These
latter studies on body-part representation are important (and will be discussed below in detail), yet
we have argued (e.g., see Blanke and Metzinger 2009) that they fail to account for a key feature of
bodily self-consciousness: its global character. This is because a fundamental aspect of bodily self-
consciousness is its association with a single, whole body, not with multiple body parts (Blanke and
Metzinger 2009; Carruthers 2008; Lenggenhager et al. 2007; Metzinger et al. 2007). A number of
recent studies (Aspell et al. 2009; Ehrsson 2007; Lenggenhager et al. 2007, 2009; Mizumoto and
Ishikawa 2005; Petkova and Ehrsson 2008) have demonstrated that more global aspects of body
perception can also be experimentally manipulated using multisensory conflicts. These experi-
mental studies on healthy subjects were inspired by an unusual and revealing set of neurological
phenomena—autoscopic phenomena—in which the sense of the body as a whole is disrupted in
different ways, and which are likely to be caused by an underlying abnormality in the multisensory
integration of global bodily inputs (Blanke and Mohr 2005). In this chapter, we first examine how
the scientific understanding of bodily self-consciousness and its multisensory mechanisms can be
informed by the study of autoscopic phenomena. We then present a review of investigations of
multisensory processing relating to body-part perception (“rubber hand” illusion studies: Botvinick
and Cohen 1998; Ehrsson et al. 2004; Tsakiris and Haggard 2005) and go on to discuss more recent
“full body” illusion studies that were inspired by autoscopic phenomena and have shown that it is
also possible to dissociate certain components of bodily self-consciousness—namely, self-location,
self-identification, and the first-person perspective—in healthy subjects by inducing multisensory
conflicts.
ownership being referred to the rubber hand. It should be noted that these mechanisms and this
direction of causality have yet to be verified experimentally. Moreover, the size of the
drift is generally quite small (a few centimeters) compared to the actual distance between the fake
and the real hand, and that the induced changes in illusory touch and (even more so) ownership dur-
ing the rubber hand illusion are most often relatively weak changes in conscious bodily experience
(even after 30 min of stroking).
Several studies have investigated the brain mechanisms involved in the rubber hand illusion, for
example, using functional MRI (Ehrsson et al. 2004) and positron emission tomography (Tsakiris
et al. 2007). A systematic review of the studies using the rubber hand illusion would be beyond the
scope of the present review, as this chapter focuses on scientific experimentation with full body illu-
sions and global aspects of bodily self-consciousness. The interested reader is referred to the recent
review on body part–specific aspects of bodily self-consciousness by Makin and colleagues (2008).
We only note here that comparison of neuroimaging studies of the rubber hand illusion is hampered
by the fact that the studies employed different methods to induce the rubber hand illusion, used
different control conditions, different behavioral proxies to quantify illusory touch and ownership,
and employed different brain imaging techniques. Not surprisingly, though, these studies implicated
several key brain areas that have previously been shown to be important in multisensory integration,
such as the premotor and intraparietal cortices as well as the TPJ, insula, extrastriate cortex, and
the cerebellum.
FIGURE 24.1 Experimental setup in synchronous (back) stroking condition in Lenggenhager et al.’s (2007)
study (top panel) and in synchronous (chest) stroking condition in Ehrsson’s (2007) study (bottom panel). In
both panels, the physical body of the subject is light-colored and the dark-colored body indicates the hypoth-
esized location of the perceived body (bodily self). (Modified from Lenggenhager, B. et al., Consciousness
and Cognition, 18(1), 110–117, 2009.)
body (the “virtual body”) while they were stroked on their real back with a stick. This stroking was
felt on their back and also seen on the virtual body in front, either simultaneously (in real time) or not
(when delayed by a video delay). The stroking manipulation thus generated either congruent (syn-
chronous) or incongruent (asynchronous) visuo-tactile stimulation (as had been shown to affect the
perception of hand ownership and hand location in the rubber hand illusion; Botvinick and Cohen
1998). It was found that the illusion of self-identification with the virtual body (i.e., global owner-
ship, the feeling that “the virtual body is my body”) and the referral of touch (“feeling the touch of
the stick where I saw it touching my virtual body”) were both stronger when subjects were stroked
synchronously than when they were stroked asynchronously (Lenggenhager et al. 2007). Self-
location was also measured by passively displacing blindfolded subjects after the stroking period
and then asking them to walk back to the original position. Note that, as predicted, self-location was
experienced at a position that was closer to the virtual body, as if the subject was located “in front”
of the position where (s)he had been standing during the experiment. This ensemble of measures has
been termed the full body illusion (FBI).
In a related study (Ehrsson 2007), subjects were stroked on their chest (Figure 24.1). They were
seated while they viewed themselves (via an HMD) from behind, and they could see a stick moving
(synchronous or asynchronous with the touch) just below the camera’s lens. In this case, subjects
felt that the stick they saw was touching their real chest; they self-identified with the camera’s location, and they felt that looking at the virtual body was like viewing the body of someone else (i.e.,
decreased self-identification with the virtual body). Self-location was not quantified in this study by
using the drift measure as in Lenggenhager et al.’s (2007) study; instead, a threatening stimulus was
presented to the apparent location of the origin of the visuospatial perspective (just below the cam-
era). The skin conductance response to a swinging hammer (approaching the camera) was found
to be higher during synchronous stroking than during asynchronous stroking, providing implicit
physiological evidence that subjects self-identified with a spatial position that was displaced toward
the position of the camera.
There were several differences in bodily experiences in these two similar setups, and it is worth
considering what may account for these. Meyer (2008) proposed (in a response to these studies) that
in both setups the brain may use at least four different sources of information to generate the con-
scious experience of self-location and self-identification: (1) where the body is seen, (2) where the
world is seen from (the origin of the visuospatial perspective), (3) where the touch is seen to occur,
and (4) where the touch is felt to occur. These four “cues” do not spatially coincide in these experimental
setups (though in everyday life they usually do). Meyer argued that the most important of these cues for
the conscious experience of self-location might be where the touch is seen to occur (i.e., where the
stroking stick is seen). He concluded this because, first, in neither setup did self-location (measured
via drift by Lenggenhager et al. 2007 or assessed via a questionnaire score by Ehrsson 2007) exactly
coincide with the location where the touch was felt (i.e., where the physical body was located).
Second, the seen location of the virtual body biased self-location in one study (Lenggenhager et
al. 2007) but not in the other (Ehrsson 2007), and third, the location of the visuospatial perspec-
tive corresponded to self-location in Ehrsson’s (2007) study but not in Lenggenhager et al.’s (2007)
study. However, in both cases (during synchronous stroking), self-location coincided with (or more
accurately, was biased toward) the location where the touch was seen to occur (i.e., the seen location
of the stroking stick).
It is not very surprising that the tactile sense appears to have the weakest role in determining
self-location. Touch, after all, cannot give any reliable information regarding the location of the
body in external space, except via tactile contact with external surfaces. There is, however, an
additional important point to consider regarding the four cues: self-location was biased toward the
virtual body more when the seen stroking was synchronous with the felt stroking than when it was
asynchronous (Blanke et al. 2008). Thus, the congruence between tactile and visual input is an
additional important factor in determining self-location in this context. It seems that when vision
and touch are incongruent, the influence of the “visual information about stroking” is weaker and
not preeminent as Meyer implies. Thus, in the asynchronous condition, subjects’ self-location is
closer to where the touch is felt (i.e., where their physical body is actually located) than it is in the
synchronous condition.
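Meyer's cue-weighting account can be made concrete with a toy calculation (our illustration only, not a model proposed in these studies): treat self-location as a weighted average of the four cue positions along the viewing axis, and let the weight of the seen-touch cue drop when seen and felt stroking are incongruent. All positions and weights below are hypothetical.

```python
# Toy cue-integration sketch. Positions (metres along the viewing axis)
# and weights are illustrative values only, chosen to reproduce the
# qualitative pattern described in the text.

def estimate_self_location(cues, weights):
    """Weighted average of cue positions; weights need not sum to 1."""
    total = sum(weights[name] for name in cues)
    return sum(cues[name] * weights[name] for name in cues) / total

# Physical body (and felt touch) at 0 m; virtual body and seen stroking 2 m ahead.
cues = {
    "seen_body": 2.0,     # where the body is seen
    "perspective": 0.0,   # origin of the visuospatial perspective
    "seen_touch": 2.0,    # where the touch is seen to occur
    "felt_touch": 0.0,    # where the touch is felt to occur
}

# Synchronous stroking: the seen-touch cue dominates.
sync = estimate_self_location(cues, {"seen_body": 1, "perspective": 1,
                                     "seen_touch": 4, "felt_touch": 1})
# Asynchronous stroking: visuo-tactile conflict down-weights the seen-touch cue.
async_ = estimate_self_location(cues, {"seen_body": 1, "perspective": 1,
                                       "seen_touch": 1, "felt_touch": 1})

assert sync > async_
print(sync, async_)  # self-location drifts further toward the virtual body when synchronous
```

With the seen-touch cue down-weighted in the asynchronous case, the estimate falls back toward the physical body, matching the reported pattern.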
It should be noted that different methods (different experimental conditions and dependent vari-
ables to quantify changes in bodily self-consciousness) were used in these studies (Ehrsson 2007;
Lenggenhager et al. 2007). It is therefore difficult to make meaningful, direct comparisons between
the results of these studies. A more recent study (Lenggenhager et al. 2009) therefore sought to
directly compare the approaches presented in these previous studies by using identical body posi-
tions and measures in order to quantify the conscious experience of self-identification, first-person
visuospatial perspective, and self-location. In addition, the authors investigated these aspects of
bodily self-consciousness while subjects were tested in the supine position (as OBEs usually occur
in this position; Bünning and Blanke 2005; Green 1968).
Subjects were again fitted with an HMD that displayed a video image of their body. Their vir-
tual body thus appeared to be located below their physical body (see Figure 24.2). The dependent
behavioral measure for the quantification of self-location was a new one: a “mental ball dropping”
(MBD) task in which subjects had to imagine that a ball fell from their hand, and they had to press
one button when they imagined that it left their grasp, and then another button when they imagined
that it hit the floor. The authors hypothesized that MBD estimation would be greater (i.e., the time
that subjects imagined it would take for the ball to reach the ground would be longer) when subjects’
self-location (where they perceived their self to be) was higher from the ground than when it was
closer to the ground. The prediction in this study was that, compared to asynchronous stroking,
FIGURE 24.2 Experimental setup in synchronous (back) stroking condition (top panel) and synchronous
(chest) stroking condition (bottom panel) in Lenggenhager et al.’s (2009) study. Subject was filmed from
above and viewed the scene via an HMD. Light-colored body indicates where subjects’ real body was
located and dark-colored body, the hypothesized location of the perceived body (bodily self). (Modified from
Lenggenhager, B. et al., Consciousness and Cognition, 18(1), 110–117, 2009.)
synchronous back stroking would lead to a “downward” shift in self-location (toward the virtual
body, seen as though below subjects) and an increased self-identification with the virtual body.
Synchronous chest stroking, conversely, would lead to an “upward” shift in self-location (“away”
from the virtual body seen below), and a decreased self-identification with the virtual body. As
predicted, self-identification with the virtual body and referral of touch to the virtual body were
found to be greater during synchronous than during asynchronous back stroking. In contrast, during
synchronous chest stroking, there was decreased self-identification with the virtual body and decreased
illusory touch. The MBD time estimates (quantifying self-location) were lower for synchronous
back stroking than synchronous chest stroking, suggesting that, as predicted, self-location was more
biased toward the virtual body in the synchronous back stroking condition and relatively more
toward the location of the visuospatial perspective (a third-person perspective) in the synchronous
chest stroking condition. This study confirmed the earlier suggestion that self-location and self-
identification are strongly influenced by where the stroking is seen to occur. Thus, self-location was
biased toward the virtual body located as though below (or in front) when subjects were stroked on
the back, and biased toward the location of the visuospatial perspective (behind/above the virtual
body) when subjects were stroked on their chests. These studies revealed that humans’ “inside-
body” self-location and “inside-body” first-person perspective can be transferred to an extracorpo-
real self-location and a third-person perspective.
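The MBD measure rests on elementary kinematics: a ball dropped from rest from height h takes t = sqrt(2h/g) to reach the floor, so a higher experienced self-location predicts a longer imagined fall time. A minimal sketch, with purely illustrative heights (the studies report button-press intervals, not actual heights):

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def fall_time(height_m):
    """Time (s) for a ball dropped from rest to fall height_m metres: t = sqrt(2h/g)."""
    return math.sqrt(2 * height_m / G)

# Hypothetical experienced heights above the floor: self-location biased
# downward toward the virtual body (synchronous back stroking) vs. biased
# upward toward the camera's viewpoint (synchronous chest stroking).
t_down = fall_time(1.2)
t_up = fall_time(2.0)

assert t_up > t_down  # higher self-location -> longer imagined fall -> larger MBD estimate
print(f"{t_down:.2f} s vs {t_up:.2f} s")
```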
It is notable that the subjective upward drift in self-location during synchronous chest strok-
ing was correlated with sensations of elevation and floating (as assessed by questionnaires). This
suggests that, when subjects adopt a relaxed supine position, synchronous visual–tactile events may
interfere with vestibular processing. The importance of vestibular (otolith) input in abnormal self-
location has already been demonstrated (Blanke et al. 2002, 2004). Furthermore, there is evidence
that vestibular cues may interfere with body and self-representation (Le Chapelain et al. 2001;
Lenggenhager et al. 2008; Lopez et al. 2008; Yen Pik Sang et al. 2006). The relatively motionless
supine body position of the subjects in this study would have minimized vestibular sensory updating
and thus may have further contributed to the occurrence of such vestibular sensations, highlight-
ing their potential relevance for bodily self-consciousness, OBEs, and related experiences (see also
Lopez et al. 2008; Schwabe and Blanke 2008).
Can the mechanisms (explained above) for the rubber hand illusion also explain the changes
in self-location, first-person perspective, and self-identification during the FBI? It is probable that
some mechanisms are shared but there are likely to be several important conceptual, behavioral,
and neurobiological differences. The finding that in the FBI there appears to be referral of touch
to a virtual body viewed as though at a distance of 2 m is in contrast to the finding that the
rubber hand illusion is greatly weakened or abolished by changing the posture of the rubber hand
to an implausible one (Tsakiris and Haggard 2005) or by placing the rubber hand at more distant
positions (Lloyd 2007). Viewing one’s body from an external perspective at 2 m distance is even
less “anatomically plausible” than a rubber hand with a misaligned posture; therefore, it is perhaps
surprising that the FBI occurs at all under such conditions. However, it has been shown that the
visual receptive field size of parietal bimodal neurons with tactile receptive fields centered on the
shoulder or the back can be very large—extending sometimes for more than a meter in extraper-
sonal space (Duhamel et al. 1998; Maravita and Iriki 2004). Shifts in the spatial characteristics of
such trunk-centered bimodal neurons may thus account for the observed changes during the FBI
(Blanke and Metzinger 2009). What these differences illustrate is that the constraints operating in
the FBI are in certain ways markedly different to those operating in the rubber hand illusion. They
appear similar in that the strength of both illusions depends on the temporal congruence between
seen and felt stroking. However, the constraints regarding the spatial relations between the location
of the origin of the first-person visuospatial perspective and the rubber hand are different to those
between the location of the origin of the first-person visuospatial perspective and the location of
the seen virtual body (see also Blanke and Metzinger 2009). Moreover, in the RHI it is the hand
with respect to the body that is mislocalized: a “body part–body” interaction. In the FBI the entire
body (the bodily self) is mislocalized within external space: a “body–world” interaction. It may be
that the “whole body drift” entails that (during the synchronous condition) the “volume” of peripersonal
space is relocated (toward the virtual body) within a stable external space (compatible with
subjective reports during OBEs). Alternatively, it may be that peripersonal and extrapersonal space
are modified. The dimensions of the external room—for example, the proximity of walls to the
subjects—are likely to affect the FBI more than the RHI, but this has not been systematically tested
yet. Given the differences between the illusions, it is to be expected that there should be differences
in both the spatial constraints and neural bases (at the level of bimodal visuo-tactile neurons and of
brain regions encoding multisensory bodily signals) between these illusions.
ratings, being explicit judgments, are susceptible to various biases, for example, experimenter
expectancy effects. Also, the questions were asked only after the period of stroking, not during,
and so were not “online” measures of bodily self-consciousness. Furthermore, as recently pointed
out (Ehrsson and Petkova 2008), such questions are somewhat ambiguous in a VR setup: they are,
arguably, unable to distinguish between self-identification with a virtual body and self-recognition
in a VR/video system. A more recent study (Aspell et al. 2009) therefore developed an online
measure for the mislocalization of touch that would be less susceptible to response biases and that
would test more directly whether tactile mapping is altered during the FBI. This study investigated
whether modifications in bodily self-consciousness are associated with changes in tactile spatial
representations.
To investigate this, the authors (Aspell et al. 2009) adapted the cross-modal congruency task
(Spence et al. 2004) for the full body. This task was used because the cross-modal congruency
effect (CCE) measured in the task can function as a behavioral index of the perceived proximity of
visual and tactile stimuli. In previous studies of the CCE (Igarashi et al. 2008; Pavani and Castiello
2004; Pavani et al. 2000; Shore et al. 2006; Spence et al. 2004), the visual and tactile stimuli were
presented on foam cubes held in the hands: single vibrotactile devices paired with small lights [light
emitting diodes (LEDs)] were positioned next to the thumb and index finger of each hand. Subjects
made speeded elevation discriminations (“up”/index or “down”/thumb) of the tactile stimuli while
attempting to ignore the visual distractors. It was found that subjects performed worse when a
distracting visual stimulus occurred at an incongruent elevation with respect to the tactile (target)
stimulus. Importantly, the CCE (difference between reaction times during incongruent and congru-
ent conditions) was larger when the visual and tactile stimuli occurred closer to each other in space
(Spence et al. 2004).
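The CCE itself is simple arithmetic: mean reaction time on incongruent trials minus mean reaction time on congruent trials. A minimal sketch with fabricated reaction times (not data from any cited study):

```python
# Crossmodal congruency effect: CCE = mean RT(incongruent) - mean RT(congruent).
# Trial data below are fabricated for illustration.

def cce(trials):
    """trials: list of (reaction_time_ms, congruent_bool) pairs."""
    congruent = [rt for rt, c in trials if c]
    incongruent = [rt for rt, c in trials if not c]
    return sum(incongruent) / len(incongruent) - sum(congruent) / len(congruent)

# Visual distractor near the stimulated site: strong interference.
near = [(620, False), (640, False), (540, True), (560, True)]
# Distractor far from the stimulated site: weaker interference, smaller CCE.
far = [(590, False), (600, False), (555, True), (565, True)]

assert cce(near) > cce(far)
print(cce(near), cce(far))  # larger CCE when visual and tactile stimuli are closer in space
```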
The CC task was adapted for the full body (from the typical setup for the hands; Spence et al.
2004) by placing the vibrotactile devices and LEDs on the subject’s torso (back). Subjects were able to view
their body and the LEDs via an HMD (see Figure 24.3) as the setup was similar to that used in the
previous FBI study (Lenggenhager et al. 2007). To investigate whether “full body CCEs” would be
associated in a predictable way with changes in bodily self-consciousness, subjects’ self-identifica-
tion with the virtual body and self-location were manipulated across different blocks by employing
either synchronous or asynchronous stroking of the subjects’ backs. CCEs were measured during
FIGURE 24.3 Subject stood 2 m in front of a camera with a 3-D encoder. Four light vibration devices were
fixed to the subject’s back, the upper two at inner edges of the shoulder blades and the lower two 9 cm below.
Small inset windows represent what the subject viewed via the head-mounted display. (1) Left panel: synchronous
stroking condition. (2) Right panel: asynchronous stroking condition. (Modified from Aspell, J. E. et al.,
PLoS ONE, 4(8), e6488, 2009.)
the stroking period and, as predicted, were found to be larger during synchronous than asynchronous
blocks, indicating a greater mislocalization of touch during synchronous stroking than during
asynchronous stroking. [Note that although a number of
components—attention, response bias, and multisensory integration—are all thought to contrib-
ute to the CCE to varying degrees (e.g., depending on the stimulus-onset asynchrony between the
visual and tactile stimuli), the finding of a difference in the CCE between same- and different-side
stimuli during the synchronous condition, but not during the asynchronous condition, indicates that
the visual and tactile stimuli were represented as being closer to each other in the former case.] In
the synchronous condition, there was also a greater bias in self-location toward the virtual body
and a greater self-identification with the virtual body compared to in asynchronous blocks (as in
Lenggenhager et al. 2007). Control conditions revealed that the modulating effect of spatial remap-
ping of touch was body-specific.
Interestingly, this study also found that the size of the CCE, the degree of self-identification
with, and the bias in self-location toward the virtual body were all modulated by the stimulus onset
synchrony between the visual and vibrotactile stimuli used in the CCE task. These data thus suggest
that certain key components of bodily self-consciousness—that is, “what I experience as my body”
(self-identification) and “where I experience my body to be” (self-location)—are associated with
changes in the spatial representation of tactile stimuli. They imply that a greater degree of visual
capture of tactile location occurs when there is a greater degree of self-identification with the seen
body. This change in the tactile spatial representation of stimuli is not a remapping on the body, but
is, we suggest, a change in tactile mapping with respect to extrapersonal space: the tactile sensations
are perceived at a spatial location biased toward the virtual body.
24.4 CONCLUSION
Studies of OBEs of neurological origin have influenced current scientific thinking on the nature of
global bodily self-consciousness. These clinical studies have highlighted that bodily self-conscious-
ness can be broken down into three key components: self-location, first-person perspective, and self-
identification (Blanke and Metzinger 2009). The phenomenology of OBEs and related experiences
demonstrates that these three components are dissociable, suggesting that they may have distinct
functional and neural bases. The first empirical investigations into the key dimensions of bodily
self-consciousness that we have reviewed here show that it is also possible to study and dissociate
these three components of the global bodily self in healthy subjects.
Future studies should seek to develop experimental settings in which bodily self-consciousness
can be manipulated more robustly and more strongly in healthy subjects. It will also be important
for future studies to characterize in detail the neural machinery that leads to the described experien-
tial and behavioral changes in bodily self-consciousness. The TPJ is likely to be crucially involved
(Blanke et al. 2004; Blanke and Mohr 2005), but we expect that other areas such as the medial
prefrontal cortex (Gusnard et al. 2001) and the precuneus (Northoff and Bermpohl 2004), as well as
somatosensory (Ruby and Decety 2001) and vestibular cortex (Lopez et al. 2008) will also be found
to contribute to bodily self-consciousness.
Will it ever be possible to experimentally induce full-blown OBEs in healthy subjects? OBEs
have previously been induced using direct brain stimulation in neurological patients (Blanke et al.
2002; De Ridder et al. 2007; Penfield 1955), but these clinical examinations can only be carried
out in a highly selective patient population, whereas related techniques, such as transcranial magnetic
stimulation, do not induce similar effects (Blanke and Thut 2007). Blackmore (1982, 1984)
has listed a number of behavioral procedures that may induce OBEs, and it may be interesting for
future empirical research to employ some of these “induction” methods in a systematic manner in
combination with well-controlled scientific experimentation. It is important to note that OBEs were
not actually induced in the studies (Ehrsson 2007; Lenggenhager et al. 2007, 2009) that used video-
projection, but rather produced states that are more comparable to heautoscopy. Where will we
find techniques to create experimental setups able to induce something even closer to an OBE? We
believe that virtual reality technology, robotics, and methods from the field of vestibular physiology
may be promising avenues to explore.
REFERENCES
Altschuler, E., and V. Ramachandran. 2007. A simple method to stand outside oneself. Perception 36(4):
632–634.
Arzy, S., G. Thut, C. Mohr, C. M. Michel, and O. Blanke. 2006. Neural basis of embodiment: Distinct con-
tributions of temporoparietal junction and extrastriate body area. Journal of Neuroscience 26(31):
8074–8081.
Aspell, J. E., B. Lenggenhager, and O. Blanke. 2009. Keeping in touch with one’s self: Multisensory mecha-
nisms of self-consciousness. PLoS ONE 4(8): e6488.
Blackmore, S. 1982. Beyond the body: An investigation of out-of-body experiences. London: Heinemann.
Blackmore, S. 1984. A psychological theory of the out-of-body experience. Journal of Parapsychology 48:
201–218.
Blanke, O., T. Landis, L. Spinelli, and M. Seeck. 2004. Out-of-body experience and autoscopy of neurological
origin. Brain 127(2): 243–258.
Blanke, O., and T. Metzinger. 2009. Full-body illusions and minimal phenomenal selfhood. Trends in Cognitive
Sciences 13(1): 7–13.
Blanke, O., T. Metzinger, and B. Lenggenhager. 2008. Response to Kaspar Meyer’s E-letter. Science E-letter.
Blanke, O., and C. Mohr. 2005. Out-of-body experience, heautoscopy, and autoscopic hallucination of neuro-
logical origin: Implications for neurocognitive mechanisms of corporeal awareness and self-conscious-
ness. Brain Research Reviews 50(1): 184–199.
Blanke, O., S. Ortigue, T. Landis, and M. Seeck. 2002. Neuropsychology: Stimulating illusory own-body per-
ceptions. Nature 419(6904): 269–270.
Blanke, O., and V. Castillo. 2007. Clinical neuroimaging in epileptic patients with autoscopic hallucinations
and out-of-body experiences. Epileptologie 24: 90–95.
Blanke, O., and G. Thut. 2007. Inducing out of body experiences. In Tall Tales, ed. S. Della Sala. Oxford:
Oxford Univ. Press.
Botvinick, M., and J. Cohen. 1998. Rubber hands ‘feel’ touch that eyes see. Nature 391(6669): 756.
Brandt, T., and M. Dieterich. 1999. The vestibular cortex: Its locations, functions, and disorders. Annals of the
New York Academy of Science 871(1): 293–312.
Bremmer, F., A. Schlack, J.-R. Duhamel, W. Graf, and G. R. Fink. 2001. Space coding in primate posterior
parietal cortex. NeuroImage 14(1): S46–S51.
Section VI
Attention and Spatial Representations
25 Spatial Constraints in Multisensory Attention
Emiliano Macaluso
CONTENTS
25.1 Introduction........................................................................................................................... 485
25.2 Unisensory and Multisensory Areas in Human Brain.......................................................... 487
25.3 Multisensory Endogenous Spatial Attention......................................................................... 490
25.4 Stimulus-Driven Spatial Attention........................................................................................ 492
25.5 Possible Relationship between Spatial Attention and Multisensory Integration................... 497
25.6 Conclusions............................................................................................................................500
References....................................................................................................................................... 501
25.1 INTRODUCTION
Our sensory organs continuously receive a vast number of inputs from the external world; some
of these are important for successful interaction with the environment, whereas others can be
ignored. The operation of selecting relevant signals and filtering out irrelevant information is a
key task of the attentional system (Desimone and Duncan 1995; Kastner and Ungerleider 2001).
Attentional selection can occur on the basis of many different criteria, with a main distinction
between endogenous control (i.e., selection based on voluntary attention, current aims, and knowl-
edge) and stimulus-driven control (i.e., selection based on the intrinsic features of the sensory input).
Accordingly, we can decide to pay attention to the face of one person in a crowded room (i.e., attend-
ing to subtle details in a rich and complex environment), or attention can be captured by a loud
sound in a quiet room (i.e., attention captured by a salient stimulus).
Many different constraints can guide endogenous and stimulus-driven attention. We can volun-
tarily decide to attend to a specific visual feature, such as color or motion, but the very same features
can guide stimulus-driven attention if they stand out from the surrounding environment (“pop-out”
item, e.g., a single red stimulus presented among many green stimuli). Here, I will focus on pro-
cesses related to attentional selection based on spatial location. The investigation of mechanisms of
spatial attention control is appealing for many reasons. Spatial selectivity is one of the most impor-
tant characteristics of single neurons (i.e., the neuron’s receptive field) and well-organized maps of
space can be found throughout the brain (Gross and Graziano 1995). These include sensory areas
(e.g., striate and extrastriate occipital regions, for retinotopic representations of the visual world;
Tootell et al. 1982), subcortical regions [e.g., the superior colliculus (SC); Wallace et al. 1997], and
higher-level associative areas in frontal and parietal cortex (e.g., Ben Hamed et al. 2001; Sommer
and Wurtz 2000). This widespread selectivity for spatial locations raises the question of how, and
whether, these anatomically segregated representations contribute to the formation of an integrated
representation of external space. Indeed, from a subjective point of view, signals about different
visual features (e.g., shape/color) as well as motor commands seem to all merge effortlessly, giving
rise to a coherent and unified perception–action system that allows us to interact spatially with the
external environment.
FIGURE 25.1 Schematic models of spatial attention control. (a) “Site-source model” of visuospatial con-
trol. This distinguishes areas that generate spatial biases [“sources,” in dorsal fronto-parietal (dFP) cortex]
and areas that receive these modulatory signals (“sites,” occipital visual cortex). The model also includes a
distinction between endogenous control (dark gray) and stimulus-driven control (light gray; see Corbetta et
al. 2002). The two control systems operate together and interaction between them has been proposed to affect
functional coupling between visual cortex and ventral attention control network (vFP; see Corbetta et al.
2008). IPS, intraparietal sulcus; PPC, posterior parietal cortex; TPJ, temporo-parietal cortex; IFG, inferior
frontal gyrus; SC, superior colliculus; Som/Aud, somatosensory/auditory. (b) An extension of “site-source”
model, with feedforward connectivity and backprojections that allow transferring spatial information between
sensory-specific (e.g., visual, auditory, and somatosensory areas) and multisensory regions (dFP and vFP).
These multiple pathways may mediate spatial constraints in multisensory attention. Possible routes include:
(1) feedforward multisensory input converging into vFP, for stimulus-driven control; (2) multisensory interac-
tions in dFP, which in turn may affect interplay between dorsal (endogenous) and ventral (stimulus-driven)
attention control systems; (3) direct projections between sensory-specific areas that may mediate cross-modal
effects in sensory-specific areas; (4) multisensory interaction via subcortical structures that send and receive
projections to/from sensory-specific and multisensory cortical areas.
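The four numbered routes in the caption can be summarized as a small directed graph; the node names and edge tags below simply restate the figure's labels (a mnemonic sketch, not an anatomical model):

```python
# Toy directed-graph summary of the connectivity routes in Figure 25.1b.
# Node and edge labels follow the figure caption; purely illustrative.
edges = {
    ("som_aud_cortex", "vFP"): "route 1: feedforward multisensory input (stimulus-driven control)",
    ("visual_cortex", "vFP"): "route 1: feedforward multisensory input (stimulus-driven control)",
    ("som_aud_cortex", "dFP"): "route 2: multisensory interactions in dFP",
    ("visual_cortex", "dFP"): "route 2: multisensory interactions in dFP",
    ("som_aud_cortex", "visual_cortex"): "route 3: direct cortico-cortical projections",
    ("visual_cortex", "som_aud_cortex"): "route 3: direct cortico-cortical projections",
    ("SC_thalamus", "visual_cortex"): "route 4: subcortical relay",
    ("SC_thalamus", "som_aud_cortex"): "route 4: subcortical relay",
    ("dFP", "visual_cortex"): "backprojection: endogenous spatial bias",
    ("dFP", "som_aud_cortex"): "backprojection: endogenous spatial bias",
}

def routes_into(region):
    """List the pathways feeding the given region in this toy graph."""
    return sorted(tag for (src, dst), tag in edges.items() if dst == region)

print(routes_into("visual_cortex"))
```

Listing the pathways into a region makes explicit, for instance, that in this scheme vFP receives only feedforward multisensory input, whereas visual cortex can be reached by direct cortico-cortical projections, subcortical relays, and dFP backprojections.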
During the cue-to-target interval, activation occurred first in the parietal cortex (approximately
at 200 ms post-cue onset) and then in the frontal regions (at 400 ms), followed by reactivation
of parietal regions (600 ms), and lastly, attentional modulation was found in the occipital cortex.
Moreover, this study also showed that these preparatory effects are predictive of subsequent per-
ceptual performance upon presentation of the target, confirming the relationship between activation
of fronto-parietal control regions and attentional benefits for targets presented at the cued location.
The vFP provides an additional control system that can flexibly interrupt endogenous control when
unexpected/salient events require reorienting of attention toward a new location (stimulus-driven
control; Corbetta and Shulman 2002; Corbetta et al. 2008).
It should be stressed that this is a simplified model of visuospatial attention control, as there
are many other processes (e.g., feature conjunction, sensory–motor transformations, etc.) and brain
regions (e.g., the SC and the pulvinar) that also contribute to covert spatial orienting. However, this
simple model embodies a few key concepts concerning (1) attention control vs. modulation, (2) feed-
forward vs. feedback connectivity, and (3) endogenous vs. stimulus-driven control, which can help
in the interpretation of many findings in studies of multisensory spatial attention.
FIGURE 25.2 (a) Mapping of multisensory space. Top panel shows a schematic illustration of an fMRI
experiment to map visual and tactile side-specific activation. In different blocks/conditions, subjects were
presented with stimuli in one modality and one side only (right touch in example). A region in aIPS showed
greater responses for contralateral than ipsilateral stimuli, irrespective of stimulus modality. Middle panel
shows multisensory activation of left aIPS for visual and tactile stimuli on right side. By contrast, sensory-
specific areas showed an effect of contralateral versus ipsilateral stimuli only for corresponding modality. For
example, left occipital visual cortex activated significantly more for right than left stimulation, but only for
visual stimuli (see bottom panel). (b) Multisensory endogenous spatial attention. Top panel shows a schematic
illustration of one of the setups used to study visuo-tactile cross-modal links in endogenous spatial
attention. (Reproduced from Macaluso, E. et al., Cereb. Cortex, 12, 357–368, 2002b.) The stimulation was always
bimodal and bilateral, but in different conditions subjects were asked to attend to only one side and one modality
(attend right touch, in this example). Direct comparison of conditions of attention to one versus the other side
(attend right vs. attend left, in the figure) reveals modality-independent attentional modulation in contralateral
multisensory regions (e.g., left aIPS for attention to right hemifield; see middle panel) but also cross-modal
influences in sensory-specific areas. For example, bottom panel shows cross-modal spatial attentional effects
in left occipital visual cortex, with increased activation when subjects attended right vision (bar 2 minus 1, in
signal plot) but also when they attended right touch (bar 4 minus 3, in plot). V/T, visual/tactile; L/R, left/right;
aL/aR, attend left/right; Bs, baseline condition (central detection); *p < .05.
observed even when visual distracters at the attended side conveyed misleading information (e.g.,
a single flash of light, while subjects attempted to detect double pulses of vibrations; Macaluso et
al. 2000a, 2002b, 2003b; see also Ciaramitaro et al. 2007). Accordingly, it is unlikely that subjects
strategically decided to shift both tactile and visual attention toward one side; rather, cross-modal
spatial influences in visual cortex appear to be obligatory (see also Eimer 1999). It should be noted
that modulatory effects of one modality on areas dedicated to a different modality are not confined
to tactile attention affecting the visual cortex (for review, see Eimer and Driver 2001). For example,
Eimer and Van Velzen (2002) showed modulation of early somatosensory event-related potentials
(ERPs) depending on the direction of visual attention (see also Kida et al. 2007; for a recent mag-
netoencephalography study localizing related cross-modal influences in secondary somatosensory
cortex); Teder-Salejarvi et al. (1999) found that endogenous visuospatial attention can modulate
early auditory ERPs; and Hotting et al. (2003) reported reciprocal cross-modal influences of audi-
tory and tactile spatial attention on tactile and auditory ERPs, respectively.
Our visuo-tactile fMRI study that isolated cue-related, preparatory processes (Macaluso et al.
2003b) provided additional hints about the nature of spatially specific cross-modal influences in the
occipital cortex. The comparison of leftward versus rightward attention-directing cues, and vice
versa, demonstrated that activity in contralateral occipital cortex increases before the presentation
of the target stimuli, that is, when subjects prepared for the upcoming tactile judgment. For exam-
ple, when the auditory cue instructed the subject to shift tactile attention to the right hemifield, brain
activity increased not only in left post-central somatosensory areas and in left multimodal intrapa-
rietal cortex, but also in the left extrastriate visual cortex (for preparatory cross-modal influences
between other modalities, see also Trenner et al. 2008; Eimer et al. 2002; Green et al. 2005). This
supports the hypothesis that endogenous attention generates “multisensory spatial biases,” and that
these can influence multiple levels of processing, including activity in multisensory regions (aIPS)
as well as in sensory-specific areas (somatosensory and visual cortex, for tactile spatial attention).
To summarize, studies on multisensory endogenous spatial attention have shown that: (1) con-
trol regions in dFP activate irrespective of modality; (2) selective attention to one hemifield boosts
activity in areas that represent the contralateral hemifield, including also sensory-specific areas
concerned with a different modality (e.g., cross-modal modulation of occipital cortex during tactile
attention); (3) both multisensory regions in dFP (plus spatially specific aIPS) and unisensory areas
show attentional modulation before the presentation of the target stimuli (cue-related effects), con-
sistent with the endogenous, internally generated nature of these attentional signals.
These findings can be interpreted in the context of the “site-source” model of attention control.
Accordingly, feedforward sensory convergence would make multisensory information available
to the dFP attentional network that can therefore operate as a supramodal control system.
Backprojections from the control system (“sources”) to sensory-specific areas (“sites”) enable con-
veying modulatory signals about the currently relevant location. Critically, because the control sys-
tem operates supramodally and is connected with several modalities, these signals will spread over
multiple “site” regions affecting activity in a distributed network of multimodal and unimodal brain
regions, all representing the attended location. The net result of this is that endogenous attention
selects the attended location irrespective of modality, with all stimuli presented at the attended loca-
tion receiving enhanced processing (e.g., Eimer and Driver 2000).
This proposal entails that feedforward and feedback connections between sensory areas
and associative regions in dFP mediate a transfer of spatial information across modalities. This
effectively means that endogenous attention “broadcasts” information about the currently attended
location between anatomically distant brain areas, thus mediating multisensory integration of space.
Drawing a loose analogy with the feature integration theory in the visual modality (Treisman and
Gelade 1980), we can think that space is like an “object” composed of multiple “features” (visual
location, auditory location, saccadic target location, etc). Each “feature” is represented in a specific
region of the brain, including many sensory-specific, multisensory, and motor representations local-
ized in separate brain regions. Attention coordinates and binds together these representations via
modulatory influences, thus generating a coherent representation of the whole “object,” that is, an
integrated representation of space.
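A deliberately simplistic way to phrase this binding analogy in code: attention "broadcasts" a single attended location to every modality-specific spatial map, so that otherwise independent representations come to agree. All names and values here are illustrative:

```python
# Toy illustration of the "space as an object of features" analogy: endogenous
# attention broadcasts one attended location to each modality-specific map,
# binding otherwise independent spatial representations. Purely illustrative.
maps = {"visual": None, "auditory": None, "somatosensory": None, "saccade_target": None}

def broadcast_attended_location(maps, location):
    """Return a copy of the maps with every representation biased toward `location`."""
    return {name: location for name in maps}

bound = broadcast_attended_location(maps, "right hemifield")
assert len(set(bound.values())) == 1  # all maps now agree on the attended location
print(bound)
```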
However, traditional views of multisensory integration posit that signals in different modalities
interact in an automatic manner, suggesting “preattentive” mechanisms of multisensory integration.
The next two sections will address this issue in more detail, first looking for multisensory effects
in paradigms involving stimulus-driven rather than voluntary attention, and then discussing a set
of studies that directly tested for the interplay between endogenous and stimulus-driven factors in
multisensory spatial attention. In the last section, I will further specify the possible relationship
between attention control and multisensory integration.
FIGURE 25.3 Stimulus-driven cross-modal spatial attention and interactions with endogenous control.
(a) Stimulus-driven cross-modal influences in visual cortex. In this event-related fMRI study (unpublished
data), subjects performed a visual discrimination task (“up/down” judgment) with visual stimuli presented
in left or right hemifield near the forehead. Task-irrelevant touch was presented equiprobably on left or right
side of the forehead, yielding spatially congruent trials (vision and touch on same side; e.g., both stimuli on
right side, cf. top-central panel) and incongruent trials (vision and touch on opposite sides; e.g., vision on the
right and touch on the left). Imaging data tested for interaction between position of visual target (left/right) and
spatial congruence of bimodal stimulation (congruent/incongruent: e.g., testing for greater activation for right
than left visual targets, in spatially congruent vs. incongruent trials). This revealed activity enhancement in
occipital visual areas when a contralateral visual target was coupled with a spatially congruent task-irrelevant
touch. For example, left occipital cortex showed greater activation comparing “right minus left visual targets,”
when touch was congruent vs. incongruent (see signal plot on left side: compare “bar 2 minus 1” vs. “bar 4
minus 3”); effectively yielding maximal activation of the left occipital cortex when a right visual target
was combined with right touch on same side (see bar 2, in same plot). (b) Stimulus-driven cross-modal influ-
ences and endogenous visuospatial attention. (From Zimmer, U. and Macaluso, E., Eur. J. Neurosci., 26,
1681–1691, 2007.) Also in this study, we indexed side-specific cross-modal influences by testing for the interaction
between position of visual stimuli and spatial congruence of visuo-tactile input (see also Figure 25.3a; note
that, for simplicity, panel b shows only “right-congruent” condition), but now with both vision and touch fully
task-irrelevant. We assessed these cross-modal spatial effects under two conditions of endogenous visuospa-
tial attentional load. In “High load” condition, subjects were asked to detect subtle changes of orientation of
a grating patch presented above fixation. In “Low load” condition, they detected changes of luminance at
fixation. fMRI results showed that activity in occipital cortex increased for spatially congruent visuo-tactile
stimuli in contralateral hemifield, and that—critically—this occurred irrespective of the load of the
endogenous visuospatial task. Accordingly, analogous effects of spatial congruence were found in “Low load” condition
(bar 1 minus 2) and in “High load” condition (bar 3 minus 4, in each signal plot). V/T, vision/touch; L/R, left/
right; Cong/Incong, congruent (VT on the same side)/incongruent (VT on opposite sides).
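The side-by-congruence interaction test described in the caption reduces to a difference of differences on condition means. A minimal sketch, with made-up effect sizes rather than the study's data:

```python
# Difference-of-differences computation behind the side x congruence interaction
# test: (right - left | congruent) minus (right - left | incongruent).
# The condition means below are illustrative effect sizes (a.u.), not data
# from the study.
means = {
    ("left_target", "congruent"): -0.5,
    ("right_target", "congruent"): 2.0,
    ("left_target", "incongruent"): -0.25,
    ("right_target", "incongruent"): 0.75,
}

def interaction(m):
    congruent_diff = m[("right_target", "congruent")] - m[("left_target", "congruent")]
    incongruent_diff = m[("right_target", "incongruent")] - m[("left_target", "incongruent")]
    return congruent_diff - incongruent_diff

# A positive interaction mimics the reported pattern in left occipital cortex:
# the contralateral-target advantage is larger when touch is spatially congruent.
print(interaction(means))
```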
The finding that cross-modal influences in sensory-specific occipital cortex can take posture
into account suggests that intermediate brain structures representing the current posture are also
involved. Postural signals have been found to affect activity in many different regions of the brain,
including fronto-parietal areas that also participate in attention control and multisensory processing
(Andersen et al. 1997; Ben Hamed and Duhamel 2002; Boussaoud et al. 1998; Bremmer et al. 1999;
Kalaska et al. 1997; Fasold et al. 2008). Hence, we can hypothesize that the fronto-parietal cortex
may also take part in stimulus-driven multisensory attention control.
In the visual modality, stimulus-driven control has been associated primarily with activation
of a vFP, including the TPJ and the IFG. These areas activate when subjects are cued to attend
to one hemifield but the visual target appears on the opposite side (invalid trials), thus triggering
a stimulus/target-driven shift of visuospatial attention (plus other task-related resetting processes;
see below). We employed a variation of this paradigm to study stimulus-driven shifts of attention
in vision and in touch (Macaluso et al. 2002c). A central informative cue instructed the subject
to attend to one side. On 80% of the trials the target appeared on the attended side (valid trials),
whereas in the remaining 20% of the trials the target appeared on the opposite side (invalid trials).
Critically, the target could be either visual (an LED near the left/right hand, on each side) or tactile
(air puff on the left/right hands). The modality of the target stimulus was randomized and unpre-
dictable, thus subjects could not strategically prepare to perform target discrimination in one or the
other modality. The dorsal FP network activated irrespective of cue validity, consistent with the role
of this network in voluntary shifts of attention irrespective of modality (see also Wu et al. 2007).
The direct comparison of invalid versus valid trials revealed activation of the vFP (TPJ and IFG),
both for invalid visual targets and for invalid tactile targets. This demonstrates that both visual and
tactile target stimuli at the unattended location can trigger stimulus-driven reorienting of spatial
attention and activation of the vFP network (see also Mayer et al. 2006; Downar et al. 2000).
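The cueing paradigm just described (80% valid trials, target modality randomized and unpredictable) can be sketched as a trial-list generator. The validity proportion and modality labels follow the text; the trial count, seed, and field names are illustrative choices:

```python
import random

# Sketch of a trial list for the multimodal spatial-cueing paradigm: a central
# informative cue, 80% valid / 20% invalid trials, and a target whose modality
# (visual LED vs. tactile air puff) is randomized and unpredictable.
def make_trials(n_trials=100, p_valid=0.8, seed=0):
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        cued_side = rng.choice(["left", "right"])
        valid = rng.random() < p_valid
        target_side = cued_side if valid else ("right" if cued_side == "left" else "left")
        trials.append({
            "cued_side": cued_side,
            "target_side": target_side,
            "validity": "valid" if valid else "invalid",
            "target_modality": rng.choice(["visual", "tactile"]),  # unpredictable
        })
    return trials

trials = make_trials()
print(sum(t["validity"] == "valid" for t in trials), "valid trials out of", len(trials))
```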
Nonetheless, extensive investigation of spatial cueing paradigms in the visual modality indicates
that the activation of the vFP network does not reflect pure stimulus-driven control. As a matter of
fact, invalid trials involve not only stimulus-driven shifts of attention from the cued location to the
new target location, but they also entail breaches of expectation (Nobre et al. 1999), updating task-
related settings (Corbetta and Shulman 2002) and processing of low frequency stimuli (Vossel et
al. 2006). Several different strategies have been undertaken to tease apart the contribution of these
factors (e.g., Kincade et al. 2005; Indovina and Macaluso 2007). Overall, the results of these studies
lead to the current view that task-related (e.g., the task-relevance of the reorienting stimulus, i.e.,
the target that requires judgment and response) and stimulus-driven factors jointly contribute to the
activation of the vFP system (see Corbetta et al. 2008 for review).
Additional evidence for the role of task relevance for the activation of vFP in the visual modality
comes from a recent fMRI study, where we combined endogenous predictive cues and exogenous
nonpredictive visual cues on the same trial (Natale et al. 2009). Each trial began with a central,
predictive endogenous cue indicating the most likely (left/right) location of the upcoming target.
The endogenous cue was followed by a task-irrelevant, nonpredictive exogenous cue (brighten-
ing and thickening of a box in the left or right hemifield) that was quickly followed by the (left or
right) visual target. This allowed us to cross factorially the validity of endogenous and exogenous
cues within the same trial. We reasoned that if pure stimulus-driven attentional control can influ-
ence activity in vFP, exogenous cues that anticipate the position of an “endogenous-invalid” task-
relevant target (e.g., endogenous cue left, exogenous cue right, target right) should affect
reorienting-related activation of vFP. Behaviorally, we found that both endogenous and exogenous cues affected
response times. Subjects were faster to discriminate “endogenous-invalid” targets when the exog-
enous cue anticipated the position of the target (exogenous valid trials, as in the stimulus sequence
above). However, the fMRI data did not reveal any significant effect of the exogenous cues in the
vFP, which activated equivalently in all conditions containing task-relevant targets on the opposite
side of the endogenously cued hemifield (i.e., all endogenous-invalid trials). These findings are in
agreement with the hypothesis that fully task-irrelevant visual stimuli do not affect activity in vFP
496 The Neural Bases of Multisensory Processes
(even when the behavioral data demonstrate an influence of these task-irrelevant cues on target
discrimination; see also Kincade et al. 2005).
However, a different picture emerged when we used task-irrelevant auditory rather than visual
cues (Santangelo et al. 2009). The experimental paradigm was analogous to the pure visual study
described above, with a predictive endogenous cue followed by a nonpredictive exogenous cue (now
auditory) and by the visual target, within each trial. The visual targets were presented in the left/right
hemifields near the subject’s face, and the task-irrelevant auditory stimuli were delivered at corresponding external locations. The overall pattern of reaction times was similar to the visual study:
both valid endogenous and valid exogenous cues speeded up responses, confirming cross-modal
influences of the task-irrelevant auditory cues on the processing of the visual targets (McDonald et
al. 2000). The fMRI data revealed the expected activation of vFP for “endogenously invalid” visual
targets, demonstrating once again the role of these regions during reorienting toward task-relevant
targets (e.g., Corbetta et al. 2000). But critically, now the side of the task-irrelevant auditory stim-
uli was found to modulate activity in the vFP. Activation of the right TPJ for endogenous-invalid
trials diminished when the auditory cue was on the same side as the upcoming invalid target (e.g.,
endogenous cue left, exogenous auditory cue right, visual target right). Accordingly, task-irrelevant
sounds that anticipate the position of the invalid visual target reduce reorienting-related activation
in TPJ, demonstrating a “pure” stimulus-driven cross-modal spatial effect in the ventral attention
control system (but see also Downar et al. 2001; Mayer et al. 2009).
To summarize, multisensory studies of stimulus-driven attention showed that: (1) task-irrelevant
stimuli in one modality modulate activity in sensory-specific areas concerned with a different
modality, and they can do so in a spatially specific manner (e.g., boosting of activity in contralateral
occipital cortex for touch and vision on the same side); (2) spatially specific cross-modal influences
in sensory-specific areas take posture into account, suggesting indirect influences via higher-order
areas; (3) control regions in vFP operate supramodally, activating during stimulus-driven spatial
reorienting toward visual or tactile targets; (4) task-irrelevant auditory stimuli can modulate activity
in vFP, revealing a “special status” of multisensory stimulus-driven control compared with unisen-
sory visuospatial attention (cf. Natale et al. 2009). These findings call for an extension of site-source
models of attention control, which should take into account the “special status” of multisensory
stimuli. In particular, models of multisensory attention control should include pathways allowing
nonvisual stimuli to reach the visual cortex and to influence activity in the ventral attention network
irrespective of task-relevance.
Figure 25.1b shows some of the hypothetical pathways that may mediate these effects. “Pathway 1”
entails direct feedforward influences from auditory/somatosensory cortex into the vFP attention
system. The presence of multisensory neurons in the temporo-parietal cortex and inferior premotor cortex (Bruce et al. 1981; Barraclough et al. 2005; Hyvarinen 1981; Dong et al. 1994; Graziano
et al. 1997), plus activation of these regions for vision, audition, and touch in humans (Macaluso
and Driver 2001; Bremmer et al. 2001; Beauchamp et al. 2004; Downar et al. 2000) is consistent
with convergent multisensory projections into the vFP. A possible explanation for the effect of task-
irrelevant auditory cues in TPJ (see Santangelo et al. 2009) is that feedforward pathways from the
auditory cortex, unlike the pathway from occipital cortex, might not be under “task-related inhibi-
tory influences” (see Figure 25.1a). The hypothesis of inhibitory influences on the visual, occipital-
to-TPJ pathway was initially put forward by Corbetta and Shulman (2002) as a possible explanation
for why task-irrelevant visual stimuli do not activate TPJ (see also Natale et al. 2009). More recently
the same authors suggested that these inhibitory effects may arise from the middle frontal gyrus
and/or via subcortical structures (locus coeruleus; for details on this topic, see Corbetta et al. 2008).
Our finding of a modulatory effect by task-irrelevant audition in TPJ (Santangelo et al. 2009) sug-
gests that these inhibitory effects may not apply in situations involving task-irrelevant stimuli in a
modality other than vision.
“Pathway 2” involves indirect influences of multisensory signals in the ventral FP network, via
dorsal FP regions. Task-related modulations of the pathway between occipital cortex and TPJ are
Spatial Constraints in Multisensory Attention 497
thought to implicate the dFP network (Corbetta et al. 2008; see also the previous paragraph). Because
multisensory stimuli can affect processing in the dorsal FP network (via feedforward convergence),
these may in turn modify any influence that the dorsal network exerts on the ventral network (see
also He et al. 2007, for an example of how changes/lesions of one attention network can affect func-
tioning of the other network). This could comprise the abolishment of any inhibitory influence on
(auditory) task-irrelevant stimuli. The involvement of dorsal FP areas may also be consistent with
the finding that cross-modal effects in unisensory areas take posture into account. Postural signals
modulate activity of neurons in many dFP regions (e.g., Andersen et al. 1997; Ben Hamed et al.
2002; Boussaoud et al. 1998; Bremmer et al. 1999; Kalaska et al. 1997). An indirect route via dFP
could therefore combine sensory signals and postural information about eyes/head/body, yielding cross-modal influences according to position in external space (cf. Stein and Stanford 2008; but note that postural signals are also available in multisensory regions of the vFP network, Graziano et al. 1997, and in the SC, Grossberg et al. 1997; see Pouget et al. 2002 and Deneve and Pouget 2004 for computational models on this issue).
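The computational models cited above (Pouget et al. 2002; Deneve and Pouget 2004) formalize such cross-modal spatial links as reliability-weighted combination of position estimates. A minimal sketch of the underlying Gaussian cue-fusion rule is given below; this illustrates the principle under simplifying assumptions (two independent Gaussian cues already remapped into a common, posture-corrected frame), not the authors' basis-function network implementation, and the numbers are arbitrary.

```python
def fuse_gaussian(mu1, var1, mu2, var2):
    """Maximum-likelihood fusion of two independent Gaussian position
    estimates: each cue is weighted by its inverse variance, so the more
    reliable cue dominates, and the fused variance is always smaller
    than either input variance."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    mu = (w1 * mu1 + w2 * mu2) / (w1 + w2)
    var = 1.0 / (w1 + w2)
    return mu, var

# Hypothetical visual (precise) and auditory (noisy) estimates of the same
# external location, expressed in eye-centered degrees after remapping.
mu, var = fuse_gaussian(10.0, 1.0, 14.0, 4.0)
print(round(mu, 2), round(var, 2))  # 10.8 0.8
```

The fused estimate lies closer to the more reliable (visual) cue, which is the signature behavior such models use to account for phenomena like ventriloquism.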
“Pathway 3” involves direct anatomical projections between sensory-specific areas that process
stimuli in different modalities. These have been now reported in many animal studies (e.g., Falchier
et al. 2002; Rockland and Ojima 2003; Cappe and Barone 2005) and could mediate automatic
influences of one modality (e.g., touch) on activity in sensory-specific areas of a different modality
(e.g., occipital visual cortex; see also Giard and Peronnet 1999; Kayser et al. 2005; Eckert et al.
2008). These connections between sensory-specific areas may provide fast, albeit spatially coarse,
indications about the presence of a multisensory object or event in the external environment. In
addition, a direct effect of audition or touch in occipital cortex could change the functional con-
nectivity between occipital cortex and TPJ (see Indovina and Macaluso 2004), also determining
stimulus-driven cross-modal influences in vFP.
Finally, additional pathways are likely to involve subcortical structures (“Pathway 4” in Figure
25.1b). Many different subcortical regions contain multisensory neurons and can influence cortical
processing (e.g., superior colliculus, Meredith and Stein 1983; thalamus, Cappe et al. 2009; basal
ganglia, Nagy et al. 2006). In addition, subcortical structures are important for spatial orienting
(e.g., intermediate and deep layers SC are involved in the generation of overt saccadic responses; see
also Frens and Van Opstal 1998, for a study on overt orienting to bimodal stimuli) and have been
linked to selection processes in spatial attention (Shipp 2004). The critical role of the SC in combining spatial information across sensory modalities has also been demonstrated in two recent behavioral
studies (Maravita et al. 2008; Leo et al. 2008). These showed that superior behavioral performance
for spatially aligned, same-side versus opposite-side audiovisual trials disappears when the visual
stimuli are invisible to the SC (purple/blue stimuli).
Various measures have been proposed to highlight interactions between stimuli in different senses. These include phenomenological mea-
sures such as the perception of multisensory illusions (e.g., as in the “McGurk” illusion, McGurk
and MacDonald 1976; see also Soto-Faraco and Alsius 2009; or the “sound-bounce” illusion,
Bushara et al. 2003), behavioral criteria based on violations of the Miller inequality (Miller 1982;
see Tajadura-Jiménez et al. 2009, for an example), or physiological measures related to nonlinear
effects in single-cell spiking activity (Meredith and Stein 1986b), EEG (Giard and Peronnet 1999),
or fMRI (Calvert et al. 2001) signals. At present, there is still no consensus as most of these mea-
sures have drawbacks and no single index appears suitable for all possible experimental situations
(for an extensive treatment, see Beauchamp 2005; Laurienti et al. 2005; Holmes 2009).
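To make the behavioral criterion concrete: Miller’s (1982) race-model inequality states that if the two modalities merely race (no integration), the bimodal RT distribution can never exceed the sum of the two unimodal distributions at any time point, i.e., P(RT ≤ t | AV) ≤ P(RT ≤ t | A) + P(RT ≤ t | V). The following is a minimal sketch of such a test with fabricated reaction times; the function and data are illustrative and not taken from any of the studies cited.

```python
import numpy as np

def race_model_violation(rt_av, rt_a, rt_v, n_points=200):
    """Maximum violation of Miller's race-model inequality.

    Compares the empirical CDF of bimodal RTs against the (capped) sum of
    the two unimodal CDFs on a common time grid; a positive return value
    means the inequality is violated somewhere, i.e., bimodal responses
    are faster than any race model allows.
    """
    rt_av, rt_a, rt_v = (np.asarray(x, dtype=float) for x in (rt_av, rt_a, rt_v))
    pooled = np.concatenate([rt_av, rt_a, rt_v])
    t = np.linspace(pooled.min(), pooled.max(), n_points)

    def ecdf(rts):
        return np.mean(rts[:, None] <= t[None, :], axis=0)

    bound = np.minimum(ecdf(rt_a) + ecdf(rt_v), 1.0)
    return float(np.max(ecdf(rt_av) - bound))

# Fabricated RTs (ms): bimodal responses faster than either unimodal condition.
rng = np.random.default_rng(0)
rt_a = rng.normal(320, 40, 500)
rt_v = rng.normal(340, 40, 500)
rt_av = rng.normal(250, 30, 500)
print(race_model_violation(rt_av, rt_a, rt_v) > 0)  # True: race model violated
```

In published applications the inequality is usually evaluated per RT quantile and tested across subjects; this sketch only returns the maximal pointwise violation.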
In the case of cross-modal spatial cueing effects in stimulus-driven attention, the issue is further
complicated by the fact that stimulus-driven effects are driven by changes in stimulus configuration
(same vs. different position), which is also considered a critical determinant for multisensory inte-
gration (Meredith and Stein 1986b). Therefore, it is difficult to experimentally tease apart these two
processes. In our initial study (Macaluso et al. 2000b), we showed boosting of activity in occipital
cortex contralateral to the position of spatially congruent bimodal visuo-tactile stimuli that were
presented simultaneously and for a relatively long duration (300 ms). McDonald et al. (2001) argued
that these cross-modal influences may relate to multisensory interactions rather than spatial atten-
tion, as there was no evidence that task-irrelevant touch captured attention on the side of the visual
target. However, this point is difficult to address because it is impossible to obtain behavioral evi-
dence that exogenous cues—which by definition do not require any response—trigger shifts of
spatial attention. A related argument was put forward suggesting that a minimal condition for disentangling attention from integration is to introduce a gap between the offset of the cue and the onset
of the target (McDonald et al. 2001). This should eliminate multisensory integration (the trial would
never include simultaneous bimodal stimulation), while leaving spatial attentional effects intact
(i.e., faster and more accurate behavioral responses for same-side vs. opposite-side trials). However,
we have previously argued that criteria based on stimulus timing may be misleading because of
differential response latencies and discharge properties of neurons in different regions of the brain
(Macaluso et al. 2001). Thus, physically nonoverlapping stimuli (e.g., an auditory cue that precedes
a visual target) may produce coactivation of a bimodal neuron that has shorter response latency for
audition than for vision (e.g., see Meredith et al. 1987; for related findings using ERPs in humans, see Meylan and Murray 2007).
As an extension of the idea that the temporal sequence of events may be used to disentangle the
role of attention and multisensory integration in stimulus-driven cross-modal cueing paradigms
(McDonald et al. 2001), one may consider the timing of neuronal activation rather than the tim-
ing of the external stimuli. This can be addressed in the context of site-source models of attention
(cf. Figure 25.1). Along these lines, Spence et al. (2004) suggested that if control regions activate
before any modulation in sensory areas, this would speak for a key role of attention in cross-modal
integration, whereas if attentional control engages only after cross-modal effects in sensory-specific
areas, this would favor the view that multisensory integration takes place irrespective of attention.
In the latter case, cross-modal cueing effects could be regarded as arising as a “consequence” of
the integration process (see also Busse et al. 2005). Using ERP and dipole source localization in a
stimulus-driven audiovisual cueing paradigm, McDonald and colleagues (2003) found that associa-
tive regions in the posterior temporal cortex activate before any cross-modal spatial effect in the
visual cortex. In this study, there was a 17- to 217-ms gap between cue offset and target onset, and
the analysis of the behavioral data showed increased perceptual sensitivity (d′) for valid compared to
invalid trials. Accordingly, the authors suggested that the observed sequence of activation (including
cross-modal influences of audition on visual ERPs) could be related to involuntary shifts of spatial
attention. However, this study did not assess brain activity associated specifically with the exog-
enous cues, thus again not providing any direct evidence for cue-related shifts of attention. Using a
different approach to investigate the dynamics of cross-modal influences in sensory areas, a recent
fMRI study of functional connectivity showed that during processing of simultaneous audiovisual
streams, temporal areas causally influence activity in visual and auditory cortices, rather than the
other way round (Noesselt et al. 2007). Thus, cross-modal boosting of activity in sensory-specific
areas seems to arise because of backprojections from multisensory regions, emphasizing the causal
role of high-order associative areas and consistent with some coupling between attention control
and the sharing of spatial information across sensory modalities (which, depending on the defini-
tion, can be viewed as an index of multisensory integration).
More straightforward approaches can be undertaken to investigate the relationship between
endogenous attention and multisensory integration. Depending on the specific definition of multisensory integration (see above), one may ask whether endogenous attention affects the way signals
in different modalities interact with each other. For example, Talsma and Woldorff (2005) indexed
multisensory integration using a supra-additive criterion on ERP amplitudes (AV > A + V), and
tested whether this was different for stimuli at the endogenously attended versus unattended side
(note that both vision and audition were task-relevant/attended in this experiment). Supra-additive
responses for AV stimuli were found in frontal and centro-medial scalp sites. Critically, this effect
was larger for stimuli at the attended than the unattended side, demonstrating some interplay
between spatial endogenous attention and multisensory integration (see also the study of Talsma et
al. 2007, who manipulated relevant-modality rather than relevant-location).
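The supra-additive criterion used in that study compares the multisensory response with the sum of the unisensory responses (AV > A + V). Below is a minimal sketch of how such an index might be computed from trial amplitudes; the function name and the values are purely illustrative, not data from Talsma and Woldorff (2005).

```python
import numpy as np

def superadditivity(resp_av, resp_a, resp_v):
    """Supra-additivity index: mean multisensory response minus the summed
    mean unisensory responses, i.e., AV - (A + V); positive values satisfy
    the supra-additive criterion."""
    return float(np.mean(resp_av) - (np.mean(resp_a) + np.mean(resp_v)))

# Fabricated trial amplitudes (arbitrary units) for attended vs. unattended side.
attended = superadditivity([9.5, 10.2, 10.1], [3.0, 3.2, 2.9], [4.1, 3.9, 4.0])
unattended = superadditivity([7.2, 7.0, 7.1], [3.0, 3.1, 2.9], [4.0, 4.1, 3.9])
print(attended > unattended > 0)  # True: larger supra-additive effect when attended
```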
In a similar vein, we have recently investigated the effect of selective visuospatial endogenous
attention on the processing of audiovisual speech stimuli (Fairhall and Macaluso 2009). Subjects
were presented visually with two “speaking mouths” simultaneously in the left and right visual
fields. A central auditory stream (a speaking voice) was congruent with one of the two visual stimuli (the mouth reading the same passage of a tale) and incongruent with the other (the mouth reading a different passage). In different blocks, subjects were asked to attend either to the congruent or to the
incongruent visual stimulus. In this way, we were able to keep the absolute level of multisensory
information present in the environment constant, testing specifically for the effect of selective spa-
tial attention to congruent or incongruent multisensory stimuli. The results showed that endogenous
visuospatial attention can influence the processing of audiovisual stimuli, with greater activation
for “attend to congruent” than “attend to incongruent” conditions. This interplay between attention
and multisensory processing was found to affect brain activity at multiple stages, including high-
level regions in the superior temporal sulcus, subcortically in the superior colliculus, as well as in
sensory-specific occipital visual cortex (V1 and V2).
Endogenous attention has been found not only to boost multisensory processing, but also in
some cases to reduce responses for attended versus unattended multisensory stimuli. For example,
van Atteveldt and colleagues (2007) presented subjects with letter–sound pairs that were either
congruent or incongruent. Under conditions of passive listening, activity increased in association
cortex for congruent compared to incongruent presentations. However, this effect disappeared as
soon as subjects were asked to perform an active “same/different” judgment with the letters and
sounds. The authors suggested that voluntary top-down attention can overrule bottom-up multi-
sensory interactions (see also Mozolic et al. 2008, on the effect of active attention to one modality
during multisensory stimulation). In another study on audiovisual speech, Miller and D’Esposito
(2005) dissociated patterns of activation related to physical stimulus attributes (synchronous vs.
asynchronous stimuli) and perception (“fused” vs. “unfused” percept). This showed that active per-
ception leads to increases in activity in the auditory cortex and the superior temporal sulcus for
fused audiovisual stimuli, whereas in the SC activity decreased for synchronous vs. asynchronous
stimuli, irrespective of perception. These results indicate that constraints of multisensory integra-
tion may change as a function of endogenous factors (fused/unfused percept), for example, with
synchronous audiovisual stimuli reducing rather than increasing activity in the SC (cf. Miller and
D’Esposito 2005 and Meredith et al. 1987).
Another approach to investigate the relationship between endogenous attention and multisensory
integration is to manipulate the attentional load of a primary task and to assess how this influences
multisensory processing. The underlying idea is that if a single/common pool of neural resources
mediates both processes, increasing the amount of resources spent on a primary attentional task
should lead to some changes in the processing of the multisensory stimuli. On the contrary, if multi-
sensory integration does not depend on endogenous attention, changes in the attentional task should
not have any influence on multisensory processing. We used this approach to investigate the possible
role of endogenous visuospatial attention for the integration of visuo-tactile stimuli (Zimmer and
Macaluso 2007). We indexed multisensory integration comparing same-side versus opposite-side
visual–tactile stimuli and assessing activity enhancement in contralateral occipital cortex for the
same-side condition (cf. Figure 25.3a). These visual and tactile stimuli were fully task-irrelevant
and did not require any response. Concurrently, we asked subjects to perform a primary endogenous
visuospatial attention task. This entailed either attending to central fixation (low load) or sustaining
visuospatial covert attention to a location above fixation to detect subtle orientation changes in a
grating patch (high load; see Figure 25.3b). The results showed cross-modal enhancements in the
contralateral visual cortex for spatially congruent trials, irrespective of the level of endogenous load
(see signal plots in Figure 25.3b). These findings suggest that the processing of visuo-tactile spatial
congruence in visual cortex can be uncoupled from endogenous visuospatial attention control (see
also Mathiak et al. 2005, for a magnetoencephalography study reporting related findings in auditory
cortex).
In summary, direct investigation of the possible relationship between attention control and mul-
tisensory integration revealed that voluntary attention to multisensory stimuli or changing the task
relevance of the unisensory components of a multisensory stimulus (attend to one modality, to both,
or to neither) can affect multisensory interactions. This indicates that—to some extent—attention
control and multisensory integration make use of a shared pool of processing resources. However,
when both components of a multisensory stimulus are fully task-irrelevant, changes in cognitive load in a separate task do not affect the integration of the multisensory input (at least for the load manipulations reported by Zimmer and Macaluso 2007; Mathiak et al. 2005).
Taken together, these findings suggest that multisensory interactions can occur at multiple levels
of processing, and that different constraints apply depending on the relative weighting of stimulus-
driven and endogenous attentional requirements. This multifaceted scenario can be addressed in
the context of models of spatial attention control that include multiple routes for the interaction of
signals in different modalities (see Figure 25.1b). It can be hypothesized that some of these pathways (or network nodes) are under the modulatory influence of endogenous and/or stimulus-driven
attention. For instance, cross-modal interactions that involve dorsal FP areas are likely to be subject
to endogenous and task-related attentional factors (e.g., see Macaluso et al. 2002b). Conversely,
stimulus-driven factors may influence multisensory interactions that take place within or via the
ventral FP system (e.g., Santangelo et al. 2009). Direct connections between sensory-specific areas
should be—at least in principle—fast, automatic, and preattentive (Kayser et al. 2005), although
attentional influences may then superimpose on these (e.g., see Talsma et al. 2007). Some interplay
between spatial attention and multisensory processing can take place also in subcortical areas, as
demonstrated by attentional modulation there (Fairhall and Macaluso 2009; see also Wallace and Stein 1994;
Wilkinson et al. 1996, for the role of cortical input on multisensory processing in the SC).
25.6 CONCLUSIONS
Functional imaging studies of multisensory spatial attention revealed a complex interplay between
effects associated with the external stimulus configuration (e.g., spatially congruent vs. incon-
gruent multisensory input) and endogenous task requirements. Here, I propose that these can be
addressed in the context of “site-source” models of attention that include control regions in dorsal
and vFP associative cortex, connected via feedforward and feedback projections with sensory-
specific areas (plus subcortical regions). This architecture permits sharing spatial information
across multiple brain regions that represent space (unisensory, multisensory, plus motor represen-
tations). Spatial attention and the selection of currently relevant location result from the dynamic
interplay between the nodes of this network, with both stimulus-driven and endogenous factors
influencing the relative contribution of each node and pathway. I propose that the coordination of
activity within this complex network underlies the integration of space across modalities, produc-
ing a sensory–motor system that allows us to perceive and act within a unified representation of
external space.
In this framework, future studies may seek to better specify the dynamics of this network. A
key issue concerns possible causal links between activation of some parts of the network and atten-
tion/integration effects in other parts of the network. This relationship is indeed a main feature of the “site-source” distinction emphasized in this model. It can be addressed in several ways. Transcranial magnetic stimulation (TMS) can be used to transiently knock out one node of the network during multisensory attention tasks, revealing the precise timing of activation of each network node. Using this approach, Chambers and colleagues (2004a) identified two critical windows
for the activation of inferior parietal cortex during visuospatial reorienting, and demonstrated the
involvement of the same region (the angular gyrus) for stimulus-driven visuo-tactile spatial inter-
actions (Chambers et al. 2007; but see also Chambers et al. 2004b, for modality-specific effects).
TMS was also used to demonstrate the central role of posterior parietal cortex for spatial remapping
between vision and touch (Bolognini and Maravita 2007) and to infer direct influences of auditory
input on human visual cortex (Romei et al. 2007). Most recently, TMS has been combined with fMRI, which makes it possible to investigate the causal influence of one area (e.g., frontal or parietal regions)
on activity in other areas (e.g., sensory-specific visual areas; see Ruff et al. 2006; and Bestmann et
al. 2008, for review). These studies may be extended to multisensory attention paradigms, looking
for the coupling between fronto-parietal attention control regions and sensory areas as a function
of the type of input (unisensory or multisensory, spatially congruent or incongruent). Task-related
changes in functional coupling between brain areas can also be assessed using analyses of effective
connectivity (e.g., dynamic causal modeling; Stephan et al. 2007). These have been successfully
applied to both fMRI and ERP data in multisensory experiments, showing causal influences of
associative areas in parietal and temporal cortex on sensory processing in the visual cortex (Moran
et al. 2008; Noesselt et al. 2007; Kreifelts et al. 2007). Future studies may combine attentional
manipulations (e.g., the direction of endogenous attention) and multisensory stimuli (e.g., spatially
congruent vs. incongruent multisensory input), providing additional information on the causal role
of top-down and bottom-up influences for the formation of an integrated system that represents
space across sensory modalities.
REFERENCES
Alsius, A., J. Navarra, R. Campbell, and S. Soto-Faraco. 2005. Audiovisual integration of speech falters under
high attention demands. Curr Biol 15: 839–843.
Andersen, R. A., L. H. Snyder, D. C. Bradley, and J. Xing. 1997. Multimodal representation of space in the
posterior parietal cortex and its use in planning movements. Annu Rev Neurosci 20: 303–330.
Arrington, C. M., T. H. Carr, A. R. Mayer, and S. M. Rao. 2000. Neural mechanisms of visual attention: Object-based selection of a region in space. J Cogn Neurosci 12: 106–117.
Barraclough, N. E., D. Xiao, C. I. Baker, M. W. Oram, and D. I. Perrett. 2005. Integration of visual and audi-
tory information by superior temporal sulcus neurons responsive to the sight of actions. J Cogn Neurosci
17: 377–391.
Beauchamp, M. S. 2005. Statistical criteria in FMRI studies of multisensory integration. Neuroinformatics 3:
93–113.
Beauchamp, M. S., B. D. Argall, J. Bodurka, J. H. Duyn, and A. Martin. 2004. Unraveling multisensory integra-
tion: Patchy organization within human STS multisensory cortex. Nat Neurosci 7: 1190–1192.
Ben Hamed, S., J. R. Duhamel, F. Bremmer, and W. Graf. 2001. Representation of the visual field in the lat-
eral intraparietal area of macaque monkeys: A quantitative receptive field analysis. Exp Brain Res 140:
127–144.
Ben Hamed, S., and J. R. Duhamel. 2002. Ocular fixation and visual activity in the monkey lateral intraparietal
area. Exp Brain Res 142: 512–528.
Bertelson, P., J. Vroomen, B. de Gelder, and J. Driver. 2000. The ventriloquist effect does not depend on the
direction of deliberate visual attention. Percept Psychophys 62: 321–332.
Bestmann, S., C. C. Ruff, F. Blankenburg, N. Weiskopf, J. Driver, and J. C. Rothwell. 2008. Mapping causal
interregional influences with concurrent TMS-fMRI. Exp Brain Res 191: 383–402.
Bolognini, N., and A. Maravita. 2007. Proprioceptive alignment of visual and somatosensory maps in the pos-
terior parietal cortex. Curr Biol 17: 1890–1895.
Boussaoud, D., C. Jouffrais, and F. Bremmer. 1998. Eye position effects on the neuronal activity of dorsal
premotor cortex in the macaque monkey. J Neurophysiol 80: 1132–1150.
Bremmer, F., W. Graf, S. Ben Hamed, and J. R. Duhamel. 1999. Eye position encoding in the macaque ventral
intraparietal area (VIP). Neuroreport 10: 873–878.
Bremmer, F., A. Schlack, N. J. Shah, O. Zafiris, M. Kubischik, K. Hoffmann et al. 2001. Polymodal motion
processing in posterior parietal and premotor cortex: A human fMRI study strongly implies equivalencies
between humans and monkeys. Neuron 29: 287–296.
Bressler, S. L., W. Tang, C. M. Sylvester, G. L. Shulman, and M. Corbetta. 2008. Top-down control of human
visual cortex by frontal and parietal cortex in anticipatory visual spatial attention. J Neurosci 28:
10056–10061.
Bruce, C., R. Desimone, and C. G. Gross. 1981. Visual properties of neurons in a polysensory area in superior
temporal sulcus of the macaque. J Neurophysiol 46: 369–384.
Bushara, K. O., T. Hanakawa, I. Immisch, K. Toma, K. Kansaku, and M. Hallett. 2003. Neural correlates of
cross-modal binding. Nat Neurosci 6: 190–195.
Busse, L., K. C. Roberts, R. E. Crist, D. H. Weissman, and M. G. Woldorff. 2005. The spread of attention across
modalities and space in a multisensory object. Proc Natl Acad Sci USA 102: 18751–18756.
Calvert, G. A., P. C. Hansen, S. D. Iversen, and M. J. Brammer. 2001. Detection of audio-visual integration sites
in humans by application of electrophysiological criteria to the BOLD effect. Neuroimage 14: 427–438.
Cappe, C., and P. Barone. 2005. Heteromodal connections supporting multisensory integration at low levels of
cortical processing in the monkey. Eur J Neurosci 22: 2886–2902.
Cappe, C., A. Morel, P. Barone, and E. M. Rouiller. 2009. The thalamocortical projection systems in primate:
an anatomical support for multisensory and sensorimotor interplay. Cereb Cortex 19: 2025–2037.
Chambers, C. D., J. M. Payne, and J. B. Mattingley. 2007. Parietal disruption impairs reflexive spatial attention
within and between sensory modalities. Neuropsychologia 45: 1715–1724.
Chambers, C. D., J. M. Payne, M. G. Stokes, and J. B. Mattingley. 2004a. Fast and slow parietal pathways
mediate spatial attention. Nat Neurosci 7: 217–218.
Chambers, C. D., M. G. Stokes, and J. B. Mattingley. 2004b. Modality-specific control of strategic spatial atten-
tion in parietal cortex. Neuron 44: 925–930.
Ciaramitaro, V. M., G. T. Buracas, and G. M. Boynton. 2007. Spatial and cross-modal attention alter responses
to unattended sensory information in early visual and auditory human cortex. J Neurophysiol 98:
2399–2413.
Corbetta, M., J. M. Kincade, J. M. Ollinger, M. P. McAvoy, and G. L. Shulman. 2000. Voluntary orienting is
dissociated from target detection in human posterior parietal cortex. Nat Neurosci 3: 292–297.
Corbetta, M., G. Patel, and G. L. Shulman. 2008. The reorienting system of the human brain: From environ-
ment to theory of mind. Neuron 58: 306–324.
Corbetta, M., and G. L. Shulman. 2002. Control of goal-directed and stimulus-driven attention in the brain. Nat
Rev Neurosci 3: 215–229.
Corbetta, M., A. P. Tansy, C. M. Stanley, S. V. Astafiev, A. Z. Snyder, and G. L. Shulman. 2005. A functional
MRI study of preparatory signals for spatial location and objects. Neuropsychologia 43: 2041–2056.
Deneve, S., and A. Pouget. 2004. Bayesian multisensory integration and cross-modal spatial links. J Physiol
Paris 98: 249–258.
Desimone, R., and J. Duncan. 1995. Neural mechanisms of selective visual attention. Annu Rev Neurosci 18:
193–222.
Dong, W. K., E. H. Chudler, K. Sugiyama, V. J. Roberts, and T. Hayashi. 1994. Somatosensory, multisen-
sory, and task-related neurons in cortical area 7b (PF) of unanesthetized monkeys. J Neurophysiol 72:
542–564.
Downar, J., A. P. Crawley, D. J. Mikulis, and K. D. Davis. 2000. A multimodal cortical network for the detec-
tion of changes in the sensory environment. Nat Neurosci 3: 277–283.
Downar, J., A. P. Crawley, D. J. Mikulis, and K. D. Davis. 2001. The effect of task relevance on the cortical response
to changes in visual and auditory stimuli: An event-related fMRI study. Neuroimage 14: 1256–1267.
Driver, J., and C. Spence. 1998. Attention and the crossmodal construction of space. Trends Cogn Sci 2:
254–262.
Spatial Constraints in Multisensory Attention 503
Duhamel, J. R., C. L. Colby, and M. E. Goldberg. 1998. Ventral intraparietal area of the macaque: Congruent
visual and somatic response properties. J Neurophysiol 79: 126–136.
Eckert, M. A., N. V. Kamdar, C. E. Chang, C. F. Beckmann, M. D. Greicius, and V. Menon. 2008. A cross-
modal system linking primary auditory and visual cortices: Evidence from intrinsic fMRI connectivity
analysis. Hum Brain Mapp 29: 848–857.
Eimer, M. 1999. Can attention be directed to opposite locations in different modalities? An ERP study. Clin
Neurophysiol 110: 1252–1259.
Eimer, M., and J. Driver. 2000. An event-related brain potential study of cross-modal links in spatial attention
between vision and touch. Psychophysiology 37: 697–705.
Eimer, M., and J. Driver. 2001. Crossmodal links in endogenous and exogenous spatial attention: Evidence
from event-related brain potential studies. Neurosci Biobehav Rev 25: 497–511.
Eimer, M., and J. van Velzen. 2002. Crossmodal links in spatial attention are mediated by supramodal control
processes: Evidence from event-related potentials. Psychophysiology 39: 437–449.
Eimer, M., J. van Velzen, and J. Driver. 2002. Cross-modal interactions between audition, touch, and vision
in endogenous spatial attention: ERP evidence on preparatory states and sensory modulations. J Cogn
Neurosci 14: 254–271.
Fairhall, S. L., and E. Macaluso. 2009. Spatial attention can modulate audiovisual integration at multiple corti-
cal and subcortical sites. Eur J Neurosci 29: 1247–1257.
Falchier, A., S. Clavagnier, P. Barone, and H. Kennedy. 2002. Anatomical evidence of multimodal integration
in primate striate cortex. J Neurosci 22: 5749–5759.
Farah, M. J., A. B. Wong, M. A. Monheit, and L. A. Morrow. 1989. Parietal lobe mechanisms of spatial atten-
tion: Modality-specific or supramodal? Neuropsychologia 27: 461–470.
Fasold, O., J. Heinau, M. U. Trenner, A. Villringer, and R. Wenzel. 2008. Proprioceptive head posture-related
processing in human polysensory cortical areas. Neuroimage 40: 1232–1242.
Frens, M. A., and A. J. Van Opstal. 1998. Visual–auditory interactions modulate saccade-related activity in
monkey superior colliculus. Brain Res Bull 46: 211–224.
Giard, M. H., and F. Peronnet. 1999. Auditory–visual integration during multimodal object recognition in
humans: A behavioral and electrophysiological study. J Cogn Neurosci 11: 473–490.
Graziano, M. S., and C. G. Gross. 1993. A bimodal map of space: Somatosensory receptive fields in the macaque
putamen with corresponding visual receptive fields. Exp Brain Res 97: 96–109.
Graziano, M. S., and C. G. Gross. 1995. The representation of extrapersonal space: A possible role for bimodal,
visuo-tactile neurons. In The Cognitive Neurosciences, ed. M. S. Gazzaniga, 1021–1034. Cambridge,
MA: MIT Press.
Graziano, M. S., X. T. Hu, and C. G. Gross. 1997. Visuospatial properties of ventral premotor cortex. J
Neurophysiol 77: 2268–2292.
Green, J. J., and J. J. McDonald. 2008. Electrical neuroimaging reveals timing of attentional control activity in
human brain. PLoS Biol 6: e81.
Green, J. J., W. A. Teder-Salejarvi, and J. J. McDonald. 2005. Control mechanisms mediating shifts of attention
in auditory and visual space: A spatio-temporal ERP analysis. Exp Brain Res 166: 358–369.
Gross, C. G., and M. S. Graziano. 1995. Multiple representations of space in the brain. The Neuroscientist 1: 43–50.
Grossberg, S., K. Roberts, M. Aguilar, and D. Bullock. 1997. A neural model of multimodal adaptive saccadic
eye movement control by superior colliculus. J Neurosci 17: 9706–9725.
Hagler Jr., D. J., and M. I. Sereno. 2006. Spatial maps in frontal and prefrontal cortex. Neuroimage 29:
567–577.
He, B. J., A. Z. Snyder, J. L. Vincent, A. Epstein, G. L. Shulman, and M. Corbetta. 2007. Breakdown of func-
tional connectivity in frontoparietal networks underlies behavioral deficits in spatial neglect. Neuron 53:
905–918.
Heinze, H. J., G. R. Mangun, W. Burchert, H. Hinrichs, M. Scholz, T. F. Munte et al. 1994. Combined spatial
and temporal imaging of brain activity during visual selective attention in humans. Nature 372: 543–546.
Holmes, N. P. 2009. The principle of inverse effectiveness in multisensory integration: Some statistical consid-
erations. Brain Topogr 21: 168–176.
Hopfinger, J. B., M. H. Buonocore, and G. R. Mangun. 2000. The neural mechanisms of top-down attentional
control. Nat Neurosci 3: 284–291.
Hotting, K., F. Rosler, and B. Roder. 2003. Crossmodal and intermodal attention modulate event-related brain
potentials to tactile and auditory stimuli. Exp Brain Res 148: 26–37.
Hyvarinen, J. 1981. Regional distribution of functions in parietal association area 7 of the monkey. Brain Res
206: 287–303.
504 The Neural Bases of Multisensory Processes
Indovina, I., and E. Macaluso. 2004. Occipital–parietal interactions during shifts of exogenous visuospatial
attention: Trial-dependent changes of effective connectivity. Magn Reson Imaging 22: 1477–1486.
Indovina, I., and E. Macaluso. 2007. Dissociation of stimulus relevance and saliency factors during shifts of
visuospatial attention. Cereb Cortex 17: 1701–1711.
Kalaska, J. F., S. H. Scott, P. Cisek, and L. E. Sergio. 1997. Cortical control of reaching movements. Curr Opin
Neurobiol 7: 849–859.
Kastner, S., M. A. Pinsk, P. De Weerd, R. Desimone, and L. G. Ungerleider. 1999. Increased activity in human
visual cortex during directed attention in the absence of visual stimulation. Neuron 22: 751–761.
Kastner, S., and L. G. Ungerleider. 2001. The neural basis of biased competition in human visual cortex.
Neuropsychologia 39: 1263–1276.
Kayser, C., C. I. Petkov, M. Augath, and N. K. Logothetis. 2005. Integration of touch and sound in auditory
cortex. Neuron 48: 373–384.
Kelley, T. A., J. T. Serences, B. Giesbrecht, and S. Yantis. 2008. Cortical mechanisms for shifting and holding
visuospatial attention. Cereb Cortex 18: 114–125.
Kennett, S., M. Eimer, C. Spence, and J. Driver. 2001. Tactile–visual links in exogenous spatial attention under
different postures: Convergent evidence from psychophysics and ERPs. J Cogn Neurosci 13: 462–478.
Kida, T., K. Inui, T. Wasaka, K. Akatsuka, E. Tanaka, and R. Kakigi. 2007. Time-varying cortical activa-
tions related to visual–tactile cross-modal links in spatial selective attention. J Neurophysiol 97:
3585–3596.
Kincade, J. M., R. A. Abrams, S. V. Astafiev, G. L. Shulman, and M. Corbetta. 2005. An event-related functional
magnetic resonance imaging study of voluntary and stimulus-driven orienting of attention. J Neurosci
25: 4593–4604.
Kinsbourne, M. 1970. The cerebral basis of lateral asymmetries in attention. Acta Psychol (Amst) 33:
193–201.
Kreifelts, B., T. Ethofer, W. Grodd, M. Erb, and D. Wildgruber. 2007. Audiovisual integration of emotional
signals in voice and face: An event-related fMRI study. Neuroimage 37: 1445–1456.
Laurienti, P. J., J. H. Burdette, M. T. Wallace, Y. F. Yen, A. S. Field, and B. E. Stein. 2002. Deactivation of
sensory-specific cortex by cross-modal stimuli. J Cogn Neurosci 14: 420–429.
Laurienti, P. J., T. J. Perrault, T. R. Stanford, M. T. Wallace, and B. E. Stein. 2005. On the use of superadditiv-
ity as a metric for characterizing multisensory integration in functional neuroimaging studies. Exp Brain
Res 166: 289–297.
Leo, F., C. Bertini, G. di Pellegrino, and E. Ladavas. 2008. Multisensory integration for orienting responses in
humans requires the activation of the superior colliculus. Exp Brain Res 186: 67–77.
Lewis, J. W., M. S. Beauchamp, and E. A. DeYoe. 2000. A comparison of visual and auditory motion process-
ing in human cerebral cortex. Cereb Cortex 10: 873–888.
Luck, S. J., L. Chelazzi, S. A. Hillyard, and R. Desimone. 1997. Neural mechanisms of spatial selective atten-
tion in areas V1, V2, and V4 of macaque visual cortex. J Neurophysiol 77: 24–42.
Macaluso, E., and J. Driver. 2001. Spatial attention and crossmodal interactions between vision and touch.
Neuropsychologia 39: 1304–1316.
Macaluso, E., and J. Driver. 2005. Multisensory spatial interactions: A window onto functional integration in
the human brain. Trends Neurosci 28: 264–271.
Macaluso, E., J. Driver, and C. D. Frith. 2003a. Multimodal spatial representations engaged in human parietal
cortex during both saccadic and manual spatial orienting. Curr Biol 13: 990–999.
Macaluso, E., M. Eimer, C. D. Frith, and J. Driver. 2003b. Preparatory states in crossmodal spatial attention:
Spatial specificity and possible control mechanisms. Exp Brain Res 149: 62–74.
Macaluso, E., C. Frith, and J. Driver. 2000a. Selective spatial attention in vision and touch: Unimodal and
multimodal mechanisms revealed by PET. J Neurophysiol 83: 3062–3075.
Macaluso, E., C. D. Frith, and J. Driver. 2005. Multisensory stimulation with or without saccades: fMRI evi-
dence for crossmodal effects on sensory-specific cortices that reflect multisensory location-congruence
rather than task-relevance. Neuroimage 26: 414–425.
Macaluso, E., C. D. Frith, and J. Driver. 2001. Multisensory integration and crossmodal attention effects in the
human brain. Science [Technical response] 292: 1791.
Macaluso, E., C. D. Frith, and J. Driver. 2002a. Crossmodal spatial influences of touch on extrastriate visual
areas take current gaze direction into account. Neuron 34: 647–658.
Macaluso, E., C. D. Frith, and J. Driver. 2002b. Directing attention to locations and to sensory modalities:
Multiple levels of selective processing revealed with PET. Cereb Cortex 12: 357–368.
Macaluso, E., C. D. Frith, and J. Driver. 2002c. Supramodal effects of covert spatial orienting triggered by
visual or tactile events. J Cogn Neurosci 14: 389–401.
Macaluso, E., C. D. Frith, and J. Driver. 2000b. Modulation of human visual cortex by crossmodal spatial atten-
tion. Science 289: 1206–1208.
Maravita, A., N. Bolognini, E. Bricolo, C. A. Marzi, and S. Savazzi. 2008. Is audiovisual integration subserved
by the superior colliculus in humans? Neuroreport 19: 271–275.
Martinez, A., L. Anllo-Vento, M. I. Sereno, L. R. Frank, R. B. Buxton, D. J. Dubowitz et al. 1999. Involvement
of striate and extrastriate visual cortical areas in spatial attention. Nat Neurosci 2: 364–369.
Massaro, D. W. 1999. Speechreading: Illusion or window into pattern recognition. Trends Cogn Sci 3:
310–317.
Mathiak, K., I. Hertrich, M. Zvyagintsev, W. Lutzenberger, and H. Ackermann. 2005. Selective influences of
cross-modal spatial-cues on preattentive auditory processing: A whole-head magnetoencephalography
study. Neuroimage 28: 627–634.
Mayer, A. R., A. R. Franco, and D. L. Harrington. 2009. Neuronal modulation of auditory attention by informa-
tive and uninformative spatial cues. Hum Brain Mapp 30: 1652–1666.
Mayer, A. R., D. Harrington, J. C. Adair, and R. Lee. 2006. The neural networks underlying endogenous audi-
tory covert orienting and reorienting. Neuroimage 30: 938–949.
McDonald, J. J., W. A. Teder-Salejarvi, F. Di Russo, and S. A. Hillyard. 2003. Neural substrates of perceptual
enhancement by cross-modal spatial attention. J Cogn Neurosci 15: 10–19.
McDonald, J. J., W. A. Teder-Salejarvi, and S. A. Hillyard. 2000. Involuntary orienting to sound improves
visual perception. Nature 407: 906–908.
McDonald, J. J., W. A. Teder-Salejarvi, and L. M. Ward. 2001. Multisensory integration and crossmodal atten-
tion effects in the human brain. Science 292: 1791.
McDonald, J. J., and L. M. Ward. 2000. Involuntary listening aids seeing: Evidence from human electrophysiol-
ogy. Psychol Sci 11: 167–171.
McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature 264: 746–748.
Meredith, M. A., J. W. Nemitz, and B. E. Stein. 1987. Determinants of multisensory integration in superior
colliculus neurons: I. Temporal factors. J Neurosci 7: 3215–3229.
Meredith, M. A., and B. E. Stein. 1996. Spatial determinants of multisensory integration in cat superior col-
liculus neurons. J Neurophysiol 75: 1843–1857.
Meredith, M. A., and B. E. Stein. 1986a. Visual, auditory, and somatosensory convergence on cells in superior
colliculus results in multisensory integration. J Neurophysiol 56: 640–662.
Meredith, M. A., and B. E. Stein. 1986b. Spatial factors determine the activity of multisensory neurons in cat
superior colliculus. Brain Res 365: 350–354.
Meredith, M. A., and B. E. Stein. 1983. Interactions among converging sensory inputs in the superior colliculus.
Science 221: 389–391.
Meyer, M., S. Baumann, S. Marchina, and L. Jancke. 2007. Hemodynamic responses in human multisensory
and auditory association cortex to purely visual stimulation. BMC Neurosci 8: 14.
Meylan, R. V., and M. M. Murray. 2007. Auditory–visual multisensory interactions attenuate subsequent visual
responses in humans. Neuroimage 35: 244–254.
Miller, J. 1982. Discrete versus continuous stage models of human information processing: In search of partial
output. J Exp Psychol Hum Percept Perform 8: 273–296.
Miller, L. M., and M. D’Esposito. 2005. Perceptual fusion and stimulus coincidence in the cross-modal integra-
tion of speech. J Neurosci 25: 5884–5893.
Moore, T. 2006. The neurobiology of visual attention: Finding sources. Curr Opin Neurobiol 16: 159–165.
Moran, R. J., S. Molholm, R. B. Reilly, and J. J. Foxe. 2008. Changes in effective connectivity of human supe-
rior parietal lobule under multisensory and unisensory stimulation. Eur J Neurosci 27: 2303–2312.
Mozolic, J. L., D. Joyner, C. E. Hugenschmidt, A. M. Peiffer, R. A. Kraft, J. A. Maldjian et al. 2008. Cross-
modal deactivations during modality-specific selective attention. BMC Neurol 8: 35.
Nagy, A., G. Eordegh, Z. Paroczy, Z. Markus, and G. Benedek. 2006. Multisensory integration in the basal
ganglia. Eur J Neurosci 24: 917–924.
Natale, E., C. A. Marzi, and E. Macaluso. 2009. FMRI correlates of visuo-spatial reorienting investigated with
an attention shifting double-cue paradigm. Hum Brain Mapp 30: 2367–2381.
Nobre, A. C., J. T. Coull, C. D. Frith, and M. M. Mesulam. 1999. Orbitofrontal cortex is activated during
breaches of expectation in tasks of visual attention. Nat Neurosci 2: 11–12.
Noesselt, T., J. W. Rieger, M. A. Schoenfeld, M. Kanowski, H. Hinrichs, H. J. Heinze et al. 2007. Audiovisual
temporal correspondence modulates human multisensory superior temporal sulcus plus primary sensory
cortices. J Neurosci 27: 11431–11441.
Pessoa, L., S. Kastner, and L. G. Ungerleider. 2003. Neuroimaging studies of attention: From modulation of
sensory processing to top-down control. J Neurosci 23: 3990–3998.
Trenner, M. U., H. R. Heekeren, M. Bauer, K. Rossner, R. Wenzel, A. Villringer et al. 2008. What happens in
between? Human oscillatory brain activity related to crossmodal spatial cueing. PLoS ONE 3: e1467.
van Atteveldt, N. M., E. Formisano, R. Goebel, and L. Blomert. 2007. Top-down task effects overrule auto-
matic multisensory responses to letter–sound pairs in auditory association cortex. Neuroimage 36:
1345–1360.
Vandenberghe, R., D. R. Gitelman, T. B. Parrish, and M. M. Mesulam. 2001. Functional specificity of superior
parietal mediation of spatial shifting. Neuroimage 14: 661–673.
Vossel, S., C. M. Thiel, and G. R. Fink. 2006. Cue validity modulates the neural correlates of covert endog-
enous orienting of attention in parietal and frontal cortex. Neuroimage 32: 1257–1264.
Wallace, M. T., J. G. McHaffie, and B. E. Stein. 1997. Visual response properties and visuotopic representation
in the newborn monkey superior colliculus. J Neurophysiol 78: 2732–2741.
Wallace, M. T., and B. E. Stein. 1994. Cross-modal synthesis in the midbrain depends on input from cortex.
J Neurophysiol 71: 429–432.
Wilkinson, L. K., M. A. Meredith, and B. E. Stein. 1996. The role of anterior ectosylvian cortex in cross-
modality orientation and approach behavior. Exp Brain Res 112: 1–10.
Wu, C. T., D. H. Weissman, K. C. Roberts, and M. G. Woldorff. 2007. The neural circuitry underlying the
executive control of auditory spatial attention. Brain Res 1134: 187–198.
Yantis, S., J. Schwarzbach, J. T. Serences, R. L. Carlson, M. A. Steinmetz, J. J. Pekar et al. 2002. Transient
neural activity in human parietal cortex during spatial attention shifts. Nat Neurosci 5: 995–1002.
Zimmer, U., and E. Macaluso. 2007. Processing of multisensory spatial congruency can be dissociated from
working memory and visuo-spatial attention. Eur J Neurosci 26: 1681–1691.
26 Cross-Modal Spatial Cueing
of Attention Influences
Visual Perception
John J. McDonald, Jessica J. Green,
Viola S. Störmer, and Steven A. Hillyard
CONTENTS
26.1 Spatial Attention: Modality-Specific or Supramodal?..........................................................509
26.2 Involuntary Cross-Modal Spatial Attention Enhances Perceptual Sensitivity...................... 511
26.3 Involuntary Cross-Modal Spatial Attention Modulates Time-Order Perception.................. 512
26.4 Beyond Temporal Order: The Simultaneity Judgment Task................................................. 516
26.5 Involuntary Cross-Modal Spatial Attention Alters Appearance........................................... 518
26.6 Possible Mechanisms of Cross-Modal Cue Effects............................................................... 520
26.7 Conclusions and Future Directions........................................................................................ 523
References....................................................................................................................................... 523
modalities. Visual stimuli presented at the attended location elicited an enlarged negative ERP
component over the anterior scalp 170 ms after stimulus onset, both when visual stimuli were rel-
evant and when they were irrelevant. Similarly, auditory stimuli presented at the attended location
elicited an enlarged negativity over the anterior scalp beginning 140 ms after stimulus onset, both
when auditory stimuli were relevant and when they were irrelevant. Follow-up studies confirmed
that spatial attention influences ERP components elicited by stimuli in an irrelevant modality when
attention is sustained at a prespecified location over several minutes (Teder-Sälejärvi et al. 1999)
or is cued on a trial-by-trial basis (Eimer and Schröger 1998). The results from these ERP studies
indicate that spatial attention is not an entirely modality-specific process.
On the neuropsychological front, Farah and colleagues (1989) showed that unilateral damage to
the parietal lobe impairs reaction time (RT) performance in a spatial cueing task involving spatially
nonpredictive auditory cues. Prior visual-cueing studies had shown that patients with damage to the
right parietal lobe were substantially slower to detect visual targets appearing in the left visual field
following a peripheral visual cue to the right visual field (invalid trials) than when attention was
cued to the left (valid trials) or was cued to neither side (neutral trials) (Posner et al. 1982, 1984).
This location-specific RT deficit was attributed to an impairment in the disengagement of attention,
mainly because the patients appeared to have no difficulty in shifting attention to the contralesional
field following a valid cue or neutral cue. In Farah et al.’s study, similar impairments in detecting
contralesional visual targets were observed following either invalid auditory or visual cues pre-
sented to the ipsilesional side. On the basis of these results, Farah and colleagues concluded that
sounds and lights automatically engage the same supramodal spatial attention mechanism.
Given the neurophysiological and neuropsychological evidence in favor of a supramodal (or at
least partially shared) spatial attention mechanism, why did several early behavioral studies appear
to support the modality-specific view of spatial attention? These initial difficulties in showing spa-
tial attention effects outside of the visual modality may be attributed largely to methodological
factors, because some of the experimental designs that had been used successfully to study visual
spatial attention were not ideal for studying auditory spatial attention. In particular, because sounds
can be rapidly detected based on spectrotemporal features that are independent of a sound’s spatial
location, simple detection measures that had shown spatial specificity in visual cueing tasks did not
always work well for studying spatial attention within audition (e.g., Posner 1978). As researchers
began to realize that auditory spatial attention effects might be contingent on the degree to which
sound location is processed (Rhodes 1987), new spatial discrimination tasks were developed to
ensure the use of spatial representations (McDonald and Ward 1999; Spence and Driver 1994). With
these new tasks, researchers were able to document spatial cueing effects using all the various com-
binations of visual, auditory, and tactile cue and target stimuli. As reviewed elsewhere (e.g., Driver
and Spence 2004), voluntary spatial cueing studies had begun to reveal a consistent picture by the
mid 1990s: voluntarily orienting attention to a location facilitated the processing of subsequent tar-
gets regardless of the cue and target modalities.
The picture that emerged from involuntary spatial cueing studies remained less clear because
some of the spatial discrimination tasks that were developed failed to reveal cross-modal cueing
effects (for detailed reviews of methodological issues, see Spence and McDonald 2004; Wright
and Ward 2008). For example, using an elevation-discrimination task, Spence and Driver found an
asymmetry in the involuntary spatial cueing effects between visual and auditory stimuli (Spence and
Driver 1997). In their studies, spatially nonpredictive auditory cues facilitated responses to visual
targets, but spatially nonpredictive visual cues failed to influence responses to auditory targets. For
some time the absence of a visual–auditory cue effect weighed heavily on models of involuntary
spatial attention. In particular, it was taken as evidence against a single supramodal attention system
that mediated involuntary deployments of attention in multisensory space. However, researchers
began to suspect that Spence and Driver’s (1997) missing audiovisual cue effect stemmed from the
large spatial separation between cue and target, which existed even on validly (ipsilaterally) cued
trials, and the different levels of precision with which auditory and visual stimuli can be localized.
Specifically, it was hypothesized that visual cues triggered shifts of attention that were focused
too narrowly around the cued location to affect processing of a distant auditory target (Ward et al.
2000). Data from a recent study confirmed this narrow-focus explanation for the last remaining
“missing link” in cross-modal spatial attention (Prime et al. 2008). Visual cues were found to facili-
tate responses to auditory targets that were presented at the cued location but not auditory targets
that were presented 14° above or below the cued location (see also McDonald et al. 2001).
The bulk of the evidence to date indicates that orienting attention involuntarily or voluntarily
to a specific location in space can facilitate responding to subsequent targets, regardless of the
modality of the cue and target stimuli. In principle, such cross-modal cue effects might reflect the
consequences of a supramodal attention-control system that alters the perceptual representations of
objects in different modalities (Farah et al. 1989). However, the majority of behavioral studies to
date have examined the effects of spatial cues on RT performance, which is at best a very indirect
measure of perceptual experience (Luce 1986; Watt 1991). Indeed, measures of response speed are
inherently ambiguous in that RTs reflect the cumulative output of multiple stages of processing,
including low-level sensory and intermediate perceptual stages, as well as later stages involved
in making decisions and executing actions. In theory, spatial cueing could influence processing at
any one of these stages. There is some evidence that the appearance of a spatial cue can alter an
observer’s willingness to respond and reduce the uncertainty of his or her decisions without affect-
ing perception (Shiu and Pashler 1994; Sperling and Dosher 1986). Other evidence suggests that
whereas voluntary shifts of attention can affect perceptual processing, involuntary shifts of atten-
tion may not (Prinzmetal et al. 2005).
In this chapter, we review studies that have extended the RT-based chronometric investigation
of cross-modal spatial attention by utilizing psychophysical measures that better isolate perceptual-
level processes. In addition, neurophysiological and neuroimaging methods have been combined
with these psychophysical approaches to identify changes in neural activity that might underlie the
cross-modal consequences of spatial attention on perception. These methods have also examined
neural activity within the cue–target interval that might reflect supramodal (or modality specific)
control of spatial attention and subsequent anticipatory biasing of activity within sensory regions
of the cortex.
of processing. One such method was used to investigate whether orienting attention involuntarily to
a sudden sound influences perceptual-level processing of subsequent visual targets (McDonald et
al. 2000). The design was adapted from earlier visual-cueing studies that eliminated location uncer-
tainty by presenting a mask at a single location and requiring observers to indicate whether they
saw a target at the masked location (Luck et al. 1994, 1996; see also Smith 2000). The mask serves
a dual purpose in this paradigm: to ensure that the location of the target (if present) is known with
complete certainty and to backwardly mask the target so as to limit the accrual and persistence of
stimulus information at the relevant location. Under such conditions, it is possible to use methods of
signal detection theory to obtain a measure of an observer’s perceptual sensitivity (d′)—the ability
to discern a sensory event from background noise—that is independent of the observer’s decision
strategy (which, in signal detection theory, is characterized by the response criterion, β; see Green
and Swets 1966).
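In practice, d′ is computed from the hit and false-alarm rates via the inverse of the standard normal CDF, and a criterion measure falls out of the same two z-scores. The sketch below is illustrative only: the trial counts are hypothetical, and the log-linear correction is one common convention for keeping rates away from 0 and 1 (where z-scores are infinite), not necessarily the one used in the studies reviewed here.

```python
from statistics import NormalDist

def dprime(hits, misses, false_alarms, correct_rejections):
    """Perceptual sensitivity (d') and criterion (c) for a yes/no detection task."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    # Log-linear correction avoids infinite z-scores at rates of exactly 0 or 1
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d = z(hit_rate) - z(fa_rate)            # sensitivity, independent of bias
    c = -0.5 * (z(hit_rate) + z(fa_rate))   # criterion; relates to beta via beta = exp(c * d)
    return d, c

# Hypothetical counts: 40 hits, 10 misses, 10 false alarms, 40 correct rejections
d, c = dprime(40, 10, 10, 40)
```

The logic of the paradigm rests on this separation: if valid cueing raises d′ while leaving the criterion unchanged, the cue has improved the perceptual representation rather than merely shifting the observer's willingness to say "yes."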
Consistent with a perceptual-level explanation, McDonald and colleagues (2000) found that per-
ceptual sensitivity was higher when the visual target appeared at the location of the auditory cue
than when it appeared on the opposite side of fixation (Figure 26.1a and b). This effect was ascribed
to an involuntary shift of attention to the cued location because the sound provided no information
about the location of the impending target. Also, because there was no uncertainty about the target
location, the effect could not be attributed to a reduction in location uncertainty. Consequently,
the results provided strong evidence that shifting attention involuntarily to the location of a sound
actually improves the perceptual quality of a subsequent visual event appearing at that location (see
also Dufour 1999). An analogous effect on perceptual sensitivity has been reported in the converse
audiovisual combination, when spatially nonpredictive visual cues were used to orient attention
involuntarily before the onset of an 800-Hz target embedded in a white-noise mask (Soto-Faraco et
al. 2002). Together, these results support the view that sounds and lights engage a common supra-
modal spatial attention system, which then modulates perceptual processing of relevant stimuli at
the cued location (Farah et al. 1989).
To investigate the neural processes by which orienting spatial attention to a sudden sound influ-
ences processing of a subsequent visual stimulus, McDonald and colleagues (2003) recorded ERPs
in the signal-detection paradigm outlined above. ERPs to visual stimuli appearing at validly and
invalidly cued locations began to diverge from one another at about 100 ms after stimulus onset,
with the earliest phase of this difference being distributed over the midline central scalp (Figure
26.1c and d). After about 30–40 ms, this ERP difference between validly and invalidly cued visual
stimuli shifted to midline parietal and lateral occipital scalp regions. A dipole source analysis indi-
cated that the initial phase of this difference was generated in or near the multisensory region of the
superior temporal sulcus (STS), whereas the later phase was generated in or near the fusiform gyrus
of the occipital lobe (Figure 26.1e). This pattern of results suggests that enhanced visual perception
produced by the cross-modal orienting of spatial attention may depend on feedback connections
from the multisensory STS to the ventral stream of visual cortical areas. Similar cross-modal cue
effects were observed when participants made speeded responses to the visual targets, but the earli-
est effect was delayed by 100 ms (McDonald and Ward 2000). This is in line with behavioral data
suggesting that attentional selection might take place earlier when target detection accuracy (or fine
perceptual discrimination; see subsequent sections) is emphasized than when speed of responding
is emphasized (Prinzmetal et al. 2005).
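Analytically, the valid-versus-invalid comparison amounts to a difference wave computed from trial-averaged epochs. The sketch below uses simulated data; the array shapes, sampling rate, and epoch window are assumptions for illustration, not the parameters of the studies above.

```python
import numpy as np

# Simulated epoched EEG: trials x channels x timepoints, sampled at 250 Hz,
# epochs spanning -100 to 500 ms relative to visual-target onset (150 samples)
rng = np.random.default_rng(0)
sfreq, t_start = 250, -0.1
valid = rng.normal(size=(120, 2, 150))    # validly cued trials
invalid = rng.normal(size=(120, 2, 150))  # invalidly cued trials

def erp(epochs):
    """Average across trials to obtain the ERP waveform for each channel."""
    return epochs.mean(axis=0)

# The valid-minus-invalid difference wave isolates the cross-modal cueing effect
diff = erp(valid) - erp(invalid)

# Sample index of 100 ms post-stimulus, where the cue effect first emerged
idx = int((0.1 - t_start) * sfreq)
effect_at_100ms = diff[:, idx]
```

Localizing where such a difference wave is generated is then a separate inverse-modeling step (e.g., the dipole fitting described above), since the scalp topography alone does not uniquely determine the cortical source.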
FIGURE 26.1 Results from McDonald et al.’s (2000, 2003) signal detection experiments. (a) Schematic
illustration of stimulus events on a valid-cue trial. Small light displays were fixed to the bottoms of two
loudspeaker cones, one situated to the left and one to the right of a central fixation point. Each trial began with a spatially
nonpredictive auditory cue from the left or right speaker (first panel), followed by a faint visual target on some
trials (second panel) and a salient visual mask (third panel). Participants were required to indicate whether
they saw the visual target. (b) Perceptual sensitivity data averaged across participants. (c) Grand-average
event-related potentials (ERPs) to left visual field stimuli following valid and invalid auditory cues. The ERPs
were recorded from lateral occipital electrodes PO7 and PO8. Negative voltages are plotted upward, by con-
vention. Shaded box highlights interval of P1 and N1 components, in which cue effects emerged. (d) Scalp
topographies of enhanced negative voltages to validly cued visual targets. (e) Projections of best-fitting dipo-
lar sources onto sections of an individual participant’s MRI. Dipoles were located in superior temporal lobe
(STS), fusiform gyrus (FG), and perisylvian cortex near post-central gyrus (PostC). PostC dipoles accounted
for relatively late (200–300 ms) activity over more anterior scalp regions.
processing at later stages, thereby leading to faster responses for validly cued objects than for inval-
idly cued objects. Theoretically, however, changes in the timing of perceptual processing could also
contribute to the cue effects on RT performance: an observer might become consciously aware of a
target earlier in time when it appears at a cued location than when it appears at an uncued location.
In fact, the idea that attention influences the timing of our perceptions is an old and controversial
one. More than 100 years ago, Titchener (1908) asserted that when confronted with multiple objects,
an observer becomes consciously aware of an attended object before other unattended objects.
Titchener called the hypothesized temporal advantage for attended objects the law of prior entry.
514 The Neural Bases of Multisensory Processes
Observations from laboratory experiments in the nineteenth and early twentieth centuries were
interpreted along the lines of attention-induced prior entry. In one classical paradigm known as
the complication experiment, observers were required to indicate the position of a moving pointer
at the moment a sound was presented (e.g., Stevens 1904; Wundt 1874; for a review, see Boring
1929). When listening in anticipation for the auditory stimulus, observers typically indicated that
the sound appeared when the pointer was at an earlier point along its trajectory than was actually
the case. For example, observers might report that a sound appeared when a pointer was at posi-
tion 4 even though the sound actually appeared when the pointer was at position 5. Early on, it was
believed that paying attention to the auditory modality facilitated sound perception and led to a
relative delay of visual perception, so that the pointer’s perceived position lagged behind its actual
position. However, this explanation fell out of favor when later results indicated that a specific judg-
ment strategy, rather than attention-induced prior entry, might be responsible for the mislocalization
error (e.g., Cairney 1975).
In more recent years, attention-induced prior entry has been tested experimentally in visual
temporal-order judgment (TOJ) tasks that require observers to indicate which of two rapidly presented
visual stimuli appeared first. When the attended and unattended stimuli appear simultaneously,
observers typically report that the attended stimulus appeared to onset before the unattended
stimulus (Stelmach and Herdman 1991; Shore et al. 2001). Moreover, in line with the supramodal view
of spatial attention, such changes in temporal perception have been found when shifts in spatial
attention were triggered by spatially nonpredictive auditory and tactile cues as well as visual cues
(Shimojo et al. 1997).
Despite the intriguing behavioral results from TOJ experiments, the controversy over attention-
induced prior entry has continued. The main problem harks back to the debate over the complication
experiments: an observer’s judgment strategy might contribute to the tendency to report the cued
target as appearing first (Pashler 1998; Schneider and Bavelier 2003; Shore et al. 2001). Thus, in a
standard TOJ task, observers might perceive two targets to appear simultaneously but still report
seeing the target on the cued side first because of a decision rule that favors the cued target (e.g.,
when in doubt, select the cued target). Simple response biases (e.g., stimulus–response compatibility
effects) can be avoided quite easily by altering the task (McDonald et al. 2005; Shore et al. 2001),
but it is difficult to completely avoid the potential for response bias.
As noted previously, ERP recordings can be used to distinguish between changes in high-level
decision and response processes and changes in perceptual processing that could underlie entry
to conscious awareness. An immediate challenge to this line of research is to specify the ways
in which the perceived timing of external events might be associated with activity in the brain.
Philosopher Daniel Dennett expounded two alternatives (Dennett 1991). On one hand, the perceived
timing of external events may be derived from the timing of neural activities in relevant brain cir-
cuits. For example, the perceived temporal order of external events might be based on the timing
of early cortical evoked potentials. On the other hand, the brain might not represent the timing of
perceptual events with time itself. In Dennett’s terminology, the represented time (e.g., A before B)
is not necessarily related to the time of the representing (e.g., representing of A does not necessarily
precede representing of B). Consequently, the perceived temporal order of external events might be
based on nontemporal aspects of neural activities in relevant brain circuits.
McDonald et al. (2005) investigated the effect of cross-modal spatial attention on visual time-
order perception using ERPs to track the timing of cortical activity in a TOJ experiment. A spatially
nonpredictive auditory cue was presented to the left or right side of fixation just before the occur-
rence of a pair of simultaneous or nearly simultaneous visual targets (Figure 26.2a). One of the
visual targets was presented at the cued location, whereas the other was presented at the homolo-
gous location in the opposite visual hemifield. Consistent with previous behavioral studies, the
auditory spatial cue had a considerable effect on visual TOJs (Figure 26.2b). Participants judged the
cued target as appearing first on 79% of all simultaneous-target trials. To nullify this cross-modal
cueing effect, the uncued target had to be presented nearly 70 ms before the cued target.
Cross-Modal Spatial Cueing of Attention Influences Visual Perception 515
FIGURE 26.2 Results from McDonald et al.’s (2005) temporal-order-judgment experiment. (a) Schematic
illustration of events on a simultaneous-target trial (top) and nonsimultaneous target trials (bottom).
Participants indicated whether a red or a green target appeared first. SOA between cue and first target event was
100–300 ms, and SOA between nonsimultaneous targets was 35 or 70 ms. T1 and T2 denote times at which
visual targets could occur. (b) Mean percentage of trials on which participants reported seeing the target on
cued side first, as a function of cued-side onset advantage (CSOA; i.e., lead time). Negative CSOAs indicate
that uncued-side target was presented first; positive CSOAs indicate that cued-side target was presented first.
(c) Grand-average ERPs to simultaneous visual targets, averaged over 79% of trials on which participants
indicated that cued-side target appeared first. ERPs were recorded at contralateral and ipsilateral occipital
electrodes (PO7/PO8). Statistically significant differences between contralateral and ipsilateral waveforms are
denoted in gray on time axis. (d) Scalp topographies of ERP waveforms in time range of P1 (90–120 ms). Left
and right sides of the map show ipsilateral and contralateral electrodes, respectively. (e) Projections
of best-fitting dipolar sources onto sections of an average MRI. Dipoles were located in superior temporal sulcus
(STS) and fusiform gyrus (FG). FG dipoles accounted for cue-induced P1 amplitude modulation, whereas STS
dipoles accounted for a long-latency (200–250 ms) negative deflection.
To elucidate the neural basis of this prior-entry effect, McDonald and colleagues (2005) exam-
ined the ERPs elicited by simultaneously presented visual targets following the auditory cue. The
analytical approach taken was premised on the lateralized organization of the visual system and
the pattern of ERP effects that have been observed under conditions of bilateral visual stimulation.
Several previous studies on visual attention showed that directing attention to one side of a bilateral
visual display results in a lateralized asymmetry of the early ERP components measured over the
occipital scalp, with an increased positivity at electrode sites contralateral to the attended location
beginning in the time range of the occipital P1 component (80–140 ms; Heinze et al. 1990, 1994;
Luck et al. 1990; see also Fukuda and Vogel 2009). McDonald et al. (2005) hypothesized that if
attention speeds neural transmission at early stages of the visual system, the early ERP compo-
nents elicited by simultaneous visual targets would show an analogous lateral asymmetry in time,
such that the P1 measured contralateral to the attended (cued) visual target would occur earlier
than the P1 measured contralateral to the unattended (uncued) visual target. Such a finding would
be consistent with Stelmach and Herdman’s (1991) explanation of attention-induced prior entry as
well as with the view that the time course of perceptual experience is tied to the timing of the early
evoked activity in the visual cortex (Dennett 1991). Such a latency shift was not observed, however,
even though the auditory cue had a considerable effect on the judgments of temporal order of the
visual targets. Instead, cross-modal cueing led to an amplitude increase (with no change in latency)
of the ERP positivity in the ventral visual cortex contralateral to the side of the auditory cue, start-
ing in the latency range of the P1 component (90–120 ms) (Figure 26.2c–e). This finding suggests
that the effect of spatial attention on the perception of temporal order occurs because an increase in
the gain of the cued sensory input causes a perceptual threshold to be reached at an earlier time, not
because the attended input was transmitted more rapidly than the unattended input at the earliest
stages of processing.
The pattern of ERP results obtained by McDonald and colleagues is likely an important clue for
understanding the neural basis of visual prior entry due to involuntary deployments of spatial
attention to sudden sounds. Although changes in ERP amplitude appear to underlie visual percep-
tual prior entry when attention is captured by lateralized auditory cues, changes in ERP timing
might contribute to perceptual prior entry in other situations. This issue was addressed in a recent
study of multisensory prior entry, in which participants voluntarily attended to either visual or tac-
tile stimuli and judged whether the stimulus on the left or right appeared first, regardless of stimulus
modality (Vibell et al. 2007). The ERP analysis centered on putatively visual ERP peaks over the
posterior scalp (although ERPs to the tactile stimuli were not subtracted out and thus may have
contaminated the ERP waveforms; cf. Talsma and Woldorff 2005). Interestingly, the P1 peaked at
an average of 4 ms earlier when participants were attending to the visual modality than when they
were attending to the tactile modality, suggesting that modality-based attentional selection may have
a small effect on the timing of early, evoked activity in the visual system. These latency results are
not entirely clear, however, because the small-but-significant attention effect may have been caused
by a single participant with an implausibly large latency difference (17 ms) and may have been influ-
enced by overlap with the tactile ERP. Unfortunately, the authors did not report whether attention
had a similar effect on the latency of the tactile ERPs, which may have helped to corroborate the
small attention effect on P1 latency. Notwithstanding these potential problems in the ERP analysis,
it is tempting to speculate that voluntary modality-based attentional selection influences the timing
of early visual activity, whereas involuntary location-based attentional selection influences the gain
of early visual activity. The question would still remain, however, of how very small changes in ERP
latency (4 ms or less) could underlie much larger perceptual effects of tens of milliseconds.
* This argument would also apply to the findings of Vibell et al.’s (2007) cross-modal TOJ study.
the uncued target had to appear 15–17 ms before the cued target in order for participants to have the
subjective impression that the two stimuli appeared simultaneously. This difference is referred to as
a shift in the point of subjective simultaneity (PSS), and it is typically attributed to the covert ori-
enting of attention (but see Schneider and Bavelier 2003, for an alternative sensory-based account).
The estimated shift in PSS was much smaller than the one reported in McDonald et al.’s earlier TOJ
task (17.4 vs. 68.5 ms), but the conclusions derived from the two findings were the same: Involuntary
capture of spatial attention by a sudden sound influences the perceived timing of visual events.
Santangelo and Spence went on to argue that the shift in PSS reported by McDonald et al. might
have been due to secondary response biases and, as a result, the shift in PSS observed in their study
provided “the first unequivocal empirical evidence in support of the effect of cross-modal atten-
tional capture on the latencies of perceptual processing” (p. 163).
Although the SJ task has its virtues, there are two main arguments against Santangelo and
Spence’s conclusions. First, the authors did not take into account the neurophysiological findings
of McDonald and colleagues’ ERP study. Most importantly, the effect of auditory spatial cueing
on early ERP activity arising from sensory-specific regions of the ventral visual cortex cannot be
explained in terms of response bias. Thus, although it may be difficult to rule out all higher-order
response biases in a TOJ task, the ERP findings provide compelling evidence that cross-modal spa-
tial attention modulates early visual-sensory processing. Moreover, although the SJ task may be less
susceptible to some decision-level factors, it may be impossible to rule out all decision-level factors
entirely as contributors to the PSS effect.* Thus, it is not inconceivable that Santangelo and Spence’s
behavioral findings may have reflected post-perceptual rather than perceptual effects.
Second, it should be noted that Santangelo and Spence’s results provided little, if any, empirical
support for the conclusion that cross-modal spatial attention influences the timing of visual percep-
tual processing. The problem is that their estimated PSS did not accurately represent their empiri-
cal data. Their PSS measure was derived from the proportion of “simultaneous” responses, which
varied as a function of the stimulus onset asynchrony (SOA) between the target on the cued side
and the target on the uncued side. As shown in their Figure 2a, the proportion of “simultaneous”
responses peaked when the cued and uncued targets appeared simultaneously (0 ms SOA) and
decreased as the SOA between targets increased. The distribution of responses was fit to a Gaussian
function using maximum likelihood estimation, and the mean of the fitted Gaussian function—not
the observed data—was used as an estimate of the PSS. Critically, this procedure led to a mismatch
between the mean of the fitted curve (or more aptly, the mean of the individual-subject fitted curves)
and the mean of the observed data. Specifically, whereas the mean of the fitted curves fell slightly
to the left of the 0-ms SOA (uncued target presented first), the mean of the observed data actually
fell slightly to the right of the 0-ms SOA (cued target presented first) because of a positive skew of
the distribution.†
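The footnoted concern can be illustrated numerically. A minimal sketch (Python, with invented response counts, not Santangelo and Spence's published data) shows how a positively skewed "simultaneous"-response distribution drives its mean away from its mode, so that the mean of a symmetric fitted curve and the mean of the observed data need not agree:

```python
# Invented "simultaneous"-response counts (illustration only): SOA in ms vs.
# number of "simultaneous" reports, with a heavier right tail (positive skew).
soa    = [-90, -60, -30, 0, 30, 60, 90]
counts = [  2,  10,  55, 80, 60, 40, 25]

n = sum(counts)
mean = sum(x * c for x, c in zip(soa, counts)) / n
var = sum(c * (x - mean) ** 2 for x, c in zip(soa, counts)) / n
skew = sum(c * (x - mean) ** 3 for x, c in zip(soa, counts)) / (n * var ** 1.5)
mode = soa[counts.index(max(counts))]

# Positive skew pulls the mean to the right of the mode; a symmetric
# (Gaussian) curve fit to such data need not center on either summary,
# so "estimated PSS" and "mean of the observed data" can diverge.
print(f"mode = {mode} ms, mean = {mean:.1f} ms, skewness = {skew:.2f}")
```

Here the mode sits at 0 ms while the weighted mean falls well to the right of it, which is the kind of discrepancy described in the text.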
Does auditory cueing influence the subjective impression of simultaneity in the context of an SJ
task? Unfortunately, the results from Santangelo and Spence’s study provide no clear answer to this
question. The reported leftward shift in PSS suggests that the auditory cue had a small facilitatory
effect on the perceived timing of the ipsilateral target. However, the rightward skew of the observed
* Whereas Santangelo and Spence (2008) made the strong claim that performance in SJ tasks should be completely inde-
pendent of all response biases, Schneider and Bavelier (2003) argued only that performance in SJ tasks should be less
susceptible to such decision-level effects than performance in TOJ tasks.
† The mismatch between the estimated PSS and the mean of the observed data in Santangelo and Spence’s (2008) SJ task
might have been due to violations in the assumptions of the fitting procedure. Specifically, the maximum likelihood
procedure assumes that data are distributed normally, whereas the observed data were clearly skewed. Santangelo and
Spence did perform one goodness-of-fit test to help determine whether the data differed significantly from the fitted
Gaussians, but this test was insufficient to pick up the positive skew (note that other researchers have employed multiple
goodness-of-fit tests before computing PSS; e.g., Stone et al. 2001). Alternatively, the mismatch between the estimated
PSS and the mean of the observed data might have arisen because data from the simultaneous-target trials were actually
discarded prior to the curve-fitting procedure. This arbitrary step shifted the mode of the distribution 13 ms to the left
(uncued target was presented 13 ms before cued target), which happened to be very close to the reported shift in PSS.
distribution (and consequent rightward shift in the mean) suggests that the auditory cue may actually
have delayed perception of the ipsilateral target. Finally, the mode of the observed distribution
suggests that the auditory cue had no effect on subjective reports of simultaneity. These inconclusive
results suggest that the SJ task may lack adequate sensitivity to detect shifts in perceived time order
induced by cross-modal cueing.
FIGURE 26.3 Results from Störmer et al.’s (2009) contrast-appearance experiment. (a) Stimulus sequence
and grand-average ERPs to equal-contrast Gabor, recorded at occipital electrodes (PO7/PO8) contralateral
and ipsilateral to cued side. On a short-SOA trial (depicted), a peripheral auditory cue was presented 150 ms
before a bilateral pair of Gabors that varied in contrast (see text for details). Isolated target ERPs revealed an
enlarged positivity contralateral to cued target. Statistically significant differences between contralateral and
ipsilateral waveforms are denoted in gray on time axis. (b) Mean probability of reporting contrast of test patch
to be higher than that of standard patch, as a function of test-patch contrast. Probabilities for cued-test and
cued-standard trials are shown separately. (c) Scalp topographies of equal-contrast-Gabor ERPs in time interval
of P1 (120–140 ms). Left and right sides of the map show ipsilateral and contralateral electrodes,
respectively. (d) Localization of distributed cortical current sources underlying contralateral-minus-ipsilateral
ERP positivity in 120–140 ms interval, projected onto cortical surface. View of the ventral surface, with
occipital lobes at the top. Source activity was estimated using LAURA algorithm and is shown in contralateral
hemisphere (right side of brain) only. (e) Correlations between individual participants’ tendencies to report the
cued-side target to be higher in contrast and magnitude of enlarged ERP positivities recorded at occipital and
parieto-occipital electrodes (PO7/PO8, PO3/PO4) in 120–140 ms interval.
contrast of the other (test) Gabor varied between 6% and 79%. ERPs were recorded on the trials (1/3
of the total) where the two Gabors were equal in contrast. Participants were required to indicate
whether the higher-contrast Gabor patch was oriented horizontally or vertically.
The psychophysical findings in this auditory cueing paradigm were consistent with those reported
by Carrasco and colleagues (2004). When the test and standard Gabors had the same physical con-
trast, observers reported the orientation of the cued-location Gabor significantly more often than the
uncued-location Gabor (55% vs. 45%) (Figure 26.3b). The point of subjective equality (PSE)—the
test contrast at which observers judged the test patch to be higher in contrast on half of the trials—
averaged 20% when the test patch was cued and 25% when the standard patch was cued (in compari-
son with the 22% standard contrast; Figure 26.3a). These results indicate that spatially nonpredictive
auditory cues as well as visual cues can influence subjective (visual) contrast judgments.
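A PSE of this kind can be read off the psychometric function directly. As a minimal sketch (Python, with hypothetical proportions patterned after the reported values, not Störmer et al.'s raw data), linear interpolation locates the test contrast at which the test patch is judged higher on half of the trials:

```python
# Hypothetical contrast-judgment data (illustration only): test-patch contrast
# (%) vs. probability of judging the test patch higher in contrast than the
# 22% standard, on trials where the test patch was cued.
contrast = [6, 12, 18, 22, 30, 45, 79]
p_test_higher = [0.05, 0.15, 0.40, 0.58, 0.80, 0.95, 0.99]

def pse(xs, ps, threshold=0.5):
    """Linearly interpolate the contrast at which p crosses 0.5 (the PSE)."""
    for (x0, p0), (x1, p1) in zip(zip(xs, ps), zip(xs[1:], ps[1:])):
        if p0 <= threshold <= p1:
            return x0 + (threshold - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("threshold not crossed by the data")

print(f"PSE (test patch cued): {pse(contrast, p_test_higher):.1f}% contrast")
```

With these invented proportions the interpolated PSE falls a little above 20% contrast, i.e., below the 22% standard, which is the direction of the cueing effect reported in the text.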
To investigate whether the auditory cue altered visual appearance as opposed to decision or
response processes, Störmer and colleagues (2009) examined the ERPs elicited by the equal-contrast
Gabors as a function of cue location. The authors reasoned that changes in subjective appearance
would likely be linked to modulations of early ERP activity in visual cortex associated with percep-
tual processing rather than decision- or response-level processing (see also Schneider and Komlos
2008). Moreover, any such effect on early ERP activity should correlate with the observers’ tenden-
cies to report the cued target as being higher in contrast. This is exactly what was found. Starting at
approximately 90 ms after presentation of the equal-contrast targets, the waveform recorded con-
tralaterally to the cued side became more positive than the waveform recorded ipsilaterally to the
cued side (Figure 26.3a). This contralateral positivity was observed on those trials when observers
judged the cued-location target to be higher in contrast but not when observers judged the uncued-
location target to be higher in contrast. The tendency to report the cued-location target as being
higher in contrast correlated with the contralateral ERP positivity, most strongly in the time interval
of the P1 component (120–140 ms), which is generated at early stages of visual cortical processing.
Topographical mapping and distributed source modeling indicated that the increased contralateral
positivity in the P1 interval reflected modulations of neural activity in or near the fusiform gyrus of
the occipital lobe (Figure 26.3c and d). These ERP findings converge with the behavioral evidence
that cross-modal spatial attention affects visual appearance through modulations at an early sensory
level rather than by affecting a late decision process.
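The across-participant brain–behavior relationship described above is an ordinary Pearson correlation between a behavioral index and an ERP amplitude. A sketch with invented per-participant values (illustration only, not the study's data):

```python
import math

# Invented per-participant values (illustration only): each participant's
# tendency to report the cued-side target as higher in contrast (proportion)
# and the amplitude of their contralateral-minus-ipsilateral ERP positivity
# in the P1 interval (microvolts).
bias = [0.48, 0.52, 0.55, 0.58, 0.60, 0.63, 0.66, 0.70]
erp  = [-0.2, 0.1, 0.3, 0.4, 0.5, 0.8, 0.9, 1.2]

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(f"r = {pearson_r(bias, erp):.2f}")
```

A strong positive r across participants is the signature described in the text: the larger a participant's contralateral ERP positivity, the stronger their tendency to report the cued-side target as higher in contrast.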
The second major controversy over the possible mechanisms of cross-modal cue effects is spe-
cific to studies utilizing salient-but-irrelevant stimuli to capture attention involuntarily. In these
studies, the behavioral and neurophysiological effects of cueing are typically maximal when the
cue appears 100–300 ms before the target. Although it is customary to attribute these facilitatory
effects to the covert orienting of attention, they might alternatively result from sensory interactions
between cue and target (Tassinari et al. 1994). The cross-modal-cueing paradigm eliminates uni-
modal sensory interactions, such as those taking place at the level of the retina, but the possibility of
cross-modal sensory interaction remains because of the existence of multisensory neurons at many
levels of the sensory pathways that respond to stimuli in different modalities (Driver and Noesselt
2008; Foxe and Schroeder 2005; Meredith and Stein 1996; Schroeder and Foxe 2005). In fact,
the majority of multisensory neurons do not simply respond to stimuli in different modalities, but
rather appear to integrate the input signals from different modalities so that their responses to mul-
timodal stimulation differ quantitatively from the simple summation of their unimodal responses
(for reviews, see Stein and Meredith 1993; Stein et al. 2009; other chapters in this volume). Such
multisensory interactions are typically largest when stimuli from different modalities occur at about
the same time, but they are possible over a period of several hundreds of milliseconds (Meredith
et al. 1987). In light of these considerations, the cross-modal cueing effects described in previous
sections could in principle have been due to the involuntary covert orienting of spatial attention or
to the integration of cue and target into a single multisensory event (McDonald et al. 2001; Spence
and McDonald 2004; Spence et al. 2004).
Although it is often difficult to determine which of these mechanisms are responsible for cross-
modal cueing effects, several factors can help to tip the scales in favor of one explanation or the
other. One factor is the temporal relationship between the cue and target stimuli. A simple rule of
thumb is that increasing the temporal overlap between the cue and target will make multisensory
integration more likely and pre-target attentional biasing less likely (McDonald et al. 2001). Thus,
it is relatively straightforward to attribute cross-modal cue effects to multisensory integration when
cue and target are presented concurrently or to spatial attention when cue and target are separated
by a long temporal gap. The likely cause of cross-modal cueing effects is not so clear, however,
when there is a short gap between cue and target that is within the temporal window where inte-
gration is possible. In such situations, other considerations may help to disambiguate the causes of
the cross-modal cueing effects. For example, multisensory integration is largely an automatic and
invariant process, whereas stimulus-driven attention effects are dependent on an observer’s goals
and intentions (i.e., attentional set; e.g., Folk et al. 1992). Thus, if cross-modal spatial cue effects
were found to be contingent upon an observer’s current attentional set, they would be more likely to
have been caused by pre-target attentional biasing. To our knowledge, there has been little discus-
sion of the dependency of involuntary cross-modal spatial cueing effects on attentional set and other
task-related factors (e.g., Ward et al. 2000).
A second consideration that could help distinguish between alternative mechanisms of cross-
modal cueing effects concerns the temporal sequence of control operations (Spence et al. 2004).
According to the most prominent multisensory integration account, signals arising from stimuli
in different modalities converge onto multimodal brain regions and are integrated therein. The
resulting integrated signal is then fed back to the unimodal brain regions to influence processing
of subsequent stimuli in modality-specific regions of cortex (Calvert et al. 2000; Macaluso et al.
2000). Critically, such an influence on modality-specific processing would occur only after feedfor-
ward convergence and integration of the unimodal signals takes place (Figure 26.4a). This contrasts
with the supramodal-attention account, according to which the cue’s influence on modality-specific
processing may be initiated before the target in another modality has been presented (i.e., before
integration is possible). In the context of a peripheral cueing task, a cue in one modality (e.g.,
audition) would initiate a sequence of attentional control operations (such as disengage, move, reengage;
see Posner and Raichle 1994) that would lead to anticipatory biasing of activity in another
modality (e.g., vision) before the appearance of the target (Figure 26.4b). In other words, whereas
FIGURE 26.4 Hypothetical neural mechanisms for involuntary cross-modal spatial cueing effects.
(a) Integration-based account. Nearly simultaneous auditory and visual stimuli first activate unimodal audi-
tory and visual cortical regions and then converge upon a multisensory region (AV). Audiovisual interaction
within multisensory region feeds back to boost activity in visual cortex. (b) Attention-based account. An audi-
tory cue elicits a shift of spatial attention in a multisensory representation, which leads to pre-target biasing of
activity in visual cortex and ultimately boosts target-related activity in visual cortex.
multisensory integration occurs only after stimulation in two (or more) modalities, the consequences
of spatial attention are theoretically observable after stimulation in the cue modality alone. Thus,
a careful examination of neural activity in the cue–target interval would help to ascertain whether
pre-target attentional control is responsible for the cross-modal cueing effects on perception. This
is a challenging task in the case of involuntary cross-modal cue effects, because the time interval
between the cue and target is typically very short. In the future, however, researchers might success-
fully adapt the electrophysiological methods used to track the voluntary control of spatial attention
(e.g., Doesburg et al. 2009; Eimer et al. 2002; Green and McDonald 2008; McDonald and Green
2008; Worden et al. 2000) to look for signs of attentional control in involuntary cross-modal cueing
paradigms such as the ones described in this chapter.
REFERENCES
Boring, E. G. 1929. A history of experimental psychology. New York: Appleton-Century.
Broadbent, D. E. 1958. Perception and communication. London: Pergamon Press.
Cairney, P. T. 1975. The complication experiment uncomplicated. Perception 4: 255–265.
Calvert, G. A., R. Campbell, and M. J. Brammer. 2000. Evidence from functional magnetic resonance imaging
of crossmodal binding in the human heteromodal cortex. Current Biology 10: 649–657.
Carrasco, M. 2006. Covert attention increases contrast sensitivity: Psychophysical, neurophysiological, and
neuroimaging studies. In Progress in Brain Research, Vol. 154: Visual Perception, Part 1:
Fundamentals of Vision: Low and Mid-level Processes in Perception, ed. S. Martinez-Conde, S. L.
Macknik, L. M. Martinez, J. M. Alonso, and P. U. Tse, 33–70. Amsterdam: Elsevier.
Carrasco, M., S. Ling, and S. Read. 2004. Attention alters appearance. Nature Neuroscience 7: 308–313.
Carver, R. A., and V. Brown. 1997. Effects of amount of attention allocated to the location of visual stimulus
pairs on perception of simultaneity. Perception & Psychophysics 59: 534–542.
Cherry, C. E. 1953. Some experiments on the recognition of speech with one and two ears. Journal of the
Acoustical Society of America 25: 975–979.
27 The Colavita Visual Dominance Effect
Charles Spence, Cesare Parise, and Yi-Chuan Chen
CONTENTS
27.1 Introduction........................................................................................................................... 529
27.2 Basic Findings on Colavita Visual Dominance Effect.......................................................... 531
27.2.1 Stimulus Intensity...................................................................................................... 531
27.2.2 Stimulus Modality..................................................................................................... 531
27.2.3 Stimulus Type............................................................................................................ 532
27.2.4 Stimulus Position....................................................................................................... 532
27.2.5 Bimodal Stimulus Probability................................................................................... 532
27.2.6 Response Demands.................................................................................................... 533
27.2.7 Attention.................................................................................................................... 533
27.2.8 Arousal...................................................................................................................... 534
27.2.9 Practice Effects.......................................................................................................... 535
27.3 Interim Summary.................................................................................................................. 537
27.4 Prior Entry and Colavita Visual Dominance Effect.............................................................. 537
27.5 Explaining the Colavita Visual Dominance Effect............................................................... 540
27.5.1 Accessory Stimulus Effects and Colavita Effect....................................................... 540
27.5.2 Perceptual and Decisional Contributions to Colavita Visual Dominance Effect...... 541
27.5.3 Stimulus, (Perception), and Response?...................................................................... 542
27.6 Biased (or Integrated) Competition and Colavita Visual Dominance Effect........................ 545
27.6.1 Putative Neural Underpinnings of Modality-Based Biased Competition................. 545
27.6.2 Clinical Extinction and Colavita Visual Dominance Effect..................................... 547
27.7 Conclusions and Questions for Future Research.................................................................. 548
27.7.1 Modeling the Colavita Visual Dominance Effect..................................................... 549
27.7.2 Multisensory Facilitation versus Interference........................................................... 549
References....................................................................................................................................... 550
27.1 INTRODUCTION
Visually dominant behavior has been observed in many different species, including birds, cows, dogs,
and humans (e.g., Partan and Marler 1999; Posner et al. 1976; Uetake and Kudo 1994; Wilcoxin et al.
1971). This has led researchers to suggest that visual stimuli may constitute “prepotent” stimuli for
certain classes of behavioral responses (see Colavita 1974; Foree and LoLordo 1973; LoLordo 1979;
Meltzer and Masaki 1973; Shapiro et al. 1980). One particularly impressive example of vision’s
dominance over audition (and more recently, touch) has come from research on the Colavita visual
dominance effect (Colavita 1974). In the basic experimental paradigm, participants have to make
speeded responses to a random series of auditory (or tactile), visual, and audiovisual (or visuotac-
tile) targets, all presented at a clearly suprathreshold level. Participants are instructed to make one
response whenever an auditory (or tactile) target is presented, another response whenever a visual
target is presented, and to make both responses whenever the auditory (or tactile) and visual targets
are presented at the same time (i.e., on the bimodal target trials). Typically, the unimodal targets are
presented more frequently than the bimodal targets (the ratio of 40% auditory—or tactile—targets,
40% visual targets, and 20% bimodal targets has often been used; e.g., Koppen and Spence 2007a,
2007b, 2007c). The striking result to have emerged from a number of studies on the Colavita effect
is that although participants have no problem in responding rapidly and accurately to the unimodal
targets, they often fail to respond to the auditory (or tactile) targets on the bimodal target trials (see
Figure 27.1a and b). It is almost as if the simultaneous presentation of the visual target leads to the
“extinction” of the participants’ perception of, and/or response to, the nonvisual target on a propor-
tion of the bimodal trials (see Egeth and Sager 1977; Hartcher-O’Brien et al. 2008; Koppen et al.
2009; Koppen and Spence 2007c).
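The trial structure described above (frequent unimodal targets, rarer bimodal targets, presented in random order) can be sketched as a simple generator. The 40/40/20 split and the 300-trial block length follow the figures quoted in the text (e.g., Koppen and Spence 2007a, 2007b, 2007c); the function and variable names are purely illustrative assumptions.

```python
import random

def make_trial_sequence(n_trials=300, p_auditory=0.4, p_visual=0.4,
                        p_bimodal=0.2, seed=0):
    """Build a randomly ordered block of Colavita-paradigm trials with
    the given target-modality proportions (illustrative sketch only)."""
    assert abs(p_auditory + p_visual + p_bimodal - 1.0) < 1e-9
    rng = random.Random(seed)
    counts = {
        "auditory": round(n_trials * p_auditory),   # or tactile targets
        "visual": round(n_trials * p_visual),
        "bimodal": round(n_trials * p_bimodal),
    }
    trials = [modality for modality, c in counts.items() for _ in range(c)]
    rng.shuffle(trials)  # random presentation order
    return trials

trials = make_trial_sequence()
print(len(trials), trials.count("bimodal"))  # → 300 60
```

On each trial a participant would then make the auditory (or tactile) response, the visual response, or both, depending on the target's modality.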
Although the majority of research on the Colavita effect has focused on the pattern of errors
made by participants in the bimodal target trials, it is worth noting that visual dominance can also
show up in reaction time (RT) data. For example, Egeth and Sager (1977) reported that although
participants responded more rapidly to unimodal auditory targets than to unimodal visual targets,
this pattern of results was reversed on the bimodal target trials—that is, participants responded
[Figure 27.1: bar graphs plotting the percentage of bimodal-target trials (0–50%) on which participants made both responses, a vision-only response, or an audition-only (or touch-only) response; in each panel, the gap between the vision-only and nonvisual-only bars is labeled as the Colavita effect.]
FIGURE 27.1 Results of experiments conducted by Elcock and Spence (2009) highlighting a significant
Colavita visual dominance effect over both audition (a) and touch (b). Values reported in the graphs refer to the
percentage of bimodal target trials in which participants correctly made both responses, or else made either
a visual-only or auditory- (tactile-) only response. The order in which the two experiments were performed
was counterbalanced across participants. Nine participants (aged 18–22 years) completed 300 experimental
trials (40% auditory, 40% visual, and 20% bimodal, plus 30 unimodal practice trials) in each experiment. In
the audiovisual experiment (a), the auditory stimulus consisted of a 4000-Hz pure tone (presented at 63 dB), and the visual
stimulus consisted of the illumination of the loudspeaker cone by an LED (64.3 cd/m²). In the visuotactile experiment
(b), the tactile stimulus was presented to a finger on the participant's left hand, and the visual target now consisted of
the illumination of the same finger. Thus, the auditory, visual, and tactile stimuli were presented from exactly the same
spatial location. Participants were given 2500 ms from the onset of the target in which to respond, and the intertrial
interval was set at 650 ms. The Colavita effect was significant in both cases; that is, participants in the audiovisual
experiment made 45% more visual-only than auditory-only responses, whereas participants in the visuotactile
experiment made 41% more visual-only than tactile-only responses. (c and d) Results from Elcock and Spence's
Experiment 3, in which they investigated the effects of caffeine (c) versus a placebo pill (d) on the audiovisual
Colavita visual dominance effect. The results show that participants made significantly more visual-only
than auditory-only responses in both conditions (24% and 29% more, respectively), although there was no sig-
nificant difference between the magnitudes of the Colavita visual dominance effect reported in the two cases.
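As a concrete illustration of how the percentages plotted in Figure 27.1 are derived, the following sketch computes the response-type percentages and the Colavita effect (vision-only minus nonvisual-only responses) from raw bimodal-trial tallies. The counts in the usage example are hypothetical, not Elcock and Spence's data.

```python
def colavita_effect(n_both, n_vision_only, n_nonvisual_only):
    """Given tallies of the three response types on bimodal target trials,
    return their percentages and the Colavita effect, i.e., the vision-only
    percentage minus the nonvisual-only (auditory or tactile) percentage."""
    total = n_both + n_vision_only + n_nonvisual_only
    pct = lambda n: 100.0 * n / total
    return {
        "both": pct(n_both),
        "vision_only": pct(n_vision_only),
        "nonvisual_only": pct(n_nonvisual_only),
        "colavita_effect": pct(n_vision_only) - pct(n_nonvisual_only),
    }

# Hypothetical tallies: 30 'both', 25 vision-only, 5 auditory-only responses
result = colavita_effect(30, 25, 5)
print(round(result["colavita_effect"], 1))  # → 33.3
```

A positive value indicates visual dominance; a value near zero indicates no clear pattern of dominance, as on the trimodal trials discussed below.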
more rapidly to the visual targets than to the auditory targets. Note that Egeth and Sager made sure
that their participants always responded to both the auditory and visual targets on the bimodal tri-
als by presenting each target until the participant had made the relevant behavioral response.* A
similar pattern of results in the RT data has also been reported in a number of other studies (e.g.,
Colavita 1974, 1982; Colavita and Weisberg 1979; Cooper 1998; Koppen and Spence 2007a; Sinnett
et al. 2007; Zahn et al. 1994).
In this article, we will focus mainly (although not exclusively) on the Colavita effect present in the
error data (in line with the majority of published research on this phenomenon). We start by sum-
marizing the basic findings to have emerged from studies of the Colavita visual dominance effect
conducted over the past 35 years or so. By now, many different factors have been investigated in
order to determine whether they influence the Colavita effect: Here, they are grouped into stimulus-
related factors (such as stimulus intensity, stimulus modality, stimulus type, stimulus position, and
bimodal stimulus probability) and task/participant-related factors (such as attention, arousal, task/
response demands, and practice). A range of potential explanations for the Colavita effect are evalu-
ated, and all are shown to be lacking. A new account of the Colavita visual dominance effect is
therefore proposed, one that is based on the “biased competition” model put forward by Desimone
and Duncan (1995; see also Duncan 1996; Peers et al. 2005). Although this model was initially
developed in order to provide an explanation for the intramodal competition taking place between
multiple visual object representations in both normal participants and clinical patients (suffering
from extinction), here we propose that it can be extended to provide a helpful framework in which
to understand what may be going on in the Colavita visual dominance effect. In particular, we argue
that a form of cross-modal biased competition can help to explain why participants respond to the
visual stimulus while sometimes failing to respond to the nonvisual stimulus on the bimodal target
trials in the Colavita paradigm. More generally, it is our hope that explaining the Colavita visual
dominance effect may provide an important step toward understanding the mechanisms underlying
multisensory interactions. First, though, we review the various factors that have been hypothesized
to influence the Colavita visual dominance effect.
* That is, the visual target was only turned off once the participants made a visual response, and the auditory target was
only turned off when the participants made an auditory response. This contrasts with Colavita’s (1974) studies, in which
a participant’s first response turned off all the stimuli, and with other more recent studies in which the targets were only
presented briefly (i.e., for 50 ms; e.g., Koppen and Spence 2007a, 2007b, 2007c, 2007d).
touch in normal participants (Hartcher-O’Brien et al. 2008, 2010; Hecht and Reiner 2009; see also
Gallace et al. 2007). Costantini et al. (2007) have even reported that vision dominates over touch
in extinction patients (regardless of whether the two stimuli were presented from the same position,
or from different sides; see also Bender 1952). Interestingly, however, no clear pattern of sensory
dominance has, as yet, been observed when participants respond to simultaneously presented audi-
tory and tactile stimuli (see Hecht and Reiner 2009; Occelli et al. 2010; but see Bonneh et al. 2008,
for a case study of an autistic child who exhibited auditory dominance over both touch and vision).
Intriguingly, Hecht and Reiner (2009) have recently reported that vision no longer dominates
when targets are presented in all three modalities (i.e., audition, vision, and touch) at the same time.
In their study, the participants were given a separate button with which to respond to the targets
in each modality, and had to press one, two, or three response keys depending on the combination
of target modalities that happened to be presented on each trial. Whereas vision dominated over
both audition and touch in the bimodal target trials, no clear pattern of dominance was shown on
the trimodal target trials (see also Shapiro et al. 1984, Experiment 3). As yet, there is no obvious
explanation for this result.
dominance effect can still be obtained if the probability of each type of target is equalized (i.e.,
when 33.3% auditory, 33.3% visual, and 33.3% bimodal targets are presented; see Koppen and
Spence 2007a). Koppen and Spence (2007d) investigated the effect of varying the probability of
bimodal target trials on the Colavita visual dominance effect (while keeping the relative propor-
tion of unimodal auditory and visual target trials matched).* They found that although a significant
Colavita effect was demonstrated whenever the bimodal targets were presented on 60% or less of
the trials, vision no longer dominated when the bimodal targets were presented on 90% of the tri-
als (see also Egeth and Sager 1974; Manly et al. 1999; Quinlan 2000). This result suggests that the
Colavita effect is not caused by stimulus-related (i.e., sensory) factors, since these should not have
been affected by any change in the probability of occurrence of bimodal targets (cf. Odgaard et al.
2003, 2004, on this point). Instead, the fact that the Colavita effect disappears if the bimodal targets
are presented too frequently (i.e., on too high a proportion of the trials) would appear to suggest that
response-related factors (linked to the probability of participants making bimodal target responses)
are likely to play an important role in helping to explain the Colavita effect (see also Gorea and Sagi
2000).
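The response-related account can be made concrete with a deliberately minimal toy simulation. It is not a model fitted to any data, and all of the probability values below are assumptions: on each simulated bimodal trial, the visual and nonvisual responses are emitted independently, and a simple asymmetry in emission probability alone reproduces the signature excess of vision-only over nonvisual-only responses, while raising the nonvisual emission probability (as very frequent bimodal targets plausibly would) abolishes it.

```python
import random

def simulate_bimodal_trials(n=10000, p_visual_resp=0.98,
                            p_nonvisual_resp=0.80, seed=1):
    """Toy model of bimodal target trials: each response is emitted
    independently with a fixed probability. An asymmetry favoring the
    visual response yields a Colavita-like pattern of errors."""
    rng = random.Random(seed)
    tallies = {"both": 0, "vision_only": 0, "nonvisual_only": 0, "neither": 0}
    for _ in range(n):
        v = rng.random() < p_visual_resp     # visual response emitted?
        a = rng.random() < p_nonvisual_resp  # nonvisual response emitted?
        key = ("both" if v and a else
               "vision_only" if v else
               "nonvisual_only" if a else "neither")
        tallies[key] += 1
    return tallies

# Asymmetric emission probabilities: vision-only errors dominate.
low = simulate_bimodal_trials(p_nonvisual_resp=0.80)
# Near-certain emission of both responses (as when bimodal targets are
# very frequent): the asymmetry, and hence the effect, disappears.
high = simulate_bimodal_trials(p_nonvisual_resp=0.98)
print(low["vision_only"] > low["nonvisual_only"])
```

This sketch deliberately leaves open *why* the nonvisual response is emitted less reliably, which is exactly the question the perceptual, decisional, and biased-competition accounts discussed later in the chapter try to answer.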
27.2.7 Attention
Originally, researchers thought that the Colavita visual dominance effect might simply reflect a
predisposition by participants to direct their attention preferentially toward the visual modality
(Colavita 1974; Posner et al. 1976). Posner et al.’s idea was that people endogenously (or voluntarily)
directed their attention toward the visual modality in order to make up for the fact that visual stimuli
are generally less alerting than stimuli presented in the other modalities (but see Spence et al. 2001b,
footnote 5). Contrary to this suggestion, however, a number of more recent studies have actually
* Note that researchers have also manipulated the relative probability of unimodal auditory and visual targets (see Egeth
and Sager 1977; Quinlan 2000; Sinnett et al. 2007). However, since such probability manipulations have typically been
introduced in the context of trying to shift the focus of a participant’s attention between the auditory and visual modali-
ties, they will be discussed later (see Section 27.2.7).
shown that although the manipulation of a person’s endogenous attention can certainly modulate
the extent to which vision dominates over audition, it cannot in and of itself be used to reverse the
Colavita effect. That is, even when a participant’s attention is directed toward the auditory modality
(i.e., by verbally instructing them to attend to audition or by presenting unimodal auditory targets
much more frequently than unimodal visual targets), people still exhibit either visually dominant
behavior or else their behavior shows no clear pattern of dominance (see Koppen and Spence 2007a,
2007d; Sinnett et al. 2007). These results therefore demonstrate that any predisposition that partici-
pants might have to direct their attention voluntarily (or endogenously) toward the visual modality
cannot explain why vision always seems to dominate in the Colavita visual dominance effect.
De Reuck and Spence (2009) recently investigated whether varying the modality of a second-
ary task would have any effect on the magnitude of the Colavita visual dominance effect. To
this end, a video game (“Food boy” by T3Software) and a concurrent auditory speech stream
(consisting of pairs of auditory words delivered via a central loudspeaker) were presented in the
background while participants performed the two-response version of the Colavita task (i.e.,
pressing one key in response to auditory targets, another key in response to visual targets, and
both response keys on the bimodal target trials; the auditory targets in this study consisted of a
4000-Hz pure tone presented from a loudspeaker cone placed in front of the computer screen,
whereas the visual target consisted of the illumination of a red light-emitting diode (LED), also
mounted in front of the computer screen). In the condition involving the secondary visual task,
the participants performed the Colavita task with their right hand while playing the video game
with their left hand (note that the auditory distracting speech streams were presented in the back-
ground, although they were irrelevant in this condition and so could be ignored). The participants
played the video game using a computer mouse to control a character moving across the bottom
of the computer screen. The participants had to “swallow” as much of the food dropping from the
top of the screen as possible, while avoiding any bombs that happened to fall. In the part of the
study involving an auditory secondary task, the video game was run in the demonstration mode
to provide equivalent background visual stimulation to the participants who now had to respond
by pressing a button with their left hand whenever they heard an animal name in the auditory
stream.
The results showed that the modality of the secondary task (auditory or visual) did not modulate
the magnitude of the Colavita visual dominance effect significantly, that is, the participants failed
to respond to a similar number of the auditory stimuli regardless of whether they were performing a
secondary task that primarily involved participants having to attend to the auditory or visual modal-
ity. De Reuck and Spence’s (2009) results therefore suggest that the Colavita visual dominance
effect may be insensitive to manipulations of participants’ attention toward either the auditory or
visual modality that are achieved by varying the requirements of a simultaneously performed sec-
ondary task (see Spence and Soto-Faraco 2009).
Finally, Koppen and Spence (2007a) have shown that exogenously directing a participant’s atten-
tion toward either the auditory or visual modality via the presentation of a task-irrelevant nonpre-
dictive auditory or visual cue 200 ms before the onset of the target (see Rodway 2005; Spence et al.
2001a; Turatto et al. 2002) has only a marginal effect on the magnitude of vision’s dominance over
audition (see also Golob et al. 2001). Taken together, the results reported in this section therefore
highlight the fact that although attentional manipulations (be they exogenous or endogenous) can
sometimes be used to modulate, or even to eliminate, the Colavita visual dominance effect, they
cannot be used to reverse it.
27.2.8 Arousal
Early animal research suggested that many examples of visual dominance could be reversed under
conditions in which an animal was placed in a highly aroused state (i.e., when, for example, fear-
ful of the imminent presentation of an electric shock; see Foree and LoLordo 1973; LoLordo and
The Colavita Visual Dominance Effect 535
Furrow 1976; Randich et al. 1978). It has been reported that although visual stimuli tend to control
appetitive behaviors, auditory stimuli tend to control avoidance behaviors in many species. Shapiro
et al. (1984) extended the idea that changes in the level of an organism’s arousal might change the
pattern of sensory dominance in the Colavita task to human participants (see also Johnson and
Shapiro 1989; Shapiro and Johnson 1987). They demonstrated what looked like auditory domi-
nance (i.e., participants making more auditory-only than visual-only responses in the Colavita task)
under conditions in which their participants were aversively motivated (by the occurrence of electric
shock, or to a lesser extent by the threat of electric shock, or tactile stimulation, presented after the
participants’ response on a random 20% of the trials).
It should, however, be noted that no independent measure of the change in a participant's level
of arousal (e.g., a change in galvanic skin response) was provided in this study. What
is more, Shapiro et al.’s (1984) participants were explicitly told to respond to the stimulus that they
perceived first on the bimodal target trials, that is, the participants effectively had to perform a tem-
poral order judgment (TOJ) task. What this means in practice is that their results (and those from
the study of Shapiro and Johnson (1987) and Johnson and Shapiro (1989), in which similar instruc-
tions were given) may actually reflect the effects of arousal on “prior entry” (see Spence 2010; Van
Damme et al. 2009b), rather than, as the authors argued, the effects of arousal on the Colavita visual
dominance effect.
Indeed, the latest research has demonstrated that increased arousal can lead to the prior entry of
certain classes of stimuli over others (when assessed by means of a participant’s responses on a TOJ
task; Van Damme et al. 2009b). In Van Damme et al.’s study, auditory and tactile stimuli delivered
from close to one of the participant’s hands were prioritized when an arousing picture showing
physical threat to a person’s bodily tissues was briefly flashed beforehand from the same (rather
than opposite) location. Meanwhile, Van Damme et al. (2009a) have shown that, when participants
are instructed to respond to both of the stimuli in the bimodal trials, rather than just to the stimulus
that the participant happens to have perceived first, the effects of arousal on the Colavita visual
dominance effect are far less clear-cut (we return later to the question of what role, if any, prior entry
plays in the Colavita visual dominance effect).
Elcock and Spence (2009) recently investigated the consequences for the Colavita effect of phar-
macologically modulating the participants’ level of arousal by administering caffeine. Caffeine
is known to increase arousal and hence, given Shapiro et al.’s (1984) research, ingesting caffeine
might be expected to modulate the magnitude of the Colavita visual dominance effect (Smith et al.
1992).* To this end, 15 healthy participants were tested in a within-participants, double-blind study,
in which a 200-mg caffeine tablet (equivalent to drinking about two cups of coffee) was taken 40
min before one session of the Colavita task and a visually identical placebo pill was taken before the
other session (note that the participants were instructed to refrain from consuming any caffeine in
the morning before taking part in the study). The Colavita visual dominance effect was unaffected
by whether the participants had ingested the caffeine tablet or the placebo (see Figure 27.1c and d).
Taken together, the results reported in this section would therefore appear to suggest that, contrary
to Shapiro et al.’s early claim, the magnitude of the Colavita visual dominance effect is not affected
by changes in a participant’s level of arousal.
* Caffeine is a stimulant that accelerates physiological activity, and results in the release of adrenaline and the increased
production of the neurotransmitter dopamine. Caffeine also interferes with the operation of another neurotransmitter:
adenosine (Smith 2002; Zwyghuizen-Doorenbos et al. 1990).
536 The Neural Bases of Multisensory Processes
[Figure 27.2 plot: proportion of responses (0–15%) as a function of SOA (ms), from –600 (audition first) to +600 (vision first).]
FIGURE 27.2 Graph highlighting the results of Koppen and Spence's (2007b) study of the Colavita effect in
which auditory and visual targets on bimodal target trials could be presented at any one of 10 SOAs. Although
a significant visual dominance effect was observed at a majority of asynchronies around objective simulta-
neity, a significant auditory dominance effect was only observed at the largest auditory-leading asynchrony.
Shaded gray band in the center of the graph represents the temporal window of audiovisual integration. Shaded
areas containing the ear and the eye schematically highlight SOAs at which auditory and visual dominance,
respectively, were observed. Note though (see text on this point) that differences between the proportion of
auditory-only and visual-only responses only reached statistical significance at certain SOAs (that said, the
trend in the data is clear). The error bars represent standard errors of means.
for a review). In these studies, each participant was only ever presented with a maximum of five
or six bimodal targets (see Colavita 1974, 1982; Colavita et al. 1976; Colavita and Weisberg 1979).
Contrast this with the smaller Colavita effects that have been reported in more recent research,
where as many as 120 bimodal targets were presented to each participant (e.g., Hartcher-O’Brien
et al. 2008; Koppen et al. 2008; Koppen and Spence 2007a, 2007c). This observation leads on to the
suggestion that the Colavita visual dominance effect may be more pronounced early on in the exper-
imental session (see also Kristofferson 1965).* That said, significant Colavita visual dominance
effects have nevertheless still been observed in numerous studies where participants’ performance
has been averaged over many hundreds of trials. Here, it may also be worth considering whether
any reduction in the Colavita effect resulting from increasing the probability of (and/or practice
with responding to) bimodal stimuli may also be related to the phenomenon of response coupling
(see Ulrich and Miller 2008). That is, the more often two independent target stimuli happen to be
presented at exactly the same time, the more likely it is that the participant will start to couple (i.e.,
program) their responses to the two stimuli together.
In the only study (as far as we are aware) to have provided evidence relevant to the question of
the consequence of practice on the Colavita visual dominance effect, the vigilance performance of
a group of participants was assessed over a 3-h period (Osborn et al. 1963). The participants in this
study had to monitor a light and sound source continuously for the occasional (once every 2½ min)
brief (i.e., lasting only 41 ms) offset of either or both of the stimuli. The participants were instructed
to press one button whenever the light was extinguished and another button whenever the sound was
interrupted. The results showed that although participants failed to respond to more of the auditory
than visual targets during the first 30-min session (thus showing a typical Colavita visual domi-
nance effect), this pattern of results reversed in the final four 30-min sessions (i.e., participants made
* Note that if practice were found to reduce the magnitude of the Colavita visual dominance effect, then this might pro-
vide an explanation for why increasing the probability of occurrence of bimodal target trials up to 90% in Koppen and
Spence’s (2007d) study has been shown to eliminate the Colavita effect (see Section 27.2.5). Alternatively, however,
increasing the prevalence (or number) of bimodal targets might also lead to the increased coupling of a participant's
responses on the bimodal trials (see main text for further details; Ulrich and Miller 2008).
more auditory-only than visual-only responses on the bimodal target trials; see Osborn et al. 1963;
Figure 27.2). It is, however, unclear whether these results necessarily reflect the effects of practice
on the Colavita visual dominance effect, or whether instead they may simply highlight the effects
of fatigue or boredom after the participants had spent several hours on the task (given that auditory
events are more likely to be responded to than visual events should the participants temporarily look
away or else close their eyes).
* Note the importance of using the same stimuli within the same pool of participants, given the large individual differences
in the perception of audiovisual simultaneity that have been reported previously (Smith 1933; Spence 2010; Stone et al.
2001).
It is, however, important to note that there is a potential concern here regarding the interpretation
of Koppen and Spence’s (2007b) findings. Remember that the Colavita visual dominance effect is
eliminated when bimodal audiovisual targets are presented too frequently (e.g., see Section 27.2.5).
Crucially, Koppen and Spence looked for any evidence of the prior entry of visual stimuli into
awareness in their TOJ study under conditions in which a pair of auditory and visual stimuli were
presented on each and every trial. The possibility therefore remains that visual stimuli may only
be perceived before simultaneously presented auditory stimuli under those conditions in which the
occurrence of bimodal stimuli is relatively rare (cf. Miller et al. 2009). Thus, in retrospect, Koppen
and Spence’s results cannot be taken as providing unequivocal evidence against the possibility that
visual stimuli have prior entry into participants’ awareness on the bimodal trials in the Colavita
paradigm. Ideally, future research will need to look for any evidence of visual prior entry under
conditions in which the bimodal targets (in the TOJ task) are actually presented as infrequently as
when the Colavita effect is demonstrated behaviorally (i.e., when the bimodal targets requiring a
detection/discrimination response are presented on only 20% or so of the trials).
Given these concerns over the design (and hence interpretation) of Koppen and Spence’s (2007b)
TOJ study, it is interesting to note that Lucey and Spence (2009) were recently able to eliminate
the Colavita visual dominance effect by delaying the onset of the visual stimulus by a fixed 50 ms
with respect to the auditory stimuli on the bimodal target trials. Lucey and Spence used a between-
participants experimental design in which one group of participants completed the Colavita task
with synchronous auditory and visual targets on the bimodal trials (as in the majority of previ-
ous studies), whereas for the other group of participants, the onset of the visual target was always
delayed by 50 ms with respect to that of the auditory target. The apparatus and materials were
identical to those used by Elcock and Spence (2009; described earlier) although the participants
in Lucey and Spence’s study performed the three-button version of the audiovisual Colavita task
(i.e., in which participants had separate response keys for auditory, visual, and bimodal targets).
The results revealed that although participants made significantly more vision-only than auditory-
only responses in the synchronous bimodal condition (10.3% vs. 2.4%, respectively), no significant
Colavita visual dominance effect was reported when the onset of the visual target was delayed (4.6%
vs. 2.9%, respectively; n.s.). These results therefore demonstrate that the Colavita visual dominance
effect can be eliminated by presenting the auditory stimulus slightly ahead of the visual stimu-
lus. The critical question here, following on from Lucey and Spence’s results, is whether auditory
dominance would have been elicited had the auditory stimulus led the visual stimulus by a greater
interval.
Koppen and Spence (2007b) have provided an answer to this question. In their study of the
Colavita effect, the auditory and visual stimuli on the bimodal target trials were presented at one
of 10 stimulus onset asynchronies (SOAs; from auditory leading by 600 ms through to vision lead-
ing by 600 ms). Koppen and Spence found that the auditory lead needed in order to eliminate the
Colavita visual dominance effect on the bimodal target trials was correlated with the SOA at which
participants reliably started to perceive the auditory stimulus as having been presented before the
visual stimulus (defined as the SOA at which participants make 75% audition first responses; see
Koppen and Spence 2007b; Figure 27.3). This result therefore suggests that the prior entry of the
visual stimulus to awareness plays some role in its dominance over audition in the Colavita effect.
That said, however, Koppen and Spence also found that auditory targets had to be presented 600 ms
before visual targets in order for participants to make significantly more auditory-only than visual-only
responses on the bimodal target trials (although a similar nonsignificant trend toward auditory
dominance was also reported at an auditory lead of 300 ms; see Figure 27.2).
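The 75% "audition-first" point referred to above can be made concrete with a small calculation, assuming a cumulative-normal psychometric function for audiovisual temporal order judgments. This is only an illustrative sketch: the function name, the sign convention (positive SOA meaning audition leads), and the PSS and JND values are all hypothetical, not Koppen and Spence's data.

```python
from statistics import NormalDist

def soa_at_p_audition_first(p, pss_ms, jnd_ms):
    """SOA at which the proportion of 'audition first' responses reaches p,
    assuming a cumulative-normal psychometric function with the given point
    of subjective simultaneity (PSS) and just noticeable difference (JND,
    defined here as the 75%-point half-width). Positive SOA = audition
    leads. All parameter names and values are hypothetical."""
    z = NormalDist().inv_cdf
    sigma = jnd_ms / z(0.75)  # convert the JND to the Gaussian spread
    return pss_ms + sigma * z(p)

# With a hypothetical PSS of 20 ms and JND of 70 ms, the 75% point sits
# exactly one JND beyond the PSS:
print(round(soa_at_p_audition_first(0.75, pss_ms=20, jnd_ms=70), 1))  # → 90.0
```

On this construction, the auditory lead needed before participants reliably report "audition first" grows directly with the JND, which is one way of restating the correlation that Koppen and Spence observed.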
It is rather unclear, however, what exactly caused the auditorily dominant behavior observed at
the 600 ms SOA in Koppen and Spence’s (2007b) study. This (physical) asynchrony between the
auditory and visual stimuli is far greater than any shift in the perceived timing of visual relative to
auditory stimuli that might reasonably be expected due to the prior entry of the visual stimulus to
awareness when the targets were actually presented simultaneously (see Spence 2010).

[Figure 27.3, panels (a)–(d): schematic plots of neural activity rising over time toward response criteria, showing the unimodal auditory (A) and visual (V) criteria, the bimodal A(V) and V(A) criteria, the rates of information accrual (R), and the associated response latencies (RTA, RTV, RTA(V), RTV(A)).]

FIGURE 27.3 (a) Schematic illustration of the results of Sinnett et al.'s (2008; Experiment 2) speeded target
detection study. The figure shows how the presentation of an accessory sound facilitates visual RTs (RTV(A)),
whereas the presentation of an accessory visual stimulus delays auditory RTs (RTA(V)). Note that unimodal
auditory (RTA) and visual (RTV) response latencies were serendipitously matched in this study (V, visual
target; A, auditory stimulus). (b, c) Schematic diagrams showing how the asymmetrical cross-modal accessory
stimulus effects reported by Sinnett et al. might lead to more (and more rapid) vision-only than auditory-only
responses on bimodal trials. The conceptually simple models outlined in panels (b) and (c) account for Sinnett et
al.'s asymmetrical RT effect either in terms of changes in the criterion for responding to auditory and visual
targets on bimodal as opposed to unimodal trials (b), or in terms of asymmetrical cross-modal changes in the rate
of information accrual (c). We plot the putative rate of information accrual (R) as a function of the stimuli
presented. However, the results of Koppen et al.'s (2009) recent signal detection study of the Colavita effect
have provided evidence that is inconsistent with both of these simple accounts (see Figure 27.4). Hence, in panel
(d), a mixture model is proposed in which the presentation of an accessory stimulus in one modality leads both to
a change in the criterion for responding to targets in the other modality (in line with the results of Koppen et
al.'s study) and to an asymmetrical effect on the rate of information accrual in the other modality (see Koppen
et al. 2007a; Miller 1986).

In fact, this
SOA is also longer than the mean RT of participants’ responses to the unimodal auditory (440 ms)
targets. Given that the mean RT for auditory-only responses on the bimodal target trials was only
470 ms (i.e., 30 ms longer, on average, than the correct responses on the bimodal trials; see Koppen
and Spence 2007b, Figure 1 and Table 1), one can also rule out the possibility that this failure to
report the visual stimulus occurred on trials in which the participants made auditory responses that
were particularly slow. Therefore, given that the visual target on the bimodal trials (in the 600 ms
SOA vision-lagging condition) was likely presented only after the auditory target had already been responded
to, one might think that this form of auditory dominance reflects some sort of refractory period
effect (i.e., resulting from the execution of the participants’ response to the first target; see Pashler
1994; Spence 2008), rather than the Colavita effect proper.
In summary, although Koppen and Spence’s (2007b) results certainly do provide an example
of auditory dominance, the mechanism behind this effect is most probably different from the
one causing the visual dominance effect that has been reported in the majority of studies (of the
Colavita effect), where the auditory and visual stimuli were presented simultaneously (see also
Miyake et al. 1986). Thus, although recent research has shown that delaying the presentation of
the visual stimulus can be used to eliminate the Colavita visual dominance effect (see Koppen and
Spence 2007b; Lucey and Spence 2009), and although the SOA at which participants reliably start
to perceive the auditory target as having been presented first correlates with the SOA at which the
Colavita visual dominance effect no longer occurs (Koppen and Spence 2007b), we do not, as yet,
have any convincing evidence that auditory dominance can be observed in the Colavita paradigm
by presenting the auditory stimulus slightly before the visual stimulus on the bimodal target trials
(i.e., at SOAs where the visual target is presented before the participants have initiated/executed
their response to the already-presented auditory target). That is, to date, no simple relationship has
been demonstrated between the SOA on the audiovisual target trials in the Colavita paradigm and
modality dominance. Hence, we need to look elsewhere for an explanation of vision’s advantage
in the Colavita visual dominance effect. Recent progress in understanding what may be going on
here has come from studies looking at the effect of accessory stimuli presented in one modality on
participants’ speeded responding to targets presented in another modality (Sinnett et al. 2008), and
from studies looking at the sensitivity and criterion of participants’ responses in the Colavita task
(Koppen et al. 2009).
unimodal auditory targets. Note that the argument here is phrased in terms of changes in the crite-
rion for responding set by participants, rather than in terms of changes in the perceptual threshold,
given the evidence cited below that behavioral responses can sometimes be elicited under conditions
in which participants remain unaware (i.e., they have no conscious access to the inducing stimulus).
According to Sinnett et al.’s (2008) results, the criterion for initiating a speeded response to the
visual targets should be reached sooner on the relatively infrequent bimodal trials than on the uni-
modal visual trials, whereas it should be reached more slowly (on the bimodal than on the unimodal
trials) for auditory targets.
There are at least two conceptually simple means by which such a pattern of behavioral results
could be achieved. First, the participants could lower their criterion for responding to the visual
targets on the bimodal trials while simultaneously raising their criterion for responding to the audi-
tory target (see Figure 27.3b). Alternatively, however, the criterion for initiating a response might not
change but the presentation of the accessory stimulus in one modality might instead have a cross-
modal effect on the rate of information accrual (R) within the other modality (see Figure 27.3c).
The fact that the process of information accrual (like any other internal process) is likely to be a
noisy one might then help to explain why the Colavita effect is only observed on a proportion of the
bimodal target trials. Evidence that is seemingly consistent with both of these simple accounts can
be found in the literature.
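The rate-of-accrual account can be illustrated with a toy race model in which two noisy accumulators, one per modality, race toward a fixed response criterion, with auditory accrual slowed and visual accrual boosted on bimodal trials. This is only a sketch: the function name and every parameter value (rates, criterion, noise level) are made up for illustration and are not drawn from any of the studies discussed.

```python
import random

def first_to_criterion(rate_a=1.0, rate_v=1.0, criterion=100.0,
                       noise=0.35, dt=1.0):
    """Toy race between two noisy linear accumulators; returns the modality
    ('A' or 'V') whose accumulated evidence reaches the response criterion
    first. Hypothetical parameters throughout."""
    acc_a = acc_v = 0.0
    while True:
        acc_a += rate_a * dt * (1 + random.gauss(0, noise))
        acc_v += rate_v * dt * (1 + random.gauss(0, noise))
        if acc_a >= criterion or acc_v >= criterion:
            return "A" if acc_a >= acc_v else "V"

random.seed(1)
# Asymmetric accrual on bimodal trials: audition slowed, vision boosted.
trials = 2000
wins_v = sum(first_to_criterion(rate_a=0.85, rate_v=1.15) == "V"
             for _ in range(trials))
print(f"vision reaches criterion first on {wins_v / trials:.0%} of trials")
```

Because the accrual noise only occasionally lets the slower auditory accumulator win, a race of this sort also captures why the Colavita effect shows up on only a proportion of the bimodal trials.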
In particular, evidence consistent with the claim that bimodal (as compared to unimodal) stimu-
lation can result in a change in the rate of information accrual comes from an older go/no-go study
reported by Miller (1986). Unimodal auditory and unimodal visual target stimuli were presented ran-
domly in this experiment together with trials in which both stimuli were presented at one of a range
of different SOAs (0–167 ms). The participants had to make a simple speeded detection response
whenever a target was presented (regardless of whether it was unimodal or bimodal). Catch trials,
in which no stimulus was presented (and no response was required), were also included. Analysis of
the results provided tentative evidence that visual stimuli needed less time to reach the criterion for
initiating a behavioral response (measured from the putative onset of response-related activity) com-
pared to the auditory stimuli on the redundant bimodal target trials—this despite the fact that the
initiation of response-related activation after the presentation of an auditory stimulus started earlier
in time than following the presentation of a visual stimulus (see Miller 1986, pp. 340–341). Taken
together, these results therefore suggest that stimulus-related information accrues more slowly for
auditory targets in the presence (vs. absence) of concurrent visual stimuli than vice versa, just as
highlighted in Figure 27.3c. Similarly, Romei et al.’s (2009) recent results showing that looming
auditory signals enhance visual excitability in a preperceptual manner can also be seen as being
consistent with the information accrual account. However, results arguing for the inclusion of some
component of criterion shifting in one's model of the Colavita visual dominance effect (although
note that these results are inconsistent with the simple criterion-shifting model put forward in Figure
27.3b) come from a more recent study reported by Koppen et al. (2009).
Analysis of Koppen et al.’s (2009) results using signal detection theory (see Green and Swets
1966) revealed that although the presentation of an auditory target had no effect on visual sensitiv-
ity, the presentation of a visual target resulted in a significant drop in participants’ auditory sen-
sitivity (see Figure 27.4a; see also Golob et al. 2001; Gregg and Brogden 1952; Marks et al. 2003;
Odgaard et al. 2003; Stein et al. 1996; Thompson et al. 1958). These results therefore show that
the presentation of a visual stimulus can lead to a small, but significant, lowering of sensitivity to
a simultaneously presented auditory stimulus, at least when the participants’ task involves trying
to detect which target modalities (if any) have been presented.* Koppen et al.’s results suggest that
only a relatively small component of the Colavita visual dominance effect may be attributable to the
asymmetrical cross-modal effect on auditory sensitivity (i.e., on the auditory perceptual threshold)
that results from the simultaneous presentation of a visual stimulus. That is, the magnitude of the
sensitivity drop hardly seems large enough to account for the behavioral effects observed in the
normal speeded version of the Colavita task.
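The sensitivity (d′) and criterion (c) measures that Koppen et al. computed are the standard signal detection quantities (Green and Swets 1966). A minimal sketch of the computation, using made-up hit and false-alarm rates rather than Koppen et al.'s data:

```python
from statistics import NormalDist

def dprime_and_criterion(hit_rate, fa_rate):
    """Standard signal detection measures (Green and Swets 1966):
    d' = z(H) - z(F); criterion c = -(z(H) + z(F)) / 2."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate), -(z(hit_rate) + z(fa_rate)) / 2

# Made-up auditory hit/false-alarm rates for unimodal vs. bimodal trials
# (illustrative only; not Koppen et al.'s data):
d_uni, c_uni = dprime_and_criterion(0.95, 0.05)
d_bi, c_bi = dprime_and_criterion(0.85, 0.05)
print(f"unimodal: d' = {d_uni:.2f}, c = {c_uni:.2f}")
print(f"bimodal:  d' = {d_bi:.2f}, c = {c_bi:.2f}")
```

Lower values of c correspond to more liberal responding, so the criterion drop that Koppen et al. observed on the bimodal trials amounts to participants requiring less accumulated evidence before being willing to respond.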
The more important result to have emerged from Koppen et al.’s (2009) study in terms of the
argument being developed here was the significant drop in participants’ criterion for responding on
the bimodal (as compared to the unimodal) target trials. Importantly, this drop was significantly
larger for visual than for auditory targets (see Figure 27.4b). The fact that the criterion dropped
for both auditory and visual targets is inconsistent with the simple criterion shifting account of
the asymmetrical cross-modal effects highlighted by Sinnett et al. (2008) that were put forward in
Figure 27.3b. In fact, when the various results now available are taken together, the most plausible
model of the Colavita visual dominance effect would appear to be one in which an asymmetrical
lowering of the criterion for responding to auditory and visual targets (Koppen et al. 2009), is paired
with an asymmetrical cross-modal effect on the rate of information accrual (Miller 1986; see Figure
27.3d).
However, although the account outlined in Figure 27.3d may help to explain why it is that a
participant will typically respond to the visual stimulus first on the bimodal target trials (despite
the fact that the auditory and visual stimuli are actually presented simultaneously), it does not
explain why participants do not quickly recognize the error of their ways (after making a vision-only
response, say) and then initiate an additional auditory response.† The participants
certainly had sufficient time in which to make a response before the next trial started in many of the
studies where the Colavita effect has been reported. For example, in Koppen and Spence’s (2007a,
2007b, 2007c) studies, the intertarget interval was in the region of 1500–1800 ms, whereas mean
vision-only response latencies fell in the 500–700 ms range.
* Note here that a very different result (i.e., the enhancement of perceived auditory intensity by a simultaneously-presented
visual stimulus) has been reported in other studies in which the participants simply had to detect the presence of an audi-
tory target (see Odgaard et al. 2004). This discrepancy highlights the fact that the precise nature of a participant’s task
constitutes a critical determinant of the way in which the stimuli presented in different modalities interact to influence
human information processing (cf. Gondan and Fisher 2009; Sinnett et al. 2008; Wang et al. 2008, on this point).
† Note here that we are talking about the traditional two-response version of the Colavita task. Remember that in the three-
response version, the participant’s first response terminates the trial, and hence there is no opportunity to make a second
response.
[Figure 27.4, panels (a) and (b): bar graphs of sensitivity (d′, scale 0–3.5) and criterion (c, scale 0–1.2) for unimodal and bimodal auditory and visual targets.]

FIGURE 27.4 Summary of mean sensitivity (d′) values (panel a) and criterion (c) values (panel b) for unimodal
auditory, unimodal visual, bimodal auditory, and bimodal visual targets in Koppen et al.'s (2009) signal detection
study of the Colavita visual dominance effect. Error bars indicate the standard errors of means. The
results show that although the simultaneous presentation of auditory and visual stimuli resulted in a reduction
of auditory sensitivity (when compared to performance on unimodal auditory target trials), no such effect was
reported for visual targets. The results also highlight the fact that the presentation of a bimodal audiovisual
target resulted in a significant reduction in the criterion (c) for responding, and that this effect was significantly
larger for visual targets than for auditory targets. (Redrawn from Koppen, C. et al., Exp. Brain Res., 196,
353–360, 2009. With permission.)
necessary stage in the chain of human information processing. Rather, he suggests that conscious
perception can, on occasion, be bypassed altogether. Support for Neumann’s view that stimuli can
elicit responses in the absence of awareness comes from research showing, for example, that par-
ticipants can execute rapid and accurate discrimination responses to masked target stimuli that
they are subjectively unaware of (e.g., Taylor and McCloskey 1996). The phenomenon of blindsight
is also pertinent here (e.g., see Cowey and Stoerig 1991). Furthermore, researchers have shown
that people sometimes lose their memory for the second of two stimuli as a result of their having
executed a response to the first stimulus (Crowder 1968; Müsseler and Hommel 1997a, 1997b; see
also Bridgeman 1990; Ricci and Chatterjee 2004; Rizzolatti and Berti 1990). On the basis of such
results, then, our suggestion is that a participant’s awareness (of the target stimuli) in the speeded
version of the Colavita paradigm may actually be modulated by the responses that they happen to
make (select or initiate) on some proportion of the trials, rather than necessarily always being driven
by their conscious perception of the stimuli themselves (see also Hefferline and Perera 1963).
To summarize, when participants try to respond rapidly in the Colavita visual dominance task,
they may sometimes end up initiating their response before becoming aware of the stimulus (or
stimuli) that have elicited that response. Their awareness of which stimuli have, in fact, been pre-
sented is then constrained by the response(s) that they actually happen to make. In other words, if (as
a participant) I realize that I have made (or am about to make) a vision-only response, it would seem
unsurprising that I only then become aware of the visual target, even if an auditory target had also
been presented at the same time (although it perhaps reached the threshold for initiating a response
more slowly than the visual stimulus; see above). Here, one might even consider the possibility that
participants simply stop processing (or stop responding to) the target stimulus (or stimuli) after they
have selected/triggered a response (to the visual target; i.e., perhaps target processing reflects a kind
of self-terminating processing). Sinnett et al.’s (2008) research is crucial here in showing that, as a
result of the asymmetrical cross-modal effects of auditory and visual stimuli on each other, the first
response that a participant makes on a bimodal target trial is likely to be to a visual (rather than an
auditory) stimulus.
If this hypothesis regarding people’s failure to respond to some proportion of the auditory (or
tactile) stimuli on the bimodal trials in the Colavita paradigm were to be correct, one would expect
the fastest visual responses to occur on those bimodal trials in which participants make a visual-
only response. Koppen and Spence’s (2007a; Experiment 3) results show just such a result in their
three-response study of the Colavita effect (i.e., where participants made one response to auditory
targets, one to visual targets, and a third to the bimodal targets; note, however, that the participants
did not have the opportunity to respond to the visual and auditory stimuli sequentially in this study).
In Koppen and Spence’s study, the visual-only responses on the bimodal target trials were actually
significantly faster, on average (mean RT = 563 ms), than the visual-only responses on unimodal
visual trials (mean RT = 582 ms; see Figure 27.5). This result therefore demonstrates that even
though participants failed to respond to the auditory target, its presence nevertheless still facilitated
their behavioral performance. Finally, the vision-only responses (on the bimodal trials) were also
found, on average, to be significantly faster than the participants’ correct bimodal responses on the
bimodal target trials (mean = 641 ms).
Interestingly, however, participants’ auditory-only responses on the bimodal target trials in
Koppen and Spence’s (2007a) study were significantly slower, on average, than on the unimodal
auditory target trials (mean RTs of 577 and 539 ms, respectively). This is the opposite pattern of
results to that seen for the visual target detection data (i.e., a bimodal slowing of responding for
auditory targets paired with a bimodal speeding of responding to the visual targets). This result
provides additional evidence for the existence of an asymmetrical cross-modal effect on the rate of
information accrual. Indeed, taken together, these results mirror those reported by Sinnett et al.
(2008) in their speeded target detection task, but note here that the data come from a version of the
Colavita task instead. Thus, it really does seem as though the more frequent occurrence of vision-
only as compared to auditory-only responses on the bimodal audiovisual target trials in the Colavita
visual dominance paradigm is tightly linked to the speed with which a participant initiates his/her
response. When participants respond rapidly, they are much more likely to make an erroneous
visual-only response than to make an erroneous auditory-only response.*

FIGURE 27.5 Schematic timeline showing the mean latency of participants’ responses (both correct and
incorrect responses) in Koppen and Spence’s (2007a) three-button version of the Colavita task. Significant
differences between particular conditions of interest (p < .05) are highlighted with an asterisk. (See text for
details.)

The Colavita Visual Dominance Effect 545
* One final point to note here concerns the fact that when participants made an erroneous response on the bimodal target
trials, the erroneous auditory-only responses were somewhat slower than the erroneous vision-only responses, although
this difference failed to reach statistical significance.
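The asymmetric-accrual account sketched above can be illustrated with a toy race model. The following Python sketch is our illustration, not Koppen and Spence’s model: it draws Gaussian finishing times for independent visual and auditory detection processes, and on bimodal trials it assumes the visual process is sped up and the auditory process slowed down, with effect sizes chosen purely for illustration. The first process to finish determines the single response made on that trial.

```python
import random
import statistics

random.seed(42)

# Hypothetical parameters (ms). The unimodal means are loosely inspired by
# the mean RTs reported by Koppen and Spence (2007a); the cross-modal
# facilitation/slowing magnitudes are illustrative assumptions only.
VIS_MEAN, AUD_MEAN, SD = 582.0, 539.0, 100.0
VIS_BIMODAL_GAIN = 40.0    # assumed cross-modal speeding of visual accrual
AUD_BIMODAL_COST = 150.0   # assumed cross-modal slowing of auditory accrual
N = 20_000

def simulate():
    # Unimodal trials: one process, one response.
    uni_vis = [random.gauss(VIS_MEAN, SD) for _ in range(N)]
    uni_aud = [random.gauss(AUD_MEAN, SD) for _ in range(N)]
    # Bimodal trials: the two modulated processes race; whichever
    # finishes first determines the (single) response.
    vis_wins, aud_wins = [], []
    for _ in range(N):
        v = random.gauss(VIS_MEAN - VIS_BIMODAL_GAIN, SD)
        a = random.gauss(AUD_MEAN + AUD_BIMODAL_COST, SD)
        (vis_wins if v < a else aud_wins).append(min(v, a))
    return uni_vis, uni_aud, vis_wins, aud_wins

uni_vis, uni_aud, vis_wins, aud_wins = simulate()
print(f"visual first on {len(vis_wins) / N:.0%} of bimodal trials")
print(f"bimodal visual-first RT  {statistics.mean(vis_wins):.0f} ms "
      f"vs unimodal visual {statistics.mean(uni_vis):.0f} ms")
print(f"bimodal auditory-first RT {statistics.mean(aud_wins):.0f} ms "
      f"vs unimodal auditory {statistics.mean(uni_aud):.0f} ms")
```

Under these assumed parameters, most first responses on bimodal trials are visual, the winning visual responses are faster on average than unimodal visual responses, and the rarer winning auditory responses are slower than unimodal auditory responses, reproducing the qualitative pattern described in the text.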
to await future research, it is worth noting here that recent research has highlighted the importance
of feedback activity from higher order to early sensory areas in certain aspects of visual awareness
(e.g., Lamme 2001; Lamme et al. 2000; Pascual-Leone and Walsh 2001; but see also Macknik 2009;
Macknik and Martinez-Conde 2007, in press). It is also pertinent to note that far more of the brain
is given over to the processing of visual stimuli than to the processing of stimuli from the other sen-
sory modalities. For example, Sereno et al. (1995) suggest that nearly half of the cortex is involved
in the processing of visual information. Meanwhile, Felleman and Van Essen (1991) point out that
the macaque has fewer than half as many brain areas involved in the processing of tactile
information as in the processing of visual information. In fact, in their authoritative
literature review, they estimate that 55% of neocortex (by volume) is visual, as compared to 12%
somatosensory, 8% motor, 3% auditory, and 0.5% gustatory. Given such statistics, it would seem
probable that the visual system might have a better chance of setting up such feedback activity
following the presentation of a visual stimulus than would the auditory or tactile systems following the
simultaneous presentation of either an auditory or tactile stimulus. Note that this account suggests
that visual dominance is natural, at least for humans, in that it may have a hardwired physiologi-
cal basis (this idea was originally captured by Colavita et al.’s (1976) suggestion that visual stimuli
might be “prepotent”). It is interesting to note in this context that the amount of cortex given over
to the processing of auditory and tactile information is far more evenly matched than for
the competition between audition and vision, hence perhaps explaining the lack of a clear pattern
of dominance when stimuli are presented in these two modalities at the same time (see Hecht and
Reiner 2009; Occelli et al. 2010).
It is also important to note here that progress in terms of explaining the Colavita effect at
a neural level might also come from a more fine-grained study of the temporal dynamics of
multisensory integration in various brain regions. In humans, the first wave of activity in pri-
mary auditory cortex in response to the presentation of suprathreshold stimuli is usually seen at
a latency of about 10–15 ms (e.g., Liegeois-Chauvel et al. 1994; Howard et al. 2000; Godey et al.
2001; Brugge et al. 2003). Activity in primary visual cortex starts about 40–50 ms after stimulus
presentation (e.g., Foxe et al. 2008; see also Schroeder et al. 1998), whereas for primary soma-
tosensory cortex the figure is about 8–12 ms (e.g., Inui et al., 2004; see also Schroeder et al. 2001).
Meanwhile, Schroeder and Foxe (2002, 2004) have documented the asymmetrical time course of
the interactions taking place between auditory and visual cortex. Their research has shown that
the visual modulation of activity in auditory cortex occurs several tens of milliseconds after the
feedforward sweep of activation associated with the processing of auditory stimuli, under condi-
tions where auditory and visual stimuli happen to be presented simultaneously from a location
within peripersonal space (i.e., within arm’s reach; see Rizzolatti et al. 1997). This delay is caused
by the fact that much of the visual input to auditory cortex is routed through superior temporal
polysensory areas (e.g., Foxe and Schroeder 2002; see also Ghazanfar et al. 2005; Kayser et al.
2008; Smiley et al. 2007), and possibly also through prefrontal cortex. It therefore seems plau-
sible to suggest that such delayed visual (inhibitory) input to auditory cortex might play some
role in disrupting the setting-up of the feedback activity from higher (auditory) areas.* That said,
Falchier et al. (2010) recently reported evidence suggesting the existence of a more direct rout-
ing of information from visual to auditory cortex (i.e., from V2 to caudal auditory cortex), hence
potentially confusing the story somewhat.
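As a back-of-the-envelope illustration of these timing figures, the onset latencies cited above can be compared directly; note that the extra relay delay through superior temporal polysensory (STP) areas is an assumed placeholder standing in for the "several tens of milliseconds" mentioned in the text:

```python
# Approximate feedforward onset latencies (ms), taken as midpoints of the
# ranges cited in the text (A1: 10-15 ms; V1: 40-50 ms; S1: 8-12 ms).
ONSET_MS = {"A1": 12.5, "V1": 45.0, "S1": 10.0}

# Assumed extra delay for the indirect V1 -> STP -> auditory cortex route
# (an illustrative figure, not a measured value).
STP_RELAY_MS = 30.0

visual_modulation_of_A1 = ONSET_MS["V1"] + STP_RELAY_MS
lag = visual_modulation_of_A1 - ONSET_MS["A1"]
print(f"visual influence reaches auditory cortex ~{visual_modulation_of_A1:.0f} ms "
      f"post-stimulus, ~{lag:.0f} ms after A1's feedforward sweep begins")
```

Even with a modest assumed relay delay, the indirect visual input arrives in auditory cortex several tens of milliseconds after the auditory feedforward sweep, consistent with the asymmetry documented by Schroeder and Foxe.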
By contrast, audition’s influence on visual information processing occurs more rapidly, and
involves direct projections from early auditory cortical areas to early visual areas. That is, direct
projections have now been documented from the primary auditory cortex A1 to the primary visual
cortex V1 (e.g., see Wang et al. 2008; note, however, that these direct connections tend to target
* Note here also the fact that visual influences on primary and secondary auditory cortex are greatest when the visual
stimulus leads the auditory stimulus by 20–80 ms (see Kayser et al. 2008), the same magnitude of visual leads that have
also been shown to give rise to the largest Colavita effect (see Figure 2; Koppen and Spence 2007b).
peripheral, rather than central, locations in the visual field; that said, other projections may well
be more foveally targeted). Interestingly, however, until very recently no direct connections had
been observed in the opposite direction (see Falchier et al. 2010). These direct projections
from auditory to visual cortex may help to account for the increased visual cortical excitability seen
when an auditory stimulus is presented together with a visual stimulus (e.g., Martuzzi et al. 2007;
Noesselt et al. 2007; Rockland and Ojima 2003; Romei et al. 2007, 2009; see also Besle et al. 2009;
Clavagnier et al. 2004; Falchier et al. 2003). Indeed, Bolognini et al. (2010) have recently shown that
transcranial magnetic stimulation (TMS)-elicited phosphenes (presented near threshold) are more
visible when a white noise burst is presented approximately 40 ms before the TMS pulse (see also
Romei et al. 2009).
It is also interesting to note here that when auditory and tactile stimuli are presented simulta-
neously from a distance of less than 1 m (i.e., in peripersonal space), the response in multisensory
convergence regions of auditory association cortex is both rapid and approximately simultaneous
for these two input modalities (see Schroeder and Foxe 2002, p. 193; see also Foxe et al. 2000,
2002; Murray et al. 2005; Schroeder et al. 2001). Such neurophysiological timing properties may
then also help to explain why no clear Colavita dominance effect has as yet been reported between
these two modalities (see also Sperdin et al. 2009).* That said, any neurally inspired account of the
Colavita effect will likely also have to incorporate the recent discovery of feedforward multisensory
interactions to early cortical areas taking place in the thalamus (i.e., via the thalamocortical loop;
Cappe et al. 2009).
Although any attempt to link human behavior to single-cell neurophysiological data in either
awake or anesthetized primates is clearly speculative at this stage, we are nevertheless convinced
that this kind of interdisciplinary approach will be needed if we are to develop a fuller understand-
ing of the Colavita effect in the coming years. It may also prove fruitful, when trying to explain
why it is that participants fail to make an auditory (or tactile) response once they have made a visual
one, to consider the neuroscience research on the duration (and decay) of sensory memory in the
different modalities (e.g., Lu et al. 1992; Harris et al. 2002; Uusitalo et al. 1996; Zylberberg et al.
2009). Here, it would be particularly interesting to know whether there are any systematic modality-
specific differences in the decay rate of visual, auditory, and tactile sensory memory.
* It would be interesting here to determine whether the feedforward projections between primary auditory and tactile
cortices are any more symmetrical than those between auditory and visual cortices (see Cappe and Barone 2005; Cappe
et al. 2009; Hackett et al. 2007; Schroeder et al. 2001; Smiley et al. 2007, on this topic), since this could provide a neural
explanation for why no Colavita effect has, as yet, been reported between the auditory and tactile modalities (Hecht and
Reiner 2009; Occelli et al. 2010). That said, it should also be borne in mind that the nature of auditory-somatosensory
interactions has recently been shown to differ quite dramatically as a function of the body surface stimulated (e.g., dif-
ferent audio–tactile interactions have been observed for stimuli presented close to the hands in frontal space vs. close to
the back of the neck in rear space; see Fu et al. 2003; Tajadura-Jiménez et al. 2009; cf. Critchley 1953, p. 19). The same
may, of course, also turn out to be true for the auditory–tactile Colavita effect.
Sinnett et al. 2008; see also Gorea and Sagi 2002). The proportion of experimental trials on which
each phenomenon occurs in the laboratory has also been shown to vary greatly between studies.
In terms of the biased (or integrated) competition hypothesis (Desimone and Duncan 1995;
Duncan 1996), extinction (in patients) is thought to reflect biased competition against stimuli from
one side (Driver and Vuilleumier 2001; Rapp and Hendel 2003), whereas here we have argued that
the Colavita effect reflects biased competition that favors the processing of visual stimuli. Although
extinction has typically been characterized as a spatial phenomenon (i.e., it is the contralesional
stimulus that normally extinguishes a simultaneously presented ipsilesional stimulus), it is worth
noting that nonspatial extinction effects have also been reported (Costantini et al. 2007; Humphreys
et al. 1995; see also Battelli et al. 2007). Future neuroimaging research will hopefully help to deter-
mine the extent to which the neural substrates underlying the Colavita visual dominance effect in
healthy individuals and the phenomenon of extinction in clinical patients are similar (Sarri et al.
2006). Intriguing data here come from a neuroimaging study of a single patient with visual–tactile
extinction reported by Sarri et al. In this patient, awareness of touch on the bimodal visuotactile
trials was associated with increased activity in right parietal and frontal regions. Sarri et al. argued
that the cross-modal extinction of the tactile stimulus in this patient resulted from increased com-
petition arising from the functional coupling of visual and somatosensory cortex with multisensory
parietal cortex.
The literature on unimodal and cross-modal extinction suggests that the normal process of
biased competition can be interrupted by the kinds of parietal damage that lead to neglect and/or
extinction. It would therefore be fascinating to see whether one could elicit the same kinds of biases
in neural competition (usually seen in extinction patients) in normal participants, simply by admin-
istering TMS over posterior parietal areas (see Driver and Vuilleumier 2001; Duncan 1996; Sarri
et al. 2006). Furthermore, following on from the single-cell neurophysiological work conducted by
Schroeder and his colleagues (e.g., see Schroeder and Foxe 2002, 2004; Schroeder et al. 2004), it
might also be interesting to target superior temporal polysensory areas, and/or the prefrontal cortex
in order to try and disrupt the modality-based biased competition seen in the Colavita effect (i.e.,
rather than the spatial or temporal competition that is more typically reported in extinction patients;
see Battelli et al. 2007). There are two principal outcomes that could emerge from such a study,
and both seem plausible: (1) TMS over one or more such cortical sites might serve to magnify the
Colavita visual dominance effect observed in normal participants, based on the consequences of
pathological damage to these areas observed in extinction patients; (2) TMS over these cortical sites
might also reduce the magnitude of the Colavita effect, by interfering with the normal processes of
biased competition, and/or by interfering with the late-arriving cross-modal feedback activity from
visual to auditory cortex (see Section 27.6.1). It would, of course, also be very interesting in future
research to investigate whether extinction patients exhibit a larger Colavita effect than normal par-
ticipants in the traditional version of the Colavita task (cf. Costantini et al. 2007).
(cf. Fink et al. 2000; Golob et al. 2001; Sarri et al. 2006; Schubert et al. 2006). Event-related poten-
tial studies could also help to determine just how early (or late, see Falkenstein et al. 1991; Quinlan
2000; Zahn et al. 1994) the processing of ignored and reported auditory (or tactile) stimuli differs
(see Hohnsbein et al. 1991).
trials that the Colavita effect occurs when the nonvisual stimulus is seemingly ignored).* By con-
trast, in the redundant target effect paradigm (see earlier), both stimuli are relevant to the same task
(i.e., to making a simple speeded target detection response).
Researchers have known for more than half a century that people find it difficult to perform two
tasks at the same time, regardless of whether the target stimuli relevant to performing those tasks
are presented in the same versus different sensory modalities (e.g., Pashler 1994; Spence 2008). One
can therefore think of the Colavita paradigm in terms of a form of dual-task interference (resulting
from modality-based biased competition at the response-selection level)—interference that appears
to be intimately linked to the making of speeded responses to the target stimuli (however, see
Koppen et al. 2009). More generally, it is important to stress that although multisensory integra-
tion may, under the appropriate conditions, give rise to improved perception/performance, the ben-
efits may come at the cost of some loss of access to the component unimodal signals
(cf. Soto-Faraco and Alsius 2007, 2009). In closing, it is perhaps worth highlighting the fact that
the task-dependent nature of the consequences of multisensory integration that show up in studies
related to the Colavita effect has now also been demonstrated in a number of different behavioral
paradigms, in both humans (see Cappe et al. in press; Gondan and Fischer 2009; Sinnett et al. 2008;
Spence et al. 2003) and monkeys (see Besle et al. 2009; Wang et al. 2008).
REFERENCES
Battelli, L., A. Pascual-Leone, and P. Cavanagh. 2007. The ‘when’ pathway of the right parietal lobe. Trends in
Cognitive Sciences 11: 204–210.
Baylis, G. C., J. Driver, and R. D. Rafal. 1993. Visual extinction and stimulus repetition. Journal of Cognitive
Neuroscience 5: 453–466.
Baylis, G. C., S. L. Simon, L. L. Baylis, and C. Rorden. 2002. Visual extinction with double simultaneous
stimulation: What is simultaneous? Neuropsychologia 40: 1027–1034.
Bender, M. B. 1952. Disorders in perception. Springfield, IL: Charles Thomas.
Besle, J., O. Bertrand, and M. H. Giard. 2009. Electrophysiological (EEG, sEEG, MEG) evidence for multiple
audiovisual interactions in the human auditory cortex. Hearing Research 258(1–2): 143–151.
Bolognini, N., I. Senna, A. Maravita, A. Pascual-Leone, and L. B. Merabet. 2010. Auditory enhancement
of visual phosphene perception: The effect of temporal and spatial factors and of stimulus intensity.
Neuroscience Letters 477: 109–114.
Bonneh, Y. S., M. K. Belmonte, F. Pei, P. E. Iversen, T. Kenet, N. Akshoomoff, Y. Adini, H. J. Simon, C. I.
Moore, J. F. Houde, and M. M. Merzenich. 2008. Cross-modal extinction in a boy with severely autistic
behavior and high verbal intelligence. Cognitive Neuropsychology 25: 635–652.
Bridgeman, B. 1990. The physiological basis of the act of perceiving. In Relationships between perception and
action: Current approaches, ed. O. Neumann and W. Prinz, 21–42. Berlin: Springer.
Brozzoli, C., M. L. Demattè, F. Pavani, F. Frassinetti, and A. Farnè. 2006. Neglect and extinction: Within and
between sensory modalities. Restorative Neurology and Neuroscience 24: 217–232.
Brugge, J. F., I. O. Volkov, P. C. Garell, R. A. Reale, and M. A. Howard 3rd. 2003. Functional connections
between auditory cortex on Heschl’s gyrus and on the lateral superior temporal gyrus in humans. Journal
of Neurophysiology 90: 3750–3763.
Calvert, G. A., C. Spence, and B. E. Stein (eds.). 2004. The handbook of multisensory processes. Cambridge,
MA: MIT Press.
Cappe, C., and P. Barone. 2005. Heteromodal connections supporting multisensory integration at low levels
of cortical processing in the monkey. European Journal of Neuroscience 22: 2886–2902.
* One slight complication here though relates to the fact that people typically start to couple multiple responses to different
stimuli into response couplets under the appropriate experimental conditions (see Ulrich and Miller 2008). Thus, one
could argue about whether participants’ responses on the bimodal target trials actually count as a third single (rather
than dual) task, but one that, in the two-response version of the Colavita task, involves a two-finger rather than a one-finger
response. When considered in this light, the interference in performance seen in the Colavita task does not seem quite so
surprising.
Cappe, C., A. Morel, P. Barone, and E. M. Rouiller. 2009. The thalamocortical projection systems in primates:
An anatomical support for multisensory and sensorimotor interplay. Cerebral Cortex 19: 2025–2037.
Clavagnier, S., A. Falchier, and H. Kennedy. 2004. Long-distance feedback projections to area V1: Implications
for multisensory integration, spatial awareness, and visual consciousness. Cognitive, Affective, and
Behavioural Neuroscience 4: 117–126.
Colavita, F. B. 1974. Human sensory dominance. Perception and Psychophysics 16: 409–412.
Colavita, F. B. 1982. Visual dominance and attention in space. Bulletin of the Psychonomic Society 19:
261–262.
Colavita, F. B., R. Tomko, and D. Weisberg. 1976. Visual prepotency and eye orientation. Bulletin of the
Psychonomic Society 8: 25–26.
Colavita, F. B., and D. Weisberg. 1979. A further investigation of visual dominance. Perception and
Psychophysics 25: 345–347.
Cooper, R. 1998. Visual dominance and the control of action. In Proceedings of the 20th Annual Conference of
the Cognitive Science Society, ed. M. A. Gernsbacher and S. J. Derry, 250–255. Mahwah, NJ: Lawrence
Erlbaum Associates.
Costantini, M., D. Bueti, M. Pazzaglia, and S. M. Aglioti. 2007. Temporal dynamics of visuo-tactile extinction
within and between hemispaces. Neuropsychology 21: 242–250.
Cowey, A., and P. Stoerig. 1991. The neurobiology of blindsight. Trends in Neurosciences 14: 140–145.
Critchley, M. 1953. Tactile thought, with special reference to the blind. Brain 76: 19–35.
Crowder, R. G. 1968. Repetition effects in immediate memory when there are no repeated elements in the
stimuli. Journal of Experimental Psychology 78: 605–609.
De Reuck, T., and C. Spence. 2009. Attention and visual dominance. Unpublished manuscript.
Desimone, R., and J. Duncan. 1995. Neural mechanisms of selective visual attention. Annual Review of
Neuroscience 18: 193–222.
Driver, J., and P. Vuilleumier. 2001. Perceptual awareness and its loss in unilateral neglect and extinction.
Cognition 79: 39–88.
Duncan, J. 1996. Cooperating brain systems in selective perception and action. In Attention and performance
XVI: Information integration in perception and communication, ed. T. Inui and J. L. McClelland, 549–
578. Cambridge, MA: MIT Press.
Egeth, H. E., and L. C. Sager. 1977. On the locus of visual dominance. Perception and Psychophysics 22: 77–86.
Elcock, S., and C. Spence. 2009. Caffeine and the Colavita visual dominance effect. Unpublished manuscript.
Ernst, M. 2005. A Bayesian view on multimodal cue integration. In Perception of the human body from the
inside out, ed. G. Knoblich, I. Thornton, M. Grosejan, and M. Shiffrar, 105–131. New York: Oxford Univ.
Press.
Exner, S. 1875. Experimentelle Untersuchung der einfachsten psychischen Processe (Experimental study of
the most simple psychological processes). Archiv für die gesammte Physiologie des menschens und der
Thiere (Pflüger’s Archive) 11: 403–432.
Falchier, A., S. Clavagnier, P. Barone, and H. Kennedy. 2003. Anatomical evidence of multimodal integration
in primate striate cortex. Journal of Neuroscience 22: 5749–5759.
Falchier, A., C. E. Schroeder, T. A. Hackett, P. Lakatos, S. Nascimento-Silva, I. Ulbert, G. Karmos, and J. F.
Smiley. 2010. Projection from visual areas V2 and prostriata to caudal auditory cortex in the monkey.
Cerebral Cortex 20: 1529–1538.
Falkenstein, M., J. Hohnsbein, J. Hoormann, and L. Blanke. 1991. Effects of crossmodal divided attention on
late ERP components: II. Error processing in choice reaction tasks. Electroencephalography and Clinical
Neurophysiology 78: 447–455.
Farnè, A., C. Brozzoli, E. Làdavas, and T. Ro. 2007. Investigating multisensory spatial cognition through the
phenomenon of extinction. In Attention and performance XXII: Sensorimotor foundations of higher cog-
nition, ed. P. Haggard, Y. Rossetti, and M. Kawato, 183–206. Oxford: Oxford Univ. Press.
Felleman, D. J., and D. C. Van Essen. 1991. Distributed hierarchical processing in primate cerebral cortex.
Cerebral Cortex 1: 1–47.
Fink, G. R., J. Driver, C. Rorden, T. Baldeweg, and R. J. Dolan. 2000. Neural consequences of competing
stimuli in both visual hemifields: A physiological basis for visual extinction. Annals of Neurology 47:
440–446.
Foree, D. D., and V. M. J. LoLordo. 1973. Attention in the pigeon: Differential effects of food-getting versus
shock-avoidance procedures. Journal of Comparative and Physiological Psychology 85: 551–558.
Foxe, J. J., I. A. Morocz, M. M. Murray, B. A. Higgins, D. C. Javitt, and C. E. Schroeder. 2000. Multisensory
auditory–somatosensory interactions in early cortical processing revealed by high-density electrical
mapping. Cognitive Brain Research 10: 77–83.
Foxe, J. J., E. C. Strugstad, P. Sehatpour, S. Molholm, W. Pasieka, C. E. Schroeder, and M. E. McCourt. 2008.
Parvocellular and magnocellular contributions to the initial generators of the visual evoked potential:
High-density electrical mapping of the “C1” component. Brain Topography 21: 11–21.
Foxe, J. J., G. R. Wylie, A. Martinez, C. E. Schroeder, D. C. Javitt, D. Guilfoyle, W. Ritter, and M. M. Murray.
2002. Auditory–somatosensory multisensory processing in auditory association cortex: An fMRI study.
Journal of Neurophysiology 88: 540–543.
Fu, K.-M. G., T. A. Johnston, A. S. Shah, L. Arnold, J. Smiley, T. A. Hackett, P. E. Garraghty, and C. E. Schroeder.
2003. Auditory cortical neurons respond to somatosensory stimulation. Journal of Neuroscience 23:
7510–7515.
Gallace, A., H. Z. Tan, and C. Spence. 2007. Multisensory numerosity judgments for visual and tactile stimuli.
Perception and Psychophysics 69: 487–501.
Ghazanfar, A. A., J. X. Maier, K. L. Hoffman, and N. K. Logothetis. 2005. Multisensory integration of dynamic
faces and voices in Rhesus monkey auditory cortex. Journal of Neuroscience 25: 5004–5012.
Godey, B., D. Schwartz, J. B. de Graaf, P. Chauvel, and C. Liegeois-Chauvel. 2001. Neuromagnetic source
localization of auditory evoked fields and intracerebral evoked potentials: A comparison of data in the
same patients. Clinical Neurophysiology 112: 1850–1859.
Golob, E. J., G. G. Miranda, J. K. Johnson, and A. Starr. 2001. Sensory cortical interactions in aging, mild
cognitive impairment, and Alzheimer’s disease. Neurobiology of Aging 22: 755–763.
Gondan, M., and V. Fischer. 2009. Serial, parallel, and coactive processing of double stimuli presented with
onset asynchrony. Perception 38(Suppl.): 16.
Gorea, A., and D. Sagi. 2000. Failure to handle more than one internal representation in visual detection tasks.
Proceedings of the National Academy of Sciences of the United States of America 97: 12380–12384.
Gorea, A., and D. Sagi. 2002. Natural extinction: A criterion shift phenomenon. Visual Cognition 9: 913–936.
Green, D. M., and J. A. Swets. 1966. Signal detection theory and psychophysics. New York: Wiley.
Gregg, L. W., and W. J. Brogden. 1952. The effect of simultaneous visual stimulation on absolute auditory
sensitivity. Journal of Experimental Psychology 43: 179–186.
Hackett, T. A., L. De La Mothe, I. Ulbert, G. Karmos, J. Smiley, and C. E. Schroeder. 2007. Multisensory
convergence in auditory cortex: II. Thalamocortical connections of the caudal superior temporal plane.
Journal of Comparative Neurology 502: 924–952.
Hahnloser, R., R. J. Douglas, M. Mahowald, and K. Hepp. 1999. Feedback interactions between neuronal
pointers and maps for attentional processing. Nature Neuroscience 2: 746–752.
Harris, J. A., C. Miniussi, I. M. Harris, and M. E. Diamond. 2002. Transient storage of a tactile memory trace
in primary somatosensory cortex. Journal of Neuroscience 22: 8720–8725.
Hartcher-O’Brien, J., A. Gallace, B. Krings, C. Koppen, and C. Spence. 2008. When vision ‘extinguishes’
touch in neurologically-normal people: Extending the Colavita visual dominance effect. Experimental
Brain Research 186: 643–658.
Hartcher-O’Brien, J., C. Levitan, and C. Spence. 2010. Out-of-touch: Does vision dominate over touch when it
occurs off the body? Brain Research 1362: 48–55.
Hecht, D., and M. Reiner. 2009. Sensory dominance in combinations of audio, visual and haptic stimuli.
Experimental Brain Research 193: 307–314.
Hefferline, R. F., and T. B. Perera. 1963. Proprioceptive discrimination of a covert operant without its observa-
tion by the subject. Science 139: 834–835.
Hirsh, I. J., and C. E. Sherrick Jr. 1961. Perceived order in different sense modalities. Journal of Experimental
Psychology 62: 423–432.
Hohnsbein, J., and M. Falkenstein. 1991. Visual dominance: Asymmetries in the involuntary processing of
visual and auditory distractors. In Channels in the visual nervous system: Neurophysiology, psychophys-
ics and models, ed. B. Blum, 301–313. London: Freund Publishing House.
Hohnsbein, J., M. Falkenstein, and J. Hoormann. 1991. Visual dominance is reflected in reaction times and
event-related potentials (ERPs). In Channels in the visual nervous system: Neurophysiology, psychophys-
ics and models, ed. B. Blum, 315–333. London: Freund Publishing House.
Howard, M. A., I. O. Volkov, R. Mirsky, P. C. Garell, M. D. Noh, M. Granner, H. Damasio, M. Steinschneider,
R. A. Reale, J. E. Hind, and J. F. Brugge. 2000. Auditory cortex on the human posterior superior temporal
gyrus. Journal of Comparative Neurology 416: 79–92.
Humphreys, G. W., C. Romani, A. Olson, M. J. Riddoch, and J. Duncan. 1995. Nonspatial extinction following
lesions of the parietal lobe in man. Nature 372: 357–359.
Inui, K., X. Wang, Y. Tamura, Y. Kaneoke, and R. Kakigi. 2004. Serial processing in the human somatosensory
system. Cerebral Cortex 14: 851–857.
Jaśkowski, P. 1996. Simple reaction time and perception of temporal order: Dissociations and hypotheses.
Perceptual and Motor Skills 82: 707–730.
Jaśkowski, P. 1999. Reaction time and temporal-order judgment as measures of perceptual latency: The prob-
lem of dissociations. In Cognitive contributions to the perception of spatial and temporal events, ed. G.
Aschersleben, T. Bachmann, and J. Müsseler, 265–282. North-Holland: Elsevier Science.
Jaśkowski, P., F. Jaroszyk, and D. Hojan-Jesierska. 1990. Temporal-order judgments and reaction time for
stimuli of different modalities. Psychological Research 52: 35–38.
Johnson, T. L., and K. L. Shapiro. 1989. Attention to auditory and peripheral visual stimuli: Effects of arousal
and predictability. Acta Psychologica 72: 233–245.
Kayser, C., C. I. Petkov, and N. K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral
Cortex 18: 1560–1574.
Koppen, C., A. Alsius, and C. Spence. 2008. Semantic congruency and the Colavita visual dominance effect.
Experimental Brain Research 184: 533–546.
Koppen, C., C. Levitan, and C. Spence. 2009. A signal detection study of the Colavita effect. Experimental
Brain Research 196: 353–360.
Koppen, C., and C. Spence. 2007a. Seeing the light: Exploring the Colavita visual dominance effect.
Experimental Brain Research 180: 737–754.
Koppen, C., and C. Spence. 2007b. Audiovisual asynchrony modulates the Colavita visual dominance effect.
Brain Research 1186: 224–232.
28 The Body in a Multisensory World
Tobias Heed and Brigitte Röder
CONTENTS
28.1 Introduction........................................................................................................................... 557
28.2 Construction of Body Schema from Multisensory Information............................................ 558
28.2.1 Representing Which Parts Make Up the Own Body................................................. 558
28.2.2 Multisensory Integration for Limb and Body Ownership......................................... 559
28.2.3 Extending the Body: Tool Use................................................................................... 561
28.2.4 Rapid Plasticity of Body Shape................................................................................. 562
28.2.5 Movement and Posture Information in the Brain...................................................... 563
28.2.6 The Body Schema: A Distributed versus Holistic Representation............................564
28.2.7 Interim Summary...................................................................................................... 565
28.3 The Body as a Modulator for Multisensory Processing........................................................ 565
28.3.1 Recalibration of Sensory Signals and Optimal Integration....................................... 565
28.3.2 Body Schema and Peripersonal Space....................................................................... 566
28.3.3 Peripersonal Space around Different Parts of the Body............................................ 568
28.3.4 Across-Limb Effects in Spatial Remapping of Touch............................................... 569
28.3.5 Is the External Reference Frame a Visual One?........................................................ 570
28.3.6 Investigating the Body Schema and Reference Frames with Electrophysiology...... 572
28.3.7 Summary................................................................................................................... 574
28.4 Conclusion............................................................................................................................. 574
References....................................................................................................................................... 575
28.1 INTRODUCTION
It is through our body that we interact with the environment. We have a very clear sense of who
we are: we know where our body ends and which body parts we own. Beyond that, we usually are
(or can easily become) aware of where each of our body parts is currently located, and most of our
movements seem effortless, whether performed under conscious control or not. When we think
about ourselves, we normally perceive our body as a stable entity. For example,
when we go to bed, we do not expect that our body will be different when we wake up the next
morning. Quite contrary to such introspective assessment, the brain has been found to be surpris-
ingly flexible in updating its representation of the body. As an illustration, consider what happens
when an arm or leg becomes numb after you have sat or slept in an unsuitable position for too long.
Touching the numb foot feels very strange, as if you were touching someone else’s foot. When you
lift a numb hand with the other hand, it feels far too heavy. Somehow, it feels as if the limb does not
belong to your own body.
Neuroscientists have long been fascinated with how the brain represents the body. It is usually
assumed that there are several different types of body representations, but there is no consensus
about what these representations are, or how many there may be (de Vignemont 2010; Gallagher
1986; Berlucchi and Aglioti 2010; see also Dijkerman and de Haan 2007 and commentaries thereof).
The most common distinction is that between a body schema and a body image. The body schema
is usually defined as a continuously updated sensorimotor map of the body that is important in the
context of action, informing the brain about what parts belong to the body and where those parts
are currently located (de Vignemont 2010). In contrast, the term body image is usually used to
refer to perceptual, emotional, or conceptual knowledge about the body. However, other taxonomies
have been proposed (see Berlucchi and Aglioti 2010; de Vignemont 2010), and the use of the terms
body schema and body image has been inconsistent. This chapter will not debate these definitions
exhaustively; we refer the interested reader to the articles cited above for detailed discussion. Here,
we will use the term body schema in the sensorimotor sense introduced above, covering both which
parts make up the body and where those parts are currently located.
The focus of this chapter will be on the importance of multisensory processing for representing
the body, as well as on the role of body representations for multisensory processing. On the one
hand, one can investigate how the body schema is constructed and represented in the brain; Section
28.2 will illustrate that the body schema emerges from the interaction of multiple sensory modalities.
Conversely, one can ask how interactions between the senses are influenced by the fact that
the brain commands a body. Section 28.3, therefore, will
present research on how the body schema is important in multisensory interactions, especially for
spatial processing.
sleep), and why it is not visible (e.g., it was lost 20 years ago) (Halligan et al. 1993; Sellal et al.
1996; Bakheit and Roundhill 2005). It has therefore been suggested that the subjective presence of a
supernumerary limb may result from cognitive conflicts between different pieces of sensory infor-
mation (e.g., visual vs. proprioceptive) or fluctuations in the awareness about the paralysis, which in
turn may be resolved by assuming the existence of two (or more) limbs rather than one (Halligan et
al. 1993; Ramachandran and Hirstein 1998).
Whereas a patient with a phantom or a supernumerary limb perceives more limbs than he actu-
ally owns, some brain lesions result in the opposite phenomenon of patients denying the owner-
ship of an existing limb. This impairment, termed somatoparaphrenia, has been reported to occur
after temporo-parietal (Halligan et al. 1995) or thalamic-temporo-parietal damage (Daprati et al.
2000)—notably all involving the parietal lobe, which is thought to mediate multisensory integration
for motor planning. Somatoparaphrenia is usually observed in conjunction with hemineglect and
limb paralysis (Cutting 1978; Halligan et al. 1995; Daprati et al. 2000) and has been suggested to
reflect a disorder of body awareness due to the abnormal sensorimotor feedback for the (paralyzed)
limb after brain damage (Daprati et al. 2000).
Lesions can also affect the representation of the body and self as a whole, rather than just affect-
ing single body parts. These experiences have been categorized into three distinct phenomena
(Blanke and Metzinger 2009). During out-of-body experiences, a person feels located outside
of her real body and seems to look at herself, often from above. In contrast, during an autoscopic
illusion, the person localizes herself in her real body but sees an illusory body in extrapersonal space
(e.g., in front of herself). Finally, during heautoscopy, a person sees a second body and feels located
in both, either at the same time or in sometimes rapid alternation. In patients, such illusions have
been suggested to be related to damage to the temporo-parietal junction (TPJ) (Blanke et al. 2004),
and an out-of-body experience was elicited by stimulation of an electrode implanted over the TPJ
for presurgical assessment (Blanke et al. 2002). Interestingly, whole body illusions can coincide
with erroneous visual perceptions about body parts, for example, an impression of limb shortening
or illusory flexion of an arm. It has therefore been suggested that whole body illusions are directly
related to the body schema, resulting from a failure to integrate multisensory (e.g., vestibular and
visual) information about the body and its parts, similar to the proposed causes of supernumerary
limbs (Blanke et al. 2004).
In sum, many brain regions are involved in representing the configuration of the body; some
aspects of these representations seem to be innate, and are probably refined during early develop-
ment. Damage to some of the involved brain regions can lead to striking modifications of the per-
ceived body configuration, as well as to illusions about the whole body.
Ramachandran 2003), participants reported that they “felt” the touch delivered to their real hand to
originate from the shoe and the table. Similarly, early event-related potentials (ERPs) in response
to tactile stimuli were enhanced after synchronous stimulation of a rubber hand as well as of a non-
hand object (Press et al. 2008). Even more surprisingly, participants in Armel and Ramachandran’s
study displayed signs of distress and an increased skin conductance response when the shoe was hit
with a hammer, or a band-aid was ripped off the table surface. Similar results, that is, signs of dis-
tress, were also observed when the needle of a syringe was stabbed into the rubber hand, and these
behavioral responses were associated with brain activity in anxiety-related brain areas (Ehrsson et
al. 2007). Thus, the mere synchrony of visual events at an object with the tactile sensations felt at
the hand seems to have led to some form of integration of the objects (the rubber hand, the shoe, or
the table surface) into the body schema, resulting in physiological and emotional responses usu-
ally reserved for the real body. It is important to understand that participants in the rubber hand
illusion (RHI) do not feel additional limbs; rather, they feel a displacement of their own limb,
which is reflected behaviorally by reaching errors after the illusion has manifested itself (Botvinick
and Cohen 1998; Holmes et al. 2006; but see Kammers et al. 2009a, 2009c, and discussion in de
Vignemont 2010), and by an adjustment of grip aperture when finger posture has been manipulated
during the RHI (Kammers et al. 2009b). Thus, a new object (the rubber hand) is integrated into the
body schema, but is interpreted as an already existing part (one’s own, hidden arm).
The subjective feeling of ownership of a rubber hand has also been investigated using func-
tional magnetic resonance imaging (fMRI). Activity emerged in the ventral premotor cortex and
(although only at a statistical trend level) in the superior parietal lobule (SPL)
(Ehrsson et al. 2004). In the monkey, both of these areas respond to peripersonal stimuli around
the hand and head. Activity related to multisensory integration—synchrony of tactile and visual
events, as well as the alignment of visual and proprioceptive information about arm posture—was
observed in the SPL, presumably in the human homologue of an area in the monkey concerned with
arm reaching [the medial intraparietal (MIP) area]. Before the onset of the illusion, that is, during
its buildup, activity was seen in the intraparietal sulcus (IPS), in the dorsal premotor cortex (PMd),
and in the supplementary motor area (SMA), which are all thought to be part of an arm-reaching
circuit in both monkeys and humans. Because the rubber arm is interpreted as one’s own arm, the
illusion may be based on a recalibration of perceived limb position, mediated parietally, according
to the visual information about the rubber arm (Ehrsson et al. 2004; Kammers et al. 2009c). As
such, current multisensory information about the alleged position of the hand must
be integrated with long-term knowledge about body structure (i.e., the fact that there is a hand to be
located) (de Vignemont 2010; Tsakiris 2010).
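The chapter describes this parietally mediated recalibration only qualitatively. As a minimal illustration (our sketch, not a model from the chapter), the visual capture of perceived hand position is often formalized as reliability-weighted fusion of Gaussian visual and proprioceptive estimates; the function name and the numerical reliabilities below are hypothetical:

```python
def fuse_estimates(mu_vis, var_vis, mu_prop, var_prop):
    """Minimum-variance fusion of two Gaussian position estimates.

    Each cue contributes in proportion to its reliability (inverse
    variance), so the fused estimate is pulled toward the more
    reliable cue, and the fused variance is lower than either alone.
    """
    w_vis = (1.0 / var_vis) / (1.0 / var_vis + 1.0 / var_prop)
    mu = w_vis * mu_vis + (1.0 - w_vis) * mu_prop
    var = 1.0 / (1.0 / var_vis + 1.0 / var_prop)
    return mu, var

# Hypothetical numbers: the rubber hand is seen 15 cm from the real hand
# (proprioceptive estimate at 0 cm). With vision four times more reliable
# than proprioception, the perceived position drifts toward the rubber hand.
position, uncertainty = fuse_estimates(mu_vis=15.0, var_vis=1.0,
                                       mu_prop=0.0, var_prop=4.0)
print(position)  # 12.0 — strongly shifted toward the visual estimate
```

Under these assumed reliabilities, the weighting reproduces the qualitative finding above: when vision is the more precise cue, the fused position estimate is dominated by the seen (rubber) hand, consistent with the proposed visual recalibration of limb position.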
Yet, as noted earlier, an integration of a non-body-like object also seems possible in some cases.
Besides the illusory integration of a shoe or the table surface due to synchronous stimulation, an
association of objects with the body has been reported in a clinical case of a brain-lesioned patient
who denied ownership of her arm and hand; when she wore her wedding ring on that hand, she did
not recognize it as her own. When it was taken off the neglected hand, the patient immediately rec-
ognized the ring as her own (Aglioti et al. 1996). Such findings might therefore indicate an involve-
ment of higher cognitive processes in the construction of the body schema.
It was mentioned in the previous section that brain damage can lead to misinterpretations of
single limbs (say, an arm or a leg), but also of the whole body. Similarly, the rubber hand paradigm
has been modified to also study the processes involved in the perception of the body as a whole
and of the feeling of self. Participants viewed a video image of themselves filmed from the back
(Ehrsson 2007) or a virtual reality character at some distance in front of them (Lenggenhager et al.
2007). They could see the back of the figure in front of them being stroked in synchrony with feel-
ing their own back being stroked. This manipulation resulted in the feeling of the self being located
outside one’s own body and of looking at oneself (Ehrsson 2007). Furthermore, when participants
were displaced from their viewing position and asked to walk to the location at which they felt
“themselves” during the illusion, they placed themselves in between the real and the virtual body’s
locations (Lenggenhager et al. 2007). Although both rubber hand and whole body illusions use the
same kind of multisensory manipulation, the two phenomena have been proposed to tap into dif-
ferent aspects of body processing (Blanke and Metzinger 2009): whereas the rubber hand illusion
leads to the incorporation of an object into the body schema, the whole body illusion manipulates the
location of a global “self” (Blanke and Metzinger 2009; Metzinger 2009), and accordingly the first-
person perspective (Ehrsson 2007). This distinction notwithstanding, both illusions convincingly
demonstrate how the representation of the body in the brain is determined by the integration of
multisensory information.
To sum up, our brain uses the synchrony of multisensory (visual and tactile) stimulation to determine
body posture. Presumably because touch is necessarily located on the body, such synchronous
visuo-tactile stimulation can lead to the illusion that external objects belong to our body, and even
to mislocalization of the whole body. However, the illusion is not of a new body part having been
added, but rather of a non-body object taking the place of an already existing body part (or, in the
case of the whole body illusion, of the video image indicating our body’s location).
in locations along the tool (Holmes et al. 2004; Yue et al. 2009). Finally, in a recent study, partici-
pants were asked to make tactile discrimination judgments about stimuli presented to the tip of a
tool. Visual distractors were presented in parallel to the tactile stimuli. fMRI activity in response
to the visual distractors near the end of the tool was enhanced in the occipital cortex, compared to
locations further away from the tool (Holmes et al. 2008). These findings were also interpreted to
indicate an increase of attention at the tool tip, due to the use of the tool.
Experimental results such as these challenge the idea of an extension of the body schema. Other
results, in contrast, do corroborate the hypothesis of an extension of the body schema due to tool
use. For example, tool use resulted in a change of the perceived distance between two touches to
the arm, which was interpreted to indicate an elongated representation of the arm (Cardinali et al.
2009b).
It has recently been pointed out that the rubber hand illusion seems to consist of several dissociable
aspects (Longo et al. 2008), revealed by a factor analysis of questionnaire items related to the
experience of the illusion. More specific distinctions may need to be made about
the different processes (and, as a consequence, the different effects found in experiments) involved
in the construction of the body schema, and different experimental paradigms may tap into only a
subset of these processes.
In sum, multisensory signals are not only important for determining what parts we perceive our
body to be made of; they are also important in mediating the ability to use tools. It is currently
under debate whether tools extend the body schema by integrating the tool as a body part, or whether
other multisensory processes, for example, a deployment of attention to the space manipulated by
the tool, are at the core of our ability to use tools.
Similarly, visually perceived hand size has also been shown to affect grip size, although more so
when the visual image of the hand (a projection of an online video recording of the hand) was bigger
than normal (Marino et al. 2010).
The rubber hand illusion has also been used to create the impression of having an elongated arm
by having participants wear a shirt with an elongated sleeve from which the rubber hand protruded
(Schaefer et al. 2007). By recording magnetoencephalographic (MEG) responses to tactile stimuli
to the illusion hand, this study also demonstrated an involvement of primary somatosensory cortex
in the illusion.
These experiments demonstrate that perception of the body can be rapidly adjusted by the brain,
and that these perceptual changes in body shape affect object perception as well as hand actions.
stimulus after it was displayed for several seconds. Because the patient’s deficit was not a general
inability to detect tactile stimulation or to perform hand actions, these results seem to imply that it is
the maintenance of the current postural state of the body that was lost over time unless new visual,
tactile, or proprioceptive information forced an update of the model. The importance of the SPL for
posture control is also evident from a patient who, after SPL damage, lost her ability to correctly
interact with objects requiring whole body coordination, such as sitting on a chair (Kase et al. 1977).
Still further evidence for an involvement of the SPL in posture representation comes from experi-
ments in healthy participants. When people are asked to judge the laterality of a hand presented in a
picture, these judgments are influenced by the current hand posture adopted by the participant: the
more unnatural it would be to align one's own hand with the displayed hand, the longer participants
take to respond (Parsons 1987; Ionta et al. 2007). A hand posture change during the hand lateral-
ity task led to an activation in the SPL in fMRI (de Lange et al. 2006). Hand crossing also led to
a change in intraparietal activation during passive tactile stimulation (Lloyd et al. 2003). Finally,
recall that fMRI activity during the buildup of the rubber hand illusion, thought to involve postural
recalibration due to the visual information about the rubber arm, was also observed in the SPL.
These findings are consistent with data from neurophysiological recordings in monkeys show-
ing that neurons in area 5 (Sakata et al. 1973) in the superior parietal lobe as well as neurons in
area PEc (located just at the upper border of the IPS and extending into the sulcus to border MIP;
Breveglieri et al. 2008) respond to complex body postures, partly involving several limbs. Neurons
in these areas respond to tactile, proprioceptive, and visual input (Breveglieri et al. 2008; Graziano
et al. 2000). Furthermore, some area 5 neurons fire most when the felt and the seen position of the
arm correspond rather than when they do not (Graziano 1999; Graziano et al. 2000). These neurons
respond not only to vision of the animal's own arm, but also to vision of a fake arm, if it is positioned in an anatomically plausible way such that it looks as if it might belong to the animal's own body,
reminiscent of the rubber hand illusion in humans. Importantly, some neurons fire most when the
visual information of the fake arm matches the arm posture of the monkey’s real, hidden arm, but
reduce their firing rate when vision and proprioception do not match.
To summarize, body movement and body posture are represented by different brain regions.
Movement perception relies on the motor structures of the frontal lobe. Probably, the most impor-
tant brain region for the representation of body posture, in contrast, is the SPL. This region is known
to integrate signals from different sensory modalities, and damage to it results in dysfunctions of
posture perception and actions requiring postural adaptations. However, other brain regions are
involved in posture processing as well.
Although winner-take-all schemes for such dominance of one sense over another have been
proposed (e.g., Ramachandran and Hirstein 1998), there is ample evidence that inconsistencies in
the information from the different senses do not simply lead to an overruling of one by the other.
Rather, the brain seems to combine the different senses to come up with a statistically optimal esti-
mate of the true environmental situation, allowing for statistically optimal movements (Körding and
Wolpert 2004; Trommershäuser et al. 2003) as well as perceptual decisions (Ernst and Banks 2002;
Alais and Burr 2004). Because in many cases one of our senses outperforms the others in a specific
sensory ability—for example, spatial acuity is superior in vision (Alais and Burr 2004), and tem-
poral acuity is best in audition (Shams et al. 2002; Hötting and Röder 2004)—many experimental
results have been interpreted in favor of an “overrule” hypothesis. Nevertheless, it has been demon-
strated, for example, in spatial tasks, that the weight the brain assigns to the information received
through a sensory channel is directly related to its spatial acuity, and that audition (Alais and Burr
2004) and touch (Ernst and Banks 2002) will overrule vision when visual acuity is sufficiently
degraded. Such integration is probably involved also in body processing and in such phenomena as
the rubber hand and Pinocchio illusions.
In sum, the body schema influences how multisensory information is interpreted by the brain.
The weight that a piece of multisensory information is given varies with its reliability (see also de
Vignemont 2010).
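Such reliability-weighted integration can be sketched in a few lines. The cue values and noise levels below are invented for illustration and are not taken from the cited studies:

```python
def combine_cues(estimates, sigmas):
    """Minimum-variance fusion of independent Gaussian cues:
    each cue is weighted by its inverse variance (its reliability)."""
    inv_vars = [1.0 / s ** 2 for s in sigmas]
    total = sum(inv_vars)
    fused = sum(w / total * x for w, x in zip(inv_vars, estimates))
    fused_sigma = (1.0 / total) ** 0.5
    return fused, fused_sigma

# Precise vision (sigma = 0.5) versus imprecise touch (sigma = 2.0):
# the fused location estimate stays close to the visual one.
pos, sigma = combine_cues([10.0, 14.0], [0.5, 2.0])

# Degrade the visual cue (sigma = 4.0) and touch dominates instead.
pos_degraded, _ = combine_cues([10.0, 14.0], [4.0, 2.0])
```

Note that the fused estimate is always more reliable (smaller variance) than either cue alone, which is why integration, rather than overruling, is statistically optimal.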
modulated by two manipulations that are central to neurons representing peripersonal space in
monkeys: (1) extinction can be multisensory and (2) it can dissociate between peripersonal and
extrapersonal space. In addition, locations of lesions associated with extinction coincide (at least
coarsely) with the brain regions associated with peripersonal spatial functions in monkeys (Mort et
al. 2003; Karnath et al. 2001). The study of extinction patients has therefore suggested that a circuit
for peripersonal space exists in humans, analogous to that in the monkey.
The peripersonal space has also been investigated in healthy humans. One of the important
characteristics of the way the brain represents peripersonal space is the alignment of visual and
tactile events. In an fMRI study in which participants had to judge if a visual stimulus and a tactile
stimulus to the hand were presented from the same side of space, hand crossing led to an increase
of activation in the secondary visual cortex, indicating an influence of body posture on relatively
low-level sensory processes (Misaki et al. 2002). In another study, hand posture was manipulated
in relation to the eye: rather than changing hand posture itself, gaze was directed such that a tac-
tile stimulus occurred either in the right or the left visual hemifield. The presentation of bimodal
visual–tactile stimuli led to higher activation in the visual cortex in the hemisphere contralateral to
the visual hemifield of the tactile location, indicating that the tactile location was remapped with
respect to the visual space and then influenced visual cortex (Macaluso et al. 2002). These influ-
ences of posture and eye position on early sensory cortex may be mediated by parietal cortex. For
example, visual stimuli were better detected when a tactile stimulus was concurrently presented
(Bolognini and Maravita 2007). This facilitatory influence of the tactile stimulus was best when the
hand was held near the visual stimulus, whether this implied a normal or a crossed hand posture.
However, hand crossing had a very different effect when neural processing in the posterior parietal
cortex was impaired by repetitive TMS: now a tactile stimulus was most effective when it was deliv-
ered to the hand anatomically belonging to that side of the body at which the visual stimulus was
presented; when the hands were crossed, a right hand stimulus, for example, facilitated a right-side
visual stimulus, although the hand was located in the left visual space (Bolognini and Maravita
2007). This result indicates that after disruption of parietal processing, body posture was no longer
taken into account during the integration of vision and touch, nicely in line with the findings about
the role of parietal cortex for posture processing (see Section 28.2.5).
A more direct investigation of how the brain determines if a stimulus is located in the peripersonal space was undertaken in an fMRI study that independently manipulated visual and proprioceptive cues about hand posture to modulate the perceived distance of a small visual object
from the participants’ hand. Vision of the arm could be occluded, and the occluded arm was then
located near the visual object (i.e., peripersonally) or far from it; the distance from the object could
be determined by the brain only by using proprioceptive information. Alternatively, vision could be
available to show that the hand was either close or far from the stimulus. Ingeniously, the authors
manipulated these proprioceptive and visual factors together by using a rubber arm: when the real
arm was held far away from the visual object, the rubber hand could be placed near the object so
that visually the object was in peripersonal space (Makin et al. 2007). fMRI activity due to these
manipulations was found in posterior parietal areas. There was some evidence that for the determi-
nation of posture in relation to the visual object, proprioceptive signals were more prominent in the
anterior IPS close to the somatosensory cortex, and that vision was more prominent in more poste-
rior IPS areas, closer to visual areas. Importantly, however, all of these activations were located in
the SPL and IPS, the areas that have repeatedly been shown to be relevant for the representation of
posture and of the body schema.
Besides these neuroimaging approaches, behavioral studies have also been successful in investigating the peripersonal space and the body schema. One task that has yielded a multitude of findings is a cross-modal interference paradigm, the cross-modal congruency (CC) task (reviewed by
Spence et al. 2004b). In this task, participants receive a tactile stimulus to one of four locations; two
of these locations are located “up” and two are located “down” (see Figure 28.1). Participants are
asked to judge the elevation of the tactile stimulus in each trial, regardless of its side (left or right).
568 The Neural Bases of Multisensory Processes
FIGURE 28.1 Standard cross-modal congruency task. Tactile stimuli are presented to two locations on the
hand (often index finger and thumb holding a cube; here, back and palm of the hand). In each trial, one of the
tactile stimuli is presented concurrently with one of the visual distractor stimuli. Participants report whether the tactile stimulus came from an upper or a lower location. Although they are to ignore the visual distractors, the tactile judgment is biased toward the location of the light. This influence is biggest when the distractor is presented at the same hand as the tactile stimulus, and reduced when the distractor occurs at the other hand.
However, a to-be-ignored visual distractor stimulus is presented with every tactile target stimulus,
also located at one of the four locations at which the tactile stimuli can occur. The visual distractor
is independent of the tactile target; it can therefore occur at a congruent location (tactile and visual
stimulus have the same elevation) or at an incongruent location (tactile and visual stimulus have
opposing elevations). Despite the instruction to ignore the visual distractors, participants’ reaction
times and error probabilities are influenced by them. When the visual distractors are congruent,
participants perform faster and with higher accuracy than when the distractors are incongruent.
The difference of the incongruent minus the congruent conditions (e.g., in RT and in accuracy) is
referred to as the CC effect. Importantly, the CC effect is larger when the distractors are located
close to the stimulated hands rather than far away (Spence et al. 2004a). Moreover, the CC effect is
larger when the distractors are placed near rubber hands, but only if those are positioned in front
of the participant in such a way that, visually, they could belong to the participant’s body (Pavani
et al. 2000). The CC effect is also modulated by tool use in a similar manner as by rubber hands;
when a visual distractor is presented in far space, the CC effect is relatively small, but it increases
when a tool is held near the distractor (Maravita et al. 2002; Maravita and Iriki 2004; Holmes et al.
2007). Finally, the CC effect is increased during the whole body illusion (induced by synchronous
stroking; see Section 28.2.2) when the distractors are presented on the back of the video image felt
to be the own body, compared to when participants see the same video image and distractor stimuli,
but without the induction of the whole body illusion (Aspell et al. 2009). These findings indicate that
cross-modal interaction, as indexed in the CC effect, is modulated by the distance of the distractors
from what is currently represented as the own body (i.e., the body schema) and thus suggest that the
CC effect arises in part from the processing of peripersonal space.
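The computation of the CC effect from raw reaction times is straightforward to illustrate; the RT values below are invented, not data from the cited studies:

```python
from statistics import mean

# Illustrative reaction times in ms (invented values).
rt_congruent = [412, 398, 405, 421]    # distractor at the same elevation as the touch
rt_incongruent = [455, 462, 448, 470]  # distractor at the opposite elevation

def cc_effect(incongruent_rts, congruent_rts):
    """Cross-modal congruency (CC) effect: mean incongruent RT minus mean
    congruent RT; larger values indicate stronger visual interference."""
    return mean(incongruent_rts) - mean(congruent_rts)

effect = cc_effect(rt_incongruent, rt_congruent)
```

The same difference score can be computed on error rates; in both cases, a larger CC effect for near than for far distractors is taken as a signature of peripersonal spatial processing.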
To summarize, monkey physiology, neuropsychological findings, and behavioral research sug-
gest that the brain specially represents the space close around the body, the peripersonal space.
There is a close relationship between the body schema and the representation of peripersonal space,
as body posture must be taken into account to remap, from moment to moment, which part of exter-
nal space is peripersonal.
characteristics for the lower body have so far not been reported (Graziano et al. 2002). The periper-
sonal space representation may thus be limited to body parts that are important for the manipulation
of objects under (mainly) visual control. To test this hypothesis in humans, body schema–related
effects such as the CC effect, which have been conducted for the hands, must be investigated for
other body parts.
The aforementioned study of the CC effect during the whole body illusion (Aspell et al. 2009;
see also Section 28.2.2) demonstrated a peripersonal spatial effect near the back. The CC effect was
observable also when stimuli were delivered to the feet (Schicke et al. 2009), suggesting that a rep-
resentation of the peripersonal space exists also for the space around these limbs. If the hypothesis
is correct that the body schema is created from body part–specific representations, one might expect
that the representation of the peripersonal space of the hand and that of the foot do not interact. To
test this prediction, tactile stimuli were presented to the hands while visual distractors were flashed
either near the participant’s real foot, near a fake foot, or far from both the hand and the foot. The
cross-modal interference of the visual distractors, indexed by the CC effect, was larger when they
were presented in the peripersonal space of the real foot than when they were presented near the
fake foot or in extrapersonal space (Schicke et al. 2009). The spatial judgment of tactile stimuli at
the hand was thus modulated when a visual distractor appeared in the peripersonal space of another
body part. This effect cannot be explained by the current concept of peripersonal space as tactile RFs encompassed by visual RFs. These results rather imply either a holistic body schema representation, or, more probably, interactions beyond simple RF overlap between the peripersonal space
representations of different body parts (Holmes and Spence 2004; Spence et al. 2004b).
In sum, the peripersonal space is represented not just for the hands, but also for other body parts.
Interactions between the peripersonal spatial representations of different body parts challenge the
concept of peripersonal space being represented merely by overlapping RFs.
and Kitazawa 2001a; Shore et al. 2002; Röder et al. 2004; Schicke and Röder 2006; Azanon and
Soto-Faraco 2007). It is usually assumed that the performance deficit after hand crossing in the
TOJ task is due to a conflict between two concurrently active reference frames: one anatomical and
one external (Yamamoto and Kitazawa 2001a; Röder et al. 2004; Schicke and Röder 2006). The
right–left coordinate axes of these two reference frames are opposed to each other when the hands
are crossed; for example, the anatomically right arm is located in the externally left hemispace dur-
ing hand crossing. This remapping takes place despite the task being purely tactile, and despite the
detrimental effect of using the external reference frame in the task. Remapping of stimulus location
by accounting for current body posture therefore seems to be an automatically evoked process in
the tactile system.
In the typical TOJ task, the two stimuli are applied to the two hands. It would therefore be pos-
sible that the crossing effect is simply due to a confusion regarding the two homologous limbs,
rather than to the spatial location of the stimuli. This may be due to a coactivation of homologous
brain areas in the two hemispheres (e.g., in SI or SII), which may make it difficult to assign the two
concurrent tactile percepts to their corresponding visual spatial locations. However, a TOJ crossing
effect was found for tactile stimuli delivered to the two hands, to the two feet, or to one hand and the
contralateral foot (Schicke and Röder 2006). In other words, participants were confused not only
about which of the two hands or the two feet was stimulated first, but they were equally impaired
in deciding if it was a hand or a foot that received the first stimulus. Therefore, the tactile location
originating on the body surface seems to be remapped into a more abstract spatial code for which
the original skin location, and the somatotopic coding of primary sensory cortex, is no longer a
dominating feature. In fact, it has been suggested that the location of a tactile stimulus on the body
may be reconstructed by determining which body part currently occupies the part of space at which
the tactile stimulus has been sensed (Kitazawa 2002). The externally anchored reference frame is
activated in parallel with a somatotopic one, and their concurrent activation leads to the observed
behavioral impairment.
To summarize, remapping of stimulus location in a multisensory experiment such as the CC
paradigm is a necessity for aligning signals from different modalities. Yet, even when stimuli are
purely unimodal, and the task would not require a recoding of tactile location into an external coor-
dinate frame, such a transformation nonetheless seems to take place. Thus, even for purely tactile
processing, posture information (e.g., proprioceptive and visual) is automatically integrated.
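The reference-frame conflict described above can be illustrated with a toy remapping function; this is a schematic of the coordinate conflict only, labeled hypothetical, and not a model of the underlying neural computation:

```python
def external_side(anatomical_side, hands_crossed):
    """Remap a touch coded anatomically ('left' or 'right' hand) into
    external spatial coordinates, given current posture. When the hands
    are crossed, the two codes conflict -- the conflict held responsible
    for the TOJ crossing effect."""
    if not hands_crossed:
        return anatomical_side
    return 'left' if anatomical_side == 'right' else 'right'

# Uncrossed: anatomical and external codes agree.
# Crossed: a touch to the right hand lies in left external space.
conflict = external_side('right', hands_crossed=True) != 'right'
```

When both codes are active in parallel, as the TOJ data suggest, their disagreement in the crossed posture predicts exactly the observed slowing and errors.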
2003; Pesaran et al. 2006) or auditory (Cohen and Andersen 2000). In addition to these results from
monkeys, an fMRI experiment in humans has suggested common spatial processing of visual and
tactile targets for saccade as well as for reach planning (Macaluso et al. 2007). Still further down-
stream, in the motor-related PMv, which has been proposed to form the peripersonal space circuit
together with VIP, visual RFs are aligned with hand position (Graziano and Cooke 2006).
These findings have led to the suggestion that the external reference frame involved in tactile
localization is a visual one, and that remapping occurs automatically to aid the fusion of spatial
information of the different senses. Such use of visual coordinates may be helpful not only for action
planning (e.g., the reach of the hand toward an object), but also for an efficient online correction of
motor error with respect to the visual target (Buneo et al. 2002; Batista et al. 1999).
A number of variants of the TOJ paradigm have been employed to study the visual origin of the
external reference frame in humans. For example, the crossing effect could be ameliorated when
participants viewed uncrossed rubber hands (with their real hands hidden), indicating that visual
(and not just proprioceptive) cues modulate spatial remapping (Azanon and Soto-Faraco 2007). In
the same vein, congenitally blind people did not display a TOJ crossing effect, suggesting that they
do not by default activate an external reference frame for tactile localization (Röder et al. 2004).
Congenitally blind people also outperformed sighted participants when the use of an anatomically
anchored reference frame was advantageous to solve a task, whereas they performed worse than
the sighted when an external reference frame was better suited to solve a task (Röder et al. 2007).
Importantly, people who became blind later in life were influenced by an external reference frame in
the same manner as sighted participants, indicating that spatial remapping develops during ontog-
eny when the visual system is available, and that the lack of automatic coordinate transformations
into an external reference frame is not simply an unspecific effect of long-term visual deprivation
(Röder et al. 2004, 2007). In conclusion, the use of an external reference frame seems to be induced
by the visual system, and this suggests that the external coordinates used in the remapping of sen-
sory information are visual coordinates.
Children did not show a TOJ crossing effect before the age of ~5½ years (Pagel et al. 2009). This
late use of external coordinates suggests that spatial remapping requires a great deal of learning
and visual–tactile experience during interaction with the environment. One might therefore expect
remapping to take place only in regions that are accessible to vision. In the TOJ paradigm, one
would thus expect a crossing effect when the hands are held in front, but no such crossing effect
when the hands are held behind the back (as, because of the lack of tactile–visual experience in that
part of space, no visual–tactile remapping should take place). At odds with these predictions, Kobor
and colleagues (2006) observed a TOJ crossing effect (although somewhat reduced) also behind the
back. We conducted the same experiment in our laboratory, and found that the size of the crossing
effect did not differ in the front and in the back [previously unpublished data; n = 11 young, healthy, blindfolded adults; just noticeable difference (JND) for correct stimulus order: uncrossed front: 66 ± 10 ms; uncrossed back: 67 ± 11 ms; crossed front: 143 ± 39 ms; crossed back: 138 ± 25 ms; ANOVA main effect of part of space and interaction of hand crossing with part of space, both F(1,10) < 1].
Because we must assume only minimal visual–tactile experience for the space behind our body,
the results of these two experiments do not support the idea that the external coordinate system in
tactile remapping is purely visual.
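JND values such as those above can be estimated by fitting a cumulative Gaussian to the order judgments. A minimal probit-regression sketch (standard library only; the σ = 100 ms used in the usage note is an arbitrary example, not a value from the experiment):

```python
from statistics import NormalDist

def toj_jnd(soas_ms, p_right_first):
    """Fit a cumulative Gaussian to TOJ data by probit regression
    (least-squares line through the z-transformed response proportions)
    and return the JND, i.e., 0.6745 * sigma (half the 25%-75% span)."""
    z = [NormalDist().inv_cdf(p) for p in p_right_first]
    n = len(soas_ms)
    mx = sum(soas_ms) / n
    mz = sum(z) / n
    slope = (sum((x - mx) * (y - mz) for x, y in zip(soas_ms, z))
             / sum((x - mx) ** 2 for x in soas_ms))
    sigma = 1.0 / slope
    return 0.6745 * sigma
```

For proportions generated by a cumulative Gaussian with σ = 100 ms, the fit recovers a JND of about 67 ms, comparable in magnitude to the crossed-hands values reported above.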
It is possible that the brain uses, rather than simply visual (i.e., eye- or retina-centered) coordinates, an action-based reference frame that represents the environment (or action target location) in external coordinates and that can be used to orient not only the eyes, but also gaze, trunk, or the whole body. In other words, the external reference frame may be anchored to the eyes for that part of
space that is currently accessible to the eyes, but may be related to head, trunk, or body movement
parameters for those parts of space currently out of view. Such a coordinate system would benefit
from using external coordinates, because eye-, head-, and possibly body or trunk position must all
be fused to allow the directing of the eyes (and, with them, usually the focus of attention) onto an
externally located target.
Such an action-related reference frame seems plausible for several reasons. At least eye and
head orienting (together referred to as gaze orienting) are both mediated by the superior colliculus
(SC) (Walton et al. 2007), a brain structure that is important for multisensory processing (Stein et
al. 1995; Freedman and Sparks 1997; Stuphorn et al. 2000) and that is connected to the IPS (Pare
and Wurtz 1997, 2001). IPS as well as PMv (also connected to IPS) encode reaching (i.e., action)
targets also in the dark, that is, in the absence of visual information (Fattori et al. 2005; Graziano
et al. 1997). Moreover, a recent fMRI study demonstrated activation of the frontal eye fields—a
structure thought to be involved in saccadic eye movements and visual attention—to sounds behind
the head (Tark and Curtis 2009), which would suggest either a representation of unseen space (Tark
and Curtis 2009) or, alternatively, the representation of a target coordinate in “action space” rather
than in eye-centered space.
For locations that one can orient toward with an eye–head movement, an action-based reference
frame could be identical to a visual reference frame and use eye- or gaze-centered coordinates, in
line with eye-centered coding of saccade as well as hand reach targets in LIP and MIP. The mon-
key’s head is usually fixed during single-cell recording experiments, making it impossible to differ-
entiate between eye-centered and gaze-centered (let alone trunk- or body-centered target) coding.
In addition, the spatial coding of reach targets that are out of view (but not in the dark) has, to our
knowledge, not been investigated.
To sum up, many electrophysiological and behavioral studies have suggested that touch is
remapped into visual coordinates, presumably to permit its integration with information from other
modalities. Remapping refers to a recoding of the location of a tactile event on the skin onto its
external-spatial coordinates; in other words, remapping accounts for body posture when matching
visual and tactile spatial locations. Because of the influence of the visual system during ontogeny
(and, therefore, not in the congenitally blind), remapping occurs even for unimodal tactile events.
Yet, the external reference frame may be “more than” visual, subserving orienting actions also to
parts of space outside the current visual field.
28.3.6 Investigating the Body Schema and Reference Frames with Electrophysiology
Most of the evidence for coordinate transformations in humans discussed in this chapter so far has
used behavioral measures. Electrophysiological measures (e.g., ERPs) offer an additional approach
to investigate these processes. ERPs record electrical brain signals with millisecond resolution and
therefore allow a very detailed investigation of functional brain activity. One fruitful approach is the
manipulation of the attentional processing of sensory stimuli: it is known that the ERP is enhanced
when a stimulus is presented at a location the person is currently attending to. In fact, there have
been reports about the effect of hand crossing on the attentional processing of tactile stimuli deliv-
ered to the two hands. When a tactile stimulus is delivered to a hand while participants direct their
attention to that hand, ERP deflections in the time range of 80–150 ms as well as between 200 and
300 ms are enhanced compared to when the same stimuli are delivered to the same hand while it is
not attended. However, when participants crossed their hands, early attentional effects disappeared,
and later effects were significantly reduced (Eimer et al. 2001, 2003).
These ERP results imply that tactile spatial attentional processes do not rely on an anatomi-
cal reference frame alone, as posture should otherwise have had no influence on attention-related
ERP effects. A disadvantage of this experimental design is that it differentiates only coarsely between attended and unattended stimulation to determine the influence of reference frames: the lack of a difference between attended and unattended conditions after hand crossing may be due to mere confusion of the two hands, effectively preventing selective direction of attention to one hand.
Alternatively, it may be due to attentional enhancement to one hand in a somatotopic, and to the
other hand in an external reference frame.
However, the difference in ERP magnitude between attended and unattended spatial locations is
not binary. Rather, ERPs gradually decrease with the distance at which a stimulus is presented from
The Body in a Multisensory World 573
the attended location, a phenomenon termed the spatial attentional gradient (Mangun and Hillyard
1988). The spatial gradient can be exploited to test more thoroughly if the ERP effects due to hand
crossing are attributable to hand confusion, and to investigate if coordinate transformations are
calculated for body parts other than the hands. To this end, participants were asked to attend to one
of their feet, while tactile stimuli were presented to both hands and feet in random stimulus order.
The hands were placed near the feet. Crucially, in some blocks each hand lay near its ipsilateral foot,
whereas in some blocks, the hands were crossed so that each hand lay next to its contralateral foot.
Thus, each hand could be near to or far from the attended location (one of the feet). The external
spatial distance of each hand to the attended foot reversed with hand crossing, whereas of course
the anatomical distance from each hand to the attended foot remained identical in both uncrossed
and crossed conditions. Investigating the spatial gradient in ERPs to hand and foot stimuli thus
made it possible to investigate whether the tactile system defines spatial distance in somatotopic or
in external coordinates.
In the time interval 100–140 ms after stimulus presentation, ERPs of unattended tactile hand
stimuli were more similar to the ERP of an attended hand stimulus when the hands were located
close to the attended foot than when they were located far away, demonstrating that tactile attention
uses an external reference frame (Heed and Röder 2010; see Figure 28.2). At the same time, ERPs
were also influenced by the anatomical distance between the attended and the stimulated locations.
[Figure 28.2 plot: ERP traces, amplitude (µV) over −100 to 400 ms poststimulus]
FIGURE 28.2 ERP results for hand stimulation. Traces from a fronto-central electrode ipsilateral to stimu-
lation. In the figures depicting the different conditions, the attended foot is indicated by a filled gray dot; the
stimulated right hand is indicated by a gray cross. Thin black (lowest) trace (last figure): the hand was attended
and stimulated. Signal should be highest in this condition. Note that the direction of ERP (positive or negative
deflection) does not carry meaning in this context. Black traces (first and second figures): stimulation of the
hand contralateral to attended foot. Gray traces (third and fourth figures): stimulation of the hand ipsilateral
to attended foot. Thin traces: close spatial distance (according to an external reference frame) between stimu-
lated and attended limb. Bold traces: far spatial distance between stimulated and attended limbs. ERPs started
to differ after ~100 ms after stimulus. Differences were largest in the 100- to 140-ms time interval, which has
been known to be modulated by spatial attention. For hand stimulation both ipsilateral and contralateral to the
attended foot, a short spatial distance from the attended foot led to a more positive ERP in this time interval;
in other words, the ERP was more similar to the thin black trace (stimulation at attended location) for near
than for far spatial distance (thin vs. bold traces), indicating the use of an external spatial reference frame for
the representation of nonattended tactile stimuli. At the same time, anatomical distance (black vs. gray colors)
also modulated ERPs, indicating the use of an anatomical reference frame.
ERPs to unattended hand stimuli were more similar to the ERP of an attended hand stimulus when
the ipsilateral rather than the contralateral foot was attended.
ERPs in this time range are thought to originate in the secondary somatosensory cortex (SII)
(Frot and Mauguiere 1999; Eimer and Forster 2003). Recall that SII was implicated in the integra-
tion of a rubber hand, as indexed by the perceptual drift of the own hand toward the rubber hand
(Tsakiris et al. 2007), as well as in making postural judgments (Corradi-Dell’Acqua et al. 2009).
These findings thus converge with the ERP results in emphasizing the importance of relatively
lower-level somatosensory areas in the representation of our body schema by coding not only the
current position of our hands, but also the current spatial relationship of different body parts to each
other, both in anatomical and external coordinates.
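Effects like these are typically quantified as the mean ERP amplitude within a post-stimulus time window (here, 100–140 ms). The following is a minimal sketch with synthetic data, not the actual analysis pipeline of the studies discussed:

```python
import numpy as np

def mean_window_amplitude(epochs, times, t_start=0.100, t_end=0.140):
    """Mean ERP amplitude in a post-stimulus window.

    epochs: array (n_trials, n_samples) of voltages in microvolts
    times:  array (n_samples,) of sample times in seconds
    """
    erp = epochs.mean(axis=0)                      # average over trials -> ERP
    mask = (times >= t_start) & (times <= t_end)   # select the 100-140 ms window
    return erp[mask].mean()

# Synthetic example: 50 trials, 500 Hz sampling, -100 to 400 ms
times = np.linspace(-0.1, 0.4, 251)
rng = np.random.default_rng(0)
signal = np.where((times >= 0.1) & (times <= 0.14), 2.0, 0.0)  # 2 µV deflection
epochs = signal + rng.normal(0.0, 1.0, size=(50, times.size))

amp = mean_window_amplitude(epochs, times)  # close to 2.0 µV despite noise
```

Averaging over trials first suppresses single-trial noise, which is why window means of this kind are the standard dependent measure in attention ERP studies.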
28.3.7 Summary
The second part of this chapter focused on the influence of the body and the body schema on multi-
sensory processing. We started by showing that body posture can be used to calibrate the spatial
relationship between the senses, and we discussed how the brain weights information from the different
senses according to their reliability. Such statistically optimal integration processes may also
be at the heart of the phenomena presented in the first part of the chapter, for example, the rubber
hand illusion. The remainder of the chapter focused on multisensory spatial processing, starting out
with the evidence for a special representation of the space directly around our body, demonstrat-
ing the link between the body schema and multisensory spatial processing. We showed that the
peripersonal space is represented not only for the hands, but also for other body parts, and that not
all experimental results can be explained by the common notion of the peripersonal space being
represented simply by tactile RFs on a body part with a matching visual RF. We then showed that
the body schema is important not only in multisensory processing, but also in purely tactile processing,
in that tactile locations are automatically remapped into external spatial coordinates. These external
coordinates are closely related to the visual modality, but extend beyond the current visual field into
the space that cannot be seen. Finally, ERPs were shown to be modulated both by anatomical and
external coordinate frames. This highlights that although in some situations tactile locations seem
to be fully remapped into purely external coordinates, the original, anatomical location of the touch
is never quite forgotten. Such concurrent representations of both anatomical and external location
seem useful in the context of action control. For example, to fend off a dangerous object—be it an
insect ready to sting or the hand of an adversary who has grabbed one’s arm—it is crucial to know
which limb can be chosen to initiate the defensive action, but also where the action must be guided
in space. Thus, when the right arm has been grabbed, one cannot use this arm to strike against the
opponent—and this is independent of the current external location of the captured arm. However,
once it has been determined which arm is free for use in a counterattack, it becomes crucial to know
where (in space) this arm should strike to fend off the attacker.
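The reliability-weighted ("statistically optimal") integration invoked in this summary can be written in a few lines. This is a generic maximum-likelihood fusion sketch in the spirit of Ernst and Banks (2002); the numbers are illustrative, not values from any cited study:

```python
def fuse_estimates(s_a, var_a, s_b, var_b):
    """Maximum-likelihood fusion of two noisy estimates of the same quantity.

    Each cue is weighted by its reliability (inverse variance); the fused
    estimate is more reliable (lower variance) than either cue alone.
    """
    w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_b)  # reliability weight
    w_b = 1.0 - w_a
    s_fused = w_a * s_a + w_b * s_b
    var_fused = 1.0 / (1.0 / var_a + 1.0 / var_b)
    return s_fused, var_fused

# Vision (precise) and audition (imprecise) locating the same event:
s, v = fuse_estimates(s_a=10.0, var_a=1.0,   # visual estimate, low variance
                      s_b=14.0, var_b=4.0)   # auditory estimate, high variance
# s == 10.8 (pulled toward the more reliable visual cue), v == 0.8
```

Note that the fused variance (0.8) is smaller than either single-cue variance, which is the signature behavioral prediction of optimal integration.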
28.4 CONCLUSION
Our different senses enable us to perceive the environment and to act upon it. However, they also
enable us to perceive ourselves and, foremost, our body. Because we can move in many different
ways, our brain must keep track of our current posture at all times to guide actions effectively.
However, the brain is also surprisingly flexible with respect to representing what it assumes to
belong to the body at any given point in time, and about the body’s current shape. One of the main
principles of the brain’s body processing seems to be the attempt to “make sense of all the senses”
by integrating all available information. As we saw, this processing principle can lead to surprising
illusions, such as the rubber hand illusion, the Pinocchio nose, or the feeling of being located outside
the body, displaced toward a video image. As is often the case in psychology, these illusions also
enlighten us about brain processes under normal circumstances.
The Body in a Multisensory World 575
As much as multisensory information is important for the construction of our body schema, this
body representation is in turn important for many instances of multisensory processing. Visual
events in the peripersonal space are specially processed to protect our body, and our flexibility
to move in many ways requires that the spatial locations of the different sensory modalities are
transformed into a common reference system. None of these functions could work without some
representation of the body’s current configuration.
REFERENCES
Aglioti, S., N. Smania, M. Manfredi, and G. Berlucchi. 1996. Disownership of left hand and objects related to
it in a patient with right brain damage. Neuroreport 8: 293–296.
Alais, D., and D. Burr. 2004. The ventriloquist effect results from near-optimal bimodal integration. Curr Biol
14: 257–262.
Andersen, R. A., and H. Cui. 2009. Intention, action planning, and decision making in parietal–frontal circuits.
Neuron 63: 568–583.
Armel, K. C., and V. S. Ramachandran. 2003. Projecting sensations to external objects: Evidence from skin
conductance response. Proc R Soc Lond B Biol Sci 270: 1499–1506.
Aspell, J. E., B. Lenggenhager, and O. Blanke. 2009. Keeping in touch with one’s self: Multisensory mecha-
nisms of self-consciousness. PLoS ONE 4: e6488.
Avillac, M., S. Deneve, E. Olivier, A. Pouget, and J. R. Duhamel. 2005. Reference frames for representing
visual and tactile locations in parietal cortex. Nat Neurosci 8: 941–949.
Azanon, E., and S. Soto-Faraco. 2007. Alleviating the ‘crossed-hands’ deficit by seeing uncrossed rubber
hands. Exp Brain Res 182: 537–548.
Bakheit, A. M., and S. Roundhill. 2005. Supernumerary phantom limb after stroke. Postgrad Med J 81: e2.
Batista, A. P., C. A. Buneo, L. H. Snyder, and R. A. Andersen. 1999. Reach plans in eye-centered coordinates.
Science 285: 257–260.
Berlucchi, G., and S. M. Aglioti. 2010. The body in the brain revisited. Exp Brain Res 200: 25–35.
Bestmann, S., A. Oliviero, M. Voss, P. Dechent, E. Lopez-Dolado, J. Driver, and J. Baudewig. 2006.
Cortical correlates of TMS-induced phantom hand movements revealed with concurrent TMS-fMRI.
Neuropsychologia 44: 2959–2971.
Blanke, O., T. Landis, L. Spinelli, and M. Seeck. 2004. Out-of-body experience and autoscopy of neurological
origin. Brain 127: 243–258.
Blanke, O., and T. Metzinger. 2009. Full-body illusions and minimal phenomenal selfhood. Trends Cogn Sci
13: 7–13.
Blanke, O., S. Ortigue, T. Landis, and M. Seeck. 2002. Stimulating illusory own-body perceptions. Nature 419:
269–270.
Bolognini, N., and A. Maravita. 2007. Proprioceptive alignment of visual and somatosensory maps in the pos-
terior parietal cortex. Curr Biol 17: 1890–1895.
Botvinick, M. 2004. Neuroscience. Probing the neural basis of body ownership. Science 305: 782–783.
Botvinick, M., and J. Cohen. 1998. Rubber hands ‘feel’ touch that eyes see. Nature 391: 756.
Breveglieri, R., C. Galletti, S. Monaco, and P. Fattori. 2008. Visual, somatosensory, and bimodal activities in
the macaque parietal area PEc. Cereb Cortex 18: 806–816.
Brugger, P., S. S. Kollias, R. M. Muri, G. Crelier, M. C. Hepp-Reymond, and M. Regard. 2000. Beyond remem-
bering: Phantom sensations of congenitally absent limbs. Proc Natl Acad Sci U S A 97: 6167–6172.
Bruno, N., and M. Bertamini. 2010. Haptic perception after a change in hand size. Neuropsychologia 48: 1853–1856.
Buneo, C. A., M. R. Jarvis, A. P. Batista, and R. A. Andersen. 2002. Direct visuomotor transformations for
reaching. Nature 416: 632–636.
Cardinali, L., C. Brozzoli, and A. Farne. 2009a. Peripersonal space and body schema: Two labels for the same
concept? Brain Topogr 21: 252–260.
Cardinali, L., F. Frassinetti, C. Brozzoli, C. Urquizar, A. C. Roy, and A. Farne. 2009b. Tool-use induces mor-
phological updating of the body schema. Curr Biol 19: R478–R479.
Cohen, Y. E., and R. A. Andersen. 2000. Reaches to sounds encoded in an eye-centered reference frame. Neuron
27: 647–652.
Cohen, Y. E., and R. A. Andersen. 2002. A common reference frame for movement plans in the posterior pari-
etal cortex. Nat Rev Neurosci 3: 553–562.
Collins, T., T. Schicke, and B. Röder. 2008. Action goal selection and motor planning can be dissociated by tool
use. Cognition 109: 363–371.
Corradi-Dell’Acqua, C., B. Tomasino, and G. R. Fink. 2009. What is the position of an arm relative to the body?
Neural correlates of body schema and body structural description. J Neurosci 29: 4162–4171.
Cutting, J. 1978. Study of anosognosia. J Neurol Neurosurg Psychiatry 41: 548–555.
Daprati, E., A. Sirigu, P. Pradat-Diehl, N. Franck, and M. Jeannerod. 2000. Recognition of self-produced move-
ment in a case of severe neglect. Neurocase 6: 477–486.
de Lange, F. P., R. C. Helmich, and I. Toni. 2006. Posture influences motor imagery: An fMRI study. Neuroimage
33: 609–617.
de Vignemont, F. 2010. Body schema and body image—pros and cons. Neuropsychologia 48: 669–680.
di Pellegrino, G., and F. Frassinetti. 2000. Direct evidence from parietal extinction of enhancement of visual
attention near a visible hand. Curr Biol 10: 1475–1477.
Dijkerman, H. C., and E. H. de Haan. 2007. Somatosensory processes subserving perception and action. Behav
Brain Sci 30: 189–201; discussion 201–239.
Duhamel, J. R., C. L. Colby, and M. E. Goldberg. 1998. Ventral intraparietal area of the macaque: Congruent
visual and somatic response properties. J Neurophysiol 79: 126–136.
Ehrsson, H. H. 2007. The experimental induction of out-of-body experiences. Science 317: 1048.
Ehrsson, H. H., C. Spence, and R. E. Passingham. 2004. That’s my hand! Activity in premotor cortex reflects
feeling of ownership of a limb. Science 305: 875–877.
Ehrsson, H. H., K. Wiech, N. Weiskopf, R. J. Dolan, and R. E. Passingham. 2007. Threatening a rubber hand
that you feel is yours elicits a cortical anxiety response. Proc Natl Acad Sci U S A 104: 9828–9833.
Eimer, M., D. Cockburn, B. Smedley, and J. Driver. 2001. Cross-modal links in endogenous spatial attention
are mediated by common external locations: Evidence from event-related brain potentials. Exp Brain Res
139: 398–411.
Eimer, M., and B. Forster. 2003. Modulations of early somatosensory ERP components by transient and sus-
tained spatial attention. Exp Brain Res 151: 24–31.
Eimer, M., B. Forster, and J. Van Velzen. 2003. Anterior and posterior attentional control systems use differ-
ent spatial reference frames: ERP evidence from covert tactile–spatial orienting. Psychophysiology 40:
924–933.
Ernst, M. O., and M. S. Banks. 2002. Humans integrate visual and haptic information in a statistically optimal
fashion. Nature 415: 429–433.
Fattori, P., D. F. Kutz, R. Breveglieri, N. Marzocchi, and C. Galletti. 2005. Spatial tuning of reaching activity in
the medial parieto-occipital cortex (area V6A) of macaque monkey. Eur J Neurosci 22: 956–972.
Fitzgerald, P. J., J. W. Lane, P. H. Thakur, and S. S. Hsiao. 2004. Receptive field properties of the macaque second
somatosensory cortex: Evidence for multiple functional representations. J Neurosci 24: 11193–11204.
Fogassi, L., V. Gallese, L. Fadiga, G. Luppino, M. Matelli, and G. Rizzolatti. 1996. Coding of peripersonal
space in inferior premotor cortex (area F4). J Neurophysiol 76: 141–157.
Freedman, E. G., and D. L. Sparks. 1997. Activity of cells in the deeper layers of the superior colliculus of the
rhesus monkey: Evidence for a gaze displacement command. J Neurophysiol 78: 1669–1690.
Frot, M., and F. Mauguiere. 1999. Timing and spatial distribution of somatosensory responses recorded in the
upper bank of the sylvian fissure (SII area) in humans. Cereb Cortex 9: 854–863.
Gallagher, S. 1986. Body image and body schema: A conceptual clarification. J Mind Behav 7: 541–554.
Graziano, M. S. 1999. Where is my arm? The relative role of vision and proprioception in the neuronal repre-
sentation of limb position. Proc Natl Acad Sci U S A 96: 10418–10421.
Graziano, M. S., and D. F. Cooke. 2006. Parieto-frontal interactions, personal space, and defensive behavior.
Neuropsychologia 44: 845–859.
Graziano, M. S., D. F. Cooke, and C. S. Taylor. 2000. Coding the location of the arm by sight. Science 290:
1782–1786.
Graziano, M. S., and S. Gandhi. 2000. Location of the polysensory zone in the precentral gyrus of anesthetized
monkeys. Exp Brain Res 135: 259–266.
Graziano, M. S., X. T. Hu, and C. G. Gross. 1997. Visuospatial properties of ventral premotor cortex. J
Neurophysiol 77: 2268–2292.
Graziano, M. S., L. A. Reiss, and C. G. Gross. 1999. A neuronal representation of the location of nearby sounds.
Nature 397: 428–430.
Graziano, M. S., C. S. Taylor, and T. Moore. 2002. Complex movements evoked by microstimulation of pre-
central cortex. Neuron 34: 841–851.
Graziano, M. S., G. S. Yap, and C. G. Gross. 1994. Coding of visual space by premotor neurons. Science 266:
1054–1057.
Grefkes, C., and G. R. Fink. 2005. The functional organization of the intraparietal sulcus in humans and mon-
keys. J Anat 207: 3–17.
Halligan, P. W., J. C. Marshall, and D. T. Wade. 1993. Three arms: A case study of supernumerary phantom
limb after right hemisphere stroke. J Neurol Neurosurg Psychiatry 56: 159–166.
Halligan, P. W., J. C. Marshall, and D. T. Wade. 1995. Unilateral somatoparaphrenia after right hemisphere
stroke: A case description. Cortex 31: 173–182.
Harris, C. M., and D. M. Wolpert. 1998. Signal-dependent noise determines motor planning. Nature 394:
780–784.
Heed, T., and B. Röder. 2010. Common anatomical and external coding for hands and feet in tactile attention:
Evidence from event-related potentials. J Cogn Neurosci 22: 184–202.
Holmes, N. P., G. A. Calvert, and C. Spence. 2004. Extending or projecting peripersonal space with tools?
Multisensory interactions highlight only the distal and proximal ends of tools. Neurosci Lett 372:
62–67.
Holmes, N. P., G. A. Calvert, and C. Spence. 2007. Tool use changes multisensory interactions in seconds:
Evidence from the crossmodal congruency task. Exp Brain Res 183: 465–476.
Holmes, N. P., H. J. Snijders, and C. Spence. 2006. Reaching with alien limbs: Visual exposure to pros-
thetic hands in a mirror biases proprioception without accompanying illusions of ownership. Percept
Psychophys 68: 685–701.
Holmes, N. P., and C. Spence. 2004. The body schema and multisensory representation(s) of peripersonal
space. Cogn Process 5: 94–105.
Holmes, N. P., C. Spence, P. C. Hansen, C. E. Mackay, and G. A. Calvert. 2008. The multisensory attentional
consequences of tool use: A functional magnetic resonance imaging study. PLoS ONE 3: e3502.
Hötting, K., and B. Röder. 2004. Hearing cheats touch, but less in congenitally blind than in sighted individuals.
Psychol Sci 15: 60–64.
Ionta, S., and O. Blanke. 2009. Differential influence of hands posture on mental rotation of hands and feet in
left and right handers. Exp Brain Res 195: 207–217.
Ionta, S., A. D. Fourkas, M. Fiorio, and S. M. Aglioti. 2007. The influence of hands posture on mental rotation
of hands and feet. Exp Brain Res 183: 1–7.
Iriki, A., M. Tanaka, and Y. Iwamura. 1996. Coding of modified body schema during tool use by macaque
postcentral neurones. Neuroreport 7: 2325–2330.
Kammers, M. P., F. de Vignemont, L. Verhagen, and H. C. Dijkerman. 2009a. The rubber hand illusion in
action. Neuropsychologia 47: 204–211.
Kammers, M. P., J. A. Kootker, H. Hogendoorn, and H. C. Dijkerman. 2009b. How many motoric body repre-
sentations can we grasp? Exp Brain Res 202: 203–212.
Kammers, M. P., L. O. Verhagen, L. H. C. Dijkerman, H. Hogendoorn, F. De Vignemont, and D. J. Schutter.
2009c. Is this hand for real? Attenuation of the rubber hand illusion by transcranial magnetic stimulation
over the inferior parietal lobule. J Cogn Neurosci 21: 1311–1320.
Karnath, H. O., S. Ferber, and M. Himmelbach. 2001. Spatial awareness is a function of the temporal not the
posterior parietal lobe. Nature 411: 950–953.
Kase, C. S., J. F. Troncoso, J. E. Court, J. F. Tapia, and J. P. Mohr. 1977. Global spatial disorientation. Clinico-
pathologic correlations. J Neurol Sci 34: 267–278.
Kitazawa, S. 2002. Where conscious sensation takes place. Conscious Cogn 11: 475–477.
Kobor, I., L. Furedi, G. Kovacs, C. Spence, and Z. Vidnyanszky. 2006. Back-to-front: Improved tactile dis-
crimination performance in the space you cannot see. Neurosci Lett 400: 163–167.
Körding, K. P., and D. M. Wolpert. 2004. Bayesian integration in sensorimotor learning. Nature 427:
244–247.
Lackner, J. R. 1988. Some proprioceptive influences on the perceptual representation of body shape and orien-
tation. Brain 111(Pt 2): 281–297.
Lackner, J. R., and P. A. DiZio. 2000. Aspects of body self-calibration. Trends Cogn Sci 4: 279–288.
Lackner, J. R., and B. Shenker. 1985. Proprioceptive influences on auditory and visual spatial localization. J
Neurosci 5: 579–583.
Lacroix, R., R. Melzack, D. Smith, and N. Mitchell. 1992. Multiple phantom limbs in a child. Cortex 28:
503–507.
Ladavas, E. 2002. Functional and dynamic properties of visual peripersonal space. Trends Cogn Sci 6: 17–22.
Ladavas, E., G. di Pellegrino, A. Farne, and G. Zeloni. 1998. Neuropsychological evidence of an integrated
visuotactile representation of peripersonal space in humans. J Cogn Neurosci 10: 581–589.
Lenggenhager, B., T. Tadi, T. Metzinger, and O. Blanke. 2007. Video ergo sum: manipulating bodily self-
consciousness. Science 317: 1096–1099.
Lloyd, D. M., D. I. Shore, C. Spence, and G. A. Calvert. 2003. Multisensory representation of limb position in
human premotor cortex. Nat Neurosci 6: 17–18.
Longo, M. R., F. Schuur, M. P. Kammers, M. Tsakiris, and P. Haggard. 2008. What is embodiment? A psycho-
metric approach. Cognition 107: 978–998.
Lotze, M., H. Flor, W. Grodd, W. Larbig, and N. Birbaumer. 2001. Phantom movements and pain. An fMRI
study in upper limb amputees. Brain 124: 2268–2277.
Macaluso, E., C. D. Frith, and J. Driver. 2002. Crossmodal spatial influences of touch on extrastriate visual
areas take current gaze direction into account. Neuron 34: 647–658.
Macaluso, E., C. D. Frith, and J. Driver. 2007. Delay activity and sensory–motor translation during planned eye
or hand movements to visual or tactile targets. J Neurophysiol 98: 3081–3094.
Makin, T. R., N. P. Holmes, and E. Zohary. 2007. Is that near my hand? Multisensory representation of peri-
personal space in human intraparietal sulcus. J Neurosci 27: 731–740.
Mangun, G. R., and S. A. Hillyard. 1988. Spatial gradients of visual attention: Behavioral and electrophysi-
ological evidence. Electroencephalogr Clin Neurophysiol 70: 417–428.
Maravita, A., and A. Iriki. 2004. Tools for the body (schema). Trends Cog Sci 8: 79–86.
Maravita, A., C. Spence, S. Kennett, and J. Driver. 2002. Tool-use changes multimodal spatial interactions
between vision and touch in normal humans. Cognition 83: B25–B34.
Marino, B. F., N. Stucchi, E. Nava, P. Haggard, and A. Maravita. 2010. Distorting the visual size of the hand
affects hand pre-shaping during grasping. Exp Brain Res 202: 499–505.
McGonigle, D. J., R. Hanninen, S. Salenius, R. Hari, R. S. Frackowiak, and C. D. Frith. 2002. Whose arm is it
anyway? An fMRI case study of supernumerary phantom limb. Brain 125: 1265–1274.
Metzinger, T. 2009. Why are out-of-body experiences interesting for philosophers? The theoretical relevance
of OBE research. Cortex 45: 256–258.
Misaki, M., E. Matsumoto, and S. Miyauchi. 2002. Dorsal visual cortex activity elicited by posture change in
a visuo-tactile matching task. Neuroreport 13: 1797–1800.
Mort, D. J., P. Malhotra, S. K. Mannan, C. Rorden, A. Pambakian, C. Kennard, and M. Husain. 2003. The
anatomy of visual neglect. Brain 126: 1986–1997.
Mullette-Gillman, O. A., Y. E. Cohen, and J. M. Groh. 2005. Eye-centered, head-centered, and complex coding
of visual and auditory targets in the intraparietal sulcus. J Neurophysiol 94: 2331–2352.
Naito, E. 2004. Sensing limb movements in the motor cortex: How humans sense limb movement. Neuroscientist
10: 73–82.
Naito, E., P. E. Roland, and H. H. Ehrsson. 2002. I feel my hand moving: A new role of the primary motor
cortex in somatic perception of limb movement. Neuron 36: 979–988.
Obayashi, S., M. Tanaka, and A. Iriki. 2000. Subjective image of invisible hand coded by monkey intraparietal
neurons. Neuroreport 11: 3499–3505.
Pagel, B., T. Heed, and B. Röder. 2009. Change of reference frame for tactile localization during child develop-
ment. Dev Sci 12: 929–937.
Paqueron, X., M. Leguen, D. Rosenthal, P. Coriat, P. J. C. Willer, and N. Danziger. 2003. The phenomenology
of body image distortions induced by regional anaesthesia. Brain 126: 702–712.
Pare, M., and R. H. Wurtz. 1997. Monkey posterior parietal cortex neurons antidromically activated from supe-
rior colliculus. J Neurophysiol 78: 3493–3497.
Pare, M., and R. H. Wurtz. 2001. Progression in neuronal processing for saccadic eye movements from parietal
cortex area LIP to superior colliculus. J Neurophysiol 85: 2545–2562.
Parsons, L. M. 1987. Imagined spatial transformations of one’s hands and feet. Cogn Psychol 19: 178–241.
Pavani, F., C. Spence, and J. Driver. 2000. Visual capture of touch: Out-of-the-body experiences with rubber
gloves. Psychol Sci 11: 353–359.
Pellijeff, A., L. Bonilha, P. S. Morgan, K. McKenzie, and S. R. Jackson. 2006. Parietal updating of limb pos-
ture: An event-related fMRI study. Neuropsychologia 44: 2685–2690.
Pesaran, B., M. J. Nelson, and R. A. Andersen. 2006. Dorsal premotor neurons encode the relative position of
the hand, eye, and goal during reach planning. Neuron 51: 125–134.
Pouget, A., S. Deneve, and J. R. Duhamel. 2002. A computational perspective on the neural basis of multi-
sensory spatial representations. Nat Rev Neurosci 3: 741–747.
Press, C., C. Heyes, P. Haggard, and M. Eimer. 2008. Visuotactile learning and body representation: An ERP
study with rubber hands and rubber objects. J Cogn Neurosci 20: 312–323.
Previc, F. H. 1998. The neuropsychology of 3-D space. Psychol Bull 124: 123–164.
Ramachandran, V. S. 1993. Behavioral and magnetoencephalographic correlates of plasticity in the adult human
brain. Proc Natl Acad Sci U S A 90: 10413–10420.
Ramachandran, V. S., and W. Hirstein. 1997. Three laws of qualia—what neurology tells us about the biological
functions of consciousness, qualia and the self. J Consciousness Stud 4: 429–458.
Ramachandran, V. S., and W. Hirstein. 1998. The perception of phantom limbs. The D. O. Hebb lecture. Brain
121(Pt 9): 1603–1630.
Rizzolatti, G., G. Luppino, and M. Matelli. 1998. The organization of the cortical motor system: New concepts.
Electroencephalogr Clin Neurophysiol 106: 283–296.
Rizzolatti, G., C. Scandolara, M. Matelli, and M. Gentilucci. 1981a. Afferent properties of periarcuate neurons
in macaque monkeys. I. Somatosensory responses. Behav Brain Res 2: 125–146.
Rizzolatti, G., C. Scandolara, M. Matelli, and M. Gentilucci. 1981b. Afferent properties of periarcuate neurons
in macaque monkeys: II. Visual responses. Behav Brain Res 2: 147–163.
Röder, B., A. Kusmierek, C. Spence, and T. Schicke. 2007. Developmental vision determines the reference
frame for the multisensory control of action. Proc Natl Acad Sci U S A 104: 4753–4758.
Röder, B., F. Rösler, and C. Spence. 2004. Early vision impairs tactile perception in the blind. Curr Biol 14:
121–124.
Roux, F. E., J. A. Lotterie, E. Cassol, Y. Lazorthes, J. C. Sol, and I. Berry. 2003. Cortical areas involved in vir-
tual movement of phantom limbs: comparison with normal subjects. Neurosurgery 53: 1342–1352.
Saadah, E. S., and R. Melzack. 1994. Phantom limb experiences in congenital limb-deficient adults. Cortex
30: 479–485.
Sakata, H., Y. Takaoka, A. Kawarasaki, and H. Shibutani. 1973. Somatosensory properties of neurons in the
superior parietal cortex (area 5) of the rhesus monkey. Brain Res 64: 85–102.
Schaefer, M., H. Flor, H. J. Heinze, and M. Rotte. 2007. Morphing the body: Illusory feeling of an elongated
arm affects somatosensory homunculus. Neuroimage 36: 700–705.
Scherberger, H., M. A. Goodale, and R. A. Andersen. 2003. Target selection for reaching and saccades share a
similar behavioral reference frame in the macaque. J Neurophysiol 89: 1456–1466.
Schicke, T., F. Bauer, and B. Röder. 2009. Interactions of different body parts in peripersonal space: how vision
of the foot influences tactile perception at the hand. Exp Brain Res 192: 703–715.
Schicke, T., and B. Röder. 2006. Spatial remapping of touch: Confusion of perceived stimulus order across
hand and foot. Proc Natl Acad Sci U S A 103: 11808–11813.
Schlack, A., S. J. Sterbing-D’Angelo, K. Hartung, K. P. Hoffmann, and F. Bremmer. 2005. Multisensory space
representations in the macaque ventral intraparietal area. J Neurosci 25: 4616–4625.
Sellal, F., C. Renaseau-Leclerc, and R. Labrecque. 1996. The man with 6 arms. An analysis of supernumerary
phantom limbs after right hemisphere stroke. Rev Neurol (Paris) 152: 190–195.
Shams, L., Y. Kamitani, and S. Shimojo. 2002. Visual illusion induced by sound. Brain Res Cogn Brain Res
14: 147–152.
Shore, D. I., E. Spry, and C. Spence. 2002. Confusing the mind by crossing the hands. Brain Res Cogn Brain
Res 14: 153–163.
Simmel, M. L. 1962. The reality of phantom sensations. Soc Res 29: 337–356.
Spence, C., F. Pavani, and J. Driver. 2004a. Spatial constraints on visual–tactile cross-modal distractor congru-
ency effects. Cogn Affect Behav Neurosci 4: 148–169.
Spence, C., F. Pavani, A. Maravita, and N. Holmes. 2004b. Multisensory contributions to the 3-D represen-
tation of visuotactile peripersonal space in humans: Evidence from the crossmodal congruency task. J
Physiol Paris 98: 171–189.
Stein, B. E., M. T. Wallace, and M. A. Meredith. 1995. Neural mechanisms mediating attention and orientation
to multisensory cues. In The Cognitive Neurosciences, ed. M. S. Gazzaniga, 683–702. Cambridge, MA:
MIT Press, Bradford Book.
Stricanne, B., R. A. Andersen, and P. Mazzoni. 1996. Eye-centered, head-centered, and intermediate coding of
remembered sound locations in area LIP. J Neurophysiol 76: 2071–2076.
Stuphorn, V., E. Bauswein, and K. P. Hoffmann. 2000. Neurons in the primate superior colliculus coding for
arm movements in gaze-related coordinates. J Neurophysiol 83: 1283–1299.
Tark, K. J., and C. E. Curtis. 2009. Persistent neural activity in the human frontal cortex when maintaining
space that is off the map. Nat Neurosci 12: 1463–1468.
Trommershäuser, J., L. T. Maloney, and M. S. Landy. 2003. Statistical decision theory and trade-offs in the
control of motor response. Spat Vis 16: 255–275.
Tsakiris, M. 2010. My body in the brain: a neurocognitive model of body-ownership. Neuropsychologia 48:
703–712.
Tsakiris, M., M. D. Hesse, C. Boy, P. Haggard, and G. R. Fink. 2007. Neural signatures of body ownership: A
sensory network for bodily self-consciousness. Cereb Cortex 17: 2235–2244.
Türker, K. S., P. L. Yeo, and S. C. Gandevia. 2005. Perceptual distortion of face deletion by local anaesthesia of
the human lips and teeth. Exp Brain Res 165: 37–43.
Walton, M. M., B. Bechara, and N. J. Gandhi. 2007. Role of the primate superior colliculus in the control of
head movements. J Neurophysiol 98: 2022–2037.
Xing, J., and R. A. Andersen. 2000. Models of the posterior parietal cortex which perform multimodal integra-
tion and represent space in several coordinate frames. J Cogn Neurosci 12: 601–614.
Yamamoto, S., and S. Kitazawa. 2001a. Reversal of subjective temporal order due to arm crossing. Nat Neurosci
4: 759–765.
Yamamoto, S., and S. Kitazawa. 2001b. Sensation at the tips of invisible tools. Nat Neurosci 4: 979–980.
Yamamoto, S., S. Moizumi, and S. Kitazawa. 2005. Referral of tactile sensation to the tips of L-shaped sticks.
J Neurophysiol 93: 2856–2863.
Yue, Z., G. N. Bischof, X. Zhou, C. Spence, and B. Röder. 2009. Spatial attention affects the processing of
tactile and visual stimuli presented at the tip of a tool: An event-related potential study. Exp Brain Res
193: 119–128.
Section VII
Naturalistic Multisensory Processes:
Motion Signals
29 Multisensory Interactions
during Motion Perception
From Basic Principles to
Media Applications
Salvador Soto-Faraco and Aleksander Väljamäe
CONTENTS
29.1 Introduction........................................................................................................................... 583
29.2 Basic Phenomenology of Multisensory Interactions in Motion Perception.......................... 584
29.3 Some Behavioral Principles .................................................................................................. 586
29.3.1 What Is the Processing Level at Which Cross-Modal Interactions in Motion
Processing Originate?................................................................................................ 586
29.3.2 Are These Interactions Specific to Motion Processing?............................................ 588
29.3.3 Pattern of Modality Dominance................................................................................ 588
29.3.4 Multisensory Integration of Motion Speed................................................................ 589
29.4 Neural Correlates of Multisensory Integration of Motion..................................................... 591
29.4.1 Multisensory Motion Processing Areas in the Brain................................................ 591
29.4.2 Evidence for Cross-Modal Integration of Motion Information in the
Human Brain............................................................................................................. 592
29.5 Motion Integration in Multisensory Contexts beyond the Laboratory.................................. 593
29.5.1 Sound Compensating for Reduced Visual Frame Rate............................................. 593
29.5.2 Filling in Visual Motion with Sound......................................................................... 594
29.5.3 Perceptually Optimized Media Applications............................................................ 596
29.6 Conclusions............................................................................................................................ 597
Acknowledgments........................................................................................................................... 598
References....................................................................................................................................... 598
29.1 INTRODUCTION
Hearing the blare of an ambulance siren often impels us to trace the location of the emergency
vehicle with our gaze so we can quickly decide which way to pull the car over. In doing so, we must
combine motion information from the somewhat imprecise but omnidirectional auditory system
with the far more precise, albeit spatially bounded, visual system. This type of multisensory inter-
play, so pervasive in the everyday perception of moving objects, was until recently largely ignored
in the scientific study of motion perception. Here, we provide an overview of recent research
about behavioral and neural mechanisms that support the binding of different sensory modalities
during the perception of motion, and discuss some potential extensions of this research into the
applied context of audiovisual media.
* The phenomenon of apparent motion (namely, experiencing a connected trajectory across two discrete events presented
successively at alternate locations) has been described in different sensory modalities, including the classic example of
vision (Exner 1875; Wertheimer 1912) but also in audition and touch (Burt 1917a, 1917b; Hulin 1927; Kirman 1974).
Moreover, there is evidence suggesting that the principles governing apparent motion are similar for the different senses
(Lakatos and Shepard 1997).
Multisensory Interactions during Motion Perception 585
FIGURE 29.1 Cross-modal dynamic capture effect. (a) Observer is presented with auditory motion together
with visual motion along the horizontal plane, and is asked to determine the direction of sounds and ignore
the visual event. (b) Examples of different kinds of trials used in the task, combining directional congruency
and synchrony between sound and light. (c) Typical outcome in this task, where accuracy in sound direction
task is strongly influenced by congruency of visual distractor (CDC effect), but only in synchronous trials.
(d) Histogram regarding the size of congruency effect across a sample of 384 participants who performed this
task across a variety of experiments, but under comparable conditions.
motion discrimination performance drops dramatically (by 50%) when the lights are presented
synchronously but move in the opposite direction (see Figure 29.1c and d). This effect of directional
congruency, termed cross-modal dynamic capture (CDC), occurs with equivalent strength when
using continuous (rather than apparent) motion displays, but is eliminated if the visual and auditory
signals are desynchronized in time by as little as half a second. One interesting aspect is that the
frequent errors made by observers under directionally incongruent audiovisual motion are better
explained by a phenomenological reversal in the direction of sounds, rather than by mere confusion.
This latter inference is supported by the finding that the same pattern of directional congruency
effects is seen even after filtering out low confidence responses (self-rated by the observer, after
every trial) from the data (Soto-Faraco et al. 2004a).
Another relevant finding supporting the existence of multisensory integration between motion
signals comes from an adaptation paradigm developed by Kitagawa and Ichihara (2002). In their
study, Kitagawa and Ichihara adapted observers with visual motion either receding from or looming
toward them, and found adaptation aftereffects not only on the perceived direction of subsequent
visual stimuli, but also on auditory stimuli. For example, after adapting observers to looming visual
motion, a steady sound would appear to move away from them (i.e., its intensity would seem to fade
off slightly over time). This result supports the early nature of multisensory interactions between
auditory and visual motion detectors. Interestingly, Kitagawa and Ichihara also tested adaptation
aftereffects when adapting observers with combined auditory and visual moving stimuli,
and found that the magnitude of the adaptation effect depended on the directional congruency
between the two adaptor motion signals (for related findings, see Väljamäe and Soto-Faraco 2008;
Vroomen and de Gelder 2003).
In summary, the findings discussed above seem to point to the existence of robust interactions
between sensory modalities during the extraction of motion information, and in particular of its
direction. However, they are still far from providing a full characterization of these interactions at the behavioral and neural levels. We review some of the main findings on these two aspects
in the following sections.
different-direction audiovisual apparent motion streams unless the interstimulus interval between
the two discrete flashes/beeps was larger than 300 ms. Yet, the same observers were able to accu-
rately discriminate the direction of apparent motion streams in each sensory modality individually
for interstimulus intervals below 75 ms. Given that conflict between the stimulus (left–right) and
response (same–different) was not possible in this paradigm, stimulus–response compatibility could
be ruled out as the source of the behavioral effect. Moreover, in this experiment the interstimulus
interval was adjusted using interleaved adaptive staircases to reach the point of perceptual uncer-
tainty. At this point, by definition participants are not aware of whether they are being presented
with a conflicting or a congruent trial, and thus cannot adopt strategies based on stimulus congru-
ence, thereby also ruling out cognitive biases. After eliminating stimulus–response compatibility
and cognitive biases as possible explanations, the participants’ failure to individuate the direction
of each sensory modality in multisensory displays can only be attributed to an interference at a
perceptual level (Soto-Faraco et al. 2005).
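The logic of an adaptive track converging on the point of perceptual uncertainty can be sketched with a toy simulation. This is a generic 1-up/1-down staircase with a simulated observer; the parameter values and observer model are illustrative assumptions, not those used in the study cited above:

```python
import random

def simulate_staircase(start_isi=300.0, step=25.0, floor=25.0,
                       trials=60, threshold=150.0, seed=0):
    """Toy 1-up/1-down staircase over the interstimulus interval (ms).

    A simulated observer responds correctly whenever the ISI exceeds a
    hidden, noisy threshold; the track shortens the ISI after correct
    responses and lengthens it after errors, so it ends up hovering
    around the level at which performance becomes uncertain.
    """
    rng = random.Random(seed)
    isi, track = start_isi, []
    for _ in range(trials):
        # Noisy observer: correct if the ISI exceeds threshold plus jitter.
        correct = isi > threshold + rng.gauss(0.0, 20.0)
        isi = max(floor, isi - step if correct else isi + step)
        track.append(isi)
    return track

track = simulate_staircase()
# After the initial descent, the track oscillates near the hidden threshold.
```

Interleaving several such tracks, as in the experiment described above, prevents participants from predicting the difficulty of the upcoming trial.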
Other approaches that have been used to disentangle the contribution of perceptual versus post-perceptual mechanisms in cross-modal motion effects include the use of analytic tools such as signal detection theory (see Macmillan and Creelman 1991), which yields independent estimates of sensitivity (associated with perceptual sources) and decision bias (associated with post-perceptual sources) (e.g., Sanabria et al. 2007; Soto-Faraco et al. 2006). In the Sanabria et al. (2007) and Soto-Faraco et al. (2006) studies, for example, participants were asked to discriminate left-moving sounds (signal) from right-moving sounds (noise) in the context of visual stimuli that moved in a constant direction throughout the whole experimental block (always left or always right). The findings were clear: the presence of visual motion lowered sensitivity (d′) to sound direction as compared to a no-vision baseline, regardless of whether sound direction was consistent or inconsistent with the visual distractor motion. That is, visual motion made the signal and noise signals in the auditory modality more similar to each other, and thus discrimination was more difficult. At the same time, the response criterion (c) in this task shifted consistently with the direction of the
visual distractor. In sum, this experiment was able to dissociate the effects of perceptual interactions
from the effects at the response level. Other authors have used a similar strategy to disentangle the
contribution of perceptual versus post-perceptual processes in somewhat different types of displays
(see Meyer and Wuerger 2001; Meyer et al. 2005). For instance, Meyer and Wuerger presented their
participants with a visual direction discrimination task (using random dot kinematograms) in the
context of auditory distractor motion. They used a mathematical model that included a sensitivity
parameter and a bias parameter, and found that most of the influence that auditory motion had on
the detection responses to visual random dot displays was explained by a decision bias (for a simi-
lar strategy, see Alais and Burr 2004a). This result highlights the importance that post-perceptual
biases can have in experiments using cross-modal distractors, and, in part, contrasts with the result
presented above using the CDC task. Although it is difficult to compare across these methodologi-
cally very different studies, part of the discrepancy might root in the use of vision as the target
modality (as opposed to sound), and the target stimulus being near the threshold for direction dis-
crimination (in Meyer et al.’s case, random dot displays with low directional coherence). More
recent applications of this type of approach have revealed, however, that one can obtain a shift in sensitivity over and above any bias effects (Meyer et al. 2005; Wuerger et al. 2003).
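The two signal detection estimates discussed above can be illustrated with a short computation. This is a generic sketch of the standard equal-variance Gaussian formulas; the response counts are hypothetical, not data from any of the cited studies:

```python
from statistics import NormalDist

def dprime_and_criterion(hits, misses, false_alarms, correct_rejections):
    """Equal-variance Gaussian SDT: d' = z(H) - z(F) indexes sensitivity
    (perceptual separability of signal and noise), while
    c = -(z(H) + z(F)) / 2 indexes decision bias."""
    z = NormalDist().inv_cdf
    # Log-linear correction keeps z finite when a rate would be 0 or 1.
    h = (hits + 0.5) / (hits + misses + 1)
    f = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(h) - z(f), -(z(h) + z(f)) / 2

# Hypothetical counts for a "left-moving sound?" task: a distractor that
# biases responses toward "left" raises hits and false alarms together,
# shifting c while leaving d' comparatively unchanged.
d, c = dprime_and_criterion(hits=70, misses=30,
                            false_alarms=40, correct_rejections=60)
```

A congruency-independent drop in d′ with a criterion shift toward the distractor direction is the dissociation pattern reported in the studies above.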
In sum, decision-level and cognitive influences are very likely to play a role in most of the tasks used to address multisensory contributions to motion processing. Yet, the data from several independent studies seem to show rather conclusively that influences at the level of perception also occur during multisensory motion processing. It must be noted, however, that there are limits on how early in the processing hierarchy this cross-modal interaction can occur. For example,
there is evidence that cross-modal motion integration takes place only after certain unisensory
perceptual processes have been completed, such as visual perceptual grouping (e.g., Sanabria et
al. 2004a, 2004b) or the computation of visual speed (López-Moliner and Soto-Faraco 2007; see
Section 29.2.4).
these are invariably obtained when the visual signal is at or near threshold (e.g., Alais and Burr
2004b; Meyer and Wuerger 2001).
We have incorporated touch into the CDC paradigm in several studies (e.g., Oruç et al. 2008; Sanabria et al. 2005a, 2005b; Soto-Faraco et al. 2004b). In these cases, participants wore vibrotactile stimulators on their index finger pads and rested their hands on the table, near the LEDs and/or loudspeakers. Tactile apparent motion was generated by presenting a brief (50 ms) sine-wave (200 Hz) vibration alternately to each index finger. In this way, auditory, tactile, and visual apparent motion streams could be presented using equivalent stimulus parameters in terms of onset time, duration, and spatial location. All possible combinations of distractor and target
modality using tactile, visual, and acoustic stimuli were tested (Oruç et al. 2008; Soto-Faraco et al.
2000; Soto-Faraco and Kingstone 2004). When considered as a whole, the results of these experi-
ments reveal a hierarchy of sensory modalities with respect to their contribution to the perception
of motion direction. Vision had a strong influence on auditory motion, yet acoustic distractors did not modulate the perception of visual motion (along the lines of other recent results, such as those of Meyer and Wuerger 2001; Kitagawa and Ichihara 2002). A similar pattern applied to visuo-tactile interactions, whereby vision captured tactile motion direction but touch hardly exerted any influence on the perception of motion in vision. The combination of auditory and tactile motion stimuli, however, showed a reciprocal influence between the two modalities, albeit with a stronger effect of touch on sound than the reverse. This particular hierarchy, however, must be considered with some caution,
given that factors such as stimulus saliency, reliability, and even cognitive aspects such as attention,
may indeed exert an important influence on the relative strength of the modalities. For example,
it has been shown that directing the focus of attention to one or another modality can modulate
CDC in the case of audio-tactile interactions, although not in modality pairings where vision was
involved (Oruç et al. 2008).
According to the findings described above, vision would be the most dominant sense in terms of its contribution to computing the direction of motion, followed by touch, and lastly audition. Within this framework, multisensory integration would not consist of a process in which one modality overrides the information in another modality (as the results of the audiovisual case, when considered in isolation, might sometimes suggest; for a similar example based on the dominance of vision over touch in shape/size perception, see, e.g., Rock and Harris 1967). Instead, the results support the proposal that multisensory integration of motion relies on some kind of weighted combination of different information sources (see López-Moliner and Soto-Faraco 2007). If this is so, then the weight assigned to each modality's signal during the perception of motion becomes a particularly relevant issue. This is clearly a matter for further research, but based on their success in explaining cross-modal results in other perceptual domains, one could borrow the ideas of modality appropriateness or the so-called optimal integration model (Ernst and Banks 2002; Ernst and Bülthoff 2004) to answer this question.
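As an illustration of what such a weighted combination could look like, the following sketch implements the standard reliability-weighted (maximum-likelihood) combination rule from the optimal integration literature; the direction estimates and variances are made-up values, not data from the studies discussed above:

```python
def mle_combine(estimates, variances):
    """Minimum-variance linear cue combination: each cue is weighted in
    proportion to its reliability, i.e., the inverse of its variance."""
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]
    combined = sum(w * e for w, e in zip(weights, estimates))
    # The combined variance is lower than that of either cue alone.
    return combined, 1.0 / total

# Hypothetical direction estimates (deg): a low-variance visual cue and a
# high-variance auditory cue. The visual cue dominates the combined
# estimate, echoing the modality hierarchy described above.
direction, variance = mle_combine(estimates=[10.0, -20.0],
                                  variances=[1.0, 9.0])
# direction ≈ 7.0 (pulled toward vision); variance ≈ 0.9
```

On this view, apparent "visual capture" is simply the limiting case in which one cue is far more reliable than the other.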
[Figure 29.2 panels not reproduced: (a) a standard sound moving at 30°/s paired with a comparison sound whose velocity was determined by a QUEST staircase; (b) the set of moving visual gratings, plotted as spatial frequency (cycles/°) against temporal frequency (Hz), at velocities of 15, 30, and 45°/s.]
FIGURE 29.2 A summary of results about cross-modal effects in velocity perception, from López-Moliner
and Soto-Faraco’s study. (a) 2IFC paradigm involved a velocity discrimination judgment regarding sounds
(Was the second sound faster or slower than the first?), where second interval could contain either no visual
motion or else moving gratings. (b) Graphical description of different gratings used in the task in space
defined by temporal and spatial frequency. Note that two exemplars of each motion velocity formed by differ-
ent combinations of spatial and temporal frequencies were used. Each of three velocities is denoted by a dif-
ferent symbol (see labels next to symbols). (c–e) Point of subjective equality for sound velocity with reference
to a 30° s−1 standard, when combined with different kinds of moving gratings. Same data are depicted as a
function of (c) spatial frequency, (d) temporal frequency, and (e) velocity of gratings. It can be seen that the results align best when depicted along the velocity axis.
of the concurrent visual stimulus, so that slow visual motion made participants underestimate the
velocity of concurrent sounds, and rapid visual motion made people perceive the sounds moving
faster than they really were.
In this study, gratings composed of different combinations of spatial and temporal frequencies could represent visual motion of a given velocity (see Figure 29.2b). This was done because
sinusoidal moving gratings can be conveniently separated into spatial frequency (sf ) and temporal
frequency (tf ) (Watson and Ahumada 1983), and velocity (v) of a grating can be expressed by the
ratio between its tf (in Hz) and its sf (number of cycles per degree of visual angle). This spatiotem-
poral definition of stimulus space has been previously used to characterize the spectral receptive
fields of neurons at various levels of the visual system in the monkey (e.g., Perrone and Thiele 2001),
and it has received confirmation from human psychophysics (Reisbeck and Gegenfurtner 1999). For
instance, many neurons in the middle temporal cortex (MT) encode velocity, in the sense that the
set of stimuli that best drives these neurons lies along an isovelocity continuum in the space defined
by sf and tf. Unlike MT neurons, many of the motion-sensitive neurons found at earlier stages of
the visual system such as V1 fail to display an invariant response across different stimuli moving
at equivalent velocities, but rather they often display a response profile tuned to particular temporal
frequencies (however, see Priebe et al. 2006, for velocity responses in some V1 neurons). Given that the velocity of a grating can be decomposed in terms of spatial and temporal frequency (v = tf/sf), one can then attempt to isolate the influence of varying the spatial and temporal frequencies of the visual stimulus on the perceived velocity of sounds. What López-Moliner and Soto-Faraco found is that neither spatial frequency per se nor temporal frequency produced any systematic effect on sound velocity perception, above and beyond that explained by velocity. One could then infer that
the binding of multisensory velocity information might be based on motion information that is
already available at late stages of processing within the visual system. Note that this inference reso-
nates with the finding that multisensory motion integration occurs only after perceptual grouping
has been completed within vision (Sanabria et al. 2004a, 2004b).
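The grating parameterization described above can be made concrete with a minimal sketch; the specific spatial and temporal frequency values here are illustrative, not necessarily those used in the study:

```python
def grating_velocity(tf_hz, sf_cpd):
    """Velocity (deg/s) of a drifting sinusoidal grating: v = tf / sf,
    with tf in Hz and sf in cycles per degree of visual angle."""
    return tf_hz / sf_cpd

# Two different (sf, tf) combinations realize the same velocity, which is
# what allows velocity to be dissociated experimentally from either
# frequency component alone.
v_a = grating_velocity(tf_hz=7.5, sf_cpd=0.25)   # 30 deg/s
v_b = grating_velocity(tf_hz=15.0, sf_cpd=0.5)   # also 30 deg/s
```

If only velocity (and neither component separately) predicts the cross-modal effect, the interaction must draw on a velocity-coding stage such as MT rather than on tf-tuned early responses.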
lend support to the idea that the consequences of multisensory integration of motion information
can be traced back to relatively early (sensory) stages of motion processing (for behavioral support
of this hypothesis, see Soto-Faraco et al. 2004a, 2005; Sanabria et al. 2007). A second interesting
aspect of Alink et al.'s study is that they reproduced the CDC effect while measuring brain activity. Thus, they were able to contrast BOLD changes resulting from trials where the CDC illusion presumably occurred (incorrectly responded ones) with those evoked by otherwise identical trials in which sounds were perceived to move in the physically correct direction. This contrast revealed that in trials where the CDC illusion was experienced, activity in the auditory motion areas (AMC) was reduced relative to veridically perceived trials, whereas the reverse pattern occurred for visual motion areas (i.e., enhanced activity in MT/V5+ in illusion trials relative to veridical
perception). This result parallels the visual dominance pattern typically observed in the behavioral
manifestation of CDC. Finally, when extending the scope of their analysis to the whole brain, Alink
et al. found that conflicting motion led to the activation of an extensive network of frontoparietal
areas, including the IPS and supplementary motor areas (SMA). Remarkably, in this analysis Alink
et al. also found that VIP was modulated by the occurrence of illusory motion percepts (as indexed
by the CDC task). In particular, not only was VIP more strongly activated in trials leading to illusory percepts, but this activation seemed to precede in time the activity evoked by the stimulus itself, an indication that the prior state of the motion-processing network might be a critical determinant of the occurrence of CDC and, hence, of cross-modal motion integration.
an accompanying sound effect masks rough editing between two shots (Eidsvik 2005). In a classic
example from the Star Wars film The Empire Strikes Back (1980), the visual illusion of a spaceship door sliding open is created using two successive still shots of a closed door and an open door combined with a "whoosh" sound effect (Chion 1994).
These auditory modulations of visual dynamics perception give rise to the practical question of
whether sound could compensate for reduced frame rate in films. A recent study by Mastoropoulou
et al. (2005) investigated the influence of sound in a forced choice discrimination task between pairs
of 3-s video sequences displayed at varying temporal resolutions of 10, 12, 15, 20, or 24 frames per
second (fps). Participants judged motion smoothness of the videos being presented. In visual-only
conditions, naïve participants could discriminate between displays differing by as little as 4 fps. By contrast, in audiovisual presentations participants could reliably discriminate between displays only when they differed by 14 fps. It is perhaps surprising that Mastoropoulou et al. (2005) hypothesized that divided attention caused the reported effects, without considering the alternative explanation that audiovisual integration might have produced the sensation of smoother visual displays altogether, thereby making it more difficult to spot discontinuities.
[Figure 29.3 panels not reproduced. Key: + and − denote approaching and receding stimuli; Ah/Al denote 12.5/6.25 Hz auditory flutter; Vh/Vl denote 12.5/6.25 Hz visual flicker.]
FIGURE 29.3 A subset of experimental conditions and results. (a) Some examples of motion adaptors rep-
resenting low-rate approaching visual stimuli, high-rate receding sounds, and direction conflicting stimuli
combining high-rate sounds with low-rate visual events. (Reprinted from Väljamäe, A. and Soto-Faraco, S.,
Acta Psychol., 129, 249–254, Copyright 2008, with permission from Elsevier.) (b) Magnitude of auditory
aftereffect (in dB/s) after adaptation to time-sampled approaching (+) or receding (−) audiovisual motion
in depth. Left subpanel shows results for directionally congruent adaptors (high-rate visual combined with high-rate sounds; and low-rate visual combined with high-rate sounds), and right subpanel represents results for directionally incongruent audiovisual adaptors (low-rate visual combined with low-rate sounds; and low-rate visual combined with high-rate sounds).
the adaptation aftereffect increased overall, but interestingly both the fast and the slow flicker rates
turned out to be equally effective in producing auditory aftereffects (see Figure 29.3b, left subpanel).
This result strongly suggested that high-rate flutter can fill in sparsely sampled visual object motion.
This filling-in effect could be related to the sound-induced flash phenomenon, whereby the combi-
nation of low-rate flicker with a rapid train of beeps leads to illusory flashes (e.g., Shams et al. 2002).
In fact, the judgments of subjective smoothness regarding the visual flicker stimuli supported the
psychophysical data—low-rate flicker was rated as being smoother when combined with high-rate
beeps than when combined with low-rate flutter.
However, the results from these experiments did not speak directly to whether the observed effects were specific to motion per se or merely resulted from the high-frequency temporal structure of the sound signal. In a separate experiment, Väljamäe and Soto-Faraco (2008) tested the
relevance of motion direction congruency of the adaptors by using direction incongruent multisen-
sory adaptors. If the effect of the audiovisual adaptor lacks direction specificity, then the audiovi-
sual adapting stimulus should work equally well despite the cross-modal incongruence in motion
direction. However, the results showed that incongruent combinations of audiovisual adaptors produced weaker aftereffects (Figure 29.3b, right subpanel). In fact, the aftereffects of these adaptors did not differ in size or direction from the auditory motion aftereffects induced by unimodal acoustic adaptors.
The findings of Väljamäe and Soto-Faraco's (2008) study could potentially be attributed to the sound-induced visual flash illusion, given that the timing parameters of their discrete stimuli are similar to those used in the original experiments of Shams et al. (2000) (cf. Lewkowicz 1999 for a discussion of the intermodal temporal contiguity window for integration of discrete multisensory events). Thus, the aftereffects of multisensory adaptors might be explained by perceptual "upgrading" of low-rate flicker by a high-rate train of beeps. In this case, illusory visual flashes might have filled in the sparsely
sampled real visual flicker and increased motion aftereffects. Importantly, the observed effects did
not solely depend on the flutter rate, but also on the directional congruency between auditory and
visual adaptors. This means that the potential of sounds to fill in the visual series critically depends
on some kind of compatibility, or congruence, between the motion signal being processed by hearing
and sight. Thus, above and beyond the potential contribution of the auditory driving phenomenon
(e.g., Welch et al. 1986), the effect described above seems to reflect interactions between motion cues provided by a moving object. These results might support the idea that sound can compensate for a reduced visual frame rate in media applications, as described in Section 29.5.1. A better understanding of the mechanisms underlying such cross-modal fill-in effects may facilitate new types of perceptually optimized media applications, where sound and visuals are tightly synchronized on a frame-by-frame basis. Classic examples include the animated films of Walt Disney (e.g., Fantasia), where music was used directly as a reference for the animators' work, and the abstract films of Len Lye (e.g., Colour Flight, Rhythm, Free Radicals; see Len Lye filmography 2005), in which he visualized musical rhythm by painting or scratching directly on celluloid.
optimization may have important implications for audio and video compression and rendering technologies, especially in wireless communication, which at present are developed rather independently of one another. In these technologies, a critical problem is finding a compromise between the limited information transmission rate available to current technology and the realism of the content being displayed (e.g., Sanchez-Vives and Slater 2005). Future audiovisual media content synthesis, delivery, and reproduction may switch from such a unisensory approach to amodal categories of end-user percepts (such as objects, events, or even affective states of the user). In this new multisensory design, amodal categories may then define the sensory modalities to be reproduced and their rendering quality (cf. the "quality of service" approach in media delivery applications).
29.6 CONCLUSIONS
We started by providing an overview of past and recent developments revealing the phenomeno-
logical interactions that can be observed during the perception of motion in multisensory con-
texts. Over and above early findings based on introspective reports (e.g., Anstis 1973; Zapparoli and
Reatto 1969), which already pointed to the existence of strong multisensory interactions in motion
processing, more recent psychophysical studies in humans have often reported that the perception of sound motion can be influenced by several properties of a concurrently presented visual motion signal (direction, smoothness, speed). For example, in the CDC effect, sounds can appear to move in the same direction as a synchronized visual moving object, even though in reality the two travel in opposite directions (e.g., Soto-Faraco et al. 2002, 2004a). Although these findings were frequently observed under artificially induced intersensory conflict, they speak directly to the strong tendency toward multisensory binding that governs motion perception in naturalistic, everyday environments. Some of the characteristics that define this multisensory binding of motion signals are: (1) that these multisensory combination processes occur, at least in part, at early perceptual stages, before other potential effects related to decisional stages take place; (2) that motion information is subject to multisensory binding, over and above any other binding phenomena that can take place between spatially static stimuli; and (3) that when other sensory modality combinations are taken into account, a hierarchy of modalities arises, in which vision dominates touch, which, in turn, dominates audition (e.g., Soto-Faraco et al. 2003). This hierarchy, however, can be modulated by factors such as attentional focus (Oruç et al. 2008) and, most probably, by the relative reliability of the sensory signals, in line with recent findings in other domains.
We have also touched upon the potential underlying brain mechanisms that support multisensory
binding of motion information. Both animal and human studies reveal that among the brain structures responsive to visual motion, the higher the processing stage at which one looks, the greater the chance that an area will reveal multisensory properties. In fact, some past studies have provided evidence of overlap between the brain regions that are active during the presentation of motion in audition, touch, and vision (Bremmer et al. 2001b; Lewis et al. 2000). Two of the structures consistently found in this type of study are the PMv and parts of the IPS, possibly the
human homologue of the monkey ventral intraparietal (VIP) region. As per animal electrophysiology
data, these two areas are strongly interconnected, display a similar tuning to spatial representations of
moving objects, and contain multisensory neurons. Two recent studies have provided further insight
about the functional organization of multisensory motion processing in the human brain (Alink et
al. 2008; Baumann et al. 2007). In both cases, the involvement of posterior parietal (VIP) and frontal (PMv) areas in binding multisensory motion information seems clear. In addition, Alink et al.'s
results were suggestive of the cross-modal modulation of early sensory areas usually considered to be
involved in unisensory motion processing (MT/V5 in vision, and the PT in audition). One additional
recent finding is also suggestive of the responsiveness of early visual areas to acoustic motion, in this
case as a consequence of brain plasticity in the blind (Saenz et al. 2008).
Finally, we have discussed some of the potential connections between basic and applied research
with regard to the use of dynamic displays in audiovisual media. Film editing techniques that have
been developed empirically over the years reflect some of the principles that have been indepen-
dently discovered in the laboratory. For example, sound is often used in the cinema to support visual
continuity of a highly dynamic scene, capitalizing on the superior temporal resolution of audition
over vision. Väljamäe and Soto-Faraco (2008) attempted to bridge the gap between basic research
on motion perception and application of multisensory principles by showing that sounds with high-
rate dynamic structure could help compensate for the poor visual continuity of moving stimuli
displayed at low sampling rates. These examples show that better understanding of the underlying
principles of multisensory integration might help to optimize synthesis, transmission, and presenta-
tion of multimedia content.
Future research on multisensory motion perception might make use of the principles that are
being discovered in the laboratory in order to achieve more realistic ecological stimuli using vir-
tual or augmented reality setups. It will also be interesting to study situations where the user or
observer can experience either illusory or real self-motion (see Hettinger 2002 for a recent review).
Although current multisensory motion research has concentrated mostly on situations where the observer is static, viewers often move about in real-life situations, which implies that the perception of moving objects in the surrounding environment is modulated by experienced self-motion (Probst et al. 1984; see Calabro, Soto-Faraco, and Vaina 2011, for a multisensory approach). Such investigations can shed light on the interactions between neural mechanisms involved in self-motion and object-motion perception (cf. Bremmer et al. 2005) and, in addition, may further contribute to the optimization
of media applications for training and entertainment.
ACKNOWLEDGMENTS
S.S.-F. received support from the Spanish Ministry of Science and Innovation (PSI2010-15426
and Consolider INGENIO CSD2007-00012) and the Comissionat per a Universitats i Recerca del
DIUE-Generalitat de Catalunya (SRG2009-092). A.V. was supported by Fundació La Marató de
TV3 through grant no. 071932.
REFERENCES
Alais, D., and D. Burr. 2004a. The ventriloquist effect results from near-optimal bimodal integration. Current
Biology 14: 257–262.
Alais, D., and D. Burr. 2004b. No direction-specific bimodal facilitation for audiovisual motion detection.
Cognitive Brain Research 19: 185–194.
Alink, A., W. Singer, and L. Muckli. 2008. Capture of auditory motion by vision is represented by an activation shift from auditory to visual motion cortex. Journal of Neuroscience 28: 2690–2697.
Allen, P. G., and P. A. Kolers. 1981. Sensory specificity of apparent motion. Journal of Experimental Psychology:
Human Perception and Performance 7: 1318–1326.
Anstis, S. M. 1973. Hearing with the hands. Perception 2: 337–341.
Baumann, O., and M. W. Greenlee. 2007. Neural correlates of coherent audiovisual motion perception. Cerebral
Cortex 17:1433–1443.
Berger, T. D., M. Martelli, and D. G. Pelli. 2003. Flicker flutter: Is an illusory event as good as the real thing?
Journal of Vision 3(6): 406–412.
Bertelson, P., and G. Aschersleben. 1998. Automatic visual bias of perceived auditory location. Psychonomic
Bulletin and Review 5: 482–489.
Bitton, J., and S. Agamanolis. 2004. RAW: Conveying minimally-mediated impressions of everyday life with
an audio-photographic tool. In Proceedings of CHI 2004, 495–502. ACM Press.
Bremmer, F., A. Schlack, J. R. Duhamel, W. Graf, and G. R. Fink. 2001a. Space coding in primate parietal
cortex. Neuroimage 14: S46–S51.
Bremmer, F., A. Schlack, N. J. Shah et al. 2001b. Polymodal motion processing in posterior parietal and premo-
tor cortex: A human fMRI study strongly implies equivalencies between humans and monkeys. Neuron
29: 287–296.
Bremmer, F. 2005. Navigation in space: The role of the macaque ventral intraparietal area. Journal of Physiology
566: 29–35.
Multisensory Interactions during Motion Perception 599
Burtt, H. E. 1917a. Auditory illusions of movement — A preliminary study. Journal of Experimental Psychology
2: 63–75.
Burtt, H. E. 1917b. Tactile illusions of movement. Journal of Experimental Psychology 2: 371–385.
Calabro, F., S. Soto-Faraco, and L. M. Vaina. 2011. Acoustic facilitation of object movement detection during
self-motion. Proceedings of the Royal Society B. doi:10.1098/rspb.2010.2757. In press.
Calvert, G., C. Spence, and B. E. Stein (eds). 2004. The handbook of multisensory processes. Cambridge, MA:
MIT Press.
Chion, M. 1994. Audio-vision: Sound on screen. New York: Columbia Univ. Press.
Choe, C. S., R. B. Welch, R. M. Gilford, and J. F. Juola. 1975. The ‘ventriloquist effect’: Visual dominance or
response bias? Perception and Psychophysics 18: 55–60.
Colby, C. L., J. R. Duhamel, and M. E. Goldberg. 1993. Ventral intra-parietal area of the macaque: Anatomical
location and visual response properties. Journal of Neurophysiology 69: 902–914.
Connor, S. 2000. Dumbstruck: A cultural history of ventriloquism. Oxford: Oxford Univ. Press.
de Gelder, B., and P. Bertelson. 2003. Multisensory integration, perception and ecological validity. Trends in
Cognitive Sciences 7: 460–467.
Dong, C., N. V. Swindale, and M. S. Cynader. 1999. A contingent aftereffect in the auditory system. Nature
Neuroscience 2: 863–865.
Duhamel, J. R., C. L. Colby, and M. E. Goldberg. 1991. Congruent representations of visual and somatosensory
space in single neurons of monkey ventral intra-parietal cortex (area VIP). In Brain and space, ed. J.
Palliard, 223–236. Oxford: Oxford Univ. Press.
Duhamel, J. R., C. L. Colby, and M. E. Goldberg. 1998. Ventral intra-parietal area of the macaque: Congruent
visual and somatic response properties. Journal of Neurophysiology 79: 126–136.
Eidsvik, C. 2005. Background tracks in recent cinema. In Moving image theory: Ecological considerations,
ed. J. D. Anderson and B. F. Anderson, 70–78. Carbondale, IL: Southern Illinois Univ. Press.
Ernst, M. O., and M. S. Banks. 2002. Humans integrate visual and haptic information in a statistically optimal
fashion. Nature 415: 429–433.
Ernst, M. O., and H. H. Bulthoff. 2004. Merging the senses into a robust percept. Trends in Cognitive Sciences
8: 162–169.
Exner, S. 1875. Experimentelle Untersuchung der einfachsten psychischen Processe. Pflüger's Arch Physiol
11: 403–432.
Fahlenbrach, K. 2002. Feeling sounds: Emotional aspects of music videos. In Proceedings of IGEL 2002 con-
ference, Pécs, Hungary.
Fitts, P. M., and R. L. Deininger. 1954. S–R compatibility: Correspondence among paired elements within
stimulus and response codes. Journal of Experimental Psychology 48: 483–492.
Fitts, P. M., and C. M. Seeger. 1953. S–R compatibility: Spatial characteristics of stimulus and response codes.
Journal of Experimental Psychology 46: 199–210.
Furniss, M. 1998. Art in motion: Animation aesthetics. London: John Libbey.
Gazzaniga, M. S. 1987. Perceptual and attentional processes following callosal section in humans.
Neuropsychologia 25: 119–133.
Gepshtein, S., and M. Kubovy. 2000. The emergence of visual objects in space-time. Proceedings of the National
Academy of Sciences 97: 8186–8191.
Gilbert, G. M. 1939. Dynamic psychophysics and the phi phenomenon. Archives of Psychology 237: 5–43.
Graziano, M. S. A., C. G. Gross, C. S. R. Taylor, and T. Moore. 2004. A system of multimodal areas in the pri-
mate brain. In Crossmodal space and crossmodal attention, ed. C. Spence and J. Driver, 51–68. Oxford:
Oxford Univ. Press.
Graziano, M. S. A., X. Hu, and C. G. Gross. 1997. Visuo-spatial properties of ventral premotor cortex. Journal
of Neurophysiology 77: 2268–2292.
Graziano, M. S. A., G. S. Yap, and C. G. Gross. 1994. Coding of visual space by premotor neurons. Science
266: 1054–1057.
Hagen, M. C., O. Franzen, F. McGlone, G. Essick, C. Dancer, and J. V. Pardo. 2002. Tactile motion activates
the human MT/V5 complex. European Journal of Neuroscience 16: 957–964.
Hettinger, L. J. 2002. Illusory self-motion in virtual environments. In Handbook of virtual environments, ed.
K. M. Stanney, 471–492. Hillsdale, NJ: Lawrence Erlbaum.
Hoisko, J. 2003. Early experiences of visual memory prosthesis for supporting episodic memory. International
Journal of Human–Computer Interaction 15: 209–320.
Hommel, B. 2000. The prepared reflex: Automaticity and control in stimulus–response translation. In Control
of cognitive processes: Attention and performance XVIII, ed. S. Monsell and J. Driver, 247–273.
Cambridge, MA: MIT Press.
600 The Neural Bases of Multisensory Processes
Howard, I. P., and W. B. Templeton. 1966. Human spatial orientation. New York: Wiley.
Hulin, W. S. 1927. An experimental study of apparent tactual movement. Journal of Experimental Psychology
10: 293–320.
Isono, H., S. Komiyama, and H. Tamegaya. 1996. An autostereoscopic 3-D HDTV display system with reality
and presence. SID Digest 135–138.
Kamitani, Y., and S. Shimojo. 2001. Sound-induced visual “rabbit.” Journal of Vision 1: 478a.
Kirman, J. H. 1974. Tactile apparent movement: The effects of interstimulus onset interval and stimulus dura-
tion. Perception and Psychophysics 15: 1–6.
Kitagawa, N., and S. Ichihara. 2002. Hearing visual motion in depth. Nature 416: 172–174.
Kohlrausch, A., R. Fassel, and T. Dau. 2000. The influence of carrier level and frequency on modulation and
beat-detection thresholds for sinusoidal carriers. Journal of the Acoustical Society of America 108:
723–734.
Korte, A. 1915. Kinematoscopische Untersuchungen. Zeitschrift für Psychologie 72: 193–296.
Lakatos, S., and R. N. Shepard. 1997. Constraints common to apparent motion in visual, tactile and auditory
space. Journal of Experimental Psychology: Human Perception and Performance 23: 1050–1060.
Landis, C. 1954. Determinants of the critical flicker-fusion threshold. Physiological Reviews 34: 259–286.
Len Lye filmography. Len Lye Foundation site, http://www.govettbrewster.com/LenLye/Foundation/
LenLyeFoundation.aspx (accessed 28 March 2011).
Lewkowicz, D. J. 1999. The development of temporal and spatial intermodal perception. In Cognitive con-
tributions to the perception of spatial and temporal events, ed. G. Aschersleben, 395–420. Amsterdam:
Elsevier.
Lopez-Moliner, J., and S. Soto-Faraco. 2007. Vision affects how fast we hear sounds move. Journal of Vision
7: 6.1–6.7.
Luppino, G., A. Murata, P. Govoni, and M. Matelli. 1999. Largely segregated parietofrontal connections link-
ing rostral intraparietal cortex (areas AIP and VIP) and the ventral premotor cortex (areas F5 and F4).
Experimental Brain Research 128: 181–187.
Macmillan, N. A., and C. D. Creelman. 1991. Detection theory: A user’s guide. Cambridge, UK: Cambridge
Univ. Press.
Manabe, K., and H. Riquimaroux. 2000. Sound controls velocity perception of visual apparent motion. Journal
of the Acoustical Society of Japan 21: 171–174.
Marker, C. 1962. La Jetée [Motion picture]. France: Argos Film.
Mastoropoulou, G., K. Debattista, A. Chalmers, and T. Troscianko. 2005. The influence of sound effects on
the perceived smoothness of rendered animations. Paper presented at APGV’05: Second Symposium on
Applied Perception in Graphics and Visualization, La Coruña, Spain.
Mateeff, S., J. Hohnsbein, and T. Noack. 1985. Dynamic visual capture: Apparent auditory motion induced by
a moving visual target. Perception 14: 721–727.
Maunsell, J. H. R., and D. C. Van Essen. 1983. The connections of the middle temporal visual area (MT) and their
relationship to a cortical hierarchy in the macaque monkey. Journal of Neuroscience 3: 2563–2580.
McCormick, D., and P. Mamassian. 2008. What does the illusory flash look like? Vision Research 48: 63–69.
McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature 264: 746–748.
Meyer, G. F., and S. M. Wuerger. 2001. Cross-modal integration of auditory and visual motion signals.
Neuroreport 12: 2557–2560.
Meyer, G. F., S. M. Wuerger, F. Röhrbein, and C. Zetzsche. 2005. Low-level integration of auditory and visual
motion signals requires spatial co-localisation. Experimental Brain Research 166: 538–547.
Morein-Zamir, S., S. Soto-Faraco, and A. Kingstone. 2003. Auditory capture of vision: Examining temporal
ventriloquism. Cognitive Brain Research 17: 154–163.
Ohmura, H. 1987. Intersensory influences on the perception of apparent movement. Japanese Psychological
Research 29: 1–19.
Oruç, I., S. Sinnett, W. F. Bischof, S. Soto-Faraco, K. Lock, and A. Kingstone. 2008. The effect of attention on
the illusory capture of motion in bimodal stimuli, Brain Research 1242: 200–208.
Pavani, F., E. Macaluso, J. D. Warren, J. Driver, and T. D. Griffiths. 2002. A common cortical substrate acti-
vated by horizontal and vertical sound movement in the human brain. Current Biology 12: 1584–1590.
Perrone, J. A., and A. Thiele. 2001. Speed skills: Measuring the visual speed analyzing properties of primate
MT neurons. Nature Neuroscience 4: 526–532.
Pick, H. L., D. H. Warren, and J. C. Hay. 1969. Sensory conflict in judgments of spatial direction. Perception
and Psychophysics 6: 203–205.
Priebe, N. J., S. G. Lisberger, and J. A. Movshon. 2006. Tuning for spatiotemporal frequency and speed in
directionally selective neurons of macaque striate cortex. Journal of Neuroscience 26: 2941–2950.
Probst, T., S. Krafczyk, T. Brandt, and E. Wist. 1984. Interaction between perceived self-motion and object
motion impairs vehicle guidance. Science 225: 536–538.
Radeau, M., and P. Bertelson. 1976. The effect of a textured visual field on modality dominance in a ventrilo-
quism situation. Perception and Psychophysics 20: 227–235.
Reisbeck, T. E., and K. R. Gegenfurtner. 1999. Velocity tuned mechanisms in human motion processing. Vision
Research 39: 3267–3285.
Rock, I., and C. S. Harris. 1967. Vision and touch. Scientific American 216: 96–104.
Saenz, M., L. B. Lewis, A. G. Huth, I. Fine, and C. Koch. 2008. Visual motion area MT+/V5 responds to audi-
tory motion in human sight-recovery subjects. Journal of Neuroscience 28: 5141–5148.
Sanabria, D., S. Soto-Faraco, and C. Spence. 2004a. Exploring the role of visual perceptual grouping on the
audiovisual integration of motion. Neuroreport 18: 2745–2749.
Sanabria, D., S. Soto-Faraco, and C. Spence. 2005a. Spatiotemporal interactions between audition and touch
depend on hand posture. Experimental Brain Research 165: 505–514.
Sanabria, D., S. Soto-Faraco, and C. Spence. 2005b. Assessing the influence of visual and tactile distractors on
the perception of auditory apparent motion. Experimental Brain Research 166: 548–558.
Sanabria, D., S. Soto-Faraco, J. S. Chan, and C. Spence. 2004b. When does visual perceptual grouping affect
multisensory integration? Cognitive, Affective, and Behavioral Neuroscience 4: 218–229.
Sanabria, D., C. Spence, and S. Soto-Faraco. 2007. Perceptual and decisional contributions to audiovisual inter-
actions in the perception of apparent motion: A signal detection study. Cognition 102: 299–310.
Sanchez-Vives, M. V., and M. Slater. 2005. From presence to consciousness through virtual reality. Nature
Reviews Neuroscience 4: 332–339.
Schlack, A., S. Sterbing, K. Hartung, K. P. Hoffmann, and F. Bremmer. 2005. Multisensory space representa-
tions in the macaque ventral intraparietal area. Journal of Neuroscience 25: 4616–4625.
Sekuler, R., A. B. Sekuler, and R. Lau. 1997. Sound alters visual motion perception. Nature 385: 308.
Shams, L., Y. Kamitani, and S. Shimojo. 2000. What you see is what you hear. Nature 408: 788.
Shams, L., Y. Kamitani, and S. Shimojo. 2002. Visual illusion induced by sound. Cognitive Brain Research 14:
147–152.
Shams, L., Y. Kamitani, S. Thompson, and S. Shimojo. 2001. Sound alters visual evoked potential in humans.
NeuroReport 12: 3849–3852.
Shimojo, S., C. Scheier, R. Nijhawan, L. Shams, Y. Kamitani, and K. Watanabe. 2001. Beyond perceptual
modality: Auditory effects on visual perception. Acoustical Science and Technology 22: 61–67.
Simon, J. R. 1969. Reactions towards the source of stimulation. Journal of Experimental Psychology 81:
174–176.
Snoek, C., and M. Worring. 2002. Multimodal video indexing: A review of the state-of-the-art. Multimedia
Tools and Applications 25: 5–35.
Soto-Faraco, S., and A. Kingstone. 2004. Multisensory integration of dynamic information. In The hand-
book of multisensory processes, ed. G. Calvert, C. Spence, and B. E. Stein, 49–68. Cambridge, MA:
MIT Press.
Soto-Faraco, S., A. Kingstone, and C. Spence. 2000. The role of movement and attention in modulating audio-
visual and audiotactile ‘ventriloquism’ effects. Abstracts of the Psychonomic Society 5: 40.
Soto-Faraco, S., A. Kingstone, and C. Spence. 2003. Multisensory contributions to the perception of motion.
Neuropsychologia 41: 1847–1862.
Soto-Faraco, S., A. Kingstone, and C. Spence. 2006. Integrating motion information across sensory modalities:
The role of top-down factors. In Progress in Brain Research: Visual Perception Series, vol. 155, ed. S.
Martínez-Conde et al., 273–286. Amsterdam: Elsevier.
Soto-Faraco, S., J. Lyons, M. S. Gazzaniga, C. Spence, and A. Kingstone. 2002. The ventriloquist in motion:
Illusory capture of dynamic information across sensory modalities. Cognitive Brain Research 14:
139–146.
Soto-Faraco, S., C. Spence, and A. Kingstone. 2004a. Crossmodal dynamic capture: Congruency effects in the
perception of motion across sensory modalities. Journal of Experimental Psychology: Human Perception
and Performance 30: 330–345.
Soto-Faraco, S., C. Spence, and A. Kingstone. 2005. Assessing automaticity in the audio-visual integration of
motion. Acta Psychologica 118: 71–92.
Soto-Faraco, S., C. Spence, and A. Kingstone. 2004b. Congruency effects between auditory and tactile motion:
Extending the phenomenon of cross-modal dynamic capture. Cognitive Affective and Behavioral
Neuroscience 4: 208–217.
Staal, H. E., and D. C. Donderi. 1983. The effect of sound on visual apparent movement. American Journal of
Psychology 96: 95–105.
Stein, B. E., and M. A. Meredith. 1993. The merging of the senses. Cambridge, MA: MIT Press.
Thomas, F., and O. Johnston. 1981. Disney animation: The illusion of life. New York: Abbeyville Press.
Ungerleider, L. G., and R. Desimone. 1986. Cortical connections of visual area MT in the macaque. Journal of
Comparative Neurology 248: 190–222.
Väljamäe, A., and S. Soto-Faraco. 2008. Filling-in visual motion with sounds. Acta Psychologica 129:
249–254.
Väljamäe, A., and A. Tajadura-Jiménez. 2007. Perceptual optimization of audio-visual media: Moved by sound.
In Narration and spectatorship in moving images, ed. B. Anderson and J. Anderson. Cambridge Scholars
Press.
Väljamäe, A., A. Tajadura-Jiménez, P. Larsson, D. Västfjäll, and M. Kleiner. 2008. Handheld experiences:
Using audio to enhance the illusion of self-motion. IEEE MultiMedia 15: 68–75.
van der Zee, E., and A. W. van der Meulen. 1982. The influence of field repetition frequency on the visibility of
flicker on displays. IPO Annual Progress Report 17: 76–83.
Vroomen, J., and B. de Gelder. 2000. Sound enhances visual perception: Cross-modal effects of auditory
organization on vision. Journal of Experimental Psychology: Human Perception and Performance 26:
1583–1590.
Vroomen, J., and B. de Gelder. 2003. Visual motion influences the contingent auditory motion aftereffect.
Psychological Science 14: 357–361.
Watson, A. B., and A. J. Ahumada. 1983. A look at motion in the frequency domain. In Motion: Perception and
representation, ed. J. K. Tsotsos, 1–10. New York: Association for Computing Machinery.
Watson, J. D., R. Myers, R. S. Frackowiak et al. 1993. Area V5 of the human brain: Evidence from a combined
study using positron emission tomography and magnetic resonance imaging. Cerebral Cortex 3: 79–94.
Welch, R. B. 1999. Meaning, attention, and the “unity assumption” in the intersensory bias of spatial and
temporal perceptions. In Cognitive contributions to the perception of spatial and temporal events, ed. G.
Ascherlseben, T. Bachmann, and J. Musseler, 371–387. Amsterdam: Elsevier Science.
Welch, R. B., and D. H. Warren. 1986. Intersensory interactions. In Handbook of perception and human perfor-
mance. Vol. 1, Sensory processes and perception, ed. K. R. Boff, L. Kaufman, and J. P. Thomas, 25–36.
New York: Wiley.
Welch, R. B., L. D. Duttenhurt, and D. H. Warren. 1986. Contributions of audition and vision to temporal rate
perception. Perception and Psychophysics 39: 294–300.
Wertheimer, M. 1912. Experimentelle Studien über das Sehen von Bewegung. [Experimental studies on the
visual perception of movement]. Zeitschrift für Psychologie 61: 161–265.
Wertheimer, M. 1932. Principles of perceptual organization. Psychologische Forschung 41: 301–350. Abridged
translation by M. Wertheimer, in Readings in perception, ed. D. S. Beardslee and M. Wertheimer, 115–
137. Princeton, NJ: Van Nostrand-Reinhold.
Wuerger, S. M., M. Hofbauer, and G. F. Meyer. 2003. The integration of auditory and visual motion signals at
threshold. Perception and Psychophysics 65: 1188–1196.
Zapparoli, G. C., and L. L. Reatto. 1969. The apparent movement between visual and acoustic stimulus and the
problem of intermodal relations. Acta Psychologica 29: 256–267.
Zihl, J., D. von Cramon, and N. Mai. 1983. Selective disturbance of movement vision after bilateral brain dam-
age. Brain 106: 313–340.
Zihl, J., D. von Cramon, N. Mai, and C. Schmid. 1991. Disturbance of movement vision after bilateral posterior
brain damage. Further evidence and follow up observations. Brain 114: 2235–2252.
30 Multimodal Integration during
Self-Motion in Virtual Reality
Jennifer L. Campos and Heinrich H. Bülthoff
CONTENTS
30.1 Introduction...........................................................................................................................603
30.2 Simulation Tools and Techniques..........................................................................................604
30.2.1 Visual Displays..........................................................................................................604
30.2.2 Treadmills and Self-Motion Simulators....................................................................606
30.3 Influence of Visual, Proprioceptive, and Vestibular Information on Self-Motion
Perception.............................................................................................................................. 611
30.3.1 Unisensory Self-Motion Perception........................................................................... 611
30.3.2 Multisensory Self-Motion Perception........................................................................ 613
30.3.2.1 Effects of Cue Combination........................................................................ 613
30.3.2.2 Cue Weighting under Conflict Conditions.................................................. 616
30.3.3 Unique Challenges in Studying Multisensory Self-Motion Perception..................... 618
30.4 Advantages and Disadvantages of Using Simulation Technology to Study Multisensory
Self-Motion Perception.......................................................................................................... 619
30.5 Multisensory Self-Motion Perception: An Applied Perspective........................................... 620
30.6 Summary............................................................................................................................... 622
Acknowledgments........................................................................................................................... 622
References....................................................................................................................................... 622
30.1 INTRODUCTION
Our most common everyday activities, and those most essential to our survival, typically involve
moving within and throughout our environment. Whether navigating to acquire resources, avoiding
dangerous situations, or tracking one’s position in space relative to important landmarks, accurate
self-motion perception is critically important. Self-motion perception is typically experienced when
an observer is physically moving through space, including self-propelled movements such as walking,
running, or swimming, and also when being passively moved, as on a train, or when actively driving
a car or flying a plane. Self-motion perception is important for estimating
movement parameters such as speed, distance, and heading direction. It is also important for the
control of posture, the modulation of gait, and for predicting time to contact when approaching or
avoiding obstacles. It is an essential component of path integration, which involves the accumula-
tion of self-motion information when tracking one’s position in space relative to other locations or
objects. It is also important for the formation of spatial memories when learning complex routes and
environmental layouts.
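The accumulation of self-motion information that underlies path integration can be illustrated with a minimal dead-reckoning sketch. This is a hypothetical illustration of the idea, not a model taken from this chapter: position is updated by integrating instantaneous speed and heading over time.

```python
import math

def path_integrate(samples, dt=0.1):
    """Dead reckoning: accumulate (speed, heading) samples into a 2-D position.

    samples: iterable of (speed in m/s, heading in radians) pairs, one per time step.
    Returns the (x, y) position relative to the starting point.
    """
    x = y = 0.0
    for speed, heading in samples:
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
    return x, y

# Walking at 1 m/s with heading 0 for 10 steps of 0.1 s covers approximately 1 m.
final_x, final_y = path_integrate([(1.0, 0.0)] * 10)
```

Because each step adds noise that is never corrected, errors in such an integrator accumulate over time, which is one reason redundant sensory estimates of self-motion are so useful.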
During almost all natural forms of self-motion, there are several sensory systems that provide
redundant information about the extent, speed, and direction of egocentric movement, the most
important of which include dynamic visual information (i.e., optic flow), vestibular information (i.e.,
provided through the inner ear organs including the otoliths and semicircular canals), proprioceptive
information provided by the muscles and joints, and the efference copy signals representing the com-
mands of these movements. Also important, although less well studied, are auditory signals related
to self-motion and somatosensory cues provided through wind, vibrations, and changes in pressure.
To date, much work has been done to understand how several of these individual modalities can
be used to perceive different aspects of self-motion independently. However, researchers have only
recently begun to evaluate how they are combined to form a coherent percept of self-motion and the
relative influences of each cue when more than one is available.
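The benchmark account of such cue combination in this literature is statistically optimal, maximum-likelihood integration (e.g., Ernst and Banks 2002), in which each cue is weighted by its reliability, i.e., the inverse of its variance. A minimal sketch, offered as an illustration of the model rather than code from any study described here:

```python
def mle_combine(estimates, variances):
    """Combine redundant sensory estimates by reliability (inverse-variance) weighting.

    The combined estimate is sum(w_i * s_i) with w_i = (1/var_i) / sum_j(1/var_j).
    Its variance, 1 / sum_j(1/var_j), is never larger than that of the best single cue.
    """
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    combined = sum((r / total) * s for r, s in zip(reliabilities, estimates))
    combined_var = 1.0 / total
    return combined, combined_var

# Visual estimate of traveled distance: 10 m (variance 1); vestibular: 12 m
# (variance 4). The more reliable visual cue dominates the combined estimate.
distance, variance = mle_combine([10.0, 12.0], [1.0, 4.0])
```

On this account, the "relative influence" of each cue is simply its normalized reliability, which is why degrading one modality (e.g., blurring vision) shifts weight toward the others.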
Not only is it important to take a multisensory approach to self-motion perception in order to
understand the basic science underlying cue combination, but it is also important to strive toward
evaluating human behaviors as they occur under natural, cue-rich, ecologically valid conditions.
The inherent difficulty in achieving this is that the level of control that is necessary to conduct
careful scientific evaluations is often very difficult to achieve under natural, realistic conditions.
Consequently, in order to maintain strict control over experimental conditions, much of the past
work has been conducted within impoverished, laboratory environments using unnatural tasks.
More recently, however, Virtual Reality (VR) technology and sophisticated self-motion interfaces
have been providing researchers with the opportunity to provide natural, yet tightly controlled,
stimulus conditions, while also maintaining the capacity to create unique experimental scenarios
that could not occur in the real world (Bülthoff and van Veen 2001; Loomis et al. 1999; Tarr and
Warren 2002). VR also does this in a way that maintains an important perception–action loop that
is inherent to nearly all aspects of human–environment interactions.
Visually simulated Virtual Environments (VEs) have been the most commonly used form of VR,
because, until very recently it has been difficult to simulate full-body motion through these environ-
ments without having to resort to unnatural control devices such as joysticks and keyboards. More
recently, the development of high-precision motion tracking systems and sophisticated self-motion
simulators (e.g., treadmills and motion platforms) are allowing far more control and flexibility in
the presentation of body-based self-motion cues (i.e., proprioceptive and vestibular information).
Consequently, researchers are now able to study multisensory self-motion perception in novel and
exciting ways. The significant technological advancements and increased accessibility of many VR
systems have stimulated renewed excitement about their potential, both now and in the future.
Much of the multisensory research up until this point has focused on tasks involving discrete
stimulus presentations in near body space, including visual–auditory, visual–proprioceptive, and
visual–haptic interactions. Far less is understood about how different sources of sensory informa-
tion are combined during large-scale self-motion through action space. Unlike other approaches
used to examine the integration of two specific cues at a particular, discrete instance in time, navi-
gating through the environment requires the dynamic integration of several cues across space and
over time. Understanding the principles underlying multimodal integration in this context of unfold-
ing cue dynamics provides insight into an important category of multisensory processing.
This chapter begins with a brief description of some of the different types of simulation tools and
techniques that are being used to study self-motion perception, along with some of the advantages
and disadvantages of the different interfaces. Subsequently, some of the current empirical work inves-
tigating multisensory self-motion perception using these technologies will be summarized, focusing
mainly on visual, proprioceptive, and vestibular influences during full-body self-motion through
space. Finally, the implications of this research for several applied areas will be briefly described.
VEs can be created that are unlike anything that can or does exist within the known real world. Rich, realistic visual details
can be included, or the visual scene can be intentionally limited to particular visual cues of inter-
est such as the optic flow provided through a cloud of dots or the relative positioning of selected
landmarks. Instant teleportation from one position in space to another (Meilinger et al. 2007), the
inclusion of wormholes to create non-Euclidean spaces (Schnapp and Warren 2007), and navigation
throughout four-dimensional (4-D) environments (D’Zmura et al. 2000) are all possible. This type
of control and flexibility is not something that can be achieved in a real-world testing environment.
Whereas in the past the process of using computer graphics to create more complex VEs, such as
realistic buildings or cities, was time consuming and arduous, new software advancements are now
allowing entire virtual cities of varying levels of detail to be built in just a few days (e.g., Müller et
al. 2006).
In order to allow an observer to visualize these VEs, different types of displays have been used
(for a more thorough review, see Campos et al. 2007a). Traditionally, desktop displays have been
the most commonly used visualization tool for presenting VEs. These displays typically consist of
a stationary computer monitor paired with an external control device that is used to interact with
the VE (i.e., a joystick or a mouse). Even though the quality and resolution of desktop displays have
been steadily increasing in recent years (e.g., high dynamic range displays; see Akyüz et al. 2007),
they are nonimmersive, have a limited field of view (FOV), and can accommodate very little natural
movement.
Other displays such as the Cave Automatic Virtual Environments (CAVE™; Cruz-Neira et al.
1993) and other large curved projection screen systems (e.g., Meilinger et al. 2008; http://www
.cyberneum.com/PanoLab_en.html; see Figure 30.1) provide observers with a much wider FOV by
FIGURE 30.1 MPI Panoramic projection screen. This large, spherical panoramic projection screen consists
of four projectors that project images of Virtual Environments (VEs) onto surrounding curved walls and
also the floor. This provides a field of view of more than 220° horizontal and 125° vertical, thereby taking up
almost the entire human visual field. Participants can move through the VE via various different input devices
such as bicycles, driving interfaces, or joysticks (as shown here). The VE displayed in photo is a highly realis-
tic virtual model of the city center of Tübingen. (Photo courtesy of: Axel Griesch.)
projecting images on the walls surrounding the observer, and in some cases, the floor. Such displays
often present two slightly different images (offset by the interpupillary distance),
which, when paired with stereo glasses (anaglyph stereo or polarized stereo), can provide a 3-D
display of the environment. Despite the full FOV and high level of immersion provided by these
displays, they again only allow for a limited range of active movements.
Apart from desktop displays, head-mounted displays (HMDs) are perhaps the most widely used
visualization system for navigational tasks. HMDs range in size, resolution, and FOV. Their typi-
cally small FOV is one of the main restrictions. This restriction can be partially ameliorated by
pairing the HMD with a motion tracking system that can be used to update the visual image directly
as a function of the observer’s own head movements. This allows for a greater visual sampling of the
environmental space and a more natural method of visually exploring one’s environment. HMDs
also provide a highly immersive experience because the visual information is completely restricted
to that experienced through the display by blocking out all surrounding visual input. The greatest
advantage of HMDs is the extent of mobility that is possible, allowing for natural, large-scale move-
ments through space such as walking.
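The head-tracked updating just described amounts to rendering the VE through the inverse of the tracked head pose, each frame. A minimal 2-D sketch of this transform is given below; it is a hypothetical illustration (real HMD systems use full six-degree-of-freedom poses and the graphics and tracker APIs of the specific setup):

```python
import math

def world_to_eye(head_x, head_y, head_yaw, px, py):
    """Map a world point into head-centered (eye) coordinates for rendering.

    The view transform is the inverse of the tracked head pose: translate by the
    negated head position, then rotate by the negated head yaw. With yaw measured
    from the +x axis, +x in eye coordinates is straight ahead of the observer.
    """
    c, s = math.cos(-head_yaw), math.sin(-head_yaw)
    tx, ty = px - head_x, py - head_y          # undo the head translation
    return c * tx - s * ty, s * tx + c * ty    # undo the head rotation

# A landmark 1 m in front of a head that has turned 90 degrees to the left ends
# up straight ahead (+x) in eye coordinates, so it is drawn at display center.
ex, ey = world_to_eye(0.0, 0.0, math.pi / 2, 0.0, 1.0)
```

Running this transform on every world point, every frame, is what keeps the rendered scene stable as the observer turns, and hence what enables the natural visual sampling of the environment described above.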
In terms of understanding the role of particular sources of sensory information in self-motion
perception, there is often a trade-off between having high-resolution, wide FOV displays, which
provide the most compelling visual information, and the flexibility of having a visualization system
that can move with the observer (i.e., HMD), thus providing natural body-based cues. Therefore,
using a combination of approaches is often advisable.
FIGURE 30.2 MPI Tracking Laboratory. This fully tracked, free-walking space is 12 × 12 m in size. In this
space, participants’ position and orientation are tracked using an optical tracking system (16 Vicon MX13
cameras) through monitoring of reflective markers. Information about a participant’s position and orientation
is sent from the optical trackers, via a wireless connection, to a backpack-mounted laptop worn by the
participant. This system can therefore be used both to update the visual environment as a function of
participants’ own movements (i.e., in the HMD shown here) and to capture different movement parameters.
With this setup, it is also possible to track two or more observers, thus allowing for multiuser
interactions within a VE. (Photo
courtesy of Manfred Zentsch.)
Movement parameters such as step length, facing direction, pointing direction, and body posture can also be recorded. This
provides a rich source of information as it effectively captures even subtle movement characteristics
at every instance in time (e.g., Campos et al. 2009; Siegle et al. 2009).
Other devices that are used to allow physical walking through VEs are treadmill setups. Unlike
free walking spaces, treadmills permit unconstrained walking over infinite distances. Standard
treadmills typically provide a capacity for straight, forward walking while limiting the walker to
one position in space. Essentially, this limits the body-based cues to proprioceptive information.
Most often these setups also use a handrail for stability and support, which provides additional
haptic information informing the observer of their lack of movement through space. When walk-
ing in place under such conditions, not only are the kinematics of walking different from walking
over ground (e.g., propulsive forces), but the vestibular information that is typically generated dur-
ing the acceleration phase of walking is missing. In order to account for this, other, much larger
treadmills (ranging from 1.5 to 2.5 m wide and 3 to 6 m long) have been developed, which allow for
forward, accelerated walking across the treadmill belt until a constant walking velocity is reached
(Hollerbach et al. 2000; Souman et al. 2010; Thompson et al. 2005). A harness can be used for safety
to ensure that the walker does not leave the surface of the treadmill, while still allowing the flex-
ibility of relatively unconstrained movements. Furthermore, the Sarcos Treadport system developed by Hollerbach and colleagues is equipped with a tether that can be used to push
and pull the walker in a way that simulates the accelerating or decelerating forces that accompany
walking through space (Christensen et al. 2000). This tether can also be used to simulate uphill or
downhill locomotion (Tristano et al. 2000).
608 The Neural Bases of Multisensory Processes
By pairing these types of setups with a motion tracking system, the treadmill speed can be
adjusted online in response to the observer’s own movements. Specifically, control algorithms have
been developed as a way of allowing an observer to walk naturally (including stopping and chang-
ing walking speeds), while at the same time the treadmill speed is adjusted in a way that causes
the walker to remain as centrally on the treadmill as possible (e.g., Souman et al. 2010). These
algorithms are also optimized so that the recentering movements produce accelerations that are not strong enough to create large perturbations during walking, which could cause a loss of balance. In general,
as a method of naturally moving through VEs, large linear treadmills can effectively provide pro-
prioceptive information during walking, as well as some important vestibular cues. However, they
do not allow for turning or rotational movement trajectories and can create some “noisy” vestibular
stimulation during recentering when using a control algorithm.
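A minimal sketch of such a recentering controller is given below. This is an illustration of the general idea only, not the actual algorithm of Souman et al. (2010); the function name, gain, and acceleration limit are assumed values chosen for the example:

```python
# Hypothetical sketch of a treadmill recentering controller: the belt
# speed tracks the walker's speed (feedforward), while a weak corrective
# term pulls the walker back toward the treadmill center. The belt
# acceleration is clamped so that recentering stays gentle.

def update_belt_speed(belt_speed, walker_speed, walker_offset, dt,
                      k_pos=0.3, max_accel=0.4):
    """One control step.

    belt_speed    -- current belt speed (m/s)
    walker_speed  -- walker's speed over ground, from motion tracking (m/s)
    walker_offset -- walker's distance from treadmill center (m),
                     positive toward the front edge
    dt            -- time step (s)
    k_pos         -- recentering gain (1/s), deliberately small
    max_accel     -- belt acceleration limit (m/s^2)
    """
    # Desired speed: match the walker, plus a gentle pull toward center.
    target = walker_speed + k_pos * walker_offset
    # Limit belt acceleration so recentering never perturbs balance.
    max_step = max_accel * dt
    delta = max(-max_step, min(max_step, target - belt_speed))
    return belt_speed + delta
```

The key design point the text describes is visible in the clamp: however far the walker drifts, the corrective acceleration never exceeds `max_accel`, trading recentering speed for unperturbed gait.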
Circular treadmills constitute another type of movement device that allows for limitless cur-
vilinear walking through space without reaching any end limits. During curvilinear walking, the
vestibular system is always stimulated, thus providing a rich sensory experience through both proprioceptive and inertial senses. Most circular treadmills are quite small in diameter and thus mainly
permit walking or rotating in place (e.g., Jürgens et al. 1999). Larger circular treadmills allow for
natural, full-stride walking in circles (see Figure 30.3 for an image of the MPI circular treadmill that
is 3.6 m in diameter). The MPI circular treadmill is a modified version of that originally developed
by Mittelstaedt and Mittelstaedt (1996), which includes new control and safety features and a motor-
ized handlebar that can move independently of the treadmill belt/disk. Consequently, this provides
a unique opportunity to decouple vestibular and proprioceptive information by having participants
walk in place at one rate as they are moved through space at a different rate. This is achieved by
having the participants’ rate of movement through space (i.e., inertial input) dictated by the speed at
which the handlebar is moved, while the rate at which they walk in place (i.e., proprioceptive input)
is dictated by the rate of the disk relative to the walking/handlebar speed. Using this setup, the relation between the handlebar speed and the disk speed can be systematically manipulated to provide different information to the two sensory systems.
Multimodal Integration during Self-Motion in Virtual Reality 609
FIGURE 30.3 MPI Circular Treadmill. This circular treadmill (3.6 m in diameter) allows for natural, full-stride walking in circles. It is equipped with a motorized handlebar that can move independently from treadmill belt/disk. Using this setup, the relation between handlebar speed and disk speed can be systematically manipulated to provide different information to two sensory systems. A computer monitor mounted on handlebar can also be used to present visual information during movement. (Photo courtesy of Axel Griesch.)
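The decoupling described above can be expressed compactly: the vestibular (inertial) cue follows the handlebar, whereas the proprioceptive cue reflects walking relative to the disk surface. The sketch below uses our own variable names and sign conventions, not those of the actual apparatus:

```python
# Illustrative decomposition of the circular-treadmill cues. The walker
# holds the handlebar, so rotation through space follows the handlebar;
# the legs walk over the disk, so the proprioceptive (stepping) rate
# reflects motion relative to the disk. All rates in deg/s.

def cue_rates(handlebar_rate, disk_rate):
    """Return (inertial, proprioceptive) angular cue rates."""
    inertial = handlebar_rate                     # vestibular: motion through space
    proprioceptive = handlebar_rate - disk_rate   # walking relative to disk
    return inertial, proprioceptive

# Congruent walking: disk stationary, walker circles at 20 deg/s,
# so both cues signal 20 deg/s.
congruent = cue_rates(20.0, 0.0)
# Conflict: a counter-rotating disk makes the legs walk faster
# (30 deg/s) than the body actually moves through space (20 deg/s).
conflict = cue_rates(20.0, -10.0)
```

When the disk is driven at the same rate as the handlebar, the proprioceptive term drops to zero and the walker is effectively transported passively, which is exactly the dissociation the setup is designed to produce.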
The main drawback of most of these types of treadmill systems is that they do not allow for
combinations of purely linear and rotational movements, nor can they accommodate changes in
walking direction. To address this problem, there have been a handful of attempts to develop omni-
directional treadmills that allow limitless walking in every direction (Darken et al. 1997; Iwata
1999, Torus treadmill). The newest omnidirectional treadmill, built by the Cyberwalk project (http://www.cyberwalk-project.org), is the largest, measuring 6.5 × 6.5 m (21 × 21 ft) overall with a 4 × 4 m (13 × 13 ft) walking area and weighing 11 tons (see Figure 30.4). It is made up of a series of individual treadmill belts running
in one direction (x), all mounted on two chains that move the belts in the orthogonal direction (y).
Consequently, the combined motion of belts and chains can create motion in any direction. Again,
this system is used in combination with a customized control algorithm to ensure that the walker
remains centered on the platform while allowing them to change speed and direction (Souman et
al. 2010).
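The way two orthogonal drives combine into motion in any direction can be illustrated with simple vector addition. This is a schematic only; the actual Cyberwalk controller is considerably more sophisticated (Souman et al. 2010):

```python
import math

# Schematic of the omnidirectional treadmill's drive geometry: the
# belts produce surface velocity along x, the chains along y, and the
# resultant surface motion is their vector sum.

def surface_velocity(belt_speed, chain_speed):
    """Return (magnitude in m/s, direction in degrees) of surface motion."""
    magnitude = math.hypot(belt_speed, chain_speed)
    direction = math.degrees(math.atan2(chain_speed, belt_speed))
    return magnitude, direction

# Equal belt (x) and chain (y) speeds yield diagonal motion at 45 degrees.
mag, ang = surface_velocity(1.0, 1.0)
```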
Another form of self-motion perception is that which occurs when one is passively moved
through space. In this case, proprioceptive information about lower limb movements is not avail-
able and thus, in the absence of vision, self-motion is mainly detected through vestibular cues and
other sources of nonvisual information (e.g., wind, changes in skin pressure, vibrations). In order to
understand how inertial information can be used for self-motion perception, researchers have used
devices that are able to move an observer within 2-D space, including manual wheelchairs (Allen
et al. 2004; Waller and Greenauer 2007), programmable robotic wheelchairs (Berthoz et al. 1995;
Israël et al. 1997; Siegle et al. 2009), frictionless sleds (Seidman 2008), rotating platforms (Jürgens
et al. 1999), and circular treadmills (Mittelstaedt and Mittelstaedt 1996; MPI circular treadmill, see Figure 30.3).
FIGURE 30.4 Cyberwalk Omni-directional Treadmill. This large omnidirectional treadmill was built by the Cyberwalk project (http://www.cyberwalk-project.org) and is housed at the MPI for Biological Cybernetics. It is 6.5 × 6.5 m (4 × 4 m walking area) and weighs 11 tons. It is made up of a series of individual treadmill belts running in one direction (x) all mounted on two chains that can move the belts in the orthogonal direction (y). Consequently, combined motion of belts and chains can create motion in any direction. (Photo courtesy of Tina Weidgans.)
Other devices allow for 3-D movements such as standard 6 degree-of-freedom motion
platforms (e.g., Stewart motion platform; Berger et al. 2010; Butler et al. 2010; Lehmann et al. 2008;
Riecke et al. 2006; http://www.cyberneum.com/MotionLab_en.html; see Figure 30.5). The MPI
has recently developed a completely new type of motion simulator based on an anthropomorphic
robot arm design (Teufel et al. 2007, http://www.cyberneum.com/RoboLab_en.html; see Figure
30.6). The MPI Motion Simulator can move participants linearly over a range of several meters and
can rotate them around any axis, thus offering a high degree of freedom of motion. Observers can
be passively moved along predefined trajectories (i.e., open loop; Siegle et al. 2009) or they can be
given complete interactive control of their own movements (i.e., closed loop) via a variety of input
devices, including a helicopter cyclic stick (Beykirch et al. 2007) and a steering wheel. As a conse-
quence of its structure, certain degrees of freedom, such as roll and lateral arcs, do not interact with
other degrees of freedom. Furthermore, this serial design provides a larger workspace and allows
for upside-down movements, infinite roll capabilities, and continuous centrifugal forces, all of which are not possible with traditional simulator designs.
FIGURE 30.5 MPI Stewart motion platform. The Motion Lab at the MPI for Biological Cybernetics consists of a Maxcue 600, 6 degree-of-freedom Stewart platform coupled with an 86 × 65 degree field of view projection screen mounted on the platform. Subwoofers are installed underneath the seat to produce somatosensory vibrations as a way of masking platform motors. Movements can be presented passively, or participants can control the platform via several different input devices including a helicopter cyclic stick and a 4 degree-of-freedom haptics manipulator. (Photo courtesy of Manfred Zentsch.)
FIGURE 30.6 MPI Motion Simulator. The MPI Motion Simulator is based on an anthropomorphic robot arm design and can move participants linearly over a range of several meters and can rotate them around any axis. Observers can be passively moved along predefined trajectories or they can be given complete interactive control of their own movements via a variety of input devices, such as a helicopter cyclic stick or a steering wheel. A curved projection screen can also be mounted on the end of the robot arm in front of the seated observer, or alternatively an HMD can be used to present immersive visuals. Optical tracking systems have also been mounted on the robot arm to measure the position and orientation of an observer's head or arm during pointing-based tasks. (Photo courtesy of Anne Faden.)
In summary, as evidenced by the range of interfaces now available and customizable for address-
ing particular research questions, technology is now providing a means by which to carefully eval-
uate multimodal self-motion perception. Visualization devices can be used to assess how visual
information alone can be used to perceive self-motion and can help to determine the importance
of particular visual cues. Self-motion devices are allowing for the systematic isolation of vestibular
or proprioceptive cues during both active, self-propelled movements and during passive transport.
When these different interfaces are combined, this provides the opportunity to devise very specific
multisensory scenarios. Much of this was not possible until very recently; as such, multisensory self-motion perception is an exciting and newly emerging field.
Much research has asked which sensory cues are necessary and/or sufficient to accurately perceive self-motion. Performance has been measured for
observers who only receive computer-simulated visual information in the absence of body-based
cues, and also when evaluating behaviors during movements in the complete absence of vision (e.g.,
when walking or being passively moved).
Much of the work on visual self-motion perception has looked specifically at the capacity of an
observer to use optic flow alone to effectively perceive self-motion using either sparse visual input
(i.e., textured ground plane or cloud of dots) or a rich visual scene (i.e., realistic visual environment).
For example, it has been shown that individuals are relatively accurate at using dynamic visual
information to discriminate and reproduce visually simulated traveled distances (Bremmer and
Lappe 1999; Frenz et al. 2003; Frenz and Lappe 2005; Redlick et al. 2001; Sun et al. 2004a) and
to update their landmark-relative position in space (Riecke et al. 2002). Other studies have shown
that optic flow alone can be used to estimate various other characteristics of self-motion including
direction (Warren and Hannon 1988; Warren et al. 2001), and speed (Larish and Flach 1990; Sun
et al. 2003) of self-motion through space. Optic flow can also induce postural sway in the absence
of physical movement perturbations (Lee and Aronson 1974; Lestienne et al. 1977) and can be used
to predict the time to contact with an environmental object (Lee 1976). Characteristics of visually
induced illusory self-motion, referred to as “vection,” have also received considerable interest, par-
ticularly from individuals using VR (Dichgans and Brandt 1978; Hettinger 2002; Howard 1986).
Most readers have likely experienced vection while sitting in a stationary train when a neighbor-
ing train begins to move. In this case, the global movement of the outside visual scene induces a
compelling sense of self-motion when really it is the environment (i.e., the neighboring train) that
is moving relative to you. This phenomenon highlights the extent to which vision alone can create
a compelling illusion of self-motion.
Others have studied conditions in which access to visual information is removed and only body-
based cues (e.g., inertial and proprioceptive cues) remain available during movement. It has been
clearly established that humans are able to view a static target up to 20 m away and accurately reproduce
this distance by walking an equal extent without vision (Elliott 1986; Fukusima et al. 1997; Loomis
et al. 1992; Mittelstaedt and Mittelstaedt 2001; Rieser et al. 1990; Sun et al. 2004b; Thomson 1983).
Participants can also continuously point to a previously viewed target when walking past it blind-
folded on a straight, forward trajectory (Campos et al. 2009; Loomis et al. 1992). Others have dem-
onstrated that individuals are able to estimate distance information when learning and responding
through blindfolded walking (Ellard and Shaughnessy 2003; Klatzky et al. 1998; Mittelstaedt and
Mittelstaedt 2001; Sun et al. 2004b). A recent article by Durgin et al. (2009) looked specifically at
the mechanisms through which proprioceptive information can be used to estimate an extent of self-motion, and suggested that step integration might be a form of odometry used by humans (even when
explicit step counting is not permitted). Such mechanisms are similar to those previously shown to
be used by terrestrial insects such as desert ants (Wittlinger et al. 2006). There is some evidence,
however, that step integration could be susceptible to accumulating noise and might therefore only
be reliable for short traveled distances (Cheung et al. 2007).
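Why step integration accumulates noise can be seen in a toy model: if each step contributes an independent, noisy length estimate, the spread of the summed estimate grows with the square root of the number of steps. The step length and noise values below are illustrative, not taken from the cited studies:

```python
import random
import statistics

# Toy model of step-integration odometry: distance is estimated as the
# sum of independently noisy step lengths, so error accumulates with
# path length even though the mean estimate stays accurate.

def estimate_distance(n_steps, step_len=0.7, noise_sd=0.05, rng=None):
    """Sum of independently noisy step lengths (a crude human odometer)."""
    rng = rng or random.Random()
    return sum(rng.gauss(step_len, noise_sd) for _ in range(n_steps))

rng = random.Random(42)
short = [estimate_distance(10, rng=rng) for _ in range(2000)]
long_ = [estimate_distance(100, rng=rng) for _ in range(2000)]
# The spread of the 100-step estimates is roughly sqrt(10) times that of
# the 10-step estimates: reliable for short distances, noisy for long ones.
```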
A thorough collection of research has focused specifically on investigating the role of inertial
information, mainly provided through the vestibular organs, during simple linear and rotational
movements (Berthoz et al. 1995; Bertin and Berthoz 2004; Butler et al. 2010; Harris et al. 2000;
Israël and Berthoz 1989; Ivanenko et al. 1997; Mittelstaedt and Glasauer 1991; Mittelstaedt and
Mittelstaedt 2001; Seidman 2008; Siegle et al. 2009; Yong et al. 2007) and when traveling along
more complex routes involving several different segments (Allen et al. 2004; Sholl et al. 1989; Waller
and Greenauer 2007). Some findings have been interpreted to indicate that head velocity and dis-
placement can be accurately perceived by temporally integrating the linear acceleration information
detected by the otolith system. Others indicate that the influence and/or effectiveness of vestibular
information is somewhat limited, particularly when other nonvisual information such as vibrations
are no longer available (Seidman 2008), when moving along trajectories with more complex velocity
profiles (Siegle et al. 2009) or during larger-scale navigation (Waller and Greenauer 2007).
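The temporal-integration account can be made concrete with a minimal numerical example: given a series of acceleration samples of the kind the otoliths sense, velocity and displacement follow from successive integration. This idealizes a noise-free signal with simple Euler sums; real vestibular estimates are, of course, far noisier:

```python
# Minimal numerical form of the integration account: integrate linear
# acceleration once for velocity and again for displacement.

def integrate_acceleration(accels, dt):
    """Euler-integrate acceleration samples; return (velocity, displacement)."""
    v = x = 0.0
    for a in accels:
        v += a * dt
        x += v * dt
    return v, x

# A 1 m/s^2 acceleration held for 1 s yields ~1 m/s and ~0.5 m (a*t^2/2).
v, x = integrate_acceleration([1.0] * 1000, 0.001)
```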
One line of work examined the influence of added inertial cues on obligatory egocentric spatial updating. It was found that neither in rich nor impoverished
visual conditions did the added inertial information improve performance. Unlike other studies,
however, Riecke et al. (2006) demonstrated that with a realistic, familiar visual scene, dynamic
visual information alone could be used for updating egocentric positions.
Lehmann et al. (2008) evaluated the benefits of having inertially based self-motion information
during the mental rotation of an array of objects. In this case, participants were seated on a motion
simulator while viewing a large projection screen and a virtual array of objects displayed on a table-
top directly in front of them. When having to identify which object in the array was shifted after
a viewpoint change (either physically or visually introduced), a detection advantage was observed
after the physical rotation. This indicates that the inertial information provided during the rota-
tion facilitated mental rotation, thus also supporting previous real-world studies (Simons and Wang
1998).
Others have also investigated individual cue contributions during purely linear movements. For
instance, Harris et al. (2000) evaluated the ability of participants to estimate linear trajectories
using either visual information provided through an HMD and/or vestibular sources when passively
moved on a cart. Here they found that when visual and vestibular inputs were concurrently avail-
able, estimates more closely approximated those of the purely vestibular estimates than the purely
visual estimates. The importance of body-based cues for traveled distance estimation has also been
revealed through a series of studies by Campos et al. (2007b). In these experiments, body-based
cues were provided either by: (1) natural walking in a fully tracked free walking space (propriocep-
tive and vestibular), (2) being passively moved by a robotic wheelchair (vestibular), or (3) walking
in place on a treadmill (proprioceptive). Distances were either presented through optic flow alone,
body-based cues alone, or both visual and body-based cues combined. In this case, combined cue
effects were again always observed, indicating that no modality was ever completely disregarded.
When visual and body-based cues were combined during walking, estimates more closely approxi-
mated the unisensory body-based estimates. When visual and inertial cues were combined during
passive movements, the estimates fell in between the two unisensory estimates.
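This "in between" pattern is what statistically optimal (maximum-likelihood) cue integration predicts: the combined estimate is a reliability-weighted average of the unisensory estimates, with a variance lower than that of either cue alone. A sketch with purely illustrative numbers:

```python
# Precision-weighted fusion of two independent Gaussian estimates, the
# standard maximum-likelihood cue-combination rule. Example values are
# illustrative, not data from the studies discussed.

def fuse(est_a, var_a, est_b, var_b):
    """Return (fused estimate, fused variance) for two independent cues."""
    w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_b)
    est = w_a * est_a + (1.0 - w_a) * est_b
    var = 1.0 / (1.0 / var_a + 1.0 / var_b)
    return est, var

# Visual cue says 10 m (variance 4); body-based cue says 8 m (variance 1).
est, var = fuse(10.0, 4.0, 8.0, 1.0)
# The fused estimate (8.4 m) lies between the cues, pulled toward the
# more reliable body-based cue, and the fused variance (0.8) is below
# both unisensory variances.
```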
Sun et al. (2004a) investigated the relative contributions of visual and proprioceptive information
by having participants compare two traveled distances experienced by riding a stationary bicycle
down a virtual hallway viewed in an HMD. It was concluded in this case that visual information
was predominantly used. It is important to note that when riding a bike, there is no absolute one-
to-one relationship between the metrics of visual space and those of the proprioceptive movements
because of the unknown scale of one pedal rotation (i.e., this would depend on the gear, for
instance). Even under such conditions, combined cue effects were observed, such that, when visual
and proprioceptive cues were both available, estimates differed from those in either of the unimodal
conditions.
Cue combination effects have also been evaluated for speed perception during linear self-motion
(Durgin et al. 2005; Sun et al. 2003). For instance, Durgin et al. (2005) have reported that physi-
cally moving (i.e., walking or being passively moved) during visually simulated self-motion causes
a reduction in perceived visual speed compared to situations in which visually simulated self-
motion is experienced when standing stationary. The authors attribute this to the brain’s attempt
at optimizing its efficiency when presented with two, typically correlated cues with a predictable
relationship.
Slightly more complex paths consisting of two linear segments separated by a rotation of vary-
ing angles have also been used to understand how self-motion is integrated across different types
of movements. Typically, such tasks are used to answer questions about how accurately observers
are able to continuously update their position in space without using landmarks (i.e., perform path
integration). For instance, triangle completion tasks typically require participants to travel a linear
path, rotate a particular angle, travel a second linear path, and then return to home (or face the point
of origin). In a classic triangle completion study, Klatzky et al. (1998) demonstrated that during
purely visual simulations of rotational components of the movement (i.e., the turn), participants
were highly disoriented when attempting to face back to start compared to conditions under which
full-body information was present during the rotation. In fact, the physical turn condition resulted in
errors that were almost as low as the errors in the full, real walking condition in which body infor-
mation was available during the entire route (walk, turn, walk, face start). Unlike several of the rota-
tional self-motion experiments described above, vestibular inputs during the rotational component
of this triangle completion task appeared to be very important for perceived self-orientation. Again, however, this emphasizes the importance of physical movement cues over visual cues in isolation.
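The geometry underlying the correct homing response in such tasks can be computed directly by path integration. The conventions below (start at the origin, first leg along +x, positive turns counterclockwise) are our own:

```python
import math

# Path integration for a two-leg (triangle completion) path: walk leg1,
# turn by turn_deg, walk leg2, then compute the correct homing response.

def homing_response(leg1, turn_deg, leg2):
    """Return (distance to origin, additional turn in degrees to face it)."""
    heading = math.radians(turn_deg)          # heading after the turn
    x = leg1 + leg2 * math.cos(heading)       # position after both legs
    y = leg2 * math.sin(heading)
    distance = math.hypot(x, y)
    bearing_home = math.atan2(-y, -x)         # direction back to origin
    turn_home = math.degrees(bearing_home - heading) % 360.0
    return distance, turn_home

# 3 m, 90-degree left turn, 4 m: home is 5 m away (a 3-4-5 triangle),
# and facing it requires a further counterclockwise turn of ~143 degrees.
dist, turn = homing_response(3.0, 90.0, 4.0)
```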
Using a similar task, Kearns et al. (2002) demonstrated that pure optic flow information was
sufficient to complete a return-to-origin task, although the introduction of body-based cues (pro-
prioceptive and vestibular) when walking through the virtual environment led to decreased vari-
ability in responding. This was true regardless of the amount of optic flow that was available from
the surrounding environment, thus suggesting a stronger reliance on body-based cues. When using
a two-segment path reproduction task to compare moving via a joystick versus walking on the
Torus treadmill, Iwata and Yoshida (1999) reported a higher accuracy during actual walking on the
treadmill compared to when active control of self-motion was provided through the use of an input
device.
Chance et al. (1998) used a more demanding task in which participants were asked to travel
through a virtual maze and learn the locations of several objects as they moved. At the end of the
route, when prompted, participants turned and faced the direction of a particular target. Here, the
authors compared conditions in which participants actually walked through the maze (propriocep-
tive and vestibular inputs during translation and rotation), to that in which a joystick was used to
navigate the whole maze (vision alone), to that in which a joystick was used to translate and physical
rotations were provided (proprioceptive and vestibular inputs during rotation only). When physi-
cally walking errors were the lowest, when only visual information was available errors were the
highest, and when only physical rotations were possible, responses fell in between (although they
were not significantly different from the vision-only condition). Using similar conditions, Ruddle
and Lessels (2006, 2009) observed a comparable pattern of results when evaluating performances
on a search task in a room-sized virtual environment. Specifically, conditions in which participants freely walked during their search resulted in highly accurate and efficient search performance; observers who could only physically rotate were less efficient, and those with only visual information were the least efficient.
Waller and colleagues have evaluated questions related to multisensory navigation as they relate
to larger-scale self-motion perception and the acquisition and encoding of spatial representations.
For instance, they have considered whether the inertial information provided during passive move-
ments in a car contributes to the development of an accurate representation of a route beyond the
information already provided through dynamic visual inputs (Waller et al. 2003). They found that
inertial inputs did not significantly improve performance and even when the inertial cues were
not consistent with the visuals, instead of disorienting or distracting observers, there was in fact
no impact on spatial memory. Similarly, Waller and Greenauer (2007) asked participants to travel
along a long indoor route (about 480 ft), and then evaluated their ability to perform a variety of spa-
tial tasks. Although participants learned the route under different sensory conditions—by walking
with updated vision, by being passively moved with updated vision, or by viewing a visual simula-
tion of the same movement—there appeared to be no significant effects of cue availability (however,
see Waller et al. 2004). Overall, the less obvious role of body-based cues in these larger-scale, more
cognitively demanding tasks stands in contrast to the importance of body-based cues evidenced in
simpler self-motion updating tasks. As such, future work must help to reconcile these findings and
to form a more comprehensive model of multisensory self-motion in order to understand how the
scale of a space, the accumulation of self-motion information, and the demands of the task relate to
relative cue weighting.
Not only do the effects of cue combinations exhibit themselves through consciously produced
behaviors or responses in spatial tasks, but they can also be seen in other aspects of self-motion,
including the characteristics of gait. For instance, Mohler et al. (2007a) investigated differences in
gait parameters such as walking speed, step length, and head-to-trunk angle when walking with
eyes open versus closed, and also when walking in a VE (wearing an HMD) versus walking in the
real world. It was found that participants walked slower and exhibited a shorter stride length when
walking with their eyes closed. During sighted walking while viewing the VE through the HMD,
participants walked slower and took smaller steps than when walking in the real world. Their head-
to-trunk angle was also smaller when walking in the VE, most likely due to the reduced vertical
FOV.
Similarly, Sheik-Nainar and Kaber (2007) evaluated different aspects of gait, such as speed,
cadence, and joint angles when walking on a treadmill. They evaluated the effects of presenting
participants with congruent and updated visuals (via an HMD projecting a simulated version of the
laboratory space) compared to stationary visuals (real-world laboratory space with reduced FOV to
approximate HMD). These two conditions were compared to natural, overground walking. Results
indicated that although both the treadmill conditions caused participants to walk slower and take
smaller steps, when optic flow was consistent with the walking speed, gait characteristics more
closely approximated that of overground walking.
Finally, although most of the work on multisensory self-motion perception has dealt specifically
with visual interactions with body-based cues, it is important to note that researchers have begun to
evaluate the impact of auditory cues on self-motion perception. For instance, Väljamäe et al. (2008)
have shown that sounds associated with self-motion through space, such as footsteps, can enhance
the perception of linear vection. Furthermore, Riecke et al. (2009) have shown that sounds produced
by a particular spatial location (i.e., by water flowing in a fountain) can enhance circular vection
when appropriately updated with the moving visuals.
Transient cue conflicts are typically introduced on a trial-by-trial basis in an effort to avoid adaptation effects. In this case, the idea is ultimately to understand
the relative cue weighting of visual and body-based cues when combined under normal circum-
stances. For instance, Sun et al. (2003, 2004a) used this strategy in the aforementioned simulated
bike riding experiment as a way of dissociating the proprioceptive information provided by pedal-
ing, from the yoked optic flow information provided via an HMD. In a traveled distance comparison
task, they reported an overall higher weighting of visual information when the relation between the
two cues was constantly varied. However, the presence of proprioceptive information continued to
improve visually specified distance estimates, even when it was not congruent with the visuals (Sun
et al. 2004a). On the other hand, Harris et al. (2000) used a similar technique to examine the relative
contributions of visual–vestibular information to linear self-motion estimation over several meters
and found that observers' estimates more closely approximated the distances specified by vestibular
cues than those specified by optic flow. Sun et al. (2003) also evaluated the relative contributions of
visual and proprioceptive information using a speed discrimination task while bike riding down a
virtual hallway. Here, they found that although both cues contributed to speed estimates, proprio-
ceptive information was in fact weighted higher.
Studies using smaller-scale, simulated full-body movements have also investigated visual–vestibular integration by presenting optic flow stimuli via a projection screen and vestibular information via a 6 degree-of-freedom motion platform (Butler et al. 2010; Fetsch et al. 2009; Gu et al.
2008). In this case, it has consistently been shown that the variances observed for the estimates in
the combined cue conditions are lower than estimates in either of the unisensory conditions. In the
series of traveled distance experiments by Campos et al. (2007b) described above, subtle cue con-
flicts were also created between visual and body-based cues (see also Kearns 2003). Here, incon-
gruencies were created by either changing the visual gain during physical movements or changing
the proprioceptive gain during walking (i.e., by changing the treadmill speed). Overall, the results
demonstrated a higher weighting of body-based cues during natural overground walking, a higher
weighting of proprioceptive information during treadmill walking, and a relatively equal weighting
of visual and vestibular cues during passive movement. These results were further strengthened by
the fact that the higher weighting of body-based cues during walking was unaffected by whether
visual or proprioceptive gain was manipulated.
The vast majority of the work evaluating relative cue weighting during self-motion perception
using cue conflict paradigms has considered how vision combines with different body-based cues.
Others have recently conducted some of the first experiments to use this technique for studying
proprioceptive–vestibular integration. In order to achieve this, they used the MPI circular treadmill
setup described above (see Figure 30.3). Because this treadmill setup consists of a handlebar that
can move independently of the treadmill disk, the relation between the handlebar speed and the disk
speed can be changed to provide different information to the two sensory systems.
Cue conflict techniques have also been used to evaluate the effect of changing cue relations on
various gait parameters. For instance, Prokop et al. (1997) asked participants to walk at a comfort-
able, yet constant speed on a self-driven treadmill. When optic flow was accelerated or decelerated
relative to the actual walking speed, unintentional modulations in walking speed were observed.
Specifically, when the visual speed increased, walking speeds decreased, whereas the opposite was
true for decreased visual speeds. Similarly, walk-to-run and run-to-walk transitions can be unintentionally modified by providing a walking observer with different rates of optic flow (Guerin and Bardy 2008; Mohler et al. 2007b). Again, as the rate of optic flow is increased, the speed at which an observer transitions from running to walking becomes lower, whereas the opposite is true for decreased optic flow rates.
Another group of studies has used prolonged cue conflicts as a way of investigating sensory–
motor recalibration effects during self-motion. A classic, real-world multisensory recalibration
experiment was conducted by Rieser and colleagues (1995), in which an extended mismatch was
created between visual flow and body-based cues. Using a cleverly developed setup, participants
walked on a treadmill at one speed, while it was pulled behind a tractor moving at either a faster or
slower speed. Consequently, the speed of the movement experienced motorically was either greater
or less than the speed of the visually experienced movement. After adaptation, participants walked
blindfolded to previewed visual targets. Results indicated that when the visual flow was slower than
the locomotor information participants overshot the target (relative to pretest), whereas when the
visual flow was faster than the locomotor information they undershot the target distance.
Although the approach used by Rieser et al. (1995) was ingenious, one can imagine that this
strategy can be accomplished much more easily, safely, and under more highly controlled circum-
stances by using simulation devices. Indeed, the results of Rieser et al.’s (1995) original study have
since been replicated and expanded upon using VR. This has been achieved by having participants
walk on a treadmill or within a tracked walking space while they experience either relatively faster
or slower visually perceived flow via an HMD or a large FOV projection display (Durgin et al. 2005;
Mohler et al. 2007c; Proffitt et al. 2003; Thompson et al. 2005). For instance, it has been shown that
adaptations that occur when subjects are walking through a VE on a treadmill transfer to a real-
world blind walking task (Mohler et al. 2007c). There is also some indication that the aftereffects
observed during walking on solid ground (tracked walking space) are larger than those observed
during treadmill walking (Durgin et al. 2005). Pick et al. (1999) have also shown similar recalibra-
tion effects for rotational self-motion.
In addition, response variance has been shown to decrease as additional sources of information are combined.
Cheng et al. (2007) have also summarized some of the multisensory work in locomotion and spatial
navigation and evaluated how these findings fit within the context of Bayesian theoretical predic-
tions. Overall, there remains much important work to be done concerning the development of quan-
titative models describing the principles underlying multisensory self-motion perception.
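The optimal-integration account referred to here can be made concrete with a short numerical sketch. The code below is purely illustrative, with hypothetical cue means and variances that are not drawn from any of the cited studies; it implements the standard maximum-likelihood (inverse-variance weighting) combination rule (cf. Ernst and Banks 2002), under which each added cue lowers the variance of the combined estimate:

```python
def combine(estimates, variances):
    """Fuse independent Gaussian cue estimates by inverse-variance weighting."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * e for w, e in zip(weights, estimates)) / total
    return mean, 1.0 / total  # combined estimate and its (reduced) variance

# Hypothetical walked-distance estimates, as (mean in meters, variance):
visual = (10.0, 4.0)   # less reliable visual cue
body = (12.0, 1.0)     # more reliable body-based cue
m2, v2 = combine([visual[0], body[0]], [visual[1], body[1]])

# Adding a third cue (e.g., vestibular) lowers the combined variance further:
m3, v3 = combine([visual[0], body[0], 11.0], [visual[1], body[1], 2.0])
assert v2 < min(visual[1], body[1]) and v3 < v2
```

Note that the lower-variance cue dominates the combined estimate (here m2 = 11.6, much closer to the body-based estimate), paralleling the higher weighting of body-based cues described earlier.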
There are also several clear and consistent perceptual errors that occur in VEs that do not occur
in the real world. For instance, although much research has now demonstrated that humans are very
good at estimating the distance between themselves and a stationary target in the real world (see
Loomis and Philbeck 2008 for a review), the same distance magnitudes are consistently underestimated in immersive VEs by as much as 50% (Knapp and Loomis 2004; Loomis and Knapp 2003;
Thompson et al. 2004; Witmer and Kline 1998). This effect is not entirely attributable to poor visual
graphics (Thompson et al. 2004) and although some groups have reported a distance compression
effect when the FOV is reduced and the viewer is stationary (Witmer and Kline 1998), others have
shown that when head movements are allowed under restricted FOV conditions, these effects are
not observed (Creem-Regehr et al. 2005; Knapp and Loomis 2004). Strategies have been used to
reduce this distance compression effect, for instance, by providing various forms of feedback when
interacting in the VE (Mohler et al. 2007c; Richardson and Waller 2005; Waller and Richardson
2008), yet the exact cause of this distance compression remains unknown.
Another, less-studied perceptual difference between virtual and real environments is the misperception of visual speed when walking in VEs (Banton et al. 2005; Durgin et al. 2005). For instance, Banton et al. (2005) required participants to match their visual speed (presented via an HMD) to their walking speed as they walked on a treadmill. When participants faced forward during walking, the visual speed had to be increased to about 1.6 times the walking speed in order for the two speeds to appear equal.
When motion tracking is used to visually update an observer’s position in the VE, there is also
the concern that temporal lag has the potential to create unintentional sensory conflict, disrupt the
feeling of presence, and cause cybersickness. There is also some indication that characteristics of
gait change when walking overground in a VE compared to the real world (Mohler et al. 2007a),
and walking on a treadmill in a VE is associated with increased stride frequency (Sheik-Nainar
and Kaber 2007). It is yet unknown how such changes in physical movement characteristics might
impact particular aspects of self-motion perception.
In addition to lower-level perceptual limitations of VEs, there are also higher-level cognitive
effects that can affect behavior. For instance, there is often a general awareness when interacting
within a VE that one is in fact engaging with artificially derived stimuli. Observers might react dif-
ferently to simulated scenarios, for instance, by placing a lower weighting on sensory information
that they know to be simulated. Furthermore, when visually or passively presented movements defy
what is physically possible in the real world, this information might also be treated differently. In
cue conflict situations, it has also been shown that relative cue weighting during self-motion can
change as a function of whether an observer is consciously aware of any cue conflicts that are intro-
duced (Berger and Bülthoff 2009).
There is also a discord between the perceptual attributes of the virtual world that an observer is
immersed in and the knowledge of the real world that they are physically located within. Evidence
that this awareness might impact behavior comes from findings indicating that, during a homing
task in a VE, knowledge of the size of the real-world room impacts navigational behaviors in the VE
(Nico et al. 2002). Specifically, when participants knowingly moved within a smaller real-world room, they undershot the origin in the VE compared to when they were moving within a larger real-world space (even though the VEs were of identical size).
Overall, researchers should ideally strive to exploit the advantages offered by the various avail-
able interfaces while controlling for the specific limitations through the use of others. Furthermore,
whenever possible, it is best to take the reciprocally informative approach of comparing and coordi-
nating research conducted in VR with that taking place in real-world testing scenarios.
VR and simulation technologies have now been widely adopted for use in areas as diverse as surgical, aviation, and rescue training, architectural
design, driving and flight simulation, athletic training and evaluation, psychotherapy, gaming, and
entertainment. Therefore, not only is it important to understand cue integration during relatively
simple tasks, but it is also imperative to understand these perception–action loops during more
complex, realistic, multifaceted behaviors. Although most multisensory research has focused on the
interaction of only two sensory cues, most behaviors occur in the context of a variety of sensory
inputs, and therefore understanding the interaction of three or more cues (e.g., Bresciani et al. 2008)
during ecologically valid stimulus conditions is also important. These issues are particularly criti-
cal considering the possibly grave consequences of misperceiving spatial properties or incorrectly
adapting to particular stimulus conditions. Here, we briefly consider two applied fields that we feel
are of particular interest as they relate to multisensory self-motion perception: helicopter flight
behavior and locomotor rehabilitation.
Helicopter flight represents one of the most challenging multisensory control tasks accomplished
by humans. The basic science of helicopter flight behavior is extremely complex and the effects
of specific flight simulation training on real-world performance (i.e., transfer of training) remain
poorly understood. Because several misperceptions are known to occur during helicopter flight, it
is important to first understand the possible causes of such misperceptions in a way that will allow
for more effective training procedures. One example of such a misperception that can occur when
reliable visual information is not available during flight is the somatogravic illusion. In this case,
the inertial forces during accelerations of the aircraft and those specifying gravitational forces may
become confused, thus causing an illusion of tilt during purely linear accelerations, often resulting
in devastating outcomes.
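The gravito-inertial ambiguity behind the somatogravic illusion can be made explicit with a short back-of-the-envelope calculation (ours, not drawn from any cited study). A sustained forward acceleration a tilts the gravito-inertial force vector backward from vertical by arctan(a/g); without reliable visual information, this tilt is indistinguishable from an equivalent pitch of the aircraft:

```python
import math

# During sustained forward acceleration a, the gravito-inertial vector tilts
# backward from vertical by atan(a / g). Without reliable visual cues, the
# otolith organs cannot distinguish this from an actual pitch-up of the
# aircraft. Illustrative values only.
g = 9.81                         # gravitational acceleration (m/s^2)
for a in (1.0, 3.0, 9.81):       # sustained forward accelerations (m/s^2)
    theta = math.degrees(math.atan2(a, g))
    print(f"a = {a:5.2f} m/s^2 -> predicted illusory pitch of {theta:4.1f} deg")
```

An acceleration equal in magnitude to gravity thus predicts a 45-degree illusory pitch, which conveys how compelling the illusion can become during strong takeoff accelerations.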
Several studies have been conducted using the MPI Motion Simulator by outfitting it with a heli-
copter cyclic stick and various visualization devices in order to create a unique and customizable
flight simulator. For instance, nonexpert participants were trained on the simulator to acquire the
skills required to stabilize a helicopter during a hovering task (Nusseck et al. 2008). In this case,
the robot was programmed to move in a way that mimicked particular helicopter dynamics and the
participants’ task was to hover in front of real-world targets. Two helicopter sizes were simulated:
one that was light and agile and another that was heavy and inert. Participants were initially trained
on one of the two helicopters and subsequently their performance was tested when flying the second
helicopter. This method was used to reveal the novice pilots’ ability to transfer the general flight
skills they learned on one system, to another system with different dynamics. The results indicated
that participants were able to effectively transfer the skills obtained when training in the light heli-
copter to the heavy helicopter, whereas the opposite was not true. Understanding these transfer-of-training effects is important for assessing the effectiveness of both training in simulators and flying actual aircraft, and also for understanding the subtle differences between flying familiar versus unfamiliar aircraft, something almost all pilots face at one time or another.
Another applied area that would benefit greatly from understanding multisensory self-motion
perception is the diagnosis and rehabilitative treatment of those with disabling injury or illness. A
significant percentage of the population suffers from the locomotor consequences of Parkinson’s
disease, stroke, acquired brain injuries, and other age-related conditions. Oftentimes rehabilitation
therapies consist of passive range of motion tasks (through therapist manipulation or via robotic-
assisted walking) or self-initiated repetitive action tasks. In the case of lower-limb sensory–motor
disabilities, one rehabilitative technique is to have patients walk on a treadmill as a way of actively
facilitating and promoting the movements required for locomotion. The focus of such techniques,
however, is exclusively on the motor system, with very little consideration given to the multimodal
nature of locomotion. In fact, treadmill walking actually causes a conflict between proprioceptive
information, which specifies that the person is moving, and visual information, which indicates a
complete lack of self-motion.
Considering that one of the key factors in the successful learning or relearning of motor behaviors
is feedback, a natural source of feedback can be provided by the visual flow information obtained
during walking. As such, incorporating visual feedback into rehabilitative treadmill walking thera-
pies could prove to be of great importance. Actively moving within a VE is also likely to be highly
rewarding for individuals lacking stable mobility and thus may increase levels of motivation in addi-
tion to recalibrating the perceptual–motor information.
Although some work has been done to evaluate multimodal effects in upper-limb movement
recovery, this is not something that has been investigated as thoroughly for full body locomotor
behavior such as walking. One group that has evaluated such questions is Fung et al. (2006), who used a self-paced treadmill, mounted on a small motion platform and coupled with a projection display, as a way of adapting gait behavior in stroke patients. They found that, by training with
this multimodal system, patients showed clear locomotor improvements such as increases in gait
speed and the ability to more flexibly adapt their gait when faced with changes in ground terrain.
Rehabilitation research and treatment programs can benefit greatly from the flexibility, safety, and
high level of control offered by VR and simulator systems. As such, technologies that offer multi-
modal stimulation and control are expected to have a major impact in the future [e.g., see the Toronto Rehabilitation Institute’s Challenging Environment Assessment Laboratory (CEAL); http://www.cealidapt.com].
30.6 SUMMARY
This chapter has emphasized the closed-loop nature of human locomotor behavior by evaluating
studies that preserve the coupling between perception and action during self-motion percep-
tion. This combined cue approach to understanding full body movements through space offers
unique insights into multisensory processes as they occur over space and time. Future work in
this area should aim to define the principles underlying human perceptual and cognitive pro-
cesses in the context of realistic sensory information. Using simulation techniques also allows
for a reciprocally informative approach of using VR as a useful tool for understanding basic
science questions related to the human observer in action, while also utilizing the results of this
research to provide informed methods of improving VR technologies. As such, the crosstalk
between applied fields and basic science research approaches should be strongly encouraged and
facilitated.
ACKNOWLEDGMENTS
We would like to thank members of the MPI Cyberneum group, past and present (http://www.cyberneum.com/People.html), Jan Souman, John Butler, and Ilja Frissen for fruitful discussions.
We also thank Simon Musall for invaluable assistance and two anonymous reviewers for helpful
comments.
REFERENCES
Akyüz, A. O., R. W. Fleming, B. E. Riecke, E. Reinhard, and H. H. Bülthoff. 2007. Do HDR displays support
LDR content: A psychophysical evaluation. ACM Trans Graphics 26(3:38): 1–7.
Alais, D., and D. Burr. 2004. The ventriloquist effect results from near-optimal bimodal integration. Curr Biol
14: 257–262.
Allen, G. L., K. C. Kirasic, M. A. Rashotte, and D. B. M. Haun. 2004. Aging and path integration skill:
Kinesthetic and vestibular contributions to wayfinding. Percept Psychophys 66(1): 170–179.
Bakker, N. H., P. J. Werkhoven, and P. O. Passenier. 1999. The effects of proprioceptive and visual feedback on
geographical orientation in virtual environments. Pres Teleop Virtual Environ 8: 36–53.
Banton, T., J. Stefanucci, F. Durgin, A. Fass, and D. Proffitt. 2005. The perception of walking speed in a virtual
environment. Pres Teleop Virtual Environ 14(4): 394–406.
Berger, D. R., J. Schulte-Pelkum, and H. H. Bülthoff. 2010. Simulating believable forward accelerations on a
Stewart motion platform. ACM Trans Appl Percept 7(1:5): 1–27.
Multimodal Integration during Self-Motion in Virtual Reality 623
Berger, D. R., and H. H. Bülthoff. 2009. The role of attention on the integration of visual and inertial cues. Exp
Brain Res 198(2–3): 287–300.
Berthoz, A., I. Israël, P. Georges-François, R. Grasso, and T. Tsuzuku. 1995. Spatial memory of body linear
displacement: What is being stored? Science 269: 95–98.
Bertin, R. J. V., and A. Berthoz. 2004. Visuo-Vestibular interaction in the reconstruction of travelled trajecto-
ries. Exp Brain Res 154: 11–21.
Beykirch, K., F. M. Nieuwenhuizen, H. J. Teufel, H.-G. Nusseck, J. S. Butler, and H. H. Bülthoff. 2007.
Control of a lateral helicopter sidestep maneuver on an anthropomorphic robot. Proceedings of the AIAA
Modeling and Simulation Technologies Conference and Exhibit, 1–8. American Institute of Aeronautics
and Astronautics, Reston, VA, USA.
Blake, A., H. H. Bülthoff, and D. Sheinberg. 1993. Shape from texture: Ideal observers and human psychophys-
ics. Vis Res 33: 1723–1737.
Bremmer, F., and M. Lappe. 1999. The use of optical velocities for distance discrimination and reproduction
during visually simulated self motion. Exp Brain Res 127: 33–42.
Bresciani, J.-P., F. Dammeier, and M. O. Ernst, 2008. Trimodal integration of visual, tactile and auditory signals
for the perception of sequences of events. Brain Res Bull 75(6): 753–760.
Bülthoff, H. H., and H. A. Mallot. 1988. Integration of depth modules: Stereo and shading. J Opt Soc Am 5:
1749–1758.
Bülthoff, H. H., and H.-J. van Veen. 2001. Vision and action in virtual environments: Modern psychophysics.
In Spatial cognition research. Vision and attention, ed. M. L. Jenkin and L. Harris, 233–252. New York:
Springer Verlag.
Bülthoff, H. H., and A. Yuille. 1991. Bayesian models for seeing shapes and depth. Comments Theor Biol 2(4):
283–314.
Bülthoff, H. H., and A. L. Yuille. 1996. A Bayesian framework for the integration of visual modules. In Attention
and Performance XVI: Information Integration in Perception and Communication, ed. J. McClelland and
T. Inui, 49–70. Cambridge, MA: MIT Press.
Butler, J. S., S. T. Smith, J. L. Campos, and H. H. Bülthoff. 2010. Bayesian integration of visual and vestibular
signals for heading. J Vis 10(11): 23, 1–13.
Campos, J. L., J. S. Butler, B. Mohler, and H. H. Bülthoff. 2007b. The contributions of visual flow and locomo-
tor cues to walked distance estimation in a virtual environment. Appl Percept Graphics Vis 4: 146.
Campos, J. L., P. Byrne, and H.-J. Sun. 2010. Body-based cues trump vision when estimating walked distance.
Eur J Neurosci 31: 1889–1898.
Campos, J. L., H.-G. Nusseck, C. Wallraven, B. J. Mohler, and H. H. Bülthoff. 2007a. Visualization and (mis)perceptions in virtual reality. Tagungsband 10. Proceedings of Workshop Sichtsysteme, ed. R. Möller and R. Shaker, 10–14. Aachen, Germany.
Campos, J. L., J. Siegle, B. J. Mohler, H. H. Bülthoff and J. M. Loomis. 2009. Imagined self-motion dif-
fers from perceived self-motion: Evidence from a novel continuous pointing method. PLoS ONE 4(11):
e7793. doi:10.1371/journal.pone.0007793.
Chance, S. S., F. Gaunet, A. C. Beall, and J. M. Loomis. 1998. Locomotion mode affects the updating of objects
encountered during travel: The contribution of vestibular and proprioceptive inputs to path integration.
Pres Teleop Virtual Environ 7(2): 168–178.
Cheng, K., S. Shettleworth, J. Huttenlocher, and J. J. Rieser. 2007. Bayesian integration of spatial information.
Psychol Bull 133(4): 625–637.
Cheung, A., S. Zhang, C. Stricker, and M. V. Srinivasan. 2007. Animal navigation: The difficulty of moving in
a straight line. Biol Cybern 97: 47–61.
Christensen, R., J. M. Hollerbach, Y. Xu, and S. Meek. 2000. Inertial force feedback for the Treadport locomotion interface. Pres Teleop Virtual Environ 9: 1–14.
Creem-Regehr, S. H., P. Willemsen, A. A. Gooch, and W. B. Thompson. 2005. The influence of restricted
viewing conditions on egocentric distance perception: Implications for real and virtual environments.
Perception 34(2): 191–204.
Cruz-Neira, C., D. J. Sandin, and T. A. DeFanti. 1993. Surround-screen projection-based virtual reality: The design and implementation of the CAVE. Proc SIGGRAPH, 135–142.
Darken, R. P., W. R. Cockayne, and D. Carmein. 1997. The omni-directional treadmill: A locomotion device
for virtual worlds. Proceedings of the ACM User Interface Software and Technology, Banff, Canada,
October 14–17, 213–221.
Dichgans, J., and T. Brandt. 1978. Visual–vestibular interaction: Effects on self-motion perception and postural
control. In Perception, Vol. VIII, Handbook of sensory physiology, ed. R. Held, H. W. Leibowitz, and H.
L. Teuber, 756–804. Berlin: Springer.
Durgin, F. H., A. Pelah, L. F. Fox et al. 2005. Self-motion perception during locomotor recalibration: More than
meets the eye. J Exp Psychol Hum Percept Perform 31: 398–419.
Durgin, F. H., M. Akagi, C. R. Gallistel, and W. Haiken. 2009. The precision of locomotor odometry in humans.
Exp Brain Res 193(3): 429–436.
D’Zmura, M., P. Colantoni, and G. Seyranian. 2000. Virtual environments with four or more spatial dimensions. Pres Teleop Virtual Environ 9(6): 616–631.
Ellard, C. G., and S. C. Shaughnessy. 2003. A comparison of visual and non-visual sensory inputs to walked
distance in a blind-walking task. Perception 32(5): 567–578.
Elliott, D. 1986. Continuous visual information may be important after all: A failure to replicate Thomson. J
Exp Psychol Hum Percept Perform 12: 388–391.
Engel, D., C. Curio, L. Tcheang, B. J. Mohler, and H. H. Bülthoff. 2008. A psychophysically calibrated control-
ler for navigating through large environments in a limited free-walking space. In Proceedings of the 2008
ACM Symposium on Virtual Reality Software and Technology, ed. S. Feiner, D. Thalmann, P. Guitton, B.
Fröhlich, E. Kruijff, M. Hachet, 157–164. New York: ACM Press.
Ernst, M. O., and M. S. Banks. 2002. Humans integrate visual and haptic information in a statistically optimal
fashion. Nature 415: 429–433.
Ernst, M. O., and H. H. Bülthoff. 2004. Merging the senses into a robust percept. Trends Cogn Sci 8: 162–169.
Fetsch, C. R., A. H. Turner, G. C. DeAngelis, and D. E. Angelaki. 2009. Dynamic reweighting of visual and
vestibular cues during self-motion perception. J Neurosci 29(49): 15601–15612.
Frenz, H., F. Bremmer, and M. Lappe. 2003. Discrimination of travel distances from “situated” optic flow. Vis
Res 43(20): 2173–2183.
Frenz, H., and M. Lappe. 2005. Absolute travel distance from optic flow. Vis Res 45(13): 1679–1692.
Fukusima, S. S., J. M. Loomis, and J. A. DaSilva. 1997. Visual perception of egocentric distance as assessed by
triangulation. J Exp Psychol Hum Percept Perform 23: 86–100.
Fung, J., C. L. Richards, F. Malouin, B. J. McFadyen, and A. Lamontagne. 2006. A treadmill and motion
coupled virtual reality system for gait training post-stroke. Cyberpsych Behav 9(2): 157–162.
Gu, Y., D. E. Angelaki, and G. C. Deangelis. 2008. Neural correlates of multisensory cue integration in macaque
MSTd. Nat Neurosci 11(10): 1201–1210.
Guerin, P., and B. G. Bardy. 2008. Optical modulation of locomotion and energy expenditure at preferred tran-
sition speed. Exp Brain Res 189: 393–402.
Harris, L. R., M. Jenkin, and D. C. Zikovitz. 2000. Visual and non-visual cues in the perception of linear self-
motion. Exp Brain Res 135: 12–21.
Hettinger, L. J. 2002. Illusory self-motion in virtual environments. In Handbook of virtual environments, ed.
K. M. Stanney, 471–492. Hillsdale, NJ: Lawrence Erlbaum.
Hollerbach, J. M., Y. Xu, R. Christensen, and S. C. Jacobsen. 2000. Design specifications for the second generation Sarcos Treadport locomotion interface. Haptics Symposium, Proc. ASME Dynamic Systems and Control Division, DSC-Vol. 69-2, 1293–1298, Orlando, November.
Howard, I. P. 1986. The perception of posture, self-motion, and the visual vertical. In Sensory processes and
perception, Vol. I, Handbook of human perception and performance, ed. K. R. Boff, L. Kaufman, and
J. P. Thomas, 18.1–18.62, New York: Wiley.
Israël, I., and A. Berthoz. 1989. Contributions of the otoliths to the calculation of linear displacement.
J Neurophysiol 62(1): 247–263.
Israël, I., R. Grasso, P. Georges-Francois, T. Tsuzuku, and A. Berthoz. 1997. Spatial memory and path integration
studied by self-driven passive linear displacement: I. Basic properties. J Neurophysiol 77: 3180–3192.
Ivanenko, Y. P., R. Grasso, I. Israël, and A. Berthoz. 1997. The contributions of otoliths and semicircular
canals to the perception of two-dimensional passive whole-body motion in humans. J Physiol 502(1):
223–233.
Iwata, H. 1999. Walking about virtual environments on an infinite floor. IEEE Virtual Real 13–17, March.
Iwata, H., and Y. Yoshida. 1999. Path reproduction tests using a torus treadmill. Pres Teleop Virtual Environ
8(6): 587–597.
Jürgens, R., and W. Becker. 2006. Perception of angular displacement without landmarks: Evidence for
Bayesian fusion of vestibular, optokinetic, podokinesthetic, and cognitive information. Exp Brain Res
174(3): 528–543.
Jürgens, R., T. Boß, and W. Becker. 1999. Estimation of self-turning in the dark: Comparison between active
and passive rotation. Exp Brain Res 128: 491–504.
Kearns, M. J., W. H. Warren, A. P. Duchon, and M. J. Tarr. 2002. Path integration from optic flow and body
senses in a homing task. Perception 31: 349–374.
Kearns, M. J. 2003. The roles of vision and body senses in a homing task: The visual environment matters.
Unpublished doctoral thesis, Brown University.
Klatzky, R. L., J. M. Loomis, A. C. Beall, S. S. Chance, and R. G. Golledge. 1998. Spatial updating of self-
position and orientation during real, imagined, and virtual locomotion. Psychol Sci 9(4): 293–298.
Knapp, J. M., and J. M. Loomis. 2004. Limited field of view of head-mounted displays is not the cause of dis-
tance underestimation in virtual environments. Pres Teleop Virtual Environ 13(5): 572–577.
Knill, D. C., and J. A. Saunders. 2003. Do humans optimally integrate stereo and texture information for judg-
ments of surface slant? Vis Res 43: 2539–2558.
Kording, K. P., and D. M. Wolpert. 2004. Bayesian integration in sensorimotor learning. Nature 427(15):
244–247.
Larish, J. F., and J. M. Flach. 1990. Sources of optical information useful for perception of speed of rectilinear
self-motion. J Exp Psychol Hum Percept Perform 16: 295–302.
Lathrop, W. B., and M. K. Kaiser. 2002. Perceived orientation in physical and virtual environments: Changes
in perceived orientation as a function of idiothetic information available. Pres Teleop Virtual Environ
11(1): 19–32.
Laurens, J., and J. Droulez. 2007. Bayesian processing of vestibular information. Biol Cybern 96: 389–404.
Lee, D. N. 1976. Theory of visual control of braking based on information about time-to-collision. Perception
5(4): 437–459.
Lee, D. N., and E. Aronson. 1974. Visual proprioceptive control of standing in human infants. Percept
Psychophys 15(3): 529–532.
Lehmann, A., M. Vidal, and H. H. Bülthoff. 2008. A high-end virtual reality setup for the study of mental rotations. Pres Teleop Virtual Environ 17(4): 365–375.
Lestienne, F., J. Soechting, and A. Berthoz. 1977. Postural readjustments induced by linear vection of visual
scenes. Exp Brain Res 28(3–4): 363–384.
Loomis, J. M., J. A. Da Silva, N. Fujita, and S. S. Fukusima. 1992. Visual space perception and visually directed
action. J Exp Psychol Hum Percept Perform 18: 906–921.
Loomis, J. M., J. J. Blascovich, and A. C. Beall. 1999. Immersive virtual environment technology as a basic
research tool in psychology. Behav Res Methods Instrum Comp 31(4): 557–564.
Loomis, J. M., and J. M. Knapp. 2003. Visual perception of egocentric distance in real and virtual environ-
ments. In Virtual and adaptive environments, ed. L. J. Hettinger and M. W. Haas, 21–46. Mahwah, NJ:
Erlbaum.
Loomis, J. M., and J. W. Philbeck. 2008. Measuring perception with spatial updating and action. In Embodiment,
ego-space and action, ed. R. L. Klatzky, M. Behrmann, and B. MacWhinney, 1–42. Mahwah, NJ:
Erlbaum.
MacNeilage, P. R., M. S. Banks, D. R. Berger, and H. H. Bülthoff. 2007. A Bayesian model of the disambigu-
ation of gravitoinertial force by visual cues. Exp Brain Res 179: 263–290.
Meehan, M., S. Razzaque, B. Insko, M. Whitton, and F. P. Brooks. 2005. Review of four studies on the use of
physiological reaction as a measure of presence in stressful Virtual Environments. Appl Psychophysiol
Biofeedback 30(3): 239–258.
Meilinger, T., B. E. Riecke, and H. H. Bülthoff. 2007. Orientation specificity in long-term memory for environmental spaces. Proceedings of the Cognitive Science Society, Nashville, Tennessee, USA, August 1–4, 479–484.
Meilinger, T., M. Knauff, and H. H. Bülthoff. 2008. Working memory in wayfinding: A dual task experiment in
a virtual city. Proc Cog Sci 32(4): 755–770.
Mittelstaedt, M. L., and S. Glasauer. 1991. Idiothetic navigation in gerbils and humans. Zool J Physiol 95:
427–435.
Mittelstaedt, M. L., and H. Mittelstaedt. 1996. The influence of otoliths and somatic graviceptors on angular
velocity estimation. J Vestib Res 6(5): 355–366.
Mittelstaedt, M. L., and H. Mittelstaedt. 2001. Idiothetic navigation in humans: Estimation of path length. Exp
Brain Res 139: 318–332.
Mohler, B. J., J. L. Campos, M. Weyel, and H. H. Bülthoff. 2007a. Gait parameters while walking in a head-
mounted display virtual environment and the real world. Proc Eurographics, 85–88.
Mohler, B. J., W. B. Thompson, S. H. Creem-Regehr, H. L. Pick, and W. H. Warren. 2007b. Visual flow influ-
ences gait transition speed and preferred walking speed. Exp Brain Res 181(2): 221–228.
Mohler, B. J., W. B. Thompson, S. H. Creem-Regehr, P. Willemsen, H. L. Pick, and J. J. Rieser. 2007c.
Calibration of locomotion due to visual motion in a treadmill-based virtual environment. ACM Trans
Appl Percept 4(1): 20–32.
Müller, P., P. Wonka, S. Haegler, A. Ulmer, and L. Van Gool. 2006. Procedural modeling of buildings. Proc
ACM SIGGRAPH 2006/ACM Transactions on Graphics (TOG), 25(3): 614–623. New York: ACM Press.
Nico, D., I. Israël, and A. Berthoz. 2002. Interaction of visual and idiothetic information in a path completion
task. Exp Brain Res 146: 379–382.
Nusseck, H.-G., H. J. Teufel, F. M. Nieuwenhuizen, and H. H. Bülthoff. 2008. Learning system dynamics:
Transfer of training in a helicopter hover simulator. Proc AIAA Modeling and Simulation Technologies
Conference and Exhibit, 1–11, AIAA, Reston, VA, USA.
Peck, T. C., M. C. Whitton, and H. Fuchs. 2008. Evaluation of reorientation techniques for walking in
large virtual environments. Proceedings of IEEE Virtual Reality, Reno, NV, 121–128. IEEE Computer
Society.
Pick, H. L., D. H. Warren, and J. C. Hay. 1969. Conflict in judgments of spatial direction. Percept Psychophys
6(4): 203.
Pick, H. L., D. Wagner, J. J. Rieser, and A. E. Garing. 1999. The recalibration of rotational locomotion. J Exp
Psychol Hum Percept Perform 25(5): 1179–1188.
Proffitt, D. R., J. Stefanucci, T. Banton, and W. Epstein. 2003. The role of effort in perceiving distance. Psychol
Sci 14(2): 106–112.
Prokop, T., M. Schubert, and W. Berger. 1997. Visual influence on human locomotion. Exp Brain Res 114:
63–70.
Razzaque, S., Z. Kohn, and M. Whitton. 2001. Redirected walking. Proceedings of Eurographics, 289–294.
Manchester, UK.
Razzaque, S., D. Swapp, M. Slater, M. C. Whitton, and A. Steed. 2002. Redirected walking in place. Proceedings
of Eurographics, 123–130.
Redlick, F. P., M. Jenkin, and L. R. Harris. 2001. Humans can use optic flow to estimate distance of travel. Vis
Res 41: 213–219.
Richardson, A. R., and D. Waller. 2005. The effect of feedback training on distance estimation in Virtual
Environments. Appl Cogn Psychol 19: 1089–1108.
Riecke, B. E., D. W. Cunningham, and H. H. Bülthoff. 2006. Spatial updating in virtual reality: The sufficiency
of visual information. Psychol Res 71(3): 298–313.
Riecke, B. E., H. A. H. C. van Veen, and H. H. Bülthoff, 2002. Visual homing is possible without landmarks—
A path integration study in virtual reality. Pres Teleop Virtual Environ 11(5): 443–473.
Riecke, B. E., A. Väljamäe, and J. Schulte-Pelkum. 2009. Moving sounds enhance the visually-induced self-
motion illusion (circular vection) in Virtual Reality. ACM Trans Appl Percept 6(2): 1–27.
Rieser, J. J., D. H. Ashmead, C. R. Talor, and G. A. Youngquist. 1990. Visual perception and the guidance of
locomotion without vision to previously seen targets. Perception 19(5): 675–689.
Rieser, J. J., H. L. Pick, D. H. Ashmead, and A. E. Garing. 1995. Calibration of human locomotion and models
of perceptual motor organization. J Exp Psychol Hum Percept Perform 21(3): 480–497.
Ruddle, R. A., and S. Lessels. 2006. For efficient navigational search humans require full physical movement
but not a rich visual scene. Psychol Sci 17: 460–465.
Ruddle, R. A., and S. Lessels. 2009. The benefits of using a walking interface to navigate virtual environments.
ACM Trans Comput-Hum Interact 16(1): 1–18.
Rushton, S. K., J. M. Harris, and M. R. Lloyd. 1998. Guidance of locomotion on foot uses perceived target
location rather than optic flow. Curr Biol 8(21): 1191–1194.
Schnapp, B., and W. Warren. 2007. Wormholes in Virtual Reality: What spatial knowledge is learned for navi-
gation? J Vis 7(9): 758, 758a.
Seidman, S. H. 2008 Translational motion perception and vestiboocular responses in the absence of non-inertial
cues. Exp Brain Res 184: 13–29.
Sheik-Nainar, M. A., and D. B. Kaber. 2007. The utility of a Virtual Reality locomotion interface for studying
gait behavior. Hum Factors 49(4): 696–709.
Sholl, M. J. 1989. The relation between horizontality and rod-and-frame and vestibular navigational perfor-
mance. J Exp Psychol Learn Mem Cogn 15: 110–125.
Siegle, J., J. L. Campos, B. J. Mohler, J. M. Loomis, and H. H. Bülthoff. 2009. Measurement of instantaneous
perceived self-motion using continuous pointing. Exp Brain Res 195(3): 429–444.
Simons, D. J., and R. F. Wang. 1998. Perceiving real-world viewpoint changes. Psychol Sci 9: 315–320.
Souman, J. L., P. Robuffo Giordano, I. Frissen, A. De Luca, and M. O. Ernst. 2010. Making virtual walking real:
Perceptual evaluation of a new treadmill control algorithm. ACM Trans Appl Percept 7(2:11): 1–14.
Souman, J. L., I. Frissen, M. Sreenivasa, and M. O. Ernst. 2009. Walking straight into circles. Curr Biol 19(18):
1538–1542.
Multimodal Integration during Self-Motion in Virtual Reality 627
Sun, H.-J., A. J. Lee, J. L. Campos, G. S. W. Chan, and D. H. Zhang. 2003. Multisensory integration in speed
estimation during self-motion. Cyberpsychol Behav 6(5): 509–518.
Sun, H.-J., J. L. Campos, and G. S. W. Chan. 2004a. Multisensory integration in the estimation of relative path
length. Exp Brain Res 154(2): 246–254.
Sun, H.-J., J. L. Campos, G. S. W. Chan, M. Young, and C. Ellard. 2004b. The contributions of static visual
cues, nonvisual cues, and optic flow in distance estimation. Perception 33: 49–65.
Tarr, M. J., and W. H. Warren. 2002. Virtual reality in behavioral neuroscience and beyond. Nat Neurosci 5:
1089–1092.
Teufel, H. J., H.-G. Nusseck, K. A. Beykirch, J. S. Butler, M. Kerger, and H. H. Bülthoff. 2007. MPI Motion
Simulator: Development and analysis of a novel motion simulator. Proc AIAA Modeling and Simulation
Technologies Conference and Exhibit, 1–11, American Institute of Aeronautics and Astronautics, Reston,
VA, USA.
Thompson, W. B., P. Willemsen, A. A. Gooch, S. H. Creem-Regehr, J. M. Loomis, and A. C. Beall. 2004. Does
the quality of the computer graphics matter when judging distances in visually immersive environments?
Pres Teleop Virtual Environ 13(5): 560–571.
Thompson, W. B., S. H. Creem-Regehr, B. J. Mohler, and P. Willemsen. 2005. Investigations on the interac-
tions between vision and locomotion using a treadmill Virtual Environment. Proc. SPIE/IS&T Human
Vision & Electronic Imaging Conference, January.
Thomson, J. A. 1983. Is continuous visual monitoring necessary in visually guided locomotion? J Exp Psychol
Hum Percept Perform 9: 427–443.
Tristano, D., J. M. Hollerbach, and R. Christensen. 2000. Slope display on a locomotion interface. In
Experimental Robotics VI, ed. P. Corke and J. Trevelyan, 193–201. London: Springer-Verlag.
Väljamäe, A., P. Larsson, D. Västfjäll, and M. Kleiner. 2008. Sound representing self-motion in Virtual
Environments enhances linear vection. Pres Teleop Virtual Environ 17(1): 43–56.
Waller, D., J. M. Loomis, and D. B. M. Haun. 2004. Body-based senses enhance knowledge of directions in
large-scale environments. Psychon Bull Rev 11(1): 157–163.
Waller, D., J. M. Loomis, and S. D. Steck. 2003. Inertial cues do not enhance knowledge of environmental
layout. Psychon Bull Rev 10: 987–993.
Waller, D., E. Bachmann, E. Hodgson, and A. C. Beall. 2007. The HIVE: A Huge Immersive Virtual Environment
for research in spatial cognition. Behav Res Methods 39: 835–843.
Waller, D., and N. Greenauer. 2007. The role of body-based sensory information in the acquisition of enduring
spatial representations. Psychol Res 71(3): 322–332.
Waller, D., and A. R. Richardson. 2008. Correcting distance estimates by interacting with immersive virtual
environments: Effects of task and available sensory information. J Exp Psychol Appl 14(1): 61–72.
Warren, W. H., and D. J. Hannon. 1998. Direction of self-motion is perceived from optical-flow. Nature 336:
162–163.
Warren, W. H., B. A. Kay, W. D. Zosh, A. P. Duchon, and S. Sahuc. 2001. Optic flow is used to control human
walking. Nat Neurosci 4: 213–216.
Warren, W. H., and K. J. Kurtz. 1992. The role of central and peripheral vision in perceiving the direction of
self-motion. Percept Psycho 51(5): 443–454.
Welch, R. B., and D. H. Warren. 1980. Immediate perceptual response to intersensory discrepancy. Psychol
Bull 88(3): 638–667.
Welchman, A. E., J. M. Lam, and H. H. Bülthoff. 2008. Bayesian motion estimation accounts for a surprising
bias in 3D vision. Proc Natl Acad Sci U S A 105(33): 12087–12092.
Wilkie, R. M., and J. P. Wann. 2005. The role of visual and nonvisual information in the control of locomotion.
J Exp Psychol Hum Percept Perform 31(5): 901–911.
Witmer, B. G., and P. B. Kline. 1998. Judging perceived and traversed distance in virtual environments. Pres
Teleop Virtual Environ 7: 144–167.
Wittlinger, M., R. Wehner, and H. Wolf. 2006. The ant odometer: Stepping on stilts and stumps. Science 312:
1965–1967.
Yong, N. A., G. D. Paige, and S. H. Seidman. 2007. Multiple sensory cues underlying the perception of transla-
tion and path. J Neurophysiol 97: 1100–1113.
31 Visual–Vestibular Integration for Self-Motion Perception
Gregory C. DeAngelis and Dora E. Angelaki
CONTENTS
31.1 The Problem of Self-Motion Perception and the Utility of Visual–Vestibular Integration......... 629
31.1.1 Optic Flow................................................................................. 630
31.1.2 Vestibular Signals......................................................................... 630
31.2 Potential Neural Substrates for Visual–Vestibular Integration.................................. 631
31.3 Heading Tuning and Spatial Reference Frames in Area MSTd....................................... 633
31.3.1 Heading Tuning............................................................................. 633
31.3.2 Reference Frames........................................................................... 634
31.4 The Neuronal Combination Rule and Its Dependence on Cue Reliability............................ 636
31.5 Linking Neuronal and Perceptual Correlates of Multisensory Integration......................... 639
31.5.1 Behavioral Results......................................................................... 640
31.5.2 Neurophysiological Results................................................................. 641
31.5.3 Correlations with Behavioral Choice........................................................ 642
31.6 Conclusion..................................................................................... 644
Acknowledgments..................................................................................... 645
References.......................................................................................... 645
Fernandez and Goldberg 1976a, 1976b; Guedry 1974). The otoliths behave much like linear accel-
erometers, and otolith afferents provide the basis for directional selectivity that could in principle
be used to guide heading judgments. Indeed, with a sensory organ that signals real inertial motion
of the head, one might ask why the nervous system should rely on visual information at all. Part
of the answer is that even a reliable linear accelerometer has shortcomings, such as the inability to
encode constant-velocity motion and the inability to distinguish between translation and tilt relative
to gravity (due to Einstein’s equivalence principle). The latter problem may be resolved using angu-
lar velocity signals from the semicircular canals (Angelaki et al. 1999, 2004; Merfeld et al. 1999),
but the properties of the canals render this strategy ineffective during low-frequency motion or
static tilts. In fact, in the absence of visual cues, linear acceleration is often misperceived as tilt (the
somatogravic illusion; Previc et al. 1992; Wolfe and Cramer 1970). This illusion can be quite dangerous for aviators, who feel compelled to pitch the nose of the aircraft downward to compensate for a nonexistent upward tilt, when in fact what they are experiencing is linear inertial acceleration.
In summary, both the visual and vestibular systems are limited in their ability to unambiguously
signal self-motion. A sensible approach for heading estimation would thus be to combine visual and
vestibular information to overcome the limitations of each modality on its own. As discussed fur-
ther below, this cross-modal integration can also improve perceptual discrimination of heading over
what is possible for each modality alone. Thus, we suggest that multisensory integration of visual
and vestibular inputs provides dual benefits: it overcomes important limitations of each sensory
system alone and it provides increased sensitivity when both systems are active.
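The second benefit, increased sensitivity, is usually formalized as maximum-likelihood cue combination: each cue is weighted by its reliability (inverse variance), and the combined estimate has lower variance than either cue alone. A minimal sketch, with hypothetical discrimination thresholds standing in for measured ones:

```python
import numpy as np

def combined_sigma(sigma_vis, sigma_vest):
    # Maximum-likelihood integration: inverse variances (reliabilities) add.
    return 1.0 / np.sqrt(1.0 / sigma_vis**2 + 1.0 / sigma_vest**2)

def cue_weights(sigma_vis, sigma_vest):
    # Each cue is weighted by its relative reliability; weights sum to 1.
    w_vis = sigma_vest**2 / (sigma_vis**2 + sigma_vest**2)
    return w_vis, 1.0 - w_vis

sv, su = 3.0, 4.0  # hypothetical visual and vestibular heading thresholds (deg)
print(round(combined_sigma(sv, su), 3))  # 2.4 -- better than either cue alone
print(cue_weights(sv, su))               # the more reliable visual cue gets more weight
```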
FIGURE 31.1 (See color insert.) Illustration of some of the areas thought to be involved in processing of
visual and/or vestibular signals for self-motion perception (see text for details). A partially inflated surface
of cerebral cortex of a macaque monkey is shown. Colored regions indicate different functionally and ana-
tomically defined areas. MST, medial superior temporal; VIP, ventral intra-parietal; PIVC, parieto-insular
vestibular cortex.
integrating visual and vestibular signals to subserve heading perception because (1) they have large
receptive fields and selectivity for complex optic flow patterns that simulate self-motion (Duffy
and Wurtz 1991, 1995; Tanaka et al. 1986; Tanaka and Saito 1989; Schaafsma and Duysens 1996;
Bremmer et al. 2002a), (2) they show some compensation for shifts in the focus of expansion due to
pursuit eye movements (Bradley et al. 1996; Zhang et al. 2004; Page and Duffy 1999), and (3) they
have been causally linked to heading judgments based on optic flow in microstimulation studies
(Britten and van Wezel 1998, 2002; Zhang and Britten 2003). Perhaps most importantly, MSTd and
VIP also contain neurons sensitive to physical translation in darkness (Bremmer et al. 1999, 2002b;
Duffy 1998; Gu et al. 2006; Chen et al. 2007; Schlack et al. 2002; Takahashi et al. 2007; Chowdhury
et al. 2009). This suggests the presence of vestibular signals that may be useful for heading percep-
tion, and thus the potential for integration with optic flow signals.
In addition to regions conventionally considered to be largely visual in nature, there are sev-
eral potential loci within the vestibular system where otolith-driven signals regarding transla-
tion could be combined with optic flow signals. Putative visual–vestibular convergence has been
reported as early as one or two synapses from the vestibular periphery, in the brainstem vestibular
nuclei (Daunton and Thomsen 1979; Henn et al. 1974; Robinson 1977; Waespe and Henn 1977) and
vestibulo-cerebellum (Markert et al. 1988; Waespe et al. 1981; Waespe and Henn 1981). However,
responses to visual (optokinetic) stimuli within these subcortical circuits are more likely related
to gaze stabilization and eye movements [optokinetic nystagmus (OKN), vestibulo-ocular reflex
(VOR), and/or smooth pursuit] rather than self-motion perception per se. This conclusion is sup-
ported by recent experiments (Bryan and Angelaki 2008) showing a lack of optic-flow responsive-
ness in the vestibular and deep cerebellar nuclei when animals were required to fixate a head-fixed
target (suppressing OKN).
At higher stages of vestibular processing, several interconnected cortical areas have tradition-
ally been recognized as “vestibular cortex” (Fukushima 1997; Guldin and Grusser 1998), and are
believed to receive multiple sensory inputs, including visual, vestibular, and somatosensory/proprioceptive signals. Specifically, three main cortical areas (Figure 31.1) have been characterized as exhibiting responses to vestibular stimulation and/or receiving short-latency vestibular signals (trisynaptic, through the vestibular nuclei and the thalamus). These include: (1) area 2v, located
in the transition zone of areas 2, 5, and 7 near the lateral tip of the intraparietal sulcus (Schwarz and
Fredrickson 1971a, 1971b; Fredrickson et al. 1966; Buttner and Buettner 1978); (2) the parietoinsu-
lar vestibular cortex (PIVC), located between the auditory and secondary somatosensory cortices
(Grusser et al. 1990a, 1990b); and (3) area 3a, located within the central sulcus extending into the
anterior bank of the precentral gyrus (Odkvist et al. 1974; Guldin et al. 1992). In addition to show-
ing vestibular responsiveness, neurons in PIVC (Grusser et al. 1990b) and 2v (Buttner and Buettner
1978) were reported to show an influence of visual/optokinetic stimulation, similar to subcortical
structures. However, these studies did not conclusively demonstrate that neurons in any of these
areas provide robust information about self-motion from optic flow. Indeed, we have recently shown
that PIVC neurons generally do not respond to brief (2-second) optic flow stimuli with a Gaussian
velocity profile (Chen et al. 2010), whereas these same visual stimuli elicit very robust directional
responses in areas MSTd and VIP (Gu et al. 2006; Chen et al. 2007). Thus far, we also have not
encountered robust optic flow selectivity in area 2v (unpublished observations).
In summary, the full repertoire of brain regions that carry robust signals related to both optic flow and inertial motion remains to be fully characterized, and other areas that serve as important players
in multisensory integration for self-motion perception may yet emerge. However, two aspects of the
available data are fairly clear. First, extrastriate areas MSTd and VIP contain robust representations
of self-motion direction based on both visual and vestibular cues. Second, traditional vestibular cor-
tical areas (PIVC, 2v) do not appear to have sufficiently robust responses to optic flow to be serious
candidates for the neural basis of multimodal heading perception. In the remainder of this review,
we shall therefore focus on what is known about visual–vestibular integration in area MSTd, as this
area has been best studied so far.
FIGURE 31.2 (a–c) Apparatus and stimuli used to examine visual–vestibular interactions in rhesus mon-
keys. (a) 3-D virtual reality system, (b) heading trajectories, and (c) velocity and acceleration profiles used by
Gu et al. (2006). 3-D heading tuning functions of two example MSTd neurons: (d) a “congruent cell” and (e) an
“opposite” cell. Firing rate (grayscale) is plotted as a function of azimuth (abscissa) and elevation (ordinate)
of heading trajectory. For each cell, tuning was measured in three stimulus conditions: vestibular (inertial
motion only), visual (optic flow only), and combined visual–vestibular stimulation. (Adapted from Gu, Y. et
al., J. Neurosci., 26, 73–85, 2006.)
stimuli with a Gaussian stimulus velocity profile (Figure 31.2c) that is well suited to activating the
otolith organs (Gu et al. 2006; Takahashi et al. 2007). Heading tuning was measured under three
stimulus conditions: visual only, vestibular only, and a combined condition in which the stimulus
contained precisely synchronized optic flow and inertial motion. We found that about 60% of MSTd
neurons show significant directional tuning for both visual and vestibular heading cues. MSTd neu-
rons showed a wide variety of heading preferences, with individual neurons being tuned to virtually
all possible directions of translation in 3-D space. Notably, however, there was a strong bias for
MSTd neurons to respond best to lateral motions within the frontoparallel plane (i.e., left/right and
up/down), with relatively few neurons preferring fore–aft directions of motion. This was true for
both visual and vestibular tuning separately (Gu et al. 2006, 2010).
Interestingly, MSTd neurons seemed to fall into one of two categories based on their relative
preferences for heading defined by visual and vestibular cues. For congruent cells, the visual and
vestibular heading preferences are closely matched, as illustrated by the example neuron shown in
Figure 31.2d. This neuron preferred rightward motion of the head in both the visual and vestibu-
lar conditions. In contrast, opposite cells have visual and vestibular heading preferences that are
roughly 180° apart (Gu et al. 2006). For example, the opposite cell in Figure 31.2e prefers rightward
and slightly upward motion in the vestibular condition, but prefers leftward and slightly downward
translation in the visual condition. For this neuron, responses in the combined stimulus condition
(Figure 31.2e, right panel) were very similar to those elicited by optic flow in the visual condition.
This pattern of results was common in the study of Gu et al. (2006). However, as discussed further
below, this apparent visual dominance arose because high-coherence (i.e., highly reliable) visual stimuli were used. We
shall consider this issue in considerably more detail in the next section.
The responses of MSTd neurons to translation in the vestibular condition were found to be very
similar when responses were recorded during translation in complete darkness (as opposed to during
viewing of a fixation target on a dim background), suggesting that spatial tuning seen in the vestibu-
lar condition (e.g., Figure 31.2d, e) was indeed of labyrinthine origin (Gu et al. 2006; Chowdhury et
al. 2009). To verify this, we examined the responses of MSTd neurons after a bilateral labyrinthec-
tomy. After the lesion, MSTd neurons did not give significant responses in the vestibular condition,
and spatial tuning was completely abolished (Gu et al. 2007; Takahashi et al. 2007). Thus, responses
observed in MSTd during the vestibular condition arise from otolith-driven input.
Figure 31.3a shows the effect of eye position on the vestibular heading preference of an MSTd
neuron. In this case, heading preference (small white circles connected by dashed line) remains
quite constant as eye position varies, indicating head-centered tuning. Figure 31.3b shows the effect
of eye position on the visual heading tuning of another MSTd neuron. Here, the heading prefer-
ence clearly shifts with eye position, such that the cell signals heading in an eye-centered frame of
reference. A cross-correlation technique was used to measure the amount of shift of the heading
preference relative to the change in eye position. This yields a metric, the displacement index, which
FIGURE 31.3 Reference frames of visual and vestibular heading signals in MSTd. Tuning functions are
plotted for two example cells in (a) vestibular and (b) visual conditions, measured separately at three static eye
positions along horizontal meridian: −20° (top), 0° (middle), and +20° (bottom). Dashed white line connects
preferred heading in each case, to illustrate horizontal shift (or lack thereof) of tuning function across eye
positions. (c) Histogram of displacement index (DI) values for MSTd neurons tested in vestibular (black bars)
and visual (gray bars) conditions. DI is defined as angular shift of the tuning function normalized by change
in eye position; thus a value of 0 indicates a head- (or body-) centered reference frame and 1 indicates an eye-
centered frame. (d) Binned average DI values for three stimulus conditions (vestibular, visual, combined) as a
function of relative strength of visual and vestibular single-cue tuning (visual/vestibular ratio). (Adapted from
Fetsch, C.R. et al., J. Neurosci., 27, 700–712, 2007.)
will be 0.0 for head-centered tuning and 1.0 for eye-centered tuning. As shown in Figure 31.3c, we
found that visual heading tuning was close to eye-centered, with a median displacement index of
0.89. In contrast, vestibular heading tuning was found to be close to head-centered, with a median
displacement index of 0.24. This value for the vestibular condition was significantly larger than 0.0,
indicating that vestibular heading tuning was slightly shifted toward eye-centered coordinates.
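The cross-correlation procedure behind the displacement index can be sketched as follows. The von Mises-like tuning curves here are hypothetical stand-ins for the measured 3-D tuning, but the 40° eye-position change matches the ±20° fixation positions described in Figure 31.3:

```python
import numpy as np

def displacement_index(tuning_ref, tuning_shifted, azimuths, delta_eye_deg):
    """Estimate the displacement index (DI) from two heading tuning curves.

    Circularly slides tuning_shifted over tuning_ref and takes the shift
    that maximizes their correlation; DI = tuning shift / eye-position change.
    DI = 0 -> head- (or body-) centered tuning; DI = 1 -> eye-centered tuning.
    """
    step = azimuths[1] - azimuths[0]
    corrs = [np.corrcoef(tuning_ref, np.roll(tuning_shifted, -k))[0, 1]
             for k in range(len(azimuths))]
    shift_deg = np.argmax(corrs) * step
    if shift_deg > 180:          # map circular shifts into (-180, 180]
        shift_deg -= 360
    return shift_deg / delta_eye_deg

az = np.arange(0, 360, 5.0)                   # heading azimuths (deg)
ref = np.exp(np.cos(np.radians(az - 90)))     # von Mises-like tuning curve
moved = np.exp(np.cos(np.radians(az - 130)))  # peak moved by the full 40 deg

# Tuning that shifts with the 40 deg eye-position change: eye-centered, DI = 1
print(displacement_index(ref, moved, az, delta_eye_deg=40.0))  # 1.0
# Tuning that does not shift at all: head-centered, DI = 0
print(displacement_index(ref, ref, az, delta_eye_deg=40.0))    # 0.0
```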
These data show that visual and vestibular signals in MSTd are not expressed in a common refer-
ence frame. By conventional thinking, this might cast doubt on the ability of this area to perform
sensory integration for heading perception. However, computational modeling suggests that sensory
signals need not explicitly occupy a common reference frame for integration to occur (Avillac et
al. 2005; Fetsch et al. 2007; Deneve et al. 2001). Moreover, as we will see in a later section, MSTd
neurons can account for improved behavioral sensitivity under cue combination. Thus, the conven-
tional and intuitive notion that sensory signals need to be expressed in a common reference frame
for multisensory integration to occur may need to be discarded.
The results of the study by Fetsch et al. (2007) also provide another challenge to conventional
ideas regarding multisensory integration and reference frames. To our knowledge, all previous stud-
ies on reference frames of sensory signals have only examined responses during unisensory stimu-
lation. Also relevant is the reference frame exhibited by neurons during combined, multimodal
stimulation, and how this reference frame depends on the relative strengths of responses to the
two sensory modalities. To examine this issue, Fetsch et al. (2007) measured the reference frame
of activity during the combined (visual–vestibular) condition, as well as the unimodal conditions.
Average displacement index values were computed as a function of the relative strength of unimodal
visual and vestibular responses [visual/vestibular ratio (VVR)]. For the visual (circles) and vestibu-
lar (squares) conditions, the average displacement index did not systematically depend on VVR
(Figure 31.3d), indicating that the reference frame in the unimodal conditions was largely indepen-
dent of the relative strengths of visual and vestibular inputs to the neuron under study. In contrast,
for the combined condition (diamonds), the average displacement index changed considerably as a
function of VVR, such that the reference frame of combined responses was more head-centered for
neurons with low VVR and more eye-centered for neurons with high VVR (Figure 31.3d). Thus, the
reference frame of responses to multimodal stimuli can vary as a function of the relative strengths
of the visual and vestibular inputs. This has potentially important implications for understanding
how multisensory responses are decoded, and deserves further study.
Although many studies have measured additivity and/or enhancement of multisensory responses,
there has been a surprising lack of studies that have directly attempted to measure the mathemati-
cal rule by which multisensory neurons combine their unimodal inputs (hereafter the “combination
rule”). Measuring additivity (or enhancement) for a limited set of stimuli is not sufficient to charac-
terize the combination rule. To illustrate this point, consider a hypothetical neuron whose bimodal
response is the product (multiplication) of its unimodal inputs. The response of this neuron could
appear to be subadditive (e.g., 2 × 1 = 2), additive (2 × 2 = 4), or superadditive (2 × 3 = 6) depend-
ing on the magnitudes of the two inputs to the neuron. Thus, to estimate the combination rule, it is
essential to examine responses to a wide range of stimulus variations in both unimodal domains.
Recently, we have performed an experiment to measure the combination rule by which neurons
in area MSTd integrate their visual and vestibular inputs related to heading (Morgan et al. 2008).
We asked whether bimodal responses in MSTd are well fit by a weighted linear summation of uni-
modal responses, or whether a nonlinear (i.e., multiplicative) combination rule is required. We also
asked whether the combination rule changes with the relative reliability of the visual and vestibular
cues. To address these questions, we presented eight evenly spaced directions of motion (45° apart)
in the horizontal plane (Figure 31.4, inset). Unimodal tuning curves (Figure 31.4a–c, margins) were
measured by presenting these eight headings in both the vestibular and visual stimulus conditions.
In addition, we measured a full bimodal interaction profile by presenting all 64 possible combina-
tions of these 8 vestibular and 8 visual headings, including 8 congruent and 56 incongruent (cue-
conflict) conditions. Figure 31.4a–c shows data from an exemplar “congruent” cell in area MSTd.
The unimodal tuning curves (margins) show that this neuron responded best to approximately right-
ward motion (0°) in both the visual and vestibular conditions. When optic flow at 100% coherence
was combined with vestibular stimulation, the bimodal response profile of this neuron (grayscale
map in Figure 31.4a) was dominated by the visual input, as indicated by the horizontal band of high
firing rates. When the optic flow stimulus was weakened by reducing the motion coherence to 50%
(Figure 31.4b), the bimodal response profile showed a more balanced, symmetric peak, indicating
that the bimodal response now reflects roughly equal contributions of visual and vestibular inputs.
When the motion coherence was further reduced to 25% (Figure 31.4c), the unimodal visual tuning
curve showed considerably reduced amplitude and the bimodal response profile became dominated
by the vestibular input, as evidenced by the vertical band of high firing rates. Thus, as the relative
strengths of visual and vestibular cues to heading vary, bimodal responses of MSTd neurons range
from visually dominant to vestibularly dominant.
To characterize the combination rule used by MSTd neurons in these experiments, we attempted
to predict the bimodal response profile as a function of the unimodal tuning curves. We found that
bimodal responses were well fit by a weighted linear summation of unimodal responses (Morgan et
al. 2008). On average, this linear model accounted for ~90% of the variance in bimodal responses,
and adding various nonlinear components to the model (such as a product term) accounted for only
1–2% additional variance. Thus, weighted linear summation provides a good model for the combi-
nation rule used in MSTd, and the weights are typically less than 1 (Figure 31.4d, e), indicating that
subadditive interactions are commonplace.
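The model-fitting logic can be illustrated with a small simulation: generate two unimodal tuning curves, build the full 8 × 8 bimodal response matrix as a noisy weighted sum, and recover the weights by least squares. The tuning shapes, weight values, and noise level below are hypothetical, not taken from Morgan et al. (2008):

```python
import numpy as np

rng = np.random.default_rng(0)
headings = np.arange(0, 360, 45.0)  # 8 headings, 45 deg apart

# Hypothetical unimodal tuning curves (von Mises-like, both peaking at 0 deg)
r_vest = 10 + 20 * np.exp(np.cos(np.radians(headings)))
r_vis = 10 + 30 * np.exp(np.cos(np.radians(headings)))

# Simulated bimodal responses: subadditive weighted sum plus response noise,
# over all 64 combinations of vestibular and visual headings.
w_vest_true, w_vis_true, offset = 0.6, 0.8, 5.0
VEST, VIS = np.meshgrid(r_vest, r_vis)
bimodal = w_vest_true * VEST + w_vis_true * VIS + offset
bimodal += rng.normal(0.0, 1.0, bimodal.shape)

# Fit the weighted-linear-summation model by least squares.
X = np.column_stack([VEST.ravel(), VIS.ravel(), np.ones(VEST.size)])
coef, *_ = np.linalg.lstsq(X, bimodal.ravel(), rcond=None)
pred = X @ coef
r2 = 1 - np.sum((bimodal.ravel() - pred) ** 2) / np.sum(
    (bimodal.ravel() - bimodal.ravel().mean()) ** 2)

# Recovered weights should land near the true (0.6, 0.8), with R^2 near 1.
print(np.round(coef, 2), round(r2, 3))
```

With weights below 1, the fitted model reproduces the subadditive behavior described above; in the real data, comparing this linear fit against versions with added nonlinear (e.g., product) terms is what shows the extra terms buy only 1–2% more explained variance.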
How does the weighted linear summation model of MSTd integration depend on the reliability
of the cues to heading? As the visual cue varies in reliability due to changes in motion coherence,
the bimodal response profile clearly changes shape (Figure 31.4a–c). There are two basic possible
explanations for this change in shape. One possibility is that the bimodal response profile changes
simply from the fact that lower coherences elicit visual responses with weaker modulation as a func-
tion of heading. In this case, the weights with which each neuron combines its vestibular and visual
inputs remain constant and the decreased visual influence in the bimodal response profile is simply
due to weaker visual inputs at lower coherences. In this scenario, each neuron has a combination
rule that is independent of cue reliability. A second possibility is that the weights given to the ves-
tibular and visual inputs could change with the relative reliabilities of the two cues. This outcome
FIGURE 31.4 Effects of cue strength (motion coherence) on weighted summation of visual and vestibular
inputs by MSTd neurons. (a–c) Comparison of unimodal and bimodal tuning for a congruent MSTd cell, tested
at three motion coherences. Grayscale maps show mean firing rates as a function of vestibular and visual
headings in bimodal condition (including all 64 possible combinations of 8 visual headings and 8 vestibular
headings at 45° intervals). Tuning curves along left and bottom margins show mean (±SEM) firing rates versus
heading for unimodal visual and vestibular conditions, respectively. (a) Bimodal responses at 100% coherence
are visually dominated. (b) Bimodal responses at 50% coherence show a balanced contribution of visual and
vestibular cues. (c) At 25% coherence, bimodal responses appear to be dominated by vestibular input. (d–g)
Dependence of vestibular and visual weights on visual motion coherence. Vestibular and visual weights for
each MSTd neuron were derived from linear fits to bimodal responses. (d, e) Histograms of vestibular and
visual weights computed from data at 100% (black) and 50% (gray) coherence. Triangles are plotted at medi-
ans. (f, g) Vestibular and visual weights are plotted as a function of motion coherence for each neuron exam-
ined at multiple coherences. Data points are coded by significance of unimodal visual tuning (open vs. filled
circles). (Adapted from Morgan, M.L. et al., Neuron, 59, 662–673, 2008.)
would indicate that the neuronal combination rule is not fixed, but changes with cue reliability. This
is a fundamental issue of considerable importance in multisensory integration.
To address this issue, we obtained the best fit of the weighted linear summation model separately
for each motion coherence. At all coherences, the linear model provided a good fit to the bimodal
responses. The key question then becomes whether the visual and vestibular weights attributed to
each neuron remain constant as a function of coherence or whether they change systematically.
Figure 31.4d, e shows the distributions of weights obtained at 100% (black bars) and 50% (gray
bars) coherence. The average visual weight is significantly higher at 100% coherence than 50%
coherence, whereas the average vestibular weight shows the opposite effect. For all neurons that
were tested at multiple coherences, Figure 31.4f, g shows how the vestibular and visual weights,
respectively, change with coherence for each neuron. There is a clear and significant trend for ves-
tibular weights to decline with coherence, whereas visual weights increase (Morgan et al. 2008).
A model in which the weights are fixed across coherences does not fit the data as well as a model
in which the weights vary with coherence, for the majority of neurons (Morgan et al. 2008). The
improvement in model fit with variable weights (although significant) is rather modest for most neu-
rons, however, and it remains to be determined whether these weight changes have large or small
effects on population codes for heading.
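The fitting procedure described above can be sketched as follows. The tuning curves, weights, and noise level here are synthetic illustrations (not the recorded MSTd data); the point is simply how per-neuron vestibular and visual weights are recovered from the 8 × 8 bimodal response matrix by least squares:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic unimodal tuning over 8 headings at 45-degree intervals (illustrative).
h = np.deg2rad(np.arange(0, 360, 45))
r_ves = 20 + 15 * np.cos(h)            # vestibular tuning curve (spikes/s)
r_vis = 20 + 15 * np.cos(h - 0.5)      # visual tuning curve, offset preference

# Generate the 8 x 8 bimodal response matrix from known weights plus noise.
w_ves_true, w_vis_true = 0.4, 0.9
R = (w_ves_true * r_ves[:, None] + w_vis_true * r_vis[None, :]
     + rng.normal(0.0, 1.0, (8, 8)))

# Least-squares fit: bimodal response = w_ves*r_ves + w_vis*r_vis + offset.
A = np.column_stack([np.repeat(r_ves, 8),   # vestibular heading varies along rows
                     np.tile(r_vis, 8),     # visual heading varies along columns
                     np.ones(64)])
w_ves, w_vis, offset = np.linalg.lstsq(A, R.ravel(), rcond=None)[0]
print(round(w_ves, 2), round(w_vis, 2))     # recovers weights near 0.4 and 0.9
```

Repeating such a fit separately at each coherence yields the weight-versus-coherence curves of Figure 31.4f, g.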
The findings of Morgan et al. (2008) could have important implications for understanding the
neural circuitry that underlies multisensory integration. Whereas the neuronal combination rule is
well described as weighted linear summation for any particular values of stimulus strength/energy,
the weights in this linear combination rule are not constant when stimulus strength varies. If MSTd
neurons truly perform a simple linear summation of their visual and vestibular inputs, then this
finding would suggest that the synaptic weights of these inputs change as a function of stimulus
strength. Although this is possible, it is not clear how synaptic weights would be dynamically modi-
fied from moment to moment when the stimulus strength is not known in advance. Yet, it is well
established that human cue integration behavior involves a dynamic, trial-by-trial reweighting of
cues. A recent neural theory of cue integration shows that neurons that simply sum their multisen-
sory inputs can account for dynamic cue reweighting at the perceptual level, if their spiking statis-
tics fall into a Poisson-like family (Ma et al. 2006). In this theory, neurons need not change their
combination rule with stimulus strength, yet such a change is precisely what the results of Morgan
et al. (2008) demonstrate.
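The essence of the Ma et al. (2006) result can be illustrated with a toy simulation. All tuning parameters and gains below are arbitrary choices, not fitted to data: two populations with Poisson variability encode heading, and a downstream stage that simply sums their spike counts (a fixed combination rule) yields an estimate whose precision is approximately the sum of the single-cue precisions:

```python
import numpy as np

rng = np.random.default_rng(0)
prefs = np.linspace(-40, 40, 33)        # preferred headings (deg), uniform tiling
width = 10.0                            # Gaussian tuning width (deg)
grid = np.linspace(-15, 15, 301)        # decoding grid
logf = -0.5 * ((grid[:, None] - prefs[None, :]) / width) ** 2  # log tuning, unit gain

def decode(counts):
    # ML decoding for independent Poisson neurons; with dense uniform tiling the
    # sum-of-rates term is ~constant in heading, so only counts @ log f matters.
    return grid[np.argmax(counts @ logf.T, axis=1)]

def simulate(gain, n_trials, heading=0.0):
    rates = gain * np.exp(-0.5 * ((heading - prefs) / width) ** 2)
    return rng.poisson(rates, size=(n_trials, prefs.size))

n = 2000
ves = simulate(gain=8.0, n_trials=n)    # vestibular population (lower gain)
vis = simulate(gain=12.0, n_trials=n)   # visual population (higher gain)
sd_ves = decode(ves).std()
sd_vis = decode(vis).std()
sd_comb = decode(ves + vis).std()       # downstream neurons just sum spike counts
# The combined estimate is more precise than either cue alone, with no change in
# the combination rule: 1/sd_comb**2 is close to 1/sd_ves**2 + 1/sd_vis**2.
```

Here the reliability of each cue is carried by the gain of its population response, so simple summation of spike counts implements reliability-based weighting automatically.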
One possible resolution to this conundrum is that multisensory neurons linearly sum their inputs
with fixed weights, at the level of membrane potential, but that some network-level nonlinearity
makes the weights appear to change with stimulus strength. A good candidate mechanism that may
account for the findings of Morgan et al. (2008) is divisive normalization (Carandini et al. 1997;
Heeger 1992). In a divisive normalization circuit, each cell performs a linear weighted summation
of its inputs at the level of membrane potential, but the output of each neuron is divided by the
summed activity of all neurons in the circuit (Heeger 1992). This model has been highly success-
ful in accounting for how the responses of neurons in the primary visual cortex (V1) change with
stimulus strength (i.e., contrast; Carandini et al. 1997) and how neurons in visual area MT combine
multiple motion signals (Rust et al. 2006), and has also recently been proposed as an explanation
for how selective attention modifies neural activity (Lee and Maunsell 2009; Reynolds and Heeger
2009). Recent modeling results (not shown) indicate that divisive normalization can account for the
apparent changes in weights with coherence (Figure 31.4f, g), as well as a variety of other classic
findings in multisensory integration (Ohshiro et al. 2011). Evaluating the normalization model of
multisensory integration is a topic of current research in our laboratories.
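A toy version of this idea can be sketched as follows; the tuning curves, pool size, and semisaturation constant are arbitrary illustrative choices, not the Ohshiro et al. (2011) model itself. Every model neuron sums its vestibular and visual inputs with fixed weights, yet after divisive normalization the weights recovered by the linear fit shift with coherence, in the direction reported by Morgan et al. (2008):

```python
import numpy as np

prefs = np.linspace(0.0, 2 * np.pi, 16, endpoint=False)   # heading preferences

def tuning(heading):
    return np.exp(np.cos(heading - prefs))                # positive tuning curves

def responses(h_ves, h_vis, c):
    # Fixed synaptic weights (both 1.0); coherence c scales only the visual INPUT.
    drive = np.zeros_like(prefs)
    if h_ves is not None:
        drive = drive + tuning(h_ves)
    if h_vis is not None:
        drive = drive + c * tuning(h_vis)
    return drive / (0.5 + drive.mean())                   # divisive normalization

def fitted_weights(c):
    # Fit bimodal responses as w_ves*r_ves + w_vis*r_vis + offset, per neuron.
    r_ves = np.array([responses(h, None, c) for h in prefs])
    r_vis = np.array([responses(None, h, c) for h in prefs])
    r_bi = np.array([[responses(hv, hz, c) for hz in prefs] for hv in prefs])
    w = []
    for k in range(prefs.size):
        A = np.column_stack([np.repeat(r_ves[:, k], prefs.size),
                             np.tile(r_vis[:, k], prefs.size),
                             np.ones(prefs.size ** 2)])
        w.append(np.linalg.lstsq(A, r_bi[:, :, k].ravel(), rcond=None)[0][:2])
    return np.median(w, axis=0)                           # (w_ves, w_vis)

w_ves_lo, w_vis_lo = fitted_weights(0.25)   # low coherence
w_ves_hi, w_vis_hi = fitted_weights(1.0)    # high coherence
# Apparent visual weight rises and vestibular weight falls with coherence,
# even though the synaptic weights inside `responses` never changed.
```

The coherence dependence arises entirely from the normalization pool, which grows with stimulus strength and divides down the contribution of the constant vestibular input.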
Monkeys were trained to perform a fine heading discrimination task while neural activity was
recorded (Gu et al. 2008; Fetsch et al. 2009). This task enabled us to ask two fundamental questions that
had remained unaddressed: (1) Can monkeys integrate visual and vestibular cues near-optimally to
improve heading discrimination performance? (2) Can the activity of MSTd neurons account for the
behavioral improvement observed?
FIGURE 31.5 Heading discrimination task and behavioral performance. (a) After fixating a visual target,
the monkey experienced forward motion (real and/or simulated with optic flow) with a small leftward or right-
ward component, and subsequently reported his perceived heading (“left” vs. “right”) by making a saccadic
eye movement to one of two choice targets. (b) Psychometric functions for one animal under unimodal (ves-
tibular: dashed curve, visual: solid curve) and bimodal (gray curve) conditions. Psychophysical threshold was
defined as the standard deviation (σ) of fitted cumulative Gaussian. (c) Summary of measured and predicted
psychophysical thresholds for monkey C. Bars show average threshold (±SE) for vestibular (white), visual
(dark gray), and combined conditions (black), along with predicted threshold for combined condition assum-
ing optimal cue integration (light gray). (d) Summary of psychophysical performance for monkey A. (Adapted
from Gu, Y. et al., Neuron, 66, 596–609, 2008.)
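The threshold definition in the caption can be made concrete with a small fit. The heading values and σ below are illustrative, and the fit is a simple grid search over σ (with the mean fixed at zero) rather than the maximum-likelihood procedure typically used:

```python
import math

def cum_gauss(x, mu, sigma):
    # Cumulative Gaussian: proportion of "rightward" decisions at heading x.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

headings = [-8, -4, -2, -1, 0, 1, 2, 4, 8]        # heading offsets (deg)
true_sigma = 2.3                                   # illustrative threshold
p_right = [cum_gauss(h, 0.0, true_sigma) for h in headings]

# Grid-search fit of sigma; the psychophysical threshold is the fitted sigma.
best = min((sum((cum_gauss(h, 0.0, s) - p) ** 2
                for h, p in zip(headings, p_right)), s)
           for s in [0.5 + 0.01 * i for i in range(551)])[1]
print(best)   # recovers a threshold of about 2.3 deg
```

A smaller σ means a steeper psychometric function, which is why the combined-condition curve in Figure 31.5b signals improved sensitivity.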
Visual–Vestibular Integration for Self-Motion Perception 641
Optimal cue-integration models (e.g., Alais and Burr 2004; Ernst and Banks 2002; Knill and
Saunders 2003) predict that the threshold in the combined condition (σcomb) should be lower than the
single-cue thresholds (σves, σvis), as given by the following expression:

σ²comb = (σ²vis σ²ves) / (σ²vis + σ²ves)  (31.1)
To maximize the predicted improvement in performance, the reliability of the visual and vestibular
cues (as measured by thresholds in the single-cue conditions) was matched by adjusting the motion
coherence of optic flow in the visual display (for details, see Gu et al. 2008). Psychometric func-
tions for one animal are plotted in Figure 31.5b. The vestibular (filled symbols, dashed curve) and
visual (open symbols, solid curve) functions are nearly overlapping, with thresholds of 3.5° and 3.6°,
respectively. In the combined condition (gray symbols and curve), the monkey’s heading threshold
was substantially smaller (2.3°), as evidenced by the steeper slope of the psychometric function.
Figure 31.5c, d summarizes the psychophysical data from two monkeys. For both animals, psy-
chophysical thresholds in the combined condition were significantly lower than thresholds in the
visual and vestibular conditions, and were quite similar to the optimal predictions generated from
Equation 31.1 (Gu et al. 2008). Thus, monkeys integrate visual and vestibular cues near-optimally
to improve their sensitivity in the heading discrimination task. Similar results were also found for
human subjects (Fetsch et al. 2009).
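Using the single-cue thresholds reported above (3.5° and 3.6°), Equation 31.1 gives the prediction against which the monkeys' combined-condition performance was compared:

```python
import math

def predicted_combined_threshold(sigma_vis, sigma_ves):
    """Optimal-integration prediction for the combined threshold (Equation 31.1)."""
    return math.sqrt((sigma_vis ** 2 * sigma_ves ** 2)
                     / (sigma_vis ** 2 + sigma_ves ** 2))

pred = predicted_combined_threshold(3.6, 3.5)
print(round(pred, 2))   # 2.51 -- the measured combined threshold was 2.3 deg
```

Note that the prediction is always below the smaller of the two single-cue thresholds, with the largest benefit when the cues are equally reliable.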
[Figure 31.6, panels (a)–(f): tuning curves, choice-period responses, and neurometric functions; the
example neuronal thresholds were σves = 5.1°, σvis = 2.6°, σcom = 1.8° for the congruent cell and
σves = 5.7°, σvis = 2.6°, σcom = 40.8° for the opposite cell.]
FIGURE 31.6 Heading tuning and heading sensitivity in area MSTd. (a–b) Heading tuning curves of two
example neurons with (a) congruent and (b) opposite visual–vestibular heading preferences. (c–d) Responses
of same neurons to a narrow range of heading stimuli presented while the monkey performed the discrimi-
nation task. (e–f) Neurometric functions computed by ROC analysis from firing rate data plotted in panels
(c) and (d). Smooth curves show best-fitting cumulative Gaussian functions. (Adapted from Gu, Y. et al.,
Neuron, 66, 596–609, 2008.)
For each neuron, the threshold measured in the combined condition was compared with the optimal
prediction from Equation 31.1. A significant correlation was seen between the combined/predicted
threshold ratio and the congruency index (CI) (Figure 31.7a), such that neurons with large positive CIs (congruent cells, black
circles) had thresholds close to the optimal prediction (ratios near unity). Thus, neuronal thresholds
for congruent MSTd cells followed a pattern similar to the monkeys’ behavior. In contrast, com-
bined thresholds for opposite cells were generally much higher than predicted from optimal cue
integration (Figure 31.7a, open circles), indicating that these neurons became less sensitive during
cue combination.
[Figure 31.7, panels (a) and (b): scatter plots in which neurons are classified as congruent, opposite,
or intermediate according to their congruency index.]
FIGURE 31.7 Neuronal thresholds and choice probabilities as a function of visual–vestibular congruency
in combined condition. (a) Ordinate in this scatter plot represents ratio of threshold measured in combined
condition to prediction from optimal cue integration. Abscissa represents CI of heading tuning for visual and
vestibular responses. Asterisks denote neurons for which CI is not significantly different from zero. Dashed
horizontal line denotes that threshold in combined condition is equal to the prediction. (b) Choice probability
(CP) data are plotted as a function of congruency index for each MSTd neuron tested in combined condition.
Note that congruent cells (black filled symbols), which have neuronal thresholds similar to optimal prediction
in panel (a), also have CPs consistently and substantially larger than 0.5. (Adapted from Gu, Y. et al., Neuron,
66, 596–609, 2008.)
The relationship between neural responses and the monkeys’ perceptual choices can be quantified
using “choice probabilities” (CPs) (Britten et al. 1996). CPs are computed by ROC analysis similar to neuronal
thresholds, except that the ideal observer is asked to predict the monkey’s choice (rather than the
stimulus) from the firing rate of the neuron. This analysis is performed after the effect of heading
on response has been removed, such that it isolates the effect of choice on firing rates. Thus, CPs
quantify the relationship between trial-to-trial fluctuations in neural firing rates and the monkeys’
perceptual decisions. A CP significantly greater than 0.5 indicates that the monkey tended to choose
the neuron’s preferred sign of heading (leftward or rightward) when the neuron fires more strongly.
Such a result is thought to reflect a functional link between the neuron and perception (Britten et
al. 1996; Krug 2004; Parker and Newsome 1998). Notably, although MSTd is classically considered
visual cortex, CPs significantly larger than 0.5 (mean = 0.55) were seen in the vestibular condition
(Gu et al. 2007), indicating that MSTd activity is correlated with perceptual decisions about heading
based on nonvisual information.
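A choice probability of this kind reduces to a simple ROC-area calculation over trials sorted by the animal's choice; the firing-rate distributions below are synthetic Poisson examples, not recorded data:

```python
import numpy as np

def choice_probability(pref_rates, null_rates):
    # ROC area: probability that a firing rate drawn from "preferred-choice"
    # trials exceeds one drawn from "null-choice" trials (ties count one half).
    x = np.asarray(pref_rates, dtype=float)[:, None]
    y = np.asarray(null_rates, dtype=float)[None, :]
    return float((x > y).mean() + 0.5 * (x == y).mean())

rng = np.random.default_rng(7)
# A neuron whose firing covaries with the monkey's choice...
cp_linked = choice_probability(rng.poisson(12, 1000), rng.poisson(10, 1000))
# ...versus one whose firing is unrelated to choice.
cp_flat = choice_probability(rng.poisson(10, 1000), rng.poisson(10, 1000))
# cp_linked comes out well above 0.5; cp_flat stays near 0.5.
```

In practice this is applied after removing the effect of heading on firing rate (e.g., within narrow stimulus bins), so that only choice-related variability remains.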
It is of particular interest to examine the relationship between CP and CI in the combined con-
dition, where the monkey makes use of both visual and vestibular cues. Given that opposite cells
become insensitive during cue combination and congruent cells increase sensitivity, we might expect
CP to depend on congruency in the combined condition. Indeed, Figure 31.7b shows that there is a
robust correlation between CP and CI (Gu et al. 2008). Congruent cells (black symbols) generally
have CPs greater than 0.5, often much greater, indicating that they are robustly correlated with the
animal’s perceptual decisions during cue integration. In contrast, opposite cells (unfilled symbols)
tend to have CP values near 0.5, and the mean CP for opposite cells does not differ significantly
from 0.5 (t-test, p = .08). This finding is consistent with the idea that the animals selectively monitor
congruent cells to achieve near-optimal cue integration.
These findings suggest that opposite cells are not useful for visual–vestibular cue integration dur-
ing heading discrimination. What, then, is the functional role of opposite cells? We do not yet know
the answer to this question, but we hypothesize that opposite cells, in combination with congruent
cells, are important for dissociating object motion from self-motion. In general, the complex pat-
tern of image motion on the retina has two sources: (1) self-motion combined with the 3-D layout
of the scene and (2) objects moving in the environment. It is important for estimates of heading not
to be biased by the presence of moving objects, and vice versa. Note that opposite cells will not be
optimally stimulated when a subject moves through a static environment, but may fire more robustly
when retinal image motion is inconsistent with self-motion. Thus, the relative activity of congruent
and opposite cells may help identify (and perhaps discount) retinal image motion that is not pro-
duced by self-motion. Indeed, ongoing modeling work suggests that decoding a mixed population
of congruent and opposite cells allows heading to be estimated with much less bias from moving
objects.
In summary, by simultaneously monitoring neural activity and behavior, it has been possible
to study neural mechanisms of multisensory processing under conditions in which cue integra-
tion is known to take place perceptually. In addition to demonstrating near-optimal cue integration
by monkeys, a population of neurons has been identified in area MSTd that could account for the
improvement in psychophysical performance under cue combination. These findings implicate area
MSTd in sensory integration for heading perception and establish a model system for studying the
detailed mechanisms by which neurons combine different sensory signals.
31.6 CONCLUSION
These studies indicate that area MSTd is one important brain area where visual and vestibular sig-
nals might be integrated to achieve robust perception of self-motion. It is likely that other areas also
integrate visual and vestibular signals in meaningful ways, and a substantial challenge for the future
will be to understand the specific roles that various brain regions play in multisensory perception
of self-motion and object motion. In addition, these studies raise a number of important general
questions that may guide future studies on multisensory integration in multiple systems and species.
What are the respective functional roles of neurons that have congruent or incongruent tuning for
two sensory inputs? Do the spatial reference frames in which multiple sensory signals are expressed
constrain the contribution of multisensory neurons to perception? Do multisensory neurons gener-
ally perform weighted linear summation of their unimodal inputs, or do the mathematical combina-
tion rules used by neurons vary across brain regions and across stimuli/tasks within a brain region?
How can we account for the change in the weights that neurons apply to their unimodal inputs as the
strength of the sensory inputs varies? Does this require dynamic changes in synaptic weights or can
this phenomenology be explained in terms of nonlinearities (such as divisive normalization) that
operate at the level of the network? During behavioral discrimination tasks involving cue conflict,
do single neurons show correlates of the dynamic cue reweighting effects that have been seen con-
sistently in human perceptual studies of cue integration? How do populations of multimodal sensory
neurons represent the reliabilities (i.e., variance) of the sensory cues as they change dynamically
in the environment? Most of these questions should be amenable to study within the experimental
paradigm of visual–vestibular integration that we have presented thus far. Thus, we expect that this
will serve as an important platform for tackling critical questions regarding multisensory integra-
tion in the future.
ACKNOWLEDGMENTS
We thank Amanda Turner and Erin White for excellent monkey care and training. This work
was supported by NIH EY017866 and EY019087 (to DEA) and by NIH EY016178 and an EJLB
Foundation grant (to GCD).
REFERENCES
Alais, D., and D. Burr. 2004. The ventriloquist effect results from near-optimal bimodal integration. Curr Biol
14: 257–262.
Anderson, K. C., and R. M. Siegel. 1999. Optic flow selectivity in the anterior superior temporal polysensory
area, STPa, of the behaving monkey. J Neurosci 19: 2681–2692.
Angelaki, D. E. 2004. Eyes on target: What neurons must do for the vestibuloocular reflex during linear motion.
J Neurophysiol 92: 20–35.
Angelaki, D. E., and K. E. Cullen. 2008. Vestibular system: The many facets of a multimodal sense. Annu Rev
Neurosci 31: 125–150.
Angelaki, D. E., M. Q. Mchenry, J. D. Dickman, S. D. Newlands, and B. J. Hess. 1999. Computation of inertial
motion: Neural strategies to resolve ambiguous otolith information. J Neurosci 19: 316–327.
Angelaki, D. E., A. G. Shaikh, A. M. Green, and J. D. Dickman. 2004. Neurons compute internal models of the
physical laws of motion. Nature 430: 560–564.
Avillac, M., S. Ben Hamed, and J. R. Duhamel. 2007. Multisensory integration in the ventral intraparietal area
of the macaque monkey. J Neurosci 27: 1922–1932.
Avillac, M., S. Deneve, E. Olivier, A. Pouget, and J. R. Duhamel. 2005. Reference frames for representing
visual and tactile locations in parietal cortex. Nat Neurosci 8: 941–949.
Banks, M. S., S. M. Ehrlich, B. T. Backus, and J. A. Crowell. 1996. Estimating heading during real and simu-
lated eye movements. Vision Res 36: 431–443.
Benson, A. J., M. B. Spencer, and J. R. Stott. 1986. Thresholds for the detection of the direction of whole-body,
linear movement in the horizontal plane. Aviat Space Environ Med 57: 1088–1096.
Berthoz, A., B. Pavard, and L. R. Young. 1975. Perception of linear horizontal self-motion induced by periph-
eral vision (linearvection) basic characteristics and visual–vestibular interactions. Exp Brain Res 23:
471–489.
Bradley, A., B. C. Skottun, I. Ohzawa, G. Sclar, and R. D. Freeman. 1987. Visual orientation and spatial fre-
quency discrimination: A comparison of single neurons and behavior. J Neurophysiol 57: 755–772.
Bradley, D. C., M. Maxwell, R. A. Andersen, M. S. Banks, and K. V. Shenoy. 1996. Mechanisms of heading
perception in primate visual cortex. Science 273: 1544–1547.
Brandt, T., J. Dichgans, and E. Koenig. 1973. Differential effects of central versus peripheral vision on egocen-
tric and exocentric motion perception. Exp Brain Res 16: 476–491.
Bremmer, F., J. R. Duhamel, S. Ben Hamed, and W. Graf. 2002a. Heading encoding in the macaque ventral
intraparietal area (VIP). Eur J Neurosci 16: 1554–1568.
Bremmer, F., F. Klam, J. R. Duhamel, S. Ben Hamed, and W. Graf. 2002b. Visual–vestibular interactive
responses in the macaque ventral intraparietal area (VIP). Eur J Neurosci 16: 1569–1586.
Bremmer, F., M. Kubischik, M. Pekel, M. Lappe, and K. P. Hoffmann. 1999. Linear vestibular self-motion
signals in monkey medial superior temporal area. Ann N Y Acad Sci 871: 272–281.
Britten, K. H., W. T. Newsome, M. N. Shadlen, S. Celebrini, and J. A. Movshon. 1996. A relationship between
behavioral choice and the visual responses of neurons in macaque MT. Vis Neurosci 13: 87–100.
Britten, K. H., M. N. Shadlen, W. T. Newsome, and J. A. Movshon. 1992. The analysis of visual motion: A
comparison of neuronal and psychophysical performance. J Neurosci 12: 4745–4765.
Britten, K. H., and R. J. Van Wezel. 1998. Electrical microstimulation of cortical area MST biases heading
perception in monkeys. Nat Neurosci 1: 59–63.
Britten, K. H., and R. J. Van Wezel. 2002. Area MST and heading perception in macaque monkeys. Cereb
Cortex 12: 692–701.
Bryan, A. S., and D. E. Angelaki. 2008. Optokinetic and vestibular responsiveness in the macaque rostral ves-
tibular and fastigial nuclei. J Neurophysiol 101: 714–720.
Buttner, U., and U. W. Buettner. 1978. Parietal cortex (2v) neuronal activity in the alert monkey during natural
vestibular and optokinetic stimulation. Brain Res 153: 392–397.
Carandini, M., D. J. Heeger, and J. A. Movshon. 1997. Linearity and normalization in simple cells of the
macaque primary visual cortex. J Neurosci 17: 8621–8644.
Chen, A., G. C. Deangelis, and D. E. Angelaki. 2010. Macaque parieto-insular vestibular cortex: Responses to
self-motion and optic flow. J Neurosci 30: 3022–3042.
Chen, A., E. Henry, G. C. Deangelis, and D. E. Angelaki. 2007. Comparison of responses to three-dimensional
rotation and translation in the ventral intraparietal (VIP) and medial superior temporal (MST) areas of
rhesus monkey. Program No. 715.19. 2007 Neuroscience Meeting Planner. San Diego, CA: Society for
Neuroscience, 2007. Online.
Chowdhury, S. A., K. Takahashi, G. C. Deangelis, and D. E. Angelaki. 2009. Does the middle temporal area
carry vestibular signals related to self-motion? J Neurosci 29: 12020–12030.
Crowell, J. A., M. S. Banks, K. V. Shenoy, and R. A. Andersen. 1998. Visual self-motion perception during head
turns. Nat Neurosci 1: 732–737.
Daunton, N., and D. Thomsen. 1979. Visual modulation of otolith-dependent units in cat vestibular nuclei. Exp
Brain Res 37: 173–176.
Deneve, S., P. E. Latham, and A. Pouget. 2001. Efficient computation and cue integration with noisy population
codes. Nat Neurosci 4: 826–831.
Dichgans, J., and T. Brandt. 1974. The psychophysics of visually-induced perception of self motion and tilt. In
The Neurosciences, 123–129. Cambridge, MA: MIT Press.
Dichgans, J., and T. Brandt. 1978. Visual–vestibular interaction: Effects on self-motion perception and postural
control. In Handbook of sensory physiology, ed. R. Held, H. W. Leibowitz, and H. L. Teuber. Berlin:
Springer-Verlag.
Duffy, C. J. 1998. MST neurons respond to optic flow and translational movement. J Neurophysiol 80:
1816–1827.
Duffy, C. J., and R. H. Wurtz. 1991. Sensitivity of MST neurons to optic flow stimuli: I. A continuum of
response selectivity to large-field stimuli. J Neurophysiol 65: 1329–1345.
Duffy, C. J., and R. H. Wurtz. 1995. Response of monkey MST neurons to optic flow stimuli with shifted cen-
ters of motion. J Neurosci 15: 5192–5208.
Ernst, M. O., and M. S. Banks. 2002. Humans integrate visual and haptic information in a statistically optimal
fashion. Nature 415: 429–433.
Fernandez, C., and J. M. Goldberg. 1976a. Physiology of peripheral neurons innervating otolith organs of the
squirrel monkey: I. Response to static tilts and to long-duration centrifugal force. J Neurophysiol 39:
970–984.
Fernandez, C., and J. M. Goldberg. 1976b. Physiology of peripheral neurons innervating otolith organs of the
squirrel monkey: II. Directional selectivity and force–response relations. J Neurophysiol 39: 985–995.
Fetsch, C. R., A. H. Turner, G. C. Deangelis, and D. E. Angelaki. 2009. Dynamic reweighting of visual and
vestibular cues during self-motion perception. J Neurosci 29: 15601–15612.
Fetsch, C. R., S. Wang, Y. Gu, G. C. Deangelis, and D. E. Angelaki. 2007. Spatial reference frames of visual,
vestibular, and multimodal heading signals in the dorsal subdivision of the medial superior temporal area.
J Neurosci 27: 700–712.
Fredrickson, J. M., P. Scheid, U. Figge, and H. H. Kornhuber. 1966. Vestibular nerve projection to the cerebral
cortex of the rhesus monkey. Exp Brain Res 2: 318–327.
Fukushima, K. 1997. Corticovestibular interactions: Anatomy, electrophysiology, and functional consider-
ations. Exp Brain Res 117: 1–16.
Gibson, J. J. 1950. The perception of the visual world. Boston: Houghton-Mifflin.
Gibson, J. J. 1954. The visual perception of objective motion and subjective movement. Psychol Rev 61:
304–314.
Green, D. M., and J. A. Swets. 1966. Signal detection theory and psychophysics. New York: Wiley.
Groh, J. M. 2001. Converting neural signals from place codes to rate codes. Biol Cybern 85: 159–165.
Grusser, O. J., M. Pause, and U. Schreiter. 1990a. Localization and responses of neurones in the parieto-insular
vestibular cortex of awake monkeys (Macaca fascicularis). J Physiol 430: 537–557.
Grusser, O. J., M. Pause, and U. Schreiter. 1990b. Vestibular neurones in the parieto-insular cortex of monkeys
(Macaca fascicularis): Visual and neck receptor responses. J Physiol 430: 559–583.
Gu, Y., D. E. Angelaki, and G. C. Deangelis. 2008. Neural correlates of multisensory cue integration in macaque
MSTd. Nat Neurosci 11: 1201–1210.
Gu, Y., G. C. Deangelis, and D. E. Angelaki. 2007. A functional link between area MSTd and heading percep-
tion based on vestibular signals. Nat Neurosci 10: 1038–1047.
Gu, Y., C. R. Fetsch, B. Adeyemo, G. C. Deangelis, and D. E. Angelaki. 2010. Decoding of MSTd population
activity accounts for variations in the precision of heading perception. Neuron 66: 596–609.
Gu, Y., P. V. Watkins, D. E. Angelaki, and G. C. Deangelis. 2006. Visual and nonvisual contributions to three-
dimensional heading selectivity in the medial superior temporal area. J Neurosci 26: 73–85.
Guedry, F. E. 1974. Psychophysics of vestibular sensation. In Handbook of sensory physiology. The vestibular
system, ed. H. H. Kornhuber. New York: Springer-Verlag.
Guedry Jr., F. E. 1978. Visual counteraction on nauseogenic and disorienting effects of some whole-body
motions: A proposed mechanism. Aviat Space Environ Med 49: 36–41.
Guldin, W. O., S. Akbarian, and O. J. Grusser. 1992. Cortico-cortical connections and cytoarchitectonics
of the primate vestibular cortex: A study in squirrel monkeys (Saimiri sciureus). J Comp Neurol 326:
375–401.
Guldin, W. O., and O. J. Grusser. 1998. Is there a vestibular cortex? Trends Neurosci 21: 254–259.
Heeger, D. J. 1992. Normalization of cell responses in cat striate cortex. Vis Neurosci 9: 181–197.
Henn, V., L. R. Young, and C. Finley. 1974. Vestibular nucleus units in alert monkeys are also influenced by
moving visual fields. Brain Res 71: 144–149.
Hlavacka, F., T. Mergner, and B. Bolha. 1996. Human self-motion perception during translatory vestibular and
proprioceptive stimulation. Neurosci Lett 210: 83–86.
Hlavacka, F., T. Mergner, and G. Schweigart. 1992. Interaction of vestibular and proprioceptive inputs for
human self-motion perception. Neurosci Lett 138: 161–164.
Knill, D. C., and J. A. Saunders. 2003. Do humans optimally integrate stereo and texture information for judg-
ments of surface slant? Vision Res 43: 2539–2558.
Krug, K. 2004. A common neuronal code for perceptual processes in visual cortex? Comparing choice and
attentional correlates in V5/MT. Philos Trans R Soc Lond B Biol Sci 359: 929–941.
Lee, J., and J. H. Maunsell. 2009. A normalization model of attentional modulation of single unit responses.
PLoS ONE 4: e4651.
Logan, D. J., and C. J. Duffy. 2006. Cortical area MSTd combines visual cues to represent 3-D self-movement.
Cereb Cortex 16: 1494–1507.
Ma, W. J., J. M. Beck, P. E. Latham, and A. Pouget. 2006. Bayesian inference with probabilistic population
codes. Nat Neurosci 9: 1432–1438.
Markert, G., U. Buttner, A. Straube, and R. Boyle. 1988. Neuronal activity in the flocculus of the alert monkey
during sinusoidal optokinetic stimulation. Exp Brain Res 70: 134–144.
Matsumiya, K., and H. Ando. 2009. World-centered perception of 3D object motion during visually guided
self-motion. J Vis 9: 151–153.
Meredith, M. A., and B. E. Stein. 1983. Interactions among converging sensory inputs in the superior colliculus.
Science 221: 389–391.
Meredith, M. A., and B. E. Stein. 1986. Visual, auditory, and somatosensory convergence on cells in superior
colliculus results in multisensory integration. J Neurophysiol 56: 640–662.
Merfeld, D. M., L. Zupan, and R. J. Peterka. 1999. Humans use internal models to estimate gravity and linear
acceleration. Nature 398: 615–618.
Morgan, M. L., G. C. Deangelis, and D. E. Angelaki. 2008. Multisensory integration in macaque visual cortex
depends on cue reliability. Neuron 59: 662–673.
Odkvist, L. M., D. W. Schwarz, J. M. Fredrickson, and R. Hassler. 1974. Projection of the vestibular nerve to
the area 3a arm field in the squirrel monkey (Saimiri sciureus). Exp Brain Res 21: 97–105.
Ohshiro, T., D. E. Angelaki, and G. C. DeAngelis. 2011. A normalization model of multisensory integration.
Nat Neurosci, in press.
Page, W. K., and C. J. Duffy. 1999. MST neuronal responses to heading direction during pursuit eye move-
ments. J Neurophysiol 81: 596–610.
Parker, A. J., and W. T. Newsome. 1998. Sense and the single neuron: Probing the physiology of perception.
Annu Rev Neurosci 21: 227–277.
Perrault Jr., T. J., J. W. Vaughan, B. E. Stein, and M. T. Wallace. 2003. Neuron-specific response characteristics
predict the magnitude of multisensory integration. J Neurophysiol 90: 4022–4026.
Perrault Jr., T. J., J. W. Vaughan, B. E. Stein, and M. T. Wallace. 2005. Superior colliculus neurons use distinct
operational modes in the integration of multisensory stimuli. J Neurophysiol 93: 2575–2586.
Previc, F. H., D. C. Varner, and K. K. Gillingham. 1992. Visual scene effects on the somatogravic illusion. Aviat
Space Environ Med 63: 1060–1064.
Reynolds, J. H., and D. J. Heeger. 2009. The normalization model of attention. Neuron 61: 168–185.
Robinson, D. A. 1977. Linear addition of optokinetic and vestibular signals in the vestibular nucleus. Exp Brain
Res 30: 447–450.
Royden, C. S., M. S. Banks, and J. A. Crowell. 1992. The perception of heading during eye movements. Nature
360: 583–585.
Royden, C. S., J. A. Crowell, and M. S. Banks. 1994. Estimating heading during eye movements. Vision Res 34:
3197–3214.
Royden, C. S., and E. C. Hildreth. 1996. Human heading judgments in the presence of moving objects. Percept
Psychophys 58: 836–856.
Rushton, S. K., and P. A. Warren. 2005. Moving observers, relative retinal motion and the detection of object
movement. Curr Biol 15: R542–R543.
Rust, N. C., V. Mante, E. P. Simoncelli, and J. A. Movshon. 2006. How MT cells analyze the motion of visual
patterns. Nat Neurosci 9: 1421–1431.
Schaafsma, S. J., and J. Duysens. 1996. Neurons in the ventral intraparietal area of awake macaque monkey
closely resemble neurons in the dorsal part of the medial superior temporal area in their responses to
optic flow patterns. J Neurophysiol 76: 4056–4068.
Schlack, A., K. P. Hoffmann, and F. Bremmer. 2002. Interaction of linear vestibular and visual stimulation in
the macaque ventral intraparietal area (VIP). Eur J Neurosci 16: 1877–1886.
Schwarz, D. W., and J. M. Fredrickson. 1971a. Rhesus monkey vestibular cortex: A bimodal primary projection
field. Science 172: 280–281.
Schwarz, D. W., and J. M. Fredrickson. 1971b. Tactile direction sensitivity of area 2 oral neurons in the rhesus
monkey cortex. Brain Res 27: 397–401.
Shenoy, K. V., D. C. Bradley, and R. A. Andersen. 1999. Influence of gaze rotation on the visual response of
primate MSTd neurons. J Neurophysiol 81: 2764–2786.
Siegel, R. M., and H. L. Read. 1997. Analysis of optic flow in the monkey parietal area 7a. Cereb Cortex 7:
327–346.
Stanford, T. R., S. Quessy, and B. E. Stein. 2005. Evaluating the operations underlying multisensory integration
in the cat superior colliculus. J Neurosci 25: 6499–6508.
Stein, B. E., and M. A. Meredith. 1993. The merging of the senses. Cambridge, MA: MIT Press.
Stein, B. E., and T. R. Stanford. 2008. Multisensory integration: Current issues from the perspective of the
single neuron. Nat Rev Neurosci 9: 255–266.
Sugihara, T., M. D. Diltz, B. B. Averbeck, and L. M. Romanski. 2006. Integration of auditory and visual com-
munication information in the primate ventrolateral prefrontal cortex. J Neurosci 26: 11138–11147.
Takahashi, K., Y. Gu, P. J. May, S. D. Newlands, G. C. Deangelis, and D. E. Angelaki. 2007. Multimodal coding
of three-dimensional rotation and translation in area MSTd: Comparison of visual and vestibular selectiv-
ity. J Neurosci 27: 9742–9756.
Tanaka, K., K. Hikosaka, H. Saito, M. Yukie, Y. Fukada, and E. Iwai. 1986. Analysis of local and wide-field
movements in the superior temporal visual areas of the macaque monkey. J Neurosci 6: 134–144.
Tanaka, K., and H. Saito. 1989. Analysis of motion of the visual field by direction, expansion/contraction, and
rotation cells clustered in the dorsal part of the medial superior temporal area of the macaque monkey. J
Neurophysiol 62: 626–641.
Telford, L., I. P. Howard, and M. Ohmi. 1995. Heading judgments during active and passive self-motion. Exp
Brain Res 104: 502–510.
Waespe, W., U. Buttner, and V. Henn. 1981. Visual–vestibular interaction in the flocculus of the alert monkey:
I. Input activity. Exp Brain Res 43: 337–348.
Waespe, W., and V. Henn. 1977. Neuronal activity in the vestibular nuclei of the alert monkey during vestibular
and optokinetic stimulation. Exp Brain Res 27: 523–538.
Waespe, W., and V. Henn. 1981. Visual–vestibular interaction in the flocculus of the alert monkey: II. Purkinje
cell activity. Exp Brain Res 43: 349–360.
Warren, P. A., and S. K. Rushton. 2007. Perception of object trajectory: Parsing retinal motion into self and
object movement components. J Vis 7(11): 2.1–11.
Warren, P. A., and S. K. Rushton. 2008. Evidence for flow-parsing in radial flow displays. Vis Res 48:
655–663.
Warren, W. H. 2003. Optic flow. In The visual neurosciences, ed. L. M. Chalupa and J. S. Werner. Cambridge,
MA: MIT Press.
Warren, W. H., and J. A. Saunders. 1995. Perceiving heading in the presence of moving objects. Perception 24:
315–331.
Wexler, M. 2003. Voluntary head movement and allocentric perception of space. Psychol Sci 14: 340–346.
Wexler, M., F. Panerai, I. Lamouret, and J. Droulez. 2001. Self-motion and the perception of stationary objects.
Nature 409: 85–88.
Wexler, M., and J. J. Van Boxtel. 2005. Depth perception by the active observer. Trends Cogn Sci 9: 431–438.
Wichmann, F. A., and N. J. Hill. 2001. The psychometric function: I. Fitting, sampling, and goodness of fit.
Percept Psychophys 63: 1293–1313.
Wolfe, J. W., and R. L. Cramer. 1970. Illusions of pitch induced by centripetal acceleration. Aerosp Med 41:
1136–1139.
Visual–Vestibular Integration for Self-Motion Perception 649
Zhang, T., and K. H. Britten. 2003. Microstimulation of area VIP biases heading perception in monkeys.
Program No. 339.9. 2003 Neuroscience Abstract Viewer/Itinerary Planner. New Orleans, LA: Society
for Neuroscience.
Zhang, T., H. W. Heuer, and K. H. Britten. 2004. Parietal area VIP neuronal responses to heading stimuli are
encoded in head-centered coordinates. Neuron 42: 993–1001.
Section VIII
Naturalistic Multisensory Processes:
Communication Signals
32 Unity of the Senses for Primate
Vocal Communication
Asif A. Ghazanfar
CONTENTS
32.1 Introduction........................................................................................................................... 653
32.2 Multisensory Communication Is the Default Mode of Communication............................... 654
32.3 Monkeys Link Facial Expressions to Vocal Expressions...................................................... 654
32.4 Dynamic Faces Modulate Voice Processing in Auditory Cortex............................................ 655
32.5 Auditory Cortical Interactions with Superior Temporal Sulcus Mediate Face/Voice
Integration.............................................................................................................................. 656
32.6 Viewing Vocalizing Conspecifics.......................................................................................... 658
32.7 Somatosensory Feedback during Vocal Communication...................................................... 659
32.8 Emergence of Multisensory Systems for Communication....................................................660
32.9 Conclusions............................................................................................................................ 661
Acknowledgments........................................................................................................................... 661
References....................................................................................................................................... 662
32.1 INTRODUCTION
The basic tenet of neocortical organization is that different regions of the cortex have different
functions. Some regions receive visual, auditory, tactile, olfactory, and gustatory sensations. Each of
these sensory regions is thought to send projections that converge on an “association area,” which
then enables the association between the different senses and between the senses and movement.
According to a highly influential two-part review by Norman Geschwind, entitled, “Disconnexion
syndromes in animals and man” (Geschwind 1965a, 1965b), the connections between sensory asso-
ciation areas are not robust in nonhuman animals, limiting their ability to make cross-modal sen-
sory associations. In contrast, humans can readily make such associations, for example, between the
sight of a lion and the sounds of its roar.
This picture of human versus nonhuman cross-modal abilities based on anatomy led to the idea
that human speech and language evolved in parallel with robust cross-modal connections within the
neocortex. Geschwind claimed that the “ability to acquire speech has as a prerequisite the ability to
form cross-modal associations” (Geschwind 1965a, 1965b). This view of cross-modal associations
as a potentially uniquely human capacity remains present even in more current ideas about the evo-
lution of language. For example, it has been suggested that human language depends on our unique
ability to imitate in multiple modalities, which in turn relies on a “substantial change in neural orga-
nization, one that affects not only imitation but also communication” (Hauser et al. 2002, p. 1575).
The purpose of this review is twofold: (1) to refute the view that the cross-modal (multisensory,
hereafter) associations are mediated solely through association areas and (2) to debunk the view
that human communication is uniquely multisensory. To achieve these two goals, I will focus on the
multisensory nature of nonhuman primate vocal communication and the many possible roles that
one nonassociation area plays: the auditory cortex.
FIGURE 32.1 Exemplars of facial expressions produced concomitantly with vocalizations. Rhesus monkey
coo and scream calls taken at midpoint of expressions with their corresponding spectrograms.
and monkeys and in which the auditory cortex may serve as a key node in a larger neocortical
network.
FIGURE 32.2 (See color insert.) Single neuron examples of multisensory integration of Face + Voice
stimuli compared with Disk + Voice stimuli in lateral belt area. Left: enhanced response when voices are
coupled with faces, but no similar modulation when coupled with disks. Right: similar effects for a suppressed
response. x-Axes show time aligned to onset of face (solid line). Dashed lines indicate onset and offset of voice
signal. y-Axes depict firing rate of neuron in spikes per second. Shaded regions denote SEM.
The specificity of face/voice integrative responses was tested by replacing the dynamic faces with
dynamic disks that mimicked the aperture and displacement of the mouth. In human psychophysical
experiments, such artificial dynamic stimuli can still lead to enhanced speech detection, but not to
the same degree as a real face (Bernstein et al. 2004; Schwartz et al. 2004). When cortical sites or
single units were tested with dynamic disks, far less integration was seen when compared to the real
monkey faces (Ghazanfar et al. 2005, 2008; Figure 32.2). This was true primarily for the lateral belt
auditory cortex (LFPs and single units) and was observed to a lesser extent in the primary auditory
cortex (LFPs only). This suggests that there may be increasingly specific influences of “extra” sen-
sory modalities as one moves away from the primary sensory regions.
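Enhancement and suppression in such experiments are conventionally quantified against the best unimodal response (cf. Stanford et al. 2005; Stein and Stanford 2008). A minimal sketch in Python, using illustrative firing rates rather than values from the studies cited:

```python
def enhancement_index(multi_rate, uni_rates):
    """Percent change of the multisensory response relative to the best
    unimodal response: positive values indicate multisensory enhancement,
    negative values suppression."""
    best = max(uni_rates)
    return 100.0 * (multi_rate - best) / best

# Hypothetical spikes/s for one lateral belt site
voice_alone, face_alone, face_plus_voice = 20.0, 4.0, 32.0
ei = enhancement_index(face_plus_voice, [voice_alone, face_alone])
print(ei)  # 60.0 -> a 60% multisensory enhancement
```

The same index applied to Disk + Voice rates would, on this account, yield values much closer to zero than those for Face + Voice.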
Unexpectedly, grunt vocalizations were overrepresented relative to coos in terms of enhanced
multisensory LFP responses (Ghazanfar et al. 2005). As coos and grunts are both produced fre-
quently in a variety of affiliative contexts and are broadband spectrally, the differential representa-
tion cannot be attributed to experience, valence, or the frequency tuning of neurons. One remaining
possibility is that this differential representation may reflect a behaviorally relevant distinction, as
coos and grunts differ in their direction of expression and range. Coos are generally contact calls
rarely directed toward any particular individual. In contrast, grunts are often directed toward indi-
viduals in one-on-one situations, often during social approaches as in baboons and vervet monkeys
(Cheney and Seyfarth 1982; Palombit et al. 1999). Given their production at close range and context,
grunts may produce a stronger face/voice association than coo calls. This distinction appeared to
be reflected in the pattern of significant multisensory responses in the auditory cortex, that is, this
multisensory bias toward grunt calls may be related to the fact that grunts (relative to coos) are often
produced during intimate, one-to-one social interactions.
et al. 1977; Bruce et al. 1981; Schroeder and Foxe 2002; Barraclough et al. 2005; Chandrasekaran
and Ghazanfar 2009). One mechanism for establishing whether auditory cortex and the STS inter-
act at the functional level is to measure their temporal correlations as a function of stimulus condition.
Concurrent recording of LFPs and spiking activity in the lateral belt of the auditory cortex and
the upper bank of the STS revealed that functional interactions, in the form of gamma band cor-
relations, between these two regions increased in strength during presentations of faces and voices
together relative to the unimodal conditions (Ghazanfar et al. 2008; Figure 32.3a). Furthermore,
these interactions were not solely modulations of response strength, as phase relationships were
significantly less variable (tighter) in the multisensory conditions (Figure 32.3b).
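Phase concentration of this kind is commonly measured with a phase-locking statistic on band-limited signals. A toy Python sketch on synthetic traces (not the recorded data; the signal names and parameters are illustrative):

```python
import numpy as np
from scipy.signal import hilbert

def phase_locking(x, y):
    """Concentration of the phase difference between two narrow-band
    signals: 1 = a constant phase lag, near 0 = uniformly scattered phases."""
    dphi = np.angle(hilbert(x)) - np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * dphi)))

fs, f = 1000, 50                       # 1-kHz sampling, 50-Hz "gamma"
t = np.arange(0, 1, 1 / fs)
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * f * t)
y_locked = np.sin(2 * np.pi * f * t + 0.5)      # fixed phase lag
drift = np.cumsum(rng.normal(0, 0.1, t.size))   # random-walk phase drift
y_drift = np.sin(2 * np.pi * f * t + drift)

plv_locked = phase_locking(x, y_locked)  # close to 1
plv_drift = phase_locking(x, y_drift)    # much lower
```

A "tighter" phase relationship in the multisensory condition corresponds to a higher value of this statistic than in the unimodal conditions.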
The influence of the STS on the auditory cortex was not merely on its gamma oscillations.
Spiking activity seems to be modulated, but not “driven,” by ongoing activity arising from the STS.
Three lines of evidence suggest this scenario. First, visual influences on single neurons were most
robust when in the form of dynamic faces and were only apparent when neurons had a significant
response to a vocalization (i.e., there were no overt responses to faces alone). Second, these integra-
tive responses were often “face-specific” and had a wide distribution of latencies, which suggested
that the face signal was an ongoing signal that influenced auditory responses (Ghazanfar et al.
2008). Finally, this hypothesis for an ongoing signal is supported by the sustained gamma band
activity between the auditory cortex and the STS and by a spike-field coherence analysis. This
analysis reveals that just before spiking activity in the auditory cortex, there is an increase in gamma
band power in the STS (Ghazanfar et al. 2008; Figure 32.3c).
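One way to sketch such a spike-field analysis: band-pass one site's LFP, take its Hilbert envelope, and average the squared envelope in a short window preceding each spike recorded at the other site. A toy Python version on synthetic data (band, window, and burst parameters are my assumptions, not those of the study):

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def pre_spike_band_power(lfp, spike_idx, fs, band=(40, 80), win_ms=50):
    """Mean band-limited LFP power in the win_ms window preceding each
    spike (spike_idx are sample indices into lfp)."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    power = np.abs(hilbert(filtfilt(b, a, lfp))) ** 2
    w = int(win_ms * fs / 1000)
    return float(np.mean([power[i - w:i].mean() for i in spike_idx if i >= w]))

fs = 1000
t = np.arange(0, 2, 1 / fs)
rng = np.random.default_rng(1)
# Synthetic "STS" LFP: noise plus 60-Hz gamma bursts at 0.5, 1.0, and 1.5 s
burst_env = sum(np.exp(-0.5 * ((t - c) / 0.03) ** 2) for c in (0.5, 1.0, 1.5))
lfp = 0.2 * rng.standard_normal(t.size) + burst_env * np.sin(2 * np.pi * 60 * t)

spikes_after_bursts = [520, 1020, 1520]   # spikes just after each burst
spikes_elsewhere = [200, 750, 1800]
triggered_power = pre_spike_band_power(lfp, spikes_after_bursts, fs)
control_power = pre_spike_band_power(lfp, spikes_elsewhere, fs)
```

Here `triggered_power` exceeds `control_power`, mirroring the reported rise in STS gamma power just before auditory cortical spikes.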
FIGURE 32.3 (See color insert.) (a) Time–frequency plots (cross-spectrograms) illustrate modulation of
functional interactions (as a function of stimulus condition) between lateral belt auditory cortex and STS for
a population of cortical sites. x-Axes depict time in milliseconds as a function of onset of auditory signal
(solid black line). y-Axes depict frequency of oscillations in Hz. Color bar indicates amplitude of these signals
normalized by baseline mean. (b) Population phase concentration from 0 to 300 ms after voice onset. x-Axes
depict frequency in Hz. y-Axes depict average normalized phase concentration. Shaded regions denote SEM
across all electrode pairs and calls. All values are normalized by baseline mean for different frequency bands.
Right panel shows phase concentration across all calls and electrode pairs in gamma band for four conditions.
(c) Spike-field cross-spectrogram illustrates relationship between spiking activity of auditory cortical neurons
and STS local field potential across population of cortical sites. x-Axes depict time in milliseconds as a func-
tion of onset of multisensory response in auditory neuron (solid black line). y-Axes depict frequency in Hz.
Color bar denotes cross-spectral power normalized by baseline mean for different frequencies.
Both the auditory cortex and the STS have multiple bands of oscillatory activity generated in
responses to stimuli that may mediate different functions (Lakatos et al. 2005; Chandrasekaran
and Ghazanfar 2009). Thus, interactions between the auditory cortex and the STS are not lim-
ited to spiking activity and high frequency gamma oscillations. Below 20 Hz, and in response to
naturalistic audiovisual stimuli, there are directed interactions from the auditory cortex to the STS,
whereas above 20 Hz (but below the gamma range), there are directed interactions from the STS to
the auditory cortex (Kayser and Logothetis 2009). Given that different frequency bands in the STS
integrate faces and voices in distinct ways (Chandrasekaran and Ghazanfar 2009), it is possible that
these lower frequency interactions between the STS and the auditory cortex also represent distinct
multisensory processing channels.
Two things should be noted here. The first is that functional interactions between the STS and the
auditory cortex are not likely to occur solely during the presentation of faces with voices. Other con-
gruent, behaviorally salient audiovisual events such as looming signals (Maier et al. 2004; Gordon
and Rosenblum 2005; Cappe et al. 2009) or other temporally coincident signals may elicit similar
functional interactions (Noesselt et al. 2007; Maier et al. 2008). The second is that there are other
areas that, consistent with their connectivity and response properties (e.g., sensitivity to faces and
voices), could also (and very likely) have a visual influence on the auditory cortex. These include
the ventrolateral prefrontal cortex (Romanski et al. 2005; Sugihara et al. 2006) and the amygdala
(Gothard et al. 2007; Kuraoka and Nakamura 2007).
FIGURE 32.4 (a) Average fixation on eye region versus mouth region across three subjects while viewing
a 30-s video of vocalizing conspecific. Audio track had no influence on proportion of fixations falling onto
mouth or eye region. Error bars represent SEM. (b) We also find that when monkeys do saccade to mouth
region, it is tightly correlated with onset of mouth movements (r = .997, p < .00001).
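The reported timing relationship is a simple correlation between event times. A sketch with hypothetical onset times (not the published data):

```python
import numpy as np

# Hypothetical onsets (s) of mouth movements in a 30-s video and the
# monkey's subsequent saccades to the mouth region
mouth_onsets = np.array([2.1, 7.4, 12.0, 18.3, 25.6])
saccade_times = np.array([2.3, 7.5, 12.2, 18.6, 25.8])
r = np.corrcoef(mouth_onsets, saccade_times)[0, 1]
# r is near 1 when saccades closely track mouth-movement onsets
```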
2008), one possibility is that the fixations at the onset of mouth movements send a signal to the audi-
tory cortex, which resets the phase of an ongoing oscillation. This proprioceptive signal thus primes
the auditory cortex to amplify or suppress (depending on the timing of) a subsequent auditory signal
originating from the mouth. Given that mouth movements precede the voiced components of both
human (Abry et al. 1996) and monkey vocalizations (Ghazanfar et al. 2005; Chandrasekaran and
Ghazanfar 2009), the temporal order of visual to proprioceptive to auditory signals is consistent
with this idea. This hypothesis is also supported (although indirectly) by the finding that the sign of
face/voice integration in the auditory cortex and the STS is influenced by the timing of mouth
movements relative to the onset of the voice (Ghazanfar et al. 2005; Chandrasekaran and Ghazanfar
2009).
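The phase-resetting hypothesis can be illustrated with a toy model (my construction, not a model from the cited papers): the visual signal resets an ongoing oscillation to its excitability peak, and a later auditory input is scaled by the cosine of the phase at which it arrives.

```python
import numpy as np

def reset_gain(lag_ms, osc_freq_hz=8.0):
    """Gain applied to an auditory input arriving lag_ms after a visual
    phase reset, assuming excitability follows cos(phase) and the phase
    is set to 0 (peak excitability) at the reset."""
    phase = 2 * np.pi * osc_freq_hz * lag_ms / 1000.0
    return float(np.cos(phase))

# With an 8-Hz oscillation (125-ms cycle), inputs arriving at 0 or 125 ms
# after the reset are amplified; an input at the half cycle (62.5 ms)
# falls on the trough and is suppressed.
```

This captures why, in such a scheme, the visual-to-auditory lag determines whether the subsequent voice signal is amplified or suppressed.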
Although the substrates for these somatosensory–auditory effects have not been explored, inter-
actions between the somatosensory system and the auditory cortex seem like a likely source for the
phenomena described above for the following reasons. First, many auditory cortical fields respond
to, or are modulated by, tactile inputs (Schroeder et al. 2001; Fu et al. 2003; Kayser et al. 2005).
Second, there are intercortical connections between somatosensory areas and the auditory cortex
(Cappe and Barone 2005; de la Mothe et al. 2006; Smiley et al. 2007). Third, the caudomedial
auditory area CM, where many auditory–tactile responses seem to converge, is directly connected
to somatosensory areas in the retroinsular cortex and the granular insula (de la Mothe et al. 2006;
Smiley et al. 2006). Oddly enough, a parallel influence of audition on somatosensory areas has also
been reported: neurons in the “somatosensory” insula readily and selectively respond to vocaliza-
tions (Beiser 1998; Remedios et al. 2009). Finally, the tactile receptive fields of neurons in auditory
cortical area CM are confined to the upper body, primarily the face and neck regions (areas consist-
ing of, or covering, the vocal tract) (Fu et al. 2003) and the primary somatosensory cortical (area 3b)
representation for the tongue (a vocal tract articulator) projects to auditory areas in the lower bank of
the lateral sulcus (Iyengar et al. 2007). All of these facts lend further credibility to the putative role
of somatosensory–auditory interactions during vocal production and perception.
Like humans, other primates also adjust their vocal output according to what they hear. For
example, macaques, marmosets (Callithrix jacchus), and cotton-top tamarins (Saguinus oedipus)
adjust the loudness, timing, and acoustic structure of their vocalizations depending on background
noise levels and patterns (Sinnott et al. 1975; Brumm et al. 2004; Egnor and Hauser 2006; Egnor
et al. 2006, 2007). The specific number of syllables and temporal modulations in heard conspecific
calls can also differentially trigger vocal production in tamarins (Ghazanfar et al. 2001, 2002).
Thus, auditory feedback is also very important for nonhuman primates, and altering such feedback
can influence neurons in the auditory cortex (Eliades and Wang 2008). At this time, however, no
experiments have been conducted to investigate whether somatosensory feedback plays a role in
influencing vocal production. The neurophysiological and neuroanatomical data described above
suggest that it is not unreasonable to think that it does.
All sensorimotor tracts are heavily myelinated by 2 to 3 months after birth in rhesus monkeys, but
not until 8 to 12 months after birth in human infants. Finally, at the behavioral level, the differential
patterns of brain growth in the two species lead to differential timing in the emergence of species-
specific motor, socioemotional, and cognitive abilities (Antinucci 1989; Konner 1991).
The heterochrony of neural and behavioral development across different primate species raises
the possibility that the development of multisensory integration may be different in monkeys rela-
tive to humans. In particular, Turkewitz and Kenny (1982) suggested that the neural limitations
imposed by the relatively slow rate of neural development in human infants may actually be advan-
tageous because the limitations may provide them with greater functional plasticity. This, in turn,
may make human infants initially more sensitive to a broader range of sensory stimulation and to
the relations among multisensory inputs. This theoretical observation has received empirical sup-
port from studies showing that infants go through a process of “perceptual narrowing” in their
processing of unisensory as well as multisensory information, that is, where initially they exhibit
broad sensory tuning, they later exhibit narrower tuning. For example, 4- to 6-month-old human
infants can match rhesus monkey faces and voices, but 8- to 10-month-old infants no longer do so
(Lewkowicz and Ghazanfar 2006). These findings suggest that as human infants acquire increas-
ingly greater experience with conspecific human faces and vocalizations—but none with hetero-
specific faces and vocalizations—their sensory tuning (and their neural systems) narrows to match
their early experience.
If a relatively immature state of neural development leaves a developing organism more “open”
to the effects of early sensory experience, then it stands to reason that the more advanced state of
neural development in monkeys might result in a different outcome. In support of this, a study of
infant vervet monkeys that was identical in design to the human infant study of cross-species mul-
tisensory matching (Lewkowicz and Ghazanfar 2006) revealed that, unlike human infants, they
exhibit no evidence of perceptual narrowing (Zangehenpour et al. 2008). That is, the infant vervet
monkeys could match faces and voices of rhesus monkeys despite the fact that they had no prior
experience with macaque monkeys and that they continued to do so well beyond the ages where
such matching ability declines in human infants (Zangehenpour et al. 2008). The reason for this
lack of perceptual narrowing may lie in the precocial neurological development of this Old World
monkey species.
These comparative developmental data reveal that although monkeys and humans may appear
to share similarities at the behavioral and neural levels, their different developmental trajectories
are likely to reveal important differences. It is important to keep this in mind when making claims
about homologies at either of these levels.
32.9 CONCLUSIONS
The overwhelming evidence from the studies reviewed here, and numerous other studies from dif-
ferent domains of neuroscience, all converge on the idea that the neocortex is fundamentally mul-
tisensory (Ghazanfar and Schroeder 2006). This is not terribly surprising given that the sensory
experiences of humans and other animals are profoundly multimodal. This does not mean, however,
that every cortical area is uniformly multisensory. Indeed, I hope that the role of the auditory cortex
reviewed above for vocal communication illustrates that cortical areas may be weighted differently
by “extra”-modal inputs depending on the task at hand and its context.
ACKNOWLEDGMENTS
The author gratefully acknowledges the scientific contributions and numerous discussions with
the following people: Chand Chandrasekaran, Kari Hoffman, David Lewkowicz, Joost Maier,
and Hjalmar Turesson. This work was supported by NIH R01NS054898 and NSF BCS-0547760
CAREER Award.
REFERENCES
Abry, C., M.-T. Lallouache, and M.-A. Cathiard. 1996. How can coarticulation models account for speech sen-
sitivity in audio-visual desynchronization? In Speechreading by humans and machines: Models, systems
and applications, ed. D. Stork and M. Henneke, 247–255. Berlin: Springer-Verlag.
Adachi, I., H. Kuwahata, K. Fujita, M. Tomonaga, and T. Matsuzawa. 2006. Japanese macaques form a cross-
modal representation of their own species in their first year of life. Primates 47: 350–354.
Antinucci, F. 1989. Systematic comparison of early sensorimotor development. In Cognitive structure and
development in nonhuman primates, ed. F. Antinucci, 67–85. Hillsdale, NJ: Lawrence Erlbaum Associates.
Barnes, C. L., and D. N. Pandya. 1992. Efferent cortical connections of multimodal cortex of the superior
temporal sulcus in the rhesus monkey. Journal of Comparative Neurology 318: 222–244.
Barraclough, N. E., D. Xiao, C. I. Baker, M. W. Oram, and D. I. Perrett. 2005. Integration of visual and auditory
information by superior temporal sulcus neurons responsive to the sight of actions. Journal of Cognitive
Neuroscience 17: 377–391.
Batterson, V. G., S. A. Rose, A. Yonas, K. S. Grant, and G. P. Sackett. 2008. The effect of experience on the
development of tactual–visual transfer in pigtailed macaque monkeys. Developmental Psychobiology
50: 88–96.
Beiser, A. 1998. Processing of twitter-call fundamental frequencies in insula and auditory cortex of squirrel
monkeys. Experimental Brain Research 122: 139–148.
Benevento, L. A., J. Fallon, B. J. Davis, and M. Rezak. 1977. Auditory–visual interactions in single cells
in the cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey.
Experimental Neurology 57: 849–872.
Bernstein, L. E., E. T. Auer, and S. Takayanagi. 2004. Auditory speech detection in noise enhanced by lipread-
ing. Speech Communication 44: 5–18.
Besle, J., A. Fort, C. Delpuech, and M. H. Giard. 2004. Bimodal speech: Early suppressive visual effects in
human auditory cortex. European Journal of Neuroscience 20: 2225–2234.
Bizley, J. K., F. R. Nodal, V. M. Bajo, I. Nelken, and A. J. King. 2007. Physiological and anatomical evidence
for multisensory interactions in auditory cortex. Cerebral Cortex 17: 2172–2189.
Bruce, C., R. Desimone, and C. G. Gross. 1981. Visual properties of neurons in a polysensory area in superior
temporal sulcus of the macaque. Journal of Neurophysiology 46: 369–384.
Brumm, H., K. Voss, I. Kollmer, and D. Todt. 2004. Acoustic communication in noise: Regulation of call char-
acteristics in a New World monkey. Journal of Experimental Biology 207: 443–448.
Cappe, C., and P. Barone. 2005. Heteromodal connections supporting multisensory integration at low levels of
cortical processing in the monkey. European Journal of Neuroscience 22: 2886–2902.
Cappe, C., G. Thut, V. Romei, and M. M. Murray. 2009. Selective integration of auditory–visual looming cues
by humans. Neuropsychologia 47: 1045–1052.
Chandrasekaran, C., and A. A. Ghazanfar. 2009. Different neural frequency bands integrate faces and voices
differently in the superior temporal sulcus. Journal of Neurophysiology 101: 773–788.
Cheney, D. L., and R. M. Seyfarth. 1982. How vervet monkeys perceive their grunts—Field playback
experiments. Animal Behaviour 30: 739–751.
De La Mothe, L. A., S. Blumell, Y. Kajikawa, and T. A. Hackett. 2006. Cortical connections of the auditory
cortex in marmoset monkeys: Core and medial belt regions. Journal of Comparative Neurology 496:
27–71.
Driver, J., and T. Noesselt. 2008. Multisensory interplay reveals crossmodal influences on ‘sensory-specific’
brain regions, neural responses, and judgments. Neuron 57: 11–23.
Egnor, S. E. R., and M. D. Hauser. 2006. Noise-induced vocal modulation in cotton-top tamarins (Saguinus
oedipus). American Journal of Primatology 68: 1183–1190.
Egnor, S. E. R., C. G. Iguina, and M. D. Hauser. 2006. Perturbation of auditory feedback causes systematic
perturbation in vocal structure in adult cotton-top tamarins. Journal of Experimental Biology 209:
3652–3663.
Egnor, S. E. R., J. G. Wickelgren, and M. D. Hauser. 2007. Tracking silence: Adjusting vocal production to
avoid acoustic interference. Journal of Comparative Physiology A–Neuroethology Sensory Neural and
Behavioral Physiology 193: 477–483.
Eliades, S. J., and X. Q. Wang. 2008. Neural substrates of vocalization feedback monitoring in primate auditory
cortex. Nature 453: 1102–1107.
Ettlinger, G., and W. A. Wilson. 1990. Cross-modal performance: Behavioural processes, phylogenetic consid-
erations and neural mechanisms. Behavioural Brain Research 40: 169–192.
Evans, T. A., S. Howell, and G. C. Westergaard. 2005. Auditory–visual cross-modal perception of communica-
tive stimuli in tufted capuchin monkeys (Cebus apella). Journal of Experimental Psychology—Animal
Behavior Processes 31: 399–406.
Fitch, W. T. 1997. Vocal tract length and formant frequency dispersion correlate with body size in rhesus
macaques. Journal of the Acoustical Society of America 102: 1213–1222.
Fitch, W. T., and M. D. Hauser. 1995. Vocal production in nonhuman primates—Acoustics, physiology, and
functional constraints on honest advertisement. American Journal of Primatology 37: 191–219.
Fowler, C. A. 2004. Speech as a supramodal or amodal phenomenon. In The handbook of multisensory pro-
cesses, ed. G.A. Calvert, C. Spence, and B.E. Stein, 189–201. Cambridge, MA: MIT Press.
Fu, K. M. G., T. A. Johnston, A. S. Shah, L. Arnold, J. Smiley, T. A. Hackett, P. E. Garraghty, and C. E. Schroeder.
2003. Auditory cortical neurons respond to somatosensory stimulation. Journal of Neuroscience 23:
7510–7515.
Fu, K. M. G., A. S. Shah, M. N. O’Connell, T. Mcginnis, H. Eckholdt, P. Lakatos, J. Smiley, and C. E. Schroeder.
2004. Timing and laminar profile of eye-position effects on auditory responses in primate auditory cor-
tex. Journal of Neurophysiology 92: 3522–3531.
Geschwind, N. 1965a. Disconnexion syndromes in animals and man, Part I. Brain 88: 237–294.
Geschwind, N. 1965b. Disconnexion syndromes in animals and man, Part II. Brain 88: 585–644.
Ghazanfar, A. A., C. Chandrasekaran, and N. K. Logothetis. 2008. Interactions between the superior tempo-
ral sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys. Journal of
Neuroscience 28: 4457–4469.
Ghazanfar, A. A., and C. F. Chandrasekaran. 2007. Paving the way forward: Integrating the senses through
phase-resetting of cortical oscillations. Neuron 53: 162–164.
Ghazanfar, A. A., J. I. Flombaum, C. T. Miller, and M. D. Hauser. 2001. The units of perception in the antipho-
nal calling behavior of cotton-top tamarins (Saguinus oedipus): Playback experiments with long calls.
Journal of Comparative Physiology A – Neuroethology Sensory Neural and Behavioral Physiology 187:
27–35.
Ghazanfar, A. A., and N. K. Logothetis. 2003. Facial expressions linked to monkey calls. Nature 423:
937–938.
Ghazanfar, A. A., J. X. Maier, K. L. Hoffman, and N. K. Logothetis. 2005. Multisensory integration of dynamic
faces and voices in rhesus monkey auditory cortex. Journal of Neuroscience 25: 5004–5012.
Ghazanfar, A. A., K. Nielsen, and N. K. Logothetis. 2006. Eye movements of monkeys viewing vocalizing
conspecifics. Cognition 101: 515–529.
Ghazanfar, A. A., and D. Rendall. 2008. Evolution of human vocal production. Current Biology 18:
R457–R460.
Ghazanfar, A. A., and C. E. Schroeder. 2006. Is neocortex essentially multisensory? Trends in Cognitive
Sciences 10: 278–285.
Ghazanfar, A. A., D. Smith-Rohrberg, A. A. Pollen, and M. D. Hauser. 2002. Temporal cues in the antiphonal
long-calling behaviour of cottontop tamarins. Animal Behaviour 64: 427–438.
Ghazanfar, A. A., H. K. Turesson, J. X. Maier, R. Van Dinther, R. D. Patterson, and N. K. Logothetis. 2007.
Vocal tract resonances as indexical cues in rhesus monkeys. Current Biology 17: 425–430.
Gibson, K. R. 1991. Myelination and behavioral development: A comparative perspective on questions of
neoteny, altriciality and intelligence. In Brain maturation and cognitive development: Comparative
and cross-cultural perspectives, ed. K. R. Gibson and A. C. Petersen, 29–63. New York: Aldine de
Gruyter.
Gogate, L. J., A. S. Walker-Andrews, and L. E. Bahrick. 2001. The intersensory origins of word comprehen-
sion: An ecological–dynamic systems view. Developmental Science 4: 1–18.
Gordon, M. S., and L. D. Rosenblum. 2005. Effects of intrastimulus modality change on audiovisual time-to-
arrival judgments. Perception and Psychophysics 67: 580–594.
Gothard, K. M., F. P. Battaglia, C. A. Erickson, K. M. Spitler, and D. G. Amaral. 2007. Neural responses to facial
expression and face identity in the monkey amygdala. Journal of Neurophysiology 97: 1671–1683.
Gunderson, V. M. 1983. Development of cross-modal recognition in infant pigtail monkeys (Macaca nemes-
trina). Developmental Psychology 19: 398–404.
Gunderson, V. M., S. A. Rose, and K. S. Grant-Webster. 1990. Cross-modal transfer in high-risk and low-risk
infant pigtailed macaque monkeys. Developmental Psychology 26: 576–581.
Hackett, T. A., L. A. de La Mothe, I. Ulbert, G. Karmos, J. Smiley, and C. E. Schroeder. 2007a. Multisensory
convergence in auditory cortex: II. Thalamocortical connections of the caudal superior temporal plane.
Journal of Comparative Neurology 502: 924–952.
664 The Neural Bases of Multisensory Processes
Hackett, T. A., J. F. Smiley, I. Ulbert, G. Karmos, P. Lakatos, L. A. de La Mothe, and C. E. Schroeder. 2007b.
Sources of somatosensory input to the caudal belt areas of auditory cortex. Perception 36: 1419–1430.
Hackett, T. A., I. Stepniewska, and J. H. Kaas. 1999. Prefrontal connections of the parabelt auditory cortex in
macaque monkeys. Brain Research 817: 45–58.
Harries, M. H., and D. I. Perrett. 1991. Visual processing of faces in temporal cortex—Physiological evidence for
a modular organization and possible anatomical correlates. Journal of Cognitive Neuroscience 3: 9–24.
Hauser, M. D., N. Chomsky, and W. Fitch. 2002. The faculty of language: What is it, who has it, and how did
it evolve? Science 298: 1569–1579.
Hauser, M. D., C. S. Evans, and P. Marler. 1993. The role of articulation in the production of rhesus monkey,
Macaca mulatta, vocalizations. Animal Behaviour 45: 423–433.
Hauser, M. D., and M. S. Ybarra. 1994. The role of lip configuration in monkey vocalizations—Experiments
using xylocaine as a nerve block. Brain and Language 46: 232–244.
Ito, T., M. Tiede, and D. J. Ostry. 2009. Somatosensory function in speech perception. Proceedings of the
National Academy of Sciences of the United States of America 106: 1245–1248.
Iyengar, S., H. Qi, N. Jain, and J. H. Kaas. 2007. Cortical and thalamic connections of the representations of
the teeth and tongue in somatosensory cortex of new world monkeys. Journal of Comparative Neurology
501: 95–120.
Izumi, A., and S. Kojima. 2004. Matching vocalizations to vocalizing faces in a chimpanzee (Pan troglodytes).
Animal Cognition 7: 179–184.
Jiang, J. T., A. Alwan, P. A. Keating, E. T. Auer, and L. E. Bernstein. 2002. On the relationship between face
movements, tongue movements, and speech acoustics. EURASIP Journal of Applied Signal Processing
1174–1188.
Jones, J. A., and K. G. Munhall. 2003. Learning to produce speech with an altered vocal tract: The role of audi-
tory feedback. Journal of the Acoustical Society of America 113: 532–543.
Jones, J. A., and K. G. Munhall. 2005. Remapping auditory–motor representations in voice production. Current
Biology 15: 1768–1772.
Jordan, K. E., E. M. Brannon, N. K. Logothetis, and A. A. Ghazanfar. 2005. Monkeys match the number of
voices they hear with the number of faces they see. Current Biology 15: 1034–1038.
Kayser, C., and N. K. Logothetis. 2009. Directed interactions between auditory and superior temporal cor-
tices and their role in sensory integration. Frontiers in Integrative Neuroscience 3: 7. doi: 10.3389/
neuro.07.007.2009.
Kayser, C., C. I. Petkov, M. Augath, and N. K. Logothetis. 2005. Integration of touch and sound in auditory
cortex. Neuron 48: 373–384.
Kayser, C., C. I. Petkov, M. Augath, and N. K. Logothetis. 2007. Functional imaging reveals visual modulation
of specific fields in auditory cortex. Journal of Neuroscience 27: 1824–1835.
Kayser, C., C. I. Petkov, and N. K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral
Cortex 18: 1560–1574.
Klin, A., W. Jones, R. Schultz, F. Volkmar, and D. Cohen. 2002. Visual fixation patterns during viewing of
naturalistic social situations as predictors of social competence in individuals with autism. Archives of
General Psychiatry 59: 809–816.
Konner, M. 1991. Universals of behavioral development in relation to brain myelination. In Brain maturation
and cognitive development: Comparative and cross-cultural perspectives, ed. K. R. Gibson and A. C.
Petersen, 181–223. New York: Aldine de Gruyter.
Kuhl, P. K., K. A. Williams, and A. N. Meltzoff. 1991. Cross-modal speech perception in adults and infants
using nonspeech auditory stimuli. Journal of Experimental Psychology: Human Perception and Performance
17: 829–840.
Kuraoka, K., and K. Nakamura. 2007. Responses of single neurons in monkey amygdala to facial and vocal
emotions. Journal of Neurophysiology 97: 1379–1387.
Lakatos, P., C.-M. Chen, M. N. O’Connell, A. Mills, and C. E. Schroeder. 2007. Neuronal oscillations and
multisensory interaction in primary auditory cortex. Neuron 53: 279–292.
Lakatos, P., A. S. Shah, K. H. Knuth, I. Ulbert, G. Karmos, and C. E. Schroeder. 2005. An oscillatory hier-
archy controlling neuronal excitability and stimulus processing in the auditory cortex. Journal of
Neurophysiology 94: 1904–1911.
Lansing, C. R., and G. W. McConkie. 2003. Word identification and eye fixation locations in visual and
visual-plus-auditory presentations of spoken sentences. Perception and Psychophysics 65: 536–552.
Lewkowicz, D. J., and A. A. Ghazanfar. 2006. The decline of cross-species intersensory perception in
human infants. Proceedings of the National Academy of Sciences of the United States of America 103:
6771–6774.
Unity of the Senses for Primate Vocal Communication 665
Lewkowicz, D. J., and R. Lickliter. 1994. The development of intersensory perception: Comparative perspec-
tives. Hillsdale, NJ: Lawrence Erlbaum Associates.
Liberman, A. M., and I. G. Mattingly. 1985. The motor theory of speech perception revised. Cognition 21: 1–36.
Maier, J. X., C. Chandrasekaran, and A. A. Ghazanfar. 2008. Integration of bimodal looming signals through
neuronal coherence in the temporal lobe. Current Biology 18: 963–968.
Maier, J. X., J. G. Neuhoff, N. K. Logothetis, and A. A. Ghazanfar. 2004. Multisensory integration of looming
signals by rhesus monkeys. Neuron 43: 177–181.
Malkova, L., E. Heuer, and R. C. Saunders. 2006. Longitudinal magnetic resonance imaging study of rhesus
monkey brain development. European Journal of Neuroscience 24: 3204–3212.
McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature 264: 746–748.
Parenting 6: 179–192.
Nasir, S. M., and D. J. Ostry. 2008. Speech motor learning in profoundly deaf adults. Nature Neuroscience 11:
1217–1222.
Noesselt, T., J. W. Rieger, M. A. Schoenfeld, M. Kanowski, H. Hinrichs, H.-J. Heinze, and J. Driver. 2007.
Audiovisual temporal correspondence modulates human multisensory superior temporal sulcus plus pri-
mary sensory cortices. Journal of Neuroscience 27: 11431–11441.
Oram, M. W., and D. I. Perrett. 1994. Responses of anterior superior temporal polysensory (Stpa) neurons to
biological motion stimuli. Journal of Cognitive Neuroscience 6: 99–116.
Palombit, R. A., D. L. Cheney, and R. M. Seyfarth. 1999. Male grunts as mediators of social interaction with
females in wild chacma baboons (Papio cynocephalus ursinus). Behaviour 136: 221–242.
Parr, L. A. 2004. Perceptual biases for multimodal cues in chimpanzee (Pan troglodytes) affect recognition.
Animal Cognition 7: 171–178.
Patterson, M. L., and J. F. Werker. 2003. Two-month-old infants match phonetic information in lips and voice.
Developmental Science 6: 191–196.
Remedios, R., N. K. Logothetis, and C. Kayser. 2009. An auditory region in the primate insular cortex respond-
ing preferentially to vocal communication sounds. Journal of Neuroscience 29: 1034–1045.
Robinson, D. A., and A. F. Fuchs. 1969. Eye movements evoked by stimulation of frontal eye fields. Journal of
Neurophysiology 32: 637–648.
Romanski, L. M., B. B. Averbeck, and M. Diltz. 2005. Neural representation of vocalizations in the primate
ventrolateral prefrontal cortex. Journal of Neurophysiology 93: 734–747.
Romanski, L. M., J. F. Bates, and P. S. Goldman-Rakic. 1999. Auditory belt and parabelt projections to the
prefrontal cortex in the rhesus monkey. Journal of Comparative Neurology 403: 141–157.
Romanski, L. M., and A. A. Ghazanfar. 2009. The primate frontal and temporal lobes and their role in multi-
sensory vocal communication. In Primate neuroethology, ed. M. L. Platt and A. A. Ghazanfar. Oxford:
Oxford Univ. Press.
Rosenblum, L. D. 2005. Primacy of multimodal speech perception. In Handbook of speech perception, ed.
D. B. Pisoni and R. E. Remez, 51–78. Malden, MA: Blackwell.
Sacher, G. A., and E. F. Staffeldt. 1974. Relation of gestation time to brain weight for placental mammals:
Implications for the theory of vertebrate growth. American Naturalist 108: 593–615.
Sams, M., R. Mottonen, and T. Sihvonen. 2005. Seeing and hearing others and oneself talk. Cognitive Brain
Research 23: 429–435.
Schall, J. D., A. Morel, D. J. King, and J. Bullier. 1995. Topography of visual cortex connections with frontal
eye field in macaque: Convergence and segregation of processing streams. Journal of Neuroscience 15:
4464–4487.
Schroeder, C. E., and J. J. Foxe. 2002. The timing and laminar profile of converging inputs to multisensory
areas of the macaque neocortex. Cognitive Brain Research 14: 187–198.
Schroeder, C. E., P. Lakatos, Y. Kajikawa, S. Partan, and A. Puce. 2008. Neuronal oscillations and visual ampli-
fication of speech. Trends in Cognitive Science 12: 106–113.
Schroeder, C. E., R. W. Lindsley, C. Specht, A. Marcovici, J. F. Smiley, and D. C. Javitt. 2001. Somatosensory input
to auditory association cortex in the macaque monkey. Journal of Neurophysiology 85: 1322–1327.
Schwartz, J.-L., F. Berthommier, and C. Savariaux. 2004. Seeing to hear better: Evidence for early audio-visual
interactions in speech identification. Cognition 93: B69–B78.
Seltzer, B., and D. N. Pandya. 1989. Frontal-lobe connections of the superior temporal sulcus in the rhesus-
monkey. Journal of Comparative Neurology 281: 97–113.
Seltzer, B., and D. N. Pandya. 1994. Parietal, temporal, and occipital projections to cortex of the superior tem-
poral sulcus in the rhesus monkey: A retrograde tracer study. Journal of Comparative Neurology 343:
445–463.
Sinnott, J. M., W. C. Stebbins, and D. B. Moody. 1975. Regulation of voice amplitude by monkey. Journal of
the Acoustical Society of America 58: 412–414.
Smiley, J. F., T. A. Hackett, I. Ulbert, G. Karmos, P. Lakatos, D. C. Javitt, and C. E. Schroeder. 2007. Multisensory
convergence in auditory cortex: I. Cortical connections of the caudal superior temporal plane in macaque
monkeys. Journal of Comparative Neurology 502: 894–923.
Sugihara, T., M. D. Diltz, B. B. Averbeck, and L. M. Romanski. 2006. Integration of auditory and visual
communication information in the primate ventrolateral prefrontal cortex. Journal of Neuroscience 26:
11138–11147.
Tremblay, S., D. M. Shiller, and D. J. Ostry. 2003. Somatosensory basis of speech production. Nature 423:
866–869.
Turkewitz, G., and P. A. Kenny. 1982. Limitations on input as a basis for neural organization and perceptual
development: A preliminary theoretical statement. Developmental Psychobiology 15: 357–368.
Van Wassenhove, V., K. W. Grant, and D. Poeppel. 2005. Visual speech speeds up the neural processing of
auditory speech. Proceedings of the National Academy of Sciences of the United States of America 102:
1181–1186.
Wallace, M. T., and B. E. Stein. 2001. Sensory and multisensory responses in the newborn monkey superior
colliculus. Journal of Neuroscience 21: 8886–8894.
Werner-Reiss, U., K. A. Kelly, A. S. Trause, A. M. Underhill, and J. M. Groh. 2003. Eye position affects activity
in primary auditory cortex of primates. Current Biology 13: 554–562.
Yehia, H., P. Rubin, and E. Vatikiotis-Bateson. 1998. Quantitative association of vocal-tract and facial behavior.
Speech Communication 26: 23–43.
Yehia, H. C., T. Kuratate, and E. Vatikiotis-Bateson. 2002. Linking facial animation, head motion and speech
acoustics. Journal of Phonetics 30: 555–568.
Zangenehpour, S., A. A. Ghazanfar, D. J. Lewkowicz, and R. J. Zatorre. 2008. Heterochrony and cross-species
intersensory matching by infant vervet monkeys. PLoS ONE 4: e4302.
33 Convergence of Auditory, Visual, and Somatosensory Information in Ventral Prefrontal Cortex
Lizabeth M. Romanski
CONTENTS
33.1 Introduction........................................................................................................................... 667
33.2 Anatomical Innervation of Ventral Prefrontal Cortex........................................................... 668
33.2.1 Visual Projections to Ventral Prefrontal Cortex........................................................ 668
33.2.2 Auditory Projections to Prefrontal Cortex.................................................................668
33.2.3 Somatosensory Connections with Prefrontal Cortex................................................ 670
33.3 Physiological Responses in VLPFC Neurons........................................................................ 670
33.3.1 Visual Responses....................................................................................................... 670
33.3.2 Auditory Responses and Function in Prefrontal Cortex............................................ 671
33.3.3 Prefrontal Responses to Vocalizations...................................................................... 672
33.3.4 Somatosensory Responses......................................................................................... 673
33.3.5 Multisensory Responses............................................................................................ 676
33.3.6 Functional Considerations......................................................................................... 678
References....................................................................................................................................... 678
33.1 INTRODUCTION
Our ability to recognize and integrate auditory and visual stimuli is the basis for many cognitive
processes but is especially essential in meaningful communication. Although many brain regions
contribute to recognition and integration of sensory signals, the frontal lobes both receive a mul-
titude of afferents from sensory association areas and have influence over a wide region of the
nervous system to govern behavior. Furthermore, the frontal lobes are special in that they have
been associated with language processes, working memory, planning, and reasoning, which all
depend on the recognition and integration of a vast network of signals. Research has also shown
that somatosensory afferents reach the frontal lobe and that in specific regions single cells encode
somatosensory signals. In this chapter we focus on the ventrolateral prefrontal cortex (VLPFC), also known
in some studies as the inferior convexity, and describe its connectivity with auditory, visual, and
somatosensory cortical areas. This connectivity provides the circuitry underlying prefrontal responses to
these stimuli, which we also describe, drawing on previous research. Finally, we consider the potential
function of combined auditory, visual, and somatosensory inputs with regard to communication and
object recognition.
and Pandya 1988). Later studies confirmed the connection of the posterior STG with areas 46,
dorsal area 8, and the middle STG with rostral–dorsal 46 and 10, area 9, and area 12 (Petrides and
Pandya 1988; Barbas 1992).
Connections of ventrolateral prefrontal areas with auditory association cortex have been consid-
ered by several groups. Cytoarchitectonic analysis of the VLPFC suggested that the region labeled
by Walker as area 12 in the macaque monkey has similar characteristics as that of human area 47,
and was thus renamed in the macaque as area 47/12 by Petrides and Pandya (1988). Analysis of the
connections of areas 45 and 47/12 in the VLPFC has shown that they receive innervation from the
STG, the inferotemporal cortex, and from multisensory regions within the superior temporal sulcus.
Combining physiological recording with anatomical tract tracing, Romanski and colleagues (1999)
analyzed the connections of physiologically defined areas of the belt and parabelt auditory cortex
and determined that the projections to prefrontal cortex are topographically arranged so that rostral
and ventral prefrontal cortex receive projections from the anterior auditory association cortex (areas
AL and anterior parabelt), whereas caudal prefrontal regions are innervated by the posterior audi-
tory cortex (areas CL and caudal parabelt; Figure 33.1). Together with recent auditory physiological
recordings from the lateral belt (Tian et al. 2001) and from the prefrontal cortex (Romanski and
Goldman-Rakic 2002; Romanski et al. 2005), these studies suggest that separate auditory streams
originate in the anterior and posterior auditory cortex and target anterior-ventrolateral object, and
FIGURE 33.1 Innervation of prefrontal cortex by auditory belt and parabelt injections. (a) Projections from
anterior auditory cortex to ventrolateral prefrontal cortex (VLPFC) are shown with black arrows and projec-
tions from caudal auditory cortex to dorsolateral prefrontal cortex (DLPFC) are shown in white. (b) Coronal
sections through the frontal lobe detail anatomical connections. Injections placed into anterior auditory belt
area AL resulted in projections to rostral 46, ventrolateral area 12vl, and lateral orbital cortex area 12o (shown
in black). Projections from caudal auditory cortex area CL and adjacent parabelt targeted caudal dorsal pre-
frontal cortex areas 46, area 8a, and part of area 45 (shown as white cells and fibers). Projections from ML
included some dorsal and ventral targets and are shown in gray. asd, dorsal ramus of arcuate sulcus; asv, ven-
tral ramus of arcuate sulcus; cs, central sulcus; ls, lateral sulcus; sts, superior temporal sulcus.
dorsolateral spatial domains in the frontal lobe, respectively (Romanski 2007), similar to those
of the visual system. Ultimately, this also implies that auditory and visual afferents target similar
regions of dorsolateral and ventrolateral prefrontal cortex (Price 2008). The convergence of audi-
tory and visual ventral stream inputs to the same VLPFC domain implies that they may be inte-
grated and combined to serve a similar function, that of object recognition.
FIGURE 33.2 Lateral brain schematic of visual pathways in the nonhuman primate showing the dorsal–spatial
and ventral–object visual streams that terminate in DLPFC and VLPFC, respectively. Wilson et al. (1993) showed
that neurons in DLPFC (black) respond during perception and memory of visuospatial information, whereas
neurons in VLPFC (gray) respond to object features including color, form, and type of visual stimulus. Later
studies by O’Scalaidhe et al. (1997, 1999) described “face cells” localized to the gray region of VLPFC
in areas 12 and 45.
faces. These VLPFC cells did not respond in the spatial working memory task but did respond in
an object-fixation task and an object-conditional association task. Further electrophysiological and
neuroimaging studies have demonstrated face selectivity in this same area of VLPFC (O’Scalaidhe
et al. 1997, 1999; Tsao et al. 2008), confirming this functional domain separation.
Although these studies were the first to demonstrate an electrophysiological dissociation between
DLPFC and VLPFC, they were not the first to suggest a functional difference and to show the pref-
erence for object as opposed to spatial processing in the ventral prefrontal cortex. An earlier study
by Mishkin and Manning (1978) showed that lesions of the VLPFC in nonhuman primates interfere
with the processing of nonspatial information, including color and form. These ventral prefrontal
lesions had a severe and lasting impairment on the performance of three nonspatial tasks, whereas
lesions of the principal sulcus had only a transient effect (Mishkin and Manning 1978). Just a few
years earlier, Passingham (1975) had also suggested a dissociation between dorsal and ventral PFC.
In that study, rhesus monkeys were trained on a delayed color-matching task and a delayed spatial
alternation task. Lesions of the VLPFC impaired only the delayed color-matching task, whereas lesions
of the DLPFC impaired only the delayed spatial alternation task. These
results, like the Wilson et al. study two decades later, demonstrated a double dissociation of dorsal
and ventral PFC and suggested a role in the processing of object features and recognition for the
VLPFC.
Further analysis of the properties of cells in the VLPFC was done by Joaquin Fuster and col-
leagues. In their electrophysiological analysis of ventral prefrontal neurons, they showed that single
cells are responsive to simple and complex visual stimuli presented at the fovea (Pigarev et al. 1979;
Rosenkilde et al. 1981). The foveal receptive field properties of these cells had first been shown in
studies by Suzuki and Azuma (1977), who examined receptive field properties of neurons across
the expanse of lateral prefrontal cortex. The receptive fields of neurons in DLPFC were found to
lie outside the fovea and to favor the contralateral visual field, whereas neurons below the principal
sulcus in areas 12/47 and 45 were found to be driven best by visual stimuli shown within the fovea
(Suzuki and Azuma 1977). Hoshi et al. (2000) examined the spatial distribution of location-selective
and shape-selective neurons during cue, delay, and response periods, and found more location-
selective neurons in the posterior part of the lateral PFC, whereas more shape-selective neurons
were found in the anterior part, corresponding to area 12/47. Ninokura et al. (2004) found that cells
that responded selectively to the physical properties (color and shape) of objects were localized to
the VLPFC. These various studies fostered the notion that visual neurons in VLPFC were tuned to
nonspatial features including color, shape, and type of object, and had receptive fields representing
areas in and around the fovea.
Finally, studies from Goldman-Rakic and colleagues further demonstrated that neurons in the
VLPFC were not only responsive to object features but that some were highly specialized and
face-selective (Wilson et al. 1993; O’Scalaidhe et al. 1997, 1999). Face-selective neurons were found
in several discrete regions, including an anterior location that appears to be area 12/47 and a posterior,
periarcuate location within area 45; some penetrations into the orbital cortex also yielded face cells.
These single-unit responses were further corroborated with functional magnetic resonance imaging
(fMRI) data by Tsao and colleagues (2008). In their fMRI study, they
showed that three loci within the VLPFC of macaques were selectively activated by faces (Tsao
et al. 2008; Figure 33.3). These three locations correspond roughly to the same anterior, posterior,
and ventral/orbital locations that O’Scalaidhe et al. (1997, 1999) mapped as being face-responsive
in their single-unit recording studies. Demonstration by both methods of visual responsiveness and
face selectivity substantiates the notion that the VLPFC is involved in object and face processing.
FIGURE 33.3 (See color insert.) Activation of macaque prefrontal cortex by faces in the study of Tsao et al.
(2008). Shown here are two coronal sections showing “face patches” in VLPFC (activations are yellow), delineated
with white arrows. (Reprinted from Tsao, D. Y. et al., Nat. Neurosci., 11, 877–879, 2008. With permission.)
neurons. In the human brain, the posterior aspects of Broca’s area are thought to be especially
involved in the phonetic and motor control of speech, whereas more anterior regions have been shown
to be activated during semantic processing, comprehension, and auditory working memory (Zatorre
et al. 1992; Paulesu et al. 1993; Buckner et al. 1995; Demb et al. 1995; Fiez et al. 1996; Stromswold
et al. 1996; Cohen et al. 1997; Gabrieli et al. 1998; Stevens et al. 1998; Price 1998; Posner et al. 1999;
Gelfand and Bookheimer 2003). Examination of prefrontal auditory function in nonhuman primates
has not received as much attention as visual prefrontal function. A few studies have investigated the
effects of large prefrontal lesions on behavioral task performance of auditory discrimination or mne-
monic processing of complex acoustic stimuli. In each of these four studies, relatively large lesions
of the lateral PFC were shown to cause an impairment in an auditory go/no-go task for food reward
(Weiskrantz and Mishkin 1958; Gross and Weiskrantz 1962; Gross 1963; Goldman and Rosvold
1970). This was taken as evidence of the PFC’s involvement in modality-independent processing
especially in tasks requiring inhibitory control (Weiskrantz and Mishkin 1958).
Despite the localization of language function in the human brain to ventral frontal lobe regions and
the demonstration that lesions of lateral PFC in nonhuman primates interfere with auditory discrimination,
single-cell responses to acoustic stimuli have been noted only sporadically in the frontal lobes
of Old and New World monkeys (Benevento et al. 1977; Bodner et al. 1996; Newman and Lindsley
1976; Tanila et al. 1992, 1993; Wollberg and Sela 1980). However, a close look at these studies reveals
that few of the studies sampled neurons in ventrolateral and orbitofrontal regions. Most recordings in
the past have been confined to the dorsolateral surface of the frontal lobe where projections from sec-
ondary and tertiary auditory cortices are sparse. Only one early study recorded from the lateral orbital
region in the macaque cortex and found both auditory and visual responses to simple visual flashes
and to broadband auditory clicks (Benevento et al. 1977). Furthermore, none of the studies tested
neurons systematically with naturalistic and species-relevant acoustic stimuli. Recent approaches to
frontal lobe auditory function have utilized naturalistic stimuli, including species-specific vocaliza-
tions and have extended the area of investigation to orbital and ventral PFC regions.
auditory responsive region in the macaque VLPFC (Romanski and Goldman-Rakic 2002). This
VLPFC region has neurons that respond to complex acoustic stimuli, including species-specific
vocalizations, and lies adjacent to the object- and face-selective region proposed previously
(O’Scalaidhe et al. 1997, 1999; Wilson et al. 1993). Although VLPFC auditory neurons have not
been thoroughly tested for directional selectivity, further examination has suggested that they
encode complex auditory features and thus respond to complex stimuli on the basis of similar acous-
tic features (Romanski et al. 2005; Averbeck and Romanski 2006).
Use of a large library of rhesus macaque vocalizations to test auditory selectivity in prefrontal
neurons has shown that VLPFC neurons are robustly responsive to species-specific vocalizations
(Romanski et al. 2005). A cluster analysis of these vocalization responses did not show a clustering
of responses to vocalizations serving similar functions (e.g., food calls) but demonstrated that
neurons tend to respond to multiple vocalizations with similar acoustic morphology (Romanski
et al. 2005; Figure 33.4). Neuroimaging in rhesus monkeys has revealed a small ventral prefron-
tal locus that was active during presentation of complex acoustic stimuli including vocalizations
(Poremba and Mishkin 2007). Additional electrophysiological recording studies by Cohen and col-
leagues have suggested that prefrontal auditory neurons may also participate in the categorization
of species-specific vocalizations (Gifford et al. 2005). These combined data are consistent with a
role for VLPFC auditory neurons in a ventral auditory processing stream that analyzes the features
of auditory objects including vocalizations.
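The clustering approach described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the call types and firing rates below are invented, and SciPy's agglomerative hierarchical clustering stands in for whichever algorithm the study actually used.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Invented data: mean firing rates (spikes/s) of 3 hypothetical VLPFC
# neurons to 5 call types. The two scream variants are given similar
# response profiles, mimicking their similar acoustic morphology.
call_types = ["coo", "grunt", "gecker", "scream", "copulation scream"]
rates = np.array([
    [12.0, 11.5, 3.0, 30.0, 29.0],   # neuron 1
    [ 5.0,  6.0, 2.5, 18.0, 17.5],   # neuron 2
    [20.0, 19.0, 4.0,  6.0,  5.5],   # neuron 3
])

# Cluster the CALLS by the similarity of the population response they
# evoke; calls that neurons treat alike fall into the same cluster.
z = linkage(rates.T, method="average", metric="euclidean")
labels = fcluster(z, t=2, criterion="maxclust")
for call, lab in zip(call_types, labels):
    print(f"{call}: cluster {lab}")
```

With these made-up rates the two scream calls land in one cluster and the remaining calls in the other, which is the qualitative pattern the study reports: responses cluster by acoustic similarity rather than by call meaning.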
Evidence for object-based auditory processing in the ventral frontal lobe of the human brain is
suggested by neuroimaging studies that have detected activation in the VLPFC not only by speech
stimuli but by nonspeech and music stimuli (Belin et al. 2000; Binder et al. 2000; Scott et al. 2000;
Zatorre et al. 2004) in auditory recognition tasks and voice recognition tasks (Fecteau et al. 2005).
The localization of an auditory object processing stream in the human brain to the very same ven-
tral prefrontal region in a nonhuman primate suggests a functional similarity between this area and
human language-processing regions located in the inferior frontal gyrus (Deacon 1992; Romanski
and Goldman-Rakic 2002).
[Figure 33.4, panels (a)–(c): raster/spike density plots, spectrograms (frequency vs. time), and a cluster dendrogram for ten call types: coo, bark, grunt, girney, gecker, warble, scream, shrill bark, copulation scream, and harmonic arch.]
FIGURE 33.4 A vocalization-responsive cell in VLPFC. (a) Responses to 10 vocalization exemplars are shown in raster/spike density plots. The strongest responses were to submissive scream and copulation scream vocalizations, which are similar in acoustic features, as shown in the spectrograms in panel (b). A cluster analysis (shown in c) of mean firing rates to these calls shows that calls with similar acoustic features tend to evoke similar neuronal responses. (Modified from Romanski, L. M. et al., J. Neurophysiol., 93, 734–747, 2005.)
FIGURE 33.5 Single-neuron spike density functions from six different neurons. Dark bars above each plot
indicate times during which the neuron’s firing rate carried a significant (P < .01) monotonic signal about the
base stimulus. (a, c, e) Positive monotonic neurons. (b, d, f) Negative monotonic neurons. (g) Total number of
recorded neurons (during fixed 3-s delay period runs) carrying a significant signal about the base stimulus, as
a function of time relative to the beginning of the delay period. Individual neurons may participate in more
than one bin. The base stimulus period is shaded gray, and the minimum and maximum numbers of neurons,
during the first, middle, and last seconds of the delay period, respectively, are indicated with arrows.
(Reprinted by permission from Macmillan Publishers Ltd., Romo, R. et al., Nature, 399, 470–473, 1999.)
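A spike density function of the kind plotted in such figures is typically computed by convolving the spike train with a smoothing kernel. A minimal sketch under assumed parameters (the spike times and the 15 ms Gaussian kernel width are illustrative choices, not values from Romo et al.):

```python
import numpy as np

def spike_density(spike_times, t_start, t_stop, dt=0.001, sigma=0.015):
    """Estimate instantaneous firing rate (spikes/s) by summing a
    unit-area Gaussian kernel centered on each spike time (seconds)."""
    t = np.arange(t_start, t_stop, dt)
    rate = np.zeros_like(t)
    for s in spike_times:
        rate += np.exp(-0.5 * ((t - s) / sigma) ** 2)
    return t, rate / (sigma * np.sqrt(2.0 * np.pi))

# Illustrative trial: a brief burst of 5 spikes around 0.5 s
t, r = spike_density([0.48, 0.49, 0.50, 0.51, 0.52], 0.0, 1.0)
print(f"peak rate {r.max():.1f} spikes/s at t = {t[np.argmax(r)]:.3f} s")
```

Averaging such traces across trials, and testing each time bin for a significant stimulus effect, yields displays like the dark significance bars described in the caption above.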
memory activation of human VLPFC areas 47/12 and 45 by Kostopoulos et al. (2007). In their
fMRI study, the authors not only demonstrated activity of the VLPFC during a vibrotactile working
memory task but also showed functional connectivity with the secondary somatosensory cortex,
which was also active in this vibrotactile delayed discrimination task. The area activated, area 47
in the human brain, is analogous to monkey area 12/47, where face and vocalization responses have
been recorded (O’Scalaidhe et al. 1997, 1999; Romanski and Goldman-Rakic 2002; Romanski et al.
2005). The anatomical, electrophysiological, and neuroimaging data suggest that somatosensory
stimuli may converge in similar VLPFC regions where auditory- and visual-responsive neurons are
found and may combine to participate in object recognition.
FIGURE 33.6 Multisensory neuronal responses in prefrontal cortex. Responses of two single units are
shown in (a) and (b) as raster/spike density plots to auditory vocalization alone (Aud), face alone (Vis), and
both presented simultaneously (AV). A bar graph of mean response to these stimuli is shown at right depict-
ing auditory (dark gray), visual (white), and multisensory (light gray) responses. Cell in panel (a) exhibited
multisensory enhancement and cell in panel (b) showed multisensory suppression.
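The enhancement and suppression described in the Figure 33.6 caption can be quantified. This chapter does not give a formula, but a common convention in the multisensory literature expresses the multisensory response as a percent change relative to the strongest unisensory response; the sketch below follows that convention, with hypothetical firing-rate values.

```python
# Hedged sketch: percent multisensory enhancement (+) or suppression (-)
# of the audiovisual (AV) response relative to the best unisensory response.
# This index is a common convention in the field, not taken from this chapter.

def multisensory_index(a_rate: float, v_rate: float, av_rate: float) -> float:
    """Percent change of the AV response relative to the stronger of the
    auditory-alone (A) and visual-alone (V) responses, in spikes/s."""
    best_unisensory = max(a_rate, v_rate)
    if best_unisensory == 0:
        raise ValueError("need a nonzero unisensory response")
    return 100.0 * (av_rate - best_unisensory) / best_unisensory

# Illustrative firing rates (spikes/s); values are hypothetical.
print(multisensory_index(20.0, 12.0, 30.0))   # -> 50.0 (enhancement)
print(multisensory_index(20.0, 12.0, 14.0))   # -> -30.0 (suppression)
```

A positive index corresponds to cells like the one in panel (a), a negative index to cells like the one in panel (b).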
appears to be related most to communication. Although Romo et al. (1999) showed evidence of
somatosensory processing related to touch, the innervation of ventral prefrontal cortex includes
afferents from the face region of SII (Preuss and Goldman-Rakic 1989). This somatosensory face information
is arriving at ventral prefrontal regions that receive information about face identity, features, and
expression from areas TE and TPO (Webster et al. 1994; O’Scalaidhe et al. 1997, 1999), in addition
FIGURE 33.7 Auditory, visual, and somatosensory convergence in VLPFC is shown on a lateral brain sche-
matic of macaque frontal lobe. VLPFC location of vocalization-responsive area (dark gray), visual object- and
face-responsive area (light gray), somatosensory-responsive area (dashed line circle), audiovisual-responsive
cells (black dots) are all depicted on prefrontal cortex of macaque monkey in which they were recorded.
as, arcuate sulcus; ls, lateral sulcus; ps, principal sulcus; sts, superior temporal sulcus. (Data from Sugihara, T.
et al., J. Neurosci., 26, 11138–11147, 2006.)
to auditory inputs that carry information regarding species-specific vocalizations (Romanski et al.
2005).
REFERENCES
Averbeck, B. B., and L. M. Romanski. 2006. Probabilistic encoding of vocalizations in macaque ventral lateral
prefrontal cortex. Journal of Neuroscience 26: 11023–11033.
Barbas, H. 1988. Anatomic organization of basoventral and mediodorsal visual recipient prefrontal regions in
the rhesus monkey. Journal of Comparative Neurology 276: 313–342.
Barbas, H. 1992. Architecture and cortical connections of the prefrontal cortex in the rhesus monkey. Advances
in Neurology 57: 91–115.
Barbas, H., and D. N. Pandya. 1989. Architecture and intrinsic connections of the prefrontal cortex in the rhesus
monkey. Journal of Comparative Neurology 286: 353–375.
Barbas, H., and M. M. Mesulam. 1981. Organization of afferent input to subdivisions of area 8 in the rhesus
monkey. Journal of Comparative Neurology 200: 407–431.
Barbas, H., and M. M. Mesulam. 1985. Cortical afferent input to the principalis region of the rhesus monkey.
Neuroscience 15: 619–637.
Barraclough, N. E., D. Xiao, C. I. Baker, M. W. Oram, and D. I. Perrett. 2005. Integration of visual and auditory
information by superior temporal sulcus neurons responsive to the sight of actions. Journal of Cognitive
Neuroscience 17: 377–391.
Belin, P., R. J. Zatorre, P. Lafaille, P. Ahad, and B. Pike. 2000. Voice-selective areas in human auditory cortex.
Nature 403: 309–312.
Benevento, L. A., J. Fallon, B. J. Davis, and M. Rezak. 1977. Auditory–visual interaction in single cells in the
cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey. Experimental
Neurology 57: 849–872.
Binder, J. R., J. A. Frost, T. A. Hammeke, P. S. Bellgowan, J. A. Springer, J. N. Kaufman, and E. T. Possing.
2000. Human temporal lobe activation by speech and nonspeech sounds. Cerebral Cortex 10: 512–528.
Bodner, M., J. Kroger, and J. M. Fuster. 1996. Auditory memory cells in dorsolateral prefrontal cortex.
Neuroreport 7: 1905–1908.
Buckner, R. L., M. E. Raichle, and S. E. Petersen. 1995. Dissociation of human prefrontal cortical areas across
different speech production tasks and gender groups. Journal of Neurophysiology 74: 2163–2173.
Bullier, J., J. D. Schall, and A. Morel. 1996. Functional streams in occipito-frontal connections in the monkey.
Behavioural Brain Research 76: 89–97.
Carmichael, S. T., and J. L. Price. 1995. Sensory and premotor connections of the orbital and medial prefrontal
cortex of macaque monkeys. Journal of Comparative Neurology 363: 642–664.
Cavada, C., and P. S. Goldman-Rakic. 1989. Posterior parietal cortex in rhesus monkey: II. Evidence for
segregated corticocortical networks linking sensory and limbic areas with the frontal lobe. Journal of
Comparative Neurology 287: 422–445.
Chavis, D. A., and D. N. Pandya. 1976. Further observations on corticofrontal connections in the rhesus mon-
key. Brain Research 117: 369–386.
Cipolloni, P. B., and D. N. Pandya. 1989. Connectional analysis of the ipsilateral and contralateral afferent
neurons of the superior temporal region in the rhesus monkey. Journal of Comparative Neurology 281:
567–585.
Cipolloni, P. B., and D. N. Pandya. 1999. Cortical connections of the frontoparietal opercular areas in the rhesus
monkey. Journal of Comparative Neurology 403: 431–458.
Cohen, J. D., W. M. Perlstein, T. S. Braver, L. E. Nystrom, D. C. Noll, J. Jonides, and E. E. Smith. 1997.
Temporal dynamics of brain activation during a working memory task. Nature 386: 604–608.
Cohen, Y. E., F. Theunissen, B. E. Russ, and P. Gill. 2007. Acoustic features of rhesus vocalizations and their
representation in the ventrolateral prefrontal cortex. Journal of Neurophysiology 97: 1470–1484.
Deacon, T. W. 1992. Cortical connections of the inferior arcuate sulcus cortex in the macaque brain. Brain
Research 573: 8–26.
Demb, J. B., J. E. Desmond, A. D. Wagner, C. J. Vaidya, G. H. Glover, and J. D. Gabrieli. 1995. Semantic
encoding and retrieval in the left inferior prefrontal cortex: A functional MRI study of task difficulty and
process specificity. Journal of Neuroscience 15: 5870–5878.
Ettlinger, G., and J. Wegener. 1958. Somaesthetic alternation, discrimination and orientation after frontal and
parietal lesions in monkeys. The Quarterly Journal of Experimental Psychology 10: 177–186.
Fecteau, S., J. L. Armony, Y. Joanette, and P. Belin. 2005. Sensitivity to voice in human prefrontal cortex.
Journal of Neurophysiology 94: 2251–2254.
Fiez, J. A., E. A. Raife, D. A. Balota, J. P. Schwarz, M. E. Raichle, and S. E. Petersen. 1996. A positron emis-
sion tomography study of the short-term maintenance of verbal information. Journal of Neuroscience
16: 808–822.
Fuster, J. M., M. Bodner, and J. K. Kroger. 2000. Cross-modal and cross-temporal association in neurons of
frontal cortex. Nature 405: 347–351.
Gabrieli, J. D. E., R. A. Poldrack, and J. E. Desmond. 1998. The role of left prefrontal cortex in language and
memory. Proceedings of the National Academy of Sciences of the United States of America 95: 906–913.
Gaffan, D., and S. Harrison. 1991. Auditory–visual associations, hemispheric specialization and temporal-
frontal interaction in the rhesus monkey. Brain 114: 2133–2144.
Galaburda, A. M., and D. N. Pandya. 1983. The intrinsic architectonic and connectional organization of the
superior temporal region of the rhesus monkey. Journal of Comparative Neurology 221: 169–184.
Gelfand, J. R., and S. Y. Bookheimer. 2003. Dissociating neural mechanisms of temporal sequencing and pro-
cessing phonemes. Neuron 38: 831–842.
Ghazanfar, A. A., C. Chandrasekaran, and N. K. Logothetis. 2008. Interactions between the superior tempo-
ral sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys. Journal of
Neuroscience 28: 4457–4469.
Gilbert, A. M., and J. A. Fiez. 2004. Integrating rewards and cognition in the frontal cortex. Cognitive, Affective
and Behavioral Neuroscience 4: 540–552.
Goldman, P. S., and H. E. Rosvold. 1970. Localization of function within the dorsolateral prefrontal cortex of
the rhesus monkey. Experimental Neurology 27: 291–304.
Gross, C. G. 1963. A comparison of the effects of partial and total lateral frontal lesions on test performance by
monkeys. Journal of Comparative and Physiological Psychology 56: 41–47.
Gross, C. G., and L. Weiskrantz. 1962. Evidence for dissociation of impairment on auditory discrimination
and delayed response following lateral frontal lesions in monkeys. Experimental Neurology 5: 453–476.
Hagen, M. C., D. H. Zald, T. A. Thornton, and J. V. Pardo. 2002. Somatosensory processing in the human infe-
rior prefrontal cortex. Journal of Neurophysiology 88: 1400–1406.
Hickok, G., B. Buchsbaum, C. Humphries, and T. Muftuler. 2003. Auditory–motor interaction revealed by fMRI:
Speech, music, and working memory in area Spt. Journal of Cognitive Neuroscience 15: 673–682.
Homae, F., R. Hashimoto, K. Nakajima, Y. Miyashita, and K. L. Sakai. 2002. From perception to sentence com-
prehension: The convergence of auditory and visual information of language in the left inferior frontal
cortex. NeuroImage 16: 883–900.
Hoshi, E., K. Shima, and J. Tanji. 2000. Neuronal activity in the primate prefrontal cortex in the process of
motor selection based on two behavioral rules. Journal of Neurophysiology 83: 2355–2373.
Gifford III, G. W., K. A. Maclean, M. D. Hauser, and Y. E. Cohen. 2005. The neurophysiology of functionally
meaningful categories: Macaque ventrolateral prefrontal cortex plays a critical role in spontaneous cat-
egorization of species-specific vocalizations. Journal of Cognitive Neuroscience 17: 1471–1482.
Jones, E. G., and T. P. Powell. 1970. An anatomical study of converging sensory pathways within the cerebral
cortex of the monkey. Brain 93: 793–820.
Jones, J. A., and D. E. Callan. 2003. Brain activity during audiovisual speech perception: An fMRI study of the
McGurk effect. Neuroreport 14: 1129–1133.
Kostopoulos, P., M. C. Albanese, and M. Petrides. 2007. Ventrolateral prefrontal cortex and tactile memory dis-
ambiguation in the human brain. Proceedings of the National Academy of Sciences of the United States
of America 104: 10223–10228.
Miquee, A., C. Xerri, C. Rainville, J. L. Anton, B. Nazarian, M. Roth, and Y. Zennou-Azogui. 2008. Neuronal
substrates of haptic shape encoding and matching: A functional magnetic resonance imaging study.
Neuroscience 152: 29–39.
Mishkin, M., and F. J. Manning. 1978. Non-spatial memory after selective prefrontal lesions in monkeys. Brain
Research 143: 313–323.
Newman, J. D., and D. F. Lindsley. 1976. Single unit analysis of auditory processing in squirrel monkey frontal
cortex. Experimental Brain Research 25: 169–181.
Ninokura, Y., H. Mushiake, and J. Tanji. 2004. Integration of temporal order and object information in the
monkey lateral prefrontal cortex. Journal of Neurophysiology 91: 555–560.
O’Scalaidhe, S. P., F. A. W. Wilson, and P. S. Goldman-Rakic. 1999. Face-selective neurons during passive
viewing and working memory performance of rhesus monkeys: Evidence for intrinsic specialization
of neuronal coding. Cerebral Cortex 9: 459–475.
O’Scalaidhe, S. P., F. A. Wilson, and P. S. Goldman-Rakic. 1997. Areal segregation of face-processing neurons
in prefrontal cortex. Science 278: 1135–1138.
Pandya, D. N., and F. Sanides. 1973. Architectonic parcellation of the temporal operculum in rhesus monkey
and its projection pattern. Zeitschrift fuer Anatomie und Entwicklungsgeschichte 139: 127–161.
Pandya, D. N., and H. G. Kuypers. 1969. Cortico-cortical connections in the rhesus monkey. Brain Research
13: 13–36.
Pandya, D. N., M. Hallett, and S. K. Mukherjee. 1969. Intra- and interhemispheric connections of the neocor-
tical auditory system in the rhesus monkey. Brain Research 14: 49–65.
Papoutsi, M., J. A. de Zwart, J. M. Jansma, M. J. Pickering, J. A. Bednar, and B. Horwitz. 2009. From pho-
nemes to articulatory codes: An fMRI study of the role of Broca’s area in speech production. Cerebral
Cortex 19: 2156–2165.
Passingham, R. 1975. Delayed matching after selective prefrontal lesions in monkeys (Macaca mulatta). Brain
Research 92: 89–102.
Paulesu, E., C. D. Frith, and R. S. J. Frackowiak. 1993. The neural correlates of the verbal component of work-
ing memory. Nature 362: 342–345.
Petrides, M., and D. N. Pandya. 1988. Association fiber pathways to the frontal cortex from the superior tem-
poral region in the rhesus monkey. Journal of Comparative Neurology 273: 52–66.
Pigarev, I. N., G. Rizzolatti, and C. Schandolara. 1979. Neurons responding to visual stimuli in the frontal lobe
of macaque monkeys. Neuroscience Letters 12: 207–212.
Poremba, A., and M. Mishkin. 2007. Exploring the extent and function of higher-order auditory cortex in rhesus
monkeys. Hearing Research 229: 14–23.
Posner, M. I., Y. G. Abdullaev, B. D. McCandliss, and S. C. Sereno. 1999. Neuroanatomy, circuitry and plastic-
ity of word reading. Neuroreport 10: R12–R23.
Preuss, T. M., and P. S. Goldman-Rakic. 1989. Connections of the ventral granular frontal cortex of macaques
with perisylvian premotor and somatosensory areas: Anatomical evidence for somatic representation in
primate frontal association cortex. Journal of Comparative Neurology 282: 293–316.
Price, C. J. 1998. The functional anatomy of word comprehension and production. Trends in Cognitive Sciences
2: 281–288.
Price, J. L. 2008. Multisensory convergence in the orbital and ventrolateral prefrontal cortex. Chemosensory
Perception 1: 103–109.
Rao, S. C., G. Rainer, and E. K. Miller. 1997. Integration of what and where in the primate prefrontal cortex.
Science 276: 821–824.
Rizzolatti, G., and L. Craighero. 2004. The mirror-neuron system. Annual Review of Neuroscience 27: 169–192.
Romanski, L. M. 2007. Representation and integration of auditory and visual stimuli in the primate ventral
lateral prefrontal cortex. Cerebral Cortex 17 S1: i61–i69.
Romanski, L. M., and P. S. Goldman-Rakic. 2002. An auditory domain in primate prefrontal cortex. Nature
Neuroscience 5: 15–16.
Romanski, L. M., B. B. Averbeck, and M. Diltz. 2005. Neural representation of vocalizations in the primate
ventrolateral prefrontal cortex. Journal of Neurophysiology 93: 734–747.
Romanski, L. M., B. Tian, J. Fritz, M. Mishkin, P. S. Goldman-Rakic, and J. P. Rauschecker. 1999b. Dual streams
of auditory afferents target multiple domains in the primate prefrontal cortex. Nature Neuroscience 2:
1131–1136.
Romanski, L. M., J. F. Bates, and P. S. Goldman-Rakic. 1999a. Auditory belt and parabelt projections to the
prefrontal cortex in the rhesus monkey. Journal of Comparative Neurology 403: 141–157.
Romo, R., C. D. Brody, A. Hernandez, and L. Lemus. 1999. Neuronal correlates of parametric working mem-
ory in the prefrontal cortex. Nature 399: 470–473.
Rosenkilde, C. E., R. H. Bauer, and J. M. Fuster. 1981. Single cell activity in ventral prefrontal cortex of behav-
ing monkeys. Brain Research 209: 375–394.
Schall, J. D., A. Morel, D. J. King, and J. Bullier. 1995. Topography of visual cortex connections with frontal eye field
in macaque: Convergence and segregation of processing streams. Journal of Neuroscience 15: 4464–4487.
Scott, S. K., C. C. Blank, S. Rosen, and R. J. Wise. 2000. Identification of a pathway for intelligible speech in
the left temporal lobe. Brain 123: 2400–2406.
Stevens, A. A., P. S. Goldman-Rakic, J. C. Gore, R. K. Fulbright, and B. E. Wexler. 1998. Cortical dysfunction
in schizophrenia during auditory word and tone working memory demonstrated by functional magnetic
resonance imaging. Archives of General Psychiatry 55: 1097–1103.
Stilla, R., and K. Sathian. 2008. Selective visuo-haptic processing of shape and texture. Human Brain Mapping
29: 1123–1138.
Stromswold, K., D. Caplan, N. Alpert, and S. Rauch. 1996. Localization of syntactic comprehension by posi-
tron emission tomography. Brain and Language 52: 452–473.
Sugihara, T., M. D. Diltz, B. B. Averbeck, and L. M. Romanski. 2006. Integration of auditory and visual
communication information in the primate ventrolateral prefrontal cortex. Journal of Neuroscience 26:
11138–11147.
Suzuki, H., and M. Azuma. 1977. Prefrontal neuronal activity during gazing at a light spot in the monkey. Brain
Research 126: 497–508.
Tanila, H., S. Carlson, I. Linnankoski, and H. Kahila. 1993. Regional distribution of functions in dorsolateral
prefrontal cortex of the monkey. Behavioural Brain Research 53: 63–71.
Tanila, H., S. Carlson, I. Linnankoski, F. Lindroos, and H. Kahila. 1992. Functional properties of dorsolateral
prefrontal cortical neurons in awake monkey. Behavioural Brain Research 47: 169–180.
Tian, B., D. Reser, A. Durham, A. Kustov, and J. P. Rauschecker. 2001. Functional specialization in rhesus
monkey auditory cortex. Science 292: 290–293.
Tsao, D. Y., N. Schweers, S. Moeller, and W. A. Freiwald. 2008. Patches of face-selective cortex in the macaque
frontal lobe. Nature Neuroscience 11: 877–879.
Ungerleider, L. G., D. Gaffan, and V. S. Pelak. 1989. Projections from inferior temporal cortex to prefrontal
cortex via the uncinate fascicle in rhesus monkeys. Experimental Brain Research 76: 473–484.
Webster, M. J., J. Bachevalier, and L. G. Ungerleider. 1994. Connections of inferior temporal areas TEO and
TE with parietal and frontal cortex in macaque monkeys. Cerebral Cortex 4: 470–483.
Weiskrantz, L., and M. Mishkin. 1958. Effects of temporal and frontal cortical lesions on auditory discrimina-
tion in monkeys. Brain 80: 406–414.
Wilson, F. A., S. P. O’Scalaidhe, and P. S. Goldman-Rakic. 1993. Dissociation of object and spatial processing
domains in primate prefrontal cortex. Science 260: 1955–1958.
Wollberg, Z., and J. Sela. 1980. Frontal cortex of the awake squirrel monkey: Responses of single cells to visual
and auditory stimuli. Brain Research 198: 216–220.
Xu, J., P. J. Gannon, K. Emmorey, J. F. Smith, and A. R. Braun. 2009. Symbolic gestures and spoken language
are processed by a common neural system. Proceedings of the National Academy of Sciences of the
United States of America 106: 20664–20669.
Zatorre, R. J., A. C. Evans, E. Meyer, and A. Gjedde. 1992. Lateralization of phonetic and pitch discrimination
in speech processing. Science 256: 846–849.
Zatorre, R. J., M. Bouffard, and P. Belin. 2004. Sensitivity to auditory object features in human temporal neo-
cortex. The Journal of Neuroscience 24: 3637–3642.
34 A Multisensory Perspective
on Human Auditory
Communication
Katharina von Kriegstein
CONTENTS
34.1 Introduction........................................................................................................................... 683
34.2 The Auditory Perspective on Auditory Communication....................................................... 684
34.3 The Visual Perspective on Visual Communication............................................................... 685
34.4 The Multisensory Perspective on Auditory Communication................................................ 686
34.4.1 Improving Unisensory Recognition by Multisensory Learning................................ 687
34.4.1.1 Face Benefit: Auditory Recognition Is Improved after Voice–Face
Learning...................................................................................................... 687
34.4.1.2 Is the Face Benefit Caused by Greater Attention during Voice–Face
Learning?.................................................................................................... 688
34.4.1.3 Importance of a Common Cause for Rapid Learning Effects.................... 689
34.4.2 Auditory–Visual Model for Human Auditory Communication................................690
34.4.2.1 Visual Face Areas Are Behaviorally Relevant for Auditory
Recognition.................................................................................................690
34.4.3 A Multisensory Predictive Coding Framework for Auditory Communication.......... 693
34.5 Conclusions and Future Directions........................................................................................ 694
References....................................................................................................................................... 695
34.1 INTRODUCTION
We spend a large amount of our time communicating with other people. Much of this communica-
tion occurs face to face, where the availability of sensory input from several modalities (e.g., audi-
tory, visual, tactile, olfactory) ensures a robust perception of information (e.g., Sumby and Pollack
1954; Gick and Derrick 2009). Robustness, in this case, means that the perception of a communi-
cation signal is veridical even when parts of the signal are noisy or occluded (Ay et al. 2007). For
example, if the auditory speech signal is noisy, then the concurrent availability of visual speech sig-
nals (e.g., lip movements and gestures) improves the perception of the speech information (Sumby
and Pollack 1954; Ross et al. 2007). The robustness of face-to-face communication pertains not only
to speech recognition (Sumby and Pollack 1954; Ross et al. 2007), but also to other information
relevant for successful human interaction, for example, recognition of gender (Smith et al.
2007), emotion (de Gelder and Vroomen 1995; Massaro and Egan 1996), or identity (Schweinberger
et al. 2007).
Nevertheless, in daily life there are often situations in which only a single modality is avail-
able, for example, when talking on the phone, listening to the radio, or when seeing another person
from a distance. Current models assume that perception in these unimodal tasks is based on and
constrained to the unimodal sensory system. For example, in this view, solely the auditory system
is involved in the initial sensory analysis of the auditory speech signal during a telephone conver-
sation (see, e.g., Belin et al. 2004; Scott 2005; Hickok and Poeppel 2007). Similarly, it is assumed
that solely the visual system is involved in the initial sensory analysis of faces (Bruce and Young
1986; Haxby et al. 2000). In this chapter, I will review evidence that these models might need to be
extended; perception in human communication may always involve multisensory processing even
when our brains are processing only unimodal input (see, e.g., Hall et al. 2005; Pitcher et al. 2008;
von Kriegstein et al. 2008b). This involvement of multisensory processing might contribute to the
robustness of perception. I will start with a brief overview on mechanisms and models for auditory
speech and visual face processing from a modality-specific perspective. This will be followed by a
summary and discussion of recent behavioral and functional neuroimaging experiments in human
auditory communication that challenge the modality-specific view. They show that an interaction
between auditory and visual sensory processing can increase robustness and performance in
auditory-only communication. I conclude with a view of how these findings can be explained by a
model that unifies unimodal and multimodal recognition.
2000; von Kriegstein et al. 2003, 2008a). In contrast, right temporal lobe regions [temporal lobe
voice areas (TVA)] are more involved in extracting the nonlinguistic voice properties of the speech
signal such as speaker identity (Belin et al. 2000, 2002; von Kriegstein et al. 2003; von Kriegstein
and Giraud 2004). This left–right dichotomy is also supported by lesion studies that typically find
speech processing deficits after left-hemispheric lesions. In contrast, acquired phonagnosia, that is,
a deficiency in recognizing identity by voice, has been reported with right parietal and temporal
lobe damage (Van Lancker and Canter 1982; Van Lancker et al. 1989; Neuner and Schweinberger
2000; Lang et al. 2009). Whether the left–right dichotomy is only relative is still a matter of debate
(Hickok and Poeppel 2000). For example, although speech recognition can be impaired after left
hemispheric lesions (Boatman et al. 1995), it can also be impaired after right hemispheric lesions
in adverse listening conditions (Boatman et al. 2006). The functional view of hemispheric special-
ization might boil down to a specialization of different regions for different time windows in the
speech input. There is evidence that the right hemisphere samples over longer time windows than
the left hemisphere (Poeppel 2003; Boemio et al. 2005; Giraud et al. 2007; Abrams et al. 2008;
Overath et al. 2008). This implies that the relative specialization of the left hemisphere for speech
processing is a result of the highly variable nature of the acoustic input required for speech recog-
nition. In contrast, the relative specialization of the right hemisphere for speaker processing might
be a result of the relatively constant nature of speaker parameters, which also enable us to identify
others by voice (Lavner et al. 2000; Sheffert et al. 2002).
In addition to temporal lobe areas, there is evidence that motor regions (i.e., primary motor and
premotor cortex) play a role in the sensory analysis of speech sounds at the level of phonemes and syl-
lables (Liberman and Mattingly 1985; Watkins et al. 2003; D’Ausilio et al. 2009); however, whether
this involvement reflects a necessary sensory mechanism or other mechanisms required for spoken
language comprehension is still debated (Hickok and Poeppel 2007; Scott et al. 2009).
At a higher level, one of the overarching goals of the sensory analysis of speech signals is to
understand spoken language or to recognize who is talking. The former involves a range of pro-
cessing steps from connecting speech sounds to words and sentences, to grammatical rules and
semantic processing. These processing steps involve an extended network of brain areas (see, e.g.,
Vigneau et al. 2006; Price 2000; Marslen-Wilson and Tyler 2007). For example, prefrontal areas
(BA 44/45) have been implicated in relatively complex language tasks such as syntax or work-
ing memory (Friederici 2002; Hickok and Poeppel 2007). Furthermore, semantic analysis might
involve several temporal lobe areas as well as an associative system comprising many widely dis-
tributed brain regions (Martin and Caramazza 2003; Barsalou 2008). One example of such semantic
analysis is the involvement of specific areas in the motor cortex for action words (Pulvermuller
et al. 2006; Hauk et al. 2008).
Moreover, the recognition of who is talking involves processing steps beyond sensory analysis
of speaker characteristics and voice identification, for example, associating a specific face or name
with the voice. This is thought to involve several extra-auditory areas, for example, supramodal
areas coding for person identity or visual areas that are involved in face identity processing (Ellis
et al. 1997; Gainotti et al. 2003; Tsukiura et al. 2006; von Kriegstein and Giraud 2006; Campanella
and Belin 2007).
As in the auditory modality, face processing is assumed to be separated into processing of
variable aspects of the face (e.g., expression, speech-related orofacial movements) and processing
of the more invariant aspects of the face (i.e., face identity) (Bruce and Young 1986; Burton et al.
1999; Haxby et al. 2000). This distinction was initially based on behavioral studies showing that
face movement or expression processing can be separately impaired from face-identity processing
(Bruce and Young 1986; Young et al. 1993). For example, patient LM cannot speech-read from
moving faces but has intact face recognition (Campbell et al. 1997). In contrast, prosopagnosics,
that is, people with a deficiency in recognizing identity from the face, are thought to be unim-
paired in the recognition of dynamic aspects of the human face (Humphreys et al. 1993; Lander et
al. 2004; Duchaine et al. 2003).
The prevalent model of face processing assumes that aspects relevant for identity recognition
are processed in the fusiform face area (FFA) in the ventrotemporal cortex (Sergent et al. 1992;
Kanwisher et al. 1997; Haxby et al. 2000; Bouvier and Engel 2006). Recognition of face expres-
sion and face movement involves the mid/posterior STS (Puce et al. 1998; Pelphrey et al. 2005;
Thompson et al. 2007). However, not all studies support two entirely separate routes for
processing face identity and face dynamics, and the extent of specialization of the two areas for
dynamic versus invariant aspects of faces is still under debate (O’Toole et al. 2002; Calder and
Young 2005; Thompson et al. 2005; Fox et al. 2009).
Visual and visual association cortices have been described as the “core system” of face percep-
tion. In contrast, the “extended system” of face perception involves several nonvisual brain regions—
for example, the amygdala and anterior temporal lobe for processing social significance, emotion,
and person identity (Baron-Cohen et al. 2000; Haxby et al. 2000; Neuner and Schweinberger 2000;
Haxby et al. 2002; Kleinhans et al. 2009). Furthermore, in the model developed by Haxby and
colleagues, the extended system also comprises auditory cortices that are activated in response to
lipreading from faces (Calvert et al. 1997; Haxby et al. 2000).
receive any direct sensory (i.e., visual) input? Recent research suggests that it does. For example,
activation of visual areas has been shown to improve recognition of speech information in auditory-
only situations, such as when talking on the phone (von Kriegstein and Giraud 2006; von Kriegstein
et al. 2006; von Kriegstein et al. 2008b). These findings show that (after a brief period of audiovisual
learning), activation of visual association cortices (i.e., the FFA and the face-movement sensitive
STS) is correlated with behavioral benefits for auditory-only recognition. Such findings are at odds
with the above-described unisensory perspective on auditory-only communication, because they
imply that not only auditory sensory but also visual sensory areas are instrumental for auditory-
only tasks. In the following, I will review these behavioral and neuroimaging findings in detail and
discuss the implications for models of human auditory-only perception in human communication.
FIGURE 34.1 Example of the experimental design. Subjects were first trained on voices and names of six
different speakers. For three of these speakers, training was done with a voice–face video of the speaking
person (voice–face training). For the other three speakers, training was done with the voice and a symbol for
the speaker’s occupation. In the subsequent test session, subjects performed a speech or speaker recognition
task on blocks of sentences spoken by previously trained speakers. Results show mean % correct recognition
over subjects. Face benefit is calculated as difference in performance after voice–face vs. voice–occupation
training. (Adapted from von Kriegstein, K. et al., Proc. Natl. Acad. Sci. U.S.A. 105, 6747–6752, 2008b.)
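The face-benefit computation described in the caption, the difference in performance after voice–face versus voice–occupation training, can be sketched as follows. The function name and the example values are illustrative, not taken from the original study; the approximate benefit magnitudes (ca. 5 percentage points for speaker recognition, ca. 2 for speech recognition) are those reported in the text for von Kriegstein et al. (2008b).

```python
# Minimal sketch of the face-benefit measure from the Figure 34.1 caption:
# difference in % correct recognition after voice-face training versus
# voice-occupation (control) training. Names and inputs are illustrative.

def face_benefit(pct_correct_face_trained: float,
                 pct_correct_control_trained: float) -> float:
    """Face benefit in percentage points (positive = faces helped)."""
    return pct_correct_face_trained - pct_correct_control_trained

# Illustrative values matching the approximate benefit sizes in the text:
print(face_benefit(82.0, 77.0))  # -> 5.0 (ca. speaker-recognition benefit)
print(face_benefit(94.0, 92.0))  # -> 2.0 (ca. speech-recognition benefit)
```

A negative value would indicate the interference effect described for very brief (one-sentence) exposures.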
with phonemes spoken by a different set of speakers. In contrast, the speaker-specific face benefits
have been shown to develop very quickly. For example, Sheffert and Olson (2004) trained their subjects
with ca. 50 words from each of five speakers. Further studies showed that less than 2 min of training
per speaker already resulted in a significant face benefit [i.e., 9% for speaker recognition in the study
of von Kriegstein and Giraud 2006, and ca. 5% (speaker)/2%(speech) in the report of von Kriegstein
et al. 2008b]. Note, however, that the brief exposure times required for speaker-specific face benefits
seem to have their lower limits. Speaker recognition ability has been investigated after presentation
of only one sentence (mean duration 15 syllables/ca. 900 ms) and after three repetitions of a sen-
tence (45 syllables/ca. 2.7 s) (Cook and Wilding 1997, 2001). For the one-sentence condition, voice
recognition was actually worse after voice–face exposure (in contrast to voice-only exposure). For
the three-sentence condition, voice recognition was the same after voice–face exposure (in contrast
to voice-only exposure). Thus, the beneficial effect of voice–face training for voice recognition in
auditory-only conditions seems to occur somewhere between 3 s and 2 min of training (Cook and
Wilding 1997, 2001; von Kriegstein and Giraud 2006; von Kriegstein et al. 2008b).
34.4.1.2 Is the Face Benefit Caused by Greater Attention during Voice–Face Learning?
One simple explanation for the face benefits could be that seeing people talking is much more excit-
ing and attention-grabbing than just listening to the audio track, even if it is additionally accom-
panied by a visual symbol (Figure 34.1). This increase in attention during training with videos
A Multisensory Perspective on Human Auditory Communication 689
may lead to better performance during test in auditory-only conditions. However, there is strong
evidence against this possibility. First, in the Sheffert and Olson (2004) study, subjects additionally
performed an old/new recognition test on words spoken by the familiarized speakers or nonfamiliar
speakers. If subjects paid more attention during the voice–face training (in contrast to the voice-
only training), they should remember words from the voice–face training condition better (than
those from the voice-only training). However, there was no such difference in word memory for the
two training conditions. Second, in the von Kriegstein and Giraud (2006) study, subjects were addi-
tionally trained to recognize ringtones of cell phones. In one condition, subjects were trained with
videos of hands operating cell phones. In the control condition, subjects were trained with the brand
names of cell phones. Subsequently, ringtone recognition was tested in an auditory-only condition.
If training with videos was more attention-grabbing, then one would expect better recognition of
ringtones after training with videos in contrast to after training with brand names. However, there
was no such benefit for ringtone recognition. Third, probably the most compelling argument against
an attentional effect is that the face benefits for speech and speaker recognition are behaviorally
dissociable. This dissociability was shown in a study on developmental prosopagnosic subjects and
controls (von Kriegstein et al. 2008b). Developmental prosopagnosia is a lifelong inability to rec-
ognize other people by their face (McConachie 1976; Behrmann and Avidan 2005; Duchaine and
Nakayama 2005; Gruter et al. 2007). The perception of facial dynamics has been shown to be unim-
paired (Lander et al. 2004). In our study (von Kriegstein et al. 2008b), we trained prosopagnosics
and control subjects to associate six speakers’ voices with their names (see Figure 34.1). Training
was done in two conditions. In one condition (voice–face), subjects learned via previously recorded
auditory–visual videos of the speaker’s voice and face. In the control condition (voice–symbol),
subjects learned by listening to the auditory track of the same videos and seeing a visual symbol for
the occupation of the person. After training, all subjects were tested on two tasks in auditory-only
conditions. In one task, subjects recognized the speakers by voice (speaker recognition), in the other
task subjects recognized what was said (speech recognition). If the improvement in auditory-only
conditions by prior voice–face training (i.e., the face benefit) depends on attention, one would expect
that both groups have similar face benefits on the two tasks. This was not the case. Although pros-
opagnosics had a normal face benefit for speech recognition (as compared to controls), they had
no face benefit for speaker recognition (in contrast to controls). This means that the face
benefit in speech recognition can be normal, whereas the face benefit in speaker recognition can
be selectively impaired. It suggests that the face benefits in speech and speaker recognition rely on
two distinct and specific mechanisms instead of one common attentional mechanism. I will explain
what these mechanisms might be in terms of brain processes in Section 34.4.2.
by a specific vocal tract movement. This common cause results in similar and tightly correlated
dynamics in the visual and auditory modality (Chandrasekaran et al. 2009). Not only movement
but also shape and other material properties are expressed in both the visual and auditory modalities
(Lakatos et al. 1997; Smith et al. 2005). For example, voices give information about the physical
characteristics of the speaker, such as body size, because the length of the vocal tract influences the
timbre of the voice and is correlated with body size (Smith et al. 2005). In contrast, other auditory–
visual events can be arbitrarily related. Ringtones and cell phones, for example, relate to a unique
ecologically valid multimodal source, but their association is arbitrary. The visual appearance of the
cell phone does not physically cause the characteristics of the ringtone and vice versa. We assume
that the rapid acquisition of face benefits and multisensory learning benefits occurs if the brain can
exploit already existing knowledge about the relationship between auditory and visual modalities
(von Kriegstein and Giraud 2006; von Kriegstein et al. 2008b). This would explain why there are
rapid learning benefits when auditory and visual information is tightly correlated, whereas there are
no such rapid learning benefits when they are arbitrarily related (Seitz et al. 2006; von Kriegstein
and Giraud 2006; Kim et al. 2008).
34.4.2.1 Visual Face Areas Are Behaviorally Relevant for Auditory Recognition
Recent neuroimaging studies show that face-sensitive areas (STS and FFA) are involved in the rec-
ognition of auditory communication signals (von Kriegstein et al. 2005; von Kriegstein and Giraud
2006; von Kriegstein et al. 2006, 2008b). They suggest that the FFA is behaviorally relevant for
auditory-only speaker recognition, and that the face-movement sensitive STS is behaviorally rel-
evant for auditory-only speech recognition.
FIGURE 34.2 Audiovisual model for human communication. Schematic for processing of human commu-
nication signals during speech and speaker recognition. (a) Audiovisual input enters auditory and visual pre-
processing areas. These feed into two distinct networks, which process speech and speaker information. This
panel schematically depicts potential mechanism during voice–face training (see Figure 34.1) as well as areas
potentially involved in this process. (b) Auditory-only input enters auditory preprocessing areas. For speech
recognition, facial and vocal speech areas interact while engaging concurrently with higher levels of speech
processing. Similarly, for speaker recognition, face and voice identity areas interact while engaging concur-
rently with higher levels of speaker identity processing. This panel schematically depicts a potential mechanism
during auditory testing after voice–face training (see Figure 34.1) as well as areas potentially involved in this
process. Note that interactions between boxes do not imply direct anatomical connections and that boxes may
represent more than one area, in particular for higher levels of speech and speaker recognition.
tasks is increased after prior voice–face experience and is task-specific. Activation of the FFA is
higher when subjects are asked to recognize the speaker than when they recognize what is said, even
if the stimulus input for the two tasks is exactly the same. Figure 34.3 shows an example of FFA
activity during speaker recognition before and after voice–face and voice–name learning. Note that
in contrast to the increased FFA activation after voice–face learning, the auditory voice region in the
right temporal lobe (TVA) shows similar activation increase for the two training conditions (Figure
34.3). This can be taken as evidence against the view that face benefits are explained simply by an
increased effectiveness of auditory-only processing.
Not only does the level of activation change after brief voice–face learning, but also the func-
tional connectivity of the FFA to other brain areas. When subjects recognize previously heard
voices of nonfamiliar people, the FFA is functionally connected to a frontoparietal network (von
Kriegstein and Giraud 2006). This pattern is similar to the connectivity pattern of the FFA, when
subjects are instructed to vividly imagine faces without any meaningful sensory input besides the
task instructions (Ishai et al. 2002; Mechelli et al. 2004). The connectivity changes dramatically
after brief voice–face training. After training, the functional connectivity of the FFA to the fronto-
parietal network is decreased. In contrast, connectivity between FFA and auditory voice-sensitive
areas (TVA) increases (von Kriegstein and Giraud 2006). A similar pattern of connectivity between
FFA and TVA can also be found during recognition of personally familiar speakers’ voices (von
Kriegstein et al. 2005). The change in connectivity suggests that the FFA activation after voice–face
training results from a different mechanism than before training or during task-instructed imagery.
The more direct connectivity between FFA and TVA after voice–face learning is compatible with
the hypothesis that auditory and visual areas interact already at stages of sensory analysis as sug-
gested by the audiovisual model (von Kriegstein and Giraud 2006).
FIGURE 34.3 Blood oxygen level dependent (BOLD) responses in voice- (left panel) and face-sensitive
(right panel) areas before and after different types of audiovisual training. In this study, control training
involved learning of voice–name associations (instead of voice–occupation symbol association displayed in
Figure 34.1). Note that the increase in activation in auditory voice areas (TVA) is similar for both training condi-
tions. In contrast, responses in the fusiform face area increase only after voice–face training, not after voice–
name training. Signal change here refers to a contrast between speaker recognition and ringtone recognition
(for details, see von Kriegstein and Giraud 2006).
[Figure 34.4 shows two brain sections, at y = –45 and y = –51.]
FIGURE 34.4 (See color insert.) Face-sensitive left STS (blue) is located in regions of STS that are distinct
from those that are responsive to auditory speech (green). Positive correlation of activity in STS with face
benefit in speech task (red) overlaps with the face area (overlap in purple) but not with the auditory area (green)
(for more details on specific contrasts used, see von Kriegstein et al. 2008b). y, MNI coordinate in anterior–
posterior direction.
recognition with the amount of FFA activation. The behavioral and neuroanatomical dissociation
is in accord with the audiovisual model (Figure 34.2). Speech and speaker recognition largely rest
on two different sets of audiovisual correlations. Speech recognition is based predominantly on fast
time-varying acoustic cues produced by the varying vocal tract shape (Fant 1960), and much of this
is visible on the speaker’s face (Yehia et al. 1998). Conversely, speaker recognition uses predomi-
nantly very slowly varying properties of the speech signal, such as the acoustic correlates of
vocal tract length (Lavner et al. 2000). If the brain uses encoded visual information for processing
auditory-only speech, the behavioral improvement that is induced by voice–face training (i.e., the
face benefit) must be dissociable for speech and speaker recognition (von Kriegstein et al. 2008b).
“talking face” (Siciliano et al. 2002). Such external simulation helps hearing-impaired listeners to
understand what is said. This creation of an artificial talking face uses a phoneme recognizer and
a face synthesizer to recreate the facial movements based on the auditory input. The audiovisual
model for auditory communication predicts that the human brain routinely uses a similar mecha-
nism: Auditory-only speech processing and speaker recognition are improved by internal simulation
of a talking face. How can such a model be explained in computational modeling terms? Recent
theoretical neuroscientific work has suggested that recognition can be modeled using a predic-
tive coding framework (Friston 2005). This framework assumes that efficient online recognition
of sensory signals is accomplished by a cortical hierarchy that is tuned to the prediction of sensory
signals. It is assumed that high levels of the hierarchy (i.e., further away from the sensory input)
provide predictions about the representation of information at a lower level of the hierarchy (i.e.,
closer to the sensory input). Each level contains a forward or generative model for the causes of
the sensory input and uses this model to generate predictions and constraints for the interpreta-
tion of the sensory input. Higher levels send predictions to the lower level, whereas the lower level
sends prediction errors to the higher level. One prerequisite to make such a mechanism useful is
that the brain learns regularities within the environment to efficiently predict the future sensory
input. Furthermore, these regularities should be adaptable to allow for changes in the regularities of
the environment. Therefore predictive coding theories have been formulated in a Bayesian frame-
work. In this framework, predictions are based on previous sensory evidence and can have varying
degrees of certainty. In visual and sensory–motor processing, “internal forward models” have been
used to explain how the brain encodes complex sensory data by relatively few parameters (Wolpert
et al. 1995; Knill et al. 1998; Rao and Ballard 1999; Bar 2007; Deneve et al. 2007).
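The hierarchical prediction scheme described above can be illustrated with a toy computation. Below is a minimal sketch in the spirit of linear predictive coding (Rao and Ballard 1999), not an implementation from the cited work: a single generative layer predicts the sensory input from an estimated hidden cause, and the estimate is iteratively corrected by the prediction error sent back up. The matrix, dimensions, and learning rate are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical generative model: sensory input = W @ cause (+ noise).
# W maps a low-dimensional hidden cause to a high-dimensional sensory signal.
W = rng.standard_normal((20, 3))
true_cause = np.array([1.0, -0.5, 2.0])
sensory_input = W @ true_cause + 0.01 * rng.standard_normal(20)

# Recognition as iterative prediction-error minimization:
# the higher level sends a top-down prediction (W @ estimate),
# the lower level returns the prediction error, which updates the estimate.
cause_estimate = np.zeros(3)
learning_rate = 0.02
for _ in range(300):
    prediction = W @ cause_estimate        # top-down prediction
    error = sensory_input - prediction     # bottom-up prediction error
    cause_estimate += learning_rate * W.T @ error

print(np.round(cause_estimate, 2))  # converges toward true_cause
```

Each pass of the loop corresponds to one exchange between levels; in practice the iteration stops when the prediction error no longer decreases, mirroring the idea that the higher level settles on the causes that best explain the input.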
Although predictive coding theories usually emphasize interaction between high and low levels,
a similar interaction might occur between sensory modalities. For example, the brain might use
audiovisual forward models, which encode the physical, causal relationship between a talking person
and the consequences for the visual and auditory input (von Kriegstein et al. 2008b). Critically, these
models encode the causal dependencies between the visual and auditory trajectories. Perception is
based on the “inversion” of models, that is, the brain identifies causes (e.g., Mr. Smith says “Hello”)
that explain the observed audiovisual input best. The changes in behavioral performance after a
brief voice–face experience suggest that the human brain can quickly and efficiently learn “a new
person” by adjusting key parameters in existing internal audiovisual forward models. Once param-
eters for an individual person are learned, auditory speech processing is improved because the brain
has learned the parameters of an audiovisual forward model with strong dependencies between internal
auditory and visual trajectories. The use of these models is reflected in an increased activation of
face-processing areas during auditory tasks. The audiovisual speaker model enables the system to
simulate visual trajectories (via the auditory trajectories) when there is no visual input. The talking
face simulation works best if the learned coupling between auditory and visual input is strong and
veridical. The visual simulation is fed back to auditory areas thereby improving auditory recogni-
tion by providing additional constraints. This mechanism can be used iteratively until the inversion
of the audiovisual forward model converges on a percept. In summary, this scheme suggests that
forward models encode and exploit dependencies in the environment and are used to improve recog-
nition in unisensory conditions by simulating the causes of the sensory input. Note that this chapter
focuses on the visual part of this simulation process. It is currently unclear whether motor processes
also play a role for this online simulation and whether the simulation proposed here has a relation
to simulation accounts underlying the motor theory of speech perception (Fischer and Zwaan 2008;
D’Ausilio et al. 2009).
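The inversion scheme sketched above can be caricatured in a few lines of code. The following is purely illustrative, not the computation used in the cited studies: each "person model" stores an auditory template together with a learned linear audio-to-face coupling, and recognition selects the model whose auditory prediction and internally simulated face trajectory jointly leave the smallest error. All names, dimensions, and the linear coupling are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical person models learned during brief audiovisual exposure:
# an auditory trajectory template plus a linear audio-to-face coupling.
n_frames, a_dim, v_dim = 30, 4, 6
people = {}
for name in ["smith", "jones", "mueller"]:
    audio = rng.standard_normal((n_frames, a_dim))
    coupling = rng.standard_normal((a_dim, v_dim))    # audio -> face map
    people[name] = {
        "audio_template": audio,
        "coupling": coupling,
        "visual_template": audio @ coupling,          # stored face trajectory
    }

def recognize(audio_input):
    """Invert the forward models: choose the person whose model best
    explains the auditory input, with the simulated face trajectory
    acting as an additional constraint on the interpretation."""
    scores = {}
    for name, model in people.items():
        auditory_error = np.mean((audio_input - model["audio_template"]) ** 2)
        simulated_face = audio_input @ model["coupling"]   # internal simulation
        visual_error = np.mean((simulated_face - model["visual_template"]) ** 2)
        scores[name] = auditory_error + visual_error       # combined evidence
    return min(scores, key=scores.get)

# Auditory-only test: a noisy utterance by "jones" is still matched correctly,
# because both the auditory and the simulated visual errors stay small.
probe = people["jones"]["audio_template"] + 0.3 * rng.standard_normal((n_frames, a_dim))
print(recognize(probe))
```

Note that the visual term only helps when the learned coupling is veridical, which parallels the observation that face benefits arise for tightly correlated voice–face pairs but not for arbitrary associations such as ringtones and phone brands.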
analysis of auditory communication signals (von Kriegstein et al. 2005; von Kriegstein and Giraud
2006; von Kriegstein et al. 2006; von Kriegstein et al. 2008b). Speech recognition is supported by
selective recruitment of the face-sensitive STS, which is known to be involved in orofacial movement
processing (Puce et al. 1998; Thompson et al. 2007). Speaker recognition is supported by selective
recruitment of the FFA, which is involved in face-identity processing (Eger et al. 2004; Rotshtein
et al. 2005; Bouvier and Engel 2006). These findings challenge auditory-only models for speech
processing, because they imply that during large parts of ecologically valid social interactions, not
only auditory but also visual areas are involved in solving auditory tasks. For example, during a phone
conversation with personally familiar people (e.g., friends or colleagues), face sensitive areas will
be employed to optimally understand what the person is saying and to identify the other by his/her
voice. The same applies to less familiar people, given a brief prior face-to-face interaction.
The results have been explained by an audiovisual model couched in a predictive coding frame-
work (von Kriegstein and Giraud 2006; von Kriegstein et al. 2008b). This model assumes that the
brain routinely simulates talking faces in response to auditory input and that this internal audio-
visual simulation is used to actively predict and thereby constrain the possible interpretations of the
auditory signal. This mechanism leads to improved recognition in situations where only the audi-
tory modality is available.
It is unclear whether the audiovisual simulation scheme is a general principle of how unisensory
tasks are performed when one or more of the usual input modalities are missing. I assume that the
same principle also applies to other voice–face information that is correlated in the auditory and
visual domains, such as recognition of emotion from voice and face (de Gelder and Vroomen 1995;
Massaro and Egan 1996). Furthermore, the principle might even be applicable to noncommunica-
tion sensory signals with a (veridical or illusory) common cause such as the recognition of move-
ment trajectories of computer-animated dot patterns and moving sound sources (Seitz et al. 2006).
Neuroscientific research has focused on responses in visual sensory areas in auditory-only condi-
tions after a brief voice–face sensory experience. However, visual sensory areas could also play a
role for speakers for whom no specific voice–face sensory experience is available. For example, the
speaker-independent effect of foreign phoneme training (Hardison 2003; Hazan et al. 2005; Hirata
and Kelly, 2010) could be based on extrapolating the speaker-specific face model to other speakers.
Similar mechanisms might occur during development of the speech perception system in children.
The use of internal face models for speech and speaker recognition might be especially impor-
tant in situations in which there is uncertainty about the sensory input. There are multiple sources
for uncertainty in human auditory communication. For example, a low level of experience with a
second language will likely result in a high level of uncertainty about the trajectory of the speech
signal. Furthermore, a high level of background noise will result in a high level of uncertainty about
the speech input. The use of an internal face simulation mechanism could increase robustness of
perception in these situations.
REFERENCES
Abrams, D. A., T. Nicol, S. Zecker, and N. Kraus. 2008. Right-hemisphere auditory cortex is dominant for cod-
ing syllable patterns in speech. J Neurosci 28: 3958–3965.
American Standards Association. 1960. Acoustical Terminology SI. New York: American Standards Association.
Arnal, L. H., B. Morillon, C. A. Kell, and A. L. Giraud. 2009. Dual neural routing of visual facilitation in
speech processing. J Neurosci 29: 13445–13453.
Ay, N., J. Flack, and D. C. Krakauer. 2007. Robustness and complexity co-constructed in multimodal signalling
networks. Philos Trans R Soc Lond B Biol Sci 362: 441–447.
Bar, M. 2007. The proactive brain: Using analogies and associations to generate predictions. Trends Cogn Sci
11: 280–289.
Baron-Cohen, S., H. A. Ring, E. T. Bullmore, S. Wheelwright, C. Ashwin, and S. C. R. Williams. 2000. The
amygdala theory of autism. Neurosci Biobehav Rev 24: 355–364.
Barsalou, L. W. 2008. Grounded cognition. Annu Rev Psychol 59: 617–645.
Behrmann, M., and G. Avidan. 2005. Congenital prosopagnosia: Face-blind from birth. Trends Cogn Sci
9: 180–187.
Belin, P., S. Fecteau, and C. Bedard. 2004. Thinking the voice: Neural correlates of voice perception. Trends
Cogn Sci 8: 129–135.
Belin, P., R. J. Zatorre, and P. Ahad. 2002. Human temporal-lobe response to vocal sounds. Brain Res Cogn
Brain Res 13: 17–26.
Belin, P., R. J. Zatorre, P. Lafaille, P. Ahad, and B. Pike. 2000. Voice-selective areas in human auditory cortex.
Nature 403: 309–312.
Besle, J., C. Fischer, A. Bidet-Caulet, F. Lecaignard, O. Bertrand, and M. H. Giard. 2008. Visual activation
and audiovisual interactions in the auditory cortex during speech perception: Intracranial recordings in
humans. J Neurosci 28: 14301–14310.
Bizley, J. K., K. M. Walker, B. W. Silverman, A. J. King, and J. W. Schnupp. 2009. Interdependent encoding of
pitch, timbre, and spatial location in auditory cortex. J Neurosci 29: 2064–2075.
Boatman, D., R. P. Lesser, and B. Gordon. 1995. Auditory speech processing in the left temporal lobe: An
electrical interference study. Brain Lang 51: 269–290.
Boatman, D. F., R. P. Lesser, N. E. Crone, G. Krauss, F. A. Lenz, and D. L. Miglioretti. 2006. Speech recogni-
tion impairments in patients with intractable right temporal lobe epilepsy. Epilepsia 47: 1397–1401.
Boemio, A., S. Fromm, A. Braun, and D. Poeppel. 2005. Hierarchical and asymmetric temporal sensitivity in
human auditory cortices. Nat Neurosci 8: 389–395.
Bouvier, S. E., and S. A. Engel. 2006. Behavioral deficits and cortical damage loci in cerebral achromatopsia.
Cereb Cortex 16: 183–191.
Bruce, V., and A. Young. 1986. Understanding face recognition. Br J Psychol 77: 305–327.
Burton, A. M., V. Bruce, and P. J. B. Hancock. 1999. From pixels to people: A model of familiar face recogni-
tion. Cogn Sci 23: 1–31.
Calder, A. J., and A. W. Young. 2005. Understanding the recognition of facial identity and facial expression.
Nat Rev Neurosci 6: 641–651.
Calvert, G. A., E. T. Bullmore, M. J. Brammer, R. Campbell, S. C. Williams, P. K. McGuire, P. W. Woodruff,
S. D. Iversen, and A. S. David. 1997. Activation of auditory cortex during silent lipreading. Science 276:
593–596.
Campanella, S., and P. Belin. 2007. Integrating face and voice in person perception. Trends Cogn Sci 11:
535–543.
Campbell, R., J. Zihl, D. Massaro, K. Munhall, and M. M. Cohen. 1997. Speechreading in the akinetopsic
patient, L.M. Brain 120 (Pt 10): 1793–1803.
Chandrasekaran, C., A. Trubanova, S. Stillittano, A. Caplier, and A. A. Ghazanfar. 2009. The natural statistics
of audiovisual speech. PLoS Comput Biol 5: e1000436.
Clopper, C. G., and D. B. Pisoni. 2004. Some acoustic cues for the perceptual categorization of American
English regional dialects. J Phon 32: 111–140.
Cook, S., and J. Wilding. 1997. Earwitness testimony: 2. Voices, faces and context. Appl Cogn Psychol 11:
527–541.
Cook, S., and J. Wilding. 2001. Earwitness testimony: Effects of exposure and attention on the face overshad-
owing effect. Br J Psychol 92: 617–629.
D’Ausilio, A., F. Pulvermuller, P. Salmas, I. Bufalari, C. Begliomini, and L. Fadiga. 2009. The motor somato-
topy of speech perception. Curr Biol 19: 381–385.
de Gelder, B., and J. Vroomen. 1995. The perception of emotions by ear and by eye. Cogn Emot 14: 289–311.
Deneve, S., J. R. Duhamel, and A. Pouget. 2007. Optimal sensorimotor integration in recurrent cortical net-
works: A neural implementation of Kalman filters. J Neurosci 27: 5744–5756.
Duchaine, B., and K. Nakayama. 2005. Dissociations of face and object recognition in developmental proso-
pagnosia. J Cogn Neurosci 17: 249–261.
Duchaine, B. C., H. Parker, and K. Nakayama. 2003. Normal recognition of emotion in a prosopagnosic.
Perception 32: 827–838.
Eger, E., P. G. Schyns, and A. Kleinschmidt. 2004. Scale invariant adaptation in fusiform face-responsive
regions. Neuroimage 22: 232–242.
Ellis, H. D., D. M. Jones, and N. Mosdell. 1997. Intra- and inter-modal repetition priming of familiar faces and
voices. Br J Psychol 88 (Pt 1): 143–156.
Fant, G. 1960. Acoustic theory of speech production. The Hague: Mouton.
Fischer, M. H., and R. A. Zwaan. 2008. Embodied language: A review of the role of the motor system in lan-
guage comprehension. Q J Exp Psychol (Colchester) 61: 825–850.
Fox, C. J., S. Y. Moon, G. Iaria, and J. J. Barton. 2009. The correlates of subjective perception of identity and
expression in the face network: An fMRI adaptation study. Neuroimage 44: 569–580.
Friederici, A. D. 2002. Towards a neural basis of auditory sentence processing. Trends Cogn Sci 6: 78–84.
Friston, K. 2005. A theory of cortical responses. Philos Trans R Soc Lond B Biol Sci 360: 815–836.
Gainotti, G., A. Barbier, and C. Marra. 2003. Slowly progressive defect in recognition of familiar people in a
patient with right anterior temporal atrophy. Brain 126: 792–803.
Gick, B., and D. Derrick. 2009. Aero-tactile integration in speech perception. Nature 462: 502–504.
Giraud, A. L., A. Kleinschmidt, D. Poeppel, T. E. Lund, R. S. Frackowiak, and H. Laufs. 2007. Endogenous cortical
rhythms determine cerebral specialization for speech perception and production. Neuron 56: 1127–1134.
Grey, J. M. 1977. Multidimensional perceptual scaling of musical timbres. J Acoust Soc Am 61: 1270–1277.
Griffiths, T. D., S. Uppenkamp, I. Johnsrude, O. Josephs, and R. D. Patterson. 2001. Encoding of the temporal
regularity of sound in the human brainstem. Nat Neurosci 4: 633–637.
Gruter, M., T. Gruter, V. Bell, P. W. Halligan, J. Horst, K. Sperling et al. 2007. Hereditary prosopagnosia: The first
case series. Cortex 43: 734–749.
Hall, D. A., C. Fussell, and A. Q. Summerfield. 2005. Reading fluent speech from talking faces: Typical brain
networks and individual differences. J Cogn Neurosci 17: 939–953.
Hardison, D. M. 2003. Acquisition of second-language speech: Effects of visual cues, context, and talker vari-
ability. Appl Psycholinguist 24: 495.
Hauk, O., Y. Shtyrov, and F. Pulvermuller. 2008. The time course of action and action–word comprehension in
the human brain as revealed by neurophysiology. J Physiol Paris 102: 50–58.
Haxby, J. V., E. A. Hoffman, and M. I. Gobbini. 2000. The distributed human neural system for face perception.
Trends Cogn Sci 4: 223–233.
Haxby, J. V., E. A. Hoffman, and M. I. Gobbini. 2002. Human neural systems for face recognition and social
communication. Biol Psychiatry 51: 59–67.
Hazan, V., A. Sennema, M. Iba, and A. Faulkner. 2005. Effect of audiovisual perceptual training on the percep-
tion and production of consonants by Japanese learners of English. Speech Commun 47: 360–378.
Hickok, G., and D. Poeppel. 2000. Towards a functional neuroanatomy of speech perception. Trends Cogn Sci
4: 131–138.
Hickok, G., and D. Poeppel. 2007. The cortical organization of speech processing. Nat Rev Neurosci 8:
393–402.
Hirata, Y., and S. D. Kelly. 2010. Effects of lips and hands on auditory learning of second language speech
sounds. J Speech Lang Hear Res 53: 298–310.
Humphreys, G. W., N. Donnelly, and M. J. Riddoch. 1993. Expression is computed separately from facial
identity, and it is computed separately for moving and static faces: Neuropsychological evidence.
Neuropsychologia 31: 173–181.
Ishai, A., J. V. Haxby, and L. G. Ungerleider. 2002. Visual imagery of famous faces: Effects of memory and
attention revealed by fMRI. Neuroimage 17: 1729–1741.
Iverson, P., and C. L. Krumhansl. 1993. Isolating the dynamic attributes of musical timbre. J Acoust Soc Am
94: 2595–2603.
Johnson, W. F., R. N. Emde, K. R. Scherer, and M. D. Klinnert. 1986. Recognition of emotion from vocal cues.
Arch Gen Psychiatry 43: 280–283.
Kanwisher, N., J. McDermott, and M. M. Chun. 1997. The fusiform face area: A module in human extrastriate
cortex specialized for face perception. J Neurosci 17: 4302–4311.
Kim, R. S., A. R. Seitz, and L. Shams. 2008. Benefits of stimulus congruency for multisensory facilitation of
visual learning. PLoS ONE 3: e1532.
Kleinhans, N. M., L. C. Johnson, T. Richards, R. Mahurin, J. Greenson, G. Dawson et al. 2009. Reduced neural
habituation in the amygdala and social impairments in autism spectrum disorders. Am J Psychiatry 166:
467–475.
Knill, D., D. Kersten, A. Yuille, and W. Richards. 1998. Introduction: A Bayesian formulation of visual percep-
tion. In Perception as Bayesian Inference, 1–21. Cambridge: Cambridge Univ. Press.
Lakatos, S., S. McAdams, and R. Causse. 1997. The representation of auditory source characteristics: Simple
geometric form. Percept Psychophys 59: 1180–1190.
Lander, K., G. Humphreys, and V. Bruce. 2004. Exploring the role of motion in prosopagnosia: Recognizing,
learning and matching faces. Neurocase 10: 462–470.
Lang, C. J., O. Kneidl, M. Hielscher-Fastabend, and J. G. Heckmann. 2009. Voice recognition in aphasic and
non-aphasic stroke patients. J Neurol 256: 1303–1306.
Lavner, Y., I. Gath, and J. Rosenhouse. 2000. The effects of acoustic modifications on the identification of
familiar voices speaking isolated vowels. Speech Commun 30: 9–26.
Leff, A. P., T. M. Schofield, K. E. Stephan, J. T. Crinion, K. J. Friston, and C. J. Price. 2008. The cortical
dynamics of intelligible speech. J Neurosci 28: 13209–13215.
Liberman, A. M., and I. G. Mattingly. 1985. The motor theory of speech perception revised. Cognition 21:
1–36.
Marslen-Wilson, W. D., and L. K. Tyler. 2007. Morphology, language and the brain: The decompositional sub-
strate for language comprehension. Philos Trans R Soc Lond B Biol Sci 362: 823–836.
Martin, A., and A. Caramazza. 2003. Neuropsychological and neuroimaging perspectives on conceptual knowl-
edge: An introduction. Cogn Neuropsychol 20: 195–212.
Massaro, D. W., and P. B. Egan. 1996. Perceiving affect from the voice and the face. Psychon Bull Rev 3:
215–221.
McAdams, S., S. Winsberg, S. Donnadieu, G. Desoete, and J. Krimphoff. 1995. Perceptual scaling of syn-
thesized musical timbres—Common dimensions, specificities, and latent subject classes. Psychol Res
Psychol Forsch 58: 177–192.
McConachie, H. R. 1976. Developmental prosopagnosia. A single case report. Cortex 12: 76–82.
Mechelli, A., C. J. Price, K. J. Friston, and A. Ishai. 2004. Where bottom-up meets top-down: Neuronal interac-
tions during perception and imagery. Cereb Cortex 14: 1256–1265.
Menon, V., D. J. Levitin, B. K. Smith, A. Lembke, B. D. Krasnow, D. Glazer et al. 2002. Neural correlates of
timbre change in harmonic sounds. Neuroimage 17: 1742–1754.
Nelken, I., and O. Bar-Yosef. 2008. Neurons and objects: The case of auditory cortex. Front Neurosci 2: 107–113.
Neuner, F., and S. R. Schweinberger. 2000. Neuropsychological impairments in the recognition of faces, voices,
and personal names. Brain Cogn 44: 342–366.
O’Toole, A. J., D. A. Roark, and H. Abdi. 2002. Recognizing moving faces: A psychological and neural syn-
thesis. Trends Cogn Sci 6: 261–266.
Obleser, J., and F. Eisner. 2008. Pre-lexical abstraction of speech in the auditory cortex. Trends Cogn Sci 13(1):
14–19.
Overath, T., S. Kumar, K. von Kriegstein, and T. D. Griffiths. 2008. Encoding of spectral correlation over time
in auditory cortex. J Neurosci 28: 13268–13273.
Patterson, R. D., S. Uppenkamp, I. S., Johnsrude, and T. D. Griffiths. 2002. The processing of temporal pitch
and melody information in auditory cortex. Neuron 36: 767–776.
Pekkola, J., V. Ojanen, T. Autti, I. P. Jaaskelainen, R. Mottonen et al. 2005. Primary auditory cortex activation
by visual speech: An fMRI study at 3 T. Neuroreport 16: 125–128.
Pelphrey, K. A., J. P. Morris, C. R. Michelich, T. Allison, and G. McCarthy. 2005. Functional anatomy of bio-
logical motion perception in posterior temporal cortex: An FMRI study of eye, mouth and hand move-
ments. Cereb Cortex 15: 1866–1876.
Penagos, H., J. R. Melcher, and A. J. Oxenham. 2004. A neural representation of pitch salience in nonpri-
mary human auditory cortex revealed with functional magnetic resonance imaging. J Neurosci 24:
6810–6815.
Pitcher, D., L. Garrido, V. Walsh, and B. C. Duchaine. 2008. Transcranial magnetic stimulation disrupts the
perception and embodiment of facial expressions. J Neurosci 28: 8929–8933.
Poeppel, D. 2003. The analysis of speech in different temporal integration windows: Cerebral lateralization as
‘asymmetric sampling in time.’ Speech Commun 41: 245–255.
Price, C., G. Thierry, and T. Griffiths. 2005. Speech-specific auditory processing: Where is it? Trends Cogn Sci
9: 271–276.
Price, C. J. 2000. The anatomy of language: Contributions from functional neuroimaging. J Anat 197(Pt 3):
335–359.
Puce, A., T. Allison, S. Bentin, J. C. Gore, and G. McCarthy. 1998. Temporal cortex activation in humans view-
ing eye and mouth movements. J Neurosci 18: 2188–2199.
Pulvermuller, F., M. Huss, F. Kherif, F. M. D. P. Martin, O. Hauk, and Y. Shtyrov. 2006. Motor cortex maps
articulatory features of speech sounds. Proc Natl Acad Sci U S A 103: 7865–7870.
Rao, R. P., and D. H. Ballard. 1999. Predictive coding in the visual cortex: A functional interpretation of some
extra-classical receptive-field effects. Nat Neurosci 2: 79–87.
Ross, L. A., D. Saint-Amour, V. M. Leavitt, D. C. Javitt, and J. J. Foxe. 2007. Do you see what I am say-
ing? Exploring visual enhancement of speech comprehension in noisy environments. Cereb Cortex 17:
1147–1153.
Rotshtein, P., R. N. Henson, A. Treves, J. Driver, and R. J. Dolan. 2005. Morphing Marilyn into Maggie dissoci-
ates physical and identity face representations in the brain. Nat Neurosci 8: 107–113.
Sathian, K., A. Zangaladze, J. M. Hoffman, and S. T. Grafton. 1997. Feeling with the mind’s eye. Neuroreport
8: 3877–3881.
A Multisensory Perspective on Human Auditory Communication 699
Scherer, K. R. 1986. Vocal affect expression: A review and a model for future research. Psychol Bull 99:
143–165.
Schweinberger, S. R., D. Robertson, and J. M. Kaufmann. 2007. Hearing facial identities. Q J Exp Psychol
(Colchester) 60: 1446–1456.
Scott, S. K. 2005. Auditory processing—Speech, space and auditory objects. Curr Opin Neurobiol 15:
197–201.
Scott, S. K., C. McGettigan, and F. Eisner. 2009. A little more conversation, a little less action—Candidate roles
for the motor cortex in speech perception. Nat Rev Neurosci 10: 295–302.
Scott, S. K., C. C. Blank, S. Rosen, and R. J. Wise. 2000. Identification of a pathway for intelligible speech in
the left temporal lobe. Brain 123(Pt 12): 2400–2406.
Seitz, A. R., and H. R. Dinse. 2007. A common framework for perceptual learning. Curr Opin Neurobiol 17:
148–153.
Seitz, A. R., R. Kim, and L. Shams. 2006. Sound facilitates visual learning. Curr Biol 16: 1422–1427.
Sergent, J., S. Ohta, and B. MacDonald. 1992. Functional neuroanatomy of face and object processing. A posi-
tron emission tomography study. Brain 115(Pt 1): 15–36.
Shams, L., and A. R. Seitz. 2008. Benefits of multisensory learning. Trends Cogn Sci 12: 411–417.
Sheffert, S. M., and E. Olson. 2004. Audiovisual speech facilitates voice learning. Percept Psychophys 66:
352–362.
Sheffert, S. M., D. B. Pisoni, J. M. Fellowes, and R. E. Remez, 2002. Learning to recognize talkers from natu-
ral, sinewave, and reversed speech samples. J Exp Psychol Hum Percept Perform 28: 1447–1469.
Siciliano, C., G. Williams, J. Beskow, and A. Faulkner. 2002. Evaluation of a multilingual synthetic talking face
as a communication aid for the hearing-impaired. Speech Hear Lang Work Prog 14: 51–61.
Smith, D. R. R., R. D. Patterson, R. Turner, H. Kawahara, and T. Irino. 2005. The processing and perception of
size information in speech sounds. J Acoust Soc Am 117: 305–318.
Smith, E. L., M. Grabowecky, and S. Suzuki. 2007. Auditory–visual crossmodal integration in perception of
face gender. Curr Biol 17: 1680–1685.
Sumby, W. H., and I. Pollack. 1954. Visual contribution to speech intelligibility in noise. J Acoust Soc Am 26:
212–215.
Thomas, E. R., and J. Reaser. 2004. Delimiting perceptual cues used for the ethnic labeling of African American
and European American voices. J Socioling 8: 54–87.
Thompson, J. C., M. Clarke, T. Stewart, and A. Puce. 2005. Configural processing of biological motion in
human superior temporal sulcus. J Neurosci 25: 9059–9066.
Thompson, J. C., J. E. Hardee, A. Panayiotou, D. Crewther, and A. Puce. 2007. Common and distinct brain
activation to viewing dynamic sequences of face and hand movements. Neuroimage 37: 966–973.
Tsukiura, T., H. Mochizuki-Kawai, and T. Fujii. 2006. Dissociable roles of the bilateral anterior temporal lobe
in face–name associations: An event-related fMRI study. Neuroimage 30: 617–626.
Van Lancker, D. R., and J. G. Canter. 1982. Impairment of voice and face recognition in patients with hemi-
spheric damage. Brain Cogn 1: 185–195.
Van Lancker, D. R., J. Kreiman, and J. Cummings. 1989. Voice perception deficits: Neuroanatomical correlates
of phonagnosia. J Clin Exp Neuropsychol 11: 665–674.
van Wassenhove, V., K. W. Grant, and D. Poeppel. 2005. Visual speech speeds up the neural processing of audi-
tory speech. Proc Natl Acad Sci U S A 102: 1181–1186.
Vigneau, M., V. Beaucousin, P. Y. Herve, H. Duffau, F. Crivello, O. Houde et al. 2006. Meta-analyzing left hemi-
sphere language areas: phonology, semantics, and sentence processing. Neuroimage 30: 1414–1432.
von Kriegstein, K., and A. L. Giraud. 2004. Distinct functional substrates along the right superior temporal
sulcus for the processing of voices. Neuroimage 22: 948–955.
von Kriegstein, K., and A. L. Giraud. 2006. Implicit iultisensory associations influence voice recognition. PLoS
Biol 4: e326.
von Kriegstein, K., A. Kleinschmidt, and A. L. Giraud. 2006. Voice recognition and cross-modal responses to
familiar speakers’ voices in prosopagnosia. Cereb Cortex 16: 1314–1322.
von Kriegstein, K., R. D. Patterson, and T. D. Griffiths. 2008a. Task-dependent modulation of medial geniculate
body is behaviorally relevant for speech recognition. Curr Biol 18: 1855–1859.
von Kriegstein, K., E. Eger, A. Kleinschmidt, and A. L. Giraud. 2003. Modulation of neural responses to speech
by directing attention to voices or verbal content. Brain Res Cogn Brain Res 17: 48–55.
von Kriegstein, K., A. Kleinschmidt, P. Sterzer, and A. L. Giraud. 2005. Interaction of face and voice areas dur-
ing speaker recognition. J Cogn Neurosci 17: 367–376.
von Kriegstein, K., D. R. Smith, R. D. Patterson, D. T. Ives, and T. D. Griffiths. 2007. Neural representation of
auditory size in the human voice and in sounds from other resonant sources. Curr Biol 17: 1123–1128.
700 The Neural Bases of Multisensory Processes
von Kriegstein, K., O. Dogan, M. Gruter, A. L. Giraud, C. A. Kell, T. Gruter et al. 2008b. Simulation of talk-
ing faces in the human brain improves auditory speech recognition. Proc Natl Acad Sci U S A 105:
6747–6752.
von Kriegstein, K., D. R. Smith, R. D. Patterson, S. J. Kiebel, and T. D. Griffiths. 2010. How the human brain
recognizes speech in the context of changing speakers. J Neurosci 30: 629–638.
Warren, J. D., A. R. Jennings, and T. D. Griffiths. 2005. Analysis of the spectral envelope of sounds by the
human brain. Neuroimage 24: 1052–1057.
Watkins, K. E., A. P. Strafella, and T. Paus. 2003. Seeing and hearing speech excites the motor system involved
in speech production. Neuropsychologia 41: 989–994.
Wolpert, D. M., Z. Ghahramani, and M. I. Jordan. 1995. An internal model for sensorimotor integration. Science
269: 1880–1882.
Yehia, H., P. Rubin, and E. Vatikiotis-Bateson. 1998. Quantitative association of vocal-tract and facial behavior.
Speech Commun 26:23–43.
Young, A. W., F. Newcombe, E. H. de Haan, M. Small, and D. C. Hay. 1993. Face perception after brain injury.
Selective impairments affecting identity and expression. Brain 116 (Pt 4): 941–959.
Zangaladze, A., C. M. Epstein, S. T. Grafton, and K. Sathian. 1999. Involvement of visual cortex in tactile
discrimination of orientation. Nature 401: 587–590.
Section IX
Naturalistic Multisensory Processes:
Flavor
35 Multimodal Chemosensory Interactions and Perception of Flavor
John Prescott
CONTENTS
35.1 Introduction
35.2 Chemosensory Interactions and Integration
35.3 Associative Learning and Integration
35.4 Cross-Modal Chemosensory Binding
35.5 Attentional Processes in Binding
35.6 Analysis and Synthesis in Perception of Flavor
35.7 Investigating Cognitive Processes in Flavor Perception
35.8 Hedonic Implications of Chemosensory Integration
References
35.1 INTRODUCTION
Writing in the early nineteenth century, the gastronomic pioneer Brillat-Savarin was “tempted to
believe that smell and taste are in fact but a single sense, whose laboratory is the mouth and whose
chimney is the nose” (Brillat-Savarin 1825). Much of the subsequent history of perception research
in the chemical senses has, in contrast, been characterized by a focus on discrete sensory channels,
and their underlying anatomy and physiology. However, there has recently been renewed interest
in examining flavor as a functional perceptual system. This has been borne to some extent out of
a realization that in our everyday food experiences, we respond, perceptually and hedonically, not
to discrete tastes, odors, and tactile sensations, but to flavors constructed from a synthesis of these
sensory signals (Prescott 2004b).
This refocus on flavor is very much in line with the ecological approach to perception advocated
by Gibson (1966). Gibson argued that the primary purpose of perception is to seek out objects
in our environment, particularly those that are biologically important. As such, the physiological
origin of sensory information matters less than whether that information can be used in object
identification. Effectively, then, the key to successful perception is that sensory information
is interpreted as qualities that belong to the object itself. Within this context, flavor can be seen as
a functionally distinct sense that is cognitively “constructed” from the integration of distinct physi-
ologically defined sensory systems (such as olfaction and gustation) that are “functionally united
when anatomically separated” (Gibson 1966, p. 137) in order to identify and respond to objects that
are important to our survival, namely, foods.
704 The Neural Bases of Multisensory Processes
or a nonsweet taste (monosodium glutamate, a savory quality). The most plausible interpretation of
these findings is that the smelled sweetness of benzaldehyde and tasted sweetness of saccharin were
being integrated at subthreshold levels. Similar odor threshold effects have also been found using
a somewhat different experimental protocol in which both the odorant and tastant were presented
together in solution (Delwiche and Heffelfinger 2005).
Reciprocal effects of odors on tastes are also found. These include increases in the detection
accuracy of a sweet taste at around threshold in the presence of an orthonasally presented congruent
odorant (strawberry) as compared to one that was not sweet (ham) (Djordjevic et al. 2004), as well
as a similar effect using a priming procedure, in which the odorant preceded the taste presentation
(Prescott 2004b), showing that a sweet-smelling odor produced a greater change in detectability,
relative to no odor, than did another, nonsweet odorant. Similar priming effects at suprathreshold
levels have been demonstrated behaviorally in a study in which subjects were asked to identify a
taste quality sipped from a device that also simultaneously presented an odor—either congruent or
incongruent—orthonasally. Speed of naming of tastes during presentation of congruent odor/taste
pairs (sweet-smelling cherry odor/sucrose; sour-smelling grapefruit odor/citric acid) was faster rela-
tive to incongruent pairs (cherry odor/citric acid; grapefruit odor/sucrose), or neutral/control pairs
(either butanol or no odor plus either sucrose or citric acid) (White and Prescott 2007).
was paired with a taste, a cross-modal configural stimulus—that is, a flavor—is encoded in mem-
ory. Subsequently sniffing the odor alone will evoke the most similar odor memory—the flavor—
that will include both the odor and the taste component. Thus, for example, sniffing caramel odor
activates memorial representations of caramel flavors, which include a sweet taste component.
This results either in perceptions of smelled taste properties such as sweetness or, in the case of a
mixture, a perceptual combination of the memorial odor representation with the physically present
taste in solution (Stevenson and Boakes 2004; Stevenson et al. 1998).
FIGURE 35.1 Temporal and spatial determinants of odor/taste integration. Combination of smell and taste
into a single sensation. A varying time difference between stimuli (smell ahead in time versus taste ahead)
moves the locus of sensation from the tip of the nose back to the throat and forward again to the tip of the
tongue. (Reprinted from von Bekesy, G., J. Appl. Physiol., 19, 369–373, 1964. Copyright, used with permission
from The American Physiological Society.)
at least in the case of temporal asynchrony, congruency between the odor and taste may be crucial.
Hence, it has been demonstrated that judgments of audiovisual asynchrony are more difficult when
the different modalities are bound by a common origin (Spence and Parise 2010).
The olfactory location illusion is effectively an equivalent phenomenon to the auditory/visual
“ventriloquism effect” in that, like the ventriloquist’s voice, the location of the odor is captured
by other sensory inputs. The extent to which either concurrent taste or somatosensation, or both,
is chiefly responsible for the capture and referral of olfactory information to the oral cavity is not
known. However, the somatosensory system is more strongly implicated since it provides more
detailed spatial information than does taste (Lim and Green 2008). Moreover, in neuroimaging
studies, odors that are available to bind with tastes—that is, those presented retronasally (via the
mouth) —have been shown to activate the mouth area of the primary somatosensory cortex, whereas
the same odors presented via the nose do not (Small et al. 2005). This distinction, which occurs even
when subjects are unaware of the route of stimulation, suggests a likely neural correlate of the binding
process, and supports the idea that somatosensory input is the underlying mechanism.
In fact, our taste experiences may themselves be multimodal. Under most circumstances, taste
and tactile sensations in the mouth are so well integrated that we cannot begin to disentangle them,
and there is growing evidence that our everyday experiences of taste are themselves multisensory,
in that they involve somatosensory input (Green 2003; Lim and Green 2008). Taste buds are inner-
vated by somatosensory fibers (Whitehead et al. 1985) and various categories of somatosensory
stimuli are also capable of inducing taste sensations. Thus, it has been noted that about 25% of
fungiform papillae respond with a taste quality to tactile stimulation by fine wires (Cardello 1981).
More recently, tastes have been shown to be elicited by heated and cooled probes placed on areas
innervated by cranial nerves VII and IX, which subserve taste (Cruz and Green 2000), and by
the application of the prototypical “pure” irritant, capsaicin, to circumvallate papillae (Green and
Hayes 2003). Further evidence points to the ability of tactile stimulation to capture taste, presum-
ably by providing superior spatial information and enhancing localization (Delwiche et al. 2000;
Lim and Green 2008; Todrank and Bartoshuk 1991). Tactile information may therefore have an
important role in binding tastes, perhaps together with odors, both to one another and to a physical
stimulus such as a food.
The binding of odors to tastes and tactile stimuli may also rely on processing information about
the origins of odor stimulation. Orthonasally presented odors are more readily identified and have
lower thresholds than the same odors presented retronasally via the mouth (Pierce and Halpern
1996; Voirol and Daget 1986), and there is a strong suggestion that the two routes of stimulation are
processed with some independence. Thus, neuroimaging studies show different activation patterns
in cortical olfactory areas as a result of route of administration (Small et al. 2005). From an adaptive
point of view, this makes sense. Olfaction has been described (Rozin 1982) as the only dual sense
because it functions both to detect volatile chemicals in the air (orthonasal sniffing) and to classify
objects in the mouth as foods or not, and each of these roles has unique adaptive significance. Since
the mouth acts as the gateway to the gut, our chemical senses can be seen as part of a defense system
to protect our internal environment—once something is placed in the mouth, there is high survival
value in deciding whether consumption is appropriate. Sensory qualities (tastes, retronasal odors,
tactile qualities) occurring together in the mouth are therefore bound into a single perception, which
identifies a substance as a food (cf. Gibson 1966).
of light wavelengths). By contrast, the mixing of tastes is typically seen as an analytic process,
because individual taste qualities do not fuse to form new qualities and, like simultaneous auditory
tones, can be distinguished from one another in mixtures. A further category of interaction, namely,
fusion—the notion of sensations combined to form a single percept, rather than combining syntheti-
cally to form a new sensation—has also been proposed and applied to flavor perception (McBurney
1986).
The notion of fusion in flavor perception implies that the percept remains analyzable into its
constituent elements even when otherwise perceived as a whole. Thus, although our initial response
is to apple flavor—an effortless combining of all of its sensory qualities into a single percept—we
can, if required, switch between a synthetic approach to flavor and an analysis of the flavor ele-
ments. Hence, apple flavor can be both a synthetic percept and, with minimal effort, a collection of
tastes (sweet; sour), textures (crisp; juicy) and odor notes (lemony; acetone-like; honey) (see Figure
35.2). A more precise way of conceptualizing flavor therefore is that cross-modal sensory signals
are combined to produce a percept, rather than combining synthetically—in the way that odors
themselves do—to form a new sensation. During normal food consumption, we typically respond
to flavors synthetically—an approach reinforced by the olfactory illusion and by the extent to which
flavor components are congruent. As noted earlier, this implies a sharing of perceptual qualities,
for example, sweetness of a taste and of an odor, derived from prior experience of these qualities
together.
Conversely, analytic approaches to complex food or other flavor stimuli (e.g., wines) are often
used by trained assessors to provide a descriptive profile of discrete sensory qualities, as distinct
from an assessment of the overall flavor. Asking assessors to become analytical appears to produce
the same inhibitory effects on odor–taste interactions noted in studies by Frank et al. (1993) and
others. In one study using both trained descriptive panelists and untrained consumers (Bingham et
al. 1990), solutions of the sweet-smelling odorant maltol plus sucrose were rated as sweeter than a
solution of sucrose alone by the untrained consumers. In contrast, no such enhancement was found
in the ratings of those trained to adopt an analytical approach to the sensory properties of this
mixture.
In experimental paradigms, whether an odor/taste mixture is perceived analytically or syntheti-
cally can be determined by the responses required of the subject. Multiple ratings of appropriate
FIGURE 35.2 Synthetic and analytic views of a flavor. In each case, sensory signals are identical, but per-
ception differs—whole flavor of apple versus a collection of sensory qualities on which different apples may
vary.
attributes force an analytical approach, whereas a single rating of a sensory quality that can apply
to both congruent odors and tastes (e.g., the tasted sweetness of sucrose and the smelled sweetness
of strawberry odor) encourages synthesis of the common quality from both sensory modalities.
The components of these flavors may not be treated separately when judged in terms of sweetness
or other single characteristics. When instructions require separation of the components, however,
this can be done—the components of a flavor are evaluated individually, and sweetness enhance-
ment is eliminated. In other words, rating requirements lead to different perceptual approaches
(analytical or synthetic) that, in turn, influence the degree of perceptual integration that occurs.
A recent study of odor mixtures has indicated that an analytical approach is similarly able to influ-
ence the integration of the individual mixture components, as shown in a reduction in the extent
to which subjects perceived a unique quality distinct from those of the components (Le Berre
et al. 2008).
not become sweeter or sourer smelling, whereas taste-paired odors that had not been pre-exposed
did. In contrast, initial attempts to disrupt configural integrity by directing attention toward the
elemental nature of the compound stimulus during associative pairing were unsuccessful. Neither
training subjects to distinguish the individual odor and taste components of flavors prior to learning
(Stevenson and Case 2003) nor predisposing subjects to adopt an analytical strategy by requiring
intensity ratings of these odor and taste components separately during their joint exposure (Prescott
et al. 2004) was initially successful in influencing whether odors paired with a sweet taste became
sweeter smelling. This is probably attributable to methodological factors.
If it is the case that odors and tastes are automatically coencoded as a flavor in the absence of task
demands that focus attention on the elements, then experimental designs in which odor and taste
elements appear together without such an attentional strategy are likely to predispose toward syn-
thesis. Hence, the analytical strategy used by Stevenson and Case (2003) was likely to be ineffective
since they asked subjects during the exposure to rate overall liking for the odor–taste compound,
an approach that may have encouraged integration of the elements. The analytical manipulation in
Prescott et al.’s (2004) study may not have influenced the development of smelled sweetness because
it took place after the initial pairing of the sweet taste and odor that occurred before the formal asso-
ciative process—that is, as the preconditioning measure in the pre–post design. As noted earlier, a
second study in Prescott et al.’s (2004) report demonstrated that a single odor–sweet taste coexpo-
sure can produce an odor that smells sweet.
More recently, it has been demonstrated that when such methodological considerations are
addressed, prior analytical training in which attention is explicitly directed toward the individual
elements in an odor and sweet taste mixture does inhibit the development of a sweet-smelling odor
(Prescott and Murphy 2009; see Figure 35.3a). In this study, subjects only ever received a particular
[Figure 35.3, panels (a) and (b): bar graphs showing mean change in smelled sweetness of odors (a) and mean
change in tasted sweetness (b), for sucrose-paired and water-paired conditions, plotted separately for the
Synthetic and Analytic groups.]
FIGURE 35.3 Changes in perceptual characteristics of odors and flavors as a function of odor–taste coex-
posure and attentional strategy. (a) Mean ratings of smelled sweetness of odors increase after repeated pairing
with a sweet taste in solution, but only for a group using a strategy in which odor and taste elements are
treated synthetically. In contrast, coexposure to same odor–taste mixture when odor and taste are attended to
analytically as distinct elements produces no such increase in smelled sweetness. (Reprinted from Prescott,
J., and Murphy, S., Q. J. Exp. Psychol., 62 (11), 2133–2140, 2009. Copyright, with permission from Taylor &
Francis.) (b) Mean ratings of sweetness of a flavor composed of sucrose in solution together with an odor that
has previously been conditioned with this taste so that it smells sweet. Despite this, enhancement is evident
only in a group that treated elements synthetically during their association. (Adapted from Prescott, J. et al.,
Chem. Senses, 29, 331–334, 2004. With permission.)
odor–taste combination under conditions in which they had been trained to respond to the combina-
tion in explicitly synthetic or analytical ways. Moreover, the fact that the training used different
odor/taste combinations than were later used in the conditioning procedure suggests that an atten-
tional approach (analytical or synthetic) was being induced in the subjects during training that was
then applied to new odor/taste combinations during conditioning.
The findings from this study have important theoretical implications, in that they are clearly
consistent with configural accounts of perceptual odor–taste learning and flavor representation
(Stevenson and Boakes 2004; Stevenson et al. 1998). Under conditions where attention is directed
toward individual stimulus elements during conditioning, the separate representation of these ele-
ments may be incompatible with learning of a configural representation. This explanation is sup-
ported by the demonstration that an analytical approach also acted to inhibit a sweet-smelling odor’s
ability to enhance a sweet taste when the odor/taste combination was evaluated in solution after
repeated pairing (Prescott et al. 2004; see Figure 35.3b). In other words, an analytical attentional
strategy can be shown to interfere with either the development of a flavor configuration resulting
from associative learning, or the subsequent ability of this configuration to combine with a physi-
cally present tastant.
learning also eliminates the transfer of hedonic properties from the taste to the odor (Prescott and
Murphy 2009), suggesting that the formation of an odor–taste configuration that includes hedonic
values has been inhibited. Recent evidence also suggests that, even after learning, the hedonic
value of a flavor can be altered by the extent to which an analytical approach is taken to the fla-
vor. Comparisons between acceptability ratings alone and the same ratings followed by a series
of analytical ratings of flavor sensory qualities found a reduction of liking in the latter condition
(Prescott et al. 2011), suggesting that analytical approaches are inhibitory to liking even once that
liking has been established. The explanation for this effect is that, as with the similar effects on
perceptual learning reported by Prescott et al. (2004), an analytical attentional strategy is induced
by knowledge that the flavor is to be perceptually analyzed, reducing the configuration process
responsible for the transfer of hedonic properties. This finding joins a number of others indicating
that analytical cognitions are antagonistic toward the expression of likes and dislikes (Nordgren
and Dijksterhuis 2008).
An additional consequence of evaluative conditioning (EC) has been demonstrated in studies that have measured the
behavioral consequences of pairing an odor with a tastant that may be valued metabolically. A
considerable body of animal (Myers and Sclafani 2001) and human (Kern et al. 1993; Prescott
2004a; Yeomans et al. 2008b) literature exists showing that odor–taste pairing leading to learned
preferences is highly effective when a tastant that provides valued nutrients is ingested. This process
can be shown to be independent of the hedonic value of the tastant—for example, by comparing
conditioning of odors using sweet tastants that provide energy (e.g., sucrose) with those that do not
(Mobini et al. 2007). As with EC generally, this form of postingestive conditioning is sensitive to
motivational state and is maximized when conditioning and evaluation of learning take place under
relative hunger (Yeomans and Mobini 2006). It has also been recently demonstrated that a novel
flavor paired with ingested monosodium glutamate (MSG) not only increased in rated liking, even
when tested without added MSG, but also, relative to a non-MSG control, produced behavioral
changes including increases in ad libitum food intake and rated hunger after an initial tasting of the
flavor (Yeomans et al. 2008).
Finally, one interesting behavioral consequence of odor–taste perceptual integration has been
a demonstration that a sweet-smelling odor significantly increased pain tolerance relative to a no-
odor control (Prescott and Wilkie 2007). Given that the effect was not seen with an equally pleasant
but non-sweet-smelling odor, the conclusion drawn was that the odor sweetness was acting in an
equivalent manner to sweet tastes, which have been shown to have this same effect on pain (Blass
and Hoffmeyer 1991). Although the presumption is that such effects are also the result of the same
learned integration that produces the sweet smell and the ability to modify taste perceptions, the
crucial demonstration of this has yet to be carried out. It does suggest, however, that the process of
elicitation of a flavor representation by an odor may have broad behavioral as well as perceptual and
hedonic consequences.
There have been some recent attempts to explore the practical implications of odor–taste learn-
ing, opening opportunities to perhaps exploit its consequences. It has been shown, for example, that
the enhancement of tastes by congruent odors seen in model systems (i.e., solutions) also occurs in
foods, with bitter- and sweet-smelling odors enhancing their respective congruent tastes in milk
drinks (Labbe et al. 2006). Also consistent with data derived from model systems was the failure, in
these studies, of a sweet-smelling odor to enhance the sweetness of an unfamiliar beverage. Most
recently, an examination of the potential for odors from a range of salty foods to enhance saltiness
in solution (Lawrence et al. 2009) raised the possibility that such odors could be used to effectively
reduce the sodium content of foods without the loss of acceptability that typically occurs (Girgis
et al. 2003). Similarly, the finding that odors can take on fatlike properties after associative
pairing with fats (Sundquist et al. 2006) might allow odors to partially substitute for actual fat
content in foods. These studies therefore point to an exciting prospect, in which research aimed at
understanding multisensory processes in flavor perception may lead to applications that ultimately
have important public health consequences.
714 The Neural Bases of Multisensory Processes
REFERENCES
Baeyens, F., P. Eelen, O. Van den Bergh, and G. Crombez. 1990. Flavor–flavor and color–flavor conditioning in
humans. Learning and Motivation 21: 434–445.
Bingham, A. F., G. G. Birch, C. De Graaf, J. M. Behan, and K. D. Perring. 1990. Sensory studies with sucrose–
maltol mixtures. Chemical Senses 15(4): 447–456.
Blass, E. M., and L. B. Hoffmeyer. 1991. Sucrose as an analgesic for newborn infants. Pediatrics 87(2):
215–218.
Brillat-Savarin, J.-A. 1825. The physiology of taste, 1994 ed. London: Penguin Books.
Burdach, K. J., J. H. A. Kroeze, and E. P. Koster. 1984. Nasal, retronasal, and gustatory perception: An experi-
mental comparison. Perception & Psychophysics 36(3): 205–208.
Calvert, G. A., M. J. Brammer, E. T. Bullmore, R. Campbell, S. D. Iversen, and A. S. David. 1999. Response
amplification in sensory-specific cortices during crossmodal binding. NeuroReport 10: 2619–2623.
Calvert, G. A., M. J. Brammer, and S. D. Iversen. 1998. Crossmodal identification. Trends in Cognitive Sciences
2(7): 247–253.
Cardello, A. V. 1981. Comparison of taste qualities elicited by tactile, electrical and chemical stimulation of
single human taste papillae. Perception & Psychophysics 29: 163–169.
Clark, C. C., and H. T. Lawless. 1994. Limiting response alternatives in time–intensity scaling: An examination
of the halo-dumping effect. Chemical Senses 19(6): 583–594.
Cruz, A., and B. G. Green. 2000. Thermal stimulation of taste. Nature 403: 889–892.
Dalton, P., N. Doolittle, H. Nagata, and P. A. S. Breslin. 2000. The merging of the senses: Integration of sub-
threshold taste and smell. Nature Neuroscience 3: 431–432.
De Houwer, J., S. Thomas, and F. Baeyens. 2001. Associative learning of likes and dislikes: A review of
25 years of research on human evaluative conditioning. Psychological Bulletin 127(6): 853–869.
Delwiche, J. F., and A. L. Heffelfinger. 2005. Cross-modal additivity of taste and smell. Journal of Sensory
Studies 20: 512–525.
Delwiche, J. F., M. L. Lera, and P. A. S. Breslin. 2000. Selective removal of a target stimulus localized by taste
in humans. Chemical Senses 25: 181–187.
Djordjevic, J., R. J. Zatorre, and M. Jones-Gotman. 2004. Effects of perceived and imagined odors on taste
detection. Chemical Senses 29: 199–208.
Dravnieks, A. 1985. Atlas of odor character profiles. Philadelphia, PA: American Society for Testing and
Materials.
Driver, J., and C. Spence. 2000. Multisensory perception: Beyond modularity and convergence. Current Biology
10: R731–R735.
Frank, R. A. 2003. Response context affects judgments of flavor components in foods and beverages. Food
Quality and Preference 14: 139–145.
Frank, R. A., and J. Byram. 1988. Taste-smell interactions are tastant and odorant dependent. Chemical Senses
13(3): 445–455.
Frank, R. A., K. Ducheny, and S. J. S. Mize. 1989. Strawberry odor, but not red color, enhances the sweetness
of sucrose solutions. Chemical Senses 14(3): 371–377.
Frank, R. A., N. J. van der Klaauw, and H. N. J. Schifferstein. 1993. Both perceptual and conceptual factors
influence taste–odor and taste–taste interactions. Perception & Psychophysics 54(3): 343–354.
Gibson, J. J. 1966. The senses considered as perceptual systems. Boston: Houghton Mifflin Company.
Girgis, S., B. Neal, J. Prescott et al. 2003. A one-quarter reduction in the salt content of bread can be made
without detection. European Journal of Clinical Nutrition 57(4): 616–620.
Green, B. G. 2003. Studying taste as a cutaneous sense. Food Quality and Preference 14: 99–109.
Green, B. G., and J. E. Hayes. 2003. Capsaicin as a probe of the relationship between bitter taste and chemes-
thesis. Physiology & Behavior 79: 811–821.
Kern, D. L., L. McPhee, J. Fisher, S. Johnson, and L. L. Birch. 1993. The postingestive consequences of fat
condition preferences for flavors associated with high dietary fat. Physiology & Behavior 54: 71–76.
Kobayakawa, T., H. Toda, and N. Gotow. 2009. Synchronicity judgement of gustation and olfaction. Paper
presented at the Association for Chemoreception Sciences, Sarasota, FL.
Labbe, D., L. Damevin, C. Vaccher, C. Morgenegg, and N. Martin. 2006. Modulation of perceived taste by
olfaction in familiar and unfamiliar beverages. Food Quality and Preference 17: 582–589.
Laing, D. G., and M. E. Willcox. 1983. Perception of components in binary odour mixtures. Chemical Senses
7(3–4): 249–264.
Lawrence, G., C. Salles, C. Septier, J. Busch, and T. Thomas-Danguin. 2009. Odour–taste interactions: A way
to enhance saltiness in low-salt content solutions. Food Quality and Preference 20(3): 241–248.
Multimodal Chemosensory Interactions and Perception of Flavor 715
Le Berre, E., T. Thomas-Danguin, N. Beno, G. Coureaud, P. Etievant, and J. Prescott. 2008. Perceptual process-
ing strategy and exposure influence the perception of odor mixtures. Chemical Senses 33: 193–199.
Levey, A. B., and I. Martin. 1975. Classical conditioning of human ‘evaluative’ responses. Behaviour
Research and Therapy 13: 221–226.
Lim, J., and B. G. Green. 2008. Tactile interaction with taste localization: Influence of gustatory quality and
intensity. Chemical Senses 33: 137–143.
Martino, G., and L. E. Marks. 2001. Synesthesia: Strong and weak. Current Directions in Psychological
Science 10(2): 61–65.
McBurney, D. H. 1986. Taste, smell, and flavor terminology: Taking the confusion out of fusion. In
Clinical measurement of taste and smell, ed. H. L. Meiselman and R. S. Rivkin, 117–125. New York:
Macmillan.
Melara, R. D., L. E. Marks, and K. E. Lesko. 1992. Optional processes in similarity judgments. Perception &
Psychophysics 51(2): 123–133.
Mobini, S., L. C. Chambers, and M. R. Yeomans. 2007. Interactive effects of flavour–flavour and flavour–con-
sequence learning in development of liking for sweet-paired flavours in humans. Appetite 48: 20–28.
Myers, K. P., and A. Sclafani. 2001. Conditioned enhancement of flavor evaluation reinforced by intragastric
glucose: I. Intake acceptance and preference analysis. Physiology & Behavior 74: 481–493.
Nguyen, D. H., D. Valentin, M. H. Ly, C. Chrea, and F. Sauvageot. 2002. When does smell enhance taste? Effect
of culture and odorant/tastant relationship. Paper presented at the European Chemoreception Research
Organisation conference, Erlangen, Germany.
Nordgren, L. F., and A. P. Dijksterhuis. 2008. The devil is in the deliberation: Thinking too much reduces pref-
erence consistency. Journal of Consumer Research 36: 39–46.
Pangborn, R. M. 1970. Individual variation in affective responses to taste stimuli. Psychonomic Science 21(2):
125–126.
Pfeiffer, J. C., T. A. Hollowood, J. Hort, and A. J. Taylor. 2005. Temporal synchrony and integration of sub-
threshold taste and smell signals. Chemical Senses 30: 539–545.
Pierce, J., and B. Halpern. 1996. Orthonasal and retronasal odorant identification based upon vapor phase input
from common substances. Chemical Senses 21(5): 529–543.
Prescott, J. 1998. Comparisons of taste perceptions and preferences of Japanese and Australian consum-
ers: Overview and implications for cross-cultural sensory research. Food Quality and Preference 9(6):
393–402.
Prescott, J. 1999. Flavour as a psychological construct: Implications for perceiving and measuring the sensory
qualities of foods. Food Quality and Preference 10: 349–356.
Prescott, J. 2004a. Effects of added glutamate on liking for novel food flavors. Appetite 42(2): 143–150.
Prescott, J. 2004b. Psychological processes in flavour perception. In Flavour perception, ed. A. J. Taylor and
D. Roberts, 256–277. London: Blackwell Publishing.
Prescott, J., V. Johnstone, and J. Francis. 2004. Odor/taste interactions: Effects of different attentional strategies
during exposure. Chemical Senses 29: 331–340.
Prescott, J., and S. Murphy. 2009. Inhibition of evaluative and perceptual odour–taste learning by attention to
the stimulus elements. Quarterly Journal of Experimental Psychology 62: 2133–2140.
Prescott, J., K.-O. Kim, and S. M. Lee. 2011. Analytic approaches to evaluation modify hedonic responses.
Food Quality and Preference 22: 391–393.
Prescott, J., and J. Wilkie. 2007. Pain tolerance selectively increased by a sweet-smelling odor. Psychological
Science 18(4): 308–311.
Rozin, P. 1982. “Taste–smell confusions” and the duality of the olfactory sense. Perception & Psychophysics
31(4): 397–401.
Sakai, N., T. Kobayakawa, N. Gotow, S. Saito, and S. Imada. 2001. Enhancement of sweetness ratings of aspar-
tame by a vanilla odor presented either by orthonasal or retronasal routes. Perceptual and Motor Skills
92: 1002–1008.
Schifferstein, H. N. J., and P. W. J. Verlegh. 1996. The role of congruency and pleasantness in odor-induced
taste enhancement. Acta Psychologica 94: 87–105.
Small, D. M., J. C. Gerber, Y. E. Mak, and T. Hummel. 2005. Differential neural responses evoked by orthona-
sal versus retronasal odorant perception in humans. Neuron 47: 593–605.
Spence, C., and S. Squire. 2003. Multisensory integration: Maintaining the perception of synchrony. Current
Biology 13: R519–R521.
Spence, C., and C. Parise. 2010. Prior-entry: A review. Consciousness and Cognition 19: 364–379.
Stein, B. E., W. S. Huneycutt, and M. A. Meredith. 1988. Neurons and behavior: The same rules of multisen-
sory integration apply. Brain Research 448: 355–358.
Steiner, J. E., D. Glaser, M. E. Hawilo, and K. C. Berridge. 2001. Comparative expression of hedonic impact:
Affective reactions to taste by human infants and other primates. Neuroscience & Biobehavioral Reviews
25: 53–74.
Stevenson, R. J., and R. A. Boakes. 2004. Sweet and sour smells: The acquisition of taste-like qualities by odors.
In Handbook of multisensory processes, ed. G. Calvert, C. B. Spence, and B. Stein, 69–83. Cambridge,
MA: MIT Press.
Stevenson, R. J., R. A. Boakes, and J. Prescott. 1998. Changes in odor sweetness resulting from implicit learn-
ing of a simultaneous odor–sweetness association: An example of learned synesthesia. Learning and
Motivation 29: 113–132.
Stevenson, R. J., and T. I. Case. 2003. Preexposure to the stimulus elements, but not training to detect them,
retards human odour–taste learning. Behavioural Processes 61: 13–25.
Stevenson, R. J., J. Prescott, and R. A. Boakes. 1995. The acquisition of taste properties by odors. Learning &
Motivation 26: 1–23.
Stevenson, R. J., J. Prescott, and R. A. Boakes. 1999. Confusing tastes and smells: How odors can influence the
perception of sweet and sour tastes. Chemical Senses 24: 627–635.
Sumby, W. H., and I. Pollack. 1954. Visual contribution to speech intelligibility in noise. Journal of the Acoustical
Society of America 26: 212–215.
Sundquist, N., R. J. Stevenson, and I. R. J. Bishop. 2006. Can odours acquire fat-like properties? Appetite 47:
91–99.
Todrank, J., and L. M. Bartoshuk. 1991. A taste illusion: Taste sensation localised by touch. Physiology &
Behavior 50: 1027–1031.
Treisman, A. 1998. Feature binding, attention and object perception. Philosophical Transactions of the Royal
Society, London B 353: 1295–1306.
Treisman, A. 2006. How the deployment of attention determines what we see. Visual Cognition 14: 411–443.
van der Klaauw, N. J., and R. A. Frank. 1996. Scaling component intensities of complex stimuli: The influence
of response alternatives. Environment International 22(1): 21–31.
Voirol, E., and N. Daget. 1986. Comparative study of nasal and retronasal olfactory perception. Lebensmittel-
Wissenschaft und-Technologie 19: 316–319.
von Békésy, G. 1964. Olfactory analogue to directional hearing. Journal of Applied Physiology 19: 369–373.
White, T. L., and J. Prescott. 2007. Chemosensory cross-modal Stroop effects: Congruent odors facilitate taste
identification. Chemical Senses 32: 337–341.
Whitehead, M. C., C. S. Beeman, and B. A. Kinsella. 1985. Distribution of taste and general sensory nerve end-
ings in fungiform papillae of the hamster. American Journal of Anatomy 173: 185–201.
Yeomans, M. R., N. Gould, S. Mobini, and J. Prescott. 2008a. Acquired flavor acceptance and intake facilitated
by monosodium glutamate in humans. Physiology & Behavior 93: 958–966.
Yeomans, M. R., M. Leitch, N. J. Gould, and S. Mobini. 2008b. Differential hedonic, sensory and behav-
ioral changes associated with flavor–nutrient and flavor–flavor learning. Physiology & Behavior 93:
798–806.
Yeomans, M. R., and S. Mobini. 2006. Hunger alters the expression of acquired hedonic but not sensory quali-
ties of food-paired odors in humans. Journal of Experimental Psychology: Animal Behavior Processes
32(4): 460–466.
Yeomans, M. R., S. Mobini, T. D. Elliman, H. C. Walker, and R. J. Stevenson. 2006. Hedonic and sensory char-
acteristics of odors conditioned by pairing with tastants in humans. Journal of Experimental Psychology:
Animal Behavior Processes 32(3): 215–228.
Yeomans, M. R., J. Prescott, and N. G. Gould. 2009. Individual differences in responses to tastes determine
hedonic and perceptual changes to odours following odour/taste conditioning. Quarterly Journal of
Experimental Psychology 62(8): 1648–1664.
Zellner, D. A., P. Rozin, M. Aron, and C. Kulish. 1983. Conditioned enhancement of human’s liking for flavor
by pairing with sweetness. Learning and Motivation 14: 338–350.
36 A Proposed Model of a Flavor Modality
Dana M. Small and Barry G. Green
CONTENTS
36.1 Introduction........................................................................................................................... 717
36.2 Flavor Is Taste, Touch, and Smell.......................................................................................... 717
36.3 Oral Referral.......................................................................................................................... 720
36.3.1 Olfactory Referral...................................................................................................... 720
36.3.2 Taste Referral: Localization of Taste by Touch......................................................... 724
36.3.3 Shared Qualities between Olfaction and Taste.......................................................... 725
36.4 The Proposed Model.............................................................................................................. 725
36.4.1 Odor Objects.............................................................................................................. 725
36.4.1.1 Synthesis..................................................................................................... 725
36.4.1.2 Experience.................................................................................................. 727
36.4.2 Flavor Objects............................................................................................................ 727
36.4.3 Encoding of Flavor Objects....................................................................................... 728
36.5 Neural Mechanisms............................................................................................................... 729
36.5.1 The Binding Mechanism........................................................................................... 729
36.5.2 Neural Correlates of Flavor Object............................................................................ 731
36.6 Alternative Models................................................................................................................ 733
36.7 Summary............................................................................................................................... 733
References....................................................................................................................................... 733
36.1 INTRODUCTION
The perception of flavor occurs when a food or drink enters the mouth. Although the resulting per-
ception depends on inputs from multiple sensory modalities, it is experienced as a unitary percept of
a food or beverage. In this chapter the psychophysical characteristics and neural substrates of flavor
perception are reviewed within the context of a proposed model of a flavor modality in which the
diverse sensory inputs from the mouth and nose become integrated. More specifically, it is argued
that a binding mechanism in the somatomotor mouth area of the cortex brings taste, touch, and
smell together into a common spatial register and facilitates their perception as a coherent “flavor
object.” We propose that the neural representation of the flavor object is a distributed pattern of
activity across the insula, overlying operculum (including the somatomotor mouth region), orbitofrontal,
piriform, and anterior cingulate cortex.
FIGURE 36.1 Orthonasal vs. retronasal olfaction. Schematic depiction of two routes of olfactory percep-
tion: orthonasal and retronasal. Odors sensed orthonasally enter the body through the nose (nares) and travel
directly to olfactory epithelium in nasal cavity. Odors sensed retronasally enter the mouth during eating and
drinking. Volatiles are released from food or drink and subsequently pass through the nasopharynx at back
of oral cavity to enter nasal cavity and reach olfactory epithelium. (From Kringelbach, M.L., Berridge, K.C.,
eds., Oxford handbook: Pleasures of the brain, 2009. With permission from Oxford University Press, Inc.)
Chandrashekar et al. 2006), and perhaps fat (Chale-Rush et al. 2007; Gilbertson 1998; Gilbertson
et al. 1997). Each of the five major taste qualities serves to signal a specific class of nutrients or
potential threats: sweet signals energy in the form of calories, salty signals electrolytes, sour signals
low pH, savory (umami) signals proteins, and since most poisonous substances are bitter, bitter-
ness signals potential toxins (Scott and Plata-Salaman 1991). Thus, the sense of taste helps identify
physiologically beneficial nutrients and potentially harmful stimuli. Because taste receptors lie side
by side in the oral cavity with thermoreceptors, mechanoreceptors, and nociceptors, everything that
is tasted induces tactile and thermal sensations, and sometimes also chemesthetic sensations (e.g.,
burning and stinging; Green 2003; Simon et al. 2008). In addition, some taste stimuli can them-
selves evoke somatosensory sensations. For example, in moderate to high concentrations, salts and
acids can provoke chemesthetic sensations of burning, stinging, or pricking (Green and Gelhard
1989; Green and Lawless 1991). Consequently, even presumably “pure taste” stimuli can have an
oral somatosensory component.
The taste signal itself is carried from taste receptor cells in the oral cavity by cranial nerves
VII, IX, and X to the nucleus of the solitary tract in the brainstem, where taste inputs are joined
by oral somatosensory projections from the spinal trigeminal nucleus. The precise locations of the
trigeminal projections vary across species, but there is evidence (including in humans) of overlap
with gustatory areas (Whitehead 1990; Whitehead and Frank 1983), and of tracts that run within the
nucleus of the solitary tract that may facilitate cross-modal integration (Travers 1988; Figure 36.2).
Somatosensory input also reaches the nucleus of the solitary tract via the glossopharyngeal nerve,
which contains taste-sensitive, as well as mechano- and thermosensitive neurons (Bradley et al.
1992). Overlapping representation of gustatory and somatosensory information also occurs in the
FIGURE 36.2 Oral sensory pathways. A glass brain schematic depiction of taste (black circles), somatosen-
sory (white circles), and olfactory (gray circles) pathways. Anatomical locations are only approximate and
connectivity is not exhaustive. Information from taste receptors on tongue is conveyed via the chorda tympani
(VII), glossopharyngeal nerve (IX), and vagus nerve (X) to rostral nucleus of the solitary tract (NST), which
then projects to thalamus. From here, taste information projects to mid insula (MI) and anterior insula and
overlying frontal operculum (AI). AI also projects to ventral insula (VI), medial orbitofrontal cortex (MOFC),
and lateral orbitofrontal cortex (LOFC). Somatosensory input reaches NST via glossopharyngeal nerve (IX)
and trigeminal nerve (V), which then project to thalamus. Oral somatosensory information is then relayed to
opercular region of postcentral gyrus (PO). Olfactory information is conveyed via cranial nerve I to olfactory
bulb, which projects to primary olfactory cortex, including piriform cortex (piri). Piriform cortex, in turn,
projects to VI and orbitofrontal cortex. Anterior cingulate cortex (ACC) and amygdala (Amyg) are also
strongly interconnected with insula and orbital regions representing taste, smell, and oral somatosensation.
(From Kringelbach, M.L., Berridge, K.C., eds., Oxford handbook: Pleasures of the brain, 2009. With permis-
sion from Oxford University Press, Inc.)
thalamus (Pritchard et al. 1989) and at the cortical level (Cerf-Ducastel et al. 2001; Pritchard et al.
1986). For example, the primary gustatory cortex contains nearly as many somatosensory-specific
as taste-specific neurons, in addition to bimodal neurons responding to both somatosensory and
taste stimulation (Kadohisa et al. 2004; Plata-Salaman et al. 1992, 1996; Smith-Swintosky et al.
1991; Yamamoto et al. 1985). In sum, taste and oral somatosensation have distinct receptor mechanisms,
but their signals converge at virtually every level of the neuraxis, suggestive of extensive
interaction.
Although taste and oral somesthesis provide critical information about the physicochemical
nature of ingested stimuli, it is the olfactory component of food that is required for flavor identifica-
tion (Mozell et al. 1969). The acts of chewing and swallowing release volatile molecules into the oral
cavity, which during exhalation traverse the epipharynx (also referred to as the nasopharynx) and
stimulate receptors on the olfactory epithelium. This process is referred to as retronasal olfaction
(Figure 36.1), in contrast to orthonasal olfaction, which occurs during inhalation through the nose.
Both orthonasal and retronasal olfactory signals are carried via cranial nerve I to the olfactory bulb,
which projects to the anterior olfactory nucleus, the olfactory tubercle, the piriform cortex, several
amygdaloid subnuclei, and rostral entorhinal cortex and thalamus. These areas, in turn, project to
additional amygdala subnuclei, the entorhinal, insula, and orbitofrontal cortex (OFC) (de Olmos et
al. 1978; Price 1973; Turner et al. 1978; Figure 36.2). Thus, olfactory information is carried to the
brain by distinct pathways and does not converge with gustation and oral somatosensation until
higher-order cortical regions, such as the insula and the OFC.
In summary, the perception of flavor depends on multiple distinct inputs that interact at several
levels in the central nervous system. How these interactions act to “bind” the signals into coherent
perceptions of flavor is currently unknown. Here, we propose a model in which the somatomotor
mouth area orchestrates this binding via a process that results in referral of olfactory sensations to
the oral cavity. It is worth noting that flavor percepts can also be influenced by visual inputs (Koza
et al. 2005) and by beliefs and expectations (de Araujo et al. 2003), which are factors that represent
top-down modulation of flavor. However, these types of cognitive effects are outside the scope of
the present chapter.
FIGURE 36.3 Taste–odor confusion. This figure is a stylized representation of data reported in Figure 4
of Murphy and colleagues (1977) (rendered with permission from Dr. Claire Murphy) and represents first
experimental demonstration of taste–odor confusion. Graph depicts perceived taste magnitude of mixtures
of ethyl butyrate and saccharin sipped when nostrils were open versus taste magnitude of mixtures sipped
when nostrils were closed (open symbols). The parameter is concentration of odorant ethyl butyrate. Closed
circles represent judgments of stimuli that contained no ethyl butyrate, only saccharin. Dashed line is the line
of identity.
referral could be attributed to the activation of taste cells by odors, because the chemicals that pro-
duce taste-like smells (e.g., strawberry smells sweet) do not taste sweet when sampled in the mouth
with the nares occluded (Murphy and Cain 1980; Sakai et al. 2001; Schifferstein and Verlegh 1996;
Stevenson et al. 2000b). Thus, the sweet quality of an odor occurs in the absence of the activation of
taste receptor cells, but when sensed retronasally may nevertheless be attributed to taste.
Indeed, it has been argued that orthonasal and retronasal olfaction represent two distinct modali-
ties. Inspired by a comment made by a friend that “I really love the taste (of Limburger cheese) if
only I can get it by my nose,” Rozin (1982) first proposed that olfaction is a dual-sense modality,
with one component (orthonasal olfaction) specialized for sensing objects in the world and the other
(retronasal olfaction) specialized for sensing objects in the mouth. Building upon Gibson’s proposal
that “tasting” and “smelling” are distinct perceptual systems that cut across receptor classes, Rozin
suggested that “the same olfactory stimulation may be perceived and evaluated in two qualitatively
different ways, depending on whether it was referred to the mouth or the external world.” In support
of this view, he found that subjects frequently reported disliking the smell, but liking the taste, of
certain foods (e.g., fish, eggs, and cheese). He also demonstrated that subjects had great difficulty
correctly identifying flavor stimuli that had first been learned via the orthonasal route. These data
are therefore consistent with the notion that olfactory stimuli arising from the mouth have different
sensory–perceptual properties than those originating in the external world. Rozin suggested that
these perceptual processes might be achieved by differential gating of inputs triggered by the pres-
ence of a palpable object in the mouth, or by the direction of movement of the odor across the olfac-
tory mucosa. Alternatively, he posited that it may be that odor information is not gated but rather
is combined with available oral inputs into an emergent percept in which the olfactory component
loses its separate identity.
After the publication of Rozin’s hypothesis, several investigators argued that the differences
between orthonasal and retronasal olfaction were primarily quantitative rather than qualitative.
This argument was based on evidence that retronasal stimulation by the same physical stimulus
tends to result in lower perceived intensity than orthonasal stimulation (Pierce and Halpern 1996;
Voirol and Daget 1986). Although it is clear that quantitative differences are present, there is
also more recent evidence supporting the duality hypothesis (Bender et al. 2009; Heilmann and
Hummel 2001; Hummel et al. 2006; Koza et al. 2005; Landis et al. 2005; Small et al. 2005; Sun and
Halpern 2005; Welge-Lussen et al. 2009). Of particular note, Hummel and his colleagues devised a
method for delivering odorants in the vapor phase via either the ortho- or retronasal routes (Figure
36.4). Critically, the method allows assessment of retronasal olfaction without stimulation of the
oral cavity (Heilmann and Hummel 2004). Two tubes are inserted into the subject’s nose under
endoscopic guidance so that one tube ends at the external nares (to achieve orthonasal delivery)
and the other tube at the epipharynx (to achieve retronasal delivery). The tubes are, in turn, con-
nected to a computer-controlled olfactometer that delivers pulses of odorant embedded in an odor-
less airstream. Using an electronic nose to measure the stimulus in the airspace below the olfactory
epithelium, the authors demonstrated that the maximum concentration and duration of the signal
was equivalent after delivery by either route (Hummel et al. 2006). Despite similar signals and
the absence of oral stimulation, the olfactory localization illusion was, in part, maintained (Figure
36.5). Subjects were more likely to report that the retronasal odors came from the back of the throat,
whereas orthonasal odors appeared to come from the nose. The mechanism(s) behind the olfac-
tory referral illusion remain unknown. However, this study ruled out intensity differences as a cue,
because the odors were titrated to equate perceived intensity. The finding also suggests that oral
stimulation is not required for at least some referral to occur, since the procedure involved neither
a gustatory nor somatosensory stimulus. However, in a subsequent investigation in which subjects
were asked to indicate whether the odor was delivered orthonasally or retronasally (rather than localize it
to the nose or mouth), trigeminal (chemesthetic) stimulation was found to be an important factor for
making the discrimination (Frasnelli et al. 2008). More work is therefore needed to determine the
degree to which odors can be referred to the mouth based on the direction of flow of the olfactory
stimulus.
FIGURE 36.4 (See color insert.) An MRI image showing tubing placement using methods described by
Heilmann and Hummel (2004). This sagittal brain section reveals placement of nasal cannulae at external
nares to achieve orthonasal delivery, and at nasopharynx to achieve retronasal delivery. Tubes appear white
and odor delivery is represented by small white dots. (Reproduced from Small, D.M. et al., Neuron, 47,
593–605, 2005. With permission from Elsevier.)
FIGURE 36.5 Odorant localization. Preliminary data from 20 subjects showing that orthonasal odor is
perceived as coming from front of nasal cavity and retronasal odor as coming from back of nasal/oral cavity
(throat). This perception occurred despite constant airflow through both routes at all times and no change
in air pressure or flow rate during switching between odor and pure air. Odorants were one pure olfactant
[hydrogen sulfide (H2S)] and one olfactory/chemesthetic stimulus with a significant trigeminal component
[carbon dioxide (CO2)]. Results represent mean ratings from 20 subjects. Error bars represent standard error
of the mean. Positive numbers indicate that subjects perceived odor at back of nasal/oral cavity (throat area),
whereas negative numbers indicate subjects perceived odor at front of the nose; the higher the numbers, the
more certain were subjects about their decision (range of scale from −50 to 0, and from 0 to 50). Data were
obtained in two sessions separated by at least 1 day. Stimuli of 200-ms duration were presented using air-dilution olfactometry (birhinal olfactometer OM6b; Burghart Instruments, Wedel, Germany). Thus, stimulation was the same as that used in the fMRI study (t-test: *p < .05; ***p < .001). (Reproduced from Small, D.M. et al.,
Neuron, 47, 593–605, 2005. With permission from Elsevier.)
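The statistical summary in the caption above (mean localization ratings tested against zero on a −50 to +50 scale) amounts to a one-sample t-test per condition. A minimal sketch in Python, using invented illustrative ratings rather than the study's data:

```python
import math

def one_sample_t(ratings, mu=0.0):
    """One-sample t-test of the mean rating against mu (0 = no localization bias)."""
    n = len(ratings)
    mean = sum(ratings) / n
    var = sum((r - mean) ** 2 for r in ratings) / (n - 1)  # sample variance
    sem = math.sqrt(var / n)                               # standard error of the mean
    t = (mean - mu) / sem
    return mean, sem, t

# Hypothetical retronasal H2S ratings on the -50..+50 scale
# (positive = odor perceived at the throat, negative = at the nose).
ratings = [18, 22, 9, 15, 27, 12, 20, 16]
mean, sem, t = one_sample_t(ratings)
print(f"mean={mean:.1f}, SEM={sem:.2f}, t={t:.2f}")
```

A positive mean with a large t statistic would indicate reliable referral toward the throat, as reported for retronasal stimulation.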
A possible mechanism by which such referral might occur is the direction of odorant flow across
the olfactory epithelium. Indeed, since the data supplied by the electronic nose indicated that the
physical stimulus arriving at the epithelium can be identical (at least for the measured parameters),
the primary difference between the routes in Hummel’s experiments was the direction of odorant
flow. Hummel and colleagues therefore suggested there may be a distinct organization of olfactory
receptor neurons in the back versus the more anterior portions of the nasal cavity. This hypothesis
is consistent with Mozell’s chromatographic model of olfaction, which postulates that the pattern of
odorant binding to receptors can lead to different odor perceptions (Mozell 1970). Further support
for the chromatographic model comes from a study from Sobel’s laboratory, which showed that subtle differences in airflow patterns between the left and right nostrils can lead to different perceptual experiences (Sobel et al. 1999).
Although neither taste nor oral somatosensation appears to be required for at least some degree
of referral to occur (Heilmann and Hummel 2004; Hummel et al. 2006; Small et al. 2005), fur-
ther study is needed to determine if stimulation of these modalities may nevertheless contribute to
referral.
In summary, the olfactory localization illusion, coupled with the fact that flavor identity is con-
veyed primarily by olfaction, leads to the perception that flavors come from the mouth. Despite
the fact that this illusion has a profound impact on flavor perception, the mechanisms that produce
it remain unknown. Possible mechanisms include spatiotemporal differences in odorant binding
across the olfactory epithelium during retro- versus orthonasal stimulation, and/or capture by tactile
and/or gustatory stimulation.
724 The Neural Bases of Multisensory Processes
FIGURE 36.6 Taste localization by touch. Stimulus configuration used to measure referral of taste sensations to
site of tactile stimulation. On each trial, experimenter touched three saturated cotton swabs simultaneously to ante-
rior edge of tongue, producing identical tactile stimulation at each site. In veridical condition (top), taste stimulus
was delivered only on middle swab, with deionized water on two outer swabs. In referral condition, taste stimulus
was delivered only on two outer swabs, with deionized water on middle swab. In both conditions, subjects’ task
was to ignore any tastes on outer swabs and to rate intensity of taste perceived at middle swab. Significant taste
sensations were reported at middle swab in referral condition for all four taste stimuli tested (sucrose, NaCl, citric
acid, and quinine). (From Green, B.G., Food Qual. Prefer., 14, 99–109, 2002. With permission.)
regions of high and low taste bud density. When the path began in a region of low taste bud density,
taste sensations started out weak. As the path intersected regions of greater taste bud density, taste
sensations became stronger. However, when the path returned to low density regions the sensation
remained nearly as intense as it was in the high density region. The authors interpreted this result
to mean that the taste sensation was “captured” by the tactile stimulation of the swab and dragged
into the insensitive area. More recent work has corroborated this interpretation by finding that tastes
can be localized to a spatially adjacent tactile stimulus (Green 2003; Lim and Green 2008; Figure
36.6).
Although it is also true that tastes can be localized independently from touch (Delwiche et al.
2000; Lim and Green 2008; Shikata et al. 2000), we believe that referral of taste to touch helps to
create a coherent “perceptive field” onto which odors are also referred, thus providing the founda-
tion for a unitary flavor percept.
36.4.1.1 Synthesis
With regard to synthesis, Wilson and Stevenson (2003) propose that odor elements combine to
produce novel odor qualities within which the odor elements are no longer discernible, and thus
that olfaction is a synthetic modality akin to color vision. Recognizing that these perceptual fea-
tures of olfaction are at odds with the analytical organization of the peripheral olfactory system,
Wilson and Stevenson argued that an experience-dependent synthesis of odor information from the
periphery occurs (Haberly 2001) that creates an emergent neural code in the cortex. Specifically,
they proposed that neurons in anterior piriform cortex receive signals about odorant features from
the olfactory bulb (analytical elements) and initially function as coincident feature detectors (Figure
36.7). The response properties of the cortical neurons then rapidly shift as stimulation continues,
resulting in an experience- and odorant-dependent neural signature within an ensemble of neurons,
the “odor object.” In support of this view, recent work from Wilson’s laboratory examined neural
and perceptual responses to a set of odorant mixture “morphs”—odor mixtures with one or more
components of a 10-component (stock) mixture either removed or replaced (Barnes et al. 2008).
Electrophysiological recordings from the rodent brain showed that the neural ensemble activity
in the piriform cortex, but not in the olfactory bulb, remained correlated when one of the compo-
nents was missing, resulting in rats being unable to discriminate the nine-element mixture from the
stock mixture. However, when a component was replaced, the piriform ensemble activity decor-
related and discrimination was possible. This suggests that neural ensembles in rodent piriform
cortex code odor quality and perform pattern completion to support perceptual stability of odor
objects. Similarly, in humans, Gottfried and colleagues used functional magnetic resonance imag-
ing (fMRI) to demonstrate a double dissociation of odor coding in the piriform cortex, with the
posterior piriform sensitive to the physicochemical features of odors (i.e., alcohol vs. aldehyde) and not the odor quality (e.g., vegetable vs. fruit), and the anterior piriform sensitive to odor quality and not physicochemical features (Gottfried et al. 2006b). This result indicates that it is the odor object,
and not the physical stimulus, that is represented past the initial cortical relay. Since it is likely that
conscious perception of odors in humans requires the OFC (Li et al. 2008), it is reasonable to con-
clude that olfactory perceptions are based on odor objects.
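The ensemble analysis by Barnes et al. (2008) described above rests on correlating population response vectors across stimuli: a pattern-completing cortical ensemble stays correlated when a mixture component is missing, but decorrelates when one is replaced. A minimal sketch of that logic, with hypothetical firing rates standing in for the recorded ensembles:

```python
import math

def pearson(x, y):
    """Pearson correlation between two population response vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical firing rates of a 6-neuron ensemble to a 10-component stock mixture.
stock = [5.0, 2.0, 8.0, 1.0, 6.0, 3.0]
# Removing one component barely perturbs a pattern-completing ensemble...
missing = [4.8, 2.1, 7.9, 1.2, 5.8, 3.1]
# ...whereas replacing a component decorrelates the pattern.
replaced = [1.0, 7.0, 2.0, 6.0, 1.5, 8.0]

print(pearson(stock, missing))   # high: mixture treated as the same odor object
print(pearson(stock, replaced))  # low: discriminated as a different odor
```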
FIGURE 36.7 (See color insert.) Synthetic processing in anterior piriform cortex. This figure depicts model
of olfactory processing proposed by Wilson and Stevenson. Recent olfactory sensory physiology is consistent
with a view of olfactory bulb mitral cells serving a largely feature-detection role in odor processing and
neurons in anterior piriform cortex (aPCX) serving as synthetic processors, capable of learning unique com-
binations of feature input associated with specific odors. (a) In response to a novel odor, neurons of piriform
cortex function largely as coincidence detectors for coactive feature input from mitral and tufted (M/T) cells
[color-coded for type of feature input they receive from olfactory receptor neurons (ORN)]. As coincidence
detectors, they might not be efficient at discriminating different odors within their receptive fields. (b) After
rapid perceptual learning and plasticity of association and/or afferent synapses, single neurons of piriform
cortex respond to odors as a whole, which enables enhanced discrimination between odors within their recep-
tive fields and allows maintained responsiveness to partially degraded inputs. Odorants in this example are
isoamyl acetate (AA) and ethyl pentanoate (E7), although the model also applies to mixtures of multiple
odorants. (Figure and caption are reproduced from Wilson, D.A., and Stevenson, R.J., Trends Neurosci., 26,
243–247, 2003. With permission from Elsevier and from Don Wilson.)
However, the development of unique neural codes representing odors and odor mixtures does not
necessarily mean that odor objects are perceptually synthetic. Although studies of odor identifica-
tion in mixtures by Laing et al. (Laing and Francis 1989; Livermore and Laing 1996) have been
cited as evidence of synthesis (Wilson and Stevenson 2003), those results actually show a degree of
analytical processing that led Livermore and Laing (1996) to conclude that “. . . olfaction is neither
entirely analytic nor synthetic, but . . . contains elements of both” (p. 275). Thus, even though both
“expert” and novice subjects have difficulty identifying more than two or three odors in a mixture
(Livermore and Laing 1996), the ability to perceive at least some components rules out a purely
synthetic process. We therefore favor the view of Jinks and Laing (2001) that olfactory perception
is “configurational” in a manner similar to facial perception in vision (Rakover and Teucher 1997).
As those authors described it, configurational processing is based on perceptual fusion rather than
perceptual synthesis of odor qualities, which creates a gestalt in which “limited analysis” of mix-
ture components is possible. This view is also consistent with Gottfried’s conclusion that emerging
data in olfactory neuroscience support the conclusion “that the brain has simultaneous access to the
elemental and configural representations” (Gottfried 2009). As will be shown below, this concept
has also been applied to flavor perception.
36.4.1.2 Experience
There are many examples of experience dependence in the olfactory system (Dade et al. 1998;
Dalton et al. 2002; Li et al. 2006; Wilson et al. 2006). One particularly elegant example of olfactory
perceptual learning comes from Li and colleagues, who presented subjects with odor enantiomer
pairs (mirror image molecules) that were initially indistinguishable (Li et al. 2008). Subsequently,
they associated one member of the enantiomer pair with a shock. This resulted in perceptual learn-
ing in which subjects became able to distinguish the members of the pair and, consistent with
Wilson and Stevenson’s theory, this was accompanied by a divergence in neural response to the odor
pair in the anterior piriform cortex.
A second example of the role of experience in shaping olfactory perception, which is particularly
relevant to this chapter, is that when an odor is experienced with a taste, the odor later comes to
smell more like the taste with which it was experienced (Stevenson and Prescott 1995). This has
been termed the acquisition of taste-like properties by odors, and is described in depth in Chapter 35
by Prescott. It is likely that this form of perceptual learning plays an important role in the formation
of the flavor objects.
activation of the binding mechanism that mediates oral referral, and that the binding mechanism is
required to fuse flavor components into a flavor object. As such, retronasal olfaction has a privileged
role in the formation of flavor objects. That is, unless a flavor has been experienced retronasally, it
is not incorporated into a flavor object. A prediction that follows from this line of reasoning is that
if Stevenson’s basic, taste–odor learning paradigm is repeated, but the conditioning trials are per-
formed with orthonasal rather than retronasal odor stimulation, then the odors should not acquire
taste-like properties. This experiment has yet to be carried out.
not be possible (Stevenson et al. 2000a). To test this possibility, they subjected taste–odor and taste–color pairs to a counterconditioning paradigm. In a single conditioning session, subjects were
exposed to taste–odor and taste–color pairs. At least 24 h later, one taste–odor and one taste–color
pair underwent counterconditioning (e.g., the odor and the color were paired with new tastes). As
predicted, the odor maintained its original taste and did not acquire the new taste. In contrast, an
expectancy measure indicated that subjects expected the colored solution to taste like the counter-
conditioned taste rather than the originally conditioned taste. One caveat is that, to date, all of the
odors used in studies of odor acquisition of taste-like qualities have been rated as having perceptible
amounts of the target taste quality before the conditioning trials. Accordingly, it may be more accu-
rate to view the effect of taste–odor learning as an enhancement rather than an acquisition of taste-
like qualities. If so, it would not be surprising if pairing odors with other tastes failed to eliminate a
taste quality that the odor possessed before the original odor–taste pairing.
An obvious next question concerns the nature of odor–somatosensory learning. There are some
data to suggest that odors may acquire fat sensations after pairing with a fat-containing milk
(Sundqvist et al. 2006). However, fat may be sensed via taste channels (Gilbertson 1998; Gilbertson
et al. 1997), and therefore may be perceived as qualities of odors via the same mechanism as other
taste qualities. Certainly, sniffed odors do not appear to invoke sensations of texture and tempera-
ture. It is likely, therefore, that although configural and synthetic processes may occur during taste–
odor perceptual learning, oral somatosensory contributions to the unitary flavor percept may not be
learned, and may instead undergo sensory fusion rather than synthesis.
Notably, whereas a pure strawberry odor may result in the perception of sweetness, a pure sweet
solution, or the texture of a berry, never evokes the perception of strawberry. Together with refer-
ral, these observations further support the view that olfaction has a privileged role in the flavor
modality. Specifically, food identity, and thus perception of flavor objects, depends primarily on
the olfactory channel (Mozell et al. 1969). Although many different foods can be characterized as
predominantly sweet, predominantly salty, smooth, or crunchy, in nature there is only one food that
is predominantly “strawberry” and one food that is predominantly “peach.” Such an arrangement
has clear advantages because it enables organisms to learn to identify many different potential food
sources and to associate them with the presence of nutrients (e.g., sugars) or toxins. Moreover, the
duality of the olfactory modality allows key sensory signals about the sources of nutrients or toxins
to be incorporated into the odor percept during eating and drinking (retronasal olfaction), which
then enables them to be sensed at a distance (orthonasal olfaction). Indeed, although humans do not
normally use their noses to sniff out food sources, the ability to use orthonasal olfaction to locate a
food source is preserved (Porter et al. 2007).
neural response if responses were collapsed across odorant type. The only significant finding was
that the oral somatomotor mouth area responded preferentially to retronasal compared to orthonasal
odors, regardless of odor identity (Figure 36.8). The response in this region was therefore suggested
to reflect olfactory referral to the oral cavity, which was documented to occur during retronasal, but
not orthonasal, stimulation.
It is not possible to know from this study whether the response in the somatomotor mouth area
was the result or the cause of referral. However, there are several factors that point to this region as
the likely locus of the binding mechanism. First, the somatomotor mouth region was the only area
to show a significant differential response to retronasal compared to orthonasal stimulation. Second,
responses there were independent of whether the odor represented a food or a nonfood stimulus.
Third, the perception of flavor consistently results in greater responses in this region than does the
perception of a tasteless solution (Cerf-Ducastel and Murphy 2001; de Araujo and Rolls 2004;
Marciani et al. 2006), indicating that it is active when flavor percepts are experienced. Fourth, since
it is argued that stimulus integration and configural encoding are dependent on oral referral, it fol-
lows that the binding mechanism should be localized in the cortical representation of the mouth. We
also note that the location of the binding mechanism in the somatomotor mouth area is consistent
with Auvray and Spence’s suggestion that the formation of the flavor perceptual modality is dependent on a higher-order cortical binding mechanism (Auvray and Spence 2008). In addition to the
initial binding, it is further predicted that neural computations in the somatomotor mouth area play
a “permissive” role in enabling the sculpting of multimodal neurons. Specifically, it is proposed that
unimodal taste and unimodal smell neurons located in the piriform and anterior dorsal insula sculpt
the profiles of bimodal taste/smell neurons located in the ventral anterior insula and the caudal OFC
only when there is concurrent activation of the binding substrate (and associated oral referral).
This model is consistent with the observations of subthreshold taste–odor summation. Whereas
subthreshold summation between orthonasally sensed odor and taste appears, like taste enhance-
ment, to be dependent on perceptual congruency (Dalton et al. 2000), subthreshold summation
between retronasally sensed odors and tastes occurs for both congruent and incongruent pairs
(Delwiche and Heffelfinger 2005). This suggests that experience is not required for summation of
subthreshold taste and retronasal olfactory signals. This observation is consistent with the proposed
model because all retronasal odors are predicted to give rise to a response in the somatomotor
mouth area. In contrast, orthonasal olfactory experiences do not activate the somatomotor mouth
FIGURE 36.8 Preferential activation of somatomotor mouth area by retronasal compared to orthonasal
sensation of odors. Functional magnetic resonance imaging data from a study (Small et al. 2005) using the
Heilmann and Hummel (2004) method of odorant presentation to study brain response to orthonasal and
retronasal odors. Image represents a sagittal section of brain showing response in somatomotor mouth area to
retronasal vs. orthonasal sensation of same odors superimposed upon averaged anatomical scans. (Adapted
with permission from Small, D.M. et al., Neuron 47, 593–605, 2005.)
area and are therefore not referred to the mouth. As a result, orthonasal olfactory inputs can only
integrate with other oral sensations by reactivating odor objects, which have been previously associ-
ated with flavor objects.
The role of the somatomotor mouth area in oral referral and in the creation of the flavor modal-
ity could be tested in a variety of ways. For example, one could record single-unit responses in the
somatomotor mouth area and the OFC in a taste–odor learning paradigm. In humans, one could
examine taste–odor learning in patients with specific damage to the somatomotor mouth region or
in healthy controls by using transcranial magnetic stimulation to induce temporary “lesions.” The
prediction in both cases would be that lesions disrupt oral referral and the enhancement of taste-like
properties by odors. Another possibility would be to use a combination of fMRI and network con-
nectivity models such as dynamic causal modeling (Friston et al. 2003; Friston and Price 2001) to
test whether response in the somatomotor mouth area to flavors influences responses in regions such
as the OFC, and to test whether the magnitude of this influence changes as a function of learning.
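As a much-simplified stand-in for the dynamic causal modeling approach suggested above (not an implementation of DCM), one could ask whether the coupling between somatomotor mouth area (SMM) and OFC responses strengthens with learning, e.g., by fitting a regression with a response-by-learning interaction term. All values here are simulated, and the region names are just labels for the illustration:

```python
import random

random.seed(1)

# Simulate trial-wise responses: OFC response depends on the SMM response,
# with a coupling that strengthens over learning trials.
n_trials = 200
rows = []
for trial in range(n_trials):
    learning = trial / (n_trials - 1)      # 0 (naive) .. 1 (well learned)
    smm = random.gauss(0.0, 1.0)           # SMM response on this trial
    coupling = 0.2 + 0.8 * learning        # coupling grows with learning
    ofc = coupling * smm + random.gauss(0.0, 0.3)
    rows.append((smm, smm * learning, ofc))

# Ordinary least squares for ofc ~ b1*smm + b2*(smm*learning), via normal equations.
sxx = sum(r[0] * r[0] for r in rows)
sxz = sum(r[0] * r[1] for r in rows)
szz = sum(r[1] * r[1] for r in rows)
sxy = sum(r[0] * r[2] for r in rows)
szy = sum(r[1] * r[2] for r in rows)
det = sxx * szz - sxz * sxz
b1 = (sxy * szz - szy * sxz) / det   # baseline coupling
b2 = (szy * sxx - sxy * sxz) / det   # learning-dependent change in coupling

print(f"baseline coupling ~ {b1:.2f}, learning-dependent increase ~ {b2:.2f}")
```

A reliably positive interaction coefficient (b2) would be the regression analogue of the predicted learning-dependent increase in SMM-to-OFC influence.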
FIGURE 36.9 Proposed flavor network. A “glass” brain drawing depicting proposed flavor network as gray
circles. G, gustation; S, somatosensation; O, olfaction. Arrows indicate point of entry for sensory signal.
Dashed line box with GS represents gustatory (G) and somatosensory (S) relays in thalamus. Hatched region
indicates insular cortex. Bolded gray circle with S (somatosensory) indicates somatomotor mouth area. Note
that gustatory and somatosensory information are colocalized, except in somatomotor mouth area. Unitary
flavor percept is formed only when all nodes (gray circles) receive inputs. No single sensory channel (gusta-
tory, olfactory, or somatosensory) can invoke flavor object in isolation.
Pardo 1997; Zatorre et al. 1992). In accordance with these findings in humans, single-cell recording
studies in monkeys have identified both taste- and smell-responsive cells in the insula/operculum
(Scott and Plata-Salaman 1999) and OFC (Rolls and Baylis 1994; Rolls et al. 1996).
Although not considered traditional chemosensory cortex, the anterior cingulate cortex receives
direct projections from the insula and the OFC (Carmichael and Price 1996; Vogt and Pandya 1987),
responds to taste and smell (de Araujo and Rolls 2004; de Araujo et al. 2003; Marciani et al. 2006;
O’Doherty et al. 2000; Royet et al. 2003; Savic et al. 2000; Small et al. 2001, 2003; Zald et al. 1998;
Zald and Pardo 1997), and shows supra-additive responses to congruent taste–odor pairs (Small et al.
2004). Therefore, it is possible that this region contributes to flavor processing. Moreover, a meta-
analysis of all independent studies of taste and smell confirmed large clusters of overlapping activa-
tion in the insula/operculum, OFC, and anterior cingulate cortex (Verhagen and Engelen 2006).
There is also evidence for supra-additive responses to the perception of congruent but not incon-
gruent taste–odor solutions in the anterodorsal insula/frontal operculum, anteroventral insula/
caudal OFC, frontal operculum, and anterior cingulate cortex (McCabe and Rolls 2007; Small
et al. 2004). Such supra-additive responses are thought to be a hallmark of multisensory integra-
tion (Calvert 2001; Stein 1998). The fact that the supra-additive responses in these regions are
experience-dependent strongly supports the possibility that these areas are key nodes of the distrib-
uted representation of the flavor object. In support of this possibility, unpublished work suggests
that there are differential responses to food versus nonfood odors, and that such responses occur in
the insula, operculum, anterior cingulate cortex, and OFC (Small et al., in preparation).
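The supra-additivity criterion invoked above can be stated as a simple index: a region shows an integration signature when its response to the combined taste–odor stimulus exceeds the sum of its responses to each component alone. A sketch with hypothetical response values:

```python
def supra_additivity(r_taste_odor, r_taste, r_odor):
    """Positive when the bimodal response exceeds the sum of unimodal responses."""
    return r_taste_odor - (r_taste + r_odor)

# Hypothetical BOLD signal changes (%) in an integrative region:
congruent = supra_additivity(1.2, 0.4, 0.5)    # e.g., sweet taste + strawberry odor
incongruent = supra_additivity(0.8, 0.4, 0.5)  # e.g., salty taste + strawberry odor

print(congruent)    # positive: supra-additive, consistent with integration
print(incongruent)  # non-positive: no integration signature
```

The experience dependence described above corresponds to the index being positive only for congruent (previously co-experienced) pairs.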
Finally, neuroimaging studies with whole brain coverage frequently report responses in similar
regions of the cerebellum (Cerf-Ducastel and Murphy 2001; Savic et al. 2002; Small et al. 2003;
Sobel et al. 1998; Zatorre et al. 2000) and amygdala (Anderson et al. 2003; Gottfried et al. 2002a,
2002b, 2006b; Small et al. 2003, 2005; Verhagen and Engelen 2006; Winston et al. 2005; Zald
et al. 1998; Zald and Pardo 1997) to taste and smell stimulation, although neither region shows
supra-additive responses to taste and smell (Small et al. 2004). We have elected not to include these
regions in the proposed network, but acknowledge that there is at least some empirical basis for
further investigation of their role in flavor processing.
One important but still unresolved question regarding the neurophysiology of flavor perception is
whether the process by which an odor object becomes part of a flavor percept results in changes to
the odor object (Wilson and Stevenson 2004). Preliminary work suggests that the taste-like proper-
ties of food odors are encoded in the same region of insula that encodes sweet taste, and not in the
piriform cortex or OFC (Veldhuizen et al. 2010). Subjects underwent fMRI scanning while being
exposed to a weak sweet taste (sucrose), a strong sweet taste, two sweet food odors (strawberry
and chocolate), and to sweet nonfood odors (rose and lilac). A region of insular cortex was identi-
fied that responded to taste and odor sweetness. This finding is consistent with a recent report that
insular lesions disrupt taste and odor-induced taste perception (Stevenson et al. 2008). Moreover, it
was found that the magnitude of insular response to food, but not nonfood odors, correlated with
perceived sweetness. The selectivity of the association between response and sweetness perception
strongly suggests that experience with an odor in the mouth as a food or flavor modifies neural
activity, and that this occurs in the insula, but not in the piriform cortex. This, in turn, suggests that
odor objects represented in the piriform cortex are not modified by flavor learning. In summary, it is
proposed that bimodal taste–odor neurons in the OFC and anterior insula are changed during simul-
taneous perception of taste and retronasally sensed odor, whereas piriform neurons are not. Thus,
we hypothesize that the flavor object comprises an unmodified odor object and modified bimodal
cells that become associated within a distributed pattern of activation during initial binding.
Another critical question for understanding neural encoding of flavor objects is whether the
entire active network is encoded or only a subset of key elements. For example, is activation of the
somatomotor mouth area required to reexperience the flavor percept? If not, what are the key ele-
ments? The answers to these questions are currently unknown. However, as discussed above, it is
possible that the taste signal is critical (Davidson et al. 1999; Synder et al. 2007).
36.7 SUMMARY
We propose that during tasting, retronasal olfactory, gustatory, and somatosensory stimuli form
a perceptual gestalt—the “flavor object”—the elements of which maintain their individual qualities
to varying degrees. The development and experience of this percept is dependent on oral refer-
ral, for which neural processing in the somatomotor mouth area is deemed critical. An as-yet-
unidentified neural mechanism within this region is hypothesized to bind the pattern of responses
elicited by flavor stimuli. When the binding mechanism is active, unimodal inputs shape the selec-
tivity of bimodal taste–odor neurons. Flavor objects are then encoded via configural learning as a
distributed pattern of response across the somatomotor mouth area, multiple regions of insula and
overlying operculum, orbitofrontal cortex, piriform cortex, and anterior cingulate cortex. It is these
functionally associated regions that constitute the neural basis of the proposed flavor modality.
REFERENCES
Anderson, A. K., K. Christoff, I. Stappen et al. 2003. Dissociated neural representations of intensity and valence
in human olfaction. Nat Neurosci 6: 196–202.
Ashkenazi, A., and L. E. Marks. 2004. Effect of endogenous attention on detection of weak gustatory and olfac-
tory flavors. Percept Psychophys 66: 596–608.
Auvray, M., and C. Spence. 2008. The multisensory perception of flavor. Conscious Cogn 17: 1016–1031.
Baeyens, F., P. Eelen, O. Van den Bergh et al. 1989. Acquired affective–evaluative value: Conservative but not unchangeable. Behav Res Ther 27: 279–287.
Barnes, D. C., R. D. Hofacer, A. R. Zaman et al. 2008. Olfactory perceptual stability and discrimination. Nat
Neurosci 11: 1378–1380.
Bartoshuk, L. M. 1991. Taste, smell, and pleasure. In The hedonics of taste and smell, ed. R. C. Bolles, 15–28.
Hillsdale, NJ: Lawrence Erlbaum Associates.
Bender, G., T. Hummel, S. Negoias et al. 2009. Separate signals for orthonasal vs. retronasal perception of food
but not nonfood odors. Behav Neurosci 123: 481–489.
Bradley, R. M., R. H. Smoke, T. Akin et al. 1992. Functional regeneration of glossopharyngeal nerve through
micromachined sieve electrode arrays. Brain Res 594: 84–90.
Breslin, P. A. 2000. Human gustation. In The neurobiology of taste and smell, ed. T. E. Finger and W. L. Silver,
423–461. San Diego, CA: Wiley-Liss, Inc.
Buck, L., and R. Axel. 1991. A novel multigene family may encode odorant receptors: a molecular basis for
odor recognition. Cell 65: 175–187.
Bult, J. H., R. A. de Wijk, and T. Hummel. 2007. Investigations on multimodal sensory integration: texture,
taste, and ortho- and retronasal olfactory stimuli in concert. Neurosci Lett 411: 6–10.
Calvert, G. A. 2001. Crossmodal processing in the human brain: Insights from functional neuroimaging studies.
Cereb Cortex 11: 1110–1123.
Carmichael, S. T., and J. L. Price. 1996. Connectional networks within the orbital and medial prefrontal cortex
of macaque monkeys. J Comp Neurol 371: 179–207.
Cerf-Ducastel, B., and C. Murphy. 2001. fMRI activation in response to odorants orally delivered in aqueous
solutions. Chem Senses 26: 625–637.
Cerf-Ducastel, B., P. F. Van de Moortele, P. MacLeod et al. 2001. Interaction of gustatory and lingual somatosensory perceptions at the cortical level in the human: A functional magnetic resonance imaging study.
Chem Senses 26: 371–383.
Chale-Rush, A., J. R. Burgess, and R. D. Mattes. 2007. Evidence for human orosensory (taste?) sensitivity to
free fatty acids. Chem Senses 32: 423–431.
Chandrashekar, J., M. A. Hoon, N. J. Ryba et al. 2006. The receptors and cells for mammalian taste. Nature
444: 288–294.
Cruikshank, S. J., and N. M. Weinberger. 1996. Evidence for the Hebbian hypothesis in experience-dependent
physiological plasticity of the neocortex: A critical review. Brain Res Rev 22: 191–228.
Dade, L. A., M. Jones-Gotman, R. J. Zatorre et al. 1998. Human brain function during odor encoding and rec-
ognition. A PET activation study. Ann NY Acad Sci 855: 572–574.
Dalton, P., N. Doolittle, and P. A. Breslin. 2002. Gender-specific induction of enhanced sensitivity to odors.
Nat Neurosci 5: 199–200.
Dalton, P., N. Doolittle, H. Nagata et al. 2000. The merging of the senses: Integration of subthreshold taste and
smell. Nat Neurosci 3: 431–432.
Davidson, J. M., R. S. T. Linforth, T. A. Hollowood et al. 1999. Effect of sucrose on the perceived flavor inten-
sity of chewing gum. J Agric Food Chem 47: 4336–4340.
de Araujo, E., and E. T. Rolls. 2004. Representation in the human brain of food texture and oral fat. J Neurosci
24: 3086–3093.
de Araujo, E., E. T. Rolls, M. L. Kringelbach et al. 2003. Taste–olfactory convergence, and the representation
of the pleasantness of flavour in the human brain. Eur J Neurosci 18: 2059–2068.
de Olmos, J., H. Hardy, and L. Heimer. 1978. The afferent connections of the main and the accessory olfactory
bulb formations in the rat: An experimental HRP-study. J Comp Neurol 181: 213–244.
Delwiche, J. F., and A. L. Heffelfinger. 2005. Cross-modal additivity of taste and smell. J Sens Stud 20:
512–525.
Delwiche, J. F., M. F. Lera, and P. A. S. Breslin. 2000. Selective removal of a target stimulus localized by taste
in humans. Chem Senses 25: 181–187.
Dravnieks, A. 1985. Atlas of odor character profiles (ASTM Data series DS61). West Conshohocken, PA:
American Society for Testing and Materials.
Francis, S., E. T. Rolls, R. Bowtell et al. 1999. The representation of pleasant touch in the brain and its relation-
ship with taste and olfactory areas. Neuroreport 10: 435–459.
Frank, G. K., W. H. Kaye, C. S. Carter et al. 2003. The evaluation of brain activity in response to taste stimuli—A
pilot study and method for central taste activation as assessed by event-related fMRI. J Neurosci Methods
131: 99–105.
Frasnelli, J., M. Ungermann, and T. Hummel. 2008. Ortho- and retronasal presentation of olfactory stimuli
modulates odor percepts. Chemosens Percept 1: 9–15.
Friston, K., L. Harrison, and W. D. Penny. 2003. Dynamic causal modelling. Neuroimage 19: 1273–1302.
Friston, K., and C. J. Price. 2001. Dynamic representations and generative models of brain function. Brain Res
Bull 54: 275–285.
Gilbertson, T. A. 1998. Gustatory mechanisms for the detection of fat. Curr Opin Neurobiol 8: 447–452.
Gilbertson, T. A., D. T. Fontenot, L. Liu et al. 1997. Fatty acid modulation of K+ channels in taste receptor cells:
Gustatory cues for dietary fat. Am J Physiol 272: C1203–C1210.
Gottfried, J. A. 2009. Function follows form: Ecological constraints on odor codes and olfactory percepts. Curr
Opin Neurobiol, in press.
Gottfried, J. A., R. Deichmann, J. S. Winston et al. 2002a. Functional heterogeneity in human olfactory cortex:
An event-related functional magnetic resonance imaging study. J Neurosci 22: 10819–10828.
Gottfried, J. A., J. O’Doherty, and R. J. Dolan. 2002b. Appetitive and aversive olfactory learning in humans
studied using event-related functional magnetic resonance imaging. J Neurosci 22: 10829–10837.
Gottfried, J. A., D. M. Small, and D. H. Zald. 2006a. The chemical senses. In The orbitofrontal cortex, ed. D. H.
Zald and S. L. Rauch, 125–171. New York: Oxford Univ. Press.
Gottfried, J. A., J. S. Winston, and R. J. Dolan. 2006b. Dissociable codes of odor quality and odorant structure
in human piriform cortex. Neuron 49: 467–479.
A Proposed Model of a Flavor Modality 735
Green, B. G. 1977. Localization of thermal sensation: An illusion and synthetic heat. Percept Psychophys 22:
331–337.
Green, B. G. 2002. Studying taste as a cutaneous sense. Food Qual Prefer 14: 99–109.
Green, B. G., and B. Gelhard. 1989. Salt as an oral irritant. Chem Senses 14: 259–271.
Green, B. G., and H. T. Lawless. 1991. The psychophysics of somatosensory chemoreception in the nose and
mouth. In Smell and taste in health and disease, ed. T.V. Getchell, R. L. Doty, L. M. Bartoshuk, and J. B.
Snow, 235–253. New York: Raven Press.
Haberly, L. B. 2001. Parallel-distributed processing in olfactory cortex: New insights from morphological and
physiological analysis of neuronal circuitry. Chem Senses 26: 551–576.
Harper, R., D. G. Land, N. M. Griffiths et al. 1968. Odor qualities: A glossary of usage. Br J Psychol 59:
231–252.
Harris, J. A., F. L. Shand, L. Q. Carroll et al. 2004. Persistence of preference for a flavor presented in simulta-
neous compound with sucrose. J Exp Psychol Anim Behav Processes 30: 177–189.
Hebb, D. O. 1949. The organization of behavior. New York: Wiley.
Heilmann, S., and T. Hummel. 2004. A new method for comparing orthonasal and retronasal olfaction. Behav
Neurosci 118: 412–419.
Hollingworth, H. L., and A. T. Poffenberger. 1917. The sense of taste. New York: Moffat, Yard.
Hummel, T., S. Heilmann, B. N. Landis et al. 2006. Perceptual differences between chemical stimuli presented
through the ortho- or retronasal route. Flavor Fragrance J 21: 42–47.
Jinks, A., and D. G. Laing. 2001. The analysis of odor mixtures by humans: Evidence for a configurational
process. Physiol Behav 72: 51–63.
Kadohisa, M., E. T. Rolls, and J. V. Verhagen. 2004. Orbitofrontal cortex: Neuronal representation of oral tem-
perature and capsaicin in addition to taste and texture. Neuroscience 127: 207–221.
Kohler, W. 1929. Gestalt psychology. New York: Horace Liveright.
Koza, B. J., A. Cilmi, M. Dolese et al. 2005. Color enhances orthonasal olfactory intensity and reduces retro-
nasal olfactory intensity. Chem Senses 30: 643–649.
Kringelbach, M. L., and K. C. Berridge. 2009. Oxford handbook: Pleasures of the brain. Oxford: Oxford Univ.
Press.
Laing, D. G., and G. W. Francis. 1989. The capacity of humans to identify odors in mixtures. Physiol Behav
46: 809–814.
Landis, B. N., J. Frasnelli, J. Reden et al. 2005. Differences between orthonasal and retronasal olfactory func-
tions in patients with loss of the sense of smell. Arch Otolaryngol Head Neck Surg 131: 977–981.
Li, W., J. D. Howard, T. B. Parrish et al. 2008. Aversive learning enhances perceptual and cortical discrimina-
tion of indiscriminable odor cues. Science 319: 1842–1845.
Li, W., E. Luxenberg, T. Parrish et al. 2006. Learning to smell the roses: Experience-dependent neural plasticity
in human piriform and orbitofrontal cortices. Neuron 52: 1097–1108.
Lim, J., and B. G. Green. 2008. Tactile interaction with taste localization: Influence of gustatory quality and
intensity. Chem Senses 33: 137–143.
Livermore, A., and D. G. Laing. 1996. Influence of training and experience on the perception of multicompo-
nent odor mixtures. J Exp Psychol Hum Percept Perform 22: 267–277.
Marciani, L., J. C. Pfeiffer, J. Hort et al. 2006. Improved methods for fMRI studies of combined taste and aroma
stimuli. J Neurosci Methods 158: 186–194.
McBurney, D. H. 1986. Taste, smell and flavor terminology: Taking the confusion out of confusion. In Clinical
measurement of taste and smell, ed. H. L. Meiselman and R. S. Rivkin, 117–124. New York: Macmillan.
McCabe, C., and E. T. Rolls. 2007. Umami: A delicious flavor formed by convergence of taste and olfactory
pathways in the human brain. Eur J Neurosci 25: 1855–1864.
Mozell, M. M. 1970. Evidence for a chromatographic model of olfaction. J Gen Physiol 56: 46–63.
Mozell, M. M., B. P. Smith, P. E. Smith et al. 1969. Nasal chemoreception in flavor identification. Arch
Otolaryngol 90: 367–373.
Murphy, C., W. S. Cain, and L. M. Bartoshuk. 1977. Mutual action of taste and olfaction. Sens Processes 1:
204–211.
Murphy, C. A., and W. S. Cain. 1980. Taste and olfaction: Independence vs interaction. Physiol Behav 24:
601–605.
O’Doherty, J., E. T. Rolls, S. Francis et al. 2000. Sensory-specific satiety-related olfactory activation of the
human orbitofrontal cortex. Neuroreport 11: 399–403.
Pearce, J. M. 2002. Evaluation and development of a connectionist theory of configural learning. Anim Learn
Behav 30: 73–95.
736 The Neural Bases of Multisensory Processes
Pierce, J., and B. P. Halpern. 1996. Orthonasal and retronasal odorant identification based upon vapor phase
input from common substances. Chem Senses 21: 529–543.
Plata-Salaman, C. R., T. R. Scott, and V. L. Smith-Swintosky. 1992. Gustatory neural coding in the monkey
cortex: l-Amino acids. J Neurophysiol 67: 1552–1561.
Plata-Salaman, C. R., V. L. Smith-Swintosky, and T. R. Scott. 1996. Gustatory neural coding in the monkey
cortex: Mixtures. J Neurophysiol 75: 2369–2379.
Poellinger, A., R. Thomas, P. Lio et al. 2001. Activation and habituation in olfaction—An fMRI study.
Neuroimage 13: 547–560.
Porter, J., B. Craven, R. M. Khan et al. 2007. Mechanisms of scent-tracking in humans. Nat Neurosci 10:
27–29.
Prescott, J. 1999. Flavour as a psychological construct: Implications for perceiving and measuring the sensory
qualities of foods. Food Qual Prefer 10: 349–356.
Price, J. L. 1973. An autoradiographic study of complementary laminar patterns of termination of afferent
fibers to the olfactory cortex. J Comp Neurol 150: 87–108.
Pritchard, T. C., R. B. Hamilton, J. R. Morse et al. 1986. Projections of thalamic gustatory and lingual areas in
the monkey, Macaca fascicularis. J Comp Neurol 244: 213–228.
Pritchard, T. C., R. B. Hamilton, and R. Norgren. 1989. Neural coding of gustatory information in the thalamus
of Macaca mulatta. J Neurophysiol 61: 1–14.
Rakover, S. S., and B. Teucher. 1997. Facial inversion effects: Parts and whole relationship. Percept Psychophys
59: 752–761.
Rescorla, R. A. 1981. Simultaneous associations. In Predictability, correlation, and contiguity, ed. P. Harzem
and M. D. Zeiler, 47–80. Chichester: Wiley.
Rescorla, R. A., and L. Freeberg. 1978. The extinction of within-compound flavor associations. Learn Motiv
9: 411–427.
Rolls, E. T. 2007. Sensory processing in the brain related to the control of food intake. Proc Nutr Soc 66:
96–112.
Rolls, E. T., and L. L. Baylis. 1994. Gustatory, olfactory, and visual convergence within the primate orbitofron-
tal cortex. J Neurosci 14: 5437–5452.
Rolls, E. T., H. D. Critchley, and A. Treves. 1996. Representation of olfactory information in the primate
orbitofrontal cortex. J Neurophysiol 75: 1982–1996.
Royet, J. P., J. Plailly, C. Delon-Martin et al. 2003. fMRI of emotional responses to odors: Influence of hedonic
valence and judgment, handedness, and gender. Neuroimage 20: 713–728.
Rozin, P. 1982. “Taste–smell confusions” and the duality of the olfactory sense. Percept Psychophys 31:
397–401.
Sakai, N., T. Kobayakawa, N. Gotow et al. 2001. Enhancement of sweetness ratings of aspartame by a vanilla
odor presented either by orthonasal or retronasal routes. Percept Mot Skills 92: 1002–1008.
Savic, I., B. Gulyas, and H. Berglund. 2002. Odorant differentiated pattern of cerebral activation: comparison
of acetone and vanillin. Hum Brain Mapp 17: 17–27.
Savic, I., B. Gulyas, M. Larsson et al. 2000. Olfactory functions are mediated by parallel and hierarchical pro-
cessing. Neuron 26: 735–745.
Schifferstein, H. N. J., and P. W. J. Verlegh. 1996. The role of congruency and pleasantness in odor-induced
taste enhancement. Acta Psychol 94: 87–105.
Schoenbaum, G., and H. Eichenbaum. 1995a. Information coding in the rodent prefrontal cortex: I. Single-
neuron activity in orbitofrontal cortex compared with that in piriform cortex. J Neurophysiol 74:
733–750.
Schoenbaum, G., and H. Eichenbaum. 1995b. Information coding in the rodent prefrontal cortex: II. Ensemble
activity in orbitofrontal cortex. J Neurophysiol 74: 751–762.
Scott, T. R., and C. R. Plata-Salaman. 1991. Coding of taste quality. In Smell and taste in health and disease,
ed. T. V. Getchell. New York: Raven Press.
Scott, T. R., and C. R. Plata-Salaman. 1999. Taste in the monkey cortex. Physiol Behav 67: 489–511.
Shikata, H., D. B. McMahon, and P. A. Breslin. 2000. Psychophysics of taste lateralization on anterior tongue.
Percept Psychophys 62: 684–694.
Simon, S. A., I. de Araujo, J. R. Stapleton et al. 2008. Multisensory processing of gustatory stimuli. Chemosens
Percept, in press.
Small, D. M. 2008. Flavor and the formation of category-specific processing in olfaction. Chemosens Percept
1: 136–146.
Small, D. M., J. Gerber, Y. E. Mak et al. 2005. Differential neural responses evoked by orthonasal versus retro-
nasal odorant perception in humans. Neuron 47: 593–605.
Small, D. M., M. D. Gregory, Y. E. Mak et al. 2003. Dissociation of neural representation of intensity and affec-
tive valuation in human gustation. Neuron 39: 701–711.
Small, D. M., M. Jones-Gotman, R. J. Zatorre et al. 1997. Flavor processing: More than the sum of its parts.
Neuroreport 8: 3913–3917.
Small, D. M., and J. Prescott. 2005. Odor/taste integration and the perception of flavor. Exp Brain Res 166:
345–357.
Small, D. M., J. Voss, Y. E. Mak et al. 2004. Experience-dependent neural integration of taste and smell in the
human brain. J Neurophysiol 92: 1892–1903.
Small, D. M., D. H. Zald, M. Jones-Gotman et al. 1999. Human cortical gustatory areas: A review of functional
neuroimaging data. Neuroreport 10: 7–14.
Small, D. M., R. J. Zatorre, A. Dagher et al. 2001. Changes in brain activity related to eating chocolate: From
pleasure to aversion. Brain 124: 1720–1733.
Smith-Swintosky, V. L., C. R. Plata-Salaman, and T. R. Scott. 1991. Gustatory neural coding in the monkey
cortex: stimulus quality. J Neurophysiol 66: 1156–1165.
Sobel, N., R. M. Khan, A. Saltman et al. 1999. Olfaction: The world smells different to each nostril. Nature
402: 35.
Sobel, N., V. Prabhakaran, J. E. Desmond et al. 1998. Sniffing and smelling: Separate subsystems in the human
olfactory cortex. Nature 392: 282–286.
Sobel, N., V. Prabhakaran, C. A. Hartley et al. 1998. Odorant-induced and sniff-induced activation in the cer-
ebellum of the human. J Neurosci 18: 8990–9001.
Stein, B. E. 1998. Neural mechanisms for synthesizing sensory information and producing adaptive behaviors.
Exp Brain Res 123: 124–135.
Stevenson, R. J. 2001. Associative learning and odor quality perception: How sniffing an odor mixture can alter
the smell of its parts. Learn Motiv 32: 154–177.
Stevenson, R. J., and R. A. Boakes. 2004. Sweet and sour smells: Learned synesthesia between the senses of
taste and smell. In The handbook of multisensory processes, ed. G. A. Calvert, C. Spence, and B. E. Stein,
69–83. Boston: MIT Press.
Stevenson, R. J., R. A. Boakes, and J. P. Wilson. 2000a. Counter-conditioning following human odor-taste and
color-taste learning. Learn Motiv 31: 114–127.
Stevenson, R. J., R. A. Boakes, and J. P. Wilson. 2000b. Resistance to extinction of conditioned odor percep-
tions: Evaluative conditioning is not unique. J Exp Psychol Learn Mem Cogn 26: 423–440.
Stevenson, R. J., L. A. Miller, and Z. C. Thayer. 2008. Impairments in the perception of odor-induced tastes and
their relationship to impairments in taste perception. J Exp Psychol Hum Percept Perform 34: 1183–1197.
Stevenson, R. J., and J. Prescott. 1995. The acquisition of taste properties by odors. Learn Motiv 26: 433–455.
Stevenson, R. J., J. Prescott, and R. A. Boakes. 1999. Confusing tastes and smells: how odours can influence
the perception of sweet and sour tastes. Chem Senses 24: 627–635.
Stevenson, R. J., and C. Tomiczek. 2007. Olfactory-induced synesthesias: A review and model. Psychol Bull
133: 294–309.
Sun, B. C., and B. P. Halpern. 2005. Identification of air phase retronasal and orthonasal odorant pairs. Chem
Senses 30: 693–706.
Sundqvist, N. C., R. J. Stevenson, and I. R. J. Bishop. 2006. Can odours acquire fat-like properties? Appetite
47: 91–99.
Snyder, D. J., C. J. Clark, F. A. Catalanotto et al. 2007. Oral anesthesia specifically impairs retronasal olfaction.
Chem Senses 32: A15.
Tastevin, J. 1937. En partant de l’experience d’Aristote. Encephale 1: 57–84, 140–158.
Titchener, E. B. 1909. A textbook of psychology. New York: Macmillan.
Todrank, J., and L. M. Bartoshuk. 1991. A taste illusion: Taste sensation localized by touch. Physiol Behav 50:
1027–1031.
Travers, J. B. 1988. Efferent projections from the anterior nucleus of the solitary tract of the hamster. Brain
Res 457: 1–11.
Turner, B. H., K. C. Gupta, and M. Mishkin. 1978. The locus and cytoarchitecture of the projection areas of the
olfactory bulb in Macaca mulatta. J Comp Neurol 177: 381–396.
Veldhuizen, M. G., D. Nachtigal, L. Teulings et al. 2010. The insular taste cortex contributes to odor quality
coding. Front Hum Neurosci 4: 58.
Verhagen, J. V. 2007. The neurocognitive bases of human multimodal food perception: Consciousness. Brain
Res Rev 53: 271–286.
Verhagen, J. V., and L. Engelen. 2006. The neurocognitive bases of human multimodal food perception: Sensory
integration. Neurosci Biobehav Rev 30: 613–650.
Verhagen, J. V., M. Kadohisa, and E. T. Rolls. 2004. Primate insular/opercular taste cortex: Neuronal repre-
sentations of the viscosity, fat texture, grittiness, temperature, and taste of foods. J Neurophysiol 92:
1685–1699.
Vogt, B. A., and D. Pandya. 1987. Cingulate cortex of the rhesus monkey: II. Cortical afferents. J Comp Neurol
262: 271–289.
Voirol, E., and N. Dagnet. 1986. Comparative study of nasal and retronasal olfactory perception. Food Sci
Technol 19: 316–319.
Welge-Lussen, A., J. Drago, M. Wolfensberger et al. 2005. Gustatory stimulation influences the processing of
intranasal stimuli. Brain Res 1038: 69–75.
Welge-Lussen, A., A. Husner, M. Wolfensberger et al. 2009. Influence of simultaneous gustatory stimuli on
orthonasal and retronasal olfaction. Neurosci Lett 454: 124–128.
Whitehead, M. C. 1990. Subdivisions and neuron types of the nucleus of the solitary tract that project to the
parabrachial nucleus in the hamster. J Comp Neurol 301: 554–574.
Whitehead, M. C., and M. E. Frank. 1983. Anatomy of the gustatory system in the hamster: Central projections
of the chorda tympani and the lingual nerve. J Comp Neurol 220: 378–395.
Wilson, D. A., M. Kadohisa, and M. L. Fletcher. 2006. Cortical contributions to olfaction: Plasticity and per-
ception. Semin Cell Dev Biol 17: 462–470.
Wilson, D. A., and R. J. Stevenson. 2003. The fundamental role of memory in olfactory perception. Trends
Neurosci 26: 243–247.
Winston, J. S., J. A. Gottfried, J. M. Kilner et al. 2005. Integrated neural representations of odor intensity and
affective valence in human amygdala. J Neurosci 25: 8903–8907.
Yamamoto, T., N. Yuyama, T. Kato et al. 1985. Gustatory responses of cortical neurons in rats: II. Information
processing of taste quality. J Neurophysiol 53: 1370–1386.
Yeomans, M. R., S. Mobini, T. D. Elliman et al. 2006. Hedonic and sensory characteristics of odors conditioned
by pairing with tastants in humans. J Exp Psychol Anim Behav Processes 32: 215–228.
Zald, D. H., J. T. Lee, K. W. Fluegel et al. 1998. Aversive gustatory stimulation activates limbic circuits in
humans. Brain 121: 1143–1154.
Zald, D. H., and J. V. Pardo. 1997. Emotion, olfaction, and the human amygdala: amygdala activation during
aversive olfactory stimulation. Proc Natl Acad Sci U S A 94: 4119–4124.
Zatorre, R. J., M. Jones-Gotman, A. C. Evans et al. 1992. Functional localization and lateralization of human
olfactory cortex. Nature 360: 339–340.
Zatorre, R. J., M. Jones-Gotman, and C. Rouby. 2000. Neural mechanisms involved in odor pleasantness and
intensity judgments. Neuroreport 11: 2711–2716.
37 Assessing the Role of
Visual and Auditory
Cues in Multisensory
Perception of Flavor
Massimiliano Zampini and Charles Spence
CONTENTS
37.1 Introduction........................................................................................................................... 739
37.2 Multisensory Interactions between Visual and Flavor Perception........................................ 740
37.2.1 Role of Color Cues on Multisensory Flavor Perception............................................ 740
37.2.2 Color-Flavor Interactions: Possible Role of Taster Status.......................................... 743
37.2.3 Color–Flavor Interactions: Possible Role of Learned Associations between
Colors and Flavors..................................................................................................... 745
37.2.4 Color–Flavor Interactions: Neural Correlates........................................................... 747
37.2.5 Interim Summary...................................................................................................... 748
37.3 Role of Auditory Cues in the Multisensory Experience of Foodstuffs................................. 749
37.3.1 Effect of Sound Manipulation on the Perception of Crisps....................................... 749
37.3.2 Effect of Auditory Cues on the Perception of Sparkling Water................................ 751
37.4 Conclusions............................................................................................................................ 752
References....................................................................................................................................... 753
37.1 INTRODUCTION
Our perception of the objects and events that fill the world in which we live depends on the integration
of the sensory inputs that simultaneously reach our various sensory systems (e.g., vision, audition,
touch, taste, and smell). Perhaps the best-known examples of genuinely multisensory experiences
come from our perception and evaluation of food and drink. Most people would say that the flavor of
food derives primarily from its taste in the mouth, and they are often surprised to discover that
there is a strong “nasal” role in the perception of flavor. In fact, it has been argued that the majority
of the flavor of food actually comes from its smell (e.g., Cain 1977; Murphy and Cain 1980; Rozin
1982).* Our perception of food and drink, however, is not simply a matter of combining gustatory
* For example, coffee and tea are indistinguishable (with both having a bitter taste) if drunk while holding one’s nose
pinched shut. Whereas the taste of a lemon consists only of sour, sweet, and bitter components, most of the flavor
we normally associate with a lemon comes from its terpene aroma, constituent chemicals
that stimulate the olfactory mucosa via the nasopharynx (i.e., retronasal olfaction). Odor molecules may reach the recep-
tors in the olfactory epithelium (i.e., the area located in the rear of the nasal cavity) traveling inward from the anterior
nares or through the posterior nares of the nasopharynx. Most typically, orthonasal olfaction occurs during respiratory
inhalation or sniffing, whereas retronasal olfaction occurs during respiratory exhalation or after swallowing. People
usually report experiencing odors as originating from the external world when perceived orthonasally, and as coming
from the mouth when perceived retronasally (Rozin 1982). Importantly, the latest cognitive neuroscience evidence has
highlighted the fact that somewhat different neural structures are used to process these two kinds of olfactory informa-
tion (Small et al. 2005, 2008; see also Koza et al. 2005).
and olfactory food cues (although this is undoubtedly very important; Dalton et al. 2000). For
instance, our evaluation of the pleasantness of a particular foodstuff can be influenced not only by
what it looks, smells, and tastes like, but also what it sounds like in the mouth (think, for example, of
the auditory sensations associated with biting into a potato chip or a stick of celery; see Spence and
Zampini 2006, for a review). The feel of a foodstuff (i.e., its oral–somatosensory attributes) is also
very important; the texture, temperature, viscosity, and even the painful sensations we experience
when eating hot foods (e.g., chilli peppers) all contribute to our overall multisensory experience of
foodstuffs (e.g., Bourne 1982; Lawless et al. 1985; Tyle 1993). Flavor perception is also influenced
by the interactions taking place between oral texture and both olfactory and gustatory cues (see also
Bult et al. 2007; Christensen 1980a, 1980b; Hollowood et al. 2002). Given the multisensory nature
of our perception of food, it should come as little surprise that many studies have been conducted in
order to try and understand the relative contribution of each sense to our overall evaluation of food
(e.g., see Delwiche 2004; Spence 2002; Stevenson 2009; Stillman 2002). In this chapter, we review
the contribution of visual and auditory cues to the multisensory perception of food. Note that visual
and auditory aspects of foods and drinks can exert their influence at different stages of the food
experience: visual cues are perceived mainly before foodstuffs enter the mouth, whereas auditory
cues are perceived primarily while we are actually consuming food.
DuBose et al. 1980; Johnson and Clydesdale 1982; Morrot et al. 2001; Oram et al. 1995; Philipsen
et al. 1995; Roth et al. 1988; Stillman 1993; Zellner and Durlach 2003). One might therefore argue
that the visual modulation of flavor perception reported in many of these previous studies simply
reflects a decisional bias introduced by the obvious variation in the color cues (cf. the literature on
the effectiveness of the color of medications on the placebo effect; e.g., de Craen et al. 1996; see also
Engen 1972), rather than a genuine perceptual effect (i.e., whereby the color cues actually modulate
the perception of flavor itself; although see also Garber et al. 2001, 2008, for an alternative perspec-
tive from the field of marketing). For example, if participants found it difficult to identify the flavor of
the food or drink correctly on the basis of gustatory and olfactory cues in flavor discrimination tasks,
they may simply have based their responses on the more easily discriminable color cues, so that their
judgments reflected decisional rather than perceptual processes.
In their study, Zampini et al. (2007) tried to reduce any possible influence of response biases that
might emerge when studying color–flavor interactions by explicitly informing their participants that
the color–flavor link would often be misleading (i.e., that the solutions would frequently be presented
in an inappropriate color; cf. Bertelson and Aschersleben 1998). This experimental manipulation
was introduced in order to investigate whether the visual cues would still influence human flavor
perception when the participants were aware of the lack of any meaningful correspondence between
the color and the flavor of the solutions that they were tasting. The participants in Zampini et al.’s
study were presented with strawberry-, lime-, or orange-flavored solutions, or with flavorless solutions,
and requested to identify the flavor of each solution. Each of the different flavors was associated
equiprobably with each of the different colors (red, green, orange, and colorless). This meant that,
for example, the strawberry-flavored solutions were just as likely to be colored red, green, or orange
as to be presented as a colorless solution. Therefore, each of the solutions might have been colored
either “appropriately” or “inappropriately” (consisting of incongruently colored or colorless solu-
tions). The participants were informed that they would often be tricked because the color of a solution
would frequently fail to correspond to the flavor typically associated with that color.
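The equiprobable pairing of flavors and colors described above can be sketched as a balanced design matrix. This is a hypothetical illustration: the flavor and color labels follow the study, but the number of repetitions per flavor–color cell is assumed.

```python
import random
from collections import Counter
from itertools import product

# Flavor and color labels follow Zampini et al.'s (2007) identification task;
# the number of repetitions per flavor-color cell is assumed for illustration.
flavors = ["strawberry", "lime", "orange", "flavorless"]
colors = ["red", "green", "orange", "colorless"]
REPEATS_PER_CELL = 2  # hypothetical

# Fully crossed design: every flavor appears equally often with every color,
# so color carries no information about flavor across the session.
trials = list(product(flavors, colors)) * REPEATS_PER_CELL
random.shuffle(trials)

cell_counts = Counter(trials)
assert all(n == REPEATS_PER_CELL for n in cell_counts.values())
print(len(trials))  # 4 flavors x 4 colors x 2 repeats = 32 trials
```

Because the cells are balanced, a participant who ignored taste and smell entirely and guessed from color alone could not exceed chance accuracy across the session.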
The most important finding to emerge from Zampini et al.’s (2007) study was that color infor-
mation had a strong impact on flavor identification even when participants were informed that the
colors of the drinks that they were testing were often misleading. In particular, flavors associ-
ated with appropriate colors (e.g., lime flavor–green color; orange flavor–orange color) or presented in
colorless solutions were recognized far more accurately than when they were given an inappropriate color-
ing (i.e., lime-flavored drinks that were colored either red or orange; orange-flavored drinks that
were colored either green or red). These results therefore show that inappropriate coloring tends to
lead to impaired flavor discrimination responses, whereas appropriate coloring does not necessarily
improve the accuracy of participants’ flavor discrimination responses (at least when compared to
the flavor discrimination accuracy for the colorless solutions). Interestingly, however, no significant
effect of color was shown for the strawberry-flavored solutions. That is, the inappropriate coloring
of the strawberry-flavored solutions (i.e., when those solutions were colored green or orange) did not
result in a significant reduction in the participants’ ability to recognize the actual strawberry flavor.
One possible explanation for this result is that those flavors that are more strongly associated with
a particular color are more difficult to identify when presented in inappropriately colored solutions
(see Shankar et al. 2009). In fact, Zampini et al. (2007, Experiment 1; see Table 37.1) showed that
the link between color and a specific flavor was stronger for the orange- and green-colored solutions
than for the red-colored solutions. That is, the participants in their study more often matched the
orange color with the flavor of orange and the green color with the flavor of lime. By contrast, the
red color was associated with strawberry, raspberry, and cherry flavors.
Whatever the reason for the difference in the effect of the various colors on participants’ flavor
discrimination responses, it is important to note that Zampini et al.’s (2007) results nevertheless show
that people can still be misled by the inappropriate coloring of a solution even if they know that the
color does not provide a reliable guide to the flavor of the solution. By contrast, the participants in
Assessing the Role of Visual and Auditory Cues in Multisensory Perception of Flavor 743
TABLE 37.1
Flavors Most Frequently Associated with Each Colored Solution
in Zampini et al.’s (2007, Experiment 1) Study
Color Most Associated Flavors
Green Lime (69%)a
Orange Orange (91%)a
Yellow Lemon (89%)a
Blue Spearmint (86%)a
Gray Black currant (53%), licorice (40%)a
Red Strawberry (46%), raspberry (27%), cherry (27%)
Colorless Flavorless (51%)a
Source: Zampini, M. et al., Food Qual. Prefer., 18, 975–984, 2007. With permission.
a Significant color–flavor association tested using χ2 analysis.
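The χ2 analysis flagged in the table footnote can be illustrated with a simple goodness-of-fit test against a uniform-guessing baseline. The response counts below are hypothetical, loosely echoing the 69% green–lime association; they are not the study's raw data.

```python
# Goodness-of-fit chi-square against a uniform-guessing baseline: does one
# color elicit a particular flavor name more often than chance would predict?

def chi_square_uniform(counts):
    """Chi-square statistic for observed counts vs. a uniform expectation."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)

# Hypothetical: 100 responses to a green solution over four candidate flavors,
# loosely echoing the green-lime association in Table 37.1.
observed = [69, 12, 10, 9]  # lime, strawberry, orange, flavorless
stat = chi_square_uniform(observed)
print(round(stat, 2))  # 103.44
```

With 3 degrees of freedom, a statistic this large lies far beyond the conventional .05 critical value (about 7.81), consistent with the significant color–flavor associations marked in the table.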
the majority of previous studies in this area (e.g., DuBose et al. 1980; Johnson and Clydesdale 1982;
Morrot et al. 2001; Oram et al. 1995; Philipsen et al. 1995; Roth et al. 1988; Stillman 1993; Zellner
and Durlach 2003) were not explicitly informed that the flavors of the solutions might not be paired
with the appropriately colored solutions. Zampini et al.’s results therefore suggest that the modula-
tory role of visual information on multisensory flavor perception is robust enough to override any
awareness that participants might have (e.g., as informed by the experimenter) concerning the lack
of congruency between the color and the flavor of the solutions that they taste. However, it would be
interesting in future research to investigate whether knowing that there is no meaningful relation-
ship between the color of the solutions and their flavor would modulate (i.e., reduce vs. enhance) the
influence of colors on flavor perception, as compared to the situation in which the participants are
not given any prior information about whether the colors are meaningfully related to the flavors.
* The individual differences in taste sensitivity most extensively studied are those for the bitterness intensity of PROP [and
phenylthiocarbamide (PTC) in earlier work]. Supertasters, medium tasters, and nontasters rate the bitterness of PROP as
very to intensely strong, moderate to strong, and weak, respectively. Research using taste solutions has identified other
differences in the three taster groups (see Prescott et al. 2004). Different PROP taster groups reported different taste
intensities and liking of other bitter, salty, sweet, and fat-containing substances. The three different PROP taster groups
are known to possess corresponding genetic differences. In particular, studies of taste genetics have revealed the exis-
tence of multiple bitterness receptor genes (Kim et al. 2004; see also Bufe et al. 2005; Duffy 2007).
that they experienced on a Labelled Magnitude Scale (e.g., Green et al. 1993). The participants were
then classified into one of three taster groups: nontasters, medium tasters, and supertasters based
on the cutoff values (nontasters < 10.90; 10.91 < medium tasters < 61.48; supertasters > 61.49; see
also Essick et al. 2003, for a similar criterion). Zampini et al.’s findings revealed that the modulatory
cross-modal effect of visual cues on people’s flavor identification responses was significantly more
pronounced in the nontasters than in the medium tasters, who, in turn, were influenced to a greater
extent by visual cues on their flavor identification responses than were the supertasters (see Figure
37.1). In particular, the nontasters (and, to a lesser extent, medium tasters) identified the flavors of
the solutions significantly more accurately when they were colored appropriately than when they
were colored inappropriately (or else were presented as colorless solutions). By contrast, the super-
tasters identified the flavors of the solutions more accurately overall, and their performance was not
affected by the colors of the solutions.
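The cutoff-based classification described above can be sketched in a few lines of code (a minimal illustration; the function name and the example ratings are hypothetical, while the cutoff values are those used by Zampini et al. 2008):

```python
def classify_taster(prop_bitterness: float) -> str:
    """Classify a participant from their PROP bitterness rating on the
    Labelled Magnitude Scale, using the cutoffs reported in the text
    (nontasters < 10.90; medium tasters 10.91-61.48; supertasters > 61.49)."""
    if prop_bitterness <= 10.90:
        return "nontaster"
    elif prop_bitterness <= 61.48:
        return "medium taster"
    else:
        return "supertaster"

# Hypothetical ratings for three participants
print([classify_taster(r) for r in (5.2, 35.0, 80.1)])
# → ['nontaster', 'medium taster', 'supertaster']
```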
Zampini et al.’s (2008) results are consistent with recent accounts of sensory dominance derived
from studies of cross-modal interactions between tactile, visual, and auditory stimuli (see, e.g.,
Alais and Burr 2004; Ernst and Banks 2002). Ernst and Banks used the maximum likelihood
[Figure 37.1 near here: three rows of panels (nontasters, medium tasters, supertasters), each plotting ratings (0–75) against the color of the solutions: yellow, gray, orange, red, colorless.]
FIGURE 37.1 Mean flavor intensity ratings for three groups of participants (nontasters, medium tasters, and
supertasters) for blackcurrant, orange, and flavorless solutions presented in Zampini et al.’s (2008) study of
effects of color cues on multisensory flavor perception in humans. Black columns represent solutions where
fruit acids had been added and white columns represent solutions without fruit acids. Error bars represent
between-participants standard errors of the means. (Reprinted from Zampini, M. et al., Food Qual. Prefer.,
18, 975–984, 2007. With permission.)
Assessing the Role of Visual and Auditory Cues in Multisensory Perception of Flavor 745
estimation (MLE) approach to argue that the contribution of a given sensory input to multisensory
perception is determined by weighting the sensor estimates in each sensory modality by the noise
(or variance) present in that modality. It could be argued that in Zampini et al.’s study, the estimates
of the flavors of the fruit-flavored solutions by the nontasters were simply more variable (i.e., their
judgments were less sensitive) than those of either the medium tasters or the supertasters. As a con-
sequence, given the presumably uniform levels of visual discriminability across these three groups
of participants, the MLE account would predict that nontasters should weigh the visual cues more
highly when making their responses than the medium tasters, who in turn should weigh the gusta-
tory cues less highly than the supertasters, just as we observed. It will be an interesting question
for future research to determine whether flavor discrimination responses can be modeled using the
MLE approach. It is important to note here that such an analysis may also be able to reveal whether
there are any underlying attentional biases (to weight information from one sensory modality more
highly than information from another modality) that may be present in the different taster groups
(cf. Battaglia et al. 2003). Moreover, it is interesting to consider at this point that although more than
100 studies examining visual contributions to flavor perception have been published over the past 80
years, Zampini et al.’s study represents the first attempt to take the taster status of participants into
consideration when analyzing their results. The results of Zampini et al.’s study clearly demonstrate
that taster status plays an important role in modulating the cross-modal contribution of visual cues
to flavor perception in fruit-flavored beverages.*
* However, it should also be noted that a relatively small number of participants was tested in each category (i.e., four
nontasters, five medium tasters, and five supertasters), which places a caveat on any attempt to generalize from Zampini
et al.'s (2008) findings. Future studies should therefore assess taster status with much larger sample sizes.
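The MLE account sketched above can be made concrete. Under the Ernst and Banks (2002) model, the combined estimate is a weighted average of the unimodal estimates, with each weight proportional to the inverse variance (i.e., the reliability) of that modality. The numbers below are hypothetical and serve only to illustrate why a noisier gustatory channel yields a larger visual weight:

```python
def mle_combine(est_vision, var_vision, est_taste, var_taste):
    """Optimal (inverse-variance-weighted) combination of two unimodal
    estimates, after Ernst and Banks (2002):
        w_v = (1/var_v) / (1/var_v + 1/var_t)."""
    w_v = (1.0 / var_vision) / (1.0 / var_vision + 1.0 / var_taste)
    w_t = 1.0 - w_v
    combined = w_v * est_vision + w_t * est_taste
    combined_var = 1.0 / (1.0 / var_vision + 1.0 / var_taste)
    return combined, w_v, combined_var

# Same visual reliability for everyone; only the gustatory noise differs
# by taster group (all variances are illustrative, not fitted values).
for group, var_taste in [("nontaster", 4.0), ("medium taster", 1.0),
                         ("supertaster", 0.25)]:
    _, w_v, _ = mle_combine(est_vision=1.0, var_vision=1.0,
                            est_taste=0.0, var_taste=var_taste)
    print(f"{group}: visual weight = {w_v:.2f}")
# → nontaster 0.80, medium taster 0.50, supertaster 0.20
```

With uniform visual reliability, the visual weight falls from 0.80 to 0.20 as gustatory precision improves, which is the ordering Zampini et al. (2008) observed behaviorally.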
that are produced for the UK market contain orange-flavored chocolate, whereas all of the other
colors contain unadulterated milk chocolate. By contrast, Smarties that have been produced for
other markets all contain unadulterated milk chocolate, regardless of their color. Crucially, the par-
ticipants were sometimes presented with pairs of stimuli that differed in their color but not in their
flavor, or with pairs of Smarties that differed in both their color and flavor, or else with pairs of
Smarties that differed in their flavor but not their color.
In a preliminary questionnaire, a number of the participants in Levitan et al.’s (2008) study
stated their belief that a certain non-orange (i.e., red and green) Smartie had a distinctive flavor
(which is incorrect), whereas other participants believed (correctly) that all the non-orange Smarties
tasted the same. In the first experiment, the participants were presented with all possible pairings of
orange, red, and green Smarties and were asked to judge whether a given pair of Smarties differed
in flavor by tasting them while either sighted or blindfolded. The results showed that people’s beliefs
concerning specific color–flavor associations for Smarties exerted a significant modulatory effect
on their flavor responses. In the sighted condition, those participants who believed that non-orange
Smarties all tasted the same were more likely to judge correctly that a red–green pairing of Smarties
tasted identical than were those who believed otherwise; the latter group performed at a level that
was significantly below chance (i.e., they reported that the red and green Smarties tasted different
on the majority of trials). In other words, those participants who thought that the flavors of the red
and green Smarties differed did indeed judge the two Smarties as tasting different far more frequently
than did participants who held no such belief.
The results of Levitan et al.’s study are consistent with the results of the other studies presented in
this section in showing that food color can have a powerful cross-modal influence on people’s per-
ception of the flavor of food. However, Levitan et al.’s findings show that people’s beliefs about the
cross-modal color–flavor associations of specific foods can modulate this influence, and that such
cognitive influences can be robust and long-lasting despite extensive experience with the particular
food item concerned.*
In another recent study, Shankar et al. (2009) found that another variety of sugar-coated choco-
late candies (multicolored M&Ms, which are all physically identical in taste) were rated as having a
stronger chocolate flavor when they were labeled as “dark chocolate” than when they were labeled
as “milk chocolate.” Many other studies have found a similar effect of expectations produced by
labeling a stimulus before sampling on flavor perception (see Cardello 1994; Deliza and MacFie
1996; Lee et al. 2006; Yeomans et al. 2008; Zellner et al. 2004, for reviews). Shankar et al. also
investigated whether the influence of expectations on flavor perception might be driven by color
information (see Levitan et al. 2008). In their study, participants were asked to evaluate how
"chocolatey" they found green- or brown-colored M&Ms. Participants rated the brown M&Ms as
more "chocolatey" than the green ones, a result suggesting that the color brown generates stronger
expectations of "chocolate" than does green (cf. Duncker 1939). Finally, Shankar et al. studied
whether there was an interaction between the expectation generated by either color or label on mul-
tisensory flavor perception. The participants were again presented with brown- or green-colored
M&Ms and informed about the “chocolate category” (i.e., either “milk chocolate” or “dark choco-
late”) with each color–label combination (green–milk, brown–milk, green–dark, brown–dark)
presented in a randomized order. Brown-colored M&Ms were given a higher chocolatey rating
than green-colored M&Ms. Similarly, those labeled as “dark chocolate” were given higher ratings
than those labeled “milk chocolate.” However, no interaction between these colors and labels was
found, thus suggesting that these two factors exerted independent effects, implying that two distinct
associations were being retrieved from memory and then utilized (e.g., the color–flavor association
* It is interesting to note that the participants in Levitan et al.’s (2008) study were able to maintain such inappropriate
beliefs about differently colored Smarties tasting different, despite the objective evidence that people perceive no differ-
ence in their flavor, and the fact that they have presumably had extensive previous exposure to the fact that these colors
provide no useful information in this foodstuff.
and the label–flavor association). Shankar et al.’s findings therefore provide the first evidence that
color can influence the flavor of a product whose flavor identity cannot be predicted by its color. In
other words, the colors of the coatings of the M&Ms are independent of their taste (which is always
chocolate).
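The additive pattern just described can be made concrete with a simple 2 × 2 interaction contrast. The mean ratings below are hypothetical (they are not Shankar et al.'s data); if color and label exert independent effects, the size of the label effect is the same at each color, so the contrast comes out at zero:

```python
# Hypothetical mean "chocolatey" ratings (0-100) for each color-label cell.
ratings = {
    ("brown", "dark"): 70.0, ("brown", "milk"): 60.0,
    ("green", "dark"): 55.0, ("green", "milk"): 45.0,
}

# Label effect computed separately at each color, then the interaction
# contrast: additivity (no interaction) means this difference is ~0.
label_effect_brown = ratings[("brown", "dark")] - ratings[("brown", "milk")]
label_effect_green = ratings[("green", "dark")] - ratings[("green", "milk")]
interaction = label_effect_brown - label_effect_green
print(interaction)  # → 0.0, i.e., purely additive effects of color and label
```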
One final issue that remains unresolved here concerns the extent to which the influence of color
on flavor discrimination reflects a perceptual versus a more decisional effect, or whether instead
both perceptual and decisional factors may contribute to participants’ performance (see Spence et
al., submitted; and Zampini et al. 2007, on this point). If it is a purely perceptual effect, the par-
ticipant's gustatory experience should itself be changed by viewing the color; that is, knowledge of
the color might improve the sensitivity of participants' flavor discrimination responses by reducing
the variability of the multisensory flavor signal. According to the decisional account, by contrast,
the gustatory experience should remain the same for a given color–flavor pairing regardless of
whether participants were sighted or blindfolded; what changes instead is their decisional criterion.
In Levitan et al.'s (2008) study, the participants who were uncertain of their responses for
a given pair of Smarties might have biased their choice toward making different responses because
they could see that they had a different color. By contrast, those participants who already knew that
red and green Smarties were normally identical in taste might have been biased toward making a
same response. In the case of olfaction, Engen (1972) has already shown results consistent with the
claim that color can influence odor perception as a result of its effect on decisional mechanisms, but
this does not, of course, necessarily rule out a role for perceptual interactions as well, at least when
tested under the appropriate experimental conditions (see Zellner and Kautz 1990).
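The perceptual-versus-decisional distinction maps directly onto signal detection theory: sensitivity (d′) indexes a genuinely perceptual change, whereas the criterion (c) indexes a response bias. The sketch below uses hypothetical hit and false-alarm rates (not data from the studies discussed) to show how seeing two different colors could shift the criterion toward "different" responses without any change in sensitivity:

```python
from statistics import NormalDist

def dprime_and_criterion(hit_rate, fa_rate):
    """Standard equal-variance signal detection indices:
    d' = z(H) - z(F);  c = -(z(H) + z(F)) / 2."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate), -(z(hit_rate) + z(fa_rate)) / 2.0

# Hypothetical "different flavor?" judgments, blindfolded vs. sighted.
d_blind, c_blind = dprime_and_criterion(hit_rate=0.70, fa_rate=0.30)
d_sight, c_sight = dprime_and_criterion(hit_rate=0.85, fa_rate=0.50)
print(f"blindfolded: d'={d_blind:.2f}, c={c_blind:.2f}")
print(f"sighted:     d'={d_sight:.2f}, c={c_sight:.2f}")
```

Here both hits and false alarms rise in the sighted condition, so d′ stays essentially constant while c becomes more liberal, which is exactly the signature the decisional account predicts.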
However, it is possible to hypothesize that a person’s beliefs about particular foods tasting dif-
ferent if they have a different color may paradoxically result in them actually tasting different.
Analogously, de Craen et al. (1996) discussed a number of findings showing that color cues modu-
late the effectiveness of medicines as well as placebo pills. Although the mechanism behind placebo
effects such as these is not as yet well understood, the effects themselves are nevertheless robust
(e.g., for a recent review, see Koshi and Short 2007). What is more, just as in Levitan et al.’s (2008)
Smarties experiment, there is at least some evidence that different people may hold different beliefs
about differently colored pills, and that these beliefs can carry over into the actual effects that the
differently colored placebo pills are shown to have (Lucchelli et al. 1978). Therefore, if people’s
beliefs about color and medication can affect their physical state (e.g., resulting in a genuine change
in their tolerance for pain, say, or in their ability to sleep), it would seem conceivable that a person's
belief that a certain colored Smartie tastes distinctive (relative to a Smartie of a different color)
might, paradoxically, result in that Smartie actually tasting different to that person, despite there
being no physical difference in flavor.
findings revealed that the presentation of appropriate odor–color combinations (e.g., odor of straw-
berry matched with red color) increased the brain activity seen in the OFC when compared with the
brain activation seen in the odor-alone conditions. By contrast, there was a suppression of neural
activity in the same area when inappropriate color–odor combinations were presented (e.g., when
the odor of strawberry was presented with a turquoise patch of color on the monitor; see also De
Araujo et al. 2003). Taken together, these results would appear to suggest that presenting an appro-
priate color–odor association may actually lead to increased neural activity in brain areas respon-
sible for processing olfactory stimuli, whereas presenting inappropriate color–odor associations can
suppress brain activity below that observed to the odors alone. The positive correlation between the
perceived congruency of color–odor pairs and the changes in the pattern of brain activation found
in Osterbauer et al.’s study (see also Skrandies and Reuther 2008), therefore, provides a neurophysi-
ological basis for the perceptual changes elicited by changing the color of food.
perception is particularly interesting for multisensory researchers precisely because the rules of
integration, and cross-modal influence, are likely to be somewhat different.
In the previous sections, we also discussed how individual differences can affect the nature
of the cross-modal visual–flavor interactions that are observed. In particular, visual influences on
multisensory flavor perception can be significantly modulated as a function of the taster status of
the participant. Visual dominance effects in multisensory flavor perception are more pronounced
in those participants who are less sensitive to gustatory cues (i.e., nontasters) than in supertasters,
who appear to enjoy the benefit of enhanced gustatory resolution. Therefore, taster status, although
often neglected in studies investigating color–flavor interactions, should certainly be considered
more carefully in any future research in this area. Finally, we have reviewed the role of the expectan-
cies generated by visual information in overall food perception.
freshness of potato chips would be affected merely by modifying the sounds produced during the
biting action. The Pringles potato chips used in their experiment were all virtually identical in their
visual (i.e., shape) and oral–tactile (i.e., texture) attributes. The participants in this study had to take
a single bite with their front teeth into a large number (180) of potato chips (otherwise known as
crisps in the United Kingdom), with their mouth placed directly above a microphone, and then to
spit the crisp out (without swallowing) into a bowl placed on their lap. They then rated the crispness
and freshness of each potato chip using a computer-based visual analog scale. On any given trial,
the participants heard either the veridical sounds they made when biting into a crisp, without any
frequency adjustment, or else biting sounds in which the frequencies in the 2–20 kHz range had
been amplified or attenuated by 12 dB.
Furthermore, for each frequency manipulation, there was an attenuation of the overall volume of
0 (i.e., no attenuation), 20, or 40 dB. The results showed that the perception of both crispness and
freshness were affected by the modulation of the auditory cues produced during the biting action. In
particular, the potato chips were perceived as being both crisper and fresher when either the overall
sound level was increased, or when just the high frequency sounds (in the range of 2–20 kHz) were
selectively amplified (see Figure 37.2).
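The frequency manipulation described above can be approximated digitally. Zampini and Spence's exact signal chain is not specified here; the FFT-based sketch below simply boosts the 2–20 kHz band of a recorded biting sound by 12 dB, as an illustrative stand-in:

```python
import numpy as np

def boost_high_frequencies(signal, fs, lo=2_000.0, hi=20_000.0, gain_db=12.0):
    """Amplify the lo-hi Hz band of `signal` (sampled at `fs` Hz) by
    `gain_db` decibels, using a simple FFT brick-wall filter."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)
    spectrum[band] *= 10.0 ** (gain_db / 20.0)  # +12 dB is roughly x3.98 in amplitude
    return np.fft.irfft(spectrum, n=len(signal))

# Example: boost the "crispness band" of one second of sound
# (random noise stands in here for a recorded biting sound).
fs = 44_100
rng = np.random.default_rng(0)
bite = rng.standard_normal(fs)
crisper = boost_high_frequencies(bite, fs)
```

The attenuated condition corresponds to `gain_db=-12.0`, and the overall-level manipulation (0, −20, or −40 dB) is simply a uniform scaling of the whole signal.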
[Figure 37.2 near here: (a) schematic of the apparatus (microphone, headphones, response pedals); (b) perceived crispness and (c) perceived freshness (magnitude estimation, 0–100, softer–crisper and staler–fresher) plotted against the frequency manipulation (attenuate, normal, amplify) for each overall attenuation level (0, −20, −40 dB).]
FIGURE 37.2 (a) Schematic view of apparatus and participant in Zampini et al.’s (2004) study. Door of experi-
mental booth was closed during the experiment and response scale was viewed through the window in left-hand
side wall of booth. Mean responses for soft–crisp (b) and fresh–stale (c) response scales for three overall attenuation
levels (0, −20, or −40 dB) against three frequency manipulations (high frequencies attenuated, veridical auditory
feedback, or high frequencies amplified) are reported. Error bars represent between-participants standard errors of
means. (Reprinted from Zampini, M., and Spence, C., J. of Sens. Stud., 19, 347–363, 2004. With permission.)
Given that the crisps in Zampini and Spence's (2004) study were very similar to each other in
terms of their visual, tactile, and flavor attributes, the only perceptual aspect that varied during the
task was the sound (which, of course, also contributes to flavor). Therefore, participants may have
"felt" that the crisps had a different texture guided only by the sound, since the other senses always
received the same information. Additional evidence highlighting the powerful effect of auditory
cues on the overall perception of the crisps was that the majority of the participants (15 out of 20)
stated anecdotally on debriefing after the experiment that they believed the crisps to have been
selected from different packages. Additionally, the majority of the participants also reported that
the auditory information had been more salient than the oral tactile information, and this may also
help to account for the effects reported by Zampini and Spence. In fact, one of the fundamental laws
of multisensory integration that has emerged over the past few decades states that the sense that
provides the more reliable (or salient) information is the one that dominates, or modulates, percep-
tion in another sensory modality (e.g., Ernst and Banks 2002; Shimojo and Shams 2001; Welch and
Warren 1980). Alternatively, however, the sensory dominance effect might be explained by the human
brain relying on whichever sense is most strongly attended (Spence and Shankar 2010). The role of attention in the
multisensory influence of auditory information on food perception is consistent with the results of a
study in which the participants had to try and detect weak solutions of sucrose or citric acid in a mix-
ture (Marks and Wheeler 1998). Participants were more accurate at detecting the tastant they were
attending to than for the tastant they were not attending to (see also Ashkenazi and Marks 2004).
Marks and Wheeler suggested that our ability to detect a particular sensory quality (e.g., tastant or
flavor) may be modulated by selective attention toward (or away from) that quality. In a similar
vein, one might suggest that increasing the overall loudness of the sounds produced when biting
into crisps changes participants' perception of crispness by making those sounds more pronounced
than they would have been had this information been derived solely from the texture in the mouth
or from normal-level auditory cues. That is, participants' attention would be directed toward this
feature of the food by an external change in the relative weighting of the sensory cues that signify
it. Louder sounds are also presumably more likely to
capture a person’s attention than quieter sounds. However, at present, it is unclear how many of the
findings taken to support an attentional account of any sensory dominance effect can, in fact, be
better accounted for in terms of sensory estimates of stimulus attributes simply being more accu-
rate (i.e., less variable) in the dominant modality than those in the other modalities (e.g., Alais and
Burr 2004; Battaglia et al. 2003; Ernst and Banks 2002). Finally, it is important to note that these
explanations are not mutually exclusive. For example, Zampini and Spence’s (2004) results can be
accounted for either in terms of attentional capture or in terms of multisensory integration.
therefore show that auditory cues can modulate the perception of the carbonation of a water sample
held in the hand, but cannot modulate people’s perception of a water sample held in the mouth. This
might be because the perception of carbonation in the mouth is more dependent on oral–somatosen-
sory and/or nociceptive inputs than on auditory cues, or, alternatively, because it is more important
that we correctly perceive stimuli once they have entered the oral cavity (see Koza et al. 2005). Once
again, these findings are consistent with the hypothesis that the modality dominating multisensory
perception (when the senses are put into conflict) is the most accurate and/or informative sense (e.g.,
see Ernst and Banks 2002).
37.4 CONCLUSIONS
The past few years have seen a rapid growth of interest in the multisensory aspects of food perception
(see Auvray and Spence 2008; Delwiche 2004; Prescott 1999, 2004; Stevenson 2009; Stevenson and
Tomiczek 2007; Stillman 2002; Verhagen and Engelen 2006, for reviews). The research reviewed here
highlights the profound effect that visual (i.e., color of food) and auditory cues (i.e., variations in the
overall sound level and variations in the spectral distribution of energy) can have on people's
perception of foodstuffs (such as potato chips and beverages). When people are asked to identify the
flavors of foods and beverages, their responses can be influenced by the colors of those foods and beverages. In
particular, the identification of specific flavors has often been shown to be less accurate when they are
paired with an inappropriate color (e.g., DuBose et al. 1980; Zampini et al. 2007, 2008). Our percep-
tion of the flavor and physical characteristics of food and beverages can also be modulated by auditory
cues. For instance, it is possible to change the perceived crispness of crisps or the perceived fizziness
of a carbonated beverage (such as sparkling water) simply by modifying the sounds produced when
eating the crisps or by the bubbles of the sparkling water (Zampini et al. 2004, 2005).
It is important to note that visual and auditory information are available at different stages of eat-
ing. Typically, visual (not to mention orthonasal olfactory and, on occasion, auditory) cues are avail-
able long before our ingestion of food (and before any other sensory cues associated with the food
are available). Therefore, visual cues (e.g., food colors) might be expected to create an expectancy
concerning the possible flavor of the food to be eaten (Hutchings 1977; Shankar et al. 2010). By
contrast, any role of expectancy might be reduced when considering the potential influence of
auditory cues on the perception of food, given that the sounds produced when biting into or chewing
food only become available at the moment of consumption. Therefore, it is possible to hypothesize that the role of
multisensory integration is somewhat different when looking at the role of visual and auditory cues
on the overall food perception. Given that visual cues are typically available long before a food is
consumed and outside the mouth, it is quite unlikely that visual–flavor interactions are modulated
by the spatial and temporal rules (i.e., greater multisensory interaction with spatial and temporal
coincidence between the stimuli; see Calvert et al. 2004, for a review). Therefore, visual influences
on multisensory flavor perception are better explained by looking at the role of expectancy than at
the role of the spatial and temporal rules, which might help us to understand the role of auditory
cues on food perception instead. However, some sounds might produce an expectancy effect as well.
For example, the sound of the food package being opened will normally precede the consumption of a
particular packaged food item (think only of the rattling of the crisps packet). Several researchers
have demonstrated that people’s expectations regarding what they are about to consume can also
have a significant effect on their perception of pleasantness of the food or drink itself (see Spence
et al., in press, for a recent review). It is also important to note that the visual and auditory con-
tribution to multisensory flavor perception typically takes place without people necessarily being
consciously aware that what they are seeing or hearing is influencing their overall flavor experience
(e.g., Zampini et al. 2004, 2005). In Zampini et al.’s more recent research (e.g., Zampini et al. 2007,
2008), the participants were influenced by the inappropriate colors of the beverages that they were
evaluating even though they had been informed beforehand that there might be a lack of congruency
between the colors that they saw and the flavors that they were tasting. This shows, therefore, that
the effect was powerful enough to override participants’ awareness that color information might
mislead their identification of the flavors. The potential role of the sounds made when eating food
on food perception is often overlooked. For example, most of the participants in Zampini et
al.'s (2004) study thought that the crisps were actually different (i.e., drawn from different pack-
ages or differing in freshness and, therefore, in crispness). They seemed unaware that the experi-
menters had changed only the sounds produced when biting into the crisps and that the crisps
themselves did not differ. Nevertheless, the studies reported here are consistent with a growing
number of neurophysiological and electrophysiological studies demonstrating close visual–flavor
(Osterbauer et al. 2005; Small 2004; Small and Prescott 2006; Verhagen and Engelen 2006) and audiotactile
(Gobbelé et al. 2003; Kitagawa and Spence 2006; Levänen et al. 1998; Schroeder et al. 2001; von
Békésy 1957)* interactions at the neuronal level. Results such as these therefore help to emphasize
the limitations that may be associated with relying solely on introspection and verbal report (as is
often the case in commercial consumer testing settings) when trying to measure people’s perception
and evaluation of foodstuffs.
REFERENCES
Alais, D., and D. Burr. 2004. The ventriloquist effect results from near-optimal bimodal integration. Current
Biology 14: 257–262.
Alley, R. L., and T. R. Alley. 1998. The influence of physical state and color on perceived sweetness. Journal
of Psychology: Interdisciplinary and Applied 132: 561–568.
Ashkenazi, A., and L. E. Marks. 2004. Effect of endogenous attention on detection of weak gustatory and olfac-
tory flavors. Perception & Psychophysics 66: 596–608.
Auvray, M., and C. Spence. 2008. The multisensory perception of flavor. Consciousness & Cognition 17:
1016–1031.
Bartoshuk, L. M., V. B. Duffy, and I. J. Miller. 1994. PTC/PROP tasting: Anatomy, psychophysics, and sex
effects. Physiology & Behavior 56: 1165–1171.
Bartoshuk, L. M., K. Fast, T. A. Karrer, S. Marino, R. A. Price, and D. A. Reed. 1992. PROP supertasters and
the perception of sweetness and bitterness. Chemical Senses 17: 594.
Battaglia, P. W., R. A. Jacobs, and R. N. Aslin. 2003. Bayesian integration of visual and auditory signals for
spatial localization. Journal of the Optical Society of America A 20: 1391–1397.
Bertelson, P., and G. Aschersleben. 1998. Automatic visual bias of perceived auditory location. Psychonomic
Bulletin & Review 5: 482–489.
Blackwell, L. 1995. Visual clues and their effects on odour assessment. Nutrition and Food Science 5: 24–28.
Bourne, M. C. 1982. Food texture and viscosity. New York: Academic Press.
Bufe, B., P. A. Breslin, C. Kuhn et al. 2005. The molecular basis of individual differences in phenylthiocarbam-
ide and propylthiouracil bitterness perception. Current Biology 15: 322–327.
Bult, J. H. F., R. A. de Wijk, and T. Hummel. 2007. Investigations on multimodal sensory integration: Texture,
taste, and ortho- and retronasal olfactory stimuli in concert. Neuroscience Letters 411: 6–10.
Cain, W. S. 1977. History of research on smell. In Handbook of perception: Vol. 6a: Tasting and smelling, ed.
E. C. Carterette and M. P. Friedman, 197–229. New York: Academic Press.
Calvert, G., C. Spence, and B. E. Stein. 2004. The handbook of multisensory processing. Cambridge, MA: MIT
Press.
Cardello, A. V. 1994. Consumer expectations and their role in food acceptance. In Measurement of food
preferences, ed. H. J. H. MacFie, and D. M. H. Thomson, 253–297. London: Blackie Academic &
Professional.
Chan, M. M., and C. Kane-Martinelli. 1997. The effect of color on perceived flavour intensity and acceptance
of foods by young adults and elderly adults. Journal of the American Dietetic Association 97: 657–659.
Chandrashekar, J., D. Yarmolinsky, L. von Buchholtz et al. 2009. The taste of carbonation. Science 326:
443–445.
* However, it is important to note that, to the best of our knowledge, no neuroimaging studies have as yet been conducted
to investigate the role of auditory cues on multisensory food perception (cf. Spence and Zampini 2006; Verhagen and
Engelen 2006).
Chen, H., C. Karlsson, and M. Povey. 2005. Acoustic envelope detector for crispness assessment of biscuits.
Journal of Texture Studies 36: 139–156.
Christensen, C. M. 1980a. Effects of taste quality and intensity on oral perception of viscosity. Perception &
Psychophysics 28: 315–320.
Christensen, C. M. 1980b. Effects of solution viscosity on perceived saltiness and sweetness. Perception &
Psychophysics 28: 347–353.
Christensen, C. M., and Z. M. Vickers. 1981. Relationship of chewing sounds to judgments of food crispness.
Journal of Food Science 46: 574–578.
Clydesdale, F. M. 1993. Color as a factor in food choice. Critical Reviews in Food Science and Nutrition 33:
83–101.
Dalton, P., N. Doolittle, H. Nagata, and P. A. S. Breslin. 2000. The merging of the senses: Integration of sub-
threshold taste and smell. Nature Neuroscience 3: 431–432.
Davis, R. 1981. The role of nonolfactory context cues in odor identification. Perception & Psychophysics 30:
83–89.
De Araujo, I. E. T., E. T. Rolls, M. L. Kringelbach, F. McGlone, and N. Phillips. 2003. Taste–olfactory con-
vergence, and the representation of the pleasantness of flavour, in the human brain. European Journal of
Neuroscience 18: 2059–2068.
de Craen, A. J. M., P. J. Roos, A. L. de Vries, and J. Kleijnen. 1996. Effect of colour of drugs: Systematic review
of perceived effect of drugs and their effectiveness. British Medical Journal 313: 1624–1626.
Deliza, R., and H. MacFie. 1996. The generation of sensory expectation by external cues and its effect on sen-
sory perception and hedonic ratings: A review. Journal of Sensory Studies 11: 103–128.
Delwiche, J. 2004. The impact of perceptual interactions on perceived flavour. Food Quality and Preference
15: 137–146.
Demattè, M. L., D. Sanabria, and C. Spence. 2006. Crossmodal associations and interactions between olfaction
and vision. Chemical Senses 31: E50–E51.
Demattè, M. L., D. Sanabria, and C. Spence. 2009. Olfactory identification: When vision matters? Chemical
Senses 34: 103–109.
Drake, B. K. 1963. Food crunching sounds. An introductory study. Journal of Food Science 28: 233–241.
Drake, B. K. 1970. Relationships of sounds and other vibrations to food acceptability. Proceedings of the 3rd
International Congress of Food Science and Technology, pp. 437–445. August 9–14, Washington, DC.
Drewnowski, A. 2003. Genetics of human taste perception. In Human olfaction and gustation, 2nd ed., ed.
R. L. Doty, 847–860. New York: Marcel Dekker, Inc.
DuBose, C. N., A. V. Cardello, and O. Maller. 1980. Effects of colourants and flavourants on identification,
perceived flavour intensity, and hedonic quality of fruit-flavoured beverages and cake. Journal of Food
Science 45: 1393–1399, 1415.
Duffy, V. B. 2007. Variation in oral sensation: Implications for diet and health. Current Opinion in
Gastroenterology 23: 171–177.
Duncker, K. 1939. The influence of past experience upon perceptual properties. American Journal of Psychology
52: 255–265.
Engen, T. 1972. The effect of expectation on judgments of odour. Acta Psychologica 36: 450–458.
Ernst, M. O., and M. S. Banks. 2002. Humans integrate visual and haptic information in a statistically optimal
fashion. Nature 415: 429–433.
Essick, G. K., A. Chopra, S. Guest, and F. McGlone. 2003. Lingual tactile acuity, taste perception, and the den-
sity and diameter of fungiform papillae in female subjects. Physiology & Behavior 80: 289–302.
Frank, R. A., K. Ducheny, and S. J. S. Mize. 1989. Strawberry odor, but not red color, enhances the sweetness
of sucrose solutions. Chemical Senses 14: 371–377.
Garber Jr., L. L., E. M. Hyatt, and Ü. Ö. Boya. 2008. The mediating effects of the appearance of nondurable
consumer goods and their packaging on consumer behavior. In Product experience, ed. H. N. J.
Schifferstein and P. Hekkert, 581–602. London: Elsevier.
Garber Jr., L. L., E. M. Hyatt, and R. G. Starr Jr. 2000. The effects of food colour on perceived flavour. Journal
of Marketing Theory and Practice 8: 59–72.
Garber Jr., L. L., E. M. Hyatt, and R. G. Starr Jr. 2001. Placing food color experimentation into a valid con-
sumer context. Journal of Food Products Marketing 7: 3–24.
Gifford, S. R., and F. M. Clydesdale. 1986. The psychophysical relationship between colour and sodium chlo-
ride concentrations in model systems. Journal of Food Protection 49: 977–982.
Gifford, S. R., F. M. Clydesdale, and R. A. Damon Jr. 1987. The psychophysical relationship between colour
and salt concentration in chicken flavoured broths. Journal of Sensory Studies 2: 137–147.
Assessing the Role of Visual and Auditory Cues in Multisensory Perception of Flavor 755
Gobbelé, R., M. Schürmann, N. Forss, K. Juottonen, H. Buchner, and R. Hari. 2003. Activation of the human
posterior parietal and temporoparietal cortices during audiotactile interaction. Neuroimage 20: 503–511.
Green, B. G., G. S. Shaffer, and M. M. Gilmore. 1993. Derivation and evaluation of a semantic scale of oral
sensation with apparent ratio properties. Chemical Senses 18: 683–702.
Hollowood, T. A., R. S. T. Linforth, and A. J. Taylor. 2002. The effect of viscosity on the perception of flavour.
Chemical Senses 27: 583–591.
Hutchings, J. B. 1977. The importance of visual appearance of foods to the food processor and the consumer.
In Sensory properties of foods, ed. G. G. Birch, J. G. Brennan, and K. J. Parker, 45–57. London: Applied
Science Publishers.
Johnson, J. L., and F. M. Clydesdale. 1982. Perceived sweetness and redness in coloured sucrose solutions.
Journal of Food Science 47: 747–752.
Johnson, J. L., E. Dzendolet, and F. M. Clydesdale. 1983. Psychophysical relationships between sweetness and
redness in strawberry-drinks. Journal of Food Protection 46: 21–25.
Kim, U. K., P. A. Breslin, D. Reed, and D. Drayna. 2004. Genetics of human taste perception. Journal of Dental
Research 83: 448–453.
Kitagawa, N., and C. Spence. 2006. Audiotactile multisensory interactions in information processing. Japanese
Psychological Research 48: 158–173.
Koshi, E. B., and C. A. Short. 2007. Placebo theory and its implications for research and clinical practice: A
review of the recent literature. Pain Practice 7: 4–20.
Koza, B., A. Cilmi, M. Dolese, and D. Zellner. 2005. Color enhances orthonasal olfactory intensity and reduces
retronasal olfactory intensity. Chemical Senses 30: 643–649.
Lavin, J., and H. T. Lawless. 1998. Effects of colour and odor on judgments of sweetness among children and
adults. Food Quality and Preference 9: 283–289.
Lawless, H., P. Rozin, and J. Shenker. 1985. Effects of oral capsaicin on gustatory, olfactory and irritant sensa-
tions and flavor identification in humans who regularly or rarely consume chili pepper. Chemical Senses
10: 579–89.
Lee, L., S. Frederick, and D. Ariely. 2006. Try it, you’ll like it. Psychological Science 17: 1054–1058.
Levänen, S., V. Jousmäki, and R. Hari. 1998. Vibration-induced auditory-cortex activation in a congenitally
deaf adult. Current Biology 8: 869–872.
Levitan, C. A., M. Zampini, R. Li, and C. Spence. 2008. Assessing the role of colour cues and people’s beliefs
about colour–flavour associations on the discrimination of the flavour of sugar-coated chocolates.
Chemical Senses 33: 415–423.
Lucchelli, P. E., A. D. Cattaneo, and J. Zattoni. 1978. Effect of capsule colour and order of administration of
hypnotic treatments. European Journal of Clinical Pharmacology 13: 153–155.
Maga, J. A. 1974. Influence of colour on taste thresholds. Chemical Senses and Flavour 1: 115–119.
Marks, L. E., and M. E. Wheeler. 1998. Attention and the detectability of weak-taste stimuli. Chemical Senses
23: 19–29.
Masuda, M., Y. Yamaguchi, K. Arai, and K. Okajima. 2008. Effect of auditory information on food recognition.
IEICE Technical Report 108(356): 123–126.
Moir, H. C. 1936. Some observations on the appreciation of flavour in food stuffs. Chemistry and Industry 55:
145–148.
Morrot, G., F. Brochet, and D. Dubourdieu. 2001. The colour of odors. Brain and Language 79: 309–320.
Murphy, C., and W. S. Cain. 1980. Taste and olfaction: Independence vs. interaction. Physiology and Behavior
24: 601–605.
Oram, N., D. G. Laing, I. Hutchinson et al. 1995. The influence of flavour and colour on drink identification by
children and adults. Developmental Psychobiology 28: 239–246.
Osterbauer, R. A., P. M. Matthews, M. Jenkinson, C. F. Beckmann, P. C. Hansen, and G. A. Calvert. 2005. Color
of scents: Chromatic stimuli modulate odor responses in the human brain. Journal of Neurophysiology
93: 3434–3441.
Pangborn, R. M. 1960. Influence of colour on the discrimination of sweetness. American Journal of Psychology
73: 229–238.
Pangborn, R. M., and B. Hansen. 1963. The influence of colour on discrimination of sweetness and sourness in
pear-nectar. American Journal of Psychology 76: 315–317.
Philipsen, D. H., F. M. Clydesdale, R. W. Griffin, and P. Stern. 1995. Consumer age affects response to sensory
characteristics of a cherry flavoured beverage. Journal of Food Science 60: 364–368.
Prescott, J. 1999. Flavour as a psychological construct: Implications for perceiving and measuring the sensory
qualities of foods. Food Quality and Preference 10: 349–356.
Prescott, J. 2004. Psychological processes in flavour perception. In Flavour perception, ed. A. J. Taylor and D.
Roberts, 256–278. London: Blackwell Publishing.
Prescott, J., N. Ripandelli, and I. Wakeling. 2001. Binary taste mixture interactions in PROP non-tasters,
medium-tasters and super-tasters. Chemical Senses 26: 993–1003.
Prescott, J., J. Soo, H. Campbell, and C. Roberts. 2004. Response of PROP taster groups to variations in sen-
sory qualities within foods and beverages. Chemical Senses 26: 993–1003.
Reed, D. R. 2008. Birth of a new breed of supertaster. Chemical Senses 33: 489–491.
Rolls, E. T. 2004. Smell, taste, texture, and temperature multimodal representations in the brain, and their rel-
evance to the control of appetite. Nutrition Reviews 62: S193–S204.
Rolls, E. T., and L. L. Baylis. 1994. Gustatory, olfactory, and visual convergence within the primate orbitofron-
tal cortex. Journal of Neuroscience 14: 5437–5452.
Roth, H. A., L. J. Radle, S. R. Gifford, and F. M. Clydesdale. 1988. Psychophysical relationships between perceived
sweetness and colour in lemon- and lime-flavoured drinks. Journal of Food Science 53: 1116–1119, 1162.
Rozin, P. 1982. “Taste–smell confusions” and the duality of the olfactory sense. Perception & Psychophysics
31: 397–401.
Schroeder, C. E., R. W. Lindsley, C. Specht, A. Marcovici, J. F. Smiley, and D. C. Javitt. 2001. Somatosensory in
put to auditory association cortex in the macaque monkey. Journal of Neurophysiology 85: 1322–1327.
Seymour, S. K., and D. D. Hamann. 1988. Crispness and crunchiness of selected low moisture foods. Journal
of Texture Studies 19: 79–95.
Shankar, M. U., C. A. Levitan, J. Prescott, and C. Spence. 2009. The influence of color and label information
on flavor perception. Chemosensory Perception 2: 53–58.
Shankar, M. U., C. Levitan, and C. Spence. 2010. “Grape expectations”: Does higher level knowledge mediate
the interpretation of multisensory cues? Consciousness & Cognition 19: 380–390.
Shimojo, S., and L. Shams. 2001. Sensory modalities are not separate modalities: Plasticity and interactions.
Current Opinion in Neurobiology 11: 505–509.
Skrandies, W., and N. Reuther. 2008. Match and mismatch of taste, odor, and color is reflected by electrical
activity in the human brain. Journal of Psychophysiology 22: 175–184.
Small, D. M. 2004. Crossmodal integration—insights from the chemical senses. Trends in Neurosciences 27:
120–123.
Small, D. M., J. C. Gerber, Y. E. Mak, and T. Hummel. 2005. Differential neural responses evoked by orthona-
sal versus retronasal odorant perception in humans. Neuron 47: 593–605.
Small, D. M., and J. Prescott 2005. Odor/taste integration and the perception of flavour. Experimental Brain
Research 166: 345–357.
Small, D. M., M. G. Veldhuizen, J. Felsted, Y. E. Mak, and F. McGlone. 2008. Separable substrates for anticipa-
tory and consummatory food chemosensation. Neuron 57: 786–797.
Spence, C. 2002. The ICI report on the secret of the senses. London: The Communication Group.
Spence, C., C. Levitan, M. U. Shankar, and M. Zampini. 2010. Does food colour influence flavour identification
in humans? Chemosensory Perception 3: 68–84.
Spence, C., and M. U. Shankar. 2010. The influence of auditory cues on the perception of, and responses to,
food and drink. Journal of Sensory Studies 25: 406–430.
Spence, C., M. U. Shankar, and H. Blumenthal. In press. ‘Sound bites’: Auditory contributions to the percep-
tion and consumption of food and drink. To appear in Art and the senses, ed. F. Bacci and D. Melcher.
Oxford: Oxford Univ. Press.
Spence, C., and M. Zampini. 2006. Auditory contributions to multisensory product perception. Acta Acustica
united with Acustica 92: 1009–1025.
Stevenson, R. J., and R. A. Boakes. 2004. Sweet and sour smells: Learned synesthesia between the senses of
taste and smell. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B. E. Stein,
69–83. Cambridge, MA: MIT Press.
Stevenson, R. J., R. A. Boakes, and J. Prescott. 1998. Changes in odor sweetness resulting from implicit learn-
ing of a simultaneous odor–sweetness association: An example of learned synesthesia. Learning and
Motivation 29: 113–132.
Stevenson, R. J. 2009. The psychology of flavour. Oxford: Oxford Univ. Press.
Stevenson, R. J., and M. Oaten. 2008. The effect of appropriate and inappropriate stimulus color on odor dis-
crimination. Perception & Psychophysics 70: 640–646.
Stevenson, R. J., and C. Tomiczek. 2007. Olfactory-induced synesthesias: A review and model. Psychological
Bulletin 133: 294–309.
Stillman, J. 1993. Colour influences flavour identification in fruit-flavoured beverages. Journal of Food Science
58: 810–812.
Stillman, J. A. 2002. Gustation: Intersensory experience par excellence. Perception 31: 1491–1500.
Strugnell, C. 1997. Colour and its role in sweetness perception. Appetite 28: 85.
Tyle, P. 1993. Effect of size, shape and hardness of particles in suspension on oral texture and palatability. Acta
Psychologica 84: 111–118.
Varela, P., J. Chen, C. Karlsson, and M. Povey. 2006. Crispness assessment of roasted almonds by an integrated
approach to texture description: Texture, acoustics, sensory and structure. Journal of Chemometrics 20:
311–320.
Verhagen, J. V., and L. Engelen. 2006. The neurocognitive bases of human multimodal food perception: Sensory
integration. Neuroscience and Biobehavioral Reviews 30: 613–650.
Vickers, Z. M. 1979. Crispness and crunchiness in foods. In Food texture and rheology, ed. P. Sherman, 145–
166. London: Academic Press.
Vickers, Z. M. 1981. Relationships of chewing sounds to judgments of crispness, crunchiness and hardness.
Journal of Food Science 47: 121–124.
Vickers, Z. M. 1983. Pleasantness of food sounds. Journal of Food Science 48: 783–786.
Vickers, Z. M. 1984. Crispness and crunchiness—A difference in pitch? Journal of Texture Studies 15:
157–163.
Vickers, Z. M. 1987. Crispness and crunchiness—Textural attributes with auditory components. In Food tex-
ture: Instrumental and sensory measurement, ed. H. R. Moskowitz, 45–66. New York: Marcel Dekker.
Vickers, Z. M. 1991. Sound perception and food quality. Journal of Food Quality 14: 87–96.
Vickers, Z. M., and M. C. Bourne. 1976. A psychoacoustical theory of crispness. Journal of Food Science 41:
1158–1164.
Vickers, Z. M., and S. S. Wasserman. 1979. Sensory qualities of food sounds based on individual perceptions.
Journal of Texture Studies 10: 319–332.
von Békésy, G. 1957. Neural volleys and the similarity between some sensations produced by tones and by skin
vibrations. Journal of the Acoustical Society of America 29: 1059–1069.
Welch, R. B., and D. H. Warren. 1980. Immediate perceptual response to intersensory discrepancy. Psychological
Bulletin 88: 638–667.
Wheatley, J. 1973. Putting colour into marketing. Marketing 67: 24–29.
White, T. L., and J. Prescott. 2007. Chemosensory cross-modal Stroop effects: Congruent odors facilitate taste
identification. Chemical Senses 32: 337–341.
Yau, N. J. N., and M. R. McDaniel. 1992. The effect of temperature on carbonation perception. Chemical
Senses 14: 337–348.
Yeomans, M., L. Chambers, H. Blumenthal, and A. Blake. 2008. The role of expectancy in sensory and hedonic
evaluation: The case of smoked salmon ice-cream. Food Quality and Preference 19: 565–573.
Zampini, M., D. Sanabria, N. Phillips, and C. Spence. 2007. The multisensory perception of flavour: Assessing
the influence of colour cues on flavour discrimination responses. Food Quality and Preference 18:
975–984.
Zampini, M., and C. Spence. 2004. The role of auditory cues in modulating the perceived crispness and stale-
ness of potato chips. Journal of Sensory Studies 19: 347–363.
Zampini, M., and C. Spence. 2005. Modifying the multisensory perception of a carbonated beverage using
auditory cues. Food Quality and Preference 16: 632–641.
Zampini, M., E. Wantling, N. Phillips, and C. Spence. 2008. Multisensory flavour perception: Assessing the
influence of fruit acids and colour cues on the perception of fruit-flavoured beverages. Food Quality and
Preference 18: 335–343.
Zellner, D. A., A. M. Bartoli, and R. Eckard. 1991. Influence of colour on odor identification and liking ratings.
American Journal of Psychology 104: 547–561.
Zellner, D. A., and P. Durlach. 2003. Effect of colour on expected and experienced refreshment, intensity, and
liking of beverages. American Journal of Psychology 116: 633–647.
Zellner, D. A., and M. A. Kautz. 1990. Colour affects perceived odor intensity. Journal of Experimental
Psychology: Human Perception and Performance 16: 391–397.
Zellner, D. A., and L. A. Whitten. 1999. The effect of colour intensity and appropriateness on color-induced
odor enhancement. American Journal of Psychology 112: 585–604.
It has become accepted in the neuroscience community that perception and performance are
quintessentially multisensory. Drawing on the full palette of modern brain imaging and
neuroscience methods, The Neural Bases of Multisensory Processes details current understanding
of the neural bases of these phenomena as studied across species, stages of development, and
clinical statuses.
Organized thematically into nine subsections, the book is a collection of contributions by leading
scientists in the field. Chapters build generally from basic to applied, allowing readers to ascertain
how fundamental science informs the clinical and applied sciences.
The last sections of the book focus on naturalistic multisensory processes in three separate contexts:
motion signals, multisensory contributions to the perception and generation of communication
signals, and how the perception of flavor is generated. The text provides a solid introduction for
newcomers and a thorough overview of the current state of the field for experts.