
March 2011 Volume 15, Number 3 pp. 95–140

Editor
Stavroula Kousta

Executive Editor, Neuroscience
Katja Brose

Journal Manager
Rolf van der Sanden

Journal Administrator
Myarca Bonsink

Advisory Editorial Board
R. Adolphs, Caltech, CA, USA
R. Baillargeon, U. Illinois, IL, USA
N. Chater, University College, London, UK
P. Dayan, University College London, UK
S. Dehaene, INSERM, France
D. Dennett, Tufts U., MA, USA
J. Driver, University College, London, UK
Y. Dudai, Weizmann Institute, Israel
A.K. Engel, Hamburg University, Germany
M. Farah, U. Pennsylvania, PA, USA
S. Fiske, Princeton U., NJ, USA
A.D. Friederici, MPI, Leipzig, Germany
O. Hikosaka, NIH, MD, USA
R. Jackendoff, Tufts U., MA, USA
P. Johnson-Laird, Princeton U., NJ, USA
N. Kanwisher, MIT, MA, USA
C. Koch, Caltech, CA, USA
M. Kutas, UCSD, CA, USA
N.K. Logothetis, MPI, Tübingen, Germany
J.L. McClelland, Stanford U., CA, USA
E.K. Miller, MIT, MA, USA
E. Phelps, New York U., NY, USA
R. Poldrack, U. Texas Austin, TX, USA
M.E. Raichle, Washington U., MO, USA
T.W. Robbins, U. Cambridge, UK
A. Wagner, Stanford U., CA, USA
V. Walsh, University College, London, UK

Update

Book Review
95 How does the brain make economic decisions? Review of: Foundations of Neuroeconomic Analysis (by Paul W. Glimcher)
Antonio Rangel

Opinion
97 What drives the organization of object knowledge in the brain?
Bradford Z. Mahon and Alfonso Caramazza
104 Specifying the self for cognitive neuroscience
Kalina Christoff, Diego Cosmelli, Dorothée Legrand and Evan Thompson

Review
113 Songs to syntax: the linguistics of birdsong
Robert C. Berwick, Kazuo Okanoya, Gabriel J.L. Beckers and Johan J. Bolhuis
122 Representing multiple objects as an ensemble enhances visual cognition
George A. Alvarez
132 Cognitive neuroscience of self-regulation failure
Todd F. Heatherton and Dylan D. Wagner

Editorial Enquiries
Trends in Cognitive Sciences
Cell Press
600 Technology Square
Cambridge, MA 02139, USA
Tel: +1 617 397 2817
Fax: +1 617 397 2810
E-mail: tics@elsevier.com

Forthcoming articles
Implicit social cognition: from measures to mechanisms
Brian A. Nosek, Carlee Beth Hawkins and Rebecca S. Frazier
Thalamic pathways for active vision
Robert H. Wurtz, Kerry McAlonan, James Cavanaugh and Rebecca A. Berman
Posterior cingulate cortex: adapting behavior to a changing world
John M. Pearson, Sarah R. Heilbronner, David L. Barack, Benjamin Y. Hayden and Michael L. Platt
Visual Crowding: a fundamental limit on conscious perception and object recognition
David Whitney and Dennis M. Levi
Frontal Pole Cortex: encoding ends at the end of the endbrain
Satoshi Tsujimoto, Aldo Genovesio and Steven P. Wise

Cover: Failing to control one's own behavior underlies several social and mental health problems. On pages 132–139
Todd F. Heatherton and Dylan D. Wagner review a large body of recent psychological and neuroscientific research on
self-regulation failures, including addictive or hedonistic behavior, lack of emotional control, as well as stereotyping and
prejudicial behavior. The authors propose a model of self-regulation that accounts for self-regulation failures in terms of a
loss of balance between prefrontal cortical regions that implement cognitive control and subcortical structures that drive
appetitive behaviors. Although facetious, the cover image (Brett Lamb/iStock Vectors/Getty Images) powerfully demonstrates
the detrimental effects of loss of control.
Update

Book Review

How does the brain make economic decisions?


Foundations of Neuroeconomic Analysis by Paul W. Glimcher. Oxford University Press, 2010. $69.95/£40.00 (488 pages)
ISBN 978-0-19-974425-1.

Antonio Rangel
Division of Humanities and Social Sciences & Computational and Neural Systems, Caltech, 1200 E. California Blvd, Pasadena,
CA, USA

Corresponding author: Rangel, A. (rangel@hss.caltech.edu).

For millennia the quest to understand human nature and, in particular, why we behave the way we do, was mostly the domain of religion and philosophy. Over the last two centuries, this quest has become the domain of three scientific disciplines: behavioral neuroscience, psychology and economics. Although these disciplines share a common goal, their methodology and sensibilities are significantly different, which often leads to inconsistent and even contradictory explanations of the same behavioral phenomena. Consider, for example, the basic question of why some individuals become addicted whereas others do not. The most popular economic theory, called the rational addiction model [1], assumes that individuals become addicted as a result of maximizing a strong taste for consuming drugs in the short-term that also increases the desire to consume them in the future. By contrast, current neurobiological theories of addiction are based on the idea that consumption of drugs leads to a systematic malfunction of the brain’s reward learning systems, which induces addicted individuals to consume them even when it is not optimal to do so [2,3].

Neuroeconomics is a relatively new field that seeks to reconcile these conflicting theories of human behavior [4]. The goal of the field is to combine methods and theories from behavioral neuroscience, psychology, economics and computer science to answer the following basic questions: (i) What are the computations made by the brain to make different types of decisions? (ii) How does the underlying neurobiology implement and constrain those computations? (iii) What are the implications of this knowledge for understanding behavior in economic, clinical, policy and legal contexts? The ultimate goal of the field is to produce a computational and neurobiological account of decision-making that can serve as a common foundation for understanding human behavior across the natural and social sciences. In this sense, neuroeconomics can be thought of as the realization of the dream outlined by E.O. Wilson in Consilience: The Unity of Knowledge [5].

In Foundations of Neuroeconomic Analysis, Paul Glimcher, one of the founders of the field, outlines his vision for this ambitious research agenda. The book accomplishes several aims with remarkable effectiveness.

First, it makes the case for bringing all of the parent fields together in a unified and interdisciplinary effort to understand human behavior simultaneously at multiple levels of analysis. Importantly, Glimcher argues that the benefits of this ‘unholy marriage’ flow in all directions: economists and psychologists will benefit from grounding their theories on the reality of how the brain actually makes decisions, and neuroscientists will benefit by being forced to understand the brain at the computational level. Glimcher forcefully argues that this effort will result in a synthetic theory of human behavior that will generate new critical insights for all of the parent disciplines.

Second, the book provides a brilliant introduction to critical ideas in economics and psychology for neuroscientists, and to critical ideas in behavioral and perceptual neuroscience for economists and psychologists. For this reason alone, anyone considering doing research in the computational or neurobiological foundations of decision-making, and anyone interested in why we act the way we do (from lawyers to philosophers), should read this book.

Third, the book reviews some critical findings in the field and argues that they already provide a glimpse of how a unified model of decision-making might look. For example, Glimcher argues that we have begun to understand how the brain computes values, makes choices by comparing those values, and learns those values through a process known as reinforcement learning. He also argues that the existing findings, together with some basic neuroscience ideas such as divisive normalization [6] (a principle explaining how the cortex integrates competing inputs to maximize encoded information while keeping neurons within bounded firing ranges), provide a computational and neurobiological implementation of economic concepts such as prospect theory or random utility. Although the ideas in this section of the book are controversial, they are also extremely thought-provoking.

By necessity, this ambitious book also reflects some of the current shortcomings of this young field. For example, traditional economists are skeptical as to whether the field will provide transformative insights for their discipline [7], and at this early stage it is hard to provide concrete examples against this view. In addition, although I admire Glimcher’s attempt to begin sketching a synthetic model of choice, it can be argued that it might be too early to do so. For example, at this stage in our understanding of the brain’s decision-making circuitry, it is unclear how to reconcile the standard neuroeconomic model proposed in the book with evidence showing that behavior can be
Update Trends in Cognitive Sciences March 2011, Vol. 15, No. 3

influenced by at least three different behavioral controllers (called the Pavlovian, habitual and goal-directed controllers) that are often at odds with each other [4], or that there might be multiple and competing value learning systems.

These caveats notwithstanding, I was truly inspired by this book. It is an impressive piece of scholarly work by one of the world’s most prominent neuroeconomists. Although I have been working in the field for years, it has changed the way I think about many of the open questions we study. The book will probably stir up debate among the parent disciplines about the feasibility and virtues of the neuroeconomics approach. It is beautifully written, with a voice that is scholarly yet accessible at the same time. It will be of interest not only to those working in the field, but also to a wide audience of readers. Finally, I suspect that the thoughtfulness of its arguments and the passion of its rhetoric will inspire a new generation of researchers to stake their careers on the vision outlined by its author. In fact, in many ways this book might do for neuroeconomics what David Marr’s Vision did for vision science [8].

References
1 Becker, G. and Murphy, K. (1988) A theory of rational addiction. J. Polit. Econ. 96, 675
2 Redish, A.D. (2004) Addiction as a computational process gone awry. Science 306, 1944–1947
3 Redish, A.D. et al. (2008) Addiction as vulnerabilities in the decision process. Behav. Brain Sci. 31, 461–487
4 Rangel, A. et al. (2008) A framework for studying the neurobiology of value-based decision making. Nat. Rev. Neurosci. 9, 545–556
5 Wilson, E.O. (1999) Consilience: The Unity of Knowledge, Vintage
6 Reynolds, J.H. and Heeger, D.J. (2009) The normalization model of attention. Neuron 61, 168–185
7 Bernheim, B.D. (2009) On the potential of neuroeconomics: a critical (but hopeful) appraisal. Am. Econ. J. Microecon. 1, 1–41
8 Marr, D. (1982) Vision, W.H. Freeman and Co.

1364-6613/$ – see front matter
doi:10.1016/j.tics.2010.12.006 Trends in Cognitive Sciences, March 2011, Vol. 15, No. 3

Opinion

What drives the organization of object knowledge in the brain?
Bradford Z. Mahon1,2 and Alfonso Caramazza3,4
1 Department of Brain and Cognitive Sciences, Meliora Hall, University of Rochester, Rochester, NY 14627, USA
2 Department of Neurosurgery, 601 Elmwood Ave, University of Rochester Medical Center, Rochester, NY 14642, USA
3 Department of Psychology, William James Hall, 33 Kirkland Street, Harvard University, Cambridge, MA 02138, USA
4 Center for Mind/Brain Sciences, University of Trento, Palazzo Fedrigotti, Corso Bettini 31, I-38068 Rovereto (TN), Italy

Various forms of category-specificity have been described at both the cognitive and neural levels, inviting the inference that different semantic domains are processed by distinct, dedicated mechanisms. In this paper, we argue for an extension of a domain-specific interpretation to these phenomena that is based on network-level analyses of functional coupling among brain regions. On this view, domain-specificity in one region of the brain emerges because of innate connectivity with a network of regions that also process information about that domain. Recent findings are reviewed that converge with this framework, and a new direction is outlined for understanding the neural principles that shape the organization of conceptual knowledge.

Corresponding authors: Mahon, B.Z. (mahon@rcbi.rochester.edu); Caramazza, A. (caram@wjh.harvard.edu).
1364-6613/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2011.01.004 Trends in Cognitive Sciences, March 2011, Vol. 15, No. 3

Category-specificity as a means to study constraints on brain organization
Brain-damaged patients with category-specific semantic impairments have conceptual level impairments that are specific to a category of items, such as animals, fruit/vegetables, nonliving things or conspecifics. Detailed analysis of those patients (Box 1) suggests that conceptual knowledge is organized according to domain-specific constraints [1,2]. According to the domain-specific hypothesis [2], there are innately dedicated neural circuits for the efficient processing of a limited number of evolutionarily motivated domains of knowledge. This interpretation of the neuropsychological phenomenon of category-specific semantic deficits has been extended to interpret results from functional magnetic resonance imaging (fMRI) in healthy subjects [3,4]. Much of the research using fMRI to study category-specificity has focused on the pattern of responses in the ventral visual pathway, which projects from early visual areas to lateral and ventral occipital–temporal regions, and processes object shape, texture and color in ways that are relatively invariant to viewpoint, size and orientation [5–7]. Different regions within the ventral pathway preferentially respond to images of faces, animals, tools, places, written words and body parts [4,6,8–13], see also [13–15].

The existence of consistent topographic biases by semantic category in the ventral stream raises fundamental questions about the principles that determine brain organization [4,10–12,16,17]. To date, the emphasis of research on the organization of the ventral stream has been on the stimulus properties that drive responses in a particular brain region, studied in relative isolation from other regions. This approach was inherited from well-established traditions in neurophysiology and psychophysics where it has been enormously productive for mapping psychophysical continua in primary sensory systems. It does not follow that the same approach will yield equally useful insights for understanding the principles of the neural organization of conceptual knowledge. The reason is that unlike the peripheral sensory systems, the pattern of neural responses in higher order areas is only partially driven by the physical input – it is also driven by how the stimulus is interpreted, and that interpretation does not occur in a single, isolated region. The ventral object processing stream is the central pathway for the extraction of object identity from visual information in the primate brain – but what the brain does with that information about object identity depends on how the ventral stream is connected to the rest of the brain.

Here, we focus on visual object recognition, as this has been the aspect of object knowledge and processing that has been studied in greatest depth; however, similar principles would be expected to apply to other modalities as appropriate. We argue that there are innately determined patterns of connectivity that mediate the integration of information from the ventral stream with information computed by other brain regions. Those channels are at the grain of a limited number of evolutionarily relevant domains of knowledge. We further suggest that what is given innately is the connectivity, and that specialization by semantic category in the ventral stream is driven by that connectivity. The implication of this proposal is that the organization of the ventral stream by category is relatively invariant to visually based, bottom-up, constraints. This approach corrects an imbalance in explanations of the causes of the consistent topography by semantic category in the ventral object-processing stream by giving greater prominence to endogenously determined constraints on brain organization.

The distributed domain-specific hypothesis
A domain-specific neural system is a network of brain regions [11] in which each region processes a different type
Opinion Trends in Cognitive Sciences March 2011, Vol. 15, No. 3

Box 1. Cognitive neuropsychological evidence for domain-specific constraints


Patients with category-specific semantic deficits can be differentially or even selectively impaired for knowledge of animals, plants, conspecifics or artifacts (for review see [11]). The knowledge impairment cannot be explained in terms of a differential impairment to a sensory or motor-based modality of information. Although discussion and debate continue as to whether non-categorical dimensions of organization can lead to category-specific brain organization, there is consensus that the phenomenon itself is ‘categorical’ (see Figure I for representative patients’ performance in picture naming and answering semantic probe questions).

There are important parallels between the neuropsychological literature on category-specific semantic deficits and the findings from functional neuroimaging and neurophysiology. First, the categories that emerge from the neuropsychological literature map onto the categories that emerge in functional imaging and neurophysiology. This indicates that the different methods and populations are tracking the same underlying property of brain organization. Second, the resistance of category-specific deficits to be explained by dimensions of organization that do not include semantic category [2] parallels the same pattern that has emerged in imaging and neurophysiology [60]. It is clearly the case that the brain is organized by sensory and motor modalities, and it is also the case that different sensory and motor modalities participate to varying extents in the representation of items from different categories. However, the existence of category-specificity in imaging [4], neurophysiology [67] and neuropsychology [11] cannot be explained exclusively by appeal to modality-based principles of organization. This suggests that the dimensions of brain organization that express themselves as phenomena of category-specificity (across methods and populations) are in fact domain-specific constraints on brain organization. Finally, there is emerging neuropsychological evidence for endogenous constraints on brain organization, including the existence of category-specific semantic deficits tested at age 16 years after stroke at 1 day of age [patient Adam, see below; ref 68].

There are also parallels between the patterns of category-specific semantic deficits and psychophysical studies of putatively specialized routes for processing specific classes of visual stimuli. For instance, New and colleagues [69], using a change detection paradigm, demonstrated a significant advantage for living animate stimuli. Thorpe and colleagues [70] have demonstrated extremely rapid and accurate detection of face and animal stimuli. Almeida and colleagues [65] have demonstrated that conceptual information about manipulable objects can be extracted from stimuli that are putatively not processed by the ventral visual pathway. These and other findings could indicate experimental ways of isolating domain-specific networks.

[Figure I graphic: "Category-specific semantic deficits". Top panel: picture naming performance by category (percent correct, 0–100) for patients RC, EW, RS, MD, KS, APA, CW and PL; key: living animate + nonliving, fruit/vegetable + nonliving, fruit/vegetable, living animate, nonliving, conspecifics. Bottom panel: semantic probe questions by category and modality (percent correct, 0–100) for patients EW, GR, FM, DB, RC and ADAM; key: living: visual/perceptual, nonliving: visual/perceptual, living: nonvisual, nonliving: nonvisual.]

Figure I. Representative patients with category-specific semantic deficits. Patients with category-specific semantic deficits may have selective impairments for naming
items from one category of items compared to other categories (top panel). Those patients may also have categorical impairments for answering questions about all
types of object properties (i.e., visual/perceptual and functional/associative; bottom panel). For further discussion and references to the patients shown here, see [11].

of information about the same domain or category of objects [2,18]. The types of information processed by different parts of a network can be sensory, motor, affective or conceptual. The range of potential domains or classes of items that can have dedicated neural circuits is restricted to those with an evolutionarily relevant history that could have biased the system toward a coherent organization. A second important characteristic of domain-specific systems is that the computations that must be performed over items from the domain are sufficiently ‘eccentric’ [19] so as to merit a specialized process. In other words, the coupling across different brain regions that is necessary for successful processing of a given domain is different in kind from the types of coupling that are needed for other domains of knowledge.

For instance, the need to integrate motor-relevant information with visual information is present for tools and


other graspable objects and less so for animals or faces. By contrast, the need to integrate affective information, biological motion processing and visual form information is strong for conspecifics and animals, and less so for tools or places. Thus, our proposal is that domain-specific constraints are expressed as patterns of connectivity among regions of the ventral stream and other areas of the brain that process nonvisual information about the same classes of items. For instance, specialization for faces in the lateral fusiform gyrus (fusiform face area [20–22]) arises because that region of the brain has connectivity with the amygdala and the superior temporal sulcus (among other regions) which are important for the extraction of socially relevant information and biological motion. Specificity for tools and manipulable objects in the medial fusiform gyrus is driven, in part, by connectivity between that region and regions of parietal cortex that subserve object manipulation [23–26]. Connectivity-based constraints can also be responsible for other effects of category-specificity in the ventral visual stream, such as connectivity between somatomotor areas and regions of the ventral stream that differentially respond to body parts [27–29] (extrastriate body area), connectivity between left lateralized frontal language processing regions and ventral stream areas specialized for printed words (visual word form area [30,31]), and connectivity between regions involved in spatial analysis and ventral stream regions showing differential responses to highly contextualized stimuli, such as houses, scenes and large non-manipulable objects (parahippocampal place area [32]).

The role of visual experience
According to the distributed domain-specific hypothesis, the organization by category in the ventral stream is not only a reflection of the visual structure of the world, it also reflects the structure of how ventral visual cortex is connected to other regions of the brain [11,23,33]. However, visual experience and dimensions of visual similarity are also crucial in shaping the organization of the ventral stream [34,35] – after all, the principal afferents to the ventral stream come from earlier stages in the visual hierarchy [36].

Although some authors have recently discussed nonvisual dimensions that could be relevant in shaping the organization of the ventral stream [4,6,7], many accounts differentially weight the contribution of visual experience in their explanation of the causes of category specific organization within the ventral stream. Several hypotheses have been developed, and we merely touch on them here to illustrate a common assumption: that the organization of the ventral stream reflects the visual structure of the world, as interpreted by domain-general processing constraints. Thus, the general thrust of those accounts is that the visual structure of the world is correlated with semantic category distinctions in a way that is captured by how visual information is organized in the brain. One of the most explicit proposals is that there are weak eccentricity preferences in higher order visual areas that are inherited from earlier stages in the processing stream. Those eccentricity biases interact with our experience of foveating some classes of items (e.g. faces) and viewing others in the relative periphery (e.g. houses) [37]. Another class of proposals is based on the suppositions that items from the same category tend to look more similar than items from different categories, and similarity in visual shape is mapped onto the ventral occipital–temporal cortex [17]. It has also been proposed that a given category could require differential processing relative to other categories, for instance in terms of expertise [38], visual crowding [39] or the relevance of visual information for categorization [40]. Other accounts appeal to ‘feature’ similarity and distributed feature maps [41]. Finally, it has been suggested that multiple, visually based, dimensions of organization combine super-additively to generate the boundaries among category-preferring regions [12]. Common to all of these accounts is the assumption that visual experience provides the necessary structure, and that a visual dimension of organization happens to be highly correlated with semantic category.

Although visual information is important in shaping how the ventral stream is organized, recent findings indicate that visual experience is not necessary in order for the same, or similar, patterns of category-specificity to be present in the ventral stream. In an early positron emission tomography study, Buchel and colleagues [42] showed that congenitally blind subjects show activation for words (presented in Braille) in the same region of the ventral stream as sighted individuals (presented visually). Pietrini and colleagues [43] used multi-voxel pattern analysis to show that the pattern of activation over voxels in the ventral stream was more consistent across different exemplars within a category than exemplars across categories. More recently, we [44] have shown that the same medial-to-lateral bias in category preferences on the ventral surface of the occipital–temporal cortex that is present in sighted individuals is present in congenitally blind subjects. Specifically, nonliving things, compared to animals, elicit stronger activation in medial regions of the ventral stream (Figure 1).

Although these studies on category-specificity in blind individuals represent only a first-pass analysis of the role of visual experience in driving category-specificity in the ventral stream, they indicate that visual experience is not necessary in order for category-specificity to emerge in the ventral stream. This fact raises an important question – if visual experience is not needed for the same topographical biases in category-specificity to be present in the ventral stream, then, what drives such organization? One possibility, as we have suggested, is innate connectivity between regions of the ventral stream and other regions of the brain that process affective, motor and conceptual information.

Connectivity as an innate domain-specific constraint
A crucial component of the distributed domain-specific hypothesis is the notion of connectivity. The most obvious candidate to mediate such networks is white matter connectivity. However, it is important to underline that functional networks need not be restricted by the grain of white matter connectivity and, perhaps more importantly, task- and state-dependent changes could bias processing toward different components of a broader anatomical brain network. For instance, connectivity between lateral


[Figure 1 graphic: "Category-specific organization does not require visual experience". Three rows of panels (sighted: picture viewing; sighted: auditory task; congenitally blind: auditory task), each showing a left ventral ROI and a right ventral ROI. Vertical axis: t values (living − nonliving). Horizontal axis: Talairach x coordinate, −40 to −24 for the left ROI and 24 to 40 for the right ROI.]

Figure 1. Congenitally blind and sighted participants were presented with auditorily spoken words of living things (animals) and nonliving things (tools, non-manipulable
objects) and were asked to make size judgments about the referents of the words. The sighted participants were also shown pictures corresponding to the same stimuli in a
separate scan. For sighted participants viewing pictures, the known finding was replicated that nonliving things such as tools and large non-manipulable objects lead to
differential neural responses in medial aspects of the ventral occipital–temporal cortex. This pattern of differential BOLD responses for nonliving things in medial aspects of
the ventral occipital–temporal cortex was also observed in congenitally blind participants and sighted participants performing the size judgment task over auditory stimuli.
These data indicate that the medial-to-lateral bias in the distribution of category-specific responses does not depend on visual experience. For details of the study, see [44].

and orbital prefrontal regions and the ventral occipital–temporal cortex [45,46] is crucial for categorization of visual input. It remains an open question whether multiple functional networks are subserved by this circuit, each determined by the type of visual stimulus being categorized. For instance, when categorizing manipulable objects, connectivity between parietofrontal somatomotor areas and prefrontal cortex could dominate, whereas when categorizing faces other regions could express stronger functional coupling to those same prefrontal regions. Such a suggestion would generate the expectation that whereas damaging prefrontal-to-ventral stream connections could result in difficulties categorizing all types of visual stimuli, disruption of the afferents to the prefrontal cortex from a specific category-preferring area could lead to categorization problems selective to that domain. The neural basis of the connectivity that supports domain-specific neural systems is, admittedly, in need of further development and articulation. Below, we will return to expectations that can be drawn from this explanation.

Evidence for innate constraints
The signature of innate structure is similarity across individuals, both within a species and potentially across


species. ‘Innate’ does not imply ‘present-from-birth’, although present-from-birth strongly suggests an innate contribution. Maturation in the context of the right types of experience could be necessary for the expression of innate structure, and interactions between innate and experiential factors can jointly constrain outcome [47]. This is particularly the case for mental processes, as there would be nothing to process without the content provided by experience. Several lines of evidence show that genetic variables capture similarity in functional brain organization as it relates to the presence of domain-specific neural circuits.

Twin studies
Two recent reports highlight greater neural or functional similarity between monozygotic twin pairs than between dizygotic twin pairs (for discussion see [48,49]). The strength of these studies is that experiential contributions are held constant across the two types of twin pairs. In an fMRI study, Polk and colleagues [50] studied the similarity between twin pairs in the distribution of responses to faces, houses, pseudowords and chairs in the ventral stream. The authors found that face- and place-related responses within face- and place-selective regions, respectively, were significantly more similar for monozygotic than for dizygotic twins. In another study, Wilmer and colleagues [51] studied the face recognition and memory abilities [52] in monozygotic and dizygotic twin pairs. The authors found that the correlation in performance on the face recognition task for monozygotic twins was more than double that for dizygotic twins. This difference was not present for control tasks of verbal and visual memory, indicating selectivity in the genetic contribution to behavioral abilities (see also [53]).

Congenital prosopagnosia
Further evidence for a genetic contribution to face recognition abilities comes from congenital prosopagnosia, a developmental disorder in which individuals can have selective impairments for recognizing faces [54]. A recent study by Thomas and colleagues [55] found that congenital prosopagnosia was associated with reduced structural integrity of the inferior longitudinal fasciculus, which projects from the fusiform gyrus to anterior regions of the temporal lobe. Reduced structural integrity was also observed for the inferior fronto-occipital fasciculus, which projects from the ventral occipital–temporal cortex to frontal regions. Such observations of reduced integrity of major white matter tracts linking the posterior occipital–temporal cortex with other brain regions underline the strength of a network-level analysis in understanding the constraints that shape the organization of knowledge in the ventral stream.

Non-human primates
An expectation on the view that innate constraints shape category-specificity in the ventral stream is that such specificity, at least for some categories, can also be found in non-human primates. It is well known, using neurophysiological recordings, that preferences for natural object stimuli exist in the inferior temporal (IT) cortex of monkeys [35,56], comparable to observations with similar methods in awake human subjects [15]. More recently, functional imaging with macaques [57] and chimpanzees [58] suggests that at least for the category of faces, comparable clusters of face-preferring voxels can be found in the temporal cortex in monkeys, as are observed in humans. Such common patterns of neural organization for some classes of items in monkeys and humans could, of course, be entirely driven by dimensions of visual similarity, which are known to modulate responses in the IT cortex [59]. However, even when serious attempts have been made to explain such responses in terms of dimensions of visual similarity, taxonomic structure emerges over and above the contribution of known visual dimensions. For instance, Kriegeskorte and colleagues [60] used multi-voxel pattern analysis to compare the similarity structure of a large array of different body, face, animal, plant and artifact stimuli in the monkey IT cortex and human occipital–temporal cortex. The similarity among the stimuli was measured in terms of the similarity of the patterns of brain responses they elicited, separately on the basis of the neurophysiological data (monkeys) [56] and fMRI data (humans). The similarity structure that emerged revealed a tight taxonomic structure common to monkeys and humans, and which could not be reduced to known dimensions of visual similarity.

Next steps
Specialization of function in the brain is clearest at the level of primary sensory and motor areas that have a physical organization in the brain that projects topographically onto a psychophysical dimension such as retinotopy, tonotopy or somatotopy. At the other end of the continuum, there are aspects of human cognition that have eluded neat parcellation in the brain, such as the neural instantiation of the abstract and recursive systems that make human thought and metacognition possible. Somewhere in the middle are conceptual representations – they interface with and draw on the sensory and motor systems and at the same time require the flexibility characteristic of symbolic representations [61]. We have outlined a framework for understanding the causes of category-specific organization in the brain that is based on the hypothesis that there are innate patterns of connectivity that constrain the distribution of category-specific neural regions. This proposal fully embraces a hierarchical view of the organization of conceptual knowledge [3]: the organization of the ventral stream reflects the final product of a complex tradeoff of pressures, some of which are expressed locally within the ventral stream and some of which are expressed as connectivity to the rest of the brain. Our suggestion is that connectivity to the rest of the brain is the first, or broadest, principle according to which the ventral stream comes to be organized by semantic category.
Although there is striking overlap in the semantic categories that can dissociate under conditions of brain damage and which show consistent topographic organization in the ventral stream (Box 1), there is some divergence between the lesion locations in patients with category-specific deficits and the patterns of neural activation observed with fMRI. In particular, focal lesions to category-preferring


regions within the ventral stream do not invariably lead to category-specific semantic deficits. This suggests that what is damaged in patients with category-specific semantic deficits are the broader neural circuits that are specialized for the impaired domain of knowledge. Damage to multiple regions within that domain-specific neural circuit could lead to a category-specific deficit by disrupting or disorganizing the broader network. Furthermore, damage to regions that serve to integrate processing across the whole domain, such as the anterior temporal lobes [62,63] for the domains of animals and conspecifics, could particularly disrupt functioning throughout the broader network.
A second direction for research that is encouraged by the distributed domain-specific hypothesis is to characterize the patterns of both anatomical and functional connectivity within domain-specific neural circuits. The expectation is that there will be a tight coupling between patterns of connectivity and the locations of category-preferring regions. In this regard, it is important to note that regions expressing connectivity with category-specific regions within the ventral stream are not necessarily ‘downstream’ from visual object recognition, and do not necessarily represent ‘more developed’ or ‘more processed’ information than what is computed in the ventral stream. Stimuli are processed through multiple routes in parallel, such as subcortical processing of emotional face stimuli [20,21] and dorsal stream processing of manipulable objects [64,65]. Thus, one exciting possibility is that fast but coarse analysis of the visual input that bypasses the geniculate striate pathway could ‘cue’ or ‘bias’ processing within the ventral stream according to the content of the stimulus to be processed [45], analogous to attentional modulation of early visual responses.
A third way in which the distributed domain-specific hypothesis can be tested is to explore the connectivity of all the categories that show selective responses in the ventral stream. For instance, an expectation that could be generated is that stimuli from different domains, such as hands and tools, can live next to each other in the ventral stream because both would be predicted to have connectivity to the somatomotor cortex. In other words, the way in which representations are organized in the ventral stream should follow patterns of connectivity, such that they are organized according to similarity metrics represented in other parts of the brain, rather than (only) by dimensions of visual similarity.
Perhaps the most pressing issue that must be addressed by the distributed domain-specific hypothesis is whether connectivity drives specialization by category, as we have proposed, or whether specialization of function is present independently of connectivity, and the connectivity emerges later. One way to empirically address this is to test individuals who have been blind since birth. Sensory deprivation will remove the influence of local constraints, presumably expressed over short-range bottom-up connections from earlier visual regions, but would not be expected to fundamentally alter the ‘longer range’ connections. Combining detailed analysis of connectivity in such individuals with analysis of the location of category-preferring regions in the ventral stream could ground inferences about whether connectivity in fact drives the location of category preferences in the ventral stream. In particular, the regions specialized for printed words could offer a means to test this issue, as there is no motivation for presuming specialization of function to be innately present for printed words in the human brain. Because there are regions that are consistently specialized for printed words, the expectation would be that this specialization is driven by connectivity between the ventral stream and regions of the brain involved in linguistic processing. The prediction can be made that subject-by-subject variation in the location of the visual word form area (tested with Braille) in congenitally blind individuals will match up with subject-by-subject variation in connectivity between that region of the ventral stream and other language processing regions of the brain.
The core of our proposal, that specialization in a region of the brain is driven, in part, by constraints on how that information will ultimately be used in the service of behavior, is not new. It is well established that visual processing bifurcates into a dorsal stream for object-directed action and spatial processing and a ventral stream for the extraction of object identity [66]. The two visual system model places important restrictions on plasticity of function within the visual system. Analogously, the distributed domain-specific hypothesis places new limits on plasticity of function within the ventral object processing stream, and suggests that the key to describing those limits lies in the patterns of connectivity between the ventral stream and other category-specific brain regions.

References
1 Capitani, E. et al. (2003) What are the facts of category-specific deficits? A critical review of the clinical evidence. Cogn. Neuropsychol. 20, 213–261
2 Caramazza, A. and Shelton, J.R. (1998) Domain specific knowledge systems in the brain: the animate-inanimate distinction. J. Cogn. Neurosci. 10, 1–34
3 Caramazza, A. and Mahon, B.Z. (2003) The organization of conceptual knowledge: the evidence from category-specific semantic deficits. Trends Cogn. Sci. 7, 354–361
4 Martin, A. (2007) The representation of object concepts in the brain. Annu. Rev. Psychol. 58, 25–45
5 Miceli, G. et al. (2001) The dissociation of color from form and function knowledge. Nat. Neurosci. 4, 662–667
6 Grill-Spector, K. and Malach, R. (2004) The human visual cortex. Annu. Rev. Neurosci. 27, 649–677
7 Cant, J.S. et al. (2009) fMR-adaptation reveals separate processing regions for the perception of form and texture in the human ventral stream. Exp. Brain Res. 192, 391–405
8 Allison, T. et al. (1994) Human extrastriate visual cortex and the perception of faces, words, numbers, and colors. Cereb. Cortex 4, 544–554
9 Chao, L.L. et al. (1999) Attribute-based neural substrates in posterior temporal cortex for perceiving and knowing about objects. Nat. Neurosci. 2, 913–919
10 Kanwisher, N. (2000) Domain specificity in face perception. Nature 3, 759–763
11 Mahon, B.Z. and Caramazza, A. (2009) Concepts and categories: a cognitive neuropsychological perspective. Annu. Rev. Psychol. 60, 1–15
12 Op de Beeck, H.P. et al. (2008) Interpreting fMRI data: maps, modules and dimensions. Nat. Rev. Neurosci. 9, 123–135
13 Pitcher, D. et al. (2009) Triple dissociation of faces, bodies, and objects in extrastriate cortex. Curr. Biol. 19, 319–324
14 Bentin, S. et al. (1996) Electrophysiological studies of face perception in humans. J. Cogn. Neurosci. 8, 551–565
15 Kreiman, G. et al. (2000) Category-specific visual responses of single neurons in the human medial temporal lobe. Nat. Neurosci. 3, 946–953


16 Cantlon, J.F. et al. (2011) Cortical representations of symbols, objects, and faces are pruned back during early childhood. Cereb. Cortex 21, 191–199
17 Haxby, J.V. et al. (2001) Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293, 2425–2430
18 Carey, S. and Spelke, E. (1994) Domain specific knowledge and conceptual change. In Mapping the Mind: Domain Specificity in Cognition and Culture (Hirschfeld, L. and Gelman, S.A., eds), pp. 169–200, Cambridge University Press
19 Fodor, J. (1983) Modularity of Mind, MIT Press
20 Pasley, B.N. et al. (2004) Subcortical discrimination of unperceived objects during binocular rivalry. Neuron 42, 163–172
21 Vuilleumier, P. et al. (2004) Distant influences of amygdala lesion on visual cortical activation during emotional face processing. Nat. Neurosci. 7, 1271–1278
22 Martin, A. and Weisberg, J. (2003) Neural foundations for understanding social and mechanical concepts. Cogn. Neuropsychol. 20, 575–587
23 Mahon, B.Z. et al. (2007) Action-related properties shape object representations in the ventral stream. Neuron 55, 507–520
24 Valyear, K.F. and Culham, J.C. (2010) Observing learned object-specific functional grasps preferentially activates the ventral stream. J. Cogn. Neurosci. 22, 970–984
25 Noppeney, U. et al. (2006) Two distinct neural mechanisms for category-selective responses. Cereb. Cortex 16, 437–445
26 Rushworth, M.F.S. et al. (2006) Connection patterns distinguish 3 regions of human parietal cortex. Cereb. Cortex 16, 1418–1430
27 Astafiev, S.V. et al. (2004) Extrastriate body area in human occipital cortex responds to the performance of motor actions. Nat. Neurosci. 7, 542–548
28 Orlov, T. et al. (2010) Topographic representation of the human body in the occipitotemporal cortex. Neuron 68, 586–600
29 Peelen, M.V. and Caramazza, A. (2010) What body parts reveal about the organization of the brain. Neuron 68, 331–333
30 Dehaene, S. et al. (2005) The neural code for written words: a proposal. Trends Cogn. Sci. 9, 335–341
31 Martin, A. (2006) Shades of Déjerine – forging a causal link between the visual word form area and reading. Neuron 50, 173–190
32 Bar, M. and Aminoff, E. (2003) Cortical analysis of visual context. Neuron 38, 347–358
33 Riesenhuber, M. (2007) Appearance isn’t everything: news on object representation in cortex. Neuron 55, 341–344
34 Op de Beeck, H.P. et al. (2006) Discrimination training alters object representations in human extrastriate cortex. J. Neurosci. 26, 13025–13036
35 Tanaka, K. et al. (1991) Coding visual images of objects in the inferotemporal cortex of the macaque monkey. J. Neurophysiol. 66, 170–189
36 Felleman, D.J. and Van Essen, D.C. (1991) Distributed hierarchical processing in primate visual cortex. Cereb. Cortex 1, 1–47
37 Levy, I. et al. (2001) Center-periphery organization of human object areas. Nat. Neurosci. 4, 533–539
38 Gauthier, I. et al. (1999) Activation of the middle fusiform ‘face area’ increases with expertise in recognizing novel objects. Nat. Neurosci. 2, 568–573
39 Rogers, T.T. et al. (2005) Fusiform activation to animals is driven by the process, not the stimulus. J. Cogn. Neurosci. 17, 434–445
40 Mechelli, A. et al. (2006) Semantic relevance explains category effects in medial fusiform gyri. Neuroimage 3, 992–1002
41 Tyler, L.K. et al. (2003) Do semantic categories activate distinct cortical regions? Evidence for a distributed neural semantic system. Cogn. Neuropsychol. 20, 541–559
42 Buchel, C. et al. (1998) A multimodal language region in the ventral visual pathway. Nature 394, 274–277
43 Pietrini, P. et al. (2004) Beyond sensory images: object-based representation in the human ventral pathway. Proc. Natl. Acad. Sci. U.S.A. 101, 5658–5663
44 Mahon, B.Z. et al. (2009) Category-specific organization in the human brain does not require visual experience. Neuron 63, 397–405
45 Kveraga, K. et al. (2007) Magnocellular projections as the trigger of top-down facilitation in recognition. J. Neurosci. 27, 13232–13240
46 Miller, E.K. et al. (2003) Neural correlates of categories and concepts. Curr. Opin. Neurobiol. 13, 198–203
47 Lewontin, R. (2000) The Triple Helix: Genes, Organisms, and Environment, Harvard University Press
48 Park, J. et al. (2009) Face processing: the interplay of nature and nurture. Neuroscientist 15, 445–449
49 Zhu, Q. et al. (2010) Heritability of the specific cognitive ability of face perception. Curr. Biol. 20, 137–142
50 Polk, T.A. et al. (2007) Nature versus nurture in ventral visual cortex: a functional magnetic resonance imaging study of twins. J. Neurosci. 27, 13921–13925
51 Wilmer, J. et al. (2010) Human face recognition ability is specific and highly heritable. Proc. Natl. Acad. Sci. U.S.A. 107, 5238–5241
52 Duchaine, B. and Nakayama, K. (2006) The Cambridge Face Memory Test: results for neurologically intact individuals and an investigation of its validity using inverted face stimuli and prosopagnosic subjects. Neuropsychologia 44, 576–585
53 Zhu, Q. et al. (2010) Heritability of the specific cognitive ability of face perception. Curr. Biol. 20, 1–6
54 Duchaine, B.C. et al. (2006) Prosopagnosia as an impairment to face specific mechanisms: elimination of the alternative hypotheses in a developmental case. Cogn. Neuropsychol. 23, 714–747
55 Thomas, C. et al. (2009) Reduced structural connectivity in ventral visual cortex in congenital prosopagnosia. Nat. Neurosci. 12, 29–31
56 Kiani, R. et al. (2007) Object category structure in response patterns of neuronal population in monkey inferior temporal cortex. J. Neurophysiol. 97, 4296–4309
57 Tsao, D.Y. et al. (2006) A cortical region consisting entirely of face-selective cells. Science 311, 670–674
58 Parr, L.A. et al. (2009) Face processing in the chimpanzee brain. Curr. Biol. 19, 50–53
59 Op de Beeck, H. et al. (2001) Inferotemporal neurons represent low-dimensional configurations of parameterized shapes. Nat. Neurosci. 4, 1244–1252
60 Kriegeskorte, N. et al. (2008) Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron 60, 1126–1141
61 Mahon, B.Z. and Caramazza, A. (2008) A critical look at the embodied cognition hypothesis and a new proposal for grounding conceptual content. J. Physiol. Paris 102, 59–70
62 Damasio, H. et al. (2004) Neural systems behind word and concept retrieval. Cognition 92, 179–229
63 Patterson, K. et al. (2007) Where do you know what you know? The representation of semantic knowledge in the human brain? Nat. Rev. 8, 976–987
64 Fang, F. and He, S. (2005) Cortical responses to invisible objects in the human dorsal and ventral pathways. Nat. Neurosci. 8, 1380–1385
65 Almeida, J. et al. (2008) Unconscious processing dissociates along categorical lines. Proc. Natl. Acad. Sci. U.S.A. 105, 15214–15218
66 Goodale, M.A. and Milner, A.D. (1992) Separate visual pathways for perception and action. Trends Neurosci. 15, 20–25
67 Kriegeskorte, N. et al. (2008) Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron 60, 1126–1141
68 Farah, M.J. and Rabinowitz, C. (2003) Genetic and environmental influences on the organization of semantic memory in the brain: Is “living things” an innate category? Cogn. Neuropsychol. 20, 401–408
69 New, J. et al. (2007) Category-specific attention for animals reflects ancestral priorities, not expertise. Proc. Natl. Acad. Sci. U.S.A. 104, 16598–16603
70 Thorpe, S. et al. (1996) Speed of processing in the human visual system. Nature 381, 520–522

Opinion

Specifying the self for cognitive neuroscience
Kalina Christoff1, Diego Cosmelli2, Dorothée Legrand3 and Evan Thompson4
1 Department of Psychology, University of British Columbia, 2136 West Mall, Vancouver, BC, V6T 1Z4 Canada
2 Escuela de Psicología, Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna 4860, Macul, Santiago, Chile
3 Centre de Recherche en Epistémologie Appliqué (CREA), ENSTA-32, boulevard Victor, 75015 Paris, cedex 15, France
4 Department of Philosophy, University of Toronto, 170 St George Street, Toronto, ON, M5R 2M8 Canada

Cognitive neuroscience investigations of self-experience have mainly focused on the mental attribution of features to the self (self-related processing). In this paper, we highlight another fundamental, yet neglected, aspect of self-experience, that of being an agent. We propose that this aspect of self-experience depends on self-specifying processes, ones that implicitly specify the self by implementing a functional self/non-self distinction in perception, action, cognition and emotion. We describe two paradigmatic cases – sensorimotor integration and homeostatic regulation – and use the principles from these cases to show how cognitive control, including emotion regulation, is also self-specifying. We argue that externally directed, attention-demanding tasks, rather than suppressing self-experience, give rise to the self-experience of being a cognitive–affective agent. We conclude with directions for experimental work based on our framework.

Corresponding author: Thompson, E. (evan.thompson@utoronto.ca).
1364-6613/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2011.01.001 Trends in Cognitive Sciences, March 2011, Vol. 15, No. 3

Glossary
Cognitive control: the process by which one focuses and sustains attention on task-relevant information and selects task-relevant behavior.
Emotion regulation: the process by which one influences one’s experience and expression of emotion.
Homeostatic regulation: the process of keeping vital organismic parameters within a given dynamical range despite external or internal perturbations.
‘I’ versus ‘Me’: experiencing oneself as subjective knower and agent versus experiencing oneself as an object of perception or self-attribution.
Self-related processing: processing requiring one to evaluate or judge some feature in relation to one’s perceptual image or mental concept of oneself.
Self-specific: a component or feature that is exclusive (characterizes oneself and no one else) and noncontingent (changing or losing it entails changing or losing the distinction between self and non-self).
Self-specifying: any process that specifies the self as subject and agent by implementing a functional self/non-self distinction.
Sensorimotor integration: the mechanisms by which sensory information is processed to guide motor acts, and by which motor acts are guided to facilitate sensory processing.
Task-negative/default-network brain regions: regions exhibiting sustained functional activity during rest but showing consistent deactivations during externally directed, attention-demanding tasks. Such regions include the precuneus/posterior cingulate cortex, medial prefrontal cortex and bilateral temporoparietal junction.
Task-positive brain regions: regions consistently activated during externally directed, attention-demanding tasks. Such regions include the intraparietal sulcus, frontal eye field, middle temporal area, lateral prefrontal cortex and dorsal anterior cingulate.

Investigating self-experience in cognitive neuroscience
How does the embodied brain give rise to self-experience? This question, long addressed by neurology [1] and neurophysiology [2], now attracts strong interest from cognitive neuroscience and the neuroimaging community [3–6].
Recent neuroimaging studies have investigated self-experience mainly by employing paradigms that contrast self-related with non-self-related stimuli and tasks. Such paradigms aim to reveal the cerebral correlates of ‘self-related processing’ (see Glossary). Recent reviews identify several brain regions that appear most consistently activated in self-related paradigms such as assessing one’s personality, physical appearance or feelings; recognizing one’s face; or detecting one’s first name (see [4,6] for extensive reviews). The medial prefrontal cortex (mPFC) and the precuneus/posterior cingulate cortex (Precuneus/PCC) are the most frequently discussed [4–10], but two additional regions, the temporoparietal junction (TPJ) and temporal pole, are also consistently activated [6].
Although these studies have contributed valuable information about the neural correlates of self-related processing, two issues have recently arisen [3,6]. First, the identified regions, especially the midline regions (mPFC, Precuneus/PCC) often associated with self-related processing [4,7–10], might not be self-specific, because they are also recruited for a wide range of other cognitive processes – recall of information from memory, inferential reasoning, and representing others’ mental states [3,5,6]. In addition, the PCC appears to be engaged in attentional processes and might be a hub for attention and motivation [11,12], whereas the TPJ is important for attentional reorienting [13]. Hence, describing these regions (singly or collectively) as self-specific could be unwarranted [3,5,6].
Second, studies employing self-related processing approach self-experience through the self-attribution of mental and physical features, and thereby focus on the self as an object of attribution and not the self as the knowing subject and agent. To invoke James’ [14] classic distinction, this paradigm targets the ‘Me’ – the self as known through its physical and mental attributes – and not the ‘I’ – the self as subjective knower and agent. Thus, relying exclusively on this paradigm would limit the cognitive neuroscience of self-experience to self-related processing (the ‘Me’), to the neglect of the self-experience of being a knower and agent (the ‘I’) [6,15].
In this paper, we focus on the ‘I’ – experiencing oneself as the agent of perception, action, cognition and emotion – and

we propose a theoretical framework that links this type of self-experience to a wide range of neuroscientific findings at different levels of neural functioning.
According to our proposal, experiencing oneself as an agent depends on the existence of specific types of dynamic interactive processes between the organism and its environment. We call these processes ‘self-specifying’ because they implement a functional self/non-self distinction that implicitly specifies the self as subject and agent [6,16]. To illustrate the basic principles of self-specifying processes, we describe two paradigmatic examples – sensorimotor integration and homeostatic regulation – that underlie the self-experience of being a bodily agent. We then argue that although externally directed attention-demanding tasks can compromise self-related processing [7–10,17–19], such tasks can be expected to enhance another fundamental type of self-experience, namely that of being a cognitive–affective agent [6,15,16]. In support of this point, and to show how cognitive neuroscience can begin to model this type of self-experience, we apply the concept of self-specifying processes to cognitive control, including emotion regulation. We conclude with suggestions for future experimental work based on our framework.

Self-experience as arising from self-specifying processes
Many neuroimaging studies have focused on the type of self-experience that occurs when a person directs his or her attention away from the external world (e.g. when task demands are low, when performing a self-reflective task or during rest) [7–10,17] (Figure 1a). At the same time, other lines of investigation concerned with embodied experience have examined self-experience during world-directed perception and action [1,20,21] (Figure 1b). These investigations have focused on bodily awareness in sensorimotor integration [20,21] and homeostatic regulation [1,22,23]. Central to this approach is the notion that the organism constantly integrates efferent and afferent signals in a way that distinguishes fundamentally between reafference – afferent signals arising as a result of the organism’s own efferent processes (self) – and exafference – afferent signals arising as a result of environmental events (non-self). By implementing this functional self/non-self distinction, efferent–afferent integration implicitly specifies the self as a bodily agent [6,16,21].

Sensorimotor integration
The notion of self-specifying processes is easiest to illustrate through the systematic linkage of sensory and motor processes in the perception–action cycle (Box 1). An organism needs to be able to distinguish between sensory changes arising from its own motor actions (self) and sensory changes arising from the environment (non-self). The central nervous system (CNS) distinguishes the two by systematically relating the efferent signals (motor commands) for the production of an action (e.g. eye, head or hand movements) to the afferent (sensory) signals arising from the execution of that action (e.g. the flow of visual or haptic sensory feedback). According to various models going back to Von Holst [24], the basic mechanism of this integration is a comparator that compares a copy of the motor command (information about the action executed) with the sensory reafference (information about the sensory modifications owing to the action) [25]. Through such a mechanism, the organism can register that it has executed a given movement, and it can use this information to

Figure 1. Two types of self-experience. (a) The ‘Me’ or self-related processing (here depicted as self-recognition and reflective thinking about oneself). Its neural substrates
are thought to be restricted to a subset of midline cortical regions (mPFC and Precuneus/PCC). It is also thought to compete for cognitive resources when some aspect of the
world demands attention. (b) The ‘I’ as embodied agent. This type of self-experience arises from the integration of efferent and reafferent processes, notably sensorimotor
integration (green loop) and homeostatic regulation (red loop), as well as possible higher level efferent–reafferent regulatory loops such as the one instantiated by cognitive
control processes (blue loop). Such regulatory loops implement a functional self/non-self distinction that implicitly specifies the self as agent. This type of self-experience
implicitly occurs during attention-demanding interactions with the environment (black arrows).
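
The comparator idea described under Sensorimotor integration can be made concrete with a toy sketch. The Python example below is only an illustration under stated assumptions: the class name `Comparator`, its `gain` parameter and the linear forward model are hypothetical choices made here for clarity, and they are not taken from Von Holst's or Wolpert and colleagues' models. It shows how subtracting the predicted consequence of a motor command from the incoming afferent signal might separate a self-caused component (reafference) from a world-caused component (exafference).

```python
from dataclasses import dataclass


@dataclass
class Comparator:
    """Toy comparator that splits an afferent signal into self- and world-caused parts."""

    # Hypothetical forward-model gain: maps a motor command to its expected feedback.
    gain: float = 1.0

    def predict_reafference(self, motor_command: float) -> float:
        # Forward model (assumed linear): expected sensory consequence
        # of the copy of the motor command.
        return self.gain * motor_command

    def compare(self, motor_command: float, afferent_signal: float) -> dict:
        predicted = self.predict_reafference(motor_command)
        # The part of the signal the prediction does not account for
        # is attributed to the environment (non-self).
        residual = afferent_signal - predicted
        return {"reafference": predicted, "exafference": residual}


comparator = Comparator(gain=2.0)

# Self-generated movement only: the feedback matches the prediction.
self_only = comparator.compare(motor_command=1.5, afferent_signal=3.0)

# Same command, but an environmental event adds 0.5 to the sensory signal.
with_world = comparator.compare(motor_command=1.5, afferent_signal=3.5)

print(self_only)
print(with_world)
```

In this sketch the self/non-self distinction is purely functional: nothing labels the signal in advance, and the split emerges only from relating the motor command to its expected sensory consequences.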


Box 1. Self-experience and sensorimotor integration


The self-experience of being an embodied agent depends on the sensorimotor mechanisms that integrate efference with reafference (Figure I). A basic level mechanism allows efferences to be systematically related to their reafferent consequences. This anchoring of efference to reafference implements a functional self/non-self distinction that implicitly specifies the self as a bodily agent [6,21].

For example, consider the motor act of biting a lemon and the resulting taste. This experience is characterized by (i) a specific content (lemon, not chocolate); (ii) a specific mode of presentation (tasting, not seeing); and (iii) a specific perspective (my experience of tasting). The process of relating an efference (the biting) to a reafference (the resulting taste of acidity) is what allows the perception to be characterized not only by a given content (the acidity) but also by a self-specific perspective (I am the one experiencing the acidity of the lemon juice) [6,21].

The agent’s perspective is thus a central concept within this framework. Although the basic sensorimotor integration processes do not involve any representation of the self per se, they are nonetheless self-specifying [6] because they implement a unique egocentric perspective in perception and action, and thus implicitly specify the self as subject and agent of that perspective. According to this view, self-experience is present whenever a self-specific perspective exists, regardless of the properties of the represented content [6,15,16,21].

The original mechanism of sensorimotor integration (Figure I) can be elaborated to include higher level comparators between intended, predicted and actual reafference (Figure II). For example, Wolpert and colleagues [25] described a two-process model of action monitoring. The first process (Figure II, left) uses the motor command and the current state estimate to achieve a next state estimate using the forward model (or a prediction) to simulate the arm’s dynamics. The second process (Figure II, right) uses the difference between expected and actual sensory feedback to correct the forward model’s next state estimate. Through such sophisticated comparators, the model can handle higher level phenomena, such as intentions, predictions, mental simulation and goals [20].

Figure I. Sensorimotor integration. Comparator mechanism for relating efferent signals to reafferent sensory feedback. [Diagram labels: self; comparator; efference copy; motor command; effector; external world; reafference.]

Figure II. Two-process model of action monitoring (Ref. [25]). [Diagram labels: motor command; current state estimate; predicted next state (forward model); predicted reafference; actual reafference; comparator; sensory discrepancy/state correction; next state estimate.]
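The two-process scheme described in Box 1 can be illustrated with a minimal sketch in Python. It assumes a one-dimensional state (for example, the position of an arm along one axis), a linear forward model, a sensor that reports the state directly, and a fixed correction gain. The names `ForwardModel` and `monitor_step`, the linear dynamics and the gain of 0.5 are illustrative assumptions and are not taken from the model in [25].

```python
from dataclasses import dataclass


@dataclass
class ForwardModel:
    """Hypothetical linear forward model for a one-dimensional state."""
    gain: float = 1.0  # assumed effect of one unit of motor command

    def predict_state(self, state_estimate: float, motor_command: float) -> float:
        # Process 1: simulate the next state from the efference copy.
        return state_estimate + self.gain * motor_command

    def predict_reafference(self, predicted_state: float) -> float:
        # Assumption: the sensor reports the state directly.
        return predicted_state


def monitor_step(model: ForwardModel, state_estimate: float, motor_command: float,
                 actual_feedback: float, correction_gain: float = 0.5):
    """Run one prediction/correction cycle; return (corrected_state, discrepancy)."""
    predicted_state = model.predict_state(state_estimate, motor_command)
    predicted_feedback = model.predict_reafference(predicted_state)
    # Comparator: difference between actual and predicted sensory feedback.
    discrepancy = actual_feedback - predicted_feedback
    # Process 2: move the state estimate part of the way toward the evidence.
    corrected_state = predicted_state + correction_gain * discrepancy
    return corrected_state, discrepancy


if __name__ == "__main__":
    model = ForwardModel(gain=1.0)
    # Feedback fully explained by the agent's own command (pure reafference).
    print(monitor_step(model, 0.0, 2.0, actual_feedback=2.0))  # (2.0, 0.0)
    # Same command plus an external push of 1.0 (an exafferent component).
    print(monitor_step(model, 0.0, 2.0, actual_feedback=3.0))  # (2.5, 1.0)
```

In this toy setting, a discrepancy near zero means the feedback is accounted for by the agent's own command, whereas a non-zero discrepancy marks a component that the efference copy does not explain. Treating that residual as a candidate exafferent signal is a simplification made here for illustration.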

process the resulting sensory reafference. The crucial point for our purposes is that reafference is self-specific, because it is intrinsically related to the agent’s own action (there is no such thing as a non-self-specific reafference). Thus, by relating efferent signals to their afferent consequences, the CNS marks the difference between self-specific (reafferent) and non-self-specific (exafferent) information in the perception–action cycle. In this way, the CNS implements a functional self/non-self distinction that implicitly specifies the self as the perceiving subject and agent.

Homeostatic regulation
Self-specifying reafferent–efferent processes are key components of homeostatic regulation, which implements the self/non-self distinction at the basic level of life preservation [1,16,22,23]. To ensure the organism’s survival through changing internal and external conditions, afferent signals conveying information about the organism’s internal state are continually coupled with corresponding efferent regulatory processes that keep afferent parameters within a tight domain of possible values [1,22,23]. Reafferent–efferent loops from spinal nuclei to brainstem nuclei and midbrain structures are involved in somato-autonomic adjustments; these loops are modulated by the hypothalamus as well as mid/posterior insula (sensory) and anterior cingulate (motor) cortices [23]. This vertically integrated, interoceptive homeostatic system specifies the self as a bodily agent by maintaining the body’s integrity (self) in relation to the environment (non-self) [22], and by supporting the implicit feeling of the body’s internal condition in perception and action [23].

Specifying the self as knowing subject and agent
The reafferent–efferent processes just described specify the self not as an object of perception or attribution (the ‘Me’) but as the experiential subject and agent of perception,
action and feeling (the ‘I’). Sensorimotor integration specifies a unique perceptual perspective on the world, whereas homeostatic regulation specifies a unique affective perspective based on the inner feeling of one’s body. The resulting perspective is self-specific in the strict sense of being both exclusive (it characterizes oneself and no one else) and noncontingent (changing or losing it entails changing or losing the distinction between self and non-self) [6]. In the general case, ‘I’ perceive and act from my self-specific perspective while implicitly experiencing myself as perceiver and agent. In some particular cases, what ‘I’ perceive is ‘Me’, such as when I visually recognize myself. Although many non-human animals can implicitly experience themselves as embodied agents through the types of self-specifying sensorimotor and homeostatic processes described above [26], only humans and a few other species seem capable of self-recognition [27], and thus of experientially relating the ‘I’ and the ‘Me’. What we emphasize here is that whereas the ‘Me’ consists in the features one perceives as belonging to oneself, the ‘I’ consists in the self-specific, agentive perspective from which such perceptions occur; hence, to explain the ‘I’ we need to explain how such a perspective is implemented. Our proposal is that the reafferent–efferent processes of sensorimotor integration and homeostatic regulation implement a self-specific, agentive perspective at the bodily level of perception and feeling.

This model predicts that if a brain process involves only afference without a matching efference/reafference, it will not specify the organism as subject or agent, and thus will not constitute a self-specifying process. For example, the ‘feedforward sweep’ in visual processing from early visual areas to extrastriate areas, which Lamme [28] argues is not accompanied by conscious awareness, would not qualify as self-specifying, whereas ‘recurrent processing’ in multiple visual areas, which Lamme argues is associated with ‘phenomenal awareness’ (short-lived awareness that is not necessarily reportable), would qualify as self-specifying only if linked to matching efference/reafference. Our model thus allows that non-self-specifying processes occur in parallel with self-specifying ones, and it leaves open the question whether there exist conscious processes that do not include even minimal self-specification (as Lamme’s proposal suggests) or whether every conscious process is also minimally self-specifying (as others have argued [15]).

Given this model, we next consider the view, prevalent in the recent neuroimaging literature [7–10,17–19], that self-experience is suppressed during externally directed, attention-demanding tasks. We argue that this view needs qualification to take into account the self-experience of being a cognitive–affective agent.

Is self-experience suppressed during world-directed attention?
One outcome of functional magnetic resonance imaging (fMRI) studies using self-related processing as the main paradigm for understanding self-experience is the view that self-experience occurs mostly when individuals are not preoccupied with externally oriented tasks and that it is suppressed when such tasks do occur [7–10]. This view is based partly on findings from a growing number of studies examining spontaneous fluctuations in the fMRI signal during task-free, resting-state conditions [29]. These findings have distinguished between (i) task-positive regions (e.g. dorsolateral PFC, inferior parietal cortex and supplementary motor area), whose activity increases during externally oriented attention and (ii) task-negative/default-network regions (e.g. mPFC, Precuneus/PCC and TPJ), whose activity decreases across a wide variety of tasks. These task-positive and task-negative networks also appear to be anticorrelated in their spontaneous activity during the resting state [30], so that increased activity in one network has been noted to correlate with decreased activity in the other [17–19].

A prominent interpretation of these findings is that the brain alternates dynamically between a task-oriented, externally directed state and a task-independent, self-directed state, with self-experience in the form of self-related processing mainly occurring during the task-independent, self-directed state [8–10,18,19]. A wide variety of studies have been taken to support this interpretation; these studies indicate that externally oriented, attention-demanding tasks, which are considered to suppress introspective thoughts, tend to suspend default-network activity, whereas resting conditions, as well as practiced tasks that do not suppress introspective thoughts, correlate with an active default network (see [31] for a comprehensive review). Additional support is thought to come from the finding that tasks requiring individuals to make explicit reference to some aspect of themselves implicate medial prefrontal regions also active as part of the default network [4,5,26,31]. Hence, it has been proposed, on the one hand, that self-experience is largely absent during world-directed attention (because self-related processing is strongly suppressed) [17], and, on the other hand, that during rest conditions, subjects mainly engage in self-referential processing [7–10].

This conclusion, however, rests on the following assumptions: (i) the main way to experience the self is as an object of one’s attention (i.e. through self-related processing); (ii) self-reflective, introspective processes are linked to task-negative/default-network regions; and (iii) the brain is organized into a dynamic system of task-positive regions subserving world-directed attention and task-negative/default regions subserving self-directed attention, with these two networks acting in opposition so that recruitment of one suppresses the other.

Each of these assumptions, however, needs qualification in light of the recent theoretical literature and empirical findings.

First, treating self-related processing as the main form of self-experience limits self-experience to the ‘Me’ (self as object of one’s attention) while neglecting the ‘I’ (self as knowing subject and agent). For example, if the agentic ‘I’ is considered at the bodily level of sensorimotor integration, then task-positive regions such as the supplementary motor cortex and inferior parietal cortex could be viewed as crucial to self-experience, for these regions serve to implement sensorimotor integration tasks [25,32,33]. More generally, although world-directed attention can suppress self-related processing, one cannot conclude that it suppresses
every form of self-experience, especially the self-experience of being a cognitive agent (which it can instead enhance).

Second, self-referential and introspective processes have also been linked to recruitment of regions outside the default network. For example, self-related processing activates the temporopolar cortex as consistently as the three main default network regions (mPFC, Precuneus/PCC and TPJ) [34], and is also frequently associated with activations in the insula and lateral PFC [6]. Furthermore, introspective mental processes have been linked to a recruitment of the anterior portion of the lateral PFC, namely the rostrolateral PFC [35–37], which is considered to be part of a cognitive control network separable from the default network [38]. These findings indicate that self-referential processing is not uniquely associated with task-negative/default-network regions. Therefore, reduced or inhibited activity in default network regions does not necessarily indicate that self-directed introspective processes are suppressed, because they can be implemented through regions outside the default network.

Finally, recent studies have begun to qualify the picture of task-positive and task-negative/default networks as invariably acting in opposition to each other. A parallel recruitment of task-positive and task-negative/default-network regions has been observed during several tasks, such as passive sensory stimulation [39], continuous movie viewing [40], narrative speech comprehension [41], autobiographical planning [42] and mind wandering during a sustained attention task [36]. These diverse findings suggest that characterizing brain activity as either task-positive/world-directed or task-negative/self-directed is incomplete. Rather, such neural recruitments and cognitive processes can occur in parallel.

In contrast to the view that attention-demanding tasks suppress self-experience, we propose that such tasks can be expected to enhance the self-experience of being a cognitive–affective agent. An outstanding task for cognitive neuroscience is to integrate this type of self-experience and self-related processing into an overarching explanatory framework that can guide empirical research. In the next section, we propose what we believe is a crucial element of such a framework. By describing how the concept of self-specifying processes can be applied to cognitive control, including emotion regulation, we argue that cognitive–affective processes instantiate the self-experience of being a cognitive–affective agent. In this way, we show how cognitive neuroscience can investigate this type of self-experience by including paradigms involving attention to the external world.

Self-specifying processes during attention-demanding tasks
Can cognitive control processes in affectively neutral contexts and affectively arousing contexts implicitly specify the self as a cognitive–affective agent?

Cognitive control processes in affectively neutral contexts
Cognitive control processes serve both to focus attention on task-relevant information versus other competing sources of information and to select task-relevant behavior over habitual or otherwise prepotent responses. For example, in a Stroop task, the goal is to name the ink color of a printed color name while ignoring the word’s meaning. Individuals are slower to respond when the information is incongruent (e.g. the word RED is printed in blue ink) than when it is congruent (e.g. the word RED is printed in red ink), and the slower response time is taken to reflect the need for higher attentional control when a conflict in perceptual information is present.

According to the influential ‘conflict-monitoring model’ [43], cognitive control is implemented through a regulatory conflict–control loop consisting of two components. An evaluative or conflict-monitoring component detects conflicts in the information available for task performance, whereas a regulative component exerts a top-down biasing influence on the cognitive and motor processes required for task performance. At the neural level, the dorsal anterior cingulate cortex (dACC) has been proposed to support the evaluative process of conflict monitoring [43,44], whereas lateral PFC regions have been proposed to underlie the regulative process of cognitive control [43,45]. This model predicts that strong ACC activity should be followed by behavior reflecting relatively focused attention, and weak ACC activity by behavior reflecting less focused attention. In keeping with this prediction, Kerns and colleagues [46] found that high dACC activation for incongruent trials in the Stroop task was followed by low interference on the subsequent trial, as well as by strong activation in dorsolateral PFC. These findings suggest that the dACC could signal the need for control adjustments to lateral PFC and thereby strengthen cognitive control [45].

Our aim in describing the conflict-monitoring model is not to endorse it against other important models of cognitive control [47–49] or ACC functioning [50,51]. In particular, we do not suppose that dACC is involved in cognitive but not emotional functions, whereas ventral ACC does the reverse [52], because recent experimental findings and theoretical considerations argue against both this particular cognitive–affective division [53] as well as emotion–cognition separations more generally in the brain and behavior [53,54]. Instead, we use the model to illustrate how cognitive–control processes can be self-specifying.

For the purposes of the present argument, the key feature of the conflict-monitoring model is the functional distinction between a regulatory function and an evaluative function. The control loop comprising these two functions (Figure 2) strongly resembles the integration of efferent and reafferent information during sensorimotor processing, with the regulative component corresponding to efferent influence and the evaluative component corresponding to a reafferent process. We propose that such a regulative–evaluative loop can implement a functional self/non-self distinction between, on the one hand, reafferent signals about modifications in level of conflict resulting from one’s own cognitive–control efforts (self), and, on the other hand, exafferent signals about the level of conflict resulting from environmental sources such as stimulus properties (non-self). By implementing this self-specific, agentive perspective in cognitive control, the regulatory conflict–control loop would implicitly specify the self as a
cognitive agent. Note that this cognitive form of self-experience would subsume the self-experience of being an embodied agent resulting from sensorimotor integration, because cognitive control operates on sensorimotor processes themselves, and thus occurs at higher levels of integration in the perception–action cycle [55].

Figure 2. Cognitive control as a self-specifying process. The conflict-monitoring model of cognitive control [43] depicted as implementing a possible efferent/reafferent regulatory loop. This loop can define the functional self/non-self distinction between reafferent signals resulting from one’s own cognitive control efforts (self) and exafferent signals about the level of conflict resulting from environmental sources such as stimulus properties (non-self). [Diagram labels: dorsal ACC (evaluation/re-afference); lateral PFC (regulation/efference); posterior brain regions; conflict detection; biasing influence; modified level of conflict.]

As originally conceived, the cognitive control of attention was closely linked to self-regulation [56,57], including the self-experience of being a cognitive agent [57]. Concern with this link, however, seems to have largely disappeared from the recent cognitive neuroscience literature, possibly because of the assumption that self-experience is suppressed during attention-demanding tasks [7–10,17–19], as well as the observation that brain regions associated with cognitive control, such as the lateral PFC and dACC, largely overlap with the task-positive regions outlined earlier. Indeed, meta-analyses show that the lateral PFC and dACC are among the most consistently recruited brain regions across a broad range of attention-demanding tasks, including perception, response selection, executive control, working memory, episodic memory and problem solving [58,59]. Nevertheless, as discussed above, recruitment of these task-positive regions is not mutually exclusive with recruitment of the task-negative/default-network regions. Although intense engagement in sensorimotor tasks can suppress the task-negative/default-network regions that also subserve self-related processing [17–19], one can envision situations (e.g. introspection, envisioning the perspective of others, mind wandering) in which the required mental processes call upon resources from both sets of regions and hence lead to more balanced activations between them, as indicated by recent results [36,39–42]. Furthermore, even in situations where the dACC and lateral PFC are recruited in opposition to task-negative/default-network regions (i.e. with a concomitant deactivation of these regions), self-experience might still be crucially present in the form of the ‘I’ or self-as-cognitive-agent, as a result of cognitive control processes being self-specifying in the way just outlined above.

Emotion regulation
The cognitive and behavioral control of emotion in affectively arousing or challenging situations [60,61] provides another case where we can expect to find the self-experience of being a cognitive–affective agent. Although emotion regulation and self-related processing have often been linked by pointing to their common reliance on midline cortical structures [61,62], we propose that another fundamental but less explored link between self-experience and emotion regulation can be found in how emotion regulation processes are also self-specifying.

Recent discussions have proposed a distinction between two main forms of emotion regulation – a deliberate or voluntary form, and an implicit or incidental form [60,61,63–65]. Deliberate emotion regulation relies on the same cognitive control mechanisms required for attention-demanding tasks [61]. Thus, tasks requiring reappraisal – reinterpreting the meaning of a stimulus to change one’s emotional response to it [60,61] – recruit dACC and lateral PFC regions [61]. Here these regions are thought to subserve explicit reasoning about how the association between a situation and one’s emotional response to it can be changed. For example, if one is viewing a picture of a burn victim in a hospital bed, it might be possible to modify the original emotional response of distress or sadness by focusing on possible positive aspects, such as the victim’s successful progress toward a healthier state or that the victim survived. Maintaining such descriptions is thought to bias perceptual and associative-memory systems; these systems in turn send signals to subcortical appraisal systems, such as the amygdala and ventral striatum [61], and thus indirectly modify the original emotional response.

We propose that such a regulatory–evaluative loop can implement a functional self/non-self distinction between the effortful reappraisal process (self) and the target of that process, namely the emotional scene (non-self). In this way, emotion regulation can implicitly specify the self as the cognitive–affective agent engaged in trying to reinterpret and thereby control an emotional response.

Deliberate forms of emotion regulation are associated not only with dACC and lateral PFC – regions crucially involved in cognitive control – but also with recruitment of dorsomedial PFC (dmPFC) [61,64,65], a brain region considered to support reflective awareness of one’s feelings, and thus to enable higher level, metarepresentations of one’s own experience [63]. By allowing the maintenance of such emotion-specific metarepresentations, and through its dense interconnections with the ventromedial PFC (vmPFC) [66], the dmPFC can exert a biasing influence on emotion processes during deliberate attempts at emotion regulation. Thus, by both influencing and re-representing the emotion processes in more ventral systems, the dmPFC and its interconnected ventral structures can form another regulatory–evaluative loop that implicitly specifies the self as cognitive–affective agent in effortful emotion regulation.

In contrast to deliberate emotion regulation, implicit or incidental emotion regulation has been linked to medial regions such as the rostral ACC (rACC), subgenual ACC and vmPFC [61]. For example, the rACC is associated with regulation of attention to emotional (but not non-emotional) distracters during an emotional version of the Stroop task [67,68]. During this task, subjects are not instructed to regulate their emotions, thus the recruitment of the rACC and its accompanying regulation of emotional attention can
be considered incidental to the main task [65]. Activation in rACC appears to be accompanied by a simultaneous and correlated reduction of amygdala activity; this relation suggests that resolving emotional conflict depends on a rACC–amygdala regulatory loop [67] that also appears to use the general cognitive monitoring mechanism of the dACC to detect the presence of conflict [68]. Thus, a self-specifying evaluative–regulatory loop can be formed between rACC and dACC, analogous to that between lateral PFC and dACC, but dedicated to the resolution of emotional conflict through an rACC biasing influence on amygdala activity.

Furthermore, regions playing a role in deliberate emotion regulation, such as the dACC and dmPFC [63,64], and possibly the right ventrolateral PFC [65], also appear to participate in implicit emotion regulation. For example, the dACC and dmPFC have autonomic regulatory functions mediated by direct neural connections with subcortical visceromotor centers such as the lateral hypothalamus [66]. In addition, neuroimaging studies noting an inverse correlation between medial PFC activity and heart rate variability suggest that medial PFC activity can have a tonic inhibitory effect mediated through the vagus nerve [63]. Based on these findings, researchers have described an evaluative–regulatory feedback mechanism, including an equilibration process between bottom-up and top-down interactions, through which the body state is altered as arousal processes become modulated and differentiated [63]. This mechanism provides another candidate for a self-specifying process at implicit levels of emotion regulation.

Given that these candidate self-specifying processes belong to implicit emotion regulation, the functional self/non-self distinction they implement would be closely related to the one established through homeostatic regulation between the feeling body and the environment. Indeed, implicit emotion regulation processes overlap conceptually and neurally with the higher levels of the homeostatic regulation system described earlier [1,22,23,26]. Thus, the self-experience of being an emotional agent that these processes elicit would occur at the level of affect and action tendencies [26], whereas this bodily level would be subsumed by the self-experience of being a cognitive–affective agent in deliberate emotion regulation, analogous to the way the self-experience of being a cognitive agent also subsumes the self-experience of being an embodied agent in attention-demanding cognitive tasks.

Concluding remarks and future directions
Using the concept of self-specifying processes, we have outlined a model of how cognitive control processes, including emotion regulation, implicitly specify the self as a cognitive–affective agent. Our model suggests several questions for future investigations (Box 2). We highlight two issues here.

One issue concerns the types of neural mechanisms that integrate the efferent–reafferent and regulatory–evaluative signals in self-specifying processes. On the one hand, the comparison between efferent and reafferent signals can be remapped at higher levels by specific neural structures. For example, the anterior insula can serve to remap the second-order comparison between efferent and reafferent signals in more posteriorly located motor and sensory regions during homeostatic regulation [23]. Similarly, during cognitive control, anteriorly located lateral PFC regions, such as the rostrolateral PFC, can remap the second-order comparison between the regulative and evaluative outcomes of processes supported by the more posteriorly located dorsolateral PFC and dACC [35]. Such hierarchically organized systems can be present at multiple neural levels and in multiple functional domains. On the other hand, another type of mechanism not requiring explicit remapping by dedicated neural structures, but relying instead on dynamical coupling across multiple areas [69] (e.g. through phase synchronization of neuronal signals [70]), could be responsible for signal integration. Such dynamical mechanisms can also be implemented at multiple neural levels and in various functional domains [69,70]. Whether self-specifying processes depend on either or both of these mechanisms is an important issue for future research.

A second issue concerns the subjective nature of self-experience. Although objective measures from experimentally controlled tasks and uncontrolled rest conditions are certainly useful, we believe a richer understanding of self-experience requires the incorporation of subjective measures such as self-reports into neuroimaging protocols [36,71]. Certain questions seem tractable only with such an approach. For example, is self-experience all-or-nothing or graded in character? When multiple self-specifying processes are activated at various levels of neural functioning, does a stronger sense of self occur than when only a few are recruited? Can mental training of attention and emotion regulation [72,73] alter self-experience and its neural substrates?

As argued here, how cognitive neuroscience specifies the self profoundly shapes our view of self-experience and its neural substrates. By broadening our investigations to include the self-experience of being a cognitive agent, we can deepen our understanding of how the brain and body work together to create our sense of self.

Box 2. Questions for further research
• Is the ‘I’ or self-as-subject all or nothing, or graded?
• When multiple self-specifying processes are activated, does a stronger sense of ‘I’ occur?
• Can self-specifying processes be altered through attentional and emotion regulation training?
• Do self-specifying processes require higher level remapping of efferent–reafferent integration, or can such integration occur through dynamical mechanisms such as phase synchronization?
• Can self-specifying processes be identified in neuroimaging data through functional connectivity measures, and can statistical measures such as Granger causality be used to identify directional influences in such processes?
• Can self-specifying processes be identified as part of the brain’s intrinsic functional architecture through intrinsic connectivity measures in resting state neuroimaging data?
• Can transcranial magnetic stimulation interfere selectively in self-specifying loops and thereby alter cognitive–affective self-experience?
• Are self-specifying processes altered in psychiatric disorders, such as schizophrenia or anorexia nervosa, which involve altered self-experience and self-other evaluation?
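One of the open questions raised here is whether signal integration could rely on phase synchronization of neuronal signals [70]. As a rough illustration of how such synchronization is often quantified, the sketch below computes a phase-locking value: the length of the average unit vector of the phase differences between two signals. It assumes that instantaneous phases (in radians) have already been extracted from the recordings by some other step. The function name `phase_locking_value` and the synthetic phase series are illustrative assumptions, not an analysis used in the studies cited.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """Return the phase-locking value (between 0 and 1) of two phase series.

    A value of 1 means the phase lag between the series is constant;
    values near 0 mean the lag is spread evenly around the circle.
    """
    if len(phases_a) != len(phases_b) or len(phases_a) == 0:
        raise ValueError("phase series must be non-empty and of equal length")
    # Sum the unit vectors that point in the direction of each phase difference.
    vector_sum = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(vector_sum) / len(phases_a)


if __name__ == "__main__":
    n_samples = 200
    # Hypothetical phase series: 0.05 cycles per sample.
    reference = [2 * math.pi * 0.05 * t for t in range(n_samples)]
    # Same frequency with a fixed lag of 0.8 rad: phase-locked.
    locked = [phase - 0.8 for phase in reference]
    # Different frequency (0.07 cycles per sample): the lag drifts steadily.
    drifting = [2 * math.pi * 0.07 * t for t in range(n_samples)]

    print("locked:", phase_locking_value(reference, locked))
    print("drifting:", phase_locking_value(reference, drifting))
```

In this synthetic case the locked pair gives a value close to 1 and the drifting pair a value close to 0. With real neuroimaging data, the result would also depend on how phases are estimated and on the frequency band chosen, which this sketch leaves out.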


Acknowledgments
For helpful comments we thank Norm Farb, Alisa Mandrigin, Luiz Pessoa, Rebecca Todd and four anonymous reviewers. K.C. was supported by grants from the Canadian Institutes of Health Research (CIHR MOP 81188), the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Michael Smith Foundation for Health Research (MSFHR); D.C. by Fondo Nacional de Desarrollo Científico y Tecnológico Grant 1090612; and E.T. by the Social Sciences and Humanities Research Council of Canada.

References
1 Damasio, A.R. (1999) The Feeling of What Happens, Harcourt
2 Llinas, R. (2001) The I of the Vortex, MIT Press
3 Gillihan, S. and Farah, M. (2005) Is self special? A critical review of evidence from experimental psychology and cognitive neuroscience. Psychol. Bull. 131, 76–97
4 Northoff, G. et al. (2006) Self-referential processing in our brain – a meta-analysis of imaging studies on the self. Neuroimage 31, 440–457
5 Uddin, L.Q. et al. (2007) The self and social cognition: the role of cortical midline structures and mirror neurons. Trends Cogn. Sci. 11, 153–157
6 Legrand, D. and Ruby, P. (2009) What is self-specific? A theoretical investigation and critical review of neuroimaging results. Psychol. Rev. 116, 252–282
7 Gusnard, D.A. et al. (2001) Medial prefrontal cortex and self-referential mental activity: relation to a default mode of brain function. Proc. Natl. Acad. Sci. U.S.A. 98, 4259–4264
8 Gusnard, D.A. (2005) Being a self: considerations from functional imaging. Conscious. Cogn. 14, 679–697
9 Wicker, B. et al. (2003) A relation between rest and self in the brain? Brain Res. Rev. 43, 224–230
10 Schneider, F. et al. (2008) The resting brain and our self: self-relatedness modulates resting state neural activity in cortical midline structures. Neuroscience 157, 120–131
11 Mohanty, A. et al. (2008) The spatial attention network interacts with limbic and monoaminergic systems to modulate motivation-induced attention shifts. Cereb. Cortex 18, 2604–2613
12 Engelmann, J.B. et al. (2009) Combined effects of attention and motivation on visual task performance: transient and sustained motivational effects. Front. Hum. Neurosci. 3, 1–17
13 Corbetta, M. et al. (2000) Voluntary orienting is dissociated from target detection in human posterior parietal cortex. Nat. Neurosci. 3, 292–297
14 James, W. (1890/1981) The Principles of Psychology, Harvard University Press
15 Legrand, D. (2007) Pre-reflective self-as-subject from experiential and empirical perspectives. Conscious. Cogn. 16, 583–599
16 Thompson, E. (2007) Mind in Life, Harvard University Press
17 Goldberg, I.I. et al. (2006) When the brain loses its self: prefrontal inactivation during sensorimotor processing. Neuron 50, 329–339
18 Fransson, P. (2005) Spontaneous low-frequency BOLD signal fluctuations: an fMRI investigation of the resting-state default mode of brain function hypothesis. Hum. Brain Mapp. 26, 15–29
19 Fransson, P. (2006) How default is the default mode of brain function? Further evidence from intrinsic BOLD signal fluctuations. Neuropsychologia 44, 2836–2845
20 Blakemore, S-J. and Frith, C. (2003) Self-awareness and action. Curr. Opin. Neurobiol. 13, 219–224
21 Legrand, D. (2006) The bodily self: the sensori-motor roots of pre-reflexive self-consciousness. Phenom. Cogn. Sci. 5, 89–118
22 Parvizi, J. and Damasio, A.R. (2001) Consciousness and the brainstem. Cognition 79, 135–160
23 Craig, A.D. (2009) How do you feel – now? The anterior insula and human awareness. Nat. Rev. Neurosci. 10, 59–70
24 Von Holst, E. (1954) Relations between the central nervous system and the peripheral organs. Br. J. Anim. Behav. 2, 89–94
25 Wolpert, D.M. et al. (1995) An internal model for sensorimotor integration. Science 269, 1880–1882
26 Northoff, G. and Panksepp, J. (2008) The trans-species concept of self and the subcortical-cortical midline system. Trends Cogn. Sci. 12, 259–264
27 de Waal, F.B.M. (2008) The thief in the mirror. PLoS Biol. 6, e201
28 Lamme, V.A.F. (2003) Why visual awareness and attention are different. Trends Cogn. Sci. 7, 12–18
29 Fox, M.D. and Raichle, M.E. (2007) Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging. Nat. Rev. Neurosci. 8, 700–711
30 Fox, M.D. et al. (2005) The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proc. Natl. Acad. Sci. U.S.A. 102, 9673–9678
31 Buckner, R.L. et al. (2008) The brain’s default network: anatomy, function, and relevance to disease. Ann. N. Y. Acad. Sci. 1124, 1–38
32 Andersen, R.A. and Buneo, C.A. (2003) Sensorimotor integration in posterior parietal cortex. Adv. Neurol. 93, 159–177
33 Haggard, P. and Whitford, B. (2004) Supplementary motor area provides an efferent signal for sensory suppression. Cogn. Brain Res. 19, 52–58
34 Christoff, K. et al. (2004) Neural basis of spontaneous thought processes. Cortex 40, 623–630
35 Christoff, K. and Gabrieli, J.D.E. (2000) The frontopolar cortex and human cognition: evidence for a rostrocaudal hierarchical organization within the human prefrontal cortex. Psychobiology 28, 168–186
36 Christoff, K. et al. (2009) Experience sampling during fMRI reveals default network and executive system contributions to mind wandering. Proc. Natl. Acad. Sci. U.S.A. 106, 8719–8724
37 McCaig, R.G. et al. (2010) Improved modulation of rostrolateral prefrontal cortex using real-time fMRI and meta-cognitive awareness. Neuroimage [Epub ahead of print].
38 Vincent, J.L. et al. (2008) Evidence for a frontoparietal control system revealed by intrinsic functional connectivity. J. Neurophysiol. 100, 3328–3342
39 Greicius, M.D. and Menon, V. (2004) Default-mode activity during a passive sensory task: uncoupled from deactivation but impacting activation. J. Cogn. Neurosci. 16, 1484–1492
40 Golland, Y. et al. (2007) Extrinsic and intrinsic systems in the posterior cortex of the human brain revealed during natural sensory stimulation. Cereb. Cortex 17, 766–777
41 Wilson, S.M. et al. (2008) Beyond superior temporal cortex: intersubject correlations in narrative speech comprehension. Cereb. Cortex 18, 230–242
42 Spreng, R.N. et al. (2010) Default network activity, coupled with the frontoparietal control network, supports goal-directed cognition. Neuroimage 53, 303–317
43 Botvinick, M.M. et al. (2001) Conflict monitoring and cognitive control. Psychol. Rev. 108, 624–652
44 Botvinick, M.M. et al. (2004) Conflict monitoring and anterior cingulate cortex: an update. Trends Cogn. Sci. 8, 539–546
45 Miller, E.K. and Cohen, J.D. (2001) An integrative theory of prefrontal cortex function. Annu. Rev. Neurosci. 24, 167–202
46 Kerns, J.G. et al. (2004) Anterior cingulate conflict monitoring and adjustments in control. Science 303, 1023–1026
47 Egner, T. (2008) Multiple conflict-driven control mechanisms in the human brain. Trends Cogn. Sci. 12, 374–380
48 Verguts, T. and Notebaert, W. (2009) Adaptation by binding: a learning account of cognitive control. Trends Cogn. Sci. 13, 252–257
49 Mayr, U. and Awh, E. (2009) The elusive link between conflict and conflict adaptation. Psychol. Res. 73, 794–802
50 Rushworth, M.F.S. et al. (2007) Contrasting roles for anterior cingulate and orbitofrontal cortex in decisions and social behaviour. Trends Cogn. Sci. 11, 169–176
51 Etkin, A. et al. (2010) Emotional processing in anterior cingulate and medial prefrontal cortex. Trends Cogn. Sci. DOI: 10.1016/j.tics.2010.11.004
52 Bush, G. et al. (2000) Cognitive and emotional influences in anterior cingulate cortex. Trends Cogn. Sci. 4, 215–222
53 Pessoa, L. (2008) On the relationship between emotion and cognition. Nat. Rev. Neurosci. 9, 148–158
54 Pessoa, L. (2010) Emotion and cognition and the amygdala: from ‘what is it?’ to ‘what’s to be done?’. Neuropsychologia 48, 3416–3429
55 Botvinick, M.M. (2007) Multilevel structure in behaviour and in the brain: a model of Fuster’s hierarchy. Philos. Trans. R. Soc. Lond. B Biol. Sci. 362, 1615–1626
56 Norman, D.A. and Shallice, T. (1986) Attention to action: willed and automatic control of behavior. In Consciousness and Self-regulation. Advances in Research and Theory (Vol. 4) (Davidson, R.J. et al., eds), pp. 1–18, Plenum Press
57 Posner, M.I. and Rothbart, M.K. (1998) Attention, self-regulation and consciousness. Philos. Trans. R. Soc. Lond. B Biol. Sci. 353, 1915–1927


58 Duncan, J. and Owen, A.M. (2000) Common regions of the human frontal lobe recruited by diverse cognitive demands. Trends Neurosci. 23, 475–483
59 Corbetta, M. and Shulman, G. (2002) Control of goal-directed and stimulus-driven attention in the brain. Nat. Rev. Neurosci. 3, 201–215
60 Gross, J.J. and Thompson, R.A. (2007) Emotion regulation: conceptual foundations. In Handbook of Emotion Regulation (Gross, J.J., ed.), pp. 3–25, Guilford
61 Ochsner, K.N. and Gross, J.J. (2005) The cognitive control of emotion. Trends Cogn. Sci. 9, 242–249
62 Northoff, G. (2005) Is emotion regulation self-regulation? Trends Cogn. Sci. 9, 408–409
63 Lane, R.D. (2008) Neural substrates of implicit and explicit emotional processes: a unifying framework for psychosomatic medicine. Psychosom. Med. 70, 214–231
64 Phillips, M.L. et al. (2008) A neural model of voluntary and automatic emotion regulation: implications for understanding the pathophysiology and neurodevelopment of bipolar disorder. Mol. Psychiatr. 13, 833–857
65 Berkman, E.T. and Lieberman, M.D. (2009) Using neuroscience to broaden emotion regulation: theoretical and methodological considerations. Soc. Pers. Psychol. Comp. 3/4, 475–493
66 Price, J.L. et al. (1996) Networks related to the orbital and medial prefrontal cortex; a substrate for emotional behavior? Prog. Brain Res. 107, 523–536
67 Etkin, A. et al. (2006) Resolving emotional conflict: a role for the rostral anterior cingulate cortex in modulating activity in the amygdala. Neuron 51, 871–882
68 Egner, T. et al. (2008) Dissociable neural systems resolve conflict from emotional versus nonemotional distracters. Cereb. Cortex 18, 1475–1484
69 Bressler, S.L. and Menon, V. (2010) Large-scale brain networks in cognition: emerging methods and principles. Trends Cogn. Sci. 14, 277–290
70 Varela, F.J. et al. (2001) The brainweb: phase synchronization and large-scale integration. Nat. Rev. Neurosci. 2, 229–239
71 Jack, A. and Roepstorff, A. (2002) Introspection and cognitive brain mapping: from stimulus-response to script-report. Trends Cogn. Sci. 6, 333–339
72 Lutz, A. et al. (2008) Attention regulation and monitoring in meditation. Trends Cogn. Sci. 12, 163–169
73 Farb, N.A.S. et al. (2007) Attending to the present: mindfulness meditation reveals distinct neural modes of self-reference. Soc. Cogn. Affect. Neurosci. 2, 313–322

Review

Songs to syntax: the linguistics of birdsong

Robert C. Berwick1, Kazuo Okanoya2,3, Gabriel J.L. Beckers4 and Johan J. Bolhuis5
1 Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
2 Department of Cognitive and Behavioral Sciences, University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan
3 RIKEN Brain Science Institute, 2-1 Hirosawa, Wako-City, Saitama 351-0198, Japan
4 Department of Behavioural Neurobiology, Max Planck Institute for Ornithology, D-82319 Seewiesen, Germany
5 Behavioural Biology and Helmholtz Institute, University of Utrecht, Padualaan 8, 3584 CH Utrecht, The Netherlands

Unlike our primate cousins, many species of bird share with humans a capacity for vocal learning, a crucial factor in speech acquisition. There are striking behavioural, neural and genetic similarities between auditory-vocal learning in birds and human infants. Recently, the linguistic parallels between birdsong and spoken language have begun to be investigated. Although both birdsong and human language are hierarchically organized according to particular syntactic constraints, birdsong structure is best characterized as ‘phonological syntax’, resembling aspects of human sound structure. Crucially, birdsong lacks semantics and words. Formal language and linguistic analysis remain essential for the proper characterization of birdsong as a model system for human speech and language, and for the study of the evolution of the brain and cognition.

Human language and birdsong: the biological perspective
Darwin [1] noted strong similarities between the ways that human infants learn to speak and birds learn to sing. This ‘perspective from organismal biology’ [2] initially led to a focus on apes as model systems for human speech and language (see Glossary), although with limited success [3,4]. Since the end of the 20th century, biologists and linguists have shown a renewed interest in songbirds, revealing fascinating similarities between birdsong and human speech at the behavioural, neural, genomic and cognitive levels [5–9]. Yip has reviewed the relationship between human phonology and birdsong [7]. Here, we address another potential parallel between birdsong and human language: syntax.
Comparing syntactic ability across birds and humans is important, because at least since the beginning of the modern era in cognitive science and linguistics, a combinatorial syntax has been viewed as lying at the heart of the distinctive creative and open-ended nature of human language [10]. Here, we discuss current understanding of the relationship between birdsong and human syntax in light of recent experimental and linguistic advances, focusing on the formal parallels and their implications for underlying cognitive and computational abilities. Finally, we sketch the prospects for future experimental work, as part of the

Glossary
Bigram: a subsequence of two elements (notes, words or phrases) in a string.
Context-free language (CFL): the sets of strings that can be recognized or generated by a pushdown-stack automaton or context-free grammar. A CFL might have grammatical dependencies nested inside to any depth, but dependencies cannot overlap.
Finite-state automaton (FSA, FA): a computational model of a machine with finite memory, consisting of a finite set of states, a start state, an input alphabet, and a transition function that maps input symbols and current states to some set of next states.
Finite-state grammar (FSG): a grammar that formally replicates the structure of a FSA, also generating the regular languages.
K-reversible finite-state automaton: an FSA that is deterministic when one ‘reverses’ all the transitions so that the automaton runs backwards. One can ‘look behind’ k previous words to resolve any possible ambiguity about which next state to move to.
Language: any possible set of strings over some (usually finite) alphabet of words.
Locally testable language: a strict subset of the regular languages formed by the union, intersection, or complement of strictly locally testable languages.
(First-order) Markov model or process: a random process where the next state of a system depends only on the current state and not its previous states. Applied to word or acoustic sequences, the next word or acoustic unit in the sequence depends only on the current word or acoustic unit, rather than previous words or units.
Mildly context-sensitive language (MCSL): a language family that lies ‘just beyond’ the CFLs in terms of power, and thought to encompass all the known human languages. A MCSL is distinguished from a CFL in that it contains clauses that can be nested inside clauses arbitrarily deeply, with a limited number of overlapping grammatical dependencies.
Morphology: the possible ‘word shapes’ in a language; that is, the syntax of words and word parts.
Phoneme: the smallest possible meaningful unit of sound.
Phonetics: the study of the actual speech sounds of all languages, including their physical properties, the way they are perceived and the way in which vocal organs produce sounds.
Phonology: the study of the abstract sound patterns of a particular language, usually according to some system of rules.
Push-down stack automaton (PDA): a FSA augmented with a potentially unbounded memory store, a push-down stack, that can be accessed on a last-in, first-out basis, similar to a stack of dinner plates, with the last element placed on the stack being the top of the stack, and first accessible memory element. PDAs recognize the class of CFLs.
Recursion: a property of a (set of) grammar rules such that a phrase A can eventually be rewritten as itself with non-empty strings of words or phrase names on either side in the form aAb and where A derives one or more words in the language.
Regular language: a language recognized or generated by a FSA or a FSG.
Semantics: the analysis of the meaning of a language, at the word, phrase, sentence level, or beyond.
Strictly locally testable language (or stringset): a strict subset of the regular languages defined in terms of a finite list of strings of length less than or equal to some upper length k (the ‘window length’).
Sub-regular language: any subset of the regular languages, generally a strict subset with some property of interest, such as local testability.
Syllable: in linguistics, a vowel plus one or more preceding or following consonants.
Syntax: the rules for arranging items (sounds, words, word parts or phrases) into their possible permissible combinations in a language.
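
The glossary entries for 'bigram' and '(first-order) Markov model' can be made concrete with a small sketch. The toy note sequences below are hypothetical and are not data from any of the cited studies; the function names are likewise illustrative. The sketch only shows how bigram counts over example sequences yield estimated transition probabilities, where the next element depends on the current element alone.

```python
from collections import Counter, defaultdict


def bigrams(seq):
    """Return the adjacent pairs (bigrams) of a sequence, in order."""
    return list(zip(seq, seq[1:]))


def transition_probabilities(songs):
    """Estimate first-order Markov transition probabilities.

    Each song is a string of note labels. For every note we count which
    note follows it, then normalise the counts into probabilities.
    """
    counts = defaultdict(Counter)
    for song in songs:
        for current, following in bigrams(song):
            counts[current][following] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }


# Hypothetical toy songs (assumed for illustration only).
toy_songs = ["abab", "abcab", "abab"]
probs = transition_probabilities(toy_songs)
```

With these toy songs, note a is always followed by b, whereas b is followed by a in two of three cases and by c otherwise, so the model is probabilistic at b only.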
Corresponding author: Bolhuis, J.J. (j.j.bolhuis@uu.nl).

1364-6613/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2011.01.002 Trends in Cognitive Sciences, March 2011, Vol. 15, No. 3

[Figure 1 image: sound spectrogram; vertical axis Frequency (kHz), 0–10; scale bar 0.5 s; labels mark the Motif, Syllable and Note levels and the introductory notes (i).]

Figure 1. Sound spectrogram of a typical zebra finch song depicting a hierarchical structure. Songs often start with ‘introductory notes’ (denoted by ‘i’) that are followed by
one or more ‘motifs’, which are repeated sequences of syllables. A ‘syllable’ is an uninterrupted sound, which consists of one or more coherent time-frequency traces,
which are called ‘notes’. A continuous rendition of several motifs is referred to as a ‘song bout’.

ongoing debate as to what is species specific about human language [3,11]. We show that, although it has a simple syntactic structure, birdsong cannot be directly compared with the syntactic complexity of human language, principally because it has neither semantics nor a lexicon.

Comparing human language and birdsong
Human speech and birdsong both consist of complex, patterned vocalizations (Figure 1). Such sequential structures can be analysed and compared via formal syntactic methods. Aristotle described language as sound paired with meaning [12]. Although this description is partly accurate, a proper interspecies comparison calls for a more articulated ‘system diagram’ of the key components of human language, and their non-human counterparts. We depict these as a tripartite division (Figure 2): (i) an ‘external interface’, a sensorimotor-driven, input–output system providing proper articulatory output and perceptual analysis; (ii) a rule system generating correctly structured sentence forms, incorporating words; and (iii) an ‘internal interface’ to a conceptual–intentional system of meaning and reasoning; that is, ‘semantics’. Component (i) corresponds to systems for producing, perceiving and learning acoustic sequences, and might itself involve abstract representations that are not strictly sensorimotor, such as stress placement. In current linguistic frameworks, (i) aligns with acoustic phonetics and phonology, for both production and perception. Component (ii) feeds into both the sensorimotor interface (i) and a conceptual–intentional system (iii), and is usually described via some model of recursive syntax.
Although linguists debate the details of these components, there seems to be more general agreement as to the nature of (i), less agreement as to the nature of (ii) and widespread controversy as to (iii). For instance, whereas the connection between a fully recursive syntax and a conceptual–intentional system is sometimes considered to lie at the heart of the species-specific properties of human language, there is considerable debate over the details, which plays out as the distinct variants of current linguistic theories [13–16]. Some of these accounts reduce or even eliminate the role of (ii), assuming a more direct relation between (i) and (iii) (e.g. [17,18]). The system diagram in Figure 2 therefore cannot represent any detailed neuroanatomical or abstract ‘wiring diagram’, but

[Figure 2 image: block diagram. Centre: Words (lexical items) + Syntactic rules. Left: External interface (phonological forms/sequencing, acoustic-phonetics), linking Perception and Production to Sounds, gestures (external to organism). Right: Internal interface to Concepts, intentions, reasoning (internal to organism).]

Figure 2. A tripartite diagram of abstract components encompassing both human language and birdsong. On the left-hand side, an external interface (i), comprised of
sensorimotor systems, links the perception and production of acoustic signals to an internal system of syntactic rules, (ii). On the right-hand side, an internal interface links
syntactic forms to some system of concepts and intentions, (iii). With respect to this decomposition, birdsong seems distinct from human language in the sense of lacking
both words and a fully developed conceptual–intentional system.


rather a way to factor apart the distinct knowledge types in the sense of Marr [19]. Notably, our tripartite arrangement does not preclude the possibility that only humans have syntactic rules, or that such rules always fix information content in a language-like manner. For example, in songbirds, sequential syntactic rules might exist only to construct variable song element sequences rather than variable meanings per se [9].

Birdsong and human syntax: similarities and differences
Both birdsong and human language are hierarchically organized according to syntactic constraints. We compare them by first considering the complexity of their sound structure, and then turning, in the next section, to aspects beyond this dimension. Overall, we find that birdsong sound structure, at least for the Bengalese finch, seems characterizable by an easily learnable, highly restricted subclass of the regular languages (languages that can be recognized or generated by finite-state machines; see Box 3). Whereas human language sound structure also appears to be describable via finite-state machines, comparable results are lacking in the case of human language, although certain parts of human language sound structure, such as stress patterns, have also recently been shown to be easily learnable [20].
In birdsong, individual notes can be combined as particular sequences into syllables, syllables into ‘motifs’, and motifs into complete song ‘bouts’ (Figure 1). Birdsong thus consists of chains of discrete acoustic elements arranged in a particular temporal order [21–23]. Songs might consist of fixed sequences with only sporadic variation (e.g. zebra finches), or more variable sequences (e.g. nightingales, starlings, or Bengalese finches), where a song element might be followed by several alternatives, with overall song structure describable by probabilistic rules between a finite number of states [23,24] (Figure I, Box 1). For example, a song of a nightingale is built out of a fixed 4-second note sequence. An individual nightingale has 100–200 song types, clustered into 2–12 ‘packages’. Package singing order remains probabilistic [25]. A starling song bout might last up to 1 minute, composed of many distinct motifs containing song elements in a fixed order lasting 0.5–1.5 seconds. Gentner and Hulse [26] found that a first-order Markov model (i.e. bigrams) suffices to describe most motif sequence information in starling songs (Box 2). Thus, for the most part, the next motif is predictable from the immediately preceding motif. Starlings also use this information to recognize specific song bouts. Similarly, in American thrush species, relatively low-order Markov chains suffice for modelling song sequence variability [27].
Can songbird ‘phonological syntax’ [28] ever be more complex than this? Bengalese finch song typically contains approximately eight song note types organized into 2–5 note ‘chunks’ that also follow local transition probabilities [29] (Figure I, Box 1). Unlike single-note Markov processes, chunks such as the three-note sequence cde can be reused in other places in a song [24,30]. However, chunks are not reused inside other chunks, so the hierarchical depth is strictly limited.
If Bengalese finch song could be characterized solely in terms of bigrams, it would belong to the class of so-called ‘strictly locally 2-testable languages’, a highly restricted subset of the class of the regular languages. That is, a bird could verify, either for purposes of production or for recognition, whether a song is properly formed by simply ‘sliding’ a set of two-note sequences or ‘window constraints’ across the entire note sequence, checking to see that all the two-note sequences found ‘pass’ (Box 3). For example, if the valid note sequences were ab, abab, ababab, and so on, then every a must be followed by a b, except at the song start; and every b must be followed by an a, except at the song end. Thus, aside from the beginning and end of a song, a bird could check whether a song is well formed by using two bigram templates: [a-b] and [b-a]. This turns out to be the simplest kind of pattern recognizable by a finite-state automaton (FSA), because the internal states of the automaton need not be used for any detailed computation aside from bigram note template matching (Box 3).
The Bengalese finch song automaton in Figure I (Box 1), which encompasses the full song sequence repertoire extracted from a single, actual bird [31], indicates that birdsong structure can be more complicated than a simple

Box 1. Birdsong, human language syntax and the Chomsky hierarchy


All sets of strings, or languages, can be rank ordered via strict set-inclusion according to their computational power. The resulting ‘rings’ are called the ‘Chomsky hierarchy’ [61] (Figure I; ring numbers are used below). For birdsong and human syntax comparisons, the most important point is the small overlap between the possible languages generated by human syntax (the irregular-shaded grey set) and those generated by birdsong syntax (the stippled grey set).
1. The finite languages, all finite sets of strings.
2. The FSA generating the regular languages. An FSA is represented as a directed graph of states with labelled edges, a finite-state transition network. The corresponding grammar of an FSA has rules of the form X→aY or X→a, that is, right-linear rules, where X and Y range over possible automaton states (nonterminals), and a ranges over symbols corresponding to the labelled transitions between states. The FSA recognizing the (ab)+ language needs only to test for four specific adjacent string symbol pairs (bigrams): the pairs (left-edge, a); (a, b); (b, a); and (b, right-edge) [62].
3. The PDA, generating the CFLs. PDAs are finite-state machines augmented with a potentially unbounded auxiliary memory that can be accessed from the top working down. PDAs can be thought of as augmenting FSAs with the ability to use subroutines, yielding the recursive transition networks. Grammars for these languages are consequently more general and can include rules such as X→Ya, X→aYa or X→aXa, that is, context-free rules.
4. The PDA whose stacks might themselves be augmented with embedded stacks, generating the MCSLs. Examples of such patterns in human languages are rare, but do exist [63,64]. These patterns are exemplified by stringsets such as a^n b^m c^n d^m, where the as and cs must match up in number and order and, separately, the bs and the ds, so-called ‘cross-serial’ dependencies (see [65,66]). A broad range of linguistic theories accommodate this much complexity [13–16,59,66]. No known human languages require more power than this. The two irregular sets drawn cutting across the hierarchy depict the probable location of the human languages (shaded) and birdsong (stippled). Both clearly do not completely subsume any of the previously mentioned stringsets. Birdsong and human languages intersect at the very bottom owing to the possible overlap of finite lists of human words and the vocal repertoire of certain birds.
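
As a minimal sketch of item 2 in Box 1, an FSA can be written down directly as a labelled directed graph, here a dictionary from (state, symbol) pairs to next states. The state names q0, q1 and q2 are hypothetical and chosen only for illustration; the automaton shown is one possible recognizer for the (ab)+ language, not the automaton of any cited study. A second function checks the cross-serial pattern a^n b^m c^n d^m from item 4 by comparing run lengths, which is a membership test only and is not meant to model how such dependencies are parsed.

```python
import re

# Transition table of a hypothetical FSA for (ab)+.
# Equivalent right-linear rules: Q0 -> a Q1, Q1 -> b Q2, Q1 -> b, Q2 -> a Q1.
TRANSITIONS = {
    ("q0", "a"): "q1",
    ("q1", "b"): "q2",
    ("q2", "a"): "q1",
}
START = "q0"
ACCEPTING = {"q2"}


def accepts(string):
    """Run the FSA over the string; reject on any missing transition."""
    state = START
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False
    return state in ACCEPTING


def is_cross_serial(string):
    """Membership test for a^n b^m c^n d^m with n, m >= 1."""
    match = re.fullmatch(r"(a+)(b+)(c+)(d+)", string)
    if match is None:
        return False
    a, b, c, d = (len(group) for group in match.groups())
    # as must match cs, and separately bs must match ds.
    return a == c and b == d
```

The FSA needs no memory beyond its current state, whereas the cross-serial test must remember two independent counts, which is one informal way to see why the second pattern lies outside the regular and context-free rings.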


[Figure I image: nested rings of the Chomsky hierarchy, from innermost to outermost. 1 Finite languages. 2 Regular languages (Bengalese finch song automaton with states 0–3 and transition labels ab, cde, fg). 3 Context-free languages (a^n b^n; a1 a2 b2 b1: the starling, the cats, want, was tired). 4 Mildly context-sensitive languages (a^n b^n c^n d^n; a1 a2 a3 a4 b1 b2 b3 b4: Jon, Mary, Peter, Jane, lets, help, teach, swim). 5 Context-sensitive languages (a^n b^n c^n d^n e^n). 6 Recursively enumerable languages. Two irregular regions mark Human languages and Birdsong.]

Figure I. The Chomsky hierarchy of languages along with the hypothesized locations of both human language and birdsong. The nested rings in the figure correspond
to the increasingly larger sets, or languages, generated or recognized by more powerful automata or grammars. An example of the state transition diagram
corresponding to a typical Bengalese finch song [31] is shown in the next ring after this, corresponding to some subset of the regular languages.


Box 2. Is recursion for the birds?


Recursive constructs occur in many familiar human language examples, such as the starling the cats want was tired, where one finds a full sentence phrase, the starling was tired, that contains within it a second, ‘nested’ or ‘self-embedded’ sentence, S, the cats want. In this case, the rule that constructs Sentences can apply to its own output, recursively generating a pattern of ‘nested’ or ‘serial’ dependencies.
We can write a simple CFG with three rules that illustrates this concept as follows: S→aB; B→Sb; S→ε, where ε corresponds to the empty symbol. We can use this grammar to show that one can first apply the rule that expands S as aB and then can apply the second rule to expand B as Sb, thus obtaining aSb; S now appears with non-null elements on both sides, so we say that S has been ‘self-embedded’. If we now use the third rule to replace S with the empty symbol, we obtain the output ab. Alternatively, we could apply the first and second rules over again to obtain the string aabb, or, more generally, a^n b^n for any integer n.
In our example, the as and the bs in fact form nested dependencies because they are correspondingly paired in the same way that the starling must be paired with the singular form was, rather than the plural were; similarly, the cats must be paired with want rather than the singular form wants. So, for example, to indicate a nested dependency pattern properly, the form a^3 b^3 should be more accurately written as a^1 a^2 a^3 b^3 b^2 b^1, where the superscripts indicate which as and bs must be paired up. Thus, any method to detect whether an animal can either recognize or produce a strictly context-free pattern requires demonstrating that the correct as and bs are paired up; merely recognizing that the number of as matches the number of bs does not suffice. This is one key difficulty with the Gentner et al. protocol and result [56], which probed only for the ability of starlings to recognize an equal number of as and bs in terms of warble and rattle sound classes (i.e. warble^3 rattle^3 patterns) but did not test for whether these warble-rattles were properly paired off in a nested dependency fashion. As a result, considerable controversy remains as to whether any non-human species can truly recognize strictly context-free patterns [11,67].
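
The derivation in Box 2, and the distinction between counting and pairing, can be sketched as follows. The representation of indexed symbols as (kind, index) tuples and all function names are assumptions made for illustration; this is not the protocol used in any of the cited experiments. The first function replays the three rules S→aB, B→Sb and S→ε; the other two contrast a check that only counts as and bs with a stack-based check that the indices are nested.

```python
def derive(n):
    """Apply S -> aB and B -> Sb n times, then S -> empty; return all steps."""
    form = "S"
    steps = [form]
    for _ in range(n):
        form = form.replace("S", "aB")  # rule 1: S -> aB
        steps.append(form)
        form = form.replace("B", "Sb")  # rule 2: B -> Sb
        steps.append(form)
    form = form.replace("S", "")        # rule 3: S -> empty symbol
    steps.append(form)
    return steps


def counts_match(tokens):
    """True if the tokens are n 'a's followed by n 'b's (indices ignored)."""
    kinds = [kind for kind, _ in tokens]
    n = kinds.count("a")
    return kinds == ["a"] * n + ["b"] * n


def nested_pairs_match(tokens):
    """True if, in addition, each b closes the most recently opened a."""
    if not counts_match(tokens):
        return False
    stack = []
    for kind, index in tokens:
        if kind == "a":
            stack.append(index)        # remember which a is still open
        elif stack.pop() != index:     # the b must match the latest open a
            return False
    return not stack


# a1 a2 a3 b3 b2 b1 (nested) versus a1 a2 a3 b1 b2 b3 (same counts, crossed).
nested = [("a", 1), ("a", 2), ("a", 3), ("b", 3), ("b", 2), ("b", 1)]
crossed = [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2), ("b", 3)]
```

Both example sequences pass the counting check, but only the first passes the pairing check, which is the gap between counting and nested pairing that the box describes.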

Box 3. Descriptive complexity, birdsong and human syntax


The substructure of the regular languages, sub-regular language hierarchies, could be relevant to gain insight into the computational capacities of animals and humans in the domain of acoustic and artificial language learning [62,68,69]. Similar to the Chomsky hierarchy, the family of regular languages can itself be ordered in terms of strictly inclusive sets of increasing complexity [69]. The ordering uses the notion of descriptive complexity, corresponding informally to how much local context and internal state information must be used by a finite-state machine to recognize a particular string pattern correctly. For example, to recognize the regular pattern used in the starling experiment [56], (ab)⁺, a finite-state machine needs only to check four adjacency relations or bigrams as they appear directly in a candidate string: the beginning of the string followed by an a; an a followed by a b; a b followed by an a; or else a b followed by the end of the string. We can say such a pattern is strictly locally 2-testable or SL₂ [69]. As we increase the length of these factors, we obtain a strictly increasing set hierarchy of regular languages, the strictly locally testable languages, denoted SLₖ, where k is the ‘window length’ [56,62,68]. It might be of some value to understand the range of sub-regular patterns that birds can perceive or produce. To tentatively answer this question, we applied a program for computing local testability [38,44,70]. For example, the FSA in Figure I (Box 1) recognizes a language that is locally testable. This answer agrees with the independent findings of Okanoya [31] and Gentner [26,57].
Other sub-regular pattern families have been recently explored in connection with human language sound systems [20,71]. Some of these might ultimately prove relevant to birdsong because they deal with acoustic patterns. In particular, possible sound combinations might fall into the same classes as those of human languages. Finally, all these sub-regular families could be extended straightforwardly to include phrases explicitly, but still without the ability to ‘count’, as seems true of human language ([66,72–74]; R. Berwick, PhD Thesis, MIT, 1982). It is clear that we have only just begun to scratch the surface of the detailed structure of sub-regular patterns and their cognitive relevance.
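As a minimal sketch of the bigram check described in this box, reading the starling pattern as one or more repetitions of ab, the four permitted adjacency relations can be tested with a sliding window of length 2. The boundary marker "#" and the function name are assumptions made for illustration.

```python
# The four permitted bigrams; "#" is an assumed marker for the string boundary.
ALLOWED_BIGRAMS = {("#", "a"), ("a", "b"), ("b", "a"), ("b", "#")}


def strictly_local_2(string, allowed=ALLOWED_BIGRAMS):
    """Accept a string iff every adjacent pair of symbols is a permitted bigram."""
    padded = ["#"] + list(string) + ["#"]
    return all(pair in allowed for pair in zip(padded, padded[1:]))


for candidate in ["ab", "abab", "aabb", "aba", ""]:
    print(repr(candidate), strictly_local_2(candidate))
```

No internal state is carried from one window to the next, which is what makes this the descriptively simplest kind of recognizer in the hierarchy.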

Figure 3. Probabilistic finite-state transition diagram of the song repertoire of a Bengalese finch. Directed transition links between states are labelled with note sequences along with the probability of moving along that particular link and producing the associated note sequence. The possibility of loops on either side of fixed note sequences such as hh or lga means that this song is not strictly locally testable (see Box 3 and main text). However, it is still k-reversible, and so easily learned from example songs [35]. Adapted, with permission, from [75].

bigram description. Although there are several paths through this network from the beginning state on the left to the double-circled end state on the right, the ‘loop’ back from state 2 to state 1 along with the loop from state 3 to 1 can generate songs with an arbitrary number of cde ab notes, followed by the notes cde fg. From there, a song can continue with the notes ab back to state 1, and so lead to another arbitrary number of cde ab notes, all finally ending in cde fg. In fact, the transitions between states are stochastic; that is, the finch can vary its song by choosing to go from state 2 back to state 1 with some likelihood that is measurably different from the likelihood of continuing on to state 3. In any case, formally this means that the notes cde fg can appear in the ‘middle’ of a song, arbitrarily far from either end, bracketed on both sides by some arbitrarily large number of cde ab repetitions. Such a note pattern is no longer strictly locally testable because now there can be no fixed-length ‘window’ that can check whether a note sequence ‘passes’. Instead of checking the note sequences directly, one must use the memory of the FSA indirectly to ‘wait’ after encountering the first of a possibly arbitrarily long sequence of cde abs. The automaton must then stay in this state until the required cde fg sequence appears. Such a language pattern remains recognizable by a restricted FSA, but one more powerful than a simple bigram checking machine. Such complexity seems typical. Figure 3 displays more fully a second, more complex Bengalese finch song drawn according to the same transition network methodology, this time explicitly showing the probability that one state follows another via the numbers on the links between


states [32]. It too contains loops, including one from the final, double-circled state back to the start, so that a certain song portion can be found located arbitrarily far in the middle. For example, among several other possibilities, the note sequence lga, which occurs on the transition to the double-circled final state, can be preceded by any number of b hh repetitions, as well as followed by jaa b bcadb and then an arbitrary number of eekfff adb notes, again via a loop.
Nightingales, another species with complex songs, can sing motifs with notes that are similarly embedded within looped note chunks [33]. Considering that there are hundreds of such motifs in a song repertoire of a nightingale, their songs must be at least as complex as those of Bengalese finches, at least from this formal standpoint.
More precisely and importantly, the languages involved here, at least in the case of the Bengalese finch, and perhaps other avian species, are closely related to constraints on regular languages that enable them to be easily learned [31,34,35]. Kakishita et al. [29] constructed a particular kind of restricted FSA generating the observed sequences (a k-reversible FSA). Intuitively, in this restricted case, a learner can determine whether two states are equivalent by examining only the local note sequences that can follow from any two states, determining whether the two states should be considered equivalent [36,37] (Figure I, Box 1). It is this local property that enables a learner to learn correctly and efficiently the proper automaton corresponding to external song sequences simply by listening to them, something that is impossible in general for FSA [38,39].
What about human language sound structure or its phonology? This is also now known to be describable purely in terms of FSA [40], a result that was not anticipated by earlier work in the field [41], which assumed more general computational devices well beyond the power of FSA (Box 1). For example, there are familiar ‘phonotactic’ constraints in every language, such that English speakers know that a form such as ptak could not be a possible English word, but plast might be [42]. To be sure, such constraints are often not ‘all or none’ but might depend on the statistical frequency of word subparts. Such gradation might also be present in birdsong, as reflected by the probabilistic transitions between states, as shown in Figure I (Box 1) and Figure 3 [31,43]. Once stochastic gradation is modelled, phonotactic constraints more closely mirror those found in birdsong finite-state descriptions.
Such formal findings have been buttressed by recent experiments with both human infants and Bengalese finches, confirming that adjacent acoustic dependencies of this sort are readily learnable from an early age using statistical and prosodic cues [32,44–46].
However, other human sound structure rules apparently go beyond this simplest kind of strictly local description, although remaining finite state. These include the rules that account for ‘vowel harmony’ in languages such as Turkish, where, for example, the properties of the vowel u in the word pul, ‘stamp’, are ‘propagated’ through to all its endings [7], and stress patterns (J. Heinz, PhD thesis, University of California at Los Angeles, 2007). Whereas the limited-depth hierarchies that arise in songbird syntax seem reminiscent of the bounded rhythmic structures or ‘beat patterns’ found in human speech or music, it remains an open question whether birdsong metrical structure is amenable to the formal analysis of musical meter, or even how stress is perceived in birds as opposed to humans [47–49] (Box 4).

Tweets to phrases: the role of words
Turning to syntactic description that lies beyond sound structure, we find that birdsong and human language syntax sharply diverge. In human syntax, but not birdsong, hierarchical combinations of effectively arbitrary depth can be assembled by combining words and word parts, such as the addition of s to the end of apple to yield apples, a word-construction process called ‘morphology’. Human syntax then goes even further, organizing words into higher-order phrases and entire sentences. None of these additional levels appear to be found in birdsong. This reinforces Marler’s long-standing view [28] that birdsong might best be regarded as ‘phonological syntax’, a formal language; that is, a set of units (here acoustic elements) that are arranged in particular ways but not others according to a definable rule set.
What accounts for this difference between birdsong and language? First, birdsong lacks semantics and words in the human sense, because song elements are not combined to yield novel ‘meanings’. Instead, birdsong can convey only a limited set of intentions, as a graded, holistic communication system to attract mates or deter rivals and defend territory. In terms of the tripartite diagram of Figure 2, the conceptual–intentional component is greatly reduced. Birds might still have some internalized conceptual–intentional system, but for whatever reason it is not connected to a syntactic and externalization component. By contrast, human syntax is intimately wedded to our conceptual system, involving words in both their syntactic and semantic aspects, so that, for example, combining ‘red’ with ‘apples’ yields a meaning quite distinct from, for example, ‘green apples’. It seems plausible that this single distinction drives fundamental differences between birdsong and human syntax. In particular, birds such as Bengalese finches and nightingales can and do vary their songs in the acoustic domain, rearranging existing ‘chunks’ to produce hundreds of distinct song types that might serve to identify individual birds and their degree of sexual arousal, as well as local ‘dialect-based’ congener groups [50–52], although a recent systematic study of song recombination suggests that birds rarely introduce improvised song notes or sequences [32]. For example, skylarks mark individual identity by particular song notes [51], as starlings do with song sequences [52]; and canaries use special ‘sexy syllables’ to strengthen the effect of mate attraction [50]. However, more importantly, this bounded acoustic creativity pales in comparison with the seemingly limitless open-ended variation observed in even a single human speaker, where variation might be found not only at the acoustic level in how a word is spoken, but also in how words are combined into larger structures with distinct meanings, what could be called ‘compositional creativity’. It is this latter aspect that appears absent in birdsong. Song variants do not result in distinct ‘meanings’ with completely new semantics, but serve only to modify the entirety of the


original behavioural function of the song within the context of mating, never producing a new behavioural context, and so remaining part of a graded communication system. For example, the ‘sexy syllable’ conveys the strength of the motivation of a canary, but does not change the meaning of its song [50]. In this sense, birdsong creativity lies along a finite, acoustic combinational dimension, never at the level of human compositional creativity.
Second, unlike birdsong, human language sentences are potentially unbounded in length and structure, limited only by extraneous factors, such as short-term memory or lung capacity [53]. Here too words are important. The combination of the Verb ate and the Noun apples yields the combination ate apples that has the properties of a Verb rather than a Noun. This effectively ‘names’ the combination as a new piece of hierarchical structure, a phrase, with the label ate, dubbed the head of the phrase [54]. This new Verb-like combination can then act as a single object and enter into further syntactic combinations. For example, Allison and ate apples can combine to form Allison ate apples, again taking ate as the head. Phrases can recombine ad infinitum to form ever-longer sentences, so exhibiting the open-ended novelty that von Humboldt famously called ‘the infinite use of finite means’ [55], which is immediately recognized as the hallmark of human language: Pat recalled that Charlie said that Moira thought that Allison ate apples. Thus in general, sentences can be embedded within other sentences, recursively, as in the starling the cats want was tired, in a ‘nested’ dependency pattern, where we find one ‘top-level’ sentence, the starling was tired, consisting of a Subject, the starling, and a Predicate phrase was tired, that in turn itself contains a Sentence, the cats want, formed out of another Subject, the cats, and a Predicate, want. Informally, we call such embeddings ‘recursive’, and the resulting languages ‘context-free languages’ (CFLs; Box 1). This pattern reveals a characteristic possibility for human language, a ‘nested dependency’. The singular number feature associated with the Subject, the starling, must match up with the singular number feature associated with the top-level Verb form was, whereas the embedded sentence, the cats want, has a plural Subject, the cats, that must agree with the plural Verb form want. Such ‘serial nested dependencies’ in the abstract form, a₁a₂b₂b₁, are both produced and recognized quite generally in human language [53].
The evidence for a corresponding ability in birds remains weak, despite recent experiments on training starlings to recognize such patterns (which must be carefully distinguished from the ability to produce such sequences in a naturalistic setting, as described in the previous section) [56,57]. In starlings, only the ability to recognize nesting was tested, and not the crucial dependency aspect that pairs up particular as with particular bs [11] (Box 2). In fact, human syntax goes beyond this kind of recursion to encompass certain strictly mildly context-sensitive constructions that have even more complex, overlapping dependency patterns (Box 1). Importantly, even though they differ on much else, since approximately 1970 a broad range of syntactic theories, comprising most of the major strands of modern linguistic thought, have incorporated Bloomfield’s [54] central insight that human language syntax is combinatorially word-centric in the manner described above [13–16,58,59], as well as having the power to describe both nested and overlapped dependencies. To our knowledge, such mild context-sensitivity has never been demonstrated, or even tested, in any non-human species.
In short, word-driven structure building seems totally absent in songbird syntax, and this limits its potential hierarchical complexity. Birdsong motifs lack word-centric ‘heads’ and so cannot be individuated via some internal labelling mechanism to participate in the construction of arbitrary-depth structures. Whereas a starling song might consist of a sequence of warble and rattle motif classes [57], there seems to be no corresponding way in which the acoustical features of the warble class are then used to ‘name’ distinctively the warble-rattle sequence as a whole, so that this combination can then be manipulated as single-unit phrases into ever-more complex syntactic structures.

Birdsong phrase structure?
Nonetheless, recent findings suggest that birds have a limited ability to construct phrases, at least in the acoustic domain, as noted above, accounting for individual variation within species [32,33]. In particular, there might be acoustic segmentation chunking in the self-produced song of the Bengalese finch [29,31]. Suge and Okanoya used the ‘click’ protocol pioneered by Fodor et al. [60] to probe the ‘psychological reality’ of syntactic phrases in humans [34].

Box 4. Questions for future research


• We do not know for certain the descriptive complexity of birdsong. Does it belong to any particular member of the sub-regular language hierarchies, or does it lie outside these, possibly in the family of strictly CFLs? If birdsong is contained in some sub-regular hierarchy, how is this result to be reconciled with the findings in the Gentner et al. starling study [56]? If birdsong is context free, then we can again ask to what family of CFLs it belongs: is it a deterministic CFL (as opposed to a general CFL)? Is it learnable from positive examples?
• Current tests of finite-state versus CFL abilities in birdsong have chosen only the weakest (computationally and descriptively simplest) finite-state language to compare against the simplest CFL. Can starlings be trained to recognize descriptively more complex finite-state patterns; for example, a locally testable but not strictly locally testable finite-state pattern, such as a⁺(ba⁺)⁺, where a bird would have to recognize a note(s) such as b arbitrarily far from both ends of a song [68]? What about sub-regular patterns that are more complicated than this?
• The Gentner et al. experiment [56] did not test for the nested dependency structure characteristic of embedded sentences in human language. Can birds be trained to recognize truly nested dependencies, even if just of finite depth?
• Using the methods developed in, for example, [71], what is the descriptive complexity of prosody or rhythmic stress patterns in birdsong?
• What are the neural mechanisms underlying variable song sequences in songbirds? Both human speech and birdsong involve sequentially arranged vocalizations. Are there similar neural mechanisms for the production and perception of such sequences in songbirds and humans? Bolhuis et al. [9] have summarized current knowledge of these mechanisms in humans and birds.
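To illustrate why a pattern of the kind raised in these questions is harder than a bigram check, the sketch below reads the example pattern as one or more a notes followed by one or more repetitions of b plus a notes, and compares two tests over the same set of bigrams. The names, the "#" boundary marker and the choice of window length 2 are assumptions made for illustration only.

```python
def bigrams(string):
    """Set of adjacent symbol pairs, with '#' as an assumed boundary marker."""
    padded = "#" + string + "#"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


# Every bigram that occurs in some string of the pattern must be permitted.
PERMITTED = {"#a", "aa", "ab", "ba", "a#"}
# A locally testable description may also demand that certain bigrams occur.
REQUIRED = {"ab"}


def sl2_accepts(string):
    """Strictly local test: only checks that no forbidden bigram appears."""
    return bigrams(string) <= PERMITTED


def lt2_accepts(string):
    """Locally testable test: membership depends on the whole set of bigrams found."""
    found = bigrams(string)
    return found <= PERMITTED and REQUIRED <= found


for song in ["aabaa", "abaaba", "aaa", "abba"]:
    print(song, sl2_accepts(song), lt2_accepts(song))
```

Because any bigram grammar that admits every string of the pattern must permit these five bigrams, it also admits a song such as aaa with no b at all; only the second test, which records that a b occurred somewhere, rejects it.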


Applied to human language, subjects given ‘click’ stimuli in the middle of phrases such as ate the apples tend to ‘migrate’ their perception of where the click occurs to the beginning or end of the phrase. Suge and Okanoya established that 3–4-note sequences, such as the cde in Figure I (Box 1), are perceived as unitary ‘chunks’, so that the finches tended to respond as if the click was at the c or e end of a cde ‘chunk’ [34]. Importantly, recall that Bengalese finches are also able to produce such sequence chunks, as described earlier and in Figure I (Box 1) and Figure 3. This is strikingly similar to the human syntactic capacity to ‘remember’ an entire sequence encapsulated as a single phrase or a ‘state’ of an automaton, and to reuse that encapsulation elsewhere, just as human syntax reuses Noun Phrases and Verb Phrases. However, Bengalese finches do not seem to be able to manipulate chunks with the full flexibility of dependent nesting found in human syntax. One might speculate that, with the addition of words, humans acquired the ability to label and ‘hold in memory’ in separate locations distinct phrases such as Allison ate apples and Moira thought otherwise, parallel to the ability to label and separately store in memory the words ate and thought. Once words infiltrated the basic pre-existing syntactic machinery, the combinatory possibilities became open ended.

Conclusions and perspectives
Despite considerable linguistic interest in birdsong, few studies have applied formal syntactic methods to its structure. Those that do exist suggest that birdsong syntax lies well beyond the power of bigram descriptions, but is at most only as powerful as k-reversible regular languages, lacking the nested dependencies that are characteristic of human syntax [11,29,56,57]. This is probably because of the lack of semantics in birdsong, because song sequence changes typically alter message strength but not message type. This would imply that birdsong might best serve as an animal model to study learning and neural control of human speech [9], rather than internal syntax or semantics per se. Furthermore, comparing the structure of human speech and birdsong can be a useful tool for the study of the evolution of brain and behaviour (Box 4). Bolhuis et al. [9] have argued that, in the evolution of vocal learning, both common descent (homologous brain regions) and evolutionary convergence (distant taxa exhibiting functionally similar auditory-vocal learning) have a role.

References
1 Darwin, C. (1882) The Descent of Man and Selection in Relation to Sex, Murray
2 Margoliash, D. and Nusbaum, H.C. (2009) Language: the perspective from organismal biology. Trends Cogn. Sci. 13, 505–510
3 Hauser, M.D. et al. (2002) The faculty of language: what is it, who has it, and how did it evolve? Science 298, 1569–1579
4 Bolhuis, J.J. and Wynne, C.D.L. (2009) Can evolution explain how minds work? Nature 458, 832–833
5 Doupe, A.J. and Kuhl, P.K. (1999) Birdsong and human speech: common themes and mechanisms. Annu. Rev. Neurosci. 22, 567–631
6 Bolhuis, J.J. and Gahr, M. (2006) Neural mechanisms of birdsong memory. Nature Rev. Neurosci. 7, 347–357
7 Yip, M. (2006) The search for phonology in other species. Trends Cogn. Sci. 10, 442–446
8 Okanoya, K. (2007) Language evolution and an emergent property. Curr. Op. Neurobiol. 17, 271–276
9 Bolhuis, J.J. et al. (2010) Twitter evolution: converging mechanisms in birdsong and human speech. Nature Rev. Neurosci. 11, 747–759
10 Chomsky, N. (1966) Cartesian Linguistics, Harper & Row
11 Corballis, M.C. (2007) Recursion, language, and starlings. Cogn. Sci. 31, 697–704
12 Aristotle (1970) Historia Animalium. v.II, Harvard University Press
13 Steedman, M. (2001) The Syntactic Process, MIT Press
14 Kaplan, R. and Bresnan, J. (1982) Lexical-functional grammar: a formal system for grammatical relations. In The Mental Representation of Grammatical Relations (Bresnan, J., ed.), pp. 173–281, Cambridge, MA, MIT Press
15 Gazdar, G. et al. (1985) Generalized Phrase-structure Grammar, Harvard University Press
16 Pollard, C. and Sag, I. (1994) Head-driven Phrase Structure Grammar, University of Chicago Press
17 Culicover, P. and Jackendoff, R. (2005) Simpler Syntax, Oxford University Press
18 Goldberg, A. (2006) Constructions at Work: The Nature of Generalization in Language, Oxford University Press
19 Marr, D. (1982) Vision, W.H. Freeman & Co
20 Rogers, J. et al. (2010) On languages piecewise testable in the strict sense. In Proceedings of the 11th Meeting of the Mathematics of Language Association (eds), pp. 255–265, Springer-Verlag
21 Okanoya, K. (2004) The Bengalese finch: a window on the behavioral neurobiology of birdsong syntax. Ann. NY Acad. Sci. 1016, 724–735
22 Sasahara, K. and Ikegami, T. (2007) Evolution of birdsong syntax by interjection communication. Artif. Life 13, 259–277
23 Catchpole, C.K. and Slater, P.J.B. (2008) Bird Song: Biological Themes and Variations, (2nd edn), Cambridge University Press
24 Wohlgemuth, M.J. et al. (2010) Linked control of syllable sequence and phonology in birdsong. J. Neurosci. 29, 12936–12949
25 Todt, D. and Hultsch, H. (1996) Acquisition and performance of repertoires: ways of coping with diversity and versatility. In Ecology and Evolution of Communication (Kroodsma, D.E. and Miller, E.H., eds), pp. 79–96, Cornell University Press
26 Gentner, T. and Hulse, S. (1998) Perceptual mechanisms for individual vocal recognition in European starlings. Sturnus vulgaris. Anim. Behav. 56, 579–594
27 Dobson, C.W. and Lemon, R.E. (1979) Markov sequences in songs of American thrushes. Behaviour 68, 86–105
28 Marler, P. (1977) The structure of animal communication sounds. In Recognition of Complex Acoustic Signals: Report of the Dahlem Workshop on Recognition of Complex Acoustic Signals, Berlin (Bullock, T.H., ed.), pp. 17–35, Abakon-Verlagsgesellschaft
29 Kakishita, Y. et al. (2009) Ethological data mining: an automata-based approach to extract behavioural units and rules. Data Min. Knowl. Disc. 18, 446–471
30 Hilliard, A.T. and White, S.A. (2009) Possible precursors of syntactic components in other species. In Biological Foundations and Origin of Syntax (Bickerton, D. and Szathmáry, E., eds), pp. 161–184, MIT Press
31 Okanoya, K. (2004) Song syntax in Bengalese finches: proximate and ultimate analyses. Adv. Stud. Behav. 34, 297–345
32 Takahasi, M. et al. (2010) Statistical and prosodic cues for song segmentation learning by Bengalese finches (Lonchura striata var. domestica). Ethology 116, 481–489
33 Todt, D. and Hultsch, H. (1998) How songbirds deal with large amount of serial information: retrieval rules suggest a hierarchical song memory. Biol. Cybern. 79, 487–500
34 Suge, R. and Okanoya, K. (2010) Perceptual chunking in the self-produced songs of Bengalese finches (Lonchura striata var. domestica). Anim. Cog. 13, 515–523
35 Kakishita, Y. et al. (2007) Pattern extraction improves automata-based syntax analysis in songbirds. ACAL 2007. Lect. Notes in Artif. Intell. 828, 321–333
36 Kobayashi, S. and Yokomori, T. (1994) Learning concatenations of locally testable languages from positive data. Algorithmic Learning Theory, Lect. Notes in Comput. Sci. 872, 407–422
37 Kobayashi, S. and Yokomori, T. (1997) Learning approximately regular languages with reversible languages. Theor. Comput. Sci. 174, 251–257
38 Angluin, D. (1982) Inference of reversible languages. J. ACM 29, 741–765


39 Berwick, R. and Pilato, S. (1987) Learning syntax by automata induction. J. Mach. Learning 3, 9–38
40 Johnson, C.D. (1972) Formal Aspects of Phonological Description, Mouton
41 Chomsky, N. and Halle, M. (1968) The Sound Pattern of English, Harper & Row
42 Halle, M. (1978) Knowledge unlearned and untaught: what speakers know about the sounds of their language. In Linguistic Theory and Psychological Reality (Halle, M. et al., eds), pp. 294–303, MIT Press
43 Pierrehumbert, J. and Nair, R. (1995) Word games and syllable structure. Lang. Speech 38, 78–116
44 Kuhl, P. (2008) Early language acquisition: cracking the speech code. Nat. Rev. Neurosci. 5, 831–843
45 Newport, E. and Aslin, R. (2004) Learning at a distance. I. Statistical learning of non-adjacent regularities. Cog. Sci. 48, 127–162
46 Gervain, J. and Mehler, J. (2010) Speech perception and language acquisition in the first year of life. Ann. Rev. Psychol. 61, 191–218
47 Halle, M. and Vergnaud, J-R. (1990) An Essay on Stress, MIT Press
48 Lerdahl, F. and Jackendoff, R. (1983) A Generative Theory of Tonal Music, MIT Press
49 Fabb, N. and Halle, M. (2008) A New Theory of Meter in Poetry, Cambridge University Press
50 Kreutzer, M. et al. (1999) Social stimulation modulates the use of the ‘A’ phrase in male canary songs. Behaviour 136, 1325–1334
51 Briefer, E. et al. (2009) Response to displaced neighbours in a territorial songbird with a large repertoire. Naturwissenschaften 96, 1067–1077
52 Knudsen, D.P. and Gentner, T.Q. (2010) Mechanisms of song perception in oscine birds. Brain Lang. 115, 59–68
53 Chomsky, N. and Miller, G. (1963) Finitary models of language users. In Handbook of Mathematical Psychology (Luce, R. et al., eds), pp. 419–491, Wiley
54 Bloomfield, L. (1933) Language, Henry Holt
55 von Humboldt, W. (1836) Über die Verschiedenheit des menschlichen Sprachbaues und ihren Einfluss auf die geistige Entwickelung des Menschengeschlechts, Ferdinand Dümmler
56 Gentner, T.Q. et al. (2006) Recursive syntactic pattern learning by songbirds. Nature 440, 1204–1207
57 Gentner, T. (2007) Mechanisms of auditory pattern recognition in songbirds. Lang. Learn. Devel. 3, 157–178
58 Chomsky, N. (1970) Remarks on nominalization. In Readings in English Transformational Grammar (Jacobs, R.A.P. and Rosenbaum, P., eds), pp. 184–221, Ginn
59 Joshi, A. et al. (1991) The convergence of mildly context-sensitive grammar formalisms. In Foundational Issues in Natural Language Processing (Sells, P. et al., eds), pp. 31–82, MIT Press
60 Fodor, J. et al. (1965) The psychological reality of linguistic segments. J. Verb. Learn. Verb. Behav. 4, 414–420
61 Chomsky, N. (1956) Three models for the description of language. IRE Trans. Info. Theory 2, 113–124
62 Rogers, J. and Hauser, M. (2010) The use of formal language theory in studies of artificial language learning: a proposal for distinguishing the differences between human and nonhuman animal learners. In Recursion and Human Language (van der Hulst, H., ed.), pp. 213–232, De Gruyter Mouton
63 Huybregts, M.A.C. (1984) The weak adequacy of context-free phrase structure grammar. In Van Periferie Naar Kern (de Haan, G.J. et al., eds), pp. 81–99, Foris
64 Shieber, S. (1985) Evidence against the context-freeness of natural language. Ling. Philos. 8, 333–343
65 Kudlek, M. et al. (2003) Contexts and the concept of mild context-sensitivity. Ling. Philos. 26, 703–725
66 Berwick, R. and Weinberg, A. (1984) The Grammatical Basis of Linguistic Performance, MIT Press
67 van Heijningen, C.A.A. et al. (2009) Simple rules can explain discrimination of putative recursive syntactic structures by a songbird species. Proc. Natl. Acad. Sci. U.S.A. 106, 20538–20543
68 Rogers, J. and Pullum, G. Aural pattern recognition experiments and the subregular hierarchy. J. Logic, Lang. & Info (in press)
69 McNaughton, R. and Papert, S. (1971) Counter-free Automata, MIT Press
70 Trahtman, A. (2004) Reducing the time complexity of testing for local threshold testability. Theor. Comp. Sci. 328, 151–160
71 Heinz, J. (2009) On the role of locality in learning stress patterns. Phonology 26, 305–351
72 Crespi-Reghizzi, S. (1978) Non-counting context-free languages. J. ACM 4, 571–580
73 Crespi-Reghizzi, S. (1971) Reduction of enumeration in grammar acquisition. In Proceedings of the 2nd International Joint Conference on Artificial Intelligence (Cooper, D.C., ed.), pp. 546–552, William Kaufman
74 Crespi-Reghizzi, S. and Braitenberg, V. (2003) Towards a brain compatible theory of language based on local testability. In Grammars and Automata for String Processing: from Mathematics and Computer Science (Martin-Vide, C. and Mitrana, V., eds), pp. 17–32, Gordon & Breach
75 Hosino, T. and Okanoya, K. (2000) Lesion of a higher-order song nucleus disrupts phrase level complexity in Bengalese finches. Neuroreport 11, 2091–2095

Review

Representing multiple objects as an ensemble enhances visual cognition
George A. Alvarez
Vision Sciences Laboratory, Department of Psychology, Harvard University, 33 Kirkland Street, William James Hall, Room 760,
Cambridge, MA 02138, USA

The visual system can only accurately represent a hand- a collection of objects. Other statistics that describe a set,
ful of objects at once. How do we cope with this severe such as variance [28], skew and kurtosis, are also ensemble
capacity limitation? One possibility is to use selective representations, although the ability to compute and rep-
attention to process only the most relevant incoming resent these statistics has been the focus of less attention
information. A complementary strategy is to represent in recent research (but see [29,30] for reviews on earlier
sets of objects as a group or ensemble (e.g. represent the research). Finally, the concept of ensemble representations
average size of items). Recent studies have established can be extended beyond first-order summary statistics, to
that the visual system computes accurate ensemble include higher-order summary statistics [31–33].
representations across a variety of feature domains Ensemble representations have been explored under
and current research aims to determine how these repre- various names in the literature, including ‘global features’
sentations are computed, why they are computed and [32,34,35], ‘(w)holistic’ or ‘configural’ features [36–38], ‘sets’
where they are coded in the brain. Ensemble representa- [18,39] and ‘statistical properties’ or ‘statistical summa-
tions enhance visual cognition in many ways, making ries’ [19,40]. Each of these terms shares the notion that
ensemble coding a crucial mechanism for coping with multiple measurements are combined to give rise to a
the limitations on visual processing. higher level description. The term ‘ensemble representa-
tion’ is used here as an umbrella term encompassing these
Benefits of ensemble representation different ideas. Although there is, as yet, no unifying model
Unlike artificial displays used in laboratory experiments, of ensemble representation across these domains, recent
where there is no reliable pattern across individual items, research on ensemble representation is unified by a com-
the real world is highly structured and predictable [1,2]. mon principle: representing multiple objects as an ensem-
For instance, at the object level, the visual field often ble enhances visual cognition.
consists of collections of similar objects – faces in a crowd,
berries on a bush. At a more primitive feature level, natu- The power of averaging
ral images are highly regular in terms of their contrast and How can computing ensemble representations help over-
intensity distributions [3,4], color distributions [5–8], re- come the severe capacity limitations of our visual system?
flectance spectra [9,10] and spatial structure [2,11–14]. The answer lies in the power of averaging: simply put, the
Where there is structure, there is redundancy, and where average of multiple noisy measurements can be much more
there is redundancy, there is an opportunity to form a precise than the individual measurements themselves. For
compressed and efficient representation of information instance, one can measure reaction time with millisecond
[15–17]. One way to capitalize on this structure and re- precision even when rounding reaction times to the nearest
dundancy is to represent collections of objects or features at 100 ms (Box 1). The same principle is at play in the ‘wisdom
a higher level of description, describing distributions or of crowds’ effect, in which people guess the weight of an ox
sets of objects as an ensemble rather than as individuals. and the average response is closer to the correct answer
An ensemble representation is any representation that than are the individual guesses on average [41]. These
is computed from multiple individual measurements, ei- benefits arise because, when measurements are averaged,
ther by collapsing across them or by combining them across random error in one individual measurement will tend to
space and/or time. For instance, any summary statistic cancel out uncorrelated random error in another measure-
(e.g. the mean) is an ensemble representation because it ment. Thus, the benefits of averaging depend on the extent
collapses across individual measurements to provide a to which the noise in individual measurements is correlat-
single description of the set. People are remarkably accu- ed (less correlated, more benefit) and the number of indi-
rate at computing averages, including the mean size vidual measurements averaged (more measurements,
[18,19], brightness [20], orientation [18,21,22] and location more benefit). The benefit of averaging can be formalized
of a collection of objects [23]; the average emotion [24], mathematically, given certain assumptions regarding the
gender [24] and identity [25] of faces in a crowd; and the noise in the individual measurements (Figure 1).
average number for a set of symbolically presented num- If the human visual system is capable of averaging, then
bers [26,27]. These are all measures of central tendency for observers should be able to judge the average size of a set
more accurately than they can judge the individuals in the
Corresponding author: Alvarez, G.A. (alvarez@wjh.harvard.edu). set. This is exactly what was demonstrated by Dan Ariely’s
1364-6613/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2011.01.003 Trends in Cognitive Sciences, March 2011, Vol. 15, No. 3

Box 1. The power of averaging


Imagine you are running an experiment with an expected effect size Now suppose your keyboard checks for a key once every 100 ms.
of 20 ms, which is not uncommon in behavioral research (e.g. This would be equivalent to rounding each reaction time up to the
negative priming or simple detection tasks). Do you need to worry nearest 100 ms, which on the face of it sounds like it would add error
about the sampling rate of your keyboard? First let us consider what to the estimate of the mean and variance of each condition. Indeed, it
would happen if we simply rounded reaction times to the nearest would lead to overestimates of the reaction time in each condition.
100 ms. By averaging multiple samples, individual errors owing to However, the relative difference between conditions could be
rounding will tend to cancel each other out, and it is possible to preserved. The simulation above was repeated with two conditions
obtain millisecond precision in the estimate of the mean despite in which the true mean between conditions was simulated so that
rounding. Figure Ia shows the results of a simulation with ten virtual condition two was 20 ms slower than condition one on average.
subjects and only 30 trials per subject. The true average of the Figure Ib shows the results of the simulation, in which condition two
population is 600 ms, and subjects are normally distributed around was reliably slower than condition one for each individual subject,
this mean (i.e. each subject has their own true mean, but the average and the 20 ms difference is significant at p < 0.05 using a standard
across subjects will be 600 ms). For each simulated trial, reaction within-subject t-test. In general, whether the effect can be detected
time was simulated as the subject’s true mean plus 15% random thus will depend on the degree of rounding, the expected size of the
noise around their true mean. This is fairly typical of reaction time effect and the variability of the data.
data, but the simulation results do not depend crucially on this value. For the present purpose, the important point is that, by averaging a
The simulated reaction times were then rounded to the nearest relatively modest number of trials, it is possible to overcome a great
100 ms. When the true reaction times (from the simulation) are deal of noise in individual estimates to obtain a precise representation
compared to the rounded reaction times, the mean and variance of of the mean (Figure Ia) and to detect a subtle difference between two
the two data sets are nearly indistinguishable. conditions (Figure Ib).

[Figure I. Panel (a): effect of rounding (true values vs. rounded values). Panel (b): effect of rounding-up (Condition 1 vs. Condition 2). Vertical axes: reaction time (ms).]

Figure I. (a) The effect of rounding on estimating the mean and variance in a single condition. Error bars depict the standard deviation across subjects. (b) The effect of
rounding-up on the comparison of two conditions in which the true mean differs by 20 ms. Error bars depict the within-subject standard error of the mean.
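The single-condition simulation described in Box 1 can be sketched roughly as follows. This is a minimal illustration rather than the author's code: the between-subject spread (`subject_sd`), the reading of "15% random noise" as a Gaussian standard deviation equal to 15% of each subject's mean, and all function and parameter names are assumptions made for the example.

```python
import numpy as np

def simulate_rounding(n_subjects=10, n_trials=30, pop_mean=600.0,
                      subject_sd=50.0, noise_frac=0.15, step=100.0, seed=0):
    """Return (true_means, rounded_means), one value per simulated subject."""
    rng = np.random.default_rng(seed)
    # Each subject has their own true mean, scattered around the population mean.
    # subject_sd is an assumed value; Box 1 does not state it.
    subject_means = rng.normal(pop_mean, subject_sd, size=n_subjects)
    # Trial-level reaction times: subject mean plus Gaussian noise whose SD
    # is 15% of that subject's mean (an assumed reading of "15% noise").
    rts = rng.normal(subject_means[:, None],
                     noise_frac * subject_means[:, None],
                     size=(n_subjects, n_trials))
    # Round every trial to the nearest multiple of `step` (100 ms).
    rounded = np.round(rts / step) * step
    return rts.mean(axis=1), rounded.mean(axis=1)

true_means, rounded_means = simulate_rounding()
print("grand mean, true:   ", true_means.mean())
print("grand mean, rounded:", rounded_means.mean())
print("largest per-subject discrepancy (ms):",
      np.abs(true_means - rounded_means).max())
```

Each rounded trial can be off by as much as 50 ms, yet the per-subject means typically differ far less, because rounding errors of opposite sign cancel when trials are averaged.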

influential research on the ability of people to perceive the possible to combine that imprecise information to recover
mean size of a set [18], which showed that observers can an accurate measure of the group [23].
estimate with high accuracy the average size of a set of Figure 2 illustrates how attention might affect the
objects, even when they appear unable to report the size of fidelity of ensemble representations. Inside the focus of
the individual objects in the set. attention (red beams), individual items will be represented
This type of averaging provides a potential mechanism with relatively high precision. The average of these items
for coping with the severe limitations on attentional pro- will be represented with even higher precision, as expected
cessing. Attention appears to be a fluid and flexible re- from the benefits of averaging. For items outside the focus
source: we can give full attention to a single item and of attention, we assume that they must be attended to some
represent that item with high precision, or we can divide extent to be perceived at all. For instance, the results of
our attention among many items but consequently repre- inattentional blindness studies have shown that without
sent each item with lower precision [42–44]. In general, attention, there is little or no consciously accessible repre-
objects outside the focus of attention are perceived with sentation of visual information [49–51]. These studies
less clarity [45], lower contrast [46] and a weaker high- typically aim for participants to completely withdraw at-
frequency response [47,48]. Presumably all objects in the tention from the tested items, and in some cases observers
visual field are represented with varying degrees of preci- even actively inhibit information outside of the attentional
sion, depending on the amount of attention they receive. In set [51]. However, when observers know they will be asked
some cases, objects outside the focus of attention are so about information outside the focus of attention, it is
poorly represented that it seems like we have no useful probable that they diffusely attend to those items.
information about them at all. However, it turns out to be Figure 2 implies a parallel system with multiple foci of


[Figures 1 and 2. Diagram labels: image of world, focal attention, distributed attention, individual representation, ensemble representation.]
Figure 1. Gaining precision at a higher level of abstraction. By taking individual
measurements and averaging them, it is possible to extract a higher-level
ensemble representation. If error is independent between the individual
representations, then the ensemble average will be more precisely represented Figure 2. Effect of attention on the fidelity of ensemble representations. Two sets
than the individuals in the set. This benefit can be quantified after making certain of items are depicted: one set inside the focus of attention (red beams) and one set
assumptions. For instance, if each individual were represented with the same diffusely attended outside the focus of attention (pink region). For illustrative
degree of independent, Gaussian noise (standard deviation = σ), then the average purposes, both sets are composed of identical individuals, and thus both sets have
of these individual estimates would have less noise, with a standard deviation the same individual and mean representations. For items inside the focus of
equal to σ/√n, where n is the number of individual measurements. The process is attention, individual representations will be relatively precise (red curves). The
depicted for the representation of object size, but the logic holds for any feature ensemble representation of the items inside the focus of attention will be even
dimension. more precise, owing to the benefits of averaging. For items outside the focus of
attention which are diffusely attended, the individual representations will be very
imprecise (gray curves). However, the benefits of averaging are so great that the
ensemble representation will be fairly precise, even when a relatively small
number of individual representations are averaged (just three in this example).
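The precision gain stated in the Figure 1 caption follows from a short variance calculation. The derivation below is a sketch under the caption's own assumptions (independent noise with the same standard deviation, written σ here, on each of n measurements); the symbols x_i, the sample mean x̄ and the correlation ρ are introduced only for illustration.

```latex
\operatorname{Var}(\bar{x})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\sigma^{2}
  = \frac{\sigma^{2}}{n}
\quad\Longrightarrow\quad
\operatorname{SD}(\bar{x}) = \frac{\sigma}{\sqrt{n}}

% If the noise in every pair of measurements shares a correlation \rho,
% the benefit shrinks accordingly:
\operatorname{Var}(\bar{x}) = \frac{\sigma^{2}}{n}\,\bigl[1 + (n-1)\rho\bigr]
```

With the three items averaged in Figure 2, the ensemble standard deviation is σ/√3, roughly 0.58σ. With perfectly correlated noise (ρ = 1) the second expression reduces to σ², so averaging gives no benefit.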
attention, plus diffuse attention spread over items outside
the foci of attention. However, a similar result could be
modeled with a single spotlight of attention that spends
more time in some locations than others. Either way this and that mean size is computed by taking the total activa-
diffuse attention results in extremely imprecise represen- tion and dividing it by the number of items [52]. However,
tations of the individual items, and yet averaging even just Ariely’s use of the term ‘discard’ suggests that his intended
three imprecise measurements results in a fairly precise meaning was that the individual properties are computed,
representation of the ensemble. If a large enough sample of combined and then discarded. This type of averaging model
items is averaged together, then the ensemble representa- has been supported by research on the computation of
tion for items outside the focus of attention can be nearly as mean orientation [21]. Addressing this question empirical-
accurate as the ensemble representation for items inside ly is a challenge because it is possible to compute accurate
the focus of attention. ensemble representations even from very imprecise indi-
vidual measurements. Consequently, a poor representa-
The mechanisms of averaging tion of individual items cannot be used as evidence for
Although there is general agreement that human obser- mean computation without computing individuals – unless
vers can accurately represent ensemble features, many the mean can be shown to be represented more accurately
questions remain regarding ‘how’ these ensemble repre- than expected based on the number and fidelity of individ-
sentations are computed, including: (i) Are individual ual items represented.
representations computed and then combined to form an
ensemble representation, or are ensemble representations Are individual representations discarded?
somehow computed without computing individuals? (ii) If How do we explain such poor performance when observers
individual representations are computed, are they dis- are required to report the properties of individual members
carded once the ensemble has been computed? (iii) How of a set? One possibility is that these properties are com-
many individual items are sampled and included in the puted and then discarded. An important alternative pos-
calculation of the mean? Is it just a few or could it be all of sibility is that the individual representations are not
them? (iv) Do all items contribute to the mean equally? discarded, but are simply so noisy and inaccurate that
observers cannot consistently identify individuals from
Are ensembles built up from representations of the set owing to this high level of noise. Alvarez and Oliva
individuals? found support for this possibility by modeling their results
Ariely [18] proposed that the visual system performs a type [23], consistently finding that the accuracy of ensemble
of compression, by creating an ensemble representation judgments is perfectly predicted from the accuracy of
and then discarding individual representations. Some individual judgments – even when individuals appear to
have interpreted this proposal to mean that the ensemble be judged with near chance accuracy. This alternative
representation is computed without first directly comput- possibility fits with a framework in which the representa-
ing individual measurements. For instance, it is possible tion of an image is hierarchical, retaining information at
that there is a ‘total activation map’ and a ‘number map’ multiple levels of abstraction [35,53].


[Figure 3. Diagram labels: image of world, individual representation, ensemble representation.]

How many items are sampled?
A great deal of enthusiasm surrounding studies on ensemble
representations stems from the possibility that there
are specialized ensemble processing mechanisms which
are separate from the mechanisms employed to represent
individual objects. However, this idea has spurred some
controversy in the area of research on mean size perception,
where a modeling study has shown that it is possible to
accurately estimate the mean by sampling a small subset
of items [54]. In some cases, the average of the set could be
Figure 3. Effect of set size on the fidelity of individual and ensemble
accurately estimated by strategically sampling as few as representations. The ensemble average should become more precise as the
one or two items, and estimating the average of those items number of individual items increases, because the benefits of averaging accrue
alone [54]. Consistent with this subset sampling hypothe- with each additional item averaged (with diminishing returns, of course). However,
if the precision with which individual items can be represented decreases with set
sis, the accuracy of the mean estimate is typically constant size, as depicted here, it is possible for this decrease to perfectly offset the benefits
as the number of items in the set increases beyond four of averaging so that the precision of the average remains constant with set size.
items [18,55,56], whereas the benefits of averaging should
accrue as more items are averaged together. This would be
expected if observers were sampling just a subset of the increase in noise that occurs as the number of items
items. increases.
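The trade-off described in the Figure 3 caption can be checked numerically. The sketch below is an illustration only: the rule that individual noise grows as a power of set size (`growth`), the unit baseline noise and all names are assumptions, not values taken from the studies cited.

```python
import numpy as np

def sd_of_mean(n_items, base_sd=1.0, growth=0.5, n_trials=20000, seed=0):
    """Empirical SD of the average of n_items noisy measurements."""
    rng = np.random.default_rng(seed)
    # Hypothetical rule: per-item noise SD grows as n_items ** growth.
    # growth=0.0 keeps individual precision fixed across set sizes;
    # growth=0.5 makes individual noise grow with the square root of set size.
    item_sd = base_sd * n_items ** growth
    estimates = rng.normal(0.0, item_sd, size=(n_trials, n_items)).mean(axis=1)
    return estimates.std()

for n in (1, 2, 4, 8, 16):
    fixed = sd_of_mean(n, growth=0.0)       # precision of the mean improves with n
    offsetting = sd_of_mean(n, growth=0.5)  # precision of the mean stays about flat
    print(n, round(fixed, 3), round(offsetting, 3))
```

When individual noise is fixed, the error of the mean falls as set size grows; when individual noise grows with the square root of set size, the two effects cancel and the error of the mean stays roughly constant, which is the flat pattern the caption describes.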
However, there are several reasons to believe that
observers are not strategically subsampling when they Do all items contribute to the mean equally?
compute the mean. In the case of crowded items, observers There is already some evidence that not all items contrib-
simply cannot sample individual items, thus it is unlikely ute equally to the mean [58]. Intuitively, if some measures
that judgments for crowded displays [21] reflect a sam- are very unreliable, and other measures are very reliable,
pling strategy. When items are not crowded, it has been we should give the more reliable measures more weight
shown that intermixing conditions that would require when combining these measurements. In general, comput-
different sampling strategies does not impair performance ing a weighted average in which more reliable estimates
on mean size estimation [57], suggesting that subjects are given greater weight will minimize the error in esti-
either are not using a strategic sampling strategy or can mates of the mean. To illustrate this point, Figure 4 shows
instantly deploy a new strategy based on some property of the results of a simulation in which the mean size of eight
the display. This latter possibility is unlikely, given that items was estimated. Half of the individual item sizes were
the displays in [57] were only presented for 200 ms. One estimated with high precision (low variance), whereas the
study on perceiving the average facial expression has other half were estimated with low precision (high vari-
shown that observers discount outliers when computing ance). The individual measurements were then averaged
the average, but a sampling strategy would show a large using the standard equal-weight average or using a preci-
effect of outliers [58]. Moreover, the accuracy of centroid sion-weighted average in which each individual measure-
estimates suggests that ‘all’ of the items must be averaged ment was weighted proportional to its precision. A total of
to compute the centroid with the level of precision ob- 1000 trials were simulated, and for each trial error was
served, requiring the representation of a minimum of eight measured as the difference between the actual mean size
individual items [23]. and the estimated mean size. The error distributions show
If observers are not strategically subsampling, the fact that error was lower for the precision-weighted average
that the precision of mean size estimation is constant with than for the standard, equal-weighted average.
the number of items beyond four presents a bit of a
mystery. One possibility is that the benefits of averaging
accrue quickly, and that one would predict a steep improvement
in the precision of mean estimation from one to four
items, with a leveling off beyond four items [58]. Another
possibility is that the precision with which each individual
item is represented decreases as the number of items
increases, because each item receives less attention
[42,44] and/or because items are more crowded and appear
further in the periphery on average. If this were the case,
then the benefits from averaging additional items would be
offset by the decrease in precision with which the individual
items are represented, as illustrated in Figure 3. This
account predicts that the slope of the function relating the
[Figure 4. Panels: error histograms (horizontal axis: error; vertical axis: frequency) for the equal-weighted average and the precision-weighted average.]

precision of mean judgments to the number of items would Figure 4. Benefits of precision-weighted averaging. A standard equal-weighted
depend on the degree to which the noise in individual items average will be less precise on average than a precision-weighted average in
increases with the number of items. In practice, this slope which more reliable individual measurements are given more weight in the
average. Thus, if the precision of individual measurements is known, the optimal
is often fairly shallow or even flat [18,55,56]. This raises the strategy for computing the average is to combine individual measurements with
intriguing possibility that averaging perfectly offsets the more weight given to more reliable individual measurements.
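The comparison in Figure 4 can be approximated with a small simulation. This sketch uses inverse-variance weights and treats all eight measurements as noisy readings of one underlying size, which is a simplifying assumption and not necessarily the weighting scheme used to produce the figure. The noise levels, the true size and all names are hypothetical.

```python
import numpy as np

def simulate_errors(n_trials=1000, true_size=2.0, seed=0):
    """Return (equal_weight_errors, precision_weighted_errors) across trials."""
    rng = np.random.default_rng(seed)
    # Eight measurements of one underlying size: four precise, four noisy.
    # These noise SDs are hypothetical, chosen only for illustration.
    sds = np.array([0.1] * 4 + [1.0] * 4)
    measured = true_size + rng.normal(0.0, sds, size=(n_trials, sds.size))
    # Standard average: every measurement counts equally.
    equal = measured.mean(axis=1)
    # Precision-weighted average: weight each measurement by 1 / variance.
    weights = 1.0 / sds**2
    weighted = (measured * weights).sum(axis=1) / weights.sum()
    return equal - true_size, weighted - true_size

equal_err, weighted_err = simulate_errors()
print("SD of error, equal weights:    ", equal_err.std())
print("SD of error, precision weights:", weighted_err.std())
```

Under these assumptions the precision-weighted errors cluster more tightly around zero than the equal-weighted errors, matching the qualitative pattern the figure reports.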


Exactly how to implement precision-weighted averag- features across space, including: (i) the ability to average
ing depends on how the problem is formulated. When faced features across time; (ii) the ability to represent other
with a group of samples to average, we could either assume ensemble properties, such as the number of items in a
that each individual item is a sample drawn from a single set; (iii) the ability to represent spatial patterns; (iv) the
distribution or that each individual item is a sample drawn relationship between ensemble representation and crowd-
from a separate distribution. If we assume that individual ing; and (v) the neural correlates of ensemble representa-
measurements are separate samples from a single distri- tion.
bution, and the goal is to estimate the central tendency of
the underlying distribution, then each measurement i Computing ensemble representations across time
should be weighted by 1/σ_i² (where σ_i² is the variance for item In addition to spatial structure, there is a great deal of
i). For instance, if one of the items has infinite variance, it temporal structure and redundancy in the input to the
will be completely ignored. This type of weighted average visual system, and thus it would be advantageous to be
has been used extensively in the cue integration literature able to also compute ensemble representations across time.
to define the optimal strategy for combining cues that have Recent research has shown that observers can judge the
different degrees of reliability [59]. Alternatively, if the mean size of a dynamically changing item or groups of
items are considered samples from separate distributions, items [40], or the mean expression of a dynamically chang-
and the goal is to estimate the mean of the sample, then ing face [56]. These findings demonstrate that perceptual
items should never be given zero weight in the average. averaging can operate over continuous and dynamic input,
One strategy would be to compute the mean and variance and that averaging across time can be as precise as aver-
of the samples, and to adjust the mean towards more aging across space. Whether temporal averaging mechan-
reliable measures in proportion to their variance. In this isms constantly accumulate information or sample from
case, an item with infinite variance would be included in high information points, such as salient transitions or
the initial estimate of the mean, but there would be no discontinuities in the input stream, remains an open ques-
additional updating of the mean towards this item. This tion. However, there is some evidence that certain infor-
strategy was employed in the simulations shown in mation in a temporal sequence will be given more weight in
Figure 4. the average than other information, possibly related to the
For ensemble averaging mechanisms to employ this amount of attention allocated to different points in the
type of precision-weighted averaging, the visual system temporal sequence [40].
would either have to know the degree of reliability with
which items are represented or have a heuristic to calcu- Number as an ensemble representation
late it. Both of these routes are plausible. Some models of Perhaps the most basic summary description for a collec-
visual perception model representations of individual tion of items is the number of items in the set. Without
items as probabilistic [59–61], in which knowledge is stored verbally counting, observers are able to estimate the ap-
as a probability distribution that explicitly contains a proximate number of items in a set [64–66]. Similar to the
representation of the reliability/variance of the represen- perception of mean properties, the ability to enumerate
tation. Alternatively, certain heuristics could be employed items in a set occurs rapidly. It is also possible to extract
for estimating reliability, such as giving peripheral items the number of items across multiple sets in parallel [39].
less weight because visual resolution is known to drop off Surprisingly, there is even evidence that number is directly
with eccentricity. Similarly, items inside the focus of at- perceived in the same way as other primary visual attri-
tention might be weighted more than items outside the butes [67]. Burr and Ross [67] demonstrated that it is
focus of attention because the precision with which items possible to adapt to number in the same way that it is
are represented is proportional to the amount of attention possible to adapt to visual properties such as color, orien-
we give them. These heuristics would not be explicit repre- tation or motion. Number literally seems to be a ‘perceived
sentations of reliability, but they are cues that are tightly property’ of sets. The relationship between the mechan-
correlated with reliability, and thus they could be used to isms underlying number representation and perceptual
weigh individual items as a proxy for reliability. averaging is an important topic for future research.
It has been suggested that attended items are given
more weight in the averaging of crowded orientation sig- Representing spatial patterns
nals [62]. One study has shown that when attention is Statistical summary representations, such as the mean or
drawn to a particular item in the set, the mean judgment is number of items in a set, are extremely compact represen-
biased towards that item [63]. One possible interpretation tations, collapsing the description of a set down to a single
of this finding is that attention enhances the resolution number. However, images often consist of spatially distrib-
with which the attended item is represented [42–44,48], uted patterns of information, also referred to as spatial
and that items are weighed by their precision or reliability regularities or spatial layout statistics. For example, nat-
when computing the mean [40]. This possibility is specu- ural images consist of regular distributions of orientation
lative and has not been directly tested in uncrowded dis- and spatial frequency information [34,68]. In one study,
plays. Oliva and Torralba [34] measured orientation energy at
different spatial scales over thousands of images and con-
Beyond spatial averaging ducted a principal components analysis on these measure-
Recent research on ensemble representation has gone ments. This analysis revealed that there are regularities in
beyond assessing the ability of observers to average visual the structure of natural images, with certain patterns of


objects? There is a growing body of evidence suggesting
that one perceives the higher-order summary statistics of
information within the crowded region [21,73]. For a
crowded set of oriented items, one perceives the average
orientation [21]. For more complex patterns, such as a set
of letters, the perceived pattern appears to result from a
more complex statistical representation [73]. Balas and
colleagues generated stimuli using a model which uses the
joint statistics of cells which code for position, phase,
orientation and scale [73]. Any pattern, such as sets of
letters, can be passed through this model, resulting in a
synthetic image that is somewhat distorted, yet is statistically
similar to the original. When directly viewed, the
original and the synthetic image look very different. However,
identification performance with these synthetic
images correlates with identification performance for
[Figure 5. Diagram labels: image of world, individual representation, spatial ensemble representation.]
Figure 5. Spatial ensemble representations. Individual orientation measurements
can be combined to represent patterns of orientation information. For each
crowded letters in the periphery, suggesting that percep-
pattern, local orientation measurements are made (depicted as Gaussian curves tion in the periphery could consist of a similar statistical
centered around the true orientation), but each individual measure has a high representation. The relationship between ensemble repre-
degree of noise or uncertainty. Similar orientation signals are then pooled together
to characterize regions with similar orientation signals using the average
sentation and crowding raises important questions regard-
orientation. In the first column, the top half of the image has a mean orientation ing whether ensemble coding occurs automatically and
of vertical, whereas the bottom half has the a mean orientation of horizontal. The whether it is perceptual in nature (Box 2).
same is true for the image in the middle column. However, the pattern is flipped for
the third column, where the top half has a mean of horizontal and the bottom half
Other studies suggest that there could be important
has a mean of vertical. Crucially, at the level of individual representations, the left differences between ensemble representation and crowd-
and middle columns are just as different from each other as the left and right
columns. However, at the ensemble level, the left and middle columns are more
similar to each other than the left and right columns.
Box 2. Automaticity and directly perceived ensemble
representations
spatial frequency and orientation more likely to occur than A central question is whether the visual system automatically
other patterns. A schematic of a common pattern is shown computes ensemble representations without conscious intention or
in Figure 5, in which orientation signals tend to be more effort, or whether they are computed voluntarily based on task
similar to each other within the top and bottom halves of demands. If ensemble representations were automatically com-
puted, then we would conclude that there are dedicated mechan-
the image than they are across the top and bottom halves. isms for computing and representing them. We might then focus on
It would be efficient for the visual system to capitalize on identifying the core ensemble feature dimensions and assessing
the redundancy in natural images by using visual mechan- their tuning properties. To understand such mechanisms, we can
isms that are tuned to the statistics of the natural world bring to bear methods that have been employed to understand
perception, such as single-cell physiology, and perceptual adapta-
[11,69]. Indeed, a great deal of research has suggested that
tion. If ensemble representations are not computed automatically,
low-level sensory mechanisms are tuned to real-world but instead reflect a voluntary high-level judgment, then the
statistical regularities [17,70–72]. methods we would use, and questions we would ask, might be
The representation of such spatial ensemble statistics is somewhat different. For instance, physiology and adaptation are
robust to the withdrawal of attention, as would be expected unlikely to reveal much about these mechanisms and ensemble
representations would probably depend on task incentives and
if these ensemble representations are computed by pooling
observers’ goals. To understand such representations, we might
together local measurements [31]. For example, while explore regularities in how observers make ensemble judgments
attending to a set of moving objects in the foreground, and turn our attention towards identifying consistent heuristics and
changes to the background were only noticed when they biases in ensemble judgments.
altered the ensemble structure of the display, not when the In addition to the distinction between automatic and voluntary,
there is an important distinction between ‘directly perceived’ and
ensemble structure remained the same, even though these ‘read-out’ ensemble representations. In some cases the observer
changes were perfectly matched in terms of the magnitude directly perceives the ensemble representation. For example, when
of local change [31]. This suggests that the visual system a collection of items is presented in the periphery, their orientations
maintains an accurate representation of the spatial en- appear to be automatically averaged [28]. With such crowded items,
the perceptual experience is of ‘directly seeing’ the average
semble statistics of a scene, even when attention is focused
orientation (all items appear to have an orientation equal to the
on a subset of items in the visual field. mean of the group), with an accompanying loss of perceptual access
to the individual orientation signals. By contrast, when the same
Ensemble representation and crowding display appears at the fovea, the oriented items are not crowded and
Items in the visual field are often spaced too closely for each the orientation signals do not appear to be obligatorily averaged: it
is clear that the items have different orientations and none of them
individual item to be resolved. For instance, it is unlikely
appears to have an orientation that matches the average. However,
that one can perceive the individual letters three sentences even for uncrowded displays, it is possible that ensemble repre-
above or below this one. Yet, one can tell that there are sentations are automatically computed. For example, ensemble
letters present, that these letters are grouped into several representations appear to be automatically computed when the
words and so on. What is the nature of our perceptual primary task does not require it [77] and even when they impair task
performance [94].
representation when looking at a crowded collection of

Review Trends in Cognitive Sciences March 2011, Vol. 15, No. 3

ing. For instance, crowding is greater in the upper visual field than the lower visual field, whereas under the same conditions the accuracy of ensemble judgments was the same in the upper and lower visual field [74]. Thus, although ensemble coding and crowding are closely related, there could be important dissociations between them.

Neural correlates of ensemble representation
Relatively little research has explored the neural mechanisms of ensemble representation. Perhaps the most basic question we can ask is whether there are brain regions with neurons dedicated to computing ensemble representations (above and beyond the computation of individual object representations). Extensive research suggests that the parietal cortex plays an important role in the representation of number [75]. However, much less research has been done to explore the representation of perceptual averages, such as mean size, mean facial expression or mean orientation. Future research in this area would provide important insight into the nature of ensemble coding, as well as the functional organization of the visual cortex.

Additional benefits of computing ensemble representations
The present article has focused on one primary benefit of ensemble representation: the ability to combine imprecise individual measurements to construct an accurate representation of the group, or ensemble. However, computing ensemble representations could yield many related benefits [18,76], which are discussed here.

Information compression
Compression is the process of recoding data so that it takes fewer bits of information to represent that data. To the extent that the encoding scheme distorts or loses information, the compression is said to be lossy. For instance, TIFF image encoding uses a form of lossless compression, whereas JPEG image encoding is a lossy form of image compression – although the information lost occurs at such a high spatial frequency that human observers typically cannot detect this loss. Ariely [18] proposed that reducing the representation of a set to the mean, and discarding individual representations, would be a sensible form of lossy compression for the human visual system: it leaves available an informative global percept which could potentially be used to navigate and choose regions of interest for further analysis. However, this form of compression would only be economical if ensemble representations and individual representations were ‘competing’ in some sense. Otherwise, in terms of compression, there is no advantage to discarding the individual representations, and one might as well extract the ensemble and retain the individual representations. There is some evidence that ensemble representations take the same memory space as individual representations [39], although other studies suggest that ensemble representations and noisy individual representations are maintained concurrently and that these levels are mutually informative [77,78]. These findings suggest that ensemble representations and individual representations probably do not compete for storage, at least not in a mutually exclusive manner. However, none of these previous studies directly pitted ensemble memory versus individual memory and assessed possible trade-offs between them. Future research will be necessary to explore the extent to which ensemble representations and individual representations compete in memory. In terms of perceptual representations, it seems clear that individual and ensemble representations can be maintained simultaneously [23].

Whether ensemble coding is lossy or lossless depends on the fate of lower-level, individual representations. However, at the level of the ensemble representation, it is clear the data have been transformed into a more compressed form. It is possible that this format is more conducive to memory storage and learning. Ensemble representations are more precise than the lower-level representations composing them. Thus, there can be higher specificity of response at the ensemble level than at lower levels of representation. Such sparse coding has several advantages [79,80], including minimizing overlap between representations stored in memory [81] and learning associations in neural networks [82]. The extent to which observers can learn over ensemble representations of the type described in the present article is an important topic for future research, because it could bridge the gap between research on ensemble coding in visual cognition and the vast field of research on sparse coding and memory.

Ensemble representations as a basis for statistical inference and outlier detection
Another potential benefit to building an ensemble representation is to enable statistical inferences [83], including estimating the parameters of the distribution (mean, variance, range, shape), setting confidence intervals on those parameter estimates and classifying items into groups. A special case of classification is outlier detection, and an ensemble representation is ideal for this purpose [18,76]. For instance, if a set is well described by a distribution along an arbitrary dimension, say with a mean of 20 and standard deviation of 3, then an item with a value of 30 along this dimension is unlikely to be a member of the set. The ensemble representation would enable labeling this item as an outlier or even as a member of a different group.

Outlier detection has been extensively studied using the visual search paradigm, in which the question has been whether an oddball item will instantly ‘pop out’ from a larger set of homogeneous items [84]. Items that are very different from the set, say a red item among green items, are said to be salient, and are easy to find in a visual search task [85,86]. Interestingly, computational models of saliency focus on ‘local differences’ between each item and its neighbors [87]. However, one could imagine displays in which the local context of a search target remained unchanged, but more distant items varied to either increase or decrease the degree to which the target appeared to be a member of the overall set. Finding that outlier status guides visual search above and beyond its effects on local saliency would provide strong support for the idea that ensemble representations play an important role in outlier detection.
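
The worked example in this section (a set with mean 20 and standard deviation 3, and an item with value 30) can be made concrete with a short sketch. It is only an illustration: the assumption that the set is normally distributed, the 3-standard-deviation criterion, and the function names `z_score`, `is_outlier` and `tail_probability` are all hypothetical choices made here, not something specified in the article.

```python
from statistics import NormalDist


def z_score(value: float, mean: float, sd: float) -> float:
    """Distance of an item from the ensemble mean, in standard deviations."""
    return (value - mean) / sd


def is_outlier(value: float, mean: float, sd: float, criterion: float = 3.0) -> bool:
    """Flag an item whose |z| exceeds a criterion (3 SD is an assumed cutoff)."""
    return abs(z_score(value, mean, sd)) > criterion


def tail_probability(value: float, mean: float, sd: float) -> float:
    """Two-sided probability of a value at least this extreme,
    assuming (hypothetically) that the set is normally distributed."""
    z = abs(z_score(value, mean, sd))
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Summary statistics taken from the example in the text.
ensemble_mean, ensemble_sd = 20.0, 3.0
item = 30.0

print("z-score:", z_score(item, ensemble_mean, ensemble_sd))
print("outlier:", is_outlier(item, ensemble_mean, ensemble_sd))
print("tail probability:", tail_probability(item, ensemble_mean, ensemble_sd))
```

The item lies 10/3 (about 3.3) standard deviations from the mean, so under these assumptions it would be labeled an outlier. Note that the decision uses only the ensemble summary (mean and standard deviation), not the individual items.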


Although it would be interesting if ensemble representations could enable rapid outlier detection, this finding is not necessary to support the idea that ensemble representations play an important role in classifying and grouping items. For instance, a face with a unique facial expression does not pop out in a visual search task [88]. However, recent research shows that an outlier face is given reduced weight in the ensemble representation of a group of faces [58], even though observers often fail to perceive the outlier. This finding is consistent with the possibility that the ensemble representation enables labeling of items, but could also indicate that the ensemble computation gives outliers lower weight without attaching a classification label. The role of ensemble representations in determining set membership has not yet been extensively studied, and research in this area can potentially bridge the gap between the study of ensemble representation, statistical inference and perceptual grouping.

Building a ‘gist’ representation that can guide the focus of attention
As detailed in previous sections, the power of averaging makes it possible to combine imprecise local measurements to yield a relatively precise representation of the ensemble (Figure 1). Moreover, it is possible to combine individual measurements to describe spatial patterns of information (Figure 5). A primary benefit of computing either type of ensemble representation is to provide a precise and accurate representation of the ‘gist’ of information outside the focus of attention. Without focused attention, our representations of visual information are highly imprecise [23]. If we were to simply discard or ignore these noisy representations, our conscious visual experience would be limited to only those items currently within the focus of attention. Indeed, some have argued that this is the nature of conscious visual experience [89,90]. In such a system, attention would be ‘flying blind’, without access to any information about what location or region to focus on next.

Although locally imprecise, ensemble representations provide an accurate representation of higher-level patterns and regularities outside the focus of attention [23,31]. These patterns and regularities are highly diagnostic of the type of scene one is viewing [14], and therefore they are useful for determining which environment one is currently located within. Over experience, observers appear to learn associations between these ensemble representations and the location of objects in the visual field. For instance, observers appear to use global contextual information to guide the deployment of attention to locations likely to contain the target of a visual search task [33,91–93]. Thus, rather than flying blind, the visual system can compute ensemble representations, providing a sense of the gist of information outside the focus of attention, and guiding the deployment of attention to important regions of a scene.

In terms of forming a complete representation of a scene, gist representation and outlier detection probably work in tandem. For instance, when holding a scene in working memory, observers appear to encode the gist of the scene plus individual items that cannot be incorporated into the summary for the rest of the scene (i.e. outliers) [78].

Benefits of building a hierarchical representation of a scene
There are distinct computational advantages to building a hierarchical representation of a scene. In particular, by integrating information across levels of representation, it is possible to increase the accuracy of lower-level representations. It appears that observers automatically construct this type of representation when asked to hold a scene in working memory [77,78]. For instance, when recalling the size of an individual item from a display, the remembered size was biased towards the mean size of the set of items in the same color, and towards the overall mean size of all items in the display [77]. These results were well captured by a Bayesian model in which observers integrate information at multiple levels of abstraction to inform their judgment about the size of the tested item.

Concluding remarks
Traditional research on visual cognition has typically assessed the limits of visual perception and memory for individual objects, often using random and unstructured displays. However, there is a great deal of structure and redundancy in real-world images, presenting an opportunity to represent groups of objects as an ensemble. Because ensemble representations summarize the properties of a group, they are necessarily spatially and temporally imprecise. Nevertheless, such ensemble representations confer several important benefits. Much of the previous research on ensemble representation has focused on the fact that the human visual system is capable of computing accurate ensemble representations. However, the field is moving towards a focus on investigating the mechanisms that enable ensemble coding, the nature of the ensemble representation, the utility of ensemble representations and the neural mechanisms underlying ensemble coding. This future research promises to uncover important new properties of the representations underlying visual cognition and to further demonstrate how representing ensembles enhances visual cognition.

Acknowledgments
For helpful conversation and/or comments on earlier drafts, I thank Talia Konkle, Jason Haberman and Jordan Suchow. G.A.A. was supported by the National Science Foundation (Career Award BCS-0953730).

References
1 Kersten, D. (1987) Predictability and redundancy of natural images. J. Opt. Soc. Am. A 4, 2395–2400
2 Field, D.J. (1987) Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A 4, 2379–2394
3 Brady, N. and Field, D.J. (2000) Local contrast in natural images: normalisation and coding efficiency. Perception 29, 1041–1055
4 Frazor, R.A. and Geisler, W.S. (2006) Local luminance and contrast in natural images. Vis. Res. 46, 1585–1598
5 Webster, M.A. and Mollon, J.D. (1997) Adaptation and the color statistics of natural images. Vis. Res. 37, 3283–3298
6 Hyvärinen, A. and Hoyer, P.O. (2000) Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. Neural Comput. 12, 1705–1720
7 Judd, D.B. et al. (1964) Spectral distribution of typical daylight as a function of correlated color temperature. J. Opt. Soc. Am. A 54, 1031–1040


8 Long, F. et al. (2006) Spectral statistics in natural scenes predict hue, saturation, and brightness. Proc. Natl. Acad. Sci. U.S.A. 103, 6013–6018
9 Maloney, L.T. (1986) Evaluation of linear models of surface spectral reflectance with small numbers of parameters. J. Opt. Soc. Am. A 3, 1673–1683
10 Maloney, L.T. and Wandell, B.A. (1986) Color constancy: a method for recovering surface spectral reflectance. J. Opt. Soc. Am. A 3, 29–33
11 Field, D.J. (1989) What the statistics of natural images tell us about visual coding. SPIE: Hum. Vis. Vis. Process. Digit. Display 1077, 269–276
12 Burton, G.J. and Moorehead, I.R. (1987) Color and spatial structure in natural scenes. Appl. Opt. 26, 157–170
13 Geisler, W.S. (2008) Visual perception and the statistical properties of natural scenes. Annu. Rev. Psychol. 59, 167–192
14 Torralba, A. and Oliva, A. (2003) Statistics of natural image categories. Network 14, 391–412
15 Huffman, D.A. (1952) A method for construction of minimum redundancy codes. Proc. IRE 40, 1098–1101
16 Shannon, C.E. and Weaver, W. (1949) The Mathematical Theory of Communication, The University of Illinois Press
17 Atick, J.J. (1992) Could information theory provide an ecological theory of sensory processing? Network: Comput. Neural Syst. 3, 213–251
18 Ariely, D. (2001) Seeing sets: representation by statistical properties. Psychol. Sci. 12, 157–162
19 Chong, S.C. and Treisman, A. (2003) Representation of statistical properties. Vis. Res. 43, 393–404
20 Bauer, B. (2009) Does Steven’s power law for brightness extend to perceptual brightness averaging? Psychol. Rec. 59, 171–186
21 Parkes, L. et al. (2001) Compulsory averaging of crowded orientation signals in human vision. Nat. Neurosci. 4, 739–744
22 Dakin, S.C. and Watt, R.J. (1997) The computation of orientation statistics from visual texture. Vis. Res. 37, 3181–3192
23 Alvarez, G.A. and Oliva, A. (2008) The representation of simple ensemble visual features outside the focus of attention. Psychol. Sci. 19, 392–398
24 Haberman, J. and Whitney, D. (2007) Rapid extraction of mean emotion and gender from sets of faces. Curr. Biol. 17, R751–R753
25 de Fockert, J. and Wolfenstein, C. (2009) Rapid extraction of mean identity from sets of faces. Q. J. Exp. Psychol. (Colchester) 62, 1716–1722
26 Spencer, J. (1961) Estimating averages. Ergonomics 4, 317–328
27 Smith, A.R. and Price, P.C. (2010) Sample size bias in the estimation of means. Psychon. Bull. Rev. 17, 499–503
28 Morgan, M. et al. (2008) A ‘dipper’ function for texture discrimination based on orientation variance. J. Vis. 8, 1–8
29 Peterson, C.R. and Beach, L.R. (1967) Man as an intuitive statistician. Psychol. Bull. 68, 29–46
30 Pollard, P. (1984) Intuitive judgments of proportions, means, and variances: a review. Curr. Psychol. 3, 5–18
31 Alvarez, G.A. and Oliva, A. (2009) Spatial ensemble statistics are efficient codes that can be represented with reduced attention. Proc. Natl. Acad. Sci. U.S.A. 106, 7345–7350
32 Oliva, A. and Torralba, A. (2006) Building the gist of a scene: the role of global image features in recognition. Prog. Brain Res. 155, 23–36
33 Oliva, A. and Torralba, A. (2007) The role of context in object recognition. Trends Cogn. Sci. 11, 520–527
34 Oliva, A. and Torralba, A. (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42, 145–175
35 Navon, D. (1977) Forest before trees: the precedence of global features in visual perception. Cognit. Psychol. 9, 353–383
36 Kimchi, R. (1992) Primacy of wholistic processing and global/local paradigm: a critical review. Psychol. Bull. 112, 24–38
37 Thompson, P. (1980) Margaret Thatcher: a new illusion. Perception 9, 483–484
38 Young, A.W. et al. (1987) Configurational information in face perception. Perception 16, 747–759
39 Halberda, J. et al. (2006) Multiple spatially overlapping sets can be enumerated in parallel. Psychol. Sci. 17, 572–576
40 Albrecht, A.R. and Scholl, B.J. (2010) Perceptually averaging in a continuous visual world: extracting statistical summary representations over time. Psychol. Sci. 21, 560–567
41 Galton, F. (1907) Vox populi. Nature 75, 450–451
42 Palmer, J. (1990) Attentional limits on the perception and memory of visual information. J. Exp. Psychol. Hum. Percept. Perform. 16, 332–350
43 Alvarez, G.A. and Franconeri, S.L. (2007) How many objects can you track? Evidence for a resource-limited attentive tracking mechanism. J. Vis. 7, 1–10
44 Franconeri, S.L. et al. (2007) How many locations can be selected at once? J. Exp. Psychol. Hum. Percept. Perform. 33, 1003–1012
45 Titchener, E.B. (1908) Lectures on the Elementary Psychology of Feeling and Attention, Macmillan
46 Carrasco, M. et al. (2004) Attention alters appearance. Nat. Neurosci. 7, 308–313
47 Carrasco, M. et al. (2002) Covert attention increases spatial resolution with or without masks: support for signal enhancement. J. Vis. 2, 467–479
48 Yeshurun, Y. and Carrasco, M. (1998) Attention improves or impairs visual performance by enhancing spatial resolution. Nature 396, 72–75
49 Mack, A. and Rock, I. (1998) Inattentional Blindness, The MIT Press
50 Neisser, U. and Becklen, R. (1975) Selective looking: attending to visually specified events. Cognit. Psychol. 7, 480–494
51 Most, S.B. et al. (2005) What you see is what you set: sustained inattentional blindness and the capture of awareness. Psychol. Rev. 112, 217–242
52 Setic, M. et al. (2007) Modelling the statistical processing of visual information. Neurocomputing 70, 1808–1812
53 Kinchla, R.A. and Wolfe, J.M. (1979) The order of visual processing: “Top-down”, “bottom-up”, or “middle-out”. Percept. Psychophys. 25, 225–231
54 Myczek, K. and Simons, D.J. (2008) Better than average: alternatives to statistical summary representations for rapid judgments of average size. Percept. Psychophys. 70, 772–788
55 Chong, S.C. and Treisman, A. (2005) Attentional spread in the statistical processing of visual displays. Percept. Psychophys. 67, 1–13
56 Haberman, J. et al. (2009) Averaging facial expression over time. J. Vis. 9, 1–13
57 Chong, S.C. et al. (2008) Statistical processing: not so implausible after all. Percept. Psychophys. 70, 1327–1334
58 Haberman, J. and Whitney, D. (2010) The visual system discounts emotional deviants when extracting average expression. Atten. Percept. Psychophys. 72, 1825–1838
59 Kersten, D. and Yuille, A. (2003) Bayesian models of object perception. Curr. Opin. Neurobiol. 13, 150–158
60 Vul, E. and Pashler, H. (2008) Measuring the crowd within: probabilistic representations within individuals. Psychol. Sci. 19, 645–647
61 Vul, E. and Rich, A.N. (2010) Independent sampling of features enables conscious perception of bound objects. Psychol. Sci. 21, 1168–1175
62 Mareschal, I. et al. (2010) Attentional modulation of crowding. Vis. Res. 50, 805–809
63 de Fockert, J.W. and Marchant, A.P. (2008) Attention modulates set representation by statistical properties. Percept. Psychophys. 70, 789–794
64 Dehaene, S. et al. (1998) Abstract representations of numbers in the animal and human brain. Trends Neurosci. 21, 355–361
65 Feigenson, L. et al. (2004) Core systems of number. Trends Cogn. Sci. 8, 307–314
66 Whalen, J. et al. (1999) Nonverbal counting in humans: the psychophysics of number representation. Psychol. Sci. 10, 130–137
67 Burr, D. and Ross, J. (2008) A visual sense of number. Curr. Biol. 18, 425–428
68 Geisler, W.S. et al. (2001) Edge co-occurrence in natural images predicts contour grouping performance. Vis. Res. 41, 711–724
69 Chandler, D.M. and Field, D.J. (2007) Estimates of the information content and dimensionality of natural scenes from proximity distributions. J. Opt. Soc. Am. A 24, 922–941
70 Barlow, H.B. and Foldiak, P. (1989) Adaptation and decorrelation in the cortex. In The Computing Neuron (Durbin, R. et al., eds), pp. 54–72, Addison-Wesley
71 Lewicki, M.S. (2002) Efficient coding of natural sounds. Nat. Neurosci. 5, 356–363
72 Olshausen, B.A. and Field, D.J. (1996) Natural image statistics and efficient coding. Network 7, 333–339


73 Balas, B. et al. (2009) A summary-statistic representation in peripheral vision explains visual crowding. J. Vis. 9, 13–18
74 Bulakowski, P.F. et al. Reexamining the possible benefits of visual crowding: dissociating crowding from ensemble percepts. Atten. Percept. Psychophys. (in press)
75 Piazza, M. and Izard, V. (2009) How humans count: numerosity and the parietal cortex. Neuroscientist 15, 261–273
76 Cavanagh, P. (2001) Seeing the forest but not the trees. Nat. Neurosci. 4, 673–674
77 Brady, T.F. and Alvarez, G.A. Hierarchical encoding in visual working memory: ensemble statistics bias memory for individual items. Psychol. Sci. (in press)
78 Brady, T.F. and Tenenbaum, J.B. (2010) Encoding higher-order structure in visual working-memory: a probabilistic model. In Proceedings of the 32nd Annual Conference of the Cognitive Science Society (Ohlsson, S. and Catrambone, R., eds), pp. 411–416, Cognitive Science
79 Olshausen, B.A. and Field, D.J. (2004) Sparse coding of sensory inputs. Curr. Opin. Neurobiol. 14, 481–487
80 Olshausen, B.A. and Field, D.J. (1997) Sparse coding with an overcomplete basis set: a strategy employed by V1? Vis. Res. 37, 3311–3325
81 Willshaw, D.J. et al. (1969) Non-holographic associative memory. Nature (Lond.) 222, 960–962
82 Zetzsche, C. (1990) Sparse coding: the link between low level vision and associative memory. In Parallel Processing in Neural Systems and Computers (Eckmiller, R. et al., eds), pp. 273–276, Elsevier Science
83 Rosenholtz, R. (2000) Significantly different textures: a computational model of pre-attentive texture segmentation. In Proceedings of the 6th European Conference on Computer Vision (Vernon, D., ed.), pp. 197–211, Springer-Verlag
84 Rosenholtz, R. (1999) A simple saliency model predicts a number of motion popout phenomena. Vis. Res. 39, 3157–3163
85 Itti, L. and Koch, C. (2001) Computational modelling of visual attention. Nat. Rev. Neurosci. 2, 194–203
86 Wolfe, J.M. (1994) Guided search 2.0: a revised model of visual search. Psychon. Bull. Rev. 1, 202–238
87 Itti, L. and Koch, C. (2000) A saliency-based search mechanism for overt and covert shifts of visual attention. Vis. Res. 40, 1489–1506
88 Nothdurft, H.C. (1993) Faces and facial expressions do not pop out. Perception 22, 1287–1298
89 Noë, A. and O’Regan, J.K. (2000) Perception, attention and the grand illusion. Psyche 6 (http://psyche.cs.monash.edu.au/v6/psche-6-15-noe.html)
90 O’Regan, J.K. (1992) Solving the “real” mysteries of visual perception: the world as an outside memory. Can. J. Psychol. 46, 461–488
91 Torralba, A. et al. (2006) Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. Psychol. Rev. 113, 766–786
92 Ehinger, K.A. et al. (2009) Modeling search for people in 900 scenes: a combined source model of eye guidance. Vis. Cogn. 17, 945–978
93 Chun, M.M. (2000) Contextual cueing of visual attention. Trends Cogn. Sci. 4, 170–178
94 Haberman, J. and Whitney, D. (2009) Seeing the mean: ensemble coding for sets of faces. Hum. Percept. Perform. 35, 718–734

Review

Cognitive neuroscience of
self-regulation failure
Todd F. Heatherton and Dylan D. Wagner
Department of Psychological and Brain Sciences, 6207 Moore Hall, Dartmouth College, Hanover, NH 03755, USA

Self-regulatory failure is a core feature of many social and mental health problems. Self-regulation can be undermined by failures to transcend overwhelming temptations, negative moods and resource depletion, and when minor lapses in self-control snowball into self-regulatory collapse. Cognitive neuroscience research suggests that successful self-regulation is dependent on top-down control from the prefrontal cortex over subcortical regions involved in reward and emotion. We highlight recent neuroimaging research on self-regulatory failure, the findings of which support a balance model of self-regulation whereby self-regulatory failure occurs whenever the balance is tipped in favor of subcortical areas, either due to particularly strong impulses or when prefrontal function itself is impaired. Such a model is consistent with recent findings in the cognitive neuroscience of addictive behavior, emotion regulation and decision-making.

The advantages of self-control
The ability to control behavior enables humans to live cooperatively, achieve important goals and maintain health throughout their life span. Self-regulation enables people to make plans, choose from alternatives, control impulses, inhibit unwanted thoughts and regulate social behavior [1–4]. Although humans have an impressive capacity for self-regulation, failures are common and people lose control of their behavior in a wide variety of circumstances [1,5]. Such failures are an important cause of several contemporary societal problems – obesity, addiction, poor financial decisions, sexual infidelity and so on. Indeed, it has been estimated that 40% of deaths are attributable to poor self-regulation [6]. Conversely, those who are better able to self-regulate demonstrate improved relationships, increased job success and better mental health [7,8] and are less at risk of developing alcohol abuse problems or engaging in risky sexual behavior [9]. An understanding of the circumstances under which people fail at self-regulation – as well as the brain mechanisms associated with those failures – can provide valuable insights into how people regulate and control their thoughts, behaviors and emotions.

Self-regulation failure
The modern world holds many temptations. Every day, people need to resist fattening foods, avoid browsing the internet when they should be working, keep from snapping at annoying coworkers and curb bad habits, such as smoking or drinking too much. Psychologists have made considerable progress in identifying the individual and situational factors that encourage or impair self-control [4,5,10]. The most common circumstances under which self-regulation fails are when people are in bad moods, when minor indulgences snowball into full-blown binges, when people are overwhelmed by immediate temptations or impulses, and when control itself is impaired (e.g. after alcohol consumption or effort depletion). Researchers have examined each of these and we briefly discuss the major findings, beginning with the behavioral literature and then discussing recent neuroscience findings.

Negative moods
Among the most important triggers of self-regulation failure are negative emotions [11,12]. When people become upset they sometimes act aggressively [13], spend too much money [14], engage in risky behavior [15], including unprotected sex [16], comfort the self with alcohol, drugs or food [4,17], and fail to pursue important life goals. Indeed, negative emotional states are related to relapse for a number of addictive behaviors, such as alcoholism, gambling and drug addiction [18,19]. Laboratory studies have demonstrated that inducing negative affect leads to heightened cravings among alcoholics [12], increased eating by chronic dieters [20,21] and greater smoking intensity by smokers [22].

A theory by Heatherton and Baumeister provides an explanation for the roles of negative affect in disinhibited eating [23], which is also applicable to other self-regulatory failures. This theory proposes that dieters hold a negative view of self that is generally unpleasant (especially concerning physical appearance) and that dieters are motivated to escape from these unpleasant feelings by constricting their cognitive attention to the immediate situation while ignoring the long-term implications and higher-level significance of their current actions. This escape from aversive self-awareness not only helps dieters to forget their unpleasant views of self, but also disengages long-term planning and meaningful thinking and weakens the inhibitions that normally restrain a dieter’s food intake. This might explain, in part, the lack of insight that occurs in drug addiction [24]. Other behavioral accounts of the impact of negative mood on behavior include the idea that negative affect occupies attention, thereby leading to fewer resources to inhibit behavior [25], or that engaging in appetitive behaviors reduces anxiety and comforts the self and is therefore a form of coping [26].

Corresponding author: Heatherton, T.F. (heatherton@dartmouth.edu).
1364-6613/$ – see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2010.12.005 Trends in Cognitive Sciences, March 2011, Vol. 15, No. 3
Review Trends in Cognitive Sciences March 2011, Vol. 15, No. 3

Lapse-activated consumption (Figure 1a). This disinhibition of dietary restraint has been
A common pattern of self-regulation failure occurs for replicated numerous times [20,28] and demonstrates that
addicts and chronic dieters when they ‘fall off the wagon’ dieters often eat a great deal after they perceive their diets
by consuming the addictive substance or violating their to be broken. It is currently not clear, however, how a small
diets [5]. Marlatt coined the term abstinence violation indulgence, which itself might not be problematic, esca-
effect to refer to situations in which addicts respond to lates into a full-blown binge [29].
an initial indulgence by consuming even more of the for-
bidden substance [11]. In one of the first studies to examine Cue exposure
this effect, Herman and Mack experimentally violated the At the core of self-regulation is impulse control, but how do
diets of dieters by requiring them to drink a milkshake, a impulses arise? Both human and animal studies have
high-calorie food, as part of a supposed taste perception demonstrated that exposure to drug cues increases the
study [27]. Although non-dieters ate less after consuming likelihood that the cued substance will be consumed [30–
the milkshakes, presumably because they were full, dieters 33], and additionally increases cravings, attention and
paradoxically ate more after having the milkshake physiological responses such as changes in heart rate

[Figure 1 graphic. (a) Bar chart of ice cream consumed (g), axis 0 to 250, for Diet and Non-diet groups under No preload and Milkshake conditions. (b) Bar charts of BOLD signal change, axis -0.3 to 0.5, in right NAcc (12, 9, -3) and left NAcc (-15, 3, -8) for Diet and Non-diet groups under No preload and Milkshake conditions.]

Figure 1. (a) When restrained eaters’ diets were broken by consumption of a high-calorie milkshake preload, they subsequently showed disinhibited eating (e.g. increased
grams of ice-cream consumed) compared to control subjects and restrained eaters who did not drink the milkshake (figure based on data from [30]). (b) Restrained eaters
whose diets were broken by a milkshake preload showed increased activity in the nucleus accumbens (NAcc) compared to restrained eaters who did not consume the
preload and satiated non-dieters [64].


[33–35]. Yet people might be unaware that their environ- emerge from this research is that self-regulation draws
ments are influencing them because stimuli can activate on a common domain-general resource, so that, for exam-
goals, cravings and so forth implicitly [36,37]. Even if ple, regulating one’s emotions over an extended period of
people are somewhat aware of cues around them, they time impairs subsequent attempts at resisting the temp-
are unaware of the process by which exposure to those cues tation to eat appetizing foods and results in disinhibited
implicitly activates cognitive processes that determine eating [43]. Baumeister and Heatherton proposed a
behavior [38]. A recent meta-analysis of 75 articles found strength model of self-regulation in which it was hypothe-
that implicit cognition is a strong and reliable predictor of sized that the ability to effectively regulate behavior
substance use [39]. From this perspective, cognition that is depends on a limited resource that is consumed by effortful
spontaneously activated by stimuli from the environment attempts at self-regulation [5]. This model also
alters how people act in a given situation. posited that self-regulatory capacity can be built up
The ability to transcend immediate temptations in the through practice and training (Box 1).
service of long-term goals is a key aspect of self-regulation Since its formulation there has been a tremendous surge
[5,40]. In an important series of studies, Mischel and in research supporting the notion that self-regulation
colleagues studied how preschoolers responded in the face relies on a limited resource. Studies of self-regulatory
of temptation in situations in which delaying gratification resource depletion have demonstrated that self-regulatory
led to larger rewards [40,41]. Successful self-control was resources can be depleted by a wide range of activities,
associated with either redirection of attention away from from suppressing thoughts [44] and inhibiting emotions
temptation or cognitive reframing of ‘hot’ appetitive fea- [43] to managing the impressions we make [45] and en-
tures into ‘cool’ representations [40]. A related pattern is gaging in interracial interactions [46]. A recent meta-anal-
found in behavioral economic studies in which people ysis of 83 studies of self-regulatory depletion concluded
discount future rewards in decision-making by choosing that the limited resource account of self-regulation
less objectively valuable rewards that are immediately remains the best explanation for this effect [10]. More
available [42]. A common feature of these studies is that recently, it has been suggested that self-regulation relies
people respond to appetizing cues by succumbing to imme- on adequate levels of circulating blood glucose that are
diate gratification rather than resisting temptation to temporarily reduced by tasks that require effortful self-
achieve long-term goals. regulation (Box 2).
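The preference for smaller, immediately available rewards described above is commonly formalized in behavioral economics as delay discounting. As a minimal sketch, and assuming a hyperbolic form that this review does not itself specify, the subjective value of a reward of amount A delayed by D days can be written V = A / (1 + kD), where k is an individual discount rate. The function names, reward amounts and values of k below are hypothetical and chosen only for illustration.

```python
def discounted_value(amount, delay_days, k):
    """Subjective value of a delayed reward under an assumed hyperbolic form.

    amount: objective size of the reward (arbitrary units)
    delay_days: delay until the reward is received
    k: hypothetical discount rate; larger k means steeper discounting
    """
    return amount / (1.0 + k * delay_days)


def chooses_immediate(small_now, large_later, delay_days, k):
    """True if the smaller immediate reward is subjectively worth more."""
    return small_now > discounted_value(large_later, delay_days, k)


# Hypothetical choice: 50 units now versus 100 units in 30 days.
shallow = chooses_immediate(50, 100, 30, k=0.01)  # delayed reward valued at about 76.9
steep = chooses_immediate(50, 100, 30, k=0.10)    # delayed reward valued at 25.0
print(shallow, steep)
```

Under these assumed values, the steeper discounter selects the objectively smaller reward because the delayed reward has lost most of its subjective value, mirroring the preference for immediate gratification described above.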

Self-regulatory resource depletion Functional neuroimaging studies of self-regulation


Self-regulation, like many other cognitive faculties, is sub- Functional neuroimaging studies of self-regulation and its
ject to fatigue. One of the more influential theories to failures suggest that self-regulation involves a balance
between brain regions representing the reward, salience
and emotional value of a stimulus and prefrontal regions
Box 1. Can self-regulatory capacity be increased?
In addition to postulating that self-regulation relies on a limited Box 2. Self-regulatory resource depletion and blood
domain-general resource, the limited resource account of self- glucose
regulatory failure [5] also predicted that self-regulatory capacity
could be increased through practice or training. In the first study to One issue with the limited resource model of self-regulation has
examine the effect of self-regulatory training, participants engaged been the lack of biological specificity in identifying the actual
in a variety of daily tasks that required exertion of small amounts of resource that is depleted by acts of self-control. It has recently been
self-control (e.g. remembering to maintain good posture). Com- suggested that self-regulation relies on circulating blood glucose
pared to control participants, those who engaged in modest [104]. In a series of experiments, Gailliot and colleagues demon-
amounts of daily self-control were more resistant to the effects of strated that engaging in effortful self-control reduces blood glucose
self-regulatory depletion [100]. In addition, it has been shown that levels [105]. Moreover, they also found that artificially raising blood
simple self-control regimens, such as using the non-dominant hand glucose levels eliminates the effects of self-regulatory depletion
for daily activities, can reduce the depleting effects of suppressing [105,106].
stereotypes [101]. More recently, these results have been extended Although the notion that glucose metabolism affects self-regula-
to health behaviors such as smoking cessation. Engaging in simple tion is recent, the impact of glucose on cognitive performance has
daily self-control exercises (e.g. avoiding unhealthy foods) before been known for some time. For example, studies conducted in the
stopping smoking led to increased abstinence rates at follow-up for 1990s showed that administering glucose improves performance
those who practiced self-control compared to a control group that on memory tasks and on tasks requiring response inhibition [107]. In
did not [102]. These findings support the notion that self-regulatory many respects this should come as no surprise, because glucose
strength can be increased through practice and that once increased, metabolism is the primary contrast in functional neuroimaging with
this newfound capacity to self-regulate can be used not only for positron emission tomography (PET), which, among numerous
comparatively banal tasks such as maintaining posture or using other findings, has demonstrated that glucose metabolism in-
one’s non-dominant hand, but also for behaviors with important creases with task difficulty [108]. In light of this research, it seems
health consequences such as resisting the temptation to smoke. plausible that self-regulatory failure following resource depletion is
If self-regulatory capacity can be increased through simple self- at least partly due to a temporary reduction in brain glucose stores.
control exercises over relatively short periods of time, what about Finally, self-regulation relies primarily on cognitive functions that
people whose profession requires constant self-regulation (e.g. are ascribed to the prefrontal cortex, so depletion effects should
professional musicians, air traffic controllers)? The study of self- presumably be greatest when both the depleting task and the
regulatory capacity in such populations has remained largely subsequent self-regulation task recruit the same region of the brain.
unexplored; however, related research has shown that a relation- Although this has yet to be tested, PET neuroimaging, with its ability
ship exists between musical training and grey matter in the to directly measure glucose metabolism, is an ideal method for
dorsolateral prefrontal cortex [103], a brain region that has been investigating the link between focal glucose depletion in the brain
implicated in both working memory and self-control [3]. and subsequent impairments in self-regulation.
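The limited resource account summarized in Boxes 1 and 2 can be illustrated with a toy simulation. This is a sketch only, not a model proposed in the literature reviewed here: the class name, the arbitrary resource units, the costs and the rule that resistance succeeds when the remaining resource is at least as large as the impulse are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SelfRegulator:
    """Toy agent with a single domain-general self-regulatory resource."""

    capacity: float = 1.0  # hypothetical maximum resource (arbitrary units)
    resource: float = 1.0  # resource currently available

    def exert(self, cost):
        """Effortful self-control consumes part of the limited resource."""
        self.resource = max(0.0, self.resource - cost)

    def can_resist(self, impulse_strength):
        """Assumed rule: resistance succeeds if enough resource remains."""
        return self.resource >= impulse_strength

    def train(self, gain):
        """Practice raises capacity; refilling to the new capacity is an assumption."""
        self.capacity += gain
        self.resource = self.capacity


untrained = SelfRegulator()
untrained.exert(0.5)                       # e.g. a prior act of emotion regulation
depleted_outcome = untrained.can_resist(0.6)

trained = SelfRegulator()
trained.train(0.2)                         # hypothetical benefit of daily practice
trained.exert(0.5)
trained_outcome = trained.can_resist(0.6)
print(depleted_outcome, trained_outcome)
```

In this sketch the same prior exertion leaves the untrained agent unable to resist a moderate impulse, whereas the agent with increased capacity still can, which is the qualitative pattern the training studies in Box 1 describe.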


[Figure 2 graphic. Labels: Threats to self-regulation (cue exposure, lapse-activated consumption, negative mood, resource depletion, alcohol consumption, prefrontal brain damage); Impulses overwhelm prefrontal control; Prefrontal–subcortical circuit is broken; PFC function is impaired; Leading to self-regulatory failure. Regions shown: lateral PFC, NAcc, amygdala.]

Figure 2. Schematic of a balance model of self-regulation and its failure, highlighting the four threats to self-regulation identified in the text and their putative impact on
brain areas involved in self-regulation. This model suggests that self-regulatory failure occurs whenever the balance is tipped in favor of subcortical regions involved in
reward and emotion, either due to the strength of an impulse or due to a failure to appropriately engage top-down control mechanisms.
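The balance described in this caption can be expressed as a simple comparison between impulse strength and prefrontal control. The following is an illustrative sketch rather than a quantitative model from this review: the threat names follow the labels in Figure 2, their grouping into bottom-up and top-down influences follows the review's own classification, and the baseline values and effect size are hypothetical.

```python
# Threats that strengthen subcortical impulses (bottom-up).
BOTTOM_UP_THREATS = {"cue_exposure", "lapse_activated_consumption"}
# Threats that weaken prefrontal control (top-down).
TOP_DOWN_THREATS = {
    "negative_mood",
    "resource_depletion",
    "alcohol_consumption",
    "prefrontal_damage",
}


def tip_balance(threats, impulse=0.4, control=0.6, effect=0.3):
    """Return (impulse, control) after applying each threat once.

    All numeric defaults are hypothetical and in arbitrary units.
    """
    for threat in threats:
        if threat in BOTTOM_UP_THREATS:
            impulse += effect
        elif threat in TOP_DOWN_THREATS:
            control -= effect
        else:
            raise ValueError(f"unknown threat: {threat}")
    return impulse, control


def regulation_fails(threats, **kwargs):
    """Assumed rule: failure occurs when impulse exceeds control."""
    impulse, control = tip_balance(threats, **kwargs)
    return impulse > control


print(regulation_fails([]))
print(regulation_fails(["cue_exposure"]))
print(regulation_fails(["resource_depletion"]))
```

The point of the sketch is that either route is sufficient on its own: raising the impulse or lowering control tips the same comparison.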

associated with self-control. When this balance tips in Of particular interest is what happens when partici-
favor of bottom-up impulses, either because of a failure pants attempt to regulate their responses to reward cues
to engage prefrontal control areas or because of an espe- such as those representing money, food or drugs. When
cially strong impulse (e.g. the sight and smell of cigarettes cocaine users [60] or smokers [61,62] are instructed to
for an abstinent smoker), then the likelihood of self-regu- inhibit craving, they show increased activity in regions
latory failure increases (Figure 2). of the prefrontal cortex (PFC) associated with self-control
and reduced cue-reactivity in regions associated with re-
Regulation of appetitive behaviors ward processing. Specifically, Volkow and colleagues
A universal feature of rewards, including drugs of abuse, is showed that when cocaine users inhibit their craving in
that they activate dopamine receptors in the mesolimbic response to cocaine cues, they show reduced activity in the
dopamine system, especially the nucleus accumbens orbitofrontal cortex and ventral striatum [60]. Moreover,
(NAcc) in the ventral striatum [47–49]. Functional neuro- the magnitude of this reduction is correlated with an
imaging studies have shown that the ingestion of drugs increase in activity in lateral PFC [60]. Similarly, in smo-
similarly increases activity in NAcc [50]. Earlier we noted kers, activity in the dorsolateral PFC during regulation of
that cue exposure is associated with self-regulation failure. smoking craving correlated with reduced activity in the
Neuroimaging studies reveal a plausible mechanism for ventral striatum to smoking cues and this relationship
such effects. When addicted individuals are exposed to mediated reductions in self-reported craving [61]. This
visual cues that have become associated with drugs (e.g. effect is also observed in healthy participants who are
images of drugs and drug paraphernalia), they also show instructed to regulate their response to cues representing
cue-related activity in the mesolimbic reward system [51– monetary rewards; regulation of their response to reward
53] and the insula [54]. Likewise, in neuroeconomic studies cues results in decreased cue-related activity in the ventral
of decision-making, activity in mesolimbic reward struc- striatum [63]. Finally, a recent study extended the above
tures is associated with choosing immediate monetary findings by demonstrating that individual differences in
rewards [55,56]. Indeed, dopamine agonists increase im- activity in the lateral PFC during a simple inhibition task
pulsive behavior in intertemporal choice tasks [57]. Hence, were associated with real-world reductions in cigarette
exposure to cues activates reward regions, probably be- craving and consumption among smokers over a 3-week
cause of learned expectancies that the observed stimulus period [64].
will be consumed and provide genuine reward. That is, The above studies indicate that regulation of craving
over the course of human evolution, food-relevant stimuli, requires top-down control of brain reward systems by PFC
for example, were usually real and edible rather than mere control regions [60,61,63]. But what happens when self-
visual representations. Thus, cue exposure motivates peo- control breaks down? As mentioned previously, one com-
ple to seek out relevant rewards. Interestingly, it seems mon reason why self-regulation fails is lapse-activated
likely that cue reactivity might influence motivation out- consumption, such as when dieters break their diet and
side of conscious awareness [24,37,38,54]. Indeed, Child- temporarily engage in disinhibited eating [20,27,65,66].
ress and colleagues found that ‘unseen’ stimuli of cocaine One possible mechanism for this paradoxical pattern is
(presented for 33 ms and then backward masked) produced that the initial intake of the food serves as a hedonic prime,
striatal activity for cocaine addicts [58]. This supports the and thereby brain regions involved in reward (i.e. NAcc)
proposition that implicit cognition might be important in are freed from the regulatory influence of PFC, subse-
part because people are unaware that such unconscious quently demonstrating a heightened response to appetiz-
processes are shaping their behavior and are therefore ing food. A recent study tested this proposition by
unable to resist their influence [59]. examining the effect of breaking a diet on neural


cue-reactivity to appetizing foods in dieters [67]. Compared tween the medial PFC and the amygdala [83]. Similarly,
to both non-dieters and dieters whose diet remained intact, reductions in white matter connectivity between the me-
those who had their diet broken showed increased cue- dial PFC and the amygdala, as measured with diffusion
reactivity to appetizing foods in the NAcc (Figure 1b), tensor imaging, were found for individuals with high anxi-
which echoes the behavioral findings of Herman and ety [84]. In the non-clinical population, it has been shown
Mack [27]. Interestingly, non-dieters showed the opposite that prolonged sleep deprivation leads to increased amyg-
result; the NAcc showed the greatest response in the water dala response to aversive images [85].
condition, when subjects might have been hungry, but not
in the milkshake condition, when participants were sati- Regulation of attitudes and prejudice
ated. Thus, exposure to relevant cues or ingestion of for- Social psychological models of person categorization sug-
bidden substances heightens subcortical activity in reward gest that stereotypes are automatically activated on en-
regions, thereby tipping the balance so that frontal countering outgroup members and that active inhibition is
mechanisms seem to have less power over behavior. required to suppress stereotypes and thereby avoid preju-
Self-regulation failure also occurs when frontal execu- dicial behavior [86,87]. Functional neuroimaging research
tive functions are compromised, such as following alcohol on race perception has largely corroborated these models
consumption [68] or injury [3]. For instance, patients with by showing evidence of top-down regulation of the amyg-
frontal lobe damage show a preference for immediate dala by the lateral PFC when viewing members of a racial
rewards in intertemporal choice tasks [69]. Likewise, tran- outgroup [88,89]. Echoing the findings on the regulation of
scranial magnetic stimulation to lateral PFC increases craving and emotions outlined above, activity in the lateral
choices of immediate over delayed rewards [70]. It is PFC was found to be inversely correlated with amygdala
plausible that negative mood and resource depletion inter- activity to racial outgroup members (i.e. African Ameri-
fere with self-regulation because they disrupt frontal con- cans) when viewing faces [88] and when assigning a verbal
trol, thereby tipping the balance. We noted above that label to faces [89].
negative emotional states are associated with self-regula- Further evidence that the recruitment of lateral PFC
tion failure, possibly because they interfere with higher- observed in these studies reflects self-regulatory processes
order representations, such as those involved in self- comes from a study by Richeson and colleagues that com-
awareness and insight. Sinha and colleagues found that bined functional neuroimaging with a behavioral measure
recall of personally distressing episodes led to decreased of self-regulatory resource depletion [90]. Activity in the
activity in PFC and increased activity in ventral striatal PFC (specifically lateral PFC and anterior cingulate cor-
regions [71], which supports the idea that stress tips the tex) when viewing black versus white faces was correlated
balance to favor subcortical structures. with the degree to which participants experienced self-
regulatory resource depletion in a separate behavioral
Regulation of emotions experiment in which they were required to discuss a racially
Paralleling studies of appetitive regulation, research on charged topic with a black confederate [90]. Put differently,
emotion regulation has converged on a top-down model the degree to which participants found the inter-racial
whereby neural responses to emotional material in the interaction cognitively depleting was associated with in-
amygdala and associated limbic regions are downregulated creased activity in lateral prefrontal regions when viewing
by the lateral PFC [72–74]. Analogous to the cue-reactivity black versus white faces during fMRI. Taken together,
research outlined above, a frequent finding in studies of these findings suggest that, as with emotions and drug
emotion regulation is of an inverse relationship between cues, regulation of attitudes towards outgroup members
activity in the lateral PFC and the amygdala, a limbic requires downregulation of the amygdala by the PFC.
structure sensitive to emotionally arousing stimuli [74–
78]. For instance, Wager and colleagues found that two Prefrontal–subcortical balance model of self-regulation
independent pathways mediate frontal regulation of emo- A longstanding idea in psychology is that resisting tempta-
tion: a frontal–striatal pathway is associated with success- tions reflects competition between impulses and self-con-
ful regulation whereas a frontal–amygdala pathway is trol [2,5,40]. More recently, such dual-system models have
associated with less successful regulation [79]. Likewise, received support from imaging research, with substantial
Schardt et al. found that increased functional coupling evidence of frontal–subcortical connectivity and reciprocal
between lateral PFC and amygdala was associated with activity [15,49,60,91–94]. Neuroscientific models of emo-
successful emotion regulation for those with genotypes tion regulation and self-control in drug addiction share
associated with hyper-responsivity to negative stimuli conceptual similarities. For instance, models of drug ad-
[80]. diction posit that brain reward systems are hypersensi-
Research on patients with mood disorders has demon- tized to drug cues and become uncoupled from PFC regions
strated that the reciprocal relationship between PFC and involved in top-down regulation [95,96]. Likewise, neuroe-
amygdala during emotion regulation breaks down in conomic studies of decision-making find that PFC activity
patients suffering from major depressive disorder and is associated with long-term outcomes, whereas subcortical
borderline personality disorder (BPD) [75,81,82]. Recent activity is associated with more immediate outcomes [97].
studies suggest that this prefrontal–amygdala circuit Similarly, models of emotion regulation and stereotype
might be related to differences in brain structure and suppression suggest that prefrontal regions are involved
connectivity. For instance, in contrast to controls, partici- in actively regulating emotion – or prejudicial attitudes –
pants with BPD showed no coupling of metabolism be- based on the observation of an inverse relationship be-


tween PFC and activity in the amygdala [77,88,89]. Stud- Box 3. Outstanding questions
ies of patients with anxiety and mood disorders offer
similar evidence in the form of reduced functional [75]
• Are individual differences in susceptibility to self-regulatory
and structural [84] connectivity between the PFC and failure related to prefrontal–subcortical connectivity or the
the amygdala. Similarly, alcohol consumption, which is integrity of frontal circuitry?
known to disrupt self-regulation, shifts activity from the • Can direct measurements of brain glucose levels with FDG PET be
PFC to subcortical limbic structures [98], whereas exces- used to test the glucose model of resource depletion?
• Does self-regulatory training alter brain connectivity and morpho-
sive alcohol use leads to degeneration in cortical areas
metry and do these changes predict greater self-regulatory
important for controlling impulsivity [68], which might success?
serve to further undermine attempts to control impulses • Are patients with prefrontal damage, or adults with age-related
among alcoholics. During development, when frontal exec- cognitive decline, more susceptible to external cues such as
utive functions are still maturing, subcortical structures appetizing foods or the sight and smell of cigarettes?
• Does the frontal–subcortical reciprocal relation change during
might more easily tip the balance and overwhelm self- childhood development or during aging or as a function of
regulatory resources, thereby explaining why adolescents substance use?
might be prone to heightened emotionality and risk-taking
[15].
What these different models have in common is the self-regulatory failures occur because of their influence
notion that during successful self-regulation, there is a on reward (i.e. cue reactivity and lapse-activated con-
balance between prefrontal regions involved in self-control sumption) whereas others occur because of their influence
and subcortical regions involved in representing reward on PFC (i.e. negative moods, self-regulatory depletion,
incentives, emotions or attitudes. We propose that the physiological disruption or damage of PFC).
precise subcortical target of top-down control is dependent We also note that self-regulatory failure depends on the
on the regulatory context that individuals find themselves individual. That is, the particular domain a person tries to
in: when a person regulates their food intake, this involves a control is the one that is most prone to self-regulation
prefrontal–striatal circuit, and when this same person later failure. For example, self-regulatory resource depletion
regulates their emotions, they instead invoke a prefrontal– might lead an abstinent smoker to turn to cigarettes, a
amygdala circuit. From this perspective, the nature of self- dieter to high-calorie foods or a prejudiced individual to
regulation is constant across different types of regulation, make bigoted remarks; although the outcome is different in
despite variability in the neural regions that are being each case and the underlying subcortical regions involved
regulated [49]. Indeed, a recent review of self-control across can even differ (i.e. striatum or amygdala), the overall
six different domains found that lateral PFC is involved in process is probably the same.
exerting control regardless of the specific domain [99]. This
supports our conjecture that the mechanism for self-regula- Concluding remarks
tion is domain-general, whereas the subcortical region in- In this review we highlighted a number of threats to self-
volved varies depending on the nature of the stimulus, regulation, from negative mood and potent appetitive cues
which might explain why the effects of resource depletion to lapse-activated consumption and self-regulatory re-
are not tied to any one self-regulatory domain. source depletion. Neuroimaging research on self-regulato-
ry failure is still in its infancy. Recently, a small number of
Why do people fail at self-regulation? studies of drug addicts, patients and healthy individuals
Giving in to temptations can occur for a variety of reasons; have shed light on the neural mechanisms underlying self-
for instance, dieters attempting to control their food in- regulatory failure. This research corroborates theoretical
take might find it easy to ignore most foods, but when models of self-control in which the PFC is involved in
confronted with their favorite dessert their craving can actively regulating subcortical responses to emotions
overpower their resolve. Similarly, bad moods or compet- and appetitive cues. This prefrontal–subcortical balance
ing regulatory demands can all conspire to break the hold model emphasizes that self-regulatory collapse can occur
people have over their impulses and desires. From the because of both insufficient top-down control and over-
perspective of the prefrontal–subcortical balance model whelming bottom-up impulses.
outlined above, anything that tips the balance in favor of
subcortical regions can lead to self-regulatory collapse. Acknowledgments
This can occur in a bottom-up manner when people are We thank Bill Kelley and Paul Whalen for helpful discussions in
developing this model. This work was supported by NIH grant
confronted with especially potent cues, such as a favorite
R01DA022582.
food, a free drink or a strong emotion, and in a top-down
manner, such as when prefrontal functioning is impaired
References
either when self-regulatory resources are depleted or due 1 Baumeister, R.F. et al. (1994) Losing Control: How and Why People
to drugs, alcohol or brain damage [3]. Therefore, for suc- Fail at Self-Regulation, Academic Press
cessful self-regulation, current self-regulatory ability 2 Hofmann, W. et al. (2009) Impulse and self-control from a dual-
must withstand the strength of an impulse. On this point, systems perspective. Perspect. Psychol. Sci. 4, 162–176
researchers have generally neglected to consider the situ- 3 Wagner, D.D. and Heatherton, T.F. (2010) Giving in to temptation:
the emerging cognitive neuroscience of self-regulatory failure, In
ational factors that influence the balance between activity Handbook of Self-Regulation: Research, Theory, and Applications
in subcortical regions and the PFC in self-regulation (2nd edn) (Vohs, K.D. and Baumeister, R.F., eds), pp. 41–63,
failure (Box 3). Our review suggests that some classic Guilford Press


4 Heatherton, T.F. (2011) Self and identity: neuroscience of self and self- 33 Stewart, J. et al. (1984) Role of unconditioned and conditioned drug
regulation. Annu. Rev. Psychol. 62, 363–390 effects in the self-administration of opiates and stimulants. Psychol.
5 Baumeister, R.F. and Heatherton, T.F. (1996) Self-regulation failure: Rev. 91, 251–268
an overview. Psychol. Inq. 7, 1–15 34 Drobes, D.J. and Tiffany, S.T. (1997) Induction of smoking urge
6 Schroeder, S.A. (2007) We can do better – improving the health of the through imaginal and in vivo procedures: physiological and self-
American people. New Engl. J. Med. 357, 1221–1228 report manifestations. J. Abnorm. Psychol. 106, 15–25
7 Tangney, J.P. et al. (2004) High self-control predicts good adjustment, 35 Payne, T.J. et al. (2006) Pretreatment cue reactivity predicts end-of-
less pathology, better grades, and interpersonal success. J. Pers. 72, treatment smoking. Addict. Behav. 31, 702–710
271–324 36 Ferguson, M.J. and Bargh, J.A. (2004) How social perception can
8 Duckworth, A.L. and Seligman, M.E. (2005) Self-discipline outdoes IQ automatically influence behavior. Trends Cogn. Sci. 8, 33–39
in predicting academic performance of adolescents. Psychol. Sci. 16, 37 Stacy, A.W. and Wiers, R.W. (2010) Implicit cognition and addiction: a
939–944 tool for explaining paradoxical behavior. Annu. Rev. Clin. Psychol. 6,
9 Quinn, P.D. and Fromme, K. (2010) Self-regulation as a protective 551–575
factor against risky drinking and sexual behavior. Psychol. Addict. 38 Bargh, J.A. and Morsella, E. (2008) The unconscious mind. Perspect.
Behav. 24, 376–385 Psychol. Sci. 3, 73–79
10 Hagger, M.S. et al. (2010) Ego depletion and the strength model of self- 39 Rooke, S.E. et al. (2008) Implicit cognition and substance use: a meta-
control: a meta-analysis. Psychol. Bull. 136, 495–525 analysis. Addict. Behav. 33, 1314–1328
11 Marlatt, G.A. and Gordon, J.R. (1985) Relapse Prevention: 40 Metcalfe, J. and Mischel, W. (1999) A hot/cool-system analysis of delay
Maintenance Strategies in the Treatment of Addictive Behaviors, of gratification: dynamics of willpower. Psychol. Rev. 106, 3–19
Guilford Press 41 Mischel, W. et al. (2010) ‘Willpower’ over the life span: mechanisms,
12 Sinha, R. (2009) Modeling stress and drug craving in the laboratory: consequences, and implications. Soc. Cogn. Affect. Neurosci. DOI:
implications for addiction treatment development. Addict. Biol. 14, 10.1093/scan/nsq081
84–98 42 Bickel, W.K. and Marsch, L.A. (2001) Toward a behavioral economic
13 Anderson, C.A. and Bushman, B.J. (2002) Human aggression. Annu. understanding of drug dependence: delay discounting processes.
Rev. Psychol. 53, 27–51 Addiction 96, 73–86
14 Bruyneel, S.D. et al. (2009) I felt low and my purse feels light: 43 Vohs, K.D. and Heatherton, T.F. (2000) Self-regulatory failure: a
depleting mood regulation attempts affect risk decision making. J. resource-depletion approach. Psychol. Sci. 11, 249–254
Behav. Decis. Making 22, 153–170 44 Muraven, M. et al. (2002) Self-control and alcohol restraint: an initial
15 Somerville, L.H. et al. (2010) A time of change: behavioral and neural application of the self-control strength model. Psychol. Addict. Behav.
correlates of adolescent sensitivity to appetitive and aversive 16, 113–120
environmental cues. Brain Cogn. 72, 124–133 45 Vohs, K.D. et al. (2005) Self-regulation and self-presentation:
16 Bousman, C.A. et al. (2009) Negative mood and sexual behavior regulatory resource depletion impairs impression management and
among non-monogamous men who have sex with men in the effortful self-presentation depletes regulatory resources. J. Pers. Soc.
context of methamphetamine and HIV. J. Affect. Disord. 119, 84–91 Psychol. 88, 632–657
17 Magid, V. et al. (2009) Negative affect, stress, and smoking in college 46 Richeson, J.A. and Shelton, J.N. (2003) When prejudice does not pay:
students: unique associations independent of alcohol and marijuana effects of interracial contact on executive function. Psychol. Sci. 14,
use. Addict. Behav. 34, 973–975 287–290
18 Sinha, R. (2007) The role of stress in addiction relapse. Curr. 47 Baler, R.D. and Volkow, N.D. (2006) Drug addiction: the neurobiology
Psychiatry Rep. 9, 388–395 of disrupted self-control. Trends Mol. Med. 12, 559–566
19 Witkiewitz, K. and Villarroel, N.A. (2009) Dynamic association 48 Robinson, T.E. and Berridge, K.C. (2003) Addiction. Annu. Rev.
between negative affect and alcohol lapses following alcohol Psychol. 54, 25–53
treatment. J. Consult. Clin. Psychol. 77, 633–644 49 Volkow, N.D. et al. (2008) Overlapping neuronal circuits in addiction
20 Heatherton, T.F. et al. (1991) Effects of physical threat and ego threat and obesity: evidence of systems pathology. Philos. Trans. R. Soc.
on eating behavior. J. Pers. Soc. Psychol. 60, 138–143 Lond. B: Biol. Sci. 363, 3191–3200
21 Macht, M. (2008) How emotions affect eating: a five-way model. 50 O’Doherty, J.P. et al. (2003) Temporal difference models and reward-
Appetite 50, 1–11 related learning in the human brain. Neuron 38, 329–337
22 McKee, S. et al. (2010) Stress decreases the ability to resist smoking 51 Garavan, H. et al. (2000) Cue-induced cocaine craving:
and potentiates smoking intensity and reward. J. Psychopharmacol. neuroanatomical specificity for drug users and drug stimuli. Am. J.
DOI: 10.1177/0269881110376694 Psychiatry 157, 1789–1798
23 Heatherton, T.F. and Baumeister, R.F. (1991) Binge eating as escape 52 Grant, S. et al. (1996) Activation of memory circuits during cue-
from self-awareness. Psychol. Bull. 110, 86–108 elicited cocaine craving. Proc. Natl. Acad. Sci. U.S.A. 93, 12040–12045
24 Goldstein, R.Z. et al. (2009) The neurocircuitry of impaired insight in 53 Myrick, H. et al. (2008) Effect of naltrexone and ondansetron on
drug addiction. Trends Cogn. Sci. 13, 372–380 alcohol cue-induced activation of the ventral striatum in alcohol-
25 Ward, A. and Mann, T. (2000) Don’t mind if I do: disinhibited eating dependent people. Arch. Gen. Psychiatry 65, 466–475
under cognitive load. J. Pers. Soc. Psychol. 78, 753–763 54 Naqvi, N.H. and Bechara, A. (2009) The hidden island of addiction: the
26 Sinha, R. (2008) Chronic stress, drug use, and vulnerability to insula. Trends Neurosci. 32, 56–67
addiction. Ann. N.Y. Acad. Sci. 1141, 105–130 55 Diekhof, E.K. and Gruber, O. (2010) When desires collide with reason:
27 Herman, C.P. and Mack, D. (1975) Restrained and unrestrained functional interactions between anteroventral prefrontal cortex and
eating. J. Pers. 43, 647–660 nucleus accumbens underlie the human ability to resist impulsive
28 Herman, C.P. and Polivy, J. (2010) The self-regulation of eating: desires. J. Neurosci. 30, 1488–1493
theoretical and practical problems, In Handbook of Self- 56 McClure, S.M. et al. (2004) Separate neural systems value immediate
Regulation: Research, Theory, and Applications (2nd edn) (Vohs, and delayed monetary rewards. Science 306, 503–507
K.D. and Baumeister, R.F., eds), pp. 492–508, Guilford Press 57 Pine, A. et al. (2010) Dopamine, time, and impulsivity in humans. J.
29 Marlatt, G.A. et al. (2009) Relapse prevention: evidence base and Neurosci. 30, 8888–8896
future directions, In Evidence-Based Addiction Treatment (1st edn) 58 Childress, A.R. et al. (2008) Prelude to passion: limbic activation by
(Miller, P.M., ed.), pp. 215–232, Elsevier/Academic Press ‘‘unseen’’ drug and sexual cues. PLoS ONE 3, e1506
30 Drummond, D.C. et al. (1990) Conditioned learning in alcohol 59 Wagner, D.D. et al. (2011) Spontaneous action representation in
dependence: implications for cue exposure treatment. Br. J. Addict. smokers watching movie smoking. J. Neurosci. 31, 894–898
85, 725–743 60 Volkow, N.D. et al. (2010) Cognitive control of drug craving inhibits
31 Glautier, S. and Drummond, D.C. (1994) Alcohol dependence and cue brain reward regions in cocaine abusers. Neuroimage 49, 2536–
reactivity. J. Stud. Alcohol. 55, 224–229 2543
32 Jansen, A. (1998) A learning model of binge eating: cue reactivity and 61 Kober, H. et al. (2010) Prefrontal-striatal pathway underlies cognitive
cue exposure. Behav. Res. Ther. 36, 257–272 regulation of craving. Proc. Natl. Acad. Sci. U.S.A. 107, 14811–14816

62 Brody, A.L. et al. (2007) Neural substrates of resisting craving during cigarette cue exposure. Biol. Psychiatry 62, 642–651
63 Delgado, M.R. et al. (2008) Regulating the expectation of reward via cognitive strategies. Nat. Neurosci. 11, 880–881
64 Berkman, E.T. et al. In the trenches of real-world self-control: neural correlates of breaking the link between craving and smoking. Psychol. Sci., in press
65 Heatherton, T.F. et al. (1992) Effects of distress on eating: the importance of ego-involvement. J. Pers. Soc. Psychol. 62, 801–803
66 Heatherton, T.F. et al. (1993) Self-awareness, task failure, and disinhibition: how attentional focus affects eating. J. Pers. 61, 49–61
67 Demos, K.E. et al. (2011) Dietary restraint violations influence reward responses in nucleus accumbens and amygdala. J. Cogn. Neurosci. DOI: 10.1162/jocn.2010.21568
68 Crews, F.T. and Boettiger, C.A. (2009) Impulsivity, frontal lobes and risk for addiction. Pharmacol. Biochem. Behav. 93, 237–247
69 Sellitto, M., Ciaramelli, E. and di Pellegrino, G. (2010) Myopic discounting of future rewards after medial orbitofrontal damage in humans. J. Neurosci. 30, 6429–6436
70 Figner, B. et al. (2010) Lateral prefrontal cortex and self-control in intertemporal choice. Nat. Neurosci. 13, 538–539
71 Sinha, R. et al. (2005) Neural activity associated with stress-induced cocaine craving: a functional magnetic resonance imaging study. Psychopharmacology 183, 171–180
72 Davidson, R.J. et al. (2000) Dysfunction in the neural circuitry of emotion regulation – a possible prelude to violence. Science 289, 591–594
73 Ochsner, K.N. and Gross, J.J. (2005) The cognitive control of emotion. Trends Cogn. Sci. 9, 242–249
74 Hariri, A.R. et al. (2003) Neocortical modulation of the amygdala response to fearful stimuli. Biol. Psychiatry 53, 494–501
75 Johnstone, T. et al. (2007) Failure to regulate: counterproductive recruitment of top-down prefrontal–subcortical circuitry in major depression. J. Neurosci. 27, 8877–8884
76 Ochsner, K.N. et al. (2002) Rethinking feelings: an FMRI study of the cognitive regulation of emotion. J. Cogn. Neurosci. 14, 1215–1229
77 Ochsner, K.N. et al. (2004) For better or for worse: neural systems supporting the cognitive down- and up-regulation of negative emotion. Neuroimage 23, 483–499
78 Urry, H.L. et al. (2006) Amygdala and ventromedial prefrontal cortex are inversely coupled during regulation of negative affect and predict the diurnal pattern of cortisol secretion among older adults. J. Neurosci. 26, 4415–4425
79 Wager, T.D. et al. (2008) Prefrontal–subcortical pathways mediating successful emotion regulation. Neuron 59, 1037–1050
80 Schardt, D.M. et al. (2010) Volition diminishes genetically mediated amygdala hyperreactivity. Neuroimage 53, 943–951
81 Donegan, N.H. et al. (2003) Amygdala hyperreactivity in borderline personality disorder: implications for emotional dysregulation. Biol. Psychiatry 54, 1284–1293
82 Silbersweig, D. et al. (2007) Failure of frontolimbic inhibitory function in the context of negative emotion in borderline personality disorder. Am. J. Psychiatry 164, 1832–1841
83 New, A.S. et al. (2007) Amygdala–prefrontal disconnection in borderline personality disorder. Neuropsychopharmacology 32, 1629–1640
84 Kim, M.J. and Whalen, P.J. (2009) The structural integrity of an amygdala–prefrontal pathway predicts trait anxiety. J. Neurosci. 29, 11614–11618
85 Yoo, S.S. et al. (2007) The human emotional brain without sleep – a prefrontal amygdala disconnect. Curr. Biol. 17, R877–878
86 Devine, P.G. (1989) Stereotypes and prejudice – their automatic and controlled components. J. Pers. Soc. Psychol. 56, 5–18
87 Fiske, S.T. (1998) Stereotyping, prejudice, and discrimination. In The Handbook of Social Psychology (Vol. 2) (Gilbert, D. et al., eds), pp. 357–411, McGraw-Hill
88 Cunningham, W.A. et al. (2004) Separable neural components in the processing of black and white faces. Psychol. Sci. 15, 806–813
89 Lieberman, M.D. et al. (2005) An fMRI investigation of race-related amygdala activity in African-American and Caucasian-American individuals. Nat. Neurosci. 8, 720–722
90 Richeson, J.A. et al. (2003) An fMRI investigation of the impact of interracial contact on executive function. Nat. Neurosci. 6, 1323–1328
91 Banks, S.J. et al. (2007) Amygdala–frontal connectivity during emotion regulation. Soc. Cogn. Affect. Neurosci. 2, 303–312
92 Batterink, L. et al. (2010) Body mass correlates inversely with inhibitory control in response to food among adolescent girls: an fMRI study. Neuroimage 52, 1696–1703
93 Li, C.S. and Sinha, R. (2008) Inhibitory control and emotional stress regulation: neuroimaging evidence for frontal–limbic dysfunction in psycho-stimulant addiction. Neurosci. Biobehav. Rev. 32, 581–597
94 MacDonald, K.B. (2008) Effortful control, explicit processing, and the regulation of human evolved predispositions. Psychol. Rev. 115, 1012–1031
95 Bechara, A. (2005) Decision making, impulse control and loss of willpower to resist drugs: a neurocognitive perspective. Nat. Neurosci. 8, 1458–1463
96 Koob, G.F. and Le Moal, M. (2008) Addiction and the brain antireward system. Annu. Rev. Psychol. 59, 29–53
97 Huettel, S.A. (2010) Ten challenges for decision neuroscience. Front. Neurosci. 4, 1–7
98 Volkow, N.D. et al. (2008) Moderate doses of alcohol disrupt the functional organization of the human brain. Psychiatry Res. 162, 205–213
99 Cohen, J.R. and Lieberman, M.D. (2010) The common neural basis of exerting self-control in multiple domains. In Self Control in Society, Mind, and Brain (Hassin, R. et al., eds), pp. 141–162, Oxford University Press
100 Muraven, M. et al. (1999) Longitudinal improvement of self-regulation through practice: building self-control strength through repeated exercise. J. Soc. Psychol. 139, 446–457
101 Gailliot, M.T. et al. (2007) Increasing self-regulatory strength can reduce the depleting effect of suppressing stereotypes. Pers. Soc. Psychol. Bull. 33, 281–294
102 Muraven, M. (2010) Practicing self-control lowers the risk of smoking lapse. Psychol. Addict. Behav. 24, 446–452
103 Bermudez, P. et al. (2009) Neuroanatomical correlates of musicianship as revealed by cortical thickness and voxel-based morphometry. Cereb. Cortex 19, 1583–1596
104 Gailliot, M.T. and Baumeister, R.F. (2007) The physiology of willpower: linking blood glucose to self-control. Pers. Soc. Psychol. Rev. 11, 303–327
105 Gailliot, M.T. et al. (2007) Self-control relies on glucose as a limited energy source: willpower is more than a metaphor. J. Pers. Soc. Psychol. 92, 325–336
106 Gailliot, M.T. et al. (2009) Stereotypes and prejudice in the blood: sucrose drinks reduce prejudice and stereotyping. J. Exp. Soc. Psychol. 45, 288–290
107 Benton, D. et al. (1994) Blood glucose influences memory and attention in young adults. Neuropsychologia 32, 595–607
108 Jonides, J. et al. (1997) Verbal working memory load affects regional brain activation as measured by PET. J. Cogn. Neurosci. 9, 462–475
