

Emerging Communication
Studies in New Technologies and Practices in Communication
Emerging Communication publishes state-of-the-art papers that examine a broad range of issues in communication technology, theories, research, practices and applications. It presents the latest development in the field of traditional and computer-mediated communication with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. Since Emerging Communication seeks to be a general forum for advanced communication scholarship, it is especially interested in research whose significance crosses disciplinary and sub-field boundaries.

Editors-in-Chief
Giuseppe Riva, Applied Technology for Neuro-Psychology Lab., Istituto Auxologico Italiano, Milan, Italy
Fabrizio Davide, TELECOM ITALIA Learning Services S.p.A., Rome, Italy

Editorial Board
Luigi Anolli, University of Milan-Bicocca, Milan, Italy
Cristina Botella, Universitat Jaume I, Castellon, Spain
Martin Holmberg, Linköping University, Linköping, Sweden
Ingemar Lundström, Linköping University, Linköping, Sweden
Salvatore Nicosia, University of Tor Vergata, Rome, Italy
Brenda K. Wiederhold, Interactive Media Institute, San Diego, CA, USA
Luciano Gamberini, State University of Padua, Padua, Italy

Volume 10
Previously published in this series:
Vol. 9. G. Riva, M.T. Anguera, B.K. Wiederhold and F. Mantovani (Eds.), From Communication to Presence
Vol. 8. R. Baldoni, G. Cortese, F. Davide and A. Melpignano (Eds.), Global Data Management
Vol. 7. L. Anolli, S. Duncan Jr., M.S. Magnusson and G. Riva (Eds.), The Hidden Structure of Interaction
Vol. 6. G. Riva, F. Vatalaro, F. Davide and M. Alcañiz (Eds.), Ambient Intelligence
Vol. 5. G. Riva, F. Davide and W.A. IJsselsteijn (Eds.), Being There
Vol. 4. V. Milutinović and F. Patricelli (Eds.), E-Business and E-Challenges
Vol. 3. L. Anolli, R. Ciceri and G. Riva (Eds.), Say Not to Say: New Perspectives on Miscommunication

ISSN 1566-7677

Enacting Intersubjectivity
A Cognitive and Social Perspective on the Study of Interactions

Edited by

Francesca Morganti
University of Lugano, Lugano, Switzerland Istituto Auxologico Italiano, Milano, Italy

Antonella Carassa
University of Lugano, Lugano, Switzerland


Giuseppe Riva
Catholic University of Milan, Milano, Italy Istituto Auxologico Italiano, Milano, Italy

Amsterdam Berlin Oxford Tokyo Washington, DC

© 2008 The authors. All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without prior written permission from the publisher.

ISBN 978-1-58603-850-2
Library of Congress Control Number: 2008924571

Publisher
IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
Netherlands
fax: +31 20 687 0019
e-mail:

Distributor in the UK and Ireland
Gazelle Books Services Ltd.
White Cross Mills
Hightown
Lancaster LA1 4XS
United Kingdom
fax: +44 1524 63232
e-mail:

Distributor in the USA and Canada
IOS Press, Inc.
4502 Rachael Manor Drive
Fairfax, VA 22032
USA
fax: +1 703 323 3668
e-mail:

LEGAL NOTICE
The publisher is not responsible for the use which might be made of the following information.

PRINTED IN THE NETHERLANDS

Enacting Intersubjectivity
F. Morganti et al. (Eds.)
IOS Press, 2008
© 2008 The authors. All rights reserved.

Intersubjectivity is a central theoretical construct intersecting various disciplines. As a research field it is therefore characterised by being the meeting point of areas and methodologies that may be very different from each other. In the history of science, when this sort of overlapping takes place, we witness the gradual emergence of ever more complex theoretical constructs, which can become the conceptual ground for building more general theories. On the other hand, the interest in the area of intersubjectivity arises from the growing awareness of the intrinsically relational nature of the human species, highlighted by numerous recent discoveries in various fields (e.g., neuropsychology, cognitive science, ethology), so much so that some scholars have coined for our species the term "ultra-social species". The landscape of disciplines involving theoretical or experimental research pertaining to the area of intersubjectivity is vast: neuropsychology and neurosciences, consciousness, emotions, embodiment, the relational mind, sharing other people's mind states, and more generally various areas of philosophy, ethology, general and evolutionary psychology, social and cultural psychology, clinical psychology and psychiatry, psychoanalysis. The intersubjectivity construct is therefore utilized on various levels, corresponding to the specific research areas. On the one hand, this points to the power and fecundity of the construct; on the other, it may be a source of problems (of communication, interpretation, explanation and comparison) when the meeting involves disciplines that may have, and frequently inevitably do have, different theoretical presuppositions and research methodologies. It then becomes truly important, as wonderfully exemplified by this volume, to build opportunities for comparison and debate, i.e. frontier territories where the exchange needed for the growth of shared multidisciplinary knowledge takes place.
Such sharing is essential in order to build more general and more structured theories, whose fallout may support the creation or improvement of practical applications or, more generally, increase our understanding of complex phenomena pertaining to our species. The issue of practical application is particularly dear to me. When a student asks the Clinical Psychology professor what impact intersubjectivity research has on clinical practice, the answer is that it deals with concepts that are fundamental for the understanding of the therapeutic relationship. They are essential for building that relational field in which the sharing of meaning leading to the therapeutic alliance develops, the field supporting the client's exploration of new ways of functioning and new ways of understanding him or herself in relationship with others. In some particular studies the concept of intersubjectivity has played a crucial role, e.g. the research on autism or on attachment. But the most relevant aspect for clinical practice is that, when the therapist is aware of her own embodied and intersubjective nature, she will read differently the client's narrative, the client's relationship with her and her own relationship with the client. This applies also to the ideas, constructs and relational modalities of the client. The therapeutic techniques, such as self-description and autobiography, will also have a different meaning and will be more oriented toward viable solutions. In fact, a good theoretical model improves the use of therapeutic techniques and supports the growth of both self-knowledge and knowledge of the other. This does not just concern clinicians, but obviously also teachers, trainers, and people operating in group or institutional contexts or in the media. We need only be reminded of the complexity of the teacher-student relationship, or of the role of an actor in film or theatre. Generally speaking, a better understanding of interactive mechanisms can benefit all those who are active in social contexts by providing new and useful conceptual tools. For this reason the present volume, with its high and clear scientific character, by also exploring the frontiers and perspectives of related disciplinary areas, can be of particular interest to a much larger public than the specialised readership.

Prof. Giorgio Rezzonico
Full Professor of Clinical Psychology
Director of the Postgraduate School in Psychiatry
Faculty of Medicine and Surgery
University of Milan-Bicocca
Milan, Italy


In recent years a new trend in socio-cognitive research has investigated the mental capacities that allow humans to relate to each other and to engage in social interactions. One of its main streams is the study of intersubjectivity, namely the mutual sharing of experiences, conceived of as a basic dimension of consciousness on which socialness is grounded. At the very heart of contemporary studies is an intense debate around some central questions that concern the nature and forms of human intersubjectivity, its development and its role in situated joint activities. Striving to achieve a unified theoretical framework, these studies are characterized by a strong interdisciplinary approach founded on philosophical accounts, conceptual analysis, neuroscientific results and experimental data offered by developmental and comparative psychology. The book aims to give a general overview of this relevant and innovative area of research by bringing together seventeen contributions by eminent scholars who address the more relevant issues in the field. The book is organized into four main Sections:

Section I. Bringing forward Intersubjectivity
Section II. Perspectives on Intersubjectivity
Section III. Forms of Intersubjectivity
Section IV. Enacting Intersubjectivity

Section I introduces the study of intersubjectivity, outlining the research areas which are involved and can contribute to delineating a multidisciplinary view of the study of interactions. Section II comprises the essays aimed at outlining and discussing different perspectives which can be considered in the study of intersubjectivity, with a focus on conceptual and methodological aspects. Space is given to open questions, concepts to be refined and lesser explored features of intersubjectivity. In Section III the authors focus their attention on aspects of the cognitive architecture, with the aim of understanding which socio-cognitive skills are at the heart of human interaction. Theoretical models and experimental studies concerning the central issues of the research area, such as perception and understanding of the actions of others, self-recognition and self-reflection, self-other distinction, imitation or participation in co-operative activities, are presented. Obviously, the inclusion of the contributions in one particular section rather than in another could not be made on the basis of clear-cut criteria: each chapter is grounded on an explicit theoretical framework and frequently incorporates wide-ranging reflections on the adopted perspective. Section IV responds to our wish to include studies showing how the human capacity to create an intersubjective space is enacted in ongoing interactions. Intersubjectivity is enacted, for example, when a number of individuals participate in a particular joint activity which requires a specific expertise: when they play a symphony, perform a choreographed dance or a piece of theatre. It is also enacted when a participant is co-operatively engaged in an experimental task. Following this line of thought we have ventured to include essays which refer to situations which are specific under other dimensions: in this case the problem can be to understand, for example, the kind of interpersonal relationship and emotional exchange, and the subjective experience, which can take place when people with particular diseases are immersed in everyday life events. Looking to future research, we hope that more situated studies of individuals-in-interaction will bridge the gap between two separate research traditions, that of cognitive science, mostly centered on individual capacities, and that of the social sciences, exclusively focused on interactional behaviour.

We will now give a brief guided tour through the contents of the chapters. The starting point of the book is the Chapter by Francesca Morganti, where she provides an account of the disciplines involved in research on intersubjectivity, showing how this object of study can be derived from the cross-fertilization among situated cognitive science, social cognition and cognitive neuroscience. The opening Chapter in Section II, by Corrado Sinigaglia, challenges the standard view that we understand the behaviors of others because we are able to read their mental states, such as intentions, beliefs and desires, and develops a motor approach to intentionality based on neuroscience results regarding mirror neurons. He shows how this approach may constitute the way to rethink the basis and the development of intentional understanding within a unitary, theoretically and neurophysiologically grounded framework. Starting from the question "What is a social interaction?"
the article by Hanne De Jaegher and Ezequiel Di Paolo presents an enactive approach aimed at integrating individual cognition and the interaction process in order to arrive at new and more parsimonious explanations of social understanding. Their concept of participatory sense-making, connecting coordination with meaning-generation, contributes to the enrichment of the dialogue between the individualistic approaches of cognitive science to social cognition and social science approaches, which are instead uniquely focussed on interactional behaviour. In Chapter 4, Jessica Lindblom and Tom Ziemke contrast traditional, disembodied information-processing approaches to intersubjectivity in socio-cognitive research with more recent embodied approaches. Different notions of embodiment and their role in cognition and social interaction are clarified, and a theoretical discussion on the function of the body in social interaction is conducted, integrating a broad range of theoretical perspectives and empirical evidence from the different disciplines involved in intersubjectivity research. The contribution by Timothy Racine, David Leavens, Noah Susswein and Tyler Wereha in Chapter 5 addresses some conceptual and methodological issues in the investigation of primate intersubjectivity, in particular primates' ability to point. They argue that the present debate about the human ability to point declaratively, and the absence of this ability in other great apes, rests on problematic ideas about the nature of meaning and mind. In their view, the conception of the mind as an inner entity that is logically distinct from activity, cultural surroundings, rearing history and so forth gives rise to conceptual and methodological problems that interfere with the interpretation of data and the construction of valid theories.


In Chapter 6 Maurizio Tirassa and Francesca Bosco outline a theory of agency and communication cast in a mentalistic and radically constructivistic framework and discuss the role that the capability to share plays in it. The final Chapter of Section II, by Giuseppe Riva, presents a conceptual framework that uses the concept of Presence (the feeling of being and acting in a world outside us) to link the enaction of our intentions with the understanding of other people's intentions. In Section III different basic aspects of intersubjectivity are investigated. In Chapter 8, Jordan Zlatev, Ingar Brinck and Mats Andrén offer a model of perceptual intersubjectivity (PI), the phenomenon of two or more subjects focusing their attention on the same external target. Support for this model is provided through an empirical study of adult-infant interaction in two species of great apes (chimpanzees and bonobos) and in human beings. Chapter 9, by Stein Bråten, regards the ability of infants and adults to imitatively re-enact what they have seen being done or to co-enact what the companion is doing. This intersubjective enactment is illustrated with reference to layers of intersubjective attunement in ontogeny and with a focus on infant learning by altercentric participation in what the model is doing in face-to-face situations, as if they were co-authors of the model's actions. With reference to the discovery of mirror neurons, the neurosocial support by an altercentric mirror system is indicated. In Chapter 10 Manos Tsakiris addresses the question of how the self can be distinguished from other people, described as self-other distinction. Taking into account and discussing perspectives and evidence from cognitive neuroscience studies, the author puts forward the hypothesis that the experience and representation of one's body may underpin the distinction between the self and other agents.
In Chapter 11 Wolfgang Prinz gives a cognitive science perspective on social mirroring, the notion that individuals come to perceive and understand themselves by understanding how their conduct is perceived, received and understood by others. Varieties of social mirroring, arising from different modes of mirroring and different modes of communication, are proposed by the author. For social mirroring to work, it is argued that two basic requirements must be fulfilled: a functional one (the operation of representational devices with mirror-like properties) and a social one (the discourse and practices for using and exploiting mirrors within social interaction). Chapter 12, by Moritz Daum, Norbert Zmyj and Gisa Aschersleben, addresses the controversially discussed question of how infants' abilities to perceive and understand goal-directed action are interrelated with their competence to perform the same behaviour. To address contradictory results in studies on the development of this interrelation, the chapter integrates various findings from recent studies investigating perception, production and imitation of goal-directed action and discusses them in the light of existing hypotheses and theories. In Chapter 13 Antonella Carassa, Marco Colombetti and Francesca Morganti contend that certain explanatory inadequacies of current models of intersubjectivity depend on failing to appreciate the fundamental role of normativity in collective intentionality. Basing their argument on Margaret Gilbert's theory of plural subjects, the authors try to show how the concept of joint commitment is a powerful tool for explaining certain specific features of human joint activities and discuss some lines along which a psychology of plural subjects can be developed.

How intersubjectivity is enacted in interaction is the focus of Section IV. Music performance is examined in Chapter 14 by Peter Keller, where the author explains how ensemble musicians coordinate their actions with remarkable precision. Three cognitive processes which enable individuals to realize shared goals when engaged in musical joint action are illustrated, and the way in which these processes interact to determine ensemble coordination is discussed. In Chapter 15 Wolfgang Prinz and Gertrude Rapinett consider how participants engaged in a specific task enact their abilities to represent partially occluded actions. To investigate whether simulation, that is the representation of the events during occlusion, merely carries on old processes or initiates new ones, an experimental paradigm is used that allows one to study the impact that features of unoccluded action segments have on the representation of occluded segments. The results suggest that action simulation is a creative process, creating novel, invisible actions rather than extrapolating visible actions. In Chapter 16 Jonathan Cole refers to his extensive studies on the role of the face in the constitution of self in relation to others and on the impact of this aspect on interpersonal relationships. His investigation is based on the idea of exploring the role of the face by taking into account what happens when something goes wrong, as in Moebius syndrome, focusing on the subjective experience of the person in interaction. In this sense, his work can be characterized in terms of a first person approach to the study of consciousness, with primary attention to first person data as expressed in narratives. Finally, Chapter 17 by Fran Hagstrom is aimed at theoretically exploring intersubjectivity when social development goes awry, in the case of autism.
The main point of the chapter is to complement the neurodevelopmental view of autism as a cognitive disorder with an investigation into the individual developmental paths of intersubjectivity in everyday life. The adopted socio-cultural framework and the analysis of case-study material allow the author to show how intersubjectivity may be experienced differently depending on everyday situations and to develop the idea that cultural tools and social others often function to support late-emerging intersubjectivity during adolescence.

The Editors gratefully acknowledge the assistance of a number of people and institutions without whose help this project could not have been carried out. We would like to thank the Istituto Auxologico Italiano, the University of Lugano and the Catholic University of Milan for their support. We acknowledge the European Community for the grant given to Giuseppe Riva and Francesca Morganti for the FP6 project PASION (IST-2005-027654). We are also grateful to Laura Carelli, who volunteered to help the Editors in the complex editorial process that was involved in the preparation of the current book. We hope that the contents of this book will stimulate further integrated research on intersubjectivity, allowing us to better understand the neurobiological foundations, cognitive architecture and social abilities which define human beings.

Francesca Morganti
Antonella Carassa
Giuseppe Riva


Mats ANDRÉN Centre for Languages and Literature Lund University, Sweden

Gisa ASCHERSLEBEN Department of Psychology Infant Cognition and Action Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany

Stein BRÅTEN Department of Sociology and Human Geography University of Oslo, Norway

Francesca M. BOSCO Center for Cognitive Science University of Torino, Italy

Ingar BRINCK Department of Philosophy Lund University, Sweden

Antonella CARASSA IPSC, Institute of Psychology and Sociology of Communication University of Lugano, Switzerland

Jonathan COLE Department of Clinical Neurophysiology Poole Hospital, Poole, United Kingdom

Marco COLOMBETTI Institute for Communication Technologies University of Lugano, Switzerland Department of Electronics and Informatics Polytechnic University of Milan, Italy


Moritz M. DAUM Department of Psychology Infant Cognition and Action Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany

Hanne DE JAEGHER Centre for Psychosocial Medicine, Department of Psychiatry University of Heidelberg, Germany CCNR, Centre for Computational Neuroscience and Robotics University of Sussex, United Kingdom

Ezequiel DI PAOLO CCNR, Centre for Computational Neuroscience and Robotics University of Sussex, United Kingdom

Fran HAGSTROM Department of Rehabilitation, Human Resources and Communication Disorders University of Arkansas, Fayetteville, Arkansas, USA

Peter KELLER Music Cognition & Action Group Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany

David A. LEAVENS Department of Psychology University of Sussex, Falmer, United Kingdom

Jessica LINDBLOM School of Humanities and Informatics University of Skövde, Sweden

Francesca MORGANTI IPSC, Institute of Psychology and Sociology of Communication University of Lugano, Switzerland ATN-P Lab, Applied Technology for Neuro-Psychology Lab Istituto Auxologico Italiano, Milano, Italy

Wolfgang PRINZ Department of Psychology Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany


Timothy P. RACINE Department of Psychology Simon Fraser University, Burnaby, British Columbia, Canada

Gertrude RAPINETT Department of General and Developmental Psychology University of Zurich, Switzerland

Giuseppe RIVA Department of Psychology Catholic University of Milan, Milano, Italy ATN-P Lab, Applied Technology for Neuro-Psychology Lab Istituto Auxologico Italiano, Milano, Italy

Corrado SINIGAGLIA Department of Philosophy University of Milan, Italy

Noah SUSSWEIN Department of Psychology Simon Fraser University, Burnaby, British Columbia, Canada

Maurizio TIRASSA Center for Cognitive Science University of Torino, Italy

Manos TSAKIRIS Department of Psychology Royal Holloway University of London, United Kingdom

Tyler J. WEREHA Department of Psychology Simon Fraser University, Burnaby, British Columbia, Canada

Tom ZIEMKE School of Humanities and Informatics University of Skövde, Sweden


Jordan ZLATEV Centre for Languages and Literature Lund University, Sweden

Norbert ZMYJ Department of Psychology Infant Cognition and Action Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany


Preface
    Giorgio Rezzonico
Introduction
    Francesca Morganti, Antonella Carassa and Giuseppe Riva
Contributors

Section I. Bringing Forward Intersubjectivity
Chapter 1. What Intersubjectivity Affords: Paving the Way for a Dialogue Between Cognitive Science, Social Cognition and Neuroscience
    Francesca Morganti

Section II. Perspectives on Intersubjectivity
Chapter 2. Enactive Understanding and Motor Intentionality
    Corrado Sinigaglia
Chapter 3. Making Sense in Participation: An Enactive Approach to Social Cognition
    Hanne De Jaegher and Ezequiel Di Paolo
Chapter 4. Interacting Socially Through Embodied Action
    Jessica Lindblom and Tom Ziemke
Chapter 5. Conceptual and Methodological Issues in the Investigation of Primate Intersubjectivity
    Timothy P. Racine, David A. Leavens, Noah Susswein and Tyler J. Wereha
Chapter 6. On the Nature and Role of Intersubjectivity in Human Communication
    Maurizio Tirassa and Francesca Marina Bosco
Chapter 7. Enacting Interactivity: The Role of Presence
    Giuseppe Riva

Section III. Forms of Intersubjectivity
Chapter 8. Stages in the Development of Perceptual Intersubjectivity
    Jordan Zlatev, Ingar Brinck and Mats Andrén
Chapter 9. Intersubjective Enactment by Virtue of Altercentric Participation Supported by a Mirror System in Infant and Adult
    Stein Bråten
Chapter 10. The Self-Other Distinction: Insights from Self-Recognition Experiments
    Manos Tsakiris
Chapter 11. Mirror Games
    Wolfgang Prinz
Chapter 12. Early Ontogeny of Action Perception and Control
    Moritz M. Daum, Norbert Zmyj and Gisa Aschersleben
Chapter 13. The Role of Joint Commitment in Intersubjectivity
    Antonella Carassa, Marco Colombetti and Francesca Morganti

Section IV. Enacting Intersubjectivity
Chapter 14. Joint Action in Music Performance
    Peter E. Keller
Chapter 15. Filling the Gap: Dynamic Representation of Occluded Action
    Wolfgang Prinz and Gertrude Rapinett
Chapter 16. The Role of the Face in Intersubjectivity, Emotional Communication and Emotional Experience; Lessons from Moebius Syndrome
    Jonathan Cole
Chapter 17. Autism During Adolescence: Rethinking the Development of Intersubjectivity
    Fran Hagstrom

Author Index





I can't find my way around this table
And I can't find my way around your face
And I can't find my way around your body
And wasted days turn into wasted nights
Hey lookit me now! Hey lookit me now!
Hey lookit me now now now now now now!
Lookit me now!

David Byrne - Cowboy Mambo, 1992



What Intersubjectivity Affords: Paving the Way for a Dialogue between Cognitive Science, Social Cognition and Neuroscience
Francesca MORGANTI
Abstract. The past decade has witnessed a growing interest in the study of the self-other relation; as a result, there has been a convergence of theoretical thought and research in the cognitive sciences, social cognition, and the neurosciences. At the moment, probably under the impact of recent mirror neuron findings, one notices a gradual but significant coming together of disciplines whose research traditions used to be grounded in areas often far apart from each other. In particular, it may legitimately be claimed that, albeit from different perspectives, the study of intersubjectivity has laid the foundations for a constructive dialogue between these disciplines, generating a common ground for the study of interpersonal relations. The present contribution aims to show that, if we take this stance, some concepts close to the situated cognitive sciences, such as embodied cognition and enaction, become neurobiologically plausible in research on mirror neurons, and manage to shed new light on what social cognition has known for some time about the relation between human beings.

Contents
1.1 Introduction
1.2 Towards an enactive cognitive science
1.3 Towards a cognitive perspective on the study of social interactions
1.4 Towards a neuroscientific endorsement
1.5 Conclusions
1.6 Acknowledgements
1.7 References

F. Morganti / What Intersubjectivity Affords

1.1 Introduction

Blossoming interest in the study of intersubjectivity has opened up new perspectives for cognitive and social research, by providing plausible and long-awaited biological foundations to the study of the human relationship. The picture that unfolded before the eyes of a scholar of the cognitive and/or social sciences until only a few years ago looked rather different from today's. On the one hand, the study of the human mind had focused quite closely on the study of consciousness, i.e. the peculiar ability of human beings to live in this world, and their ability to act intentionally in the specific context that surrounds them [1]. On the other hand, equally authoritative disciplines stemming from the social sciences and psychology concerned with interpersonal relations [2] have highlighted the relationship dynamics that human beings are capable of when confronting their fellow beings, while disregarding the study of the cognitive functions that support these abilities. At the same time, perhaps because of their kinship with medical-clinical fields rather than with the humanities, the neurosciences have continued to conduct research by themselves on the neurobiological foundations of the cognitive nature of human beings. Indeed, their research on the functioning of neurons has sought principally to confirm or revisit the cognitive theories of mind functioning and, more marginally, the social theories [3]. It is only in the past ten years that new discoveries have been made that substantially alter our perception: the emergence of social cognitive neuroscience as well as mirror neuron studies have laid sound foundations for a fruitful convergence of the cognitive sciences, social cognition, and the neurosciences.
The celebrated discovery of mirror neurons, on the one hand, invested the self-other relation with a new wave of optimism by lending a much longed-for biological plausibility to the studies on the simulation of action and identification with the other. On the other hand, it has taken advantage of these studies to gain a fuller understanding of the cognitive significance of its findings [4, 5]. At the same time, these findings shed new light on research projects that had long been focussing on the study of intersubjectivity [6-8]. Today, the study of intersubjectivity acts as a test-bed for the convergence of research interests enhanced by the triangulation between cognitive sciences, social sciences, and the neurosciences. The contributions made by these disciplines to the study of the relation are complementary, and carry with them significant research findings that the same disciplines have so far been unable to explain in full, as they draw exclusively on resources available within their research areas. Intersubjectivity provides them with a primary forum for exchange, enabling them to extricate findings that have until now suffered from an excessively clear-cut modularity of research. The following paragraphs will be devoted to showing how these disciplines have been preparing the ground for convergence over the years and now, in their latest research phases, have laid the foundations for a new form of dialogue. In paragraph 2, we will look at the way a computational cognitive science has given way to the situated perspective of cognition, which contemplates its embodied and enactive nature. In paragraph 3 we will examine the shift from the merely representational study of self-other understanding to an approach that is less related to mindreading in understanding the other's intentions. Finally, in

F. Morganti / What Intersubjectivity Affords

paragraph 4 we will try to illustrate two facts: on the one hand, how the neurosciences have become less behavioural and much more cognitive over the years; on the other, how the rise of social cognitive neuroscience and of mirror neuron studies has sown the seeds for a cross-fertilisation of disciplines.
1.2 Towards an enactive cognitive science
It was only at the end of the last century that the scientific community began to greet with more warmth the study of human cognition as closely connected to action, and to interaction with the context in which agents are immersed. From a strictly computational paradigm, widespread among the cognitive sciences [9,10], we have more recently moved on to a conception of human cognition as associated with a potential for significant action and interaction with the world [11-16]. Classical cognitive sciences have long promoted a modular view of cognition upholding the dichotomy between perception and action. Then, new approaches were introduced that were more closely context-related, and at that point the dichotomy was (albeit in part) set aside to make space for a more accurate study of the mind-world interaction. In contrast with the classical view of the cognitive sciences, within a situated view of the mind, cognition is not the result of aggregation and organisation of noteworthy information from the outside world; it is the product of perception-action cycles where mind and world are constantly at play. This shift has created a different vantage point for the study of cognition, where interaction does more than just point to the single action in the world (or the sequence of more complex actions the user can perform through it); it points to the dynamic building up of meaning that human beings tend to carry out while acting in a surrounding context. But what does it mean, indeed, to create a dynamic, meaningful relationship between mind and environment?
As early as 1977 Gibson [13] stressed the need for an often disregarded theoretical shift: the objects of the world become, in each case, affordances, and as such represent species-specific opportunities for the agent that happens to use them. The context in which human beings happen to act, therefore, is no longer something objective that they perceive and process, creating fixed images. Rather, the way that human beings represent an environment is in each case a function of the activities that they are performing or are about to perform within it. Starting from theoretical standpoints identifiable with a view of cognition as inseparable from its field of action, a view of experience and cognition gradually takes shape that is closely linked not only to physical action, but above all to corporeity as a cognitive medium. A good description of this perspective may be found in studies on embodied cognition. Bateson [17] described the human mind as ecological, i.e. able to fit in with, and adapt well to, its surroundings through a continuous evolutionary process. The human mind, he claimed, is capable of creating a progressive integration between the physical features of human beings and artifacts, essentially on the basis of a continuous process of cultural mediation. The human mind, indeed, is not disembodied, but closely tied to the body in which it dwells, from which it continues to acquire information on the world. In short, we may call it embodied.

This is why thought needs the body's mediation to arise, and it is precisely to the body that it adapts. As Lakoff and Johnson [18] pointed out, the body is, on the one hand, the frame of reference in which all our experiences take place; on the other, the body becomes, through our senses, the main link between the mind and the world. Until the twentieth century, the definition of embodiment met with limited fields of application; it nonetheless opened up the field for extensive debate in philosophy as well as in the cognitive sciences. Among twentieth-century philosophers, Heidegger was the first to refer explicitly to the importance of the body for human thought [19]. It was Heidegger, in fact, who developed a phenomenology in which human activity may be understood not as the result of representations of the world disconnected from their context, but rather through the contextualised experience of a body-environment system. Merleau-Ponty [20] provides us with a further example of phenomenology of the mind in which the role of embodiment is granted considerable weight. Merleau-Ponty maintains that the way in which human beings see physical objects is entirely conditioned by the opportunities for interaction which the object itself offers to our body. Let us observe, in this respect, that this philosophical belief has had a significant impact on the theory of perception later developed by Gibson [13]. There, in fact, the world is not perceived in an undifferentiated manner, but supplies living beings with opportunities for action, in other words, species-specific affordances. Merleau-Ponty himself, then, stretches his view of embodied cognition to the extreme, claiming that the body is the medium with which human beings can encompass the world in its totality. It is precisely and only through the activity that human beings carry out in the world that they are able to determine what experience of this same world means.
Thus the body becomes an interface between the mind and the world, not so much as a collector of stimuli, but rather as a stage for the enactment of a drama, an interface allowing a merger between thought and the specific surrounding space. Human beings, indeed, constantly interact with the context in which they live, preserving in such situations an uninterrupted thread of activities which they carry out entirely by themselves. The continuity of their actions helps us establish a match between individuals (or, better, their intentions, planning of complex actions, and execution of movements) and the context in which they happen to be each time. Actors and world thus end up being inseparably connected and reciprocally adaptable. This embodied and situated view of cognition, which is, as we shall see, fundamental to the study of intersubjectivity, may be associated with the definition of the concept of enaction. The concept of enaction was introduced into cognitive science in 1991 by Varela, Thompson & Rosch [21], to explain how mental life relates to bodily activity in the form of embodied action. In their book, The Embodied Mind, in fact, these authors suggested a sensorimotor coupling between organisms and the environment in which they live that determines recurrent patterns of perception and action leading to the acquisition of knowledge. Enactive cognition unfolds through action and is constructed on motor skills, such as manipulating objects or practising a specific activity. It is not simply multisensory mediated knowledge, but knowledge stored in the form of motor responses and acquired by the act of doing. According to the enactive approach, the human mind is embodied in our organism, is not reducible to structures inside the head, but is embedded in the world with which we interact [22]. In rejecting the Cartesian mind-body dichotomy (in which there is a mental and a physical way to acquire knowledge, namely theoretical and procedural learning), the world becomes inseparable from the subject, and humans' primary way of relating to things is neither purely cognitive nor sensory, but rather bodily and skillful. Maturana and Varela define the living being as an autopoietic machine, that is, a system whose primary function is the creation and preservation of a unity of its own which singles it out from the environment that it inhabits [23]. Enactive knowledge is more natural than other forms of knowledge acquisition, because it is gained through perception-action interaction in the environment. Moreover, enactive knowledge is inherently multimodal because it requires the coordination of the various senses. The development of such an approach requires a common vision between situated and embodied cognition. Recently this situated and enactive perspective has been extended to the social sciences, inasmuch as significant interaction, far from taking place merely with the world, does so with a physically and culturally more complex context, namely that of social relations.
1.3 Towards a cognitive perspective on the study of social interactions
One of the most exciting attempts to move the cognitive sciences towards the study of social relations is undoubtedly represented by social cognition [24-26]. By definition, social cognition aims to build a bridge between the cognitive and social sciences. To do so, the social cognition approach needs the contribution not only of social psychology but also of evolutionary studies [27, 28] and animal cognition research [29].
In particular, social cognition research helps us understand both individual cognition and collective activity by integrating the cognitive modelling approach (according to which beliefs are formed by and drive behaviour) with social studies (according to which behaviour is determined by relationships and informal practices). Within this area it becomes possible to extend the study of consciousness and human activity to interaction with other minds, in an attempt to understand how an intersubjective space of activities can be constructed. Starting from a more cognitive approach, several studies of how one person understands, and interrelates with, others have been conducted under the Theory of Mind heading. Among them, Frith and Happé [30] suggest that mind-reading appears to be a prerequisite for normal social interaction: in everyday life we make sense of each other's behaviour by appeal to a belief-desire psychology. Discussions of theory of mind are dominated by two main approaches: theory theory and simulation theory. Theory theory claims that the understanding of other people's minds is based on an innately specified, domain-specific mechanism designed for reading other minds [31-33]. Common to different versions of theory theory is the idea that humans attain their understanding of other minds by implicitly postulating the existence of mental states in others and using such postulations to explain and predict another person's behaviour [34].

Simulation theory argues that one does not theorize about the other person but uses one's own mental experience as an internal model for the other's mind [35, 37]. To understand the other person, one simulates the thoughts or feelings that one would experience if one were in the situation of the other. Some theorists [31, 36] claim that theory of mind is our primary and pervasive means of understanding other persons. Both theory theory and simulation theory conceive of communicative interaction between two people as a process that takes place in a set of internal mental operations that end up being expressed (externalized) in speech, gesture, or action. Addressing this feature, it has recently been suggested that social interaction may influence the development of children's mentalistic understanding, with the finding that competence in false-belief understanding is correlated with aspects of children's socialization history [38]. At the same time, a different approach to social interactions has been developed, showing that the primary and usual attitude of human beings in the world is grounded in interaction, rather than in mentalistic or conceptual prediction. Specifically, the interactive nature of the human mind is characterized by meaningful involvement (e.g. environmental and contextual factors) and movement possibilities (e.g. action planning) that allow humans to understand and share social situations. From this perspective, human encounters with others are not normally occasions for explaining or predicting the behaviour of others on the basis of postulated mental states; instead, in most intersubjective situations, agents have a direct understanding of another person's intentions because those intentions are explicitly expressed in the other's embodied actions and mirrored in the agents' own capabilities for action [39].
Incidentally, in defining primary intersubjectivity, Trevarthen [40] had already gathered revolutionary scientific evidence supporting the claim that the basis for child interaction is laid by certain embodied practices that allow children to perceive gestures and sound cues, enabling a not necessarily conceptual perception of the other person's intentional act. In adults, embodied practices constitute the primary access for understanding others, and continue to do so to a large extent even after humans have achieved theory of mind abilities, supporting the creation of social relationships and knowledge sharing. Consistent with this vision, in The Cultural Origins of Human Cognition [41], Tomasello suggests that social cognition in humans emerged through specialized cultural and biological adaptations. Specifically, he states that in addition to an ontogenetic development (which provides human children with the acquisition of perspective-based cognitive representations in the form of linguistic symbols) there was also a phylogenetic evolution in humans (which provides them with the ability to identify with conspecifics) and, finally, a historical sociogenesis (which encourages new forms of cultural learning in humans). According to all three approaches listed above, social interactions are the result of a cooperative process in which all agents involved actively engage and coordinate with each other, sharing a common background (including an amount of knowledge about each agent's mental states, reciprocal expectations, and other types of social and cultural cognition) in order to perform and understand social actions. This is where neuroscience research joins in, demonstrating how embodied interaction contributes to the self-organising development of the neuronal structure responsible not only for motor action, but for the way we become aware of ourselves, communicate with others, and intersubjectively live in a meaningful world.
1.4 Towards a neuroscientific endorsement
The concern of the neurosciences for social behaviour goes back a long way. Essentially, it may be traced back to studies on the role played by frontal functions in the modulation of emotions, in the famous clinical case of Phineas Gage (for more on this case, see [16]). In their attempt to understand how frontal lobe lesions may cause substantial alterations of emotional and social behaviour, the neurosciences find fresh stimulus in the study of the anatomical locations of these expressions of behaviour. Despite constant keen interest and notable research achievements, genuine cross-fertilization never took place between the neurosciences and social cognition until the birth of Social Cognitive Neuroscience (SCN) [42-45]. SCN, in fact, investigates social processes by means of the methodologies and instruments proper to research in the cognitive neurosciences. According to Lieberman [43], SCN is the study of brain functions that allow people to experience the social world effectively by understanding themselves and others. Within SCN, the social approach includes the study of the experiences and behaviour of a person as she perceives and interacts with a social target, while the cognitive one includes the understanding of the psychological processes that give rise to the experience or behaviour of interest. These approaches are closely linked with a neural level of analysis that includes a description of the neural systems involved in the psychological processes on which a broader social behaviour is based [42]. The tools used to study these topics, in fact, generally include functional neuroimaging tools, such as fMRI and PET, which provide a wealth of information on the functioning brain of living humans. A question arises here: does SCN represent a truly new approach to the study of the relation?
Before the emergence of SCN, research on the biological correlates of social processes was already afoot. Indeed, some authors believe that "the brain does not exist in isolation but rather is a fundamental component of developing and aging individuals who themselves are mere actors in the larger theater of life" [46, p. 1019]. This standpoint was reinforced by the introduction of the techniques of functional neuroimaging, which led to the discovery of the crucial role the amygdala plays in social cognition [47, 48]. It is widely held that this part of the limbic system is involved in emotional and motivational stimulus evaluation; it is also related to the human capacity for a social interpretation of behaviour, so much so that any damage it suffers may lessen the subject's ability to understand his relation to others and to use such understanding to modulate his social behaviour [47]. The considerable innovation that the neurosciences have injected into the study of the self-other relation is mostly the result of research on sensory-motor coupling in understanding intentions [49] and of mirror neuron findings in humans [50,51]. Recently, in fact, there has been growing consensus that mirror neurons, and the related brain areas that are activated for self-movement and for perception of another person's movements, play an important part in imitation and in the human being's ability to perceive intentions [52-54]. These findings support the idea that to imitate a gesture, for example a facial gesture that she sees, an agent has no need to simulate the gesture internally. Rather, her body is already in communication with the other's body at pre-conscious and perceptual levels that are sufficient for subjective engagement in interaction [54]. The evolution of research on mirror neurons, one of the leading stars in the neuroscientific arena today, shows the existence of mirroring neural clusters which, besides contributing to the recognition and modulation of action, represent a plausible neural basis for embodied intentional interaction. Along similar lines, Gallese's shared manifold hypothesis suggests that the mirror system has a general role in enabling empathy [51, 56]. Accordingly, intersubjective identifications among humans are possible through intentional embodied attunement, while such primitive intersubjectivity remains an essential aspect of adult empathy and social behaviour. A heated and broad debate is in progress on the role of mirror neurons and on the implications that these findings may have for the study of the understanding of oneself and the other. Something is already clear, and has met with widespread consensus: the mirror systems constitute the neural basis for a primitive intersubjective information space, which is both phylogenetically and ontogenetically prior to the explicit conceptualization of others' intentions. Thus, by nature it does not necessarily require the intervention of theory of mind in understanding others. From neuroscience findings, in fact, intersubjectivity appears to be a pre-reflexive functional mechanism that is not necessarily the result of an explicit and conscious cognitive effort. This statement could constitute a revolutionary, ground-breaking result for the study of interactions.
1.5 Conclusions
From our brief, and necessarily incomplete, outline of theoretical trends in cognitive science, social cognition and neuroscience, one may gather that these areas tend to converge towards a common ground of understanding for the study of intersubjectivity (Figure 1). The study of intersubjectivity calls for a fruitful triangulation of these disciplines, leading to mutual enrichment and enabling each discipline to help the others towards a better understanding of the relational skills of human beings. At this stage, we are looking at a possible, though somewhat difficult, communication between these areas, a communication which augurs well not only for mutual enhancement but also for a more holistic understanding of the human relation. From this point of view, the enactive approach put forward by the cognitive sciences broadens the scope for an appreciation of the importance of action intentionality in our experience of the world. In turn, this has been one of the fundamental research topics on mirror neurons in neuroscience. The neurosciences, for their part, teach enactive cognition that awareness of the world through biologically determined action is definitely possible at the neural level. At the same time, the motor-perceptive definition of mirror neurons has highlighted the fact that the other's action may be understood through embodied imitation, thus confirming biologically what social cognition had already largely speculated on by gradually putting aside mentalistic theories (e.g. mindreading and Theory of Mind) and introducing an imitative meaning to one's awareness of the other's understanding. The convergence between the cognitive and social perspectives has turned out to be equally fruitful, by establishing that making sense of cognitive architecture is not worth insisting on unless its social nature is duly taken into account. The human mind, in fact, relates not only to the world but also to other minds which it recognises as equivalent to itself in their intentional nature and with which it can set up a relation. Conversely, while researching intersubjectivity the social sciences may continue to address the relation alone without considering the way in which such relations occur not as a process of exchanges of actions and information, but as the result of an intentional mental activity between individuals.

Figure 1. The convergence of cognitive science, social cognition and neuroscience towards a common ground of understanding for the study of intersubjectivity

For a thorough understanding of the intersubjective nature of human cognition we cannot ignore the convergence of these disciplines; indeed, the former is defined by the latter, though differently and complementarily, as the capacity to reach an attunement of biologically determined intentional actions enacted in an actor-tailored and meaningful situation. Although we are satisfied that this is no longer uncharted territory, much more ground remains to be covered before intersubjectivity is fully understood and defined. Many research questions are still unanswered and point the way for potential approaches to applied research on this topic. Not surprisingly, observing current research in progress, we notice at once that some of it focuses on:
(i) How this capacity develops in human beings
(ii) Whether this ability is typical of human beings or shared with other species
(iii) What cognitive architecture underpins this capacity
(iv) How it allows us to see ourselves as belonging to the world and vis-à-vis our fellow beings
(v) What this capacity is able to support in human cognition
(vi) What happens when something fails in intersubjectivity
Each of these research lines helps us understand the intersubjective nature of human beings and confirms once more the need for cross-fertilization between the neurobiological, cognitive and social outlooks. Cross-fertilization calls for a lowering of the barriers separating disciplines; it prompts us to pay more and more attention to research conducted in other domains in order to discover in them an opportunity for understanding the non-understood or misunderstood, and new suggestions for continuing research. Even though, as we have observed, this convergence is based on excellent premises, we are not out of the woods yet. On the one hand, this is quite typical of multidisciplinary research, where all disciplines by nature co-evolve and build up step by step, on the basis of successive findings. On the other hand, we still have to clarify this need for multidisciplinarity so that the results achieved by one discipline may be consistent with what is currently known to the other disciplines on the human relation. This does not necessarily mean that one has to be influenced by the sister disciplines in one's interpretation of the results achieved; it does mean that one should try to find an interpretation that is congruent with the other disciplines when one's own results seem to be in line with theirs, and that one should apply a more rigorous critical spirit in expressing conclusions where a research result does not fit in well with what is already known to the other disciplines. Finally, as is the case with any interdisciplinary approach, the study of intersubjectivity, too, must do its best to keep clear of errors and misunderstandings.
First and foremost, there is a dangerous persistence of miscommunication among disciplines, which instinctively use terminologies that are often too different from each other. This may give rise to unpleasant ambiguities and/or even more unpleasant overlapping of terms with different meanings (or denotations) from one discipline to another. Not least, one should avoid falling into the trap of assuming a predominant standpoint with regard to intersubjectivity, which is unfortunately often the case with the neurosciences. If in our research on intersubjectivity we manage to consider the poles of this ideal triangulation as equidistant from an understanding of our phenomenon, we will be able to view it with the appropriate degree of interdisciplinarity, and make sure that our conclusions will not be subject or subservient to evidence provided by the other disciplines. Only in this way do we stand a chance of better understanding what happens when we are intersubjectively related to each other, not merely from a neural, but also from a functional and qualitative point of view.
1.6 Acknowledgements
Theoretical reflections tend to grow out of sound, heated debate; and any discussion involves at least two interlocutors. As always, I am indebted to Antonella Carassa for inspiring me with the necessary degree of passion and anger to clarify thoughts and ideas. For trusting and encouraging me to undertake this task, I particularly wish to thank, once more, Giuseppe Riva. The project has benefited from the financial support of an FP6-UE grant for the PASION project (IST2005-027654), without which this work would not have been possible.
1.7 References
[1] J. R. Searle, The Rediscovery of the Mind. Cambridge, MA: MIT Press, 1992.
[2] Y. Engestrom & D. Middleton, Cognition and communication at work. Cambridge: Cambridge University Press, 1996.
[3] M. S. Gazzaniga, The Cognitive Neurosciences. Cambridge, MA: MIT Press, 2004.
[4] S. Hurley & N. Chater (Eds.), Perspectives on Imitation: From Neuroscience to Social Science, Vol. 1: Mechanisms of imitation and imitation in animals. Cambridge, MA: MIT Press, 2005.
[5] S. Hurley & N. Chater (Eds.), Perspectives on Imitation: From Neuroscience to Social Science, Vol. 2: Imitation, Human Development, and Culture (Social Neuroscience). Cambridge, MA: MIT Press, 2005.
[6] C. Trevarthen, The Foundations of Intersubjectivity: Development of Interpersonal and Cooperative Understanding in Infants. In D. Olson (Ed.), The Social Foundations of Language and Thought. New York: W.W. Norton & Co., 1980.
[7] S. Bråten, Intersubjective Communication and Emotion in Early Ontogeny. Cambridge: Cambridge University Press, 2006.
[8] E. Thompson, Between Ourselves: Second Person Issues in the Study of Consciousness. Imprint Academic, 2001.
[9] Z. W. Pylyshyn, Computing in Cognitive Science. In M. I. Posner (Ed.), Foundations of Cognitive Science. Cambridge, MA: MIT Press, 1990.
[10] J. Fodor, The Modularity of Mind. MIT Press, 1982.
[11] W. J. Clancey, Situated cognition. On human knowledge and computer representations. Cambridge: Cambridge University Press, 1997.
[12] A. M. Glenberg, Mental models, space, and embodied cognition. In T. B. Ward, S. M. Smith & J. Vaid (Eds.), Creative thought: An investigation of conceptual structures and processes, (pp. 495-522). Washington, DC: American Psychological Association, 1997.
[13] J. J. Gibson, The theory of affordances. In R. E. Shaw & J. Bransford (Eds.), Perceiving, acting, and knowing, (pp. 67-82). Hillsdale, NJ: Erlbaum, 1977.
[14] J. Searle, Rationality in Action. MIT Press, 2001.
[15] G. M. Edelman, Bright Air, Brilliant Fire. On Matters of the Mind. New York: Basic Books, 1992.
[16] A. R. Damasio, Descartes' Error. Emotion, Reason and the Human Brain. New York: G.P. Putnam's Sons, 1994.
[17] G. Bateson, Steps to an Ecology of Mind: Collected Essays in Anthropology, Psychiatry, Evolution, and Epistemology. University of Chicago Press, 1972.
[18] G. Lakoff & M. Johnson, Metaphors we live by. Chicago, IL: University of Chicago Press, 1980.
[19] M. Heidegger, Being and Time. Albany: State University of New York Press, 1953.
[20] M. Merleau-Ponty, Phenomenology of perception. London: Routledge Press, 1962.
[21] F. Varela, E. Thompson & E. Rosch, The embodied mind. Cognitive science and human experience. Cambridge, MA: MIT Press, 1991.
[22] E. Thompson & F. J. Varela, Radical embodiment: Neural dynamics and consciousness. Trends in Cognitive Sciences, 5(10), 418-425, 2001.
[23] H. Maturana & F. Varela, Autopoiesis and Cognition: the Realization of the Living. In R. S. Cohen & M. W. Wartofsky (Eds.), Boston Studies in the Philosophy of Science, 42. Dordrecht: D. Reidel Publishing, 1980.
[24] L. S. Vygotsky, Mind in Society: The development of higher psychological processes. Cambridge, MA: Harvard University Press, 1978.
[25] A. Bandura, Social foundations of thought and action: A social-cognitive theory. Upper Saddle River, NJ: Prentice-Hall, 1986.
[26] R. Sternberg, The triadic mind: A new theory of intelligence. NY: Viking Press, 1988.
[27] A. N. Meltzoff, Like me: a foundation for social cognition. Developmental Science, 10(1), 126-134, 2007.


[28] C. Trevarthen, Communication and cooperation in early infancy: A description of primary intersubjectivity. In M. Bullowa (Ed.), Before Speech: The Beginning of Human Communication, (pp. 321-46). London, UK: Cambridge University Press, 1979.
[29] C. Ash, G. Chin, E. Pennisi & A. Sugden, Living in societies. Science, 317, no. 5843, 1337, 7 September 2007.
[30] U. Frith & F. Happé, Theory of Mind and Self-Consciousness: What Is It Like to Be Autistic? Mind & Language, 14(1), 82-89, 1999.
[31] S. Baron-Cohen, Mindblindness. An essay on autism and theory of mind. Cambridge, MA: MIT Press, 1995.
[32] A. M. Leslie, Pretence and representation: The origins of 'theory of mind'. Psychological Review, 94, 412-26, 1987.
[33] A. Gopnik & A. N. Meltzoff, Words, thoughts, and theories. Cambridge, MA: Bradford, MIT Press, 1997.
[34] S. Baron-Cohen, The autistic child's theory of mind: the case of specific developmental delay. Journal of Child Psychology and Psychiatry, 30, 285-98, 1989.
[35] A. Goldman, In Defense of the Simulation Theory. Mind & Language, 7, 104-119, 1992.
[36] R. Gordon, Folk Psychology as Simulation. Mind & Language, 1, 158-171, 1986.
[37] J. Tooby & L. Cosmides, Mapping the evolved functional organization of mind and brain. In M. Gazzaniga (Ed.), The cognitive neurosciences. Cambridge, MA: MIT Press, 1995.
[38] J. I. M. Carpendale & C. Lewis, How Children Develop Social Understanding. Oxford: Blackwell, 2006.
[39] S. Gallagher, How the Body Shapes the Mind. Oxford: Oxford University Press, 2005.
[40] C. Trevarthen, Communication and Cooperation in Early Infancy: A Description of Primary Intersubjectivity. In M. Bullowa (Ed.), Before Speech: The Beginning of Interpersonal Communication. Cambridge: Cambridge University Press, 1979.
[41] M. Tomasello, The Cultural Origins of Human Cognition. Cambridge, MA: Harvard University Press, 1999.
[42] K. N. Ochsner & M. D. Lieberman, The emergence of social cognitive neuroscience. American Psychologist, 56, 717-734, 2001.
[43] M. D. Lieberman, Social cognitive neuroscience: A review of core processes. Annual Review of Psychology, 58, 259-89, 2007.
[44] K. N. Ochsner, Social Cognitive Neuroscience: Historical Development, Core Principles, and Future Promise. To appear in: A. Kruglanski & E. T. Higgins (Eds.), Social Psychology: A Handbook of Basic Principles, (pp. 39-66). 2nd Ed. New York: Guilford Press, 2007.
[45] J. Decety, A social cognitive neuroscience model of human empathy. In E. Harmon-Jones & P. Winkielman (Eds.), Social Neuroscience: Integrating Biological and Psychological Explanations of Social Behavior, (pp. 246-270). New York: Guilford Publications, 2007.
[46] J. T. Cacioppo & G. G. Berntson, Social Psychological Contributions to the Decade of the Brain. American Psychologist, 47, 1019-1028, 1992.
[47] R. Adolphs, The neurobiology of social cognition. Current Opinion in Neurobiology, 11, 231-239, 2001.
[48] R. Adolphs, Investigating the Cognitive Neuroscience of Social Behavior. Neuropsychologia, 41, 119-126, 2003.
[49] G. Rizzolatti, L. Fadiga, V. Gallese & L. Fogassi, Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3, 131-141, 1996.
[50] G. Rizzolatti & L. Craighero, The mirror neuron system. Annual Review of Neuroscience, 27, 169-192, 2004.
[51] V. Gallese, C. Keysers & G. Rizzolatti, A unifying view of the basis of social cognition. Trends in Cognitive Sciences, 8, 396-403, 2004.
[52] S. J. Blakemore & J. Decety, From the perception of action to the understanding of intention. Nature Reviews Neuroscience, 2(1), 561-567, 2001.
[53] T. Chaminade, A. N. Meltzoff & J. Decety, Does the end justify the means? A PET exploration of imitation. NeuroImage, 15, 318-328, 2002.
[54] P. F. Ferrari & V. Gallese, Mirror neurons and intersubjectivity. In S. Bråten (Ed.), On Being Moved: From mirror neurons to empathy, (pp. 73-88). John Benjamins Publ Co., 2007.
[55] S. Gallagher & A. J. Marcel, The Self in Contextualized Action. Journal of Consciousness Studies, 6(4), 4-30, 1999.
[56] V. Gallese, The Shared Manifold Hypothesis: from mirror neurons to empathy. Journal of Consciousness Studies, 8, 33-50, 2002.


What do you see now?
Globes of red, yellow, purple.
Just a moment! And now?
My father and mother and sisters.
Yes! And now?
Knights at arms, beautiful women, kind faces.
Try this.
A field of grain - a city.
Very good! And now?
Many women with bright eyes and open lips.
Try this.
Just a goblet on a table.
Oh I see! Try this lens!
Just an open space - I see nothing in particular.
Well, now!
Pine trees, a lake, a summer sky.
That's better. And now?
A book.
Read a page for me.
I can't. My eyes are carried beyond the page.
Try this lens.
Depths of air.
Excellent! And now?
Light, just light, making everything below it a toy world.
Very well, we'll make the glasses accordingly.

Edgar Lee Masters - Spoon River, 1916


Enacting Intersubjectivity, F. Morganti et al. (Eds.), IOS Press, 2008. © 2008 The authors. All rights reserved.


Enactive Understanding and Motor Intentionality

Abstract. Most of our social interactions rest upon our ability to understand the behavior of others. But what is really at the basis of this ability? The standard view is that we understand the behavior of others because we are able to read their mind, to represent them as individuals endowed with mental states such as beliefs, desires and intentions. Without this mindreading ability the behavior of others would be meaningless for us. Over the last few years, however, this view has been undermined by several neurophysiological findings and in particular by the discovery of mirror neurons. The functional properties of these neurons indicate that motor and intentional components of action are tightly intertwined, suggesting that the basic aspects of intentional understanding can be fully appreciated only on the basis of a motor approach to intentionality. This paper has a dual objective: to develop this approach in order to account for the crucial role of motor intentionality in action and intention understanding below and before any metarepresentational ability, and to shed new light on the ontogeny of mindreading, by explaining how the first forms of understanding in infants may be intentional in nature, even without presupposing any explicit and deliberate mentalizing.

Contents
2.1 Introduction
2.2 Mirror neurons for actions: goals and movements
2.3 Basic motor acts and enactive understanding
2.4 Motor chains and intention understanding
2.5 Before mindreading: the ontogeny of intentional understanding
2.6 Teleological stance and motor intentionality
2.7 Concluding remarks
2.8 References


C. Sinigaglia / Enactive Understanding and Motor Intentionality

2.1 Introduction

Most of our social interactions rest upon our ability to understand the behavior of others. But what is really at the basis of this ability? How exactly do we understand the behavior of others? This issue encompasses two distinct but complementary questions. In the first place, how do we realize that what we are observing are not pure physical events but intentional movements; in other words, how do we attribute the status of action to the observed movements? And in the second place, how do we understand what type of action these movements are; in other words, how do we identify them as this or that given action?

It is widely assumed in the fields of cognitive science and philosophy of mind that both the recognition of an event as an action and its identification as that particular action depend equally on the ability to attribute to others those mental states (beliefs, desires, intentions, etc.) that are supposed to be at the origin of the observed motor behavior and that can therefore render it intelligible and in many cases predictable. Whether such mindreading ability is considered to be related to a more or less explicit use of a theory of mind, or to the assumption of an intentional stance based on a postulate of rationality, or to a more or less complex form of simulation (see [1-2] on this point), is of minor importance in this context. What is important here is that, even though the suggested mechanisms are very different, these various views share the idea that both the status and the identity of an action depend on its connection to specific mental states, so that without the ability to read the mind of others, that is, to attribute specific mental states to them, it would be impossible to grasp the intentional meaning of their behavior [3].

However, this idea is being radically challenged by an increasing number of studies in the field of what has been called the neurophysiology of action.
Analyses of the functional properties of the cortical motor system and, even more, the discovery of a distinct class of sensory-motor neurons (the so-called mirror neurons) have suggested the hypothesis that our understanding of the actions performed by others is primarily based on a mechanism that directly matches the sensory representations of the observed actions with the motor representations of the observer's own actions. According to the direct matching hypothesis, we primarily understand the actions of others by means of our own motor knowledge: it is this type of knowledge that would enable us to immediately attribute an intentional meaning to the movements of others. This hypothesis does not exclude, of course, that other more complex processes, such as those that characterize our meta-representational abilities, may be at work and play a role in these functions. It simply maintains that mentalizing is neither the sole nor the primary way of intentional understanding, pointing out that our ability to understand the actions and the intentions of others capitalizes on the same motor knowledge that underpins our own capacity to act.

The present paper aims at exploring such an enactive understanding below and before any mindreading ability. First of all, the basic properties of mirror neurons and their role in intentional understanding will be addressed. It will be argued that the mirror neuron mechanism undermines the usual construal of both action and action understanding by revealing the extent to which intentional and motor components of action are intertwined and how their involvement in action understanding can only be appreciated on the basis of a motor approach to
intentionality. The ontogenetic aspects of intentional understanding that have emerged from some recent studies in developmental psychology will then be examined and discussed. There is a large consensus that the primary forms of understanding infants develop in the first year of life do not imply any meta-representational ability, nor can they be interpreted as mindreading. However, the nature and reach of this understanding are much-debated topics, and the solutions advanced are often at odds with one another. Finally, it will be shown that the mirror neuron mechanism and motor intentionality not only shed new light on the background of mindreading, but also provide a coherent account of its ontogeny, suggesting how the first forms of action and intention understanding may be intentional in nature, being deeply related to the motor expertise infants acquire during their development.

2.2 Mirror neurons for actions: goals and movements

Discovered first in the premotor cortex (area F5) [4-5] and then in the inferior parietal lobule (IPL, areas PF and PFG reciprocally connected with F5) [6] of macaque monkeys, mirror neurons are a specific class of motor neurons which become active not only when an individual performs a specific act (such as grasping) but also when s/he sees it being performed by other individuals. In the macaque monkey the activation of the mirror neurons is connected to the animal's observation of motor acts characterized by an effective hand (or mouth)-object interaction. When the animal observes mimed acts in which no objects are present, or movements performed without any meaning such as raising an arm or waving hands (even if these actions are carried out with the intention of frightening the animal), there is no response from the mirror neurons; this is also the case when the animal observes food or generic three-dimensional objects.
Several electrophysiological experiments and brain imaging studies have provided evidence for the existence of a mirror neuron system for action in the human brain (for a review, see [7-8]). It has been shown that in humans too the observation of actions performed by others activates areas and circuits that are involved in motor activity. In particular, it has been demonstrated that the lower part of the precentral gyrus plus the posterior part of the inferior frontal gyrus and the rostral part of the inferior parietal lobule form the core of the human mirror system. However, though the mechanism and the localization of the human mirror system are very similar to those of the monkey, its functional properties are different and more sophisticated. Indeed, the human mirror system also becomes active for intransitive and mimicked actions [9], and it is able to code both the goal-directedness of a given action and the temporal aspects of its single movements [10]. The implications of such differences for the development in humans of imitative and communicative capabilities, as well as for the evolution of language, are extensively treated in [11].

In the following paragraphs I shall focus on the role of mirror neurons in intentional understanding. The specific congruence of the sensory and motor responses of mirror neurons has led to the hypothesis that they form the basis of a mechanism whereby our brain is able to directly match the sensory representations of the perceived actions with our own motor representations of those actions, and that the primary function of this matching would be to enable us to immediately understand the meaning of
the actions performed by others. But what do we really mean when we talk about the understanding and the meaning of an action? What kind of understanding can be associated with the direct matching mechanism of the mirror neurons? What does such a mechanism tell us about the meaning, i.e. the intentional content, of an action? Is it not misleading to resort to an intentional language by using terms such as "understanding", "meaning", and "content"?

These questions can be answered first by considering the motor properties mirror neurons share with the other F5 and IPL neurons. Indeed it has been well known for some time now that the fundamental characteristic of F5 and IPL neurons is that they code goal-directed motor acts such as grasping, holding, manipulating, etc., and not the single movements that compose these acts, as do most of the primary motor cortex neurons, which control the fine morphology of movement [12]. For example, many F5 neurons discharge when the monkey performs a motor act such as grasping a piece of food, irrespective of whether it uses its right or left paw or even its mouth to do so; others are more selective, discharging only for a particular effector or grip. However, even when selectivity is at its highest, the motor responses cannot be interpreted in terms of single movements: the neurons that discharge during certain movements (the flexing of a finger, for example) performed with a specific motor goal, such as grasping an object, discharge weakly or not at all during the execution of similar movements that compose a different motor act such as scratching.

A recent study [13] has shown that the goal-relatedness of the F5 neurons concerns not only hand- and mouth-mediated but also tool-mediated motor acts, even in cases where the distal goal of the tool is the opposite of the proximal goal of the hand.
The experiment was carried out with macaque monkeys, which were trained to grasp objects using two types of pliers (normal and reverse), requiring opposite hand movements. When using the normal pliers, the monkey grasped the object presented to it by opening its paw and then closing it; when using the reverse pliers it first grasped the object by closing its paw and then opening it. All recorded neurons in the F5 area discharged in relation to the goal-related motion of the pliers, maintaining the same relation to the different phases of grasping in both conditions, regardless of the fact that diametrically opposite hand movements were required to achieve that goal. Neurons that discharged with normal pliers when the paw was opening discharged when the paw was closing with the reverse pliers: the discharge was always linked to the initial phase of the motor act. Conversely, neurons that discharged with normal pliers when the paw was closing discharged when the paw was opening with the reverse pliers, the discharge being related to the final phase of the motor act.

It is worth noting that in this study hand-related neurons were also recorded from the primary motor cortex (F1). Surprisingly, like F5 neurons, half of the F1 neurons were shown to code the goal-related motion of the pliers, and not the single movements of the fingers of the paw. This finding indicates that goal-relatedness is a distinctive functional feature upon which the cortical motor system is organized, suggesting a mechanism for the transformation of a motor goal into an appropriate sequence of movements even when this is opposite to that usually required to achieve such a goal. The goal-related F5 and F1 neurons are connected with different sets of motor cortex neurons controlling the opening and the closing of the hand. Hand interactions with objects as well as the use of the normal pliers reinforce the connections that usually characterize a motor goal such as grasping,
selecting first those neurons that control hand opening and then those that control hand closure. After learning to use the reverse pliers, the opposite connections are reinforced by the success of the tool-mediated motor acts and prevail. Now the neurons that control hand closure are selected first, followed by those that control hand opening. In other words, grasping no longer means "Close the fingers!" but "Open the fingers!"

2.3 Basic motor acts and enactive understanding

The fact that at the level of the cortical motor system movements are represented with different degrees of generality, and that these representations reveal, albeit in varying ways, a specific goal-relatedness, shows how the meaning, that is, the intentional content, of an action does not depend entirely and exclusively on the mental states (beliefs, desires, etc.) which are supposed to lie at the origin of its execution [14]. There is no doubt that mental states such as beliefs, desires, and intentions can contribute to shaping and refining the intentional content of an action. For example, we can pick up a glass because we are convinced it contains our favorite whiskey and we want to savor it one more time: this action is very different from the action consisting in the same motor act performed with the intention of preventing someone from picking up the glass because we think it contains a potent poison. It must be said, however, that in both cases the motor act of grasping embodies a motor intentional content that identifies it as being more than just a mere sequence of movements, that is, as a goal-related motor act directed to grasp a certain object, with a certain shape, a certain size, etc. Quite apart from being the outcome of any prior and distinct pure mental state, the act of grasping, like every other basic motor act, is defined by its own goal-relatedness, which renders possible the coherent composition of the various movements and enables us to control them while they are being executed.
This goal-relatedness is coded by F5 and IPL neurons, as well as by a portion of F1 neurons. It cannot be interpreted in abstract or mentalistic terms; on the contrary, it presupposes a motor representation where the adjective "motor" does not simply denote the content of this representation (as in the case of a mere representation of movement), but its format, its way of representing. This representation has to be construed as a motor goal-related one. It is a goal-related representation because it is characterized by different degrees of generality and, although its content refers to movement, it nonetheless cannot be reduced to a single sequence of movements. But it is also a motor representation, because the goal is represented in a motor format, as the end-point of a motor act, and although this representation can differ with respect to single movements, it must nonetheless have a coherent motor content that enables it to determine a given behavior and to control its execution.

Without such a type of representation, it would be almost impossible to select the appropriate movements for our actions, compose them in the correct sequence and control the final execution. On the other hand, this type of representation enables a movement (like the flexing of a finger, for example) to take on different intentional meanings as it is a part of various acts with different motor goals (such as grasping as opposed to scratching); it also enables different movements (even movements which are diametrically opposite such as opening
and closing one's fingers) to take on the same intentional meaning, as they are part of acts with the same motor goal (grasping).

Such motor representations are also evoked by observing the actions performed by others. As already mentioned, the mirror neuron mechanism directly matches the sensory with the motor representations of the observed actions. It is the goal-relatedness of these motor representations that allows the observer to immediately pick up the motor intentional meaning of the observed actions, i.e. the motor aboutness, which characterizes them as such and makes them comprehensible. The fact that the observation of an action performed by others generates a motor representation that is similar to that which the observer himself would activate if he were planning that action shows that both representations possess the same intentional motor content and that the status and the identity of a given action, whether observed or performed, depend primarily on this content, at least at the level of basic motor acts.

That is true not only for hand and mouth actions, but also for tool-mediated motor acts, even when they involve opposite sequences of movements. In the above-mentioned experiment on the use of normal and reverse pliers [13], both F5 purely motor neurons and F5 mirror neurons were recorded. The motor and visual responses of F5 mirror neurons possessed the same goal-relatedness; that is, F5 mirror neurons were able to code the distal goal of the pliers as the same (grasping) from both a motor and a visual point of view, even when the movements of the fingers required to achieve that goal were not only different but diametrically opposite. This finding emphasizes the constitutive role of motor goal-relatedness in the action understanding made possible by mirror neuron activation, indicating how the ability to visually code the goal of the observed movements and the fineness-of-grain of this goal coding depend on the observer's motor expertise.
This is in line with the evidence from a number of brain-imaging studies [15-18] over the last few years; these have shown that activation of the mirror neuron system during action observation is modulated by the observer's motor repertoire. As this repertoire develops, diversifies and becomes increasingly sophisticated, the ability to immediately understand the actions of others develops, diversifies and also becomes increasingly sophisticated. In other words, the more fine-grained the motor goal representations are, the greater the significance acquired by details of the observed actions, which, together with those effectively executed, share the fineness-of-grain of their motor intentional content. It is due to this sharing that action understanding can become extremely detailed while continuing to be immediate and without presupposing the meta-representational abilities which are alleged to be at the basis of mindreading.

The motor format of the goal-related representations also explains why action understanding is not strictly bound to the completeness of the sensory information or to only one sensory modality. Indeed, recordings of single F5 mirror neurons [19] showed that most responded to the observation of hand motor acts even when the final part of these acts, consisting in the effective object-hand interaction, was hidden behind a screen. The evoked motor goal-related representation was always the same, independently of whether the motor act was observed in its entirety or only in its initial phases, allowing the monkey to understand the motor intentional meaning of the observed act in both conditions. In another study [20] F5 mirror neurons were recorded while the monkey observed the experimenter performing a sound-producing motor act and when it heard the sound without seeing the action.
The results showed that a large number of tested neurons responded selectively and congruently to a given motor act (for instance, peanut breaking) when it was observed, heard, or both heard and observed, but did not respond to the sight and sound of another motor act, or to non-specific sounds. This means that visual features are relevant only to the extent that they facilitate the understanding of the motor intentional content of the observed act; if such understanding can be facilitated by other cues (sounds, for example), the mirror neurons are able to code the goal-relatedness of the perceived movements even in the absence of visual stimuli.

2.4 Motor chains and intention understanding

The motor acts considered above are defined by single goal-related motor representations. In point of fact, however, our motor behavior usually displays a more complex intentional structure, which cannot be interpreted in terms of a simple sequence of motor acts but presupposes the embedding of the various acts involved into a specific goal hierarchy. Take for example the case of a motor act such as grasping: it may be embedded in diverse actions leading to different final motor goals, such as eating or placing. In this case the single goal-related motor representation (grasping) becomes part of more complex motor representations related to final goals that differ from one another (grasping for eating or grasping for placing).

Such motor intentional organization has been investigated by recording the activity of IPL motor neurons in macaque monkeys during typical hand grasping movements [21]. The experiment had two conditions: in the first, the monkey grasped a piece of food that had been placed in front of it, and then carried it to its mouth; in the second condition, the monkey grasped an object or a piece of food and placed it in a container.
Most of the recorded hand-grasping neurons discharged differentially depending on whether the grasping was a grasping to carry to the mouth or a grasping to move the piece of food from one place to another. The motor selectivity of these neurons, and the fact that the motor representation evoked by their activation modulates its goal-relatedness with respect to the final goal of a specific action, not only explain one of the fundamental characteristics of motor organization, i.e. the existence of specific motor chains that guarantee the fluidity of acting, but also provide the building blocks upon which a more complex form of intentional understanding can be constructed. This form, in its complexity, is not restricted to single motor goals, but enables a grasp of the motor intention that makes up the various goals giving rise to a real action.

Indeed, in the same study parietal mirror neurons were recorded in the experimental conditions (grasping for eating and grasping for placing) used to test the motor properties of parietal grasping neurons. The results showed that most parietal mirror neurons displayed a clear congruence between motor and visual responses, discharging differentially depending on which action the single motor act of grasping was embedded into. Note that in both action execution and observation mirror neurons became active as soon as the macaque's paw or the experimenter's hand assumed the shape necessary to grip the food or other objects. The fact that the visual stimulus elicited the same set of motor goal-related representations that compose the motor intention responsible for the execution of
the entire motor chain suggests that the monkey was immediately able to understand the whole motor intentional content of the observed action, and to anticipate its final goal, from the onset of the experimenter's initial movements. There is no doubt that the context (presence or absence of containers, the type of object to be grasped, etc.) provides relevant visual cues. However, these cues were interpreted in terms of possible motor chains that were not selected on the basis of an all-or-nothing mechanism, but presented varying degrees of plausibility that could change depending on the circumstances, thus showing the typical plasticity that characterizes any intentional motor behavior.

An fMRI experiment [22] provided evidence that the human mirror neuron system is also able to grasp the motor intentions of others. Volunteers were presented with three different visual stimuli: a hand grasping a mug with different grips (precision or full hand), two different contexts (teapot, mug, plate, arranged as if someone were about to have tea, or had just finished) and a hand grasping a mug with different grips in different contexts (to indicate grasping the mug to drink from it, or to tidy it away). The results showed that the condition of hand actions embedded in context, compared with the other two conditions, produced a higher activation in the caudal part of the inferior frontal gyrus, in the region which constitutes the frontal node of the human mirror neuron system. This suggests that this system is able not only to code single motor acts, but also to code the general motor intention with which the single motor acts are performed (for example, grasping-for-drinking or grasping-for-tidying-away).
It is worth adding that a series of EMG experiments [23] has very recently shown, albeit indirectly, that motor intention understanding in humans is based on a motor chain organization similar to that found in monkeys and, even more importantly, that its impairment is at the basis of one of the core deficits that characterize Autistic Spectrum Disorder (ASD). Typically developing (TD) and high-functioning autistic children were requested to execute and to observe two different actions: the first, eating, was to grasp a piece of food with the right hand from a plate, carry it to the mouth and eat it, while the second, placing, was to grasp a piece of paper placed on the same plate and put it into a box. During the execution and observation conditions of both actions the activity of the mouth-opening mylohyoid muscle (MH) of the TD and autistic children was recorded using EMG surface electrodes. Both the execution and the observation of the eating action produced a marked increase of MH activity in TD children as early as the reaching phase, whereas no MH activity was recorded in the execution and observation conditions of the placing action. As in the TD children, there was no MH activity in the autistic children during the execution and the observation of the placing action; in contrast, however, they showed a much later activation of the MH while eating and no activation at all when eating was observed.

There are a number of studies, using different techniques [see, for instance, 24-27], whose results support the hypothesis that a core deficit of ASD, the inability to understand the intentional meaning of the behavior of others and therefore to relate to them in an ordinary way, depends on a malfunctioning of the mirror neuron system.
The EMG experiments, however, indicate, for the first time, that the primary deficit is not in the responsiveness of the mirror neurons to the observation of others' actions, but in the impaired organization of the motor chains underlying action representation [23]. The fact that the autistic children did not show MH activity
during the entire reaching and grasping phases of eating, with the muscle becoming active only during the bringing-to-the-mouth phase, indicated that they were not able to represent the entire action to be executed as an intentionally organized motor chain, but only as a simple sequence of unrelated single motor acts. This inability did not cause any actual impairment during the execution of either action, given also the simplicity of the required tasks. During observation, however, the autistic children's ability to disambiguate the eating action from its onset was impaired, and this made them unable to understand the motor intention with which the experimenter was grasping the pieces of food or of paper. In fact, MH activation did not occur during the observation of either eating or placing.

It is very likely that there are various cues (object semantics, context, etc.) that help autistic children to understand why the experimenter was doing what he was doing. This type of understanding, however, should be very clearly distinguished from that generated by motor knowledge. The former provides at best a merely associative knowledge, whereas the latter gives a grasp of the motor aboutness of others' actions, enabling one to understand both the goal-relatedness that characterizes the single motor acts and, above all, the overall intention that underpins them.

2.5 Before mindreading: the ontogeny of intentional understanding

Up to now I have considered what the mirror neuron mechanism suggests may be below mindreading. But what does this mechanism tell us about what happens before mindreading, and in particular what happens before mindreading from an ontogenetic rather than a phylogenetic point of view (on the latter issue see [28]; see also [29])? Are the first forms of intentional understanding in infants to be construed as modalities of acting in ways that are consistent with a more mature motor understanding of goal-related actions?
Does motor intentionality play a role in shaping the ontogeny of intentional understanding? Over the last few years, numerous studies have demonstrated that the ability to read the mind of others could in fact appear at a very early stage in the infant's development. A recent looking-time experiment, for instance, revealed the covert ability of 15-month-old infants to predict an actor's behavior on the basis of her true or false beliefs [30]. However, it is widely accepted that the forms of intentional understanding infants develop in the first year of life are related to the goal-relatedness of the observed movements and that they cannot be interpreted in terms of mindreading [31-33]. For example, a series of looking-time experiments have shown that 6- and 9-month-old infants are able to distinguish between the goal-relatedness of some basic hand motor acts and their kinematics: infants in both age groups looked longer at the hand grasping a new object while following the same trajectory as in the habituation test (new goal/old path) than at the hand following a different trajectory while grasping the same object as in the habituation test (old goal/new path). This did not happen when the observed action involved inanimate objects (a claw, for instance) or was incoherent from a motor point of view (the back of the hand, rather than the open palm, was brought to the object) [34-35]. These findings suggest that by their ninth month infants have a store of knowledge that does not imply any metarepresentational ability and allows them to be better tuned to the goal-related than to the spatial and temporal properties of the basic motor acts performed by others.



In spite of such a broad consensus, the hypotheses advanced to account for the nature and the reach of this kind of knowledge and its function in intentional understanding in infants are very different, and often in conflict with one another. For the sake of simplicity, I shall first introduce and discuss three of the most prevalent views, then elaborate and motivate a fourth hypothesis to show that motor intentionality is not only crucial for the development of intentional understanding, but also provides a theoretically unitary and neurophysiologically sound framework in which to construe the various and ever-growing body of evidence coming from developmental psychology research. According to the first view [see among others 36-38], it is not a contradiction in terms to attribute the ability to detect the intentional nature of the actions performed by others and to ascribe intentions to them to 6- and 9-month-old infants, as long as it is recognized that the fully developed concept of intention is gradually acquired at a much later stage. In the first place, an infant's detection of intentions should not be confused with more mature meta-representational understanding. While the latter mechanism presupposes the ability to see intentions as mental representations independent from the actual execution of actions and to appreciate both their causal role and satisfaction conditions, the former is based on a notion of intention which does not presuppose any sharp distinction from the correlated notion of desire, but has to be interpreted as an undifferentiated pro-attitude, a conation, intimately tied to given actions and objects.
Though it must be recognized that this view emphasizes the differences between the mechanisms involved in the development of intentional understanding, showing how at the beginning they are not closely related, and only become integrated at a later stage, the precocious and undifferentiated concept of intention it appeals to is neither a necessary nor a sufficient condition to account for the first forms of understanding in infants. It is not a necessary condition: what the looking-time experiments show is that 6- and 9-month-old infants are able to detect the goal-relatedness of the observed motor behaviors without needing to attribute an undifferentiated pro-attitude to the latter. Even without such an attribution, the infants would have been able to distinguish both between the different goals of the observed motor acts [34] and between the congruent and non-congruent ways to achieve them [35]. Moreover, the undifferentiated nature of this pro-attitude explains why it cannot be considered a sufficient condition, as it ends up assuming what it should in fact be accounting for, i.e. the intentional link between the goal and the motor means on which the status and the identity of a given action depend. How could the infants have rendered the observed motor events intelligible and understood their goal-relatedness simply by ascribing to these movements a void conatus towards some unspecific objects? To what extent would such an ascription have enabled the infants to disambiguate the sensory information and to code its peculiar content?
The second view hypothesizes that the development of intentional understanding is rooted in at least two different systems: first, a low-level system for detecting statistical regularities in the actions of others would enable the identification of relevant units in the observed behavior stream; a high-level system would then facilitate making sense of these units in terms of second-order mental states, thus achieving a genuine intentional understanding [see among others 39-40]. These systems would have independent evolutionary and developmental origins,



and only later become intertwined. The infants' abilities in action understanding would therefore reflect an evolutionarily archaic competence to monitor the behavior of others and to keep track of the statistical regularities embedded therein, while the ability to read the mind of others might have its own evolutionary history, probably connected with the emergence of the representational structures that support human language [39]. The evolutionary and developmental advantage of the meta-representational system would therefore not have been the generation of a large set of fundamentally new behaviors, but the greater flexibility offered in organizing and deploying existing behavioral elements, thus putting old behavioral patterns to new uses. There is no doubt that the hypothesis that systems unrelated to metarepresentational abilities may have evolved to detect the same kinds of behavioral regularities that will later be interpreted by mature mind readers in terms of intentions may sound very appealing from both an evolutionary and a developmental point of view. Indeed, it challenges the notion of a rigid cognitive discontinuity between the various ways of action understanding, claiming that the advantage of the emerging meta-representational abilities would have been to refine the behavioral abilities, increasing their flexibility in the planning of one's own actions as well as in the understanding of those performed by others. With regard to the low-level system, however, it must be said that, like the previous view, this hypothesis takes for granted what should rather be accounted for. Attempting to explain the first form of infant understanding by falling back on the statistical regularities that are thought to characterize biological movements seems an indication of the existence of a problem rather than its solution.
How do 6- and 9-month-old infants acquire the capacity to detect such regularities and to what extent do these regularities allow the infant to code sensory information in intentional terms? Just how statistically regular must the observed hand-object contacts (for example) be to enable infants to understand the specific goal-relatedness of reaching and grasping movements performed by others? And why should the regularities connected to the various hand-object interaction modalities prevail, rather than those connected to the kinematic aspects of observed movements, to their spatial-temporal characteristics, etc.? Surely it is the progressive development of the ability to act that permits the infant to perceive these regularities and not others, and therefore it is on that ability, rather than on the simple regularities observed, that the first forms of understanding would depend?

2.6 Teleological stance and motor intentionality
The third view, known as the teleological stance hypothesis, seems to carry greater weight today. It is based on a series of looking-time experiments that used computer-animated events with 2D geometric figures (circles and rectangles) behaving in ways adults have no difficulty in describing as goal-related [see for a review 41]. In one of these experiments, for example, 12-month-old infants were habituated to seeing a small circle approaching a large circle by jumping over an obstacle separating the two. During the test phase the obstacle was removed and infants were presented with two different test displays: in the first, the small circle approached the large circle along the same trajectory as before, while in the second the small circle approached its target along a straight-line trajectory. Infants looked



longer at the trajectory in which the small circle jumped, suggesting that they found it curious because there was no longer any need to jump as the obstacle had been removed and therefore the trajectory had become inefficient, whereas the straight-line trajectory matched their expectations as it was considered the most efficient way to approach the target in the new situation [42]. According to the authors of these experiments, such findings show that by 12 months of age infants are equipped with an inferential system, the teleological stance, enabling them to ascribe goal-relatedness to the movements of a wide range of entities on the basis of a rationality principle that would provide the well-formedness criteria for action interpretations. This principle would incorporate two basic assumptions about the intentional nature of action, i.e. that (i) its primary function is to bring about some particular change of state in the world, and that (ii) any agent will employ the most efficient means available within the constraints of a given situation (equifinal variation of action), thus specifying the types of perceptual cues whose presence can prompt infants to infer the goal-relatedness of the observed movements. In particular, evidence that an individual has the ability to adaptively modify his/her conduct in response to a change in environmental conditions, attaining the same goal in the most efficient manner in the new condition, should be taken as a strong indication of the goal-relatedness of the observed movements independently of the agent that has performed them, be it a human being or a 2D object. The intentional nature and reach of the various forms of understanding would thus stem from a single principle which, though applied to diverse domains, would still be the same for all.
In fact, though the teleological stance does not imply any mentalizing, involving inferences on factual reality (action, goal-state and current situational constraints), it would share the principle of rationality with the intentional or mentalistic stance. This would account for both the surprising sophistication of early intentional understanding and also the development of mindreading abilities, suggesting that infants start to assume a mentalistic stance when their cognition becomes sufficiently flexible to represent fictional and counterfactual world states and to apply the inferential principle of the earlier (teleological) stance to them [41]. In spite of its elegance and simplicity, the teleological stance hypothesis also presents some difficulties. First of all, the rationality principle that rules the action interpretation narrows down the goal-relatedness attribution to only the most efficient of the possible behaviors in any given situation, thus identifying the first forms of intentionality with the efficacy of the conduct to which they supposedly give rise. This may be true in cases of chasing such as those of the experiment considered above, in which the goal-relatedness of the observed movements can be measured only in terms of minimal action or the shortest path. But what happens in the case of more specific interactions, such as, for example, the hand-object interaction that is typical of reaching and grasping? A very recent looking-time study on macaque monkeys carried out with the same experimental paradigm used in previous work on human infants has shown that the tested animals were able to detect the efficacy of observed hand goal-related motor acts only when the latter belonged to their motor repertoire, or were at least compatible with their motor expertise [43]. Therefore it can be shown that goal-relatedness and efficacy do not always coincide, but in the case of basic motor acts such as grasping the latter presupposes



the former. On the other hand, the abstractness of the behavioral cues and the fact that in very simple situations they would suggest the goal-relatedness of the observed movements do not in themselves guarantee the generality of the proposed inferential system. Even the supporters of the teleological stance were hard put to account for the results of the above-mentioned looking-time experiments regarding the observation of hand motor acts, in which the goal attribution did not imply the effective presence of either of the two perceptual cues presupposed by the rationality principle, to the point that they had to call on the role played by the previous perceptual and motor experience of the 6- and 9-month-olds, emphasizing how natural grasping events familiar to infants often exhibit equifinal modification of action as a function of environmental changes when lifting, transporting, shaking, etc. of variable objects grasped in different situations [44]. The question remains whether the recourse to pure reason is still legitimate in cases such as these, or whether it would not be simpler and more economical to call on the motor knowledge of 6- and 9-month-olds, given also the fact that it can drive the infants' understanding of the goal-relatedness of observed movements without presupposing any inferential system. This in fact is the direction taken by a good number of more recent studies. In particular, it has been shown that infants are sensitive to the goal-relatedness of movements performed by others even at 3 months of age, but only when facilitated by previous motor experience [45].
The fact that infants capitalize on their own motor knowledge for intentional understanding is also strongly corroborated by a gaze-recording experiment indicating that, like adults, 12-month-old infants produce proactive goal-directed eye movements when observing a goal-directed placing action, which did not occur when they observed self-propelled objects following the same trajectory as before but without the presence of any human effector [46]. This is not to deny the importance of the data produced in support of the teleological stance hypothesis, nor does it exclude that forms of inference such as those indicated above may play an important role in the development of infant understanding. It simply implies the need to realize that motor knowledge is at the basis of a form of intentional understanding, which has to be recognized as original and perhaps as primary. This is particularly true as it not only permits the identification of the various degrees of goal sensitivity that children demonstrate in their first year of life, but also throws light on a key point in the development of intentional understanding, which is marked by the emergence of the capacity to interpret the observed motor acts not only as individual acts but also in terms of hierarchically organized motor goals. It has been known for some time that 9 to 12 months of age marks a crucial phase in the development of infants' ability to represent motor goals in a planful manner. More recently, it has been shown that by 12 months of age, infants are able to detect the hierarchical goal structure of a sequence of motor acts [47].
But the most important point is that investigations of 10-month-old infants' looking times have revealed that only the infants who were able to organize, by themselves, given sequences of hierarchically organized motor acts were able to recognize the same sequences performed by others [48], showing once again the crucial role of motor intentionality in the ontogeny of intentional understanding.



2.7 Concluding remarks
Taken together, neurophysiological findings and developmental psychology research suggest that motor intentionality is at the basis of the action and intention understanding below and before mindreading. The functional properties of the cortical motor system and the mirror neuron mechanism indicate that the actions of others, like our own, possess a specific motor intentional meaning that cannot be reduced to the pure mental states (beliefs, desires, intentions, and so on) that might have been at the origin of their execution, at least at the level of basic actions. It is because of their motor intentional meaning that the actions performed by others, whether they are formed by single motor acts or entire chains of actions organized by specific goal hierarchies, are immediately recognizable to us. As soon as we see someone doing something, either a single act or a chain of motor acts, his/her movements take on meaning for us, whether he/she likes it or not, and regardless of what he/she has in mind. This does not mean that the role that mind reading ability plays in intentional understanding must be denied. Our social conduct depends largely on our ability to read the mind of others. However, whatever the underlying mechanism may be, this meta-representational ability does not account for the full extent of intentional understanding. Nor can it be assumed to be paradigmatic. If it were, we would have to assume that without any explicit or deliberate mentalizing the actions of others would be basically opaque for us, mere physical movements devoid of any intentional meaning whatsoever. This however is not the case, and the mirror neuron mechanism shows how our motor knowledge enables us to immediately understand them.
Such enactive understanding is not only different in nature and content from the modalities of mind reading that have traditionally been taken into consideration, it also helps clarify their ontogeny, throwing new light on the first forms of intentional understanding that infants develop during their first year of life. Many of the most recent experiments show clearly how the sensitivity demonstrated by infants to the goal-structure of action while observing the movements of others depends on the level of development they have reached in their own capacity to act. The very transition to more articulated forms of understanding, which enables them to grasp the meaning of individual acts according to the overall actions in which they are embedded, appears to be marked by the capacity to represent entire chains of actions with a specific goal hierarchy, and not just single unrelated acts; this allows them to plan and implement action with increasingly complex motor and intentional content. This is a key transition in the development of intentional understanding that once again reveals the crucial role of motor intentionality. This does not mean to say that the entire ontogeny of mind reading must be reduced to the development of motor intentionality. It is simply to underline how a motor approach to intentionality, such as that suggested by the mirror neuron mechanism, may, for the first time, show the way to rethinking the basis and the development of intentional understanding within a unitary theoretical and neurophysiologically grounded framework.



2.8 References
[1] P. Carruthers & P.K. Smith (Eds.), Theories of theories of mind. Cambridge: Cambridge University Press, 1996.
[2] B.F. Malle, L.J. Moses & D.A. Baldwin (Eds.), Intentions and intentionality: Foundations of social cognition. Cambridge, MA: MIT Press, 2001.
[3] M. Rowlands, Body language. Representation in action. Cambridge, MA: MIT Press, 2006.
[4] G. Rizzolatti, L. Fadiga, V. Gallese & L. Fogassi, Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3, 131-141, 1996.
[5] V. Gallese, L. Fadiga, L. Fogassi & G. Rizzolatti, Action recognition in the premotor cortex. Brain, 119, 593-609, 1996.
[6] V. Gallese, L. Fogassi, L. Fadiga & G. Rizzolatti, Action representation and the inferior parietal lobule. In W. Prinz & B. Hommel (Eds.), Attention and Performance XIX (pp. 247-266). Oxford: Oxford University Press, 2002.
[7] V. Gallese, C. Keysers & G. Rizzolatti, A unifying view of the basis of social cognition. Trends in Cognitive Sciences, 8, 396-403, 2004.
[8] G. Rizzolatti & L. Craighero, The mirror neuron system. Annual Review of Neuroscience, 27, 169-192, 2004.
[9] G. Buccino, F. Binkofski, G.R. Fink, L. Fadiga, L. Fogassi, V. Gallese, R.J. Seitz, K. Zilles, G. Rizzolatti & H.J. Freund, Action observation activates premotor and parietal areas in a somatotopic manner: an fMRI study. European Journal of Neuroscience, 13, 400-404, 2001.
[10] M. Gangitano, F.M. Mottaghy & A. Pascual-Leone, Phase-specific modulation of cortical motor output during movement observation. Neuroreport, 12, 1489-1492, 2001.
[11] G. Rizzolatti & C. Sinigaglia, Mirrors in the brain. How our minds share actions and emotions. Oxford: Oxford University Press, 2007.
[12] G. Rizzolatti, R. Camarda, L. Fogassi, M. Gentilucci, G. Luppino & M. Matelli, Functional organization of inferior area 6 in the macaque monkey: II. Area F5 and the control of distal movements. Experimental Brain Research, 71, 491-507, 1988.
[13] M.A. Umiltà, L. Escola, I. Intskirveli, F. Grammont, M. Rochat, F. Caruana, A. Jezzini, V. Gallese & G. Rizzolatti, How pliers become fingers in the monkey motor system. Proceedings of the National Academy of Sciences, in press.
[14] A. Marcel, The sense of agency: Awareness and ownership of action. In J. Roessler & N. Eilan (Eds.), Agency and self-awareness: Issues in philosophy and psychology (pp. 48-93). Oxford: Clarendon, 2003.
[15] G. Buccino, F. Lui, N. Canessa, I. Patteri, G. Lagravinese, F. Benuzzi, C.A. Porro & G. Rizzolatti, Neural circuits involved in the recognition of actions performed by nonconspecifics: An fMRI study. Journal of Cognitive Neuroscience, 16, 114-126, 2004.
[16] B. Calvo-Merino, D.E. Glaser, J. Grèzes, R.E. Passingham & P. Haggard, Action observation and acquired motor skills: an fMRI study with expert dancers. Cerebral Cortex, 15, 1243-1249, 2005.
[17] B. Calvo-Merino, J. Grèzes, D.E. Glaser, R.E. Passingham & P. Haggard, Seeing or doing? Influence of visual and motor familiarity in action observation. Current Biology, 16 (19), 1905-1910, 2006.
[18] B. Haslinger, P. Erhard, E. Altenmüller, U. Schroeder, H. Boecker & A.O. Ceballos-Baumann, Transmodal sensorimotor networks during action observation in professional pianists. Journal of Cognitive Neuroscience, 17, 282-293, 2006.
[19] M.A. Umiltà, E. Kohler, V. Gallese, L. Fogassi, L. Fadiga, C. Keysers & G. Rizzolatti, "I know what you are doing": a neurophysiological study. Neuron, 32, 91-101, 2001.
[20] E. Kohler, C. Keysers, M.A. Umiltà, L. Fogassi, V. Gallese & G. Rizzolatti, Hearing sounds, understanding actions: Action representation in mirror neurons. Science, 297, 846-848, 2002.
[21] L. Fogassi, P.F. Ferrari, B. Gesierich, S. Rozzi, F. Chersi & G. Rizzolatti, Parietal lobe: From action organization to intention understanding. Science, 302, 662-667, 2005.
[22] M. Iacoboni, I. Molnar-Szakacs, V. Gallese, G. Buccino, J. Mazziotta & G. Rizzolatti, Grasping the intentions of others with one's own mirror neuron system. PLoS Biology, 3, 529-535, 2005.
[23] L. Cattaneo, M. Fabbri-Destro, S. Boria, C. Pieraccini, A. Monti, G. Cossu & G. Rizzolatti, Impairment of actions chains in autism and its possible role in intention understanding. Proceedings of the National Academy of Sciences, in press.
[24] J.H. Williams, A. Whiten, T. Suddendorf & D.I. Perrett, Imitation, mirror neurons and autism. Neuroscience & Biobehavioral Reviews, 25 (4), 287-295, 2001.
[25] N. Nishitani, S. Avikainen & R. Hari, Abnormal imitation-related cortical activation sequences in Asperger's syndrome. Annals of Neurology, 55 (4), 558-562, 2004.



[26] L.M. Oberman, E.M. Hubbard, J.P. McCleery, E. Altschuler, V.S. Ramachandran & J.A. Pineda, EEG evidence for mirror neuron dysfunction in autism spectrum disorders. Cognitive Brain Research, 24, 190-198, 2005.
[27] H. Théoret, E. Halligan, M. Kobayashi, F. Fregni, H. Tager-Flusberg & A. Pascual-Leone, Impaired motor facilitation during action observation in individuals with autism spectrum disorder. Current Biology, 15, 84-85, 2005.
[28] V. Gallese, M. Rochat, G. Cossu & C. Sinigaglia, Motor cognition and its role in the phylogeny and ontogeny of intentional understanding. Developmental Psychology, in press.
[29] V. Gallese, Before and below Theory of mind: Embodied simulation and the neural correlates of social cognition. Proceedings of the Royal Society B Biological Science, 362, 659-669, 2007.
[30] K.H. Onishi & R. Baillargeon, Do 15-month-olds understand false beliefs? Science, 308, 255-258, 2005.
[31] M. Tomasello & M. Barton, Learning words in nonostensive contexts. Developmental Psychology, 30, 639-650, 1994.
[32] A.N. Meltzoff, Understanding the intention of others: Re-enactment of intended acts by 18-month-old children. Developmental Psychology, 31, 838-850, 1995.
[33] M. Carpenter, K. Nagell & M. Tomasello, Social cognition, joint attention, and communicative competence from 9 to 15 months of age. Monographs of the Society for Research in Child Development, 63, 1-176, 1998.
[34] A.L. Woodward, Infants selectively encode the goal object of an actor's reach. Cognition, 69, 1-34, 1998.
[35] A.L. Woodward, Infants' ability to distinguish between purposeful and non-purposeful behaviors. Infant Behavior & Development, 22 (2), 145-160, 1999.
[36] J. Astington, The paradox of intention: Assessing children's metarepresentational understanding. In B.F. Malle, L.J. Moses & D.A. Baldwin (Eds.), Intentions and intentionality. Foundations of social cognition (pp. 85-104). Cambridge, MA: MIT Press, 2001.
[37] B.F. Malle & J. Knobe, The distinction between desire and intention: A folk-conceptual analysis. In B.F. Malle, L.J. Moses & D.A. Baldwin (Eds.), Intentions and intentionality. Foundations of social cognition (pp. 45-67). Cambridge, MA: MIT Press, 2001.
[38] R. Saxe, S. Carey & N. Kanwisher, Understanding other minds: Linking developmental psychology and functional neuroimaging. Annual Review of Psychology, 55, 87-124, 2004.
[39] D. Povinelli, On the possibility of detecting intentions prior to understanding them. In B.F. Malle, L.J. Moses & D.A. Baldwin (Eds.), Intentions and intentionality. Foundations of social cognition (pp. 225-248). Cambridge, MA: MIT Press, 2001.
[40] J. Baird & D.A. Baldwin, Making sense of human behavior: Action parsing and intentional inference. In B.F. Malle, L.J. Moses & D.A. Baldwin (Eds.), Intentions and intentionality. Foundations of social cognition (pp. 193-206). Cambridge, MA: MIT Press, 2001.
[41] G. Gergely & G. Csibra, Teleological reasoning in infancy: The naïve theory of rational action. Trends in Cognitive Sciences, 7 (7), 287-291, 2003.
[42] G. Gergely, Z. Nádasdy, G. Csibra & S. Bíró, Taking the intentional stance at 12 months of age. Cognition, 56, 165-193, 1995.
[43] M. Rochat, E. Serra, L. Fadiga & V. Gallese, The evolution of social cognition: Goal familiarity shapes monkeys' action understanding. Current Biology, in press.
[44] I. Király, B. Jovanovic, W. Prinz, G. Aschersleben & G. Gergely, The early origins of goal attribution in infancy. Consciousness and Cognition, 12, 752-769, 2003.
[45] J.A. Sommerville, A.L. Woodward & A. Needham, Action experience alters 3-month-old infants' perception of others' actions. Cognition, 96 (1), 1-11, 2005.
[46] T. Falck-Ytter, G. Gredebäck & C. von Hofsten, Infants predict other people's action goals. Nature Neuroscience, 9 (7), 878-879, 2006.
[47] A.L. Woodward & J.A. Sommerville, Twelve-month-old infants interpret actions in context. Psychological Science, 11, 73-77, 2000.
[48] J.A. Sommerville & A.L. Woodward, Pulling out the intentional structure of action: the relation between action processing and action production in infancy. Cognition, 95 (1), 1-30, 2005.

Enacting Intersubjectivity, F. Morganti et al. (Eds.), IOS Press, 2008. © 2008 The authors. All rights reserved.


Making Sense in Participation: An Enactive Approach to Social Cognition

Abstract. Research on social cognition needs to overcome a disciplinary disintegration. On the one hand, in cognitive science and philosophy of mind, even in recent embodied approaches, the explanatory weight is still overly on individual capacities. In social science on the other hand, the investigation of the interaction process and interactional behaviour is not often brought to bear on individual aspects of social cognition. Not bringing these approaches together has unfairly limited the range of possible explanations of social understanding to the postulation of complicated internal mechanisms (contingency detection modules for instance). Starting from the question "What is a social interaction?" we propose a fresh look at the problem aimed at integrating individual cognition and the interaction process in order to arrive at more parsimonious explanations of social understanding. We show how an enactive framework can provide a way to do this, starting from the notions of autonomy, sense-making and coordination. We propose that not only each individual in a social encounter but also the interaction process itself has autonomy. Examples illustrate that these autonomies evolve throughout an encounter, and that collective as well as individual mechanisms are at play in all social interactions. We also introduce the notion of participatory sense-making in order to connect meaning-generation with coordination. This notion describes a spectrum of degrees of participation from the modulation of individual sense-making by coordination patterns, over orientation, to joint sense-making. Finally, we discuss implications for empirical research on social interaction, especially for studies of social contingency.

Contents
3.1 Introduction
3.2 Enaction
3.3 What is a social interaction?
3.4 Making sense in participation
3.5 Implications
3.6 Conclusion
3.7 References


H. De Jaegher and E. Di Paolo / Making Sense in Participation

3.1 Introduction
A strange situation dominates contemporary approaches to social cognition. Whilst anthropologists and other social scientists, traditionally the investigators of social interaction processes, are not often interested in relating their findings to questions about individual cognition, psychologists and cognitive scientists seem generally not aware of, or take for granted, the importance of the interaction process for social cognition, and focus instead on individual capacities. On one side, the focus has been too exclusively on the interaction process, whereas on the other, the individual has been overemphasised. This situation needs to be rebalanced. Our understanding of social cognition is developing fast. Recently proposed embodied accounts of social cognition receive a lot of attention, and rightly so. They go beyond traditional cognitivist explanations and emphasise the role of the body in our understanding of another's intentions. However, a drawback of many of these approaches is that the emphasis is still too exclusively on the individual body. In the enthusiasm for embodiment in the social realm the fact is sometimes overlooked that social understanding is crucially an interactional process. To social scientists, this may seem a trivial insight, but in cognitive science the importance of the interaction process is only beginning to trickle through the still very individualist net (see also [1]). An account of social cognition that can recombine the individual and the interactional is timely. Most approaches, even those that are embodied and interactive, subscribe to an individualistic view of social cognition. We call this the "Rear Window" view. As a result, the question "What makes an interaction social?" falls into a blind spot for most of social cognition research today. In this chapter, we bring together ideas developed in our recent work in order to shed an enactive light on these questions [2, 3].
This contribution extends the enactive approach to show that it can provide a non-individualistic basis for social cognition. We develop a definition of social interaction and discuss implications for empirical studies.

3.2 Enaction

The concept of enaction is applied today with a variety of meanings, which nevertheless often overlap. It is therefore necessary to clarify as much as possible how the term is going to be used, particularly if we want to extend it into a novel area. Francisco Varela and colleagues [4-6] have provided the clearest articulation of enactive ideas. In their writings we find a view of cognition as an ongoing and situated activity shaped by life processes, self-organisation dynamics, and the experience of the animate body. In this perspective the properties of living and cognitive systems are part of a continuum. When referring to the enactive approach, we mean the perspective based on the mutually supporting concepts of autonomy, sense-making, embodiment, emergence, and experience [4, 5, 7, 8]. For our purposes here, we focus particularly on the first two: autonomy and sense-making.

According to enaction, living organisms are the paradigmatic cases of cognisers. Their organizational properties are the departure point of the approach. One such crucial property is the constitutive and interactive autonomy that living systems
enjoy by virtue of their self-generated identity as distinct entities in constant material flux. An autonomous system is defined as a system composed of several processes that actively generate and sustain an identity under precarious circumstances. In this context, to generate an identity is to possess the property of operational closure. This is the property that among the enabling conditions for any constituent process in the system there will always be one or more other processes in the system (i.e., there are no processes that are not conditioned by other processes in the network, which does not mean, of course, that conditions external to the system cannot be necessary as well for such processes to exist). Precarious conditions are those where isolated component processes would tend to run down or extinguish in the absence of the organization of the system as a network of processes, under otherwise equal physical circumstances. Similar constitutive and interactive properties have been proposed to emerge at different levels of identity-generation, including sensorimotor and neuro-dynamical forms of autonomy [4, 7, 9-11].

In such a view, a cogniser is not seen as responding to environmental stimuli or satisfying internal demands, both of which belong to the traditional dichotomy between internal and external determinants of behaviour. The enactive approach gives the autonomous agent its proper ontological status as an emergent biological self instead of subordinating it to a passive role of obedience. The organism is an embodied and experiencing centre of activity in the world. The notion of interactive autonomy implies that organisms cast a web of significance on their world [4, 12, 13]. An organism regulates its coupling with the environment because it aims at the continuity of the self-generated identity or identities that initiate this regulation.
This regulative process provides the organism with a perspective on the world, which is inseparable from the agent being a centre of activity in the world [4, 7, 10, 12, 13]. Being a cognitive system means that exchanges with the world are inherently significant for the cogniser, who engages in the creation and appreciation of meaning, or sense-making in short. Like few ideas in the past, this naturalised dimension of significance strikes at the heart of what it is to be cognitive. Sense-making implies an inherently active engagement; it is an activity. This is in contrast to the view that organisms receive information from their environment in a more or less passive manner and then process it into internal representations, which are invested with significant value only after further processing. Natural cognitive systems do not build pictures of their world (accurate or not). They engage in the generation of meaning in what matters to them and enact a world.

The notion of sense-making grounds a relational and affect-laden process of regulated exchanges between an organism and its environment in biological organization. Binding affect and cognition together at the origins of mental activity, metabolism creates a perspective of value on the world. This idea has been defended by Hans Jonas [14] and elaborated scientifically in terms of the theory of autopoiesis [4, 10, 12, 13, 15]. Sense-making describes a more general aspect of the relation of the cogniser with its world than those more specific engagements often described as action or perception, which are in fact later specializations of the activity of sense-making. Both action and perception are forms of sense-making. Examples illustrating this point have been discussed in the enaction literature. The clearest illustration is given by perceiving the softness of a sponge [16]. This quality is not in the
sponge but in its specific response to particular probings and squeezings by appropriate bodily movements. A particular encounter between an embodied questioning and probing agent and a reacting and responding segment of the world results in the perception of softness. Lawful co-variations in this dialogue between agent and world stabilise sense-making into the perception of an object (often not detached from its use). Movements are consequently at the centre of mental activity: a sense-making agent's movements (which may include utterances) are tools for her cognizing.

Based on these core ideas, what should be the central concerns of an enactive theory of social cognition? Previous approaches, including many embodied ones, have tended to shoehorn the whole realm of our social capacities into the problem of figuring out someone else's intentions from our uninvolved individual observations of them: a "Rear Window" approach to the social. This removed cognitive problem is indeed an aspect of social cognition. However, it has unduly dominated the field at the expense of more engaged forms of interaction. This chapter aims to move away from a view that centres almost exclusively on individual cognitive mechanisms. In its place, it sketches the outlines of an approach that defines the social in terms of the embodiment of interaction, shifting and emerging levels of autonomous identity, and joint sense-making and its experience.

3.3 What is a social interaction?

The individualistic perspective that prevails in social cognition research has already been challenged in other areas of cognitive science, such as in active perception work in AI and robotics [17-19]. The main lesson to be drawn from such work is that there is no empirical foundation for the view that a cognitive system bears the weight of its cognitive performance on its own in an environment that is only contextual (and often abstract).
On the contrary, engagement with environmental dynamical processes is more often than not the central part of the cognitive mechanisms that render performance possible. In other words, most of everyday cognition happens thanks to processes involving the dynamics of the agent/environment coupling. In social cognition research this situation should be most obvious. However, we find that, paradoxically, empirical and theoretical investigations are still informed largely by a view that places the key to appropriate performance exclusively within the agent's individual cognitive mechanisms. Social interactions are often seen as abstract, disembodied, and non-dynamic (e.g., in snapshot views in which time-oblivious discrete actions are followed by discrete responses). Accordingly, social interaction is seen as the contextual problem-space where a socially-capable individual solves the problems of social performance. The interaction process is hardly ever seen as part of the mechanisms that allow embodied social skills to unfold. Even work involving rich interaction dynamics (such as studies of social contingency or collaborative work [20]) is often interpreted in terms of the individual mechanisms that would be sufficient to give rise to the observed results. The conjecture is very rarely made that the observed phenomena may be generated by a combination of individual mechanisms (which may in themselves be insufficient) and the right interaction dynamics. We argue that the introduction of the interactive dimension will, rather
than complicate the picture, in many cases simplify the explanation of social cognition. In order to progress beyond what we see as a limit on the development of a social cognition research that is properly social, concepts must be introduced that will allow us to uncover the complex structure of the social interaction process. Interactions are processes extended in time with a rich structure that is only apparent at the relational level of collective dynamics. This organization may be grasped using the notion of coordination. Once we understand how coordination arises, is sustained, changes and breaks down during social encounters, we will be in a position to make a connection between the temporal aspects of interaction and their consequences for joint and individual sense-making.

Several physical and biological systems exhibit coordination behaviour, even when their coupling (the amount of influence that a system's variables have on another's parameters) is weak. There are many paradigmatic cases of coordination in biology. For example, individual flashing behaviour in a species of firefly in Southeast Asia is synchronised at the group level through the visual influence of the collective flashing pattern on the individuals [21]. Countless systems coordinate when coupled collectively and the phenomenon has been heavily studied in physics, mathematical biology and dynamical approaches to cognition [22-25]. In social science, coordination between interactors has been extensively researched [26-29]. However, here we will not review this literature, but present a general and systemic analysis of the concept of coordination in order to understand how it impacts on social cognition. An important and widespread feature of coordination (understood as the non-accidental correlation between two systems beyond what is expected of them) is its typical reliance on rather simple mechanisms of coupling.
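The point that weak and simple coupling can be enough for coordination can be illustrated with a minimal sketch. It is not taken from any of the cited studies: it assumes two phase oscillators with sinusoidal mutual coupling (a Kuramoto-style toy), and the function names, frequencies and coupling strength are hypothetical choices made only for illustration.

```python
import math


def simulate_pair(omega_a, omega_b, coupling, steps=20000, dt=0.01):
    """Euler-integrate two mutually coupled phase oscillators.

    Each oscillator advances at its own natural frequency and is nudged
    towards the other's phase by a sinusoidal coupling term (an assumed,
    illustrative form of coupling). Returns the history of the phase
    difference theta_b - theta_a.
    """
    theta_a, theta_b = 0.0, 2.0  # arbitrary, deliberately mismatched start
    diffs = []
    for _ in range(steps):
        d_a = omega_a + coupling * math.sin(theta_b - theta_a)
        d_b = omega_b + coupling * math.sin(theta_a - theta_b)
        theta_a += d_a * dt
        theta_b += d_b * dt
        diffs.append(theta_b - theta_a)
    return diffs


def late_drift(diffs, tail=5000):
    """How much the phase difference still changes over the last `tail` steps."""
    return abs(diffs[-1] - diffs[-tail])


if __name__ == "__main__":
    coupled = simulate_pair(1.0, 1.1, coupling=0.1)
    uncoupled = simulate_pair(1.0, 1.1, coupling=0.0)
    print("late drift, weak coupling:", late_drift(coupled))
    print("late drift, no coupling:  ", late_drift(uncoupled))
```

In this toy the phase difference obeys d(phi)/dt = (omega_b - omega_a) - 2 * coupling * sin(phi), so the pair settles into a fixed phase relation whenever the frequency mismatch is smaller than twice the coupling strength. Without coupling the two oscillators simply drift apart.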
Coordination does not generally require any sophisticated skill even when cognitive systems are involved. It is, on the contrary, often difficult to avoid. This is shown in a study by Schmidt and O'Brien [30], who asked pairs of subjects to avoid synchronous oscillations while swinging a pendulum with their arms. They found that oscillations were uncoordinated if the subjects were not looking at each other, but presented a strong tendency to synchronize otherwise. We may conclude from such studies that there is no general need to postulate dedicated individual mechanisms to sustain coordination; it is rather a phenomenon that is likely to appear under a range of conditions if the coupled systems possess broadly similar properties. Coordination is also found to occur at multiple timescales [24] and to adopt several forms, i.e., not just synchronisation but in general many cases of appropriately patterned behaviour, such as mirroring, anticipation, imitation, etc.

When it appears in coupled systems, coordination does not have to be absolute or permanent. This is significant when we consider fluid social interactions. Coordination may come in degrees. Kelso contrasts the ideas of absolute and relative coordination to illustrate this point [22]. When a child and an adult are walking together, their natural tendency to walk at a different speed is somehow overcome and they often remain together overall. This can only happen if one or the other adjusts either the frequency of their step or the length of their stride, without necessarily walking in synchrony. Such coordination is "far more variable, plastic and fluid than pure phase locking" ([22], p. 98). In perfect synchrony (pure phase-locking) coordination is absolute (e.g., pairs of duetting tropical birds singing in antiphonal synchrony [31]). Transitions happen from one perfectly
coordinated state to another, or to non-coordination. By contrast, relative coordination presents a broader range of options as it is not defined by strictly coherent states but global trends (such as walking together). The concept of coordination will help define what a social interaction is. We may think that two cognitive systems engaged in a coupling are already interacting socially. However, not all couplings between agents meet our intuitions of being social. For instance, heat transfer between two people in a crowd does not seem to exemplify the idea of a social encounter. Is bumping into someone on a busy road a social event? The structures of coordination that may arise during couplings enable a refinement of these intuitions. We propose that a distinct feature of social interaction is its (temporary) tendency to sustain an encounter through patterns of coordination. A social interaction has self-maintaining tendencies. In contrast to other forms of coupling, coordination patterns can affect the individuals involved so that they would tend to sustain the social encounter. Several events that arise during an interaction (for instance, phrases, movements, postures and gestures aimed at establishing or repairing turns in a conversation) have the effect of facilitating its continuation. And, crucially, these sustained dynamics in turn constrain the range of possible coordination patterns that are likely to happen due to the fact that interactors are susceptible to change plastically as a consequence of the interaction history. If an encounter installs this reciprocal directed link (from coordination onto the unfolding of the encounter and from the dynamics of the encounter onto the likelihood to coordinate) the encounter becomes a social interaction, forming an emerging level that is sustained and identifiable as long as the processes involved (or some external factor) do not terminate it. This organization corresponds to the autonomy of the interaction. 
When there are coordination structures that help sustain the social encounter, and the encounter itself promotes coordination, this double link between encounter and coordination makes the collective pattern into an autonomous system according to our definition. This permits the identification of a specific interaction on the basis of the organization of its collective dynamics. The autonomy of social interaction is typically a fleeting one. It is a property to be found even when social encounters last just a few minutes. During that period an encounter may exhibit the organization just described in terms of the reciprocal influence between coordination and global self-maintenance. Coordinated patterns between the agents sustain the interaction, and the interaction in turn affects the individual behaviour of the agents and invests them with the role of interactors. An autonomous entity, the interaction process, emerges as social encounters acquire this operationally closed organization. For certain currents in social science this is not new; as Erving Goffman says, "a conversation has a life of its own and makes demands on its own behalf. It is a little social system with its own boundary-maintaining tendency" ([32], p. 113). An interaction constitutes a level of analysis not reducible, in general, to individual behaviours. Individuals co-emerge as interactors contemporaneously with the interaction.

Considering how individuals are affected by the encounter leads to an additional requirement for defining the interaction as social: individuals as interactors must not lose their own autonomy in the process (even though the encounter may enlarge or diminish the scope of individual autonomy). This is a constitutive constraint necessary for defining the social. In its absence, if the autonomy of an interactor were destroyed, the whole
process would reduce to the cognitive engagement of the other agent with his world. The other would become a tool, an object, or a problem for his individual cognition, making the engagement indistinct from non-social ones. We can now see that, e.g., the event of bumping into someone on a busy road is by itself not yet a social interaction, because it does not necessarily establish a co-regulated coupling. It may of course initiate a subsequent interaction. We propose the following definition of social interaction:

Social interaction is the regulated coupling between at least two autonomous agents, where the regulation is aimed at aspects of the coupling itself so that it constitutes an emergent autonomous organization in the domain of relational dynamics, without destroying in the process the autonomy of the agents involved (though the latter's scope can be augmented or reduced).

To illustrate this, it is best to think of a situation where the individual interactors are attempting to stop interacting but where the interaction self-sustains in spite of this. Such a situation sometimes occurs when two people walk along a narrow corridor in opposite directions. In order to get past each other, they must adopt complementary positions by shifting to the left or to the right. Sometimes the individuals happen to move into mirroring positions at the same time, creating a symmetrical coordinated relation. Due to the spatial constraints of the situation, such symmetry favours an ensuing shift into another mirroring position (there are simply not many other moves available). In this way, coordinated shifts in position sustain a property of the relational dynamics (that of symmetry) that all but compels the interactors to keep facing one another, thus remaining in interaction (despite, or rather thanks to, their efforts to escape from the situation). In addition, the interaction promotes individual actions that tend to maintain the symmetrical relation.
Coordinated sideways movements conserve symmetry and symmetry promotes coordinated sideways movements. While it lasts, the interaction shows the organization described above in terms of the mutual influence between the individual actions and the relational dynamics. It becomes clear that interaction is not reducible to individual actions or intentions but installs a relational domain with its own properties that constrains and modulates individual behaviour.

Our definition avoids the error of considering only the social aspects of the interaction and ignoring the individual elements in it. This is expressed in the condition that the autonomy of the interactors must be conserved throughout the encounter so that it may be considered a social interaction. As a consequence, the enactive perspective makes explicit the ongoing tensions between individual and social processes. This is in stark contrast to the methodological individualism prevalent in today's cognitive science [1].

Conceiving the social as a properly autonomous domain has an important implication for fashionable theories of social cognition. Recent embodied proposals have made heavy use of neurological mechanisms, such as mirror neurons, for explaining social understanding. These explanations are agnostic about the role of the interaction as a structured and structuring process. They tend, in contrast, to concentrate on atomic correlations, for instance, the fact that a subject's mirror neurons fire both on performing a goal-directed action and while perceiving someone else doing it [33]. This style of explanation (which may have
its own problems; see [34, 35]) remains entrenched in the mindset of an individual attempting to figure out another. The question of how such a figuring out participates in and is itself shaped by coordination dynamics, in other words, the question of what is properly social about the whole situation, remains untouched. To transfer a correlation in social activity (by which an encounter manifests the presence of mutual understanding) into a neural correlation does little but redescribe the problem. A theory based on mirror neurons could provide a snapshot of the mechanisms involved in the recognition of intentional actions. Whether such a recognition happens to be part of a coordinated or uncoordinated period in the unfolding of an interaction is not a question that can be addressed in these terms. It is by definition a relational question that only makes sense at the level of the collective dynamics, and it is at this level that social understanding is for the most part manifested.

An advantage of balancing the autonomies of the interaction and the interactors is that it allows us to understand how coordination at different levels shapes the interaction throughout its history. We can expect interactions that have been sustained for some time to have gone through repeated loss and recovery of coordination. Because of the durability of such interactions, interactors must have found themselves affected by such events in ways that allowed them to remain in interaction and occasionally to find better ways to sustain the process. There is an experiential counterpart to this: we perceive some interactions as getting easier and more fluid over time, with an increased feeling of connectedness. Recovering from a breakdown in coordination takes the role of a learning event whereby new contextual significance is acquired.
There is an analogy here with the growth of an adaptive system, and this analogy provides a context for the question we now turn to: the transformation of sense-making in social interactions.

3.4 Making sense in participation

At the level of human communication, Merleau-Ponty [36] proposes a view that encapsulates what we propose for the more general case of sense-making in interaction. Arguing against a perspective on language as the sharing of representations, he emphasises the sense-making activity that underlies speech production. Speech, he says, is not set in motion by an explicit thought, but by a sense-giving intention which is a certain lack asking to be made good (p. 213). Speech, in other words, is not alien to the general logic of sense-making. Likewise, as an interlocutor, my taking up of this intention is not a process of thinking on my part, but a synchronizing change of my own existence (ibid.). That is, I partake of the sense-making of the other as it becomes, at least partially and through a change in myself, my own sense-making activity. But how is this possible? In this section, we focus on this question by examining what the picture of the social interaction process presented thus far implies for sense-making.

People have looked at the connection between coordination and meaning, trying to map affect onto degrees of coordination [26]. Such a mapping may work in certain cases, but will not capture all the complexities of social cognition. Instead, there is a spectrum of relation between coordination in interaction and individual sense-making.
An individual cogniser is engaged in ongoing sense-making. This is an intentional activity that can become expressive in social situations through embodied action. Moreover, individual sense-making activities can be directly shaped by interactive coordination. In fact, they may themselves acquire a coherence through interaction. The proposal is the following: if the regulation that sustains a social interaction happens through coordination patterns and if those patterns affect the movements (including utterances) that are the tools of individual sense-making, then social agents can coordinate their sense-making during interaction. We call this process participatory sense-making: the interactive coordination of intentional activity affecting individual sense-making, whereby new domains of sense-making may appear that were unavailable to each solitary individual.

A spectrum of participation may be used to describe the different manifestations of the coherence that sense-making activities may acquire through coordination. At one extreme, sense-making remains largely an individual activity that is at most modulated by the interaction. Participation is minimal in these cases. At the other end, defined by the highest levels of participation, we find the process of joint sense-making, where intentional engagements become fully shared.

To illustrate how patterns of coordination and breakdown can enter into the shared meaning of an interaction, we may look at situations where the normal flow of an interaction is interrupted. Imagine the following dialogue taking place over a video conferencing line with an inherent delay (the implications of these kinds of glitches in communication technology have been studied by, e.g., [37], from which we have adapted this example).

A: That was a pretty good presentation.
(Pause)
A: If you're into that kind of work.
B: Well, I suppose someone has to do it.
The pause, indicating to A a lack of a response from B when A was expecting it, prompted A to alter her initial praise (by qualifying it in anticipation of a disagreement). B responds to this situation by expressing a similarly moderate view, even if at the start he may well have shared A's initial enthusiasm. This example illustrates that individual sense-making can become aligned in a direction not initially intended by the interactors and that this shift in meaning can be introduced by the properties of the interaction dynamics. It also shows that temporal coordination plays a crucial role in producing this adjustment of individual sense-making. Generally, sense-making in interaction fluctuates with changes in interactional coordination patterns over time.

Next on the scale of participation we have orientation: coordination of sense-making orients one of the interactors towards a domain of significance that was already part of the other's sense-making. For example, an interactor (A) calls the attention of another (B) to what he cannot yet perceive. Say B is scanning the room to find something. The embodied expressiveness of this activity affects A's sense-making and she can now purposefully modulate B's sense-making by grabbing his attention and pointing to the lost object. Orientation can also be achieved through an extended temporal regulation of coordination. Stern describes a relevant example of how affect is regulated between mother and infant. An infant
may be aroused by his mother repeating a phrase such as "I'm gonna getcha" while extending the intervals between repetitions ([38], p. 114). According to Stern, this increases the discrepancy from the expected for the infant and he becomes more and more excited (ibid.). The change of affective state is a case of orientation according to our view, which happens through the infant's coordinated engagement with the mother's tempo. This orientation happens thanks to the mother's attunement to the infant's responses as well as the infant's active role in sustaining the interaction dynamics. The mechanisms involved need not be more complex than the cases of relative coordination described earlier. As in the case of the adult and child walking together, mother and infant seem to undergo a process of phase attraction in their temporal behaviours and expectations. Such a hypothesis (which would need empirical verification) does not require the postulation of specialized individual mechanisms. The relational dynamics of the interaction, in this view, would in themselves be sufficient. Mother and infant would not need more than a capacity to enter into a temporal interaction with an external event or object. The mother intends to regulate the infant's sense-making (affect) and this makes it a case of orientation.

Another example of mother-infant interaction can illustrate joint sense-making (approaching the far end of the scale of participation). Fogel describes a filmed session between a one-year-old and his mother ([39], pp. 20-21). He had studied this pair at weekly intervals since the baby's first month of age. Infants generally take objects from their caregiver earlier than they give things themselves, and here Fogel describes the first recorded event of giving by the infant, conveying how it is a jointly constructed event (what he calls a co-regulated activity, p. 21). He describes it as follows: Andrew's action has two separate motor components.
First, his arm extends (frames 1-6) and then he releases the object (frames 7-10). . . . Once Andrew's arm is extended his hand remains relatively stationary and gradually opens as mother's hand moves underneath his hand. The fork gently leaves Andrew's hand as it is pulled only by the slightest contact with the mother's moving palm (ibid.).

In contrast to the infant simply dropping the fork in the mother's open hand, or the mother taking it from him, the giving is not an individual act. It needs the taking in order to be completed. Before reaching Fogel's own interpretation, if we assume for a moment that the infant is the initiator of the act, we realise that he must create an opening by his action that may only be completed by the action of the mother. The giving involves more than orientation of the mother's sense-making; it involves a request for her not only to orient towards the new situation, but also to create a sense-making activity that will bring the act to completion. In other words: to take up the invitation for an intention to be shared. This invitation may go unperceived and the act frustrated. But this is not the same as the situation in which the invitation is perceived and declined. The two situations are different from the perspective of the mother and this difference confirms that an invitation to participate is experienced as a request to create an appropriate closure of a sense-making activity that was not originally hers. To accept this request is to bring the other half of the act into a successful joint activity. When we remove the simplifying assumption that the infant intentionally originated the act, we open up the possibility for even richer degrees of
participation. The act may then indeed result from a co-regulation that emanates from previous aspects of the interaction, as Fogel proposes. A certain movement extending the fork in the direction of the mother, without yet intending to give it, may now be opportunistically invested with a novel meaning through joint sense-making. Latent intentions become crystallised through the joint activity so that not only the completion of the act is achieved together, but also its initiation.

Clearly, more sophisticated examples of joint sense-making than this act of giving can be found, especially as we move into the realm of linguistic interactions. It is possible to think of examples such as the creation of private nuances in meaning between intimate friends, the elaboration of joint plans, teaching, and making music together, to name a few. Different cases may afford more complex forms of participation, but in all of them the meaning of an act will require the coordinated participation of the interactors to be realised. Moreover, it is likely that making sense in participation may at any point involve situations across the whole spectrum of participation sketched in this section.

3.5 Implications

The shift in emphasis towards the interaction process that we are proposing in this chapter will require more elaboration. However, it is possible to derive interesting implications from this perspective already, for instance for the development of social capabilities, including their impairments [40]. It also contributes to enriching the dialogue between science and phenomenology by providing theoretical insights that could ground, for example, the experience of alterity of an other. Some of these implications are discussed in [2]. In this chapter, we would like to focus briefly on some implications for the empirical study of social interaction, in particular mother-infant interaction.
Let us take as an example the question of how infants are affected by the contingency of interaction. Empirical evidence, such as Murray and Trevarthen's double TV monitor experiments and their successors [41-43], indicates that individuals rely on their partners to behave responsively in order to sustain their involvement in an interaction. For instance, two-month-old infants are able to sustain a fluid dyadic interaction with their mothers via a live double video link. However, when at some point they are shown recordings of their mothers that were generated previously in the interaction, they do not coordinate with the unresponsive recording (which maintains intact the mother's expressive movements). Instead, the infants become distressed and removed. This indicates that the infants' recognition of the ongoingness and contingency of the interaction plays a fundamental role in its unfolding. Early involvement in socially contingent interactions and its implied connectedness play a fundamental role in the infant's affective and experiential development [44]. An individual sensitivity to social contingency in two-month-olds is inferred from these results [43], suggesting that such a recognition is necessarily performed by the individual, again a "Rear Window" move. Candidate explanations for such a skill would require the postulation of, for instance, an innate contingency detection module [45]. Based on the view presented here, however, we may question this implication for the general case. Conceivably, the coordination structures that sustain the interaction could themselves be part of the mechanisms that affect the infant negatively when



contingency is removed. Then the postulation of contingency detection mechanisms becomes optional. The infants history of participatory sense-making is directly altered in the passage from the contingent to the non-contingent situation. Recent empirical findings and minimal social interaction models have demonstrated how the collective interaction dynamics can explain differences in individual action in cases with or without contingency. Experiments in minimalistic perceptual crossing have been carried out using a one-dimensional virtual space where two participants can encounter each other and other objects through the use of mouse movements and a tactile feedback device [46]. Their task is to locate each other in the presence of distracting objects that replicate their exact shapes and movements. The experiments demonstrate that they are successful at this task. However, the results indicate that this is not achieved through an individual appreciation of contingent interaction (in fact, individuals are unable to distinguish the movement of another subject from the non-contingent object that imitates those movements). Rather, participants find each other thanks to the fact that the interaction dynamics make them avoid the situations where confusion could arise. Models of this experiment confirm this interpretation and extend it to other tasks (analogous to the double TV-monitor experiments) [3]. In these extended models, the discrimination between contingent and non-contingent conditions is achieved through the inherent higher stability of the double feedback between interactors in the contingent condition. This double feedback is enough to keep the interactors engaged even in the presence of noise or disruptions. However, in the non-contingent condition (where the other end of the interaction is a recording), this feedback becomes one-sided and external perturbations are now sufficient to throw the engagement out of joint and make the agent fully disengaged. 
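The stability argument above can be illustrated with a deliberately small sketch. It is not the model reported in [3]; it is a toy pair of Kuramoto-style phase oscillators (cf. [25]), and every name and parameter value in it (`phase_drift`, `partner_responds`, the coupling strength `k`, the two natural frequencies) is an assumption made for illustration only. A constant mismatch in natural frequencies stands in for the perturbations mentioned in the text, and the "recording" is simplified to a partner that runs on at its own pace without responding. Under these assumptions, mutual feedback corrects the phase difference twice as strongly as one-sided feedback, so the live pair can stay locked under a mismatch that makes the agent drift away from the recording.

```python
import math


def phase_drift(partner_responds, k=1.0, w_agent=1.0, w_partner=2.5,
                dt=0.01, steps=5000):
    """Return how much the phase difference changes over the second half of a run.

    A value near zero means the two oscillators stay locked (engaged);
    a large value means the agent keeps slipping relative to its partner.
    All parameters are illustrative assumptions, not values from [3].
    """
    theta_agent, theta_partner = 0.0, 0.0
    half = steps // 2
    diff_at_half = 0.0
    for step in range(steps):
        if step == half:
            diff_at_half = theta_partner - theta_agent
        # The agent always adjusts its phase towards the partner.
        d_agent = w_agent + k * math.sin(theta_partner - theta_agent)
        # A live partner adjusts back; a "recording" simply runs on.
        feedback = k * math.sin(theta_agent - theta_partner) if partner_responds else 0.0
        d_partner = w_partner + feedback
        # Simple Euler integration step.
        theta_agent += d_agent * dt
        theta_partner += d_partner * dt
    return (theta_partner - theta_agent) - diff_at_half


live = phase_drift(partner_responds=True)       # double feedback
recorded = phase_drift(partner_responds=False)  # one-sided feedback
print(f"drift with live partner: {live:.4f}")
print(f"drift with recording:    {recorded:.4f}")
```

In this sketch the agent's own rule is identical in both conditions; only the presence of the partner's feedback differs. The difference in outcome is therefore a property of the coupled system rather than of any contingency detector inside the agent, which is the point the text makes about the extended models.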
Response to contingency depends on the live interaction, which needs both agents to regulate its stable continuation [3]. In these experiments and models, discrimination between contingent and non-contingent situations is achieved through the social process in the ongoingness of the interaction. The dynamics of interaction are not simply the data that an individual must evaluate; they are an integral part of the evaluation process itself. In general, there is no a priori reason to assume that explanatory possibilities for mother-infant interaction situations have to be either purely individual or strictly social. Presumably, appropriate explanations for socially interactive processes incorporate both elements and, thereby, lie somewhere along a spectrum defined by strictly individual evaluation of interactive information at one end and purely social modulation of individual dynamics at the other. What is called for is a methodology that makes it possible to map this spectrum by (1) determining the dynamical properties of coordination present in a given social interaction and (2) generating hypotheses regarding their contribution to the observed social behaviour. Such tools would also allow the exploration of the mutual shaping (as well as the tension) between individual and social dynamics (corresponding to the two autonomies we propose to be present in social interaction) as an intrinsic source of (de-)stabilisation of coordination. Interactive factors affecting coordination may be uncovered by their signature response to controlled perturbation methods. The successful unpacking of the contribution of the social and individual dynamics may be more easily achieved in situations when they are in conflict. The



narrow corridor situation may serve as a model for a range of social interactions where the individual intention to steer the interaction in a certain direction actually prevents the realisation of this aim because of the emerging coordination patterns at the social level. This motif may prove useful for exploring the relation between the two autonomous domains. These implications for empirical research not only test the validity of the enactive ideas we propose, but are themselves instrumental in the program of improving this account and framing not just new explanations but also new questions in the field of interaction studies.

3.6 Conclusion

We have described some implications of enactive ideas for social cognition. These ideas allow us to define the social domain in a way that is novel and, more importantly, operational. This is done in terms of the embodiment of interaction using the concept of coordination, in terms of the shifting and emerging levels of autonomous identity and in terms of joint sense-making and its experience. The framework presented in this chapter establishes what it means to take the role of the interaction process seriously in a way that remains close to the experience of interacting. By elaborating on the embodiment of the interaction and its autonomy as a process, we confirm that the interaction process really is a proper subject of study. Moreover, the framework balances the autonomies of interactors and of the social process, and allows further developments regarding their interplay. In this way, it contributes to crossing the gaps between social science and cognitive science by bringing dynamical and embodied approaches into dialogue with experience and interactional behaviour.

3.7 References
[1] M. Boden, Of islands and interactions. Journal of Consciousness Studies, 13(5), 53-63, 2006.
[2] H. De Jaegher & E.A. Di Paolo, Participatory Sense-Making: An enactive approach to social cognition. Phenomenology and the Cognitive Sciences, 6(4), 485-507, 2007.
[3] E. Di Paolo, M. Rohde & H. Iizuka, Sensitivity to social contingency or stability of interaction? Modelling the dynamics of perceptual crossing. New Ideas in Psychology, in press.
[4] E. Thompson, Mind in Life: Biology, Phenomenology, and the Sciences of Mind. Harvard University Press, 2007.
[5] F.J. Varela, E. Thompson & E. Rosch, The Embodied Mind: Cognitive Science and Human Experience. Cambridge, MA: MIT Press, 6th ed. 1991.
[6] S. Torrance, In search of the enactive: Introduction to special issue on Enactive Experience. Phenomenology and the Cognitive Sciences, 4(4), 357-368, 2005.
[7] E. Di Paolo, M. Rohde & H. De Jaegher, Horizons for the enactive mind: Values, social interaction, and play. In J. Stewart, O. Gapenne & E. Di Paolo (Eds.), Enaction: Towards a New Paradigm for Cognitive Science. Forthcoming, MIT Press: Cambridge, MA.
[8] E. Thompson, Sensorimotor subjectivity and the enactive approach to experience. Phenomenology and the Cognitive Sciences, 4(4), 407-427, 2005.
[9] A. Moreno & A. Etxeberria, Agency in natural and artificial systems. Artificial Life, 11, 161-176, 2005.
[10] F.J. Varela, Patterns of life: Intertwining identity and cognition. Brain and Cognition, 34, 72-87, 1997.
[11] F.J. Varela, Principles of Biological Autonomy. New York: Elsevier (North Holland), 1979.
[12] E.A. Di Paolo, Autopoiesis, adaptivity, teleology, agency. Phenomenology and the Cognitive Sciences, 4(4), 429-452, 2005.
[13] A. Weber & F.J. Varela, Life after Kant: Natural purposes and the autopoietic foundations of biological individuality. Phenomenology and the Cognitive Sciences, 1(2), 97-125, 2002.
[14] H. Jonas, The Phenomenon of Life: Toward a Philosophical Biology. Evanston, Illinois: Northwestern University Press, 1966.
[15] F.J. Varela, Organism: A meshwork of selfless selves. In A. Tauber (Ed.), Organism and the Origin of Self, 79-107. Kluwer: Dordrecht, 1991.
[16] E. Myin, An account of color without a subject? Behavioral and Brain Sciences, 26(1), 42-43, 2003.
[17] A. Clark, Being There: Putting Brain, Body and World Together Again. Cambridge, MA: MIT Press, 1997.
[18] R. Pfeifer & C. Scheier, Understanding Intelligence. Cambridge, MA: MIT Press, 1999.
[19] R.D. Beer, The dynamics of active categorical perception in an evolved model agent. Adaptive Behavior, 11(4), 209-243, 2003.
[20] N. Sebanz, H. Bekkering & G. Knoblich, Joint action: bodies and minds moving together. Trends in Cognitive Sciences, 10(2), 70-76, 2006.
[21] J. Buck & E. Buck, Synchronous fireflies. Scientific American, May, 74-85, 1976.
[22] J.A.S. Kelso, Dynamic Patterns: The Self-Organization of Brain and Behaviour. Cambridge, MA: MIT Press, 1995.
[23] R.F. Port & T. van Gelder (Eds.), Mind as Motion: Explorations in the Dynamics of Cognition. MIT Press: Cambridge, MA & London, England. 590, 1995.
[24] A.T. Winfree, The Geometry of Biological Time. 2nd ed. Interdisciplinary Applied Mathematics. London: Springer, 2001.
[25] Y. Kuramoto, Chemical Oscillations, Waves and Turbulence. Berlin: Springer, 1984.
[26] A. Kendon, Conducting Interaction: Patterns of Behavior in Focused Encounters. In J.J. Gumperz (Ed.), Studies in Interactional Sociolinguistics. Cambridge: Cambridge University Press, 1990.
[27] E. Goffman, The interaction order. American Sociological Review, 48, 1-17, 1983.
[28] H. Sacks, E.A. Schegloff & G. Jefferson, A simplest systematics for the organization of turn-taking for conversation. Language, 50(4), 696-735, 1974.
[29] A. Schutz, On Phenomenology and Social Relations, Ed. H.R. Wagner. Chicago: University of Chicago Press, 1970.
[30] R.C. Schmidt & B. O'Brien, Evaluating the dynamics of unintended interpersonal coordination. Ecological Psychology, 9, 189-206, 1997.
[31] W.H. Thorpe, Duetting and Antiphonal Song in Birds: Its Extent and Significance. Leiden: E. J. Brill, 1972.
[32] E. Goffman, Interaction Ritual: Essays on Face-to-Face Behavior. London: Allen Lane, 1972.
[33] V. Gallese, L. Fadiga, L. Fogassi & G. Rizzolatti, Action recognition in the premotor cortex. Brain, 119, 593-609, 1996.
[34] F. de Vignemont & T. Singer, The empathic brain: how, when and why? Trends in Cognitive Sciences, 10(10), 435-441, 2006.
[35] N. Georgieff & M. Jeannerod, Beyond consciousness of external events: A 'who' system for consciousness of action and self-consciousness. Consciousness and Cognition, 7, 465-477, 1998.
[36] M. Merleau-Ponty, Phenomenology of Perception. London: Routledge, 2002/1945.
[37] K. Ruhleder & B. Jordan, Co-constructing non-mutual realities: Delay-generated trouble in distributed interaction. Journal of Computer Supported Cooperative Work, 10(1), 113-138, 2001.
[38] D. Stern, The First Relationship: Infant and Mother. London: Harvard University Press, 2nd ed. 2002/1977.
[39] A. Fogel, Developing through Relationships: Origins of Communication, Self and Culture. London: Harvester Wheatsheaf, 1993.
[40] H. De Jaegher, Social Interaction Rhythm and Participatory Sense-Making: An embodied, interactional approach to social understanding, with implications for autism. Unpublished D.Phil. Thesis. University of Sussex, Brighton, UK, 2006.
[41] L. Murray & C. Trevarthen, Emotional regulation of interactions between 2-month-olds and their mothers. In T.M. Field & N.A. Fox (Eds.), Social Perception in Infants (pp. 177-197). Ablex: Norwood, NJ, 1985.
[42] C. Trevarthen, The self born in intersubjectivity. In U. Neisser (Ed.), The Perceived Self: Ecological and Interpersonal Sources of Self-knowledge (pp. 121-173). Cambridge University Press: Cambridge, 1993.
[43] J. Nadel, I. Carchon, C. Kervella, D. Marcelli & D. Réserbat-Plantey, Expectancies for social contingency in 2-month-olds. Developmental Science, 2(2), 164-173, 1999.
[44] E. Tronick, Why is connection with others so critical? The formation of dyadic states of consciousness and the expansion of individuals' states of consciousness: coherence governed selection and the co-creation of meaning out of messy meaning making. In J. Nadel & D. Muir (Eds.), Emotional Development (pp. 293-316). Oxford University Press: Oxford, 2005.
[45] G. Gergely & J.L. Watson, The social biofeedback theory of parental affect-mirroring: The development of emotional self-awareness and self-control in infancy. International Journal of Psycho-Analysis, 77, 1181-1212, 1996.
[46] M. Auvray, C. Lenay & J. Stewart, The recognition of mutual perception in a minimalist environment. New Ideas in Psychology, 2007, in press.


Enacting Intersubjectivity, F. Morganti et al. (Eds.), IOS Press, 2008. © 2008 The authors. All rights reserved.


Interacting Socially through Embodied Action

Abstract. This chapter contrasts traditional, disembodied information-processing approaches to intersubjectivity in socio-cognitive research with more recent, embodied approaches. Based on an analysis of the shortcomings of the former, it focuses on the latter, but also clarifies different notions of embodiment and its role in cognition and social interaction. Integrating a broad range of theoretical perspectives and empirical evidence, mainly from social psychology, social neuroscience, embodied linguistics and gesture studies, four fundamental functions of the body in social interaction are identified: (1) the body as a social resonance mechanism, (2) the body as a means and end in communication and social interaction, (3) embodied action and gesture as a helping hand in shaping, expressing and sharing thoughts, and (4) the body as a representational device. The theoretical discussions are illustrated with an example from a case study of in situ embodied social interaction, with a focus on the importance of crossmodal interaction in the process of scaffolding. It is concluded that the body is of crucial importance in understanding social interaction and cognition in general, and in particular the relational nature of mind and intersubjectivity.

Contents
4.1 Introduction
4.2 Disembodied approaches to intersubjectivity
4.3 On the embodied nature of social interaction
4.4 Illustration and discussion
4.5 Conclusions
4.6 References

4.1 Introduction

The ability to engage in social interaction is a crucial building block of human culture, which is the foundation for the complexity of social life and cognition. In this chapter, we aim to clarify the role and relevance of the body in social interaction, from the perspective of embodied cognitive science. Broadly speaking, the traditional information-processing approach to intersubjectivity in socio-cognitive research assumes that agents relate to each other in much the same way as they relate to other parts of the external world, i.e. by having more or less explicit internal representations of each other [e.g. 1-3]. This is a centralized view of cognition that considers (social) cognition to take


J. Lindblom and T. Ziemke / Interacting Socially Through Embodied Action

place inside the skull, with the body only serving as some kind of input and output device, i.e. a physical interface between internal programs (cognitive processes) and the external world. We contrast such information-processing approaches to intersubjectivity with embodied and enactive approaches as follows. Information-processing approaches to intersubjectivity are based on the assumption that the body plays only a trivial role in social interaction and cognition, as a peripheral appendage to the real intellectual mind. Therefore, within this theoretical framework, bodily aspects are frequently addressed in terms of nonverbal communication, nonverbal behavior, or body language. Accordingly, embodied actions such as body posture, gaze and gesture are still commonly considered to be nothing but the visible outcomes of mental intentions and contents which are transmitted from one mind to another. Gallagher [4] argues that these standard information-processing models of mind as representational, functional and/or computational, despite their complexity, are oversimplified and altogether neglect the many effects of embodiment. The embodied approach, on the other hand, emphasizes the way cognition is shaped by the body and its sensorimotor interaction with the surrounding world [e.g. 4-9]. Hence, this view holds that central to intersubjectivity is first and foremost the experience of being embodied in a social, cultural and material sphere [10]. It might be worth noting that this does not necessarily imply denying mental concepts (e.g. beliefs or intentions) altogether, but rather questioning the central underlying role they are given in information-processing approaches. Instead, they may be emergent from and grounded in embodied interactions rather than an underlying requirement for cognitive processes [for more details, see 10].
It should also be noted though that there are different views within embodied cognitive science regarding in what sense, or to what extent, cognition is to be considered as embodied [9]. Clark [5], for instance, distinguishes between the positions of simple embodiment and radical embodiment. According to the former, the traditional foundation of computationalist/functionalist cognitive science can be preserved more or less intact, and embodiment is merely considered a constraint of the inner organization and processing. The radical embodiment position, on the other hand, goes much further and treats the facts of embodiment as a fundamental shift in the explanation of cognition that is profoundly altering the subject matter and theoretical framework of cognitive science [5, p. 348]. This chapter is more in line with the latter view. In a nutshell, the comparison made in this chapter is between these two positions, i.e., information-processing vs. embodied approaches to intersubjectivity. After some brief presentation and critical discussion of the first approach in the next section, this chapter particularly focuses on the second approach, discussing recent work in cognitive science and related disciplines which indicates that the body is of crucial importance in social interaction, cognition and intersubjectivity.

4.2 Disembodied approaches to intersubjectivity

Without reviewing the huge literature on intersubjectivity, one might say that intersubjectivity refers to the manifestation of shared meanings constructed by people in their interactions with each other. Hence, intersubjectivity results in a



basic discrimination between the self and others as well as the ability to compare and project one's own private experiences or cognitive states with those of another person. Broadly speaking, it has been suggested that intersubjectivity is the cradle of social interaction and cognition [11, 12]. It is probably safe to say that within socio-cognitive research, the information-processing view of social interaction is still the most common one, and certainly still the dominant one. A good example of this view is found in The Encyclopedia of Cognitive Science, which characterizes social cognition as follows: "Social-cognitive research, with its adherence to the information-transmission metaphor, is fundamental to the study of process; that is, social cognition is the part of social psychology that deals with the psychological mechanisms that mediate the individual's response to the social environment. As such, the nature of mental representation and the dynamics of information-processing are central topics of social-cognitive inquiry" [3, p. 66]. In a similar vein, Singer, Wolpert and Frith [13] claim: "the study of social interaction involves by definition a bi-directional perspective and is concerned with the question of how two minds shape each other mutually through reciprocal interactions. To understand interactive minds we have to understand how thoughts, feelings, intentions, and beliefs can be transmitted from one mind to the other, how to communicate these thoughts" [13, p. xvii]. From the above definitions, to mention just a few, it is obvious that much research in the social domain takes an information-processing approach [e.g. 1-3, 13]. However, criticism against this view has been put forward by a number of researchers [14-18]. Gibbs [14], for instance, addresses two major problems with the traditional view.
Firstly, the traditional view of human intentions as exclusively private mental states in individual minds ignores the dynamic, interactive nature of intentional action. Generally speaking, there is a separation between beliefs, intentions, etc., and the actual behavior of social interactions. This implies that social interaction is a rather passive process between two Cartesian minds, as Gallagher [4] puts it. According to Shanker and King [15], the information-transmission metaphor fails to reveal the full story of social interaction, because it significantly oversimplifies and misrepresents what actually happens in social interaction. They stress that such interactions cannot be reduced to so-called social information transfer. The main point here is that information is not a predefined and discrete entity which can be sent, through signals, from one agent across time and space to another agent in the form of internal mental representations. However, a stimulating shift has occurred in socio-cognitive research, from mainly using the information-transmission metaphor to applying the so-called dance metaphor [10, 15, 16]. Broadly speaking, the dance metaphor focuses on dynamically emerging, creative co-regulated interaction in a particular social situation, instead of discrete and linear processes as in the information-transmission metaphor. In other words, the dance metaphor focuses on the emergence of information in the dyad between embodied agents. As stressed by Ingold [17, p. 627], such a shift will release biological and psychological



studies of communication from the straightjacket of hard-core cognitivism. In line with the dance metaphor, the distributed cognition approach [19] expands the unit of analysis and focuses on real-time interactions between the various interactants and their environment, instead of focusing on mental structures in individual minds. This means that, contrary to viewing cognition as mainly internal processes, social interactions are considered to be directly observable cognitive events. With this crucial change in perspective, much of cognition previously hidden inside the skull has now become apparent. That is, information is neither pre-given nor hidden internally, but can emerge in the interaction and be manifested in visible embodied actions. Secondly, the work of cultural anthropologists addresses another problem with the traditional view. The underlying assumption in the traditional view is not shared across different cultures [14]; the focus on individuals' intentions reflects a Western white middle-class bias regarding the nature of selfhood rather than a universal phenomenon. It therefore might be argued that individual intentionality is one of the holy cows of Western thought, which overemphasizes the individual's psychological state at the expense of the social context in which the actions unfold [14]. The study of the social context, which we refer to as relational, has strong historical roots in the work of Mead [20] and Vygotsky [21]. Vygotsky's [21] example of the development of pointing in the child illustrates the relational aspect of social interaction. He claimed that what an observer might perceive as pointing initially is only a simple and incomplete grasping movement directed toward a desired object, and nothing more. When the caretaker comes to help the child, the meaning of the gesture situation itself changes as the child's failed reaching attempt provokes a reaction, not from the desired object, but from the other person.
The individual movement in itself in its social context becomes a gesture for-others. The caretaker interprets the child's reaching movement as a kind of pointing gesture, i.e. a socially meaningful communicative act, whereas the child itself at the time is not actually aware of its communication ability. However, after a while the child also becomes aware of the communicative function of its movements, and then begins addressing its gestures towards other people, rather than the object of interest that was its primary focus initially. Thus, the grasping movement changes to the act of pointing [21, p. 56]. This means that the intention of pointing initially does not reside within the child's individual mind, but emerges as an outcome of the ongoing social interactions between child and caretaker. Accordingly, by treating children as intentional beings, caregivers bootstrap and scaffold them into a socio-cultural environment, which partly rests on the illusion of intentionality. Another criticism against the traditional view, not addressed by Gibbs [14], is its biological implausibility and disembodiment. Maturana and Varela [18, p. 196], for instance, pointed out that the traditional metaphor of communication is wrong, since biologically, there is no transmitted information in communication. A similar argument was put forward by Fogel [16, p. 76] who stated that information is created in the interface between perception and action ... It is that last point, the salience of the body that is missing in many theories of meaning. Taken together, there is a need for alternative explanations of social interaction that address the issue from an embodied perspective, since the traditional view, in a nutshell, can be regarded as a disembodied sender-receiver explanation of pre-given information, missing contextual and bodily aspects. The next sections



elaborate in some more detail on these objections, particularly regarding why it might be more fruitful to consider humans as embodied cognizers situated in a social, cultural, and material sphere, and why that might be crucial for social interaction and cognition.

4.3 On the embodied nature of social interaction

Many recent findings in cognitive science and related disciplines indicate that the body has several important roles in social interaction and cognition. Here we briefly address different perspectives and empirical findings, ranging from disciplines such as social psychology, phenomenology, social neuroscience, and gesture studies to linguistics. These findings are then generalized to four fundamental functions of the body in social interaction [10, 22].

4.3.1 Social embodiment effects

Semin and Smith [23] point out that empirical findings in social psychology and current research on embodied cognition have a lot in common, given that several interesting phenomena in social psychology can be explained from an embodied perspective. Barsalou et al. [24], for example, have identified the following four kinds of social embodiment effects for which there is plenty of empirical evidence (for details see [24], [25] and the many references therein). Firstly, perceived social stimuli do not only produce cognitive states, but also bodily states. For example, it has been reported that high school students who received good grades in an exam adopted a more erect posture than students who received poor grades. In another experiment, subjects primed with concepts commonly associated with elderly people (e.g., gray, bingo, wrinkles) exhibited embodiment effects such as slower movement when leaving the experimental lab, as compared to a control group primed with neutral words. Several other studies also show similar effects. Secondly, the observation of bodily states in others often results in bodily mimicry in the observer.
People often mimic behaviors, and subjects often mimic an experimenter's actual behavior, e.g. rubbing the nose or shaking a foot. Subjects also tend to mimic observed facial expressions, which is widely documented in the literature. Thirdly, bodily states produce affective states, which means that embodiment not only facilitates a response to social stimuli but also produces tentative stimuli. For example, subjects rated cartoons differently when holding a pen between their lips than when holding it between their teeth. The latter triggered the same musculature as smiling, which made the subjects rate the cartoons as funnier, whereas holding the pen between the lips activated the same muscles as frowning and consequently had the opposite effect. Moreover, bodily postures influence the subjects' affective state; e.g., subjects in an upright position experienced more pride than subjects in a slumped position. Fourthly, compatibility between bodily and cognitive states enhances performance. For instance, several motor performance compatibility effects have been reported in experiments in which subjects responded faster to positive



words (e.g. love) than negative words (e.g. hate) when asked to pull a lever towards them. In more recent work, some of the abovementioned researchers focus explicitly on traditional conceptions in social psychology, such as attitudes, social perception, and emotions. Niedenthal et al. [25], for example, emphasize that empirical studies show that bodily postures and motoric activities, such as nodding heads (in agreement) or shaking heads (in disagreement), are related to positive or negative preferences and action predispositions toward objects. Similarly, others studied how such head movements influence the attitudes towards a pen placed on the table in front of the participants during the cover story of testing headphones. Afterwards, a naïve experimenter offered the old pen that had been placed on the table during the experiment or a new pen the subjects had not seen before. Depending on the performed head movements, i.e., nodding in agreement or shaking in disagreement, the participants favored the pen that correlated with the developed attitude. In other words, the nodding participants chose the old pen, whereas the head-shaking participants preferred the new one. These examples, as well as many other studies, demonstrate that there is a strong relation between embodied and cognitive states in social interaction. In short, the bi-directional swapping between these states occurs automatically without higher knowledge structures. These findings suggest that the body might be used as a resonance mechanism in the process of perceiving others, and it has been suggested that so-called mirror neurons function as the neurobiological underpinning for these social embodiment effects, as discussed in more detail in the following.

4.3.2 Social neuroscience

Recent findings in social neuroscience provide strong evidence for an embodied interpretation of intersubjectivity.
For instance, simulation theories and work on mirror neurons are good examples of more radically embodied views (in Clark's above sense). In short, the simulation account argues that cognitive processes are achieved by the reactivation of the same neural structures used for physically sensing, moving and acting in the environment, but also in social interaction and cognition [26-31]. Proponents of simulation theories hold that social understanding essentially is the ability to project oneself into another person's point of view, simulating what it is like to be in the other person's situation [28-32]. On this account, the same reactivation also underlies the conceptualization and understanding of intersubjectivity and language. The capacity to simulate requires an ability to imitate the inner states of another person, and it has been supposed that the body and its sensorimotor processes can be used as a linking device when perceiving others (see footnote 1). Gallese's [31] theory of the shared manifold of intersubjectivity, for example, proposes that all kinds of
Footnote 1: It should be noted, however, that this view should not be misinterpreted as claiming there is a direct correlation between so-called objective neurological states in the brain and subjective phenomenological experience. On the contrary, as pointed out by Gallagher [4], bridging the troubled water of social cognitive neuroscience and phenomenology through a direct mapping is not a viable approach, because there is no short cut that can bypass the effects of embodiment [4, p. 244].



interpersonal relations depend, at a basic level, on the foundation of a shared manifold space characterized by routines of embodied simulations. Simulation is presumably accomplished through the sharing of neural mechanisms between sensorimotor processes and higher-level processes. Such accounts show that the traditional strong division between perception and action, as well as between sensorimotor and cognitive processes, might need to be revised. According to Svensson [32], an important factor in understanding the embodiment of higher-level cognition is to consider embodied simulation as offline representation. This means that embodied simulation processes can function as offline representations, i.e., generally speaking, the internal replication of agent-environment interaction for issues beyond the here and now [32].

Such an understanding may rely on a resonance mechanism, being part of a particular type of visuo-motor neurons found in the pre-motor cortex of the macaque monkey brain, so-called mirror neurons, which exemplify how perception, action, and social cognition might come together at the level of single neurons. Mirror neurons are located in area F5 in the monkey brain and become activated both when performing specific goal-directed hand (and mouth) movements and when observing or hearing about the same actions [26-28]. Since mirror neurons respond in both conditions, it has been argued that the mirror system functions as a kind of action representation, linking action and action-perception. Such a mirroring mechanism might enable an agent to understand the meaning of the observed action by embodied reactivation. This means that, even while only observing the actions of another individual, a neural triggering event in fact takes place in the observer. Accordingly, the linking between action and perception offers an intuitive first-person understanding of the observed action.
In later studies, mirror neurons have been investigated under two conditions, namely hidden and full visual scenes [33]. In the visual condition, the monkey was able to see the entire action, for example, a hand-grasping movement. In the hidden condition, the same action was carried out, but its crucial and final part, i.e., the interaction with the actual object, was invisible, and the monkey merely knew that the target object was present. The result, however, demonstrated that more than half of the mirror neurons responded in the hidden condition [33]. This implies that the intention behind the action actually was mediated, despite the fact that the monkey did not see the actual hand-object interaction. That is, the goal of the action was still hinted at, given that the gap of missing visual information is filled by reactivating the complete action. This means that the mirror neurons are able to compensate for the missing information, and still seem to interpret the actual goal of the action.

More recent work on the activation of the mirror neuron system has been performed in specific contexts (such as before and after drinking tea) [34]. The study indicates that a certain kind of mirror neurons, so-called logically related mirror neurons, may constitute the foundation for intentionality. Traditionally, the description of an action and the interpretation of the reason why that particular action is performed have been considered to rely on two different mechanisms. The mirror neuron system, however, provides an alternative solution, given that logically related mirror neurons automatically code the motor acts that are most expected to follow the observed action in a particular context [34]. This means that the ability to infer the forthcoming new goal is already there in the mirror neuron system, and explaining intentionality by two different mechanisms is both



unnecessary and biologically implausible. Another study [35] implies that information about intentions might also be conveyed by the grasping action itself. The data suggest that the human mirror neuron system uses both contextual and action-type information (precision-grip vs. whole-hand prehension) to predict others' intentions. Furthermore, there are indications that mirror neuron activity is linked to social competence [35]. Hence, it has been speculated that the mirror system might be a basic mechanism necessary for imitation and attributing mental states to others [26-35]. This implies that during the course of ontogeny, the mirror neuron system and simulation processes might develop further, through maturation as well as socially scaffolded interaction, to more advanced forms. Based on these findings, Gallese stresses there is now enough empirical evidence to reject a disembodied theory of the mind as biologically implausible [31, p. 166]. All in all, the consideration of the mirror neuron system and simulation theories as the neurobiological underpinning of social interaction and cognition provides significant examples of more radically embodied views of intersubjectivity.

4.3.3 Embodied linguistics

In addition to action recognition, mirror neurons are also considered to be involved in more complex social actions, such as gesture and language. Rizzolatti and Arbib [27], for instance, suggest that the human communicative and linguistic capacity is a natural extension of action recognition based on mirror neuron mechanisms. This provides a tentative explanation of why and how the human Broca's area, involved in gesture and language processes, emerged from area F5 in the monkey brain. Arbib [36], for instance, suggests that the mirror system provides the causal mechanism for basic intentional interaction and thus might constitute the foundation of human language.
As Rizzolatti [26] points out, however, it is obvious that the mirror neuron mechanism itself is unable to explain the whole complexity of speech and human language, but it actually clarifies one of the fundamental aspects of intersubjectivity, namely how interacting partners are able to share the communicated meaning of a dialogue. In other words, the epistemological divide (i.e., verbal versus non-verbal interaction) in linguistics may be bridged from an embodied perspective. Several researchers have presented converging empirical evidence which suggests that the systems of hand and mouth movements are not two separate systems. Rather, they should be viewed as an integrated communicative speech-language-gesture system, linking action, thought and cognition. McNeill [37], for example, proposed that speech and gesture form a single system of communication, grounded in a common underlying thought process, emphasizing that "[g]estures do not just reflect thought but have an impact on thought. Gestures, together with language, help constitute thought ... Gestures occur, according to this way of thinking, because they are a part of the speaker's ongoing thought process" [37, p. 245].

Despite the close connection between gesture and speech in language, they generally differ in how they carry meaning [4, 37, 38]. Gesture offers alternative ways of expressing ideas that are hard to articulate in speech, and helps when there is no proper word at hand for the actual meaning to be conveyed. Furthermore, gestures can present different pieces of information simultaneously, which in



speech would need to be expressed sequentially. Goldin-Meadow also discovered that speech and gesture convey different information, but not necessarily conflicting meaning [38]. Furthermore, it has been argued that gestures also have representational properties. Goldin-Meadow [38], for example, emphasizes that the gestures accompanying speech are symbolic acts that convey meaning. Gallagher [4] emphasizes the fundamental difference between instrumental acts (e.g., opening a jar or reaching out to pick up a glass), and the generation of a gesture signifying the very action of opening a jar or picking up the glass. In other words, the act of gesture achieves an entirely different function than the actual grasping or opening, because gestures have representational content, which is a cognitive and possibly a communicative function that requires the generation and expression of meaning [4]. Accordingly, gesture is a natural part of communication, and enables people to embody their thoughts in action.

Nevertheless, many researchers still overlook the integrated nature of speech and gesture in the evolution of human language [e.g., 27, 39]. McNeill [40], on the contrary, emphasizes the double characteristic of language, i.e., speech and gesture in the course of joint action in evolution. Brain area 44 is mainly responsible for the organization of action sequences, whereas area 45 is the part of Broca's area that contains many mirror neurons, which McNeill suggests became self-responding to one's own actions, subsequently imbuing them with meaning. During the course of phylogeny, these two systems became co-opted in order to unite gesture and vocalization [40]. According to McNeill, the crucial shift in the function of mirror neurons occurred when they began to respond to significances other than the actions themselves, as a way of co-opting areas 44 and 45 in Broca's area, providing the basis for recognizing the actions of others.
In other terms, this co-opted system seems to be part of a circuit for recognizing intentional goal-directed actions from one's own actions or from others. It should be stressed that McNeill emphasizes the relational nature of the mirror neuron system which, in our opinion, is overlooked in many other theories. McNeill refers to Mead, who argued that "[g]estures become significant symbols when they implicitly arouse in an individual making them the same response which they explicitly arouse in other individuals" [40, p. 250]. Hence, meaningfulness emerges from the ability to activate a social reaction of another in yourself, a way of reacting in your own actions similarly to the actions of others, which McNeill denotes Mead's loop. This means that gesturing also has the important role of activating our own mirror neuron system, as well as offering oneself the ability to take the role/perspective of the other simultaneously [40]. Thus, the shift in social interaction, as previously described in Vygotsky's pointing example, can partly be explained neurologically by Mead's loop. In a similar vein, Gallagher [41] argues that phenomenologically, when one sees another person's action or gesture, one directly perceives or immediately sees the meaning in the action/gesture, without the need to model it at a higher cognitive level. His main point is that the relevant neural systems are activated by the other person's action. Thus, the other person has an effect on us [41, pp. 8-9]. In other words, Mead's loop creates a connection of gesture to discourse, given that this relational characteristic is also present in speech.



This implies that bodily actions might be of crucial importance in the process of intersubjectivity. From a radically embodied perspective, the activation and/or reactivation of the mirror neuron system, together with other bodily mechanisms, might function as the glue that binds hand, mouth and language together, in a social and cultural sphere.

4.3.4 Four fundamental functions of the body in social interaction

In summary, the work presented in the previous sections offers highly complementary rather than alternative views on the role of embodiment in intersubjectivity. By integrating these perspectives, we can obtain a deeper understanding without bypassing the effects of embodiment. Based on the previous discussions and empirical findings, four fundamental functions of embodiment in social interaction can be identified [for more details, see 10, 21]. It should be noted that these fundamental functions are not fixed and, to some degree, overlap.

- The body functions as a social resonance mechanism.
- The body functions as a means and end in communication and social interaction.
- Bodily actions and gestures function as a helping hand in shaping, expressing and sharing thoughts.
- The body functions as a representational device.

The body functions as a social resonance mechanism. This function suggests that there is no need to decode or represent embodied social stimuli to more advanced or cognitive states, since the bodily states in themselves actually are cognitive states, as the work of Barsalou et al. [24] and others shows [25]. Hence, this first function characterizes how cognitive/bodily states of interacting partners are reflected both in themselves and in-between them at a basic level, during both online and offline interactions.
The examples presented in this chapter, as well as other studies, demonstrate there is a strong relation between embodied and cognitive states in social interaction, since the bi-directional exchange between these states, as well as between the interacting partners, occurs automatically without the involvement of higher knowledge structures.

The body functions as a means and end in communication and social interaction. The suggested linkage between action and action-perception provided by the mirror neuron system implies that the body and its sensorimotor processes are cognitive in themselves. The great benefit of this action-understanding linkage, besides its parsimony, is the inbuilt dual ability of grasping both the "what" and "why" aspects of the present action, i.e., what the action is about as well as the intention behind the movement. Hence, this second function stresses how bodily actions operate both outwardly and inwardly in meaning-making activity, e.g., through Mead's loop. The functions of the body as a resonance mechanism and as a means and end might seem quite similar. However, while the function of the body as a resonance mechanism simply means that cognitive and bodily states of the interacting partners are reflected in both themselves and in-between them, it does not explain the relationship between their first-hand and third-hand experiences in social interaction. Instead, viewing the function of the body as a means and end offers a tentative explanation of that particular linkage, thereby unifying the inside and outside perspectives of



socially embodied interaction. In other words, the previously portrayed function of embodiment in social interaction mostly stresses that the body and its sensorimotor processes function as a social resonance mechanism, whereas the second function rather focuses on how this is accomplished.

Bodily actions and gestures function as a helping hand in shaping, expressing and sharing thoughts. Besides speech, manual gesture is a significant (embodied) aspect of meaning-making activity, which can provide important information to the listener, since gesture offers speakers the means of expressing thoughts difficult to articulate in speech. Through gesturing, we are able to generate and embody dynamical associations between different matters, which can offer new insights into the present situation or problem at hand. In addition, gesture sometimes serves as an explicit instance of the action-meaning embodied in speech, suggesting that hand movements are physical externalizations of the speaker's ideas.

The body functions as a representational device. In addition to speech, there is the more controversial claim that non-vocal embodied action also has representational properties, where certain kinds of gesture, portraying representational aspects, are the most obvious examples of the body as an external representational device. Furthermore, the internal reactivation of agent-environment interaction, in the form of embodied simulation, can be considered as representations in a strict sense. The neurological roots of this ability might be the activity of the mirror neurons, since their linkage between action and action-perception suggests a kind of action representation that is directly enacted in social interaction. Furthermore, since mirror neurons seem to understand the goal of the action, it can be argued that the grasping of the action does not require a declarative understanding, since it is meaningful in itself.
4.4 Illustration and discussion

The previous section has summarized some of the arguments for the view that it is first and foremost the enacted body, and the experiences that come from its situatedness in a social and cultural sphere, that constitute the roots of social interaction and cognition. In order to further illustrate this view, some frame-by-frame analyzed images from an episode of spontaneous social interaction captured in situ are presented in the following. The examples are from the first author's fieldwork on a horse ranch that maintains and preserves Spanish mustang horses [for more details, see 10]. We here briefly illustrate and discuss the role of scaffolding [42-43] (see Figure 1). In this episode the head of the ranch, Bob, is telling the visitors about the different places that nowadays keep herds of horses originating from the ranch's herds. In summary, all his bodily actions, i.e., facial expression, tone of voice, bodily posture, gesture, speech as well as gaze, reflect the significance of cross-modal embodied actions in social interaction. It should be pointed out that throughout the following analysis the earlier identified four functions of the body in social interaction are used as the underpinnings for describing and explaining how embodiment is part and parcel of social interaction and cognition. They are, however, not always mentioned explicitly throughout the analysis, since this



would result in many unnecessary reiterations rather than adequate descriptions and analyses of the interactions at hand.

Throughout the enumeration of different places, Bob uses his fingers as scaffolds in several ways. First, through slight tapping actions, he uses his fingers as a means to put the places in order. For each location, he touches his fingers in a certain order as a way to help himself keep track of the places. These tapping actions signify the representational aspect of gesturing and therefore convey meaning in their own way, given that the actual gestures are signs of something other than the actual tapping movement. That is, the tapping actions serve as a way to keep track of the different locations. Second, the tapping action is flawlessly integrated with the speech utterance of the name of the location, thus indicating and highlighting the central information in the utterance in both speech and gesture. That is, the two most important aspects of the utterances, the number and location, are manifested in speech as well as gesture simultaneously.

Figure 1. Bob counting on his fingers

Third, in order to re-enact/remember the different locations, the very actions of moulding, moving, and contacting his fingers facilitate the process of remembering the locations and their names. That is, the act of moving his fingers functions as a way of shaping and expressing thoughts, given that in the precise moment when he has figured out the name of the location, he stops moving the current finger, and then touches a finger on the other hand. That is, the movement shows the status of the act of remembering. Taken together, this example demonstrates the act of scaffolding, by using one's fingers as representational devices during the enumeration of the places. In total, there are nine different herds, situated at seven locations. Bob's cognitive strategy of off-loading the act of remembering into a visual and external representational format through embodied actions is very obvious and observable in the above examples. However, the very action and experience of moving/touching his fingers actually brings forth the names, by facilitating the shaping of the numbers and locations, instead of functioning as a way of externalizing the already existing names of the places.

More specifically, this strategy has far wider-reaching consequences than stated above. The explanation thus far focuses on what one might term individual scaffolding, but it should be noted that Bob is actually both in relation with himself and with the others. By using his fingers he creates and experiences action-perception loops through his body, given that the actual movements provide him with the kinesthetic experience of movement as well as the felt sense of touching when he grips and senses his fingers during the enumeration of places; every embodied action he makes creates a spectrum of embodied experiences for him. Thus, he constructs and experiences sensory-motor information at the same time through the cross-modal integration of his embodied being. This type of relational embodied scaffolding functions as a way of being in dialogue with oneself as well as with others, and is accomplished by the activation of his own mirror neuron system through Mead's loop. This means that most of the enumeration and the act of remembering is an intra-personal interaction, but there is also an inter-personal theme present.

Additional evidence in favor of this interpretation emerges during the frame-by-frame analysis of the actions. The tiny movements and touches of Bob's fingers, when he uses them as scaffolds in the naming of the herds, are almost invisible and perceived unconsciously in real time. Due to the ways our cognition is embodied, the body knows and grasps directly what is going on. Thus, the effects of embodiment do the job for us. Indeed, one might ask why humans perform such actions so frequently when they are almost invisible and therefore not necessarily communicative in any obvious way. What role and relevance do they actually have if they are not used for social interaction and communication in the first place? Our tentative answer to these questions is that these actions are not first and foremost inter-communicative; they also function inwardly. This means that they are both intra- and inter-communicative, stressing the relational aspect of social interaction and cognition that is profoundly manifested in our embodiment. Thus, the embodied nature of social interaction and cognition unifies the individual and social perspectives.
For instance, in Mead's loop, gesture and language are displayed but their relational characteristics are the same: they are both external actions that we can act upon in the public sphere and internal embodied actions used to organize and structure our internal and sometimes abstract and decoupled thinking, though still grounded in embodied experience.

Furthermore, Bob's enumeration is a significant example of what Clark [44] refers to as surrogate situatedness. According to Clark, human reason is disengaged but not disembodied [44, p. 236] and there is no sharp line between so-called online versus offline cognition, given that both processes are running in parallel. He argues that humans create and use human-built structures in order to transform the space of higher-level cognition, and stresses that we actively create restricted artificial environments that allow us to deploy basic perception-action-reason routines in the absence of their proper objects [44, p. 233]. According to Clark, these strategies allow human cognition to be disengaged while at the same time offering a concrete place in which to organize action-perception couplings of an essentially real world-like kind of interaction. Broadly speaking, whenever people act/think/communicate they are always in interaction, either with themselves or with other interactants or objects.



4.5 Conclusions

The aim of this chapter has been to clarify the role and relevance of the body in social interaction and cognition, and we have tried to present an integrated, interdisciplinary theoretical foundation for studying the embodied nature of social interaction and cognition. What ties all these issues together is the idea of the social mind as being relational, radically embodied, and situated in the social sphere. That is, the social dimension meets the physiological dimension, thus reaping the best of both worlds without neglecting the effects of embodiment for social interaction. What further unites all these issues, like hand in glove, is how profoundly embodiment shapes social interaction and cognition through unfolding socially embodied actions in social and cultural contexts. As discussed in this chapter, the key to this coherent union is the way our social mind is embodied, a fact that should not be neglected or trivialized. To summarize, the ways humans are embodied imply that one's own understanding of social interaction is more than the exchange of communication signals between disembodied information-processors. Instead, meaning and intentions are emergent products of socially embodied interaction, and in many situations they can be viewed as distributed phenomena rather than as individual private mental acts or properties.

4.6 References
[1] M. Augoustinos & I. Walker, Social cognition. London: Sage, 1995.
[2] Z. Kunda, Social cognition. Cambridge, MA: MIT Press, 1999.
[3] K. A. Quinn, N. C. Macrae & G. V. Bodenhausen, Social cognition. Encyclopedia of Cognitive Science, (pp. 66-73). London: Macmillan, 2003.
[4] S. Gallagher, How the body shapes the mind. Oxford: Oxford University Press, 2005.
[5] A. Clark, An embodied cognitive science? Trends in Cognitive Sciences, 3(9), 345-351, 1999.
[6] R. W. Gibbs, Jr., Embodiment and cognitive science. Cambridge: Cambridge University Press, 2006.
[7] G. Lakoff & M. Johnson, Philosophy in the flesh. New York: Basic Books, 1999.
[8] F. J. Varela, E. Thompson & E. Rosch, The embodied mind. Cambridge, MA: MIT Press, 1991.
[9] T. Ziemke, J. Zlatev & R. Frank (Eds.), Body, language, and mind: Embodiment, 1. Berlin: Mouton de Gruyter, 2007.
[10] J. Lindblom, Minding the body: Interacting socially through embodied action. Doctoral dissertation, University of Linköping/University of Skövde, Sweden. ISBN 978-91-85831-48-7, 2007.
[11] S. Bråten, Participants' perception of others' acts. Culture & Psychology, 9(3), 261-276, 2003.
[12] P. Rochat & T. Striano, Social-cognitive development in the first year. In P. Rochat (Ed.), Early Social Cognition, (pp. 3-34). Mahwah, NJ: Lawrence Erlbaum, 1999.
[13] T. Singer, D. Wolpert & C. Frith, Introduction. In C. Frith & D. Wolpert (Eds.), The Neuroscience of Social Interaction, (pp. xiii-xxvii). Oxford: Oxford University Press, 2004.
[14] R. W. Gibbs, Jr., Intentions as emergent products of social interactions. In B. F. Malle, L. J. Moses & D. A. Baldwin (Eds.), Intentions and Intentionality, (pp. 105-122). Cambridge, MA: MIT Press, 2001.
[15] S. G. Shanker & B. J. King, The emergence of a new paradigm in ape language research. Behavioral & Brain Sciences, 25, 605-656, 2002.
[16] A. Fogel, Developing through relationships. New York: Harvester Wheatsheaf, 1993.
[17] T. Ingold, Communication and communion. Behavioral & Brain Sciences, 25(5), 627-628, 2002.
[18] H. Maturana & F. Varela, The tree of knowledge. Boston: Shambhala, 1987.
[19] E. Hutchins, Cognition in the wild. Cambridge, MA: MIT Press, 1995.
[20] G. H. Mead, Mind, self and society. Chicago: Chicago University Press, 1934.



[21] L. S. Vygotsky, Mind in society. Cambridge, MA: Harvard University Press, 1978.
[22] J. Lindblom, Embodied action as a helping hand in social interaction. Proceedings of the 28th Annual Conference of the Cognitive Science Society, (pp. 477-482). Mahwah, NJ: Lawrence Erlbaum, 2006.
[23] G. R. Semin & E. R. Smith, Interfaces of social psychology with situated and embodied cognition. Cognitive Systems Research, 3(3), 385-396, 2002.
[24] L. W. Barsalou, P. M. Niedenthal, A. K. Barbey & J. A. Ruppert, Social embodiment. In B. H. Ross (Ed.), The Psychology of Learning and Motivation, 43, (pp. 43-92). San Diego, CA: Academic Press, 2003.
[25] P. M. Niedenthal, L. W. Barsalou, P. Winkielman, S. Krauth-Gruber & F. Ric, Embodiment in attitudes, social perception, and emotion. Personality and Social Psychology Review, 9(3), 184-211, 2005.
[26] G. Rizzolatti, The mirror neuron system and its function in humans. Anatomy and Embryology, 210, 419-421, 2005.
[27] G. Rizzolatti & M. A. Arbib, Language within our grasp. Trends in Neurosciences, 21, 188-194, 1998.
[28] V. Gallese, C. Keysers & G. Rizzolatti, A unifying view of the basis of social cognition. Trends in Cognitive Sciences, 8(9), 398-403, 2004.
[29] H. Svensson, J. Lindblom & T. Ziemke, Making sense of embodied cognition: simulation theories of shared neural mechanisms for sensorimotor and cognitive processes. In T. Ziemke, J. Zlatev & R. Frank (Eds.), Body, language, and mind: Embodiment, 1, (pp. 241-270). Berlin: Mouton de Gruyter, 2007.
[30] J. Lindblom & T. Ziemke, Embodiment and social interaction: implications for cognitive science. In T. Ziemke, J. Zlatev & R. Frank (Eds.), Body, language, and mind: Embodiment, 1, (pp. 129-162). Berlin: Mouton de Gruyter, 2007.
[31] V. Gallese, The manifold nature of interpersonal relations: the quest for a common mechanism. In C. Frith & D. Wolpert (Eds.), The neuroscience of social interaction: decoding, imitating and influencing the actions of others, (pp. 159-182). Oxford: Oxford University Press, 2004.
[32] H. Svensson, Embodied simulation as off-line representation. Licentiate thesis, University of Linköping/University of Skövde, Sweden. ISBN 978-91-85831-83-8, 2007.
[33] M. A. Umiltà, E. Kohler, V. Gallese, L. Fogassi, L. Fadiga, C. Keysers & G. Rizzolatti, I know what you are doing: a neurophysiological study. Neuron, 32, 91-101, 2001.
[34] M. Iacoboni, I. Molnar-Szakacs, V. Gallese, G. Buccino, J. C. Mazziotta & G. Rizzolatti, Grasping the intentions of others with one's own mirror neuron system. PLoS Biology, 3(3):e79, 529-535, 2005.
[35] T. Kaplan & M. Iacoboni, Getting a grip on other minds. Social Neuroscience, 1(3-4), 175-183, 2006.
[36] M. A. Arbib, From monkey-like action recognition to human language: An evolutionary framework for neurolinguistics. Behavioral and Brain Sciences, 28, 105-167, 2005.
[37] D. McNeill, Hand and mind. Chicago: Chicago University Press, 1992.
[38] S. Goldin-Meadow, Hearing gesture. Cambridge, MA: Belknap Press, 2003.
[39] J. Zlatev, Embodiment, language, and mimesis. In T. Ziemke, J. Zlatev & R. Frank (Eds.), Body, language, and mind: Embodiment, 1, (pp. 297-337). Berlin: Mouton de Gruyter, 2007.
[40] D. McNeill, Gesture and thought. Chicago: The University of Chicago Press, 2005.
[41] S. Gallagher, Simulation trouble. Social Neuroscience, 2, 1-13, 2007.
[42] D. Wood, J. S. Bruner & G. Ross, The role of tutoring in problem-solving. Journal of Child Psychology and Psychiatry, 17, 89-100, 1976.
[43] M. Wilson, Six views of embodied cognition. Psychonomic Bulletin & Review, 9(4), 625-636, 2002.
[44] A. Clark, Beyond the flesh: some lessons from a mole cricket. Artificial Life, 11, 233-244, 2005.

Enacting Intersubjectivity, F. Morganti et al. (Eds.), IOS Press, 2008. © 2008 The authors. All rights reserved.


Conceptual and Methodological Issues in the Investigation of Primate Intersubjectivity

Abstract. Historically, the human ability to point, and conversely the absence of pointing in other great ape species, has been interpreted as evidence of a great discontinuity across the primate lines in the ability to share meaning with an interlocutor. However, this conclusion ignored a variety of observations of nonhuman primates pointing in captivity over the past century, and it was put to rest by careful experimental work conducted especially in the past decade. Now the debate concerns the human ability to point declaratively and the absence of declarative pointing in other great apes, and the same conclusions of discontinuity are being drawn. In this chapter, we argue that this is a continuation of the same debate, one that presupposes certain problematic ideas about the nature of meaning and mind. We attempt to show that the mental state of, for example, a pointer is not what makes an act declarative (or imperative), and we examine the mentalistic picture of the mind that guides the work of theorists who claim to be advancing very different explanations of early social cognition. We then turn to a more general methodological critique of existing research in order to show that there is a lack of valid empirical evidence that can speak to these issues.

Contents
5.1 Introduction
5.2 Conceptual clarification, theory construction and empirical research
5.3 Drawing undrawable conclusions
5.4 Conclusions
5.5 References

5.1 Introduction
It is generally agreed that we share some capacity for basic forms of intersubjective engagement with other primate species [1, 2]. For example, chimpanzees and humans are both adept at following another's gaze and signalling with communicative gestures. But there is debate concerning how to accurately characterize the cognitive differences across species, particularly concerning the extent to which other apes are able to reach human levels of shared meaning. The general consensus, as a pair of recent target articles in the prestigious journal


T.P. Racine et al. / Conceptual and Methodological Issues

Behavioral and Brain Sciences have argued [3, 4], is that human social cognition is unlike that of any other species in its nature, origins and extent. What is particularly persuasive about this pronouncement is that it comes from two research groups who fail to agree on much else [5, 6]. In this chapter, we conceptually and methodologically analyze the arguments in support of these conclusions.

That human social cognition is unlike that of any other species is beyond dispute in certain respects. For example, concerning meaning shared through language, it is trivially the case that because apes do not develop language in the wild they do not use linguistic symbols to share meaning. Even enculturated and language-trained apes that possess considerable communicative skill [7] are said by many to lack an appreciation of the communicative intention behind the act [e.g., 8]. And conversely, understanding communicative intentions is said to enable the sharing of linguistic meaning in human infants [4]. This is a common and commonsensical way of explaining the development of one species and the lack of development of another with a single mechanism. Using an identical logic, it has been argued that although other apes might gesture or follow another's gaze, they do not appreciate the mental state that is behind and causally related to the act in question. As Tomasello and colleagues recently put it, "To recover the intended meaning of a pointing gesture ... requires some fairly serious mindreading" [9]. What is interesting about this claim, though, is that it follows the statement that "pointing can convey an almost infinite variety of meanings by saying, in effect, 'If you look over there, you'll know what I mean.'" Therefore, it would be equally valid to claim that understanding the meaning of a pointing gesture requires some fairly serious context-reading.
But there is a problem with either way of putting it, because they overemphasize, respectively, the inner and the outer, both of which are intrinsic to the sharing of meaning. However, possibly because (a) researchers have rich access to their own mental lives and (b) reference underdetermines meaning, few researchers would imagine that intersubjectivity involves sharing contexts. Thus, the pendulum swings too far in the other direction, leading to theories of meaning in existing comparative and developmental research that are overly mentalistic and that conceptualize certain forms of shared meaning as the sharing of mental states [1, 2, 10-16]. In this chapter, we argue against this way of thinking on logical and methodological grounds by focussing on the ontogeny and phylogeny of early social cognitive capacities. We first describe and critique widely assumed mentalistic views of meaning and mind and briefly discuss social cognitive capacities in human and nonhuman primates to show the degree to which the extant literature is interpreted within this mentalistic frame of reference. We turn next to a methodological critique that questions the validity of existing empirical evidence with respect to the adjudication of these issues. Our assumption is that the methodological shortcomings that we identify in the existing literature are more likely to occur when one does not have a firm grasp of the relation between the inner and the outer. We conclude that the lack of appreciation for conceptual analysis may be helping to fuel debates regarding social cognition in the developmental and comparative literatures.



5.2 Conceptual clarification, theory construction and empirical research
In this section we argue that: (a) studying the phylogeny and ontogeny of early social cognition necessarily involves conceptual analysis and (b) problematic preconceptions about the nature of meaning and mind interfere with the interpretation of data and the construction of valid theories. Some might claim that comparative and developmental psychologists should not worry themselves about such issues because theirs is an empirical science and issues of meaning and epistemology are the business of philosophers. However, even if philosophers agreed about these issues, which they do not, psychological theories and empirical research are based on assumptions about these issues and therefore the work raises conceptual, philosophical questions whether we like it or not. Because conceptual analysis is widely viewed as irrelevant to psychological science, however, it is uncontroversial to issue disclaimers such as a recent one by Tomasello and colleagues that "[they] are not attempting to address the large and complex philosophical literature on the nature of mutual knowledge nor the philosophical use of the word 'know'" [9]. However, in our view this strategy unfortunately obscures researchers' conceptual commitments, creating "invisible philosophy". Instead, we argue that it is preferable to make assumptions explicit and reflect on the meanings of the concepts that we employ in our investigations [13].

5.2.1 Psychology cannot escape its philosophical roots
Although we do not wish to single out Tomasello and his colleagues, their recent article affords us an opportunity to introduce and discuss the issues we raise in this chapter. And it is ironic that although these authors note that they wish to avoid getting into philosophical analyses, when explaining the basics of pointing their first substantive references are to the works of two philosophers, Wittgenstein and Grice [9].
This is also an odd juxtaposition in our view because the differences between Wittgenstein and Grice merit serious attention, more attention than we have space for at present. But as Tomasello et al. point out, Wittgenstein noted that reference massively underdetermines meaning and that the meaning of a sign, gesture, and so forth depends critically on the context in which it is embedded [17]. Tomasello and colleagues [18] then argue that:

"Crucially, as Grice (1957) first observed, cooperative communicative acts also involve in addition an intention about the communication specifically. In this analysis, when I point to a tree for you, I not only want you to notice the tree (for some reason), I also want us to notice together my desire that you notice the tree and this additional tier is necessary to instigate in you the kinds of relevance inference required to identify my reason for communicating in the first place ... We call this, following Sperber and Wilson, the communicative intention, and it represents my desire that we both know together that I am referring you to the tree so that you will infer what I want you to know or to do."

Strictly speaking of course, these are not Grice's observations, but rather his theory of meaning. To characterize the above propositions as observations obscures the fact that they paint what Wittgenstein called a metaphysical picture because they do not describe possible objects of observation [17]. To notice, for example, that a person is communicating is not to observe two objects, first a communicative intention and second the behaviour that constitutes the communicative act. However, in making meaning parasitic on intentions, Grice's analysis assumes that communicative content is derivative from mental content¹. The metaphysical picture underlying this theory is that understanding what a person means by a gesture involves understanding what they have in mind when they perform that gesture. By contrast, Wittgenstein claims, "An intention is embedded in its situation. If the technique of chess did not exist, I could not intend to play a game of chess" [19]. This is not to argue that intentions are somehow non-mental but rather that mental properties are not radically separate from environmental ones, such as social and cultural practices. After all, intending to X presupposes the existence of X-ing.

Now, certain cases of sharing meaning may involve determining what another person has in mind. For it seems clear that guessing the referent of an ambiguous point might involve inferences about the pointer's mental state, for example, whether she is attending to X or Y as she points. However, determining the referent of a point is distinct from understanding the intentions of the pointer, for example, whether she is playing a game, making a joke, issuing a warning, and so on. Tomasello and colleagues assume that understanding intentions involves inferences about representations and accordingly is part of what separates humans from other primates [but see 20]. Our point, at least at this juncture, is that these researchers could not have constructed the theory they did had they followed Wittgenstein rather than Grice. One might protest that these authors base their theory on Grice because they happen to agree with him, but this is our very point: they can't help but do (invisible) philosophy.
Wittgenstein claimed that conceptual clarity is a precondition for any successful empirical investigation. As Machado and his colleagues [21] have suggested, not only is the need for conceptual analysis poorly understood but:

"conceptual investigations have also been dismissed as philosophical speculation alien or even inimical to science, as misguided attempts to circumvent empirical research, a sort of shortcut in the path to the truth, or as armchair speculation about the meaning of words."

We believe that this diagnosis is, unfortunately, correct². And in terms of research concerning primate intersubjectivity, researchers must bear in mind that "Ascribing an understanding of attention to infants specifies what they are capable of doing, not how or why they do it" [22]. Cognitive answers to the causal, empirical, "how" questions typically involve claims that human-specific joint attention behaviours are causally dependent upon a certain class of mental representations, specifically those that represent the attentional and intentional mental states of others. These theories posit mental representations as causally related to the behaviours that constitute the grounds for ascribing an understanding of others' attention and intentions. But, given that the only way to determine whether such mental representations are present is to observe the behaviours that are their putative effects, in what sense are joint attention behaviours explained by propositions about representations [13, 16]? This violates a basic scientific tenet that causes and effects be logically distinct.

Fodor has noted that in certain cases, however, the requirement of logical independence for causes and effects has not been met. For example, in Mendel's research, genes were initially defined in terms of their effects; the presence of genes as trait-bearing entities could only be confirmed through observation of those traits that constituted their putative effects. Fodor points out that this did not prevent the development of a successful science of genetics. However, this success entailed the resolution of such ambiguity regarding causes and effects. Mendel's classic demonstration that recessive characteristics appear unaltered in the offspring of heterozygotes ... showed that a distinction is required between traits (effects) and their genetic carriers (causes) [24]. The mental property of a communicative intention is constituted by a family of behavioural properties, namely those behaviours that count as intentionally communicative. It is impossible to identify a communicative intention independent of intentionally communicative behaviours, and thus the various representational hypotheses that are widely conceived of as competing causal explanations are in fact alternative redescriptions of the behavioural phenomena they putatively explain [2, 12, 14-16].

¹ Although we do not have the space to make our case here, this necessarily violates Wittgenstein's private language arguments and leaves Tomasello with an ungrounded level of meaning and a fundamental incoherency in his theory [2, 14]. The basic problem is that it is impossible to extract the meaning of the concept "intention" or "attention" simply from private experience.
² A recent example is Moore's commentary entitled "Show Me the Theory!" that was written in response to Racine and Carpendale's attempt to clarify the meaning of joint attention as used by contemporary theorists [14, 23].
5.2.2 What, how and why
The first order of business in attempting to separate definitional "what" questions from causal "how" or "why" questions is clarifying the grounds upon which mentality is ascribed to others. Simply put, inner states are attributed to others on the basis of behavioural criteria, that is, things that they do, express and so on. However, it is not action alone but action in some specific context that matters, for what is criterial for a given psychological predicate in one situation might not be in another. Because this is perhaps easier to understand in a language-using context, consider the following example [25]:

"Suppose the phone rings - you pick up the receiver, say 'Hello', and enter into conversation with the speaker at the other end of the line. Afterward it could be said that you answered the phone, not that you tried to answer it. If you couldn't get a hold of the receiver, or dropped it breaking the phone, or there was no response from the other end, etc., then it could be said that you tried to answer the phone. There are indefinitely many different situations in which 'answered' can be said; similarly for 'tried to answer'; and indefinitely many situations in which it would not be clear that either thing could be said."

This is also how researchers often determine the meaning of a pointing gesture. For example, a researcher might activate a toy and when an infant points at the toy, the researcher responds by sharing attention. If the infant stops gesturing she is seen to be satisfied and it is coded as a declarative act, whereas if the infant persists in directing the researcher's attention at an object it is coded as imperative [e.g., 26]. In other words, overlapping behaviours are criterial for differing motivational states because the sequence of behaviours in which the gesture occurs is manifestly different. This example demonstrates that we say that the motive is declarative (or imperative) because of the situation in which it is embedded. Again, this should not be understood as a claim that a motivation, for example, is not a property of an agent and is not in this sense inner.

If children or apes exhibit very similar behaviours in very similar situations it follows that pointing and other aspects of more advanced (secondary) forms of intersubjectivity must apply equally to human and nonhuman primates [27, 28]. This is a logical and not an empirical "must". Although the primary application of most of the concepts of mind of interest to comparative and developmental researchers is to human beings, these are sensibly applied to the great apes because of their similarity to humans and the similarity of their form of life to the human form of life [17]. The causal issue of how and why apes do such things is a separate question and is in fact the one in which researchers are most interested. The problem in many contemporary theories is that logically indistinguishable causes and effects are posited, which cannot contribute to our understanding of causal issues. Although much creative and informative research has been conducted, in our view the theoretical frameworks within which these data are understood shroud social cognitive capacities in a mentalistic fog that is hard to see one's way through. In a similar vein, Povinelli claims that chimpanzees satisfy criteria for understanding attention such as careful attention to eyes of human experimenters but yet do not understand the psychological significance of seeing [29]. But to pay careful attention to the eyes and to otherwise monitor the gaze of others are criterial for understanding basic forms of attention.
And although they critique one another's work, Povinelli, Tomasello and their colleagues all seem to look for something additional to the activity in question, and they assume that what gives the activity the meaning that we attribute to it is the mental state of the agent [2].

The confusions about the relation between causation and definition that inhere in representationalist views are so deeply embedded that they also creep into approaches to mind that try to avoid equating representation with understanding. Proponents of distributed approaches to cognition such as Johnson [e.g., 30] explicitly contrast their theories to ones like Tomasello's or Povinelli's in an attempt to provide a more accurate characterization of the phenomena that constitute understanding attention and other related psychological states. Although we share some of Johnson's motivations and have made similar points ourselves [1, 2, 14], we will use her article as a basis for further clarification. Johnson claims that, "Rather than using behaviour as the basis for inferences to invisible mental events such as intentions, the distributed approach treats communicative interactions as, themselves, directly observable cognitive events" [31]. Again, Wittgenstein's separation of definitional relations between inner states and behaviour from the causal relations that may obtain highlights the problem here. As Susswein and Racine point out, it is misleading to think of a behaviour as a directly observable cognitive event because this conflates the criteria by which cognitive events are ascribed with the cognitive events themselves [16]. As Wittgenstein famously remarked, "An 'inner process' stands in need of outward criteria" [17]. And claiming that criteria are cognitive events obscures the causal/definitional and inner/outer distinctions in much the same way as the representationalist programme does. This is unfortunate because many comparative and developmental researchers tend to think of an inner mental world of experiences, dispositions, abilities, preferences, and so on, on the one hand, and an outer world of behaviours and environments on the other. But just as "heads" logically presupposes a "tails" side of the coin, the mental and the behavioural "are [intrinsically related] and ... cannot be identified independently of each other" [32]. Although we sympathize with Johnson's attempt to avoid logical difficulties inherent in representational approaches, we argue that collapsing, rather than overdrawing, critical distinctions between the inner and outer also obscures these issues of interest.

5.3 Drawing undrawable conclusions
We now move from logical concerns to methodological ones. In recent years, comparative and developmental psychologists have produced a large number of empirical findings purporting to demonstrate that young infants and/or nonhuman primates, especially apes, either have or do not have the capacity to represent the invisible contents of other minds [e.g., 5, 6, 9, 33, 34]. We have argued that the epistemological assumptions upon which these studies are based are untenable. We now turn our focus to some of the more common methodological failings of species comparisons designed to assess the cognitive bases of the comprehension or production of manual gestures, the comprehension of gaze, and the comprehension of epistemic states. These shortcomings include (a) failure to control for or otherwise acknowledge rearing history confounds with species [e.g., 1, 10, 35-37], (b) failure to control training regimens across species, thereby confounding training histories with species, and (c) confounding experimental manipulations across levels of independent variables [for an insightful critique of contemporary research into comparative cognition on separate grounds, see 38].

5.3.1 The confound in rearing histories between humans and other apes
Studies in which the communicative competencies of captive apes are compared with typically developing human children universally suffer from a lack of experimental control over the respective organisms' rearing histories. Thus, ontogenetically and experimentally relevant factors quite typically experienced by captive apes, such as the early trauma associated with witnessing the murder of one's mother, rejection by and consequent loss of a primary caregiver, peer-centred attachment relations, impoverished physical environments, and relatively restricted interaction with (and hence, familiarity with) human caregivers, are all confounded with the apes' species classifications.
No researcher would: (a) sample human children who have experienced the kinds of extreme trauma, neglect, and impoverishment that are quite typical of captive ape experiences, (b) measure aspects of their sensitivity to human communicative or attentional behaviour, and then (c) generalize from these traumatized and/or impoverished samples to the entire human species. Yet, there are many published examples in which even very eminent researchers have generalized from their captive ape samples to entire species [e.g., 29, 33, 39].

For a typical example, in an oft-cited monograph by Povinelli and Eddy [29], seven orphaned young chimpanzees, who were raised in peer cohorts in the relatively impoverished circumstances of a nursery in a biomedical research centre, were compared with human 2- to 5-year-old children, raised by their biological parents, in the comparatively enriched circumstances characteristic of Western upbringing contexts in the developed world. The tests were designed to assess the sensitivity of these respective organisms to visual cues of visual attention in human experimenters. In experimental circumstances that were alleged by the authors to be similar, the children performed better than did the apes in some procedures. Was this because the apes lacked stable, primary, adult attachment figures over the entireties of their childhoods? Was this because the apes had very much less interactive experience with humans than did the human children? Was this because the apes were representatives of a different species than the human children? Although it is clearly impossible to isolate the factors responsible for the performance differences between the chimpanzees and the children, Povinelli and Eddy concluded that the differences in performance between the two groups were attributable to the chimpanzees' incapacity to appreciate "the mental connection engendered by visual inspection" and that "despite their striking use of (and interest in) the eyes, 5-6[-year-old] chimpanzees apparently see very little behind them" [40]. Povinelli and Eddy argued that, because they had demonstrated similar baseline performances between the apes and the children in some experimental conditions, and because the differences that they found were evident in only some transfer conditions, they had therefore demonstrated a deficiency in those transfer conditions by the chimpanzees.
However, it is well established that more experienced animals will more easily transfer their expertise to novel experimental circumstances [e.g., 41]. Therefore, although the poorer transfer performances of the chimpanzees in that study may be due to their species' inability to discern others' visual perspectives, their performances might also simply reflect their relative lack of experience engaging with humans, or other incidental effects of their radically different, relatively impoverished rearing histories. There is no reasonable basis to conclude that the apes had anything like as much pre-experimental experience with human gaze cues of visual attention as did the human children (even the younger human children), and there is every reason to believe that the apes were far less experienced. It is simply impossible to distinguish these possibilities from this research design. The attribution by Povinelli and Eddy of the performance differences between the humans and the apes to their species classifications and not their pre-experimental histories, in the face of obvious and immense rearing history differences between the two groups, is symptomatic of the recent literature on comparative cognition. Although by no means do we suggest that this interpretive bias is characteristic only of one particular group of researchers in one particular laboratory, the fact remains that experimental confounds do not disappear if ignored. Of course, it is, as a practical and ethical matter, impossible to control these pre-experimental factors in a full factorial experimental design, particularly with organisms as long-lived as humans and other apes. Although developmental research suggests that very impoverished upbringings lead to poor cognitive and social development [e.g., 42-45], we cannot, for example, consign human children to be raised by chimpanzees.
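The logic of this confound can be made concrete with a minimal sketch. The additive model, the function names and every number below are hypothetical illustrations, not data or methods from any study discussed here. The sketch shows that when every ape in a sample is also impoverished and every human is not, a pure species effect and a pure rearing effect predict exactly the same observed gap; only a design in which rearing history is held constant across groups lets the two accounts make different predictions.

```python
# Hypothetical illustration: all numbers are invented, not data from any study.

def expected_score(is_ape, impoverished, species_effect, rearing_effect, baseline=90):
    """Expected percent correct under a simple additive model (an assumption)."""
    return baseline - species_effect * is_ape - rearing_effect * impoverished

def confounded_gap(species_effect, rearing_effect):
    """Human-minus-ape gap when every ape is impoverished and every human is not."""
    humans = expected_score(0, 0, species_effect, rearing_effect)
    apes = expected_score(1, 1, species_effect, rearing_effect)
    return humans - apes

def matched_gap(species_effect, rearing_effect):
    """Human-minus-ape gap when both groups share an enriched rearing history."""
    humans = expected_score(0, 0, species_effect, rearing_effect)
    apes = expected_score(1, 0, species_effect, rearing_effect)
    return humans - apes

# Two rival accounts of the same performance difference.
pure_species = dict(species_effect=30, rearing_effect=0)
pure_rearing = dict(species_effect=0, rearing_effect=30)

# Confounded design: both accounts predict the same gap.
print(confounded_gap(**pure_species), confounded_gap(**pure_rearing))

# Matched rearing: the accounts now predict different gaps.
print(matched_gap(**pure_species), matched_gap(**pure_rearing))
```

In the confounded design the observed gap is simply the sum of the two effects, so no amount of data from that design can apportion it between them.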
Quasi-experimental designs can, however, be achieved by cross-fostering apes with human caregivers. This has been attempted with great apes of all species, including chimpanzees [46-50], bonobos [7, 51], gorillas [52] and orangutans [53]. Others have raised infants in bi-specific communities consisting of both humans and apes: e.g., Loulis, adopted son of Washoe at the Central Washington University in Ellensburg, Washington, U.S.A. [54], several chimpanzees at the Primate Research Centre of Kyoto, Japan, and several animals at the Great Ape Trust of Des Moines, Iowa, U.S.A. However, to our knowledge, only two formal cross-fostering experiments have raised apes in a human culture from near-birth. The first was the study by Hayes and Hayes of a single chimpanzee subject, Viki [49]. The second was by Gardner and Gardner [55]; in this experiment, four chimpanzees were cross-fostered from neonates: Moja, Tatu, Dar and Pili (Pili died at less than two years of age, so there is a limited behavioural record for him). Thus, to our knowledge, only four chimpanzee subjects in the history of science (excluding Pili) could have served, in principle, as an experimentally valid comparison to human children by virtue of having been raised from birth by humans, with stable, primary attachment relationships to particular human caregivers, in the socio-ecological and physical circumstances typical of human childhood [56; for a non-experimental account of cross-fostering a chimpanzee named Lucy, see 57]. These are all astonishingly accomplished animals, whose behavioural competencies refuted a number of recent claims about ape social cognition, decades in advance of those claims. For example, it is widely claimed that apes do not communicate protodeclaratively or imitate, despite hundreds of published examples of this in these and other language-trained animals or later-adopted apes [e.g., 49, 58].
There also seems to be a very widespread misconception that any ape raised by humans is therefore cross-fostered by humans to the same degree, regardless of the diversity of circumstances in which these apes have lost their biological mothers, the wide variations in the social, emotional, and physical environments in which they were raised from birth, and the extraordinarily large differences among captive apes in their relative familiarity with humans. We hope that it is now obvious that virtually all direct comparisons of the cognition of apes and humans are invalid from an experimental point of view. This is not to argue that the whole enterprise of comparative cognition is meaningless; we are simply making the rather rudimentary point that if one compares individuals from two separate groups with radically different rearing histories and finds a significant difference in a dependent variable between those two groups, then one cannot rationally conclude that one has uncovered a group difference rather than a rearing history difference. It is entirely unclear, in these kinds of research designs, whether differences between apes and humans are attributable to species differences (i.e., different evolutionary histories), rearing history differences, or some interaction between these evolutionary and developmental factors. Whenever a researcher concludes, from research designs like this, that they have identified a species difference in social cognition, this conclusion can only illustrate the interpretive bias of the researcher. If this is still not obvious, then consider the following thought experiment: raise human boys from birth in the same relatively impoverished circumstances in which captive apes are typically raised. Let the comparison group be human girls raised by their biological parents in their homes. Years later, assess the sensitivity of the boys and the girls to subtle cues of visual attention in human adults.
Suppose the girls, unsurprisingly, perform better than the boys: would any researcher in their right mind attribute the difference to a gender difference between boys and girls? Of course not: rearing history is clearly confounded with the gender of the subjects. Yet substitute apes for boys and humans for girls in this research design and how often have researchers trumpeted a species difference between apes and humans in various aspects of sensitivity to visual attention [59]? Again, in the face of the extraordinary difficulty of adequately controlling for rearing history factors in ape-human comparisons, we believe that these kinds of experiments are worthwhile; what we wish to highlight here is the irrationality of asserting one factor's influence (e.g., species: "they can't do it because they are chimpanzees"), rather than another confounded factor's influence (e.g., rearing history: "they can't do it because they were exposed to prolonged trauma, neglect, or were relatively less experienced").

5.3.2 Methodological misconceptions of behaviour
Given that this is such a commonplace practice in comparative psychology, it is perhaps not very surprising that rearing histories are ignored in much of contemporary empirical research, albeit with some notable exceptions [e.g., 33]. It is more surprising that it is also common to find researchers claiming to have demonstrated a cognitive deficiency in great apes, relative to humans, in the face of apes' superior performance. A straightforward example of this is a study by Povinelli et al. in which human children and chimpanzees were presented with an experimenter seated behind two containers, one of which was baited with a reward (stickers, in the case of the children, and edible treats, in the case of the chimpanzees) [39].
In the critical test trials, the experimenter adopted one of three different postures designed to communicate the location of the hidden treat: (a) head and eyes turned to look at the baited container (At Target), (b) head oriented straight ahead, with eyes peering at the baited container (Eyes Only), and (c) head and eyes oriented considerably above the baited container (Above Target). According to the reasoning of Povinelli and his colleagues, if an organism has a high-level mentalistic concept of visual attention, in which attention is conceived of as a kind of laser beam, then that organism ought to find it difficult to locate the hidden treats in the Above Target condition because the gaze is focused decidedly away from the baited container. They found that both human children and apes performed poorly (at chance) in the Eyes Only condition, that both performed well in the At Target condition, and that the apes performed well (above chance) in the Above Target condition, whereas the children performed at chance levels. Povinelli et al. interpreted the apes' better performance in the Above Target condition as evidence that apes' conceptions of visual attention were like floodlamps, vaguely and imprecisely indicating a general area, but not a specific locus. They concluded that chimpanzees had only a low-level appreciation of gaze, despite the fact that these same chimpanzees very frequently turned to look at the ceiling behind them, on the same sides at which the experimenter's eyes were focused [60]. In a study of the performance of human adults, Thomas, Murphy, Pitt, Rivers, and Leavens found that human adults acted just like the chimpanzees in the Above Target condition [61]; taking Povinelli et al.'s hypotheses at face value, this implies that human children have high-level, laser-beam-like
conceptions of visual attention, and that humans then lose this conceptual sophistication in adulthood. More likely, the improved performances of the chimpanzees, compared to the human children, simply reflected either their superior grasp of the task requirements or their greater motivation for their rewards, contra Povinelli and colleagues.

Another example of this kind of methodological error comes from the same laboratory. Theall and Povinelli attempted to determine whether chimpanzees would exhibit more attention-getting behaviour when an experimenter could not see them, compared to when the experimenter could see them [62]. The critical probe trials, in which the experimenters adopted their various attentive and inattentive postures, were embedded in a series of standard trials, in which the experimenter was attentive. There was one experimental probe trial for every two standard, baseline trials. Crucially, in the standard trials, the chimpanzees were rewarded immediately for placing their hands through a hole in a transparent barrier. In contrast, in the experimental probe trials (in half of which the experimenter was attentive and in half of which the experimenter was inattentive) no reinforcement took place until 20 seconds had elapsed from the apes' placements of their hands through the barrier. Thus, the chimpanzees were trained to place their hands in particular holes and to expect immediate reinforcement for doing so, yet, unaccountably, on one-third of all trials (and irrespective of whether the experimenter was attentive or inattentive) the experimenter simply would not respond with reinforcement until a 20-second interval had elapsed.
Thus, when Theall and Povinelli failed to find a difference in the rates of the hapless chimpanzees' attention-getting behaviour between the so-called attentive and inattentive conditions, it is entirely unclear whether this is attributable to the apes' inability to discriminate attentive from inattentive states, as the authors concluded, or whether, as seems much more likely to us, the chimpanzees were displaying attention-getting behaviour to enigmatically unresponsive experimenters, irrespective of their attentional state. In this study, then, the experimenter's lack of responsiveness was confounded with the manipulation of the experimenter's posture, relative to the more numerous baseline trials in which the experimenter responded immediately to the apes' gestures. Other research protocols have clearly demonstrated that great apes do discriminate different attentional states in human experimenters, demonstrating that almost all failures to find these kinds of discrimination in great apes are attributable to procedural deficiencies, rather than to deficiencies in the animals studied [e.g., 63-66].

There are other serious, yet common, shortcomings in direct human-ape comparisons of cognitive performance, but virtually all existing studies purporting to compare humans with apes suffer from one or more of these three major methodological failings: failure to control pre-experimental histories across species, failure to control training protocols across species, or confounding of factors across the levels of the intended independent variable. Thus, very strong theoretical positions [e.g., 3, 4, 8, 9, 29, 67] have been taken on the basis of dubious empirical findings.

A more informative and nuanced approach to comparative cognition is to sample apes from varied backgrounds and compare them to human children. This kind of protocol was pioneered by Tomasello and his colleagues [68, 69].
For example, Carpenter and her colleagues analyzed the joint attentional competencies of: (a) apes who were relatively inexperienced with humans, (b) apes who were relatively
more experienced with humans, and (c) human children. They found that the experienced (enculturated) chimpanzees performed much more like human children than did the less experienced (unenculturated) chimpanzees, thus clearly implicating rearing history differences as more relevant to performance in joint attention than species differences [70, 71; for a similar apparent rearing history influence on two orangutans see 72; see 10 for relevant discussions of within-species effects of differential exposure of apes to human cultures]. Thus, when within-species variation in rearing history is properly accounted for in ape-human comparisons, the apparent influence of species as a factor in sociocognitive development is reduced, a conclusion also reached by Tomasello et al. [69]. This truism is underscored by the ease with which chimpanzees and other apes in captivity come to manipulate people in their environments, through pointing and other manual gestures, despite the fact that they virtually never point in the wild [71, 73-76]. Apes in captivity are sensitive to variations in the visual attention of their human caregivers, as evidenced by their requirement that a human be both present [71, 75] and looking at the apes [64, 71] before they display a gesture. Like human children, apes look back and forth between the objects of their points and their social partners [73, 74], and they persist in and elaborate their communication in the face of communicative failures [75, 77]. Because these behaviours also define the human developmental transition into intentional communication or secondary intersubjectivity, researchers sometimes balk at attributing the same kinds of mental representations to apes that they attribute to human infants. There is in fact no stronger empirical evidence for intentional communication in preverbal human children than in nonhuman primates.

5.4 Conclusions

We have argued that developmental and comparative research into early social cognition has failed to adequately address the conceptual aspects of such investigations and has suffered because of it. Because many researchers seem not to have a clear handle on the distinctions between causal and definitional issues, or on the relation between inner and outer, they have imported an overly mentalistic conception of social cognitive activity into their research designs. In our view, the failure to take the rearing histories of nonhuman primates into account, and the methodological problems that ensue from this error, follow from this mentalistic conception of the mind as an inner entity that is logically distinct from activity, cultural surround, rearing history and so forth. That is, ostensible differences between chimpanzee and human minds make a lot more sense when one forgets about the relation between the inner and the outer. The central message from the comparative literature is that when the behavioural context is used as a basis for the attribution of the cognitive bases for communication, then humans and the great apes manifestly share many aspects of early social cognition. This obviates the need to invoke an evolutionary or ontogenetic deus ex machina for what are really rather simple cognitive processes. Despite the recent claims that Darwin was mistaken to argue for continuity in primate cognitive differences [3], in accordance with Darwinian theory, it is crucial that continuity between humans and other animals remains the null
hypothesis, particularly in the face of the numerous methodological failures to convincingly demonstrate discontinuity.

5.5 References
[1] D. A. Leavens, W. D. Hopkins & K. A. Bard, The heterochronic origins of explicit reference. In J. Zlatev, T. P. Racine, C. Sinha & E. Itkonen (Eds.), The shared mind: Perspectives on intersubjectivity. Amsterdam: Benjamins, in press.
[2] T. P. Racine & J. I. M. Carpendale, The embodiment of mental states. In W. F. Overton, U. Müller & J. Newman (Eds.), Body in mind, mind in body: Developmental perspectives on embodiment and consciousness, (pp. 159-190). Mahwah, NJ: Erlbaum, 2007.
[3] D. C. Penn, K. J. Holyoak & D. L. Povinelli, Darwin's mistake: Explaining the discontinuity between human and nonhuman minds. Behavioural and Brain Sciences, in press.
[4] M. Tomasello, M. Carpenter, J. Call, T. Behne & H. Moll, Understanding and sharing intentions: The origins of cultural cognition. Behavioural and Brain Sciences, 28, 675-735, 2005.
[5] D. J. Povinelli & J. Vonk, Chimpanzee minds: Suspiciously human? Trends in Cognitive Sciences, 7, 157-160, 2003.
[6] M. Tomasello, J. Call & B. Hare, Chimpanzees understand psychological states: the question is which ones and to what extent. Trends in Cognitive Sciences, 7, 153-156, 2003.
[7] E. S. Savage-Rumbaugh, S. G. Shanker & J. T. Talbot, Apes, language, and the human mind. New York: Oxford University Press, 1998.
[8] M. Tomasello & M. Carpenter, The emergence of social cognition in three young chimpanzees. Monographs of the Society for Research in Child Development, 70 (Serial No. 279), 2005.
[9] M. Tomasello, M. Carpenter & U. Liszkowski, A new look at infant pointing. Child Development, 78, 705-722, 706, 2007.
[10] D. A. Leavens, T. P. Racine & W. D. Hopkins, The ontogeny and phylogeny of non-verbal deixis. In C. Knight & R. Botha (Eds.), The cradle of language, 1: Multidisciplinary perspectives. Oxford: Oxford University Press, under review.
[11] T. P. Racine, Computation, meaning and artificial intelligence: Some old problems, some new models. Canadian Artificial Intelligence, 50, 8-19, 2002.
[12] T. P. Racine, Wittgenstein's internalistic logic and children's theories of mind. In J. I. M. Carpendale & U. Müller (Eds.), Social interaction and the development of knowledge, (pp. 257-276). Mahwah, NJ: Erlbaum, 2004.
[13] T. P. Racine & J. I. M. Carpendale, Shared practices, understanding, language and joint attention. British Journal of Developmental Psychology, 25, 45-54, 2007.
[14] T. P. Racine & J. I. M. Carpendale, The role of shared practice in joint attention. British Journal of Developmental Psychology, 25, 3-25, 2007.
[15] N. Susswein & T. P. Racine, Sharing mental states: Causal and definitional issues in intersubjectivity. In J. Zlatev, T. P. Racine, C. Sinha & E. Itkonen (Eds.), The shared mind: Perspectives on intersubjectivity. Amsterdam: Benjamins, in press.
[16] N. Susswein & T. P. Racine, Wittgenstein and not-just-in-the-head cognition. New Ideas in Psychology, in press.
[17] L. Wittgenstein, Philosophical investigations (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall, 1958.
[18] M. Tomasello, M. Carpenter & U. Liszkowski, A new look at infant pointing. Child Development, 78, 705-722, 707-708, 2007.
[19] L. Wittgenstein, Philosophical investigations (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall, 1958.
[20] J. Call, B. Hare, M. Carpenter & M. Tomasello, Unwilling versus unable: Chimpanzees' understanding of human intentional action. Developmental Science, 7, 488-498, 2004.
[21] A. Machado, O. Lourenço & F. J. Silva, Facts, concepts and theories: The shape of psychology's epistemic triangle. Behaviour and Philosophy, 28, 1-40, 25, 2000.
[22] N. Susswein & T. P. Racine, Sharing mental states: Causal and definitional issues in intersubjectivity. In J. Zlatev, T. P. Racine, C. Sinha & E. Itkonen (Eds.), The shared mind: Perspectives on intersubjectivity, 11. Amsterdam: Benjamins, in press.
[23] C. Moore, Show me the theory! British Journal of Developmental Psychology, 25, 39-43, 2007.
[24] J. Fodor, Psychological explanation. New York: Random House, 1968.
[25] N. Malcolm, In D. M. Armstrong & N. Malcolm, Consciousness and causality, 36. Oxford: Blackwell, 1984.
[26] U. Liszkowski, M. Carpenter, A. Henning, T. Striano & M. Tomasello, Twelve-month-olds point to share attention and interest. Developmental Science, 7, 297-307, 2004.
[27] C. Trevarthen & P. Hubley, Secondary intersubjectivity: Confidence, confiding, and acts of meaning in the first year. In A. Lock (Ed.), Action, gesture and symbol: The emergence of language, (pp. 183-229). London: Academic Press, 1978.
[28] D. A. Leavens, W. D. Hopkins & K. A. Bard, Understanding the point of chimpanzee pointing: Epigenesis and ecological validity. Current Directions in Psychological Science, 14, 185-189, 2005.
[29] D. J. Povinelli & T. J. Eddy, What young chimpanzees know about seeing. Monographs of the Society for Research in Child Development, 61 (Serial No. 247), 1996.
[30] C. M. Johnson, Distributed primate cognition: A review. Animal Cognition, 4, 167-183, 2001.
[31] Ibid., p. 167.
[32] N. Susswein & T. P. Racine, Wittgenstein and not-just-in-the-head cognition. New Ideas in Psychology, 20, in press.
[33] J. Call & M. Tomasello, A nonverbal false belief task: The performance of children and great apes. Child Development, 70, 381-395, 1999.
[34] C. Moore & B. D'Entremont, Developmental changes in pointing as a function of attentional focus. Journal of Cognition and Development, 2, 109-129, 2001.
[35] D. A. Leavens, Having a concept "see" does not imply attribution of knowledge: Some general considerations in measuring theory of mind. Behavioural and Brain Sciences, 21, 123-124, 1998.
[36] D. A. Leavens, On the public nature of communication. Behavioural and Brain Sciences, 25, 630-631, 2002.
[37] D. A. Leavens, Manual deixis in apes and humans. Interaction Studies, 5, 387-408, 2004.
[38] C. Boesch, What makes us human (Homo sapiens)? The challenge of cognitive cross-species comparison. Journal of Comparative Psychology, 121, 227-240, 2007.
[39] D. J. Povinelli, D. T. Bierschwale & C. G. Cech, Comprehension of seeing as a referential act in young children, but not juvenile chimpanzees. British Journal of Developmental Psychology, 17, 37-60, 1999.
[40] D. J. Povinelli & T. J. Eddy, What young chimpanzees know about seeing. Monographs of the Society for Research in Child Development, 61 (Serial No. 247), 122; 140, 1996.
[41] H. F. Harlow, The formation of learning sets. Psychological Review, 56, 51-65, 1949.
[42] J. Hodges & B. Tizard, Social and family relationships of ex-institutional adolescents. Journal of Child Psychology and Psychiatry, 30, 77-97, 1989.
[43] T. G. O'Connor, R. S. Marvin, M. Rutter, J. T. Ulrich, P. A. Britner & the English and Romanian Adoptees Study Team, Child-parent attachment following early institutional deprivation. Development and Psychopathology, 15, 19-38, 2003.
[44] M. Rutter, Maternal deprivation. In M. H. Bornstein (Ed.), Handbook of parenting, 4: Applied and practical parenting, (pp. 3-31). Mahwah, NJ: Erlbaum, 1996.
[45] R. A. Spitz, Anaclitic depression. Psychoanalytic study of the child, 2, 313-342, 1946.
[46] N. N. Ladygina-Kohts, In F. B. M. de Waal (Ed.), Infant chimpanzee and human child: A classic 1935 comparative study of ape emotions and intelligence. New York: Oxford University Press, 2001.
[47] W. N. Kellogg & L. A. Kellogg, The ape and the child: A study of early environmental influence upon early behavior. New York: McGraw-Hill, 1933.
[48] B. T. Gardner & R. A. Gardner, Two-way communication with an infant chimpanzee. In A. M. Schrier & F. Stollnitz (Eds.), Behavior of nonhuman primates: Modern research trends, 4, 117-183. New York: Academic Press, 1971.
[49] K. J. Hayes & C. Hayes, The cultural capacity of chimpanzees. Human Biology, 26, 288-303, 1954.
[50] E. S. Savage-Rumbaugh, Ape language: From conditioned response to symbol. New York: Columbia University Press, 1986.
[51] E. S. Savage-Rumbaugh & R. Lewin, Kanzi: The ape at the brink of the human mind. New York: John Wiley, 1994.
[52] F. G. Patterson, Linguistic capabilities of a lowland gorilla. In F. C. C. Peng (Ed.), Sign language and language acquisition in man and ape: New dimensions in comparative pedolinguistics, (pp. 161-201). Boulder, CO: Westview Press, 1978.
[53] H. L. Miles, The cognitive foundations for reference in a signing orangutan. In S. T. Parker & K. R. Gibson (Eds.), "Language" and intelligence in monkeys and apes: Comparative developmental perspectives, (pp. 511-539). Cambridge: Cambridge University Press, 1990.
[54] R. S. Fouts, A. D. Hirsch & D. H. Fouts, Cultural transmission of a human language in a chimpanzee mother-infant relationship. In H. E. Fitzgerald, J. A. Mullins & P. Gage (Eds.), Child nurturance: Studies of development in primates, (pp. 159-193). New York: Plenum Press, 1982.
[55] R. A. Gardner & B. T. Gardner, A cross-fostering laboratory. In R. A. Gardner, B. T. Gardner & T. E. Van Cantfort (Eds.), Teaching sign language to chimpanzees, (pp. 1-28). Albany: State University of New York Press, 1989.
[56] K. A. Bard & D. A. Leavens, Socio-emotional factors in the development of joint attention in human and ape infants. In L. Roska-Hardy & E. M. Neumann-Held (Eds.), Learning from animals? London: Psychology Press, in press.
[57] M. K. Temerlin, Lucy: Growing up human. London: Souvenir Press, 1976.
[58] T. E. Van Cantfort, B. T. Gardner & R. A. Gardner, Developmental trends in replies to Wh-questions by children and chimpanzees. In R. A. Gardner, B. T. Gardner & T. E. Van Cantfort (Eds.), Teaching sign language to chimpanzees, (pp. 198-239). Albany: State University of New York Press, 1989.
[59] D. A. Leavens, W. D. Hopkins & K. A. Bard, The heterochronic origins of explicit reference. In J. Zlatev, T. P. Racine, C. Sinha & E. Itkonen (Eds.), The shared mind: Perspectives on intersubjectivity, 17. Amsterdam: Benjamins, in press.
[60] D. J. Povinelli, D. T. Bierschwale & C. G. Cech, Comprehension of seeing as a referential act in young children, but not juvenile chimpanzees. British Journal of Developmental Psychology, 17, 37-60, 1999, see Fig. 7 and pp. 51-52.
[61] E. Thomas, M. Murphy, R. Pitt, A. Rivers & D. A. Leavens, Understanding of visual attention by adult humans (Homo sapiens): A partial replication of Povinelli, Bierschwale and Cech (1999), under review.
[62] L. A. Theall & D. J. Povinelli, Do chimpanzees tailor their gestural signals to fit the attentional states of others? Animal Cognition, 2, 207-214, 1999.
[63] J. Brauer, J. Call & M. Tomasello, All great ape species follow gaze to distant locations and around barriers. Journal of Comparative Psychology, 119, 145-154, 2005.
[64] A. B. Hostetter, M. Cantero & W. D. Hopkins, Differential use of vocal and gestural communication in response to the attentional status of a human. Journal of Comparative Psychology, 115, 337-343, 2001.
[65] D. A. Leavens, A. B. Hostetter, M. J. Wesley & W. D. Hopkins, Tactical use of unimodal and bimodal communication by chimpanzees, Pan troglodytes. Animal Behaviour, 67, 467-476, 2004.
[66] S. R. Poss, C. Kuhar, T. S. Stoinski & W. D. Hopkins, Differential use of attentional and visual communicative signaling by orangutans (Pongo pygmaeus) and gorillas (Gorilla gorilla) in response to the attentional status of a human. American Journal of Primatology, 68, 978-992, 2006.
[67] D. J. Povinelli, J. M. Bering & S. Giambrone, Toward a science of other minds: Escaping the argument by analogy. Cognitive Science, 24, 509-541, 2000.
[68] M. Carpenter, M. Tomasello & S. Savage-Rumbaugh, Joint attention and imitative learning in children, chimpanzees, and enculturated chimpanzees. Social Development, 4, 217-237, 1995.
[69] M. Tomasello, E. S. Savage-Rumbaugh & A. C. Kruger, Imitative learning of actions on objects by children, chimpanzees, and enculturated chimpanzees. Child Development, 64, 1688-1705, 1993.
[70] D. A. Leavens & W. D. Hopkins, The whole hand point: The structure and function of pointing from a comparative perspective. Journal of Comparative Psychology, 113, 417-425, 1999.
[71] J. Call & M. Tomasello, The effect of humans on the cognitive development of apes. In A. E. Russon, K. A. Bard & S. T. Parker (Eds.), Reaching into thought: The minds of the great apes, (pp. 371-403). Cambridge: Cambridge University Press, 1996.
[72] J. Call & M. Tomasello, Production and comprehension of referential pointing by orangutans (Pongo pygmaeus). Journal of Comparative Psychology, 108, 307-317, 1994.
[73] M. A. Krause & R. S. Fouts, Chimpanzee (Pan troglodytes) pointing: Hand shapes, accuracy, and the role of eye gaze. Journal of Comparative Psychology, 111, 330-336, 1997.
[74] D. A. Leavens & W. D. Hopkins, Intentional communication by chimpanzees: A cross-sectional study of the use of referential gestures. Developmental Psychology, 34, 813-822, 1998.
[75] D. A. Leavens, W. D. Hopkins & R. K. Thomas, Referential communication by chimpanzees (Pan troglodytes). Journal of Comparative Psychology, 118, 48-57, 2004.
[76] D. A. Leavens, J. L. Russell & W. D. Hopkins, Intentionality as measured in the persistence and elaboration of communication by chimpanzees (Pan troglodytes). Child Development, 76, 291-306, 2005.
[77] E. A. Cartmill & R. W. Byrne, Orangutans modify their gestural signaling according to their audience's comprehension. Current Biology, 17, 1345-1348, 2007.


Enacting Intersubjectivity
F. Morganti et al. (Eds.)
IOS Press, 2008
© 2008 The authors. All rights reserved.


On the Nature and Role of Intersubjectivity in Human Communication

Maurizio TIRASSA, Francesca Marina BOSCO
Abstract. We outline a theory of human agency and communication and discuss the role that the capability to share (that is, intersubjectivity) plays in it. All the notions discussed are cast in a mentalistic and radically constructivist framework. We also introduce and discuss the relevant literature.

Contents
6.1 Introduction
6.2 The mental nature of human communication
6.3 Human agency
6.4 Communication
6.5 Acknowledgment
6.6 References

6.1 Introduction

Human communication is a complex type of interpersonal activity that is neither reducible to the mere use of language nor to just an instance of "general", undifferentiated intersubjectivity. While it is obviously related to the latter faculty, often related to the former, and almost always interleaved with both, it needs an analysis of its own. In this paper we will outline one such analysis. Since, of course, we are not the first to do so, we will also discuss the relevant literature. The main points that we will advance are: (i) human communication has to be understood in terms of the mental processes involved in it; such processes are, at least in part, specific to communication, so that it is better characterized as a faculty than as a task or as merely something that humans do; (ii) communication consists, at least in part, in the creation and the maintenance of a particular type of intersubjectivity which we will characterize in terms of public, or shared, meanings; public meanings have to be understood primarily
as part of the interactants' mental events and only secondarily as (a peculiar type of) material activity; (iii) all the notions involved should be cast in a mentalistic, biological, and radically constructivist framework.

6.2 The mental nature of human communication

6.2.1 Communication as message

The first contemporary theory of communication was advanced by Shannon and Weaver [1]. In their account, a communicational event occurs when a sender codes a message into a signal and broadcasts the latter to a recipient, who decodes it and recovers the message contained. This theory relies on a realist conception of meaning and of the relations between mind and world and between mind and mind. Signals materially exist in the world and are, in a sense, independent of both the sender and the recipient. The relation between messages and signals is bidirectional and mechanical: given the one, the other is immediately available to whoever knows the code involved. In Shannon and Weaver's theory, furthermore, the interlocutors are separate: one launches her message like a signal in a bottle, with no expectations about the other recovering and interpreting it. All that is certain is that, if the signal survives the noise in the channel and gets recovered by someone who knows the code, it will be correctly interpreted. While clear traces of this approach survive in the theories that accept the notion of literal meaning and the separation between syntactic, semantic and pragmatic levels or components of communication, it is commonly said to have been wholly superseded after the work of Wittgenstein [2], Grice [3], Austin [4], and Searle [5, 6]. These authors, rooted in philosophy rather than in engineering or cybernetics, and their followers in the different disciplines that study human communication have instead emphasized a view of communication as (a particular type of) social activity, grounded in cooperation and in the reciprocal recognition of agency and mentalization as well as, more recently, in the different notions that go under the label of intersubjectivity.
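The code model described at the start of this section can be made concrete with a small sketch. The Python fragment below is only an illustration under our own assumptions: the four-item codebook, the function names (`encode`, `channel`, `decode`) and the bit-flipping noise model are hypothetical and are not taken from Shannon and Weaver [1]. It is meant to show the two properties stressed above: the mapping between message and signal is mechanical and bidirectional, and nothing in the exchange depends on what the sender expects of the recipient.

```python
import random

# Hypothetical shared codebook: each message maps to exactly one signal.
CODE = {"rain": "00", "sun": "01", "snow": "10", "fog": "11"}
# The inverse table makes the message-signal relation bidirectional.
DECODE = {signal: message for message, signal in CODE.items()}


def encode(message):
    """Sender side: turn a message into a signal by table lookup."""
    return CODE[message]


def channel(signal, noise=0.0, rng=None):
    """Transmit a signal, flipping each bit independently with probability `noise`."""
    rng = rng or random.Random()
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[bit] if rng.random() < noise else bit for bit in signal)


def decode(signal):
    """Recipient side: recover a message by table lookup; no interpretation is involved."""
    return DECODE[signal]


if __name__ == "__main__":
    # Noiseless transmission: the recipient recovers exactly what was sent.
    print(decode(channel(encode("rain"), noise=0.0)))
```

With a noiseless channel the recipient always recovers the original message; with noise, the lookup still returns some message from the codebook, though not necessarily the one that was sent. The sketch has no place for context, intention, or uncertainty about what was meant, which is the limitation that the rest of this section takes up.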
Since the very notions of action, social action, and intersubjectivity are far from being clearly or unanimously defined, it may be worth trying to draw some distinctions.

6.2.2 Communication as cooperation

Grice [7] identifies some features of cooperation which he summarizes in a well-known set of principles. These principles or maxims, whose nature some have considered descriptive and others normative, are rooted in more general principles of rationality and embodied in the reality of human interactions. This conception has been highly influential in subsequent theorizing. Most research on communication in classical artificial intelligence and cognitive science [8, 9, 10] has substantially mapped the notion of communication onto the contents of dialogue and the latter onto the joint activities that can be carried out by the interactants. Examples are conversations between a novice and an expert about the
maintenance of an appliance, or between a clerk in the information booth of a railway station and a traveler about departure and arrival timetables. Here communication is intrinsically cooperative, because so are the collaborative plans in the service of which it exists. The problem with this approach is that strictly task-oriented dialogues are only a small subset of the human possibilities of communication. To map the latter onto the former is to miss all the cases where communication is not in the service of a predefined joint task. Furthermore, there is no reason to think that benevolence and collaboration are built-in features of communication, or in any way intrinsic to it or necessary for it:

(1) Ann: You are... You are... I just can't find the words to express my anger!
    Bob: "Moron" seems too weak here. What about "filthy scumbag" or "dirty rat"?

However, many researchers argue that communication is a collaborative activity even when the broader activities in which it is embedded are not, and that the view that "good old-fashioned AI" has of the role of cooperation in communication is not the only possible one. For example, Airenti, Bara and Colombetti [11] draw a distinction between a level of cooperation that they call behavioral and one that they call conversational. The former concerns the more or less collaborative nature of the individual action plans which each interlocutor entertains; the latter concerns the forms of the dialogue to which such plans give rise. Cooperation exists on the conversational level even when it does not occur on the behavioral level:

(2) Ann: Listen, Bob, can you please lend me a couple thousand euro?
    Bob: I am very sorry, Ann, but I've had some expenses lately.

According to Airenti, Bara and Colombetti, the two types of cooperation have different origins and ought to be understood on different grounds.
Behavioral cooperation is related to the unfolding of the individual plans and the social events in which the interlocutors are engaged; conversational cooperation is instead related to the partly joint management of the processes involved in the generation and the understanding of the relevant speech acts. Only the latter would be intrinsic to communication proper. Indeed, it is often argued that human communication consists in, or at least includes, events that are collaborative not, or not necessarily, on the level of the individual macro-plans (like trying to borrow, or refusing to lend, amounts of money), but also, or exclusively, on the level of the material actions brought about within the dialogues to which such plans give rise (like asking questions or giving replies). For example, researchers in ethnomethodology and conversation analysis have empirically identified and described collaborative phenomena occurring in the management of turn-taking, that is, of how the interlocutors trade and exchange their respective turns of intervention in the ongoing conversation [12], and of the repair system, that is, of how they amend a troublesome turn or request that it be amended by the partner [13, 14, 15]. These studies have then been generalized to the study of adjacency pairs: couples of turns, produced by two different
participants, the second of which is conditionally relevant, given the first. It turns out that in real conversations, as in dancing or in shaking hands, the actions of each participant are tightly coupled to the actions of the other(s) and can only be fully understood in their light. Conversation thus appears to be an interactional micro-world that follows rules of its own, relatively independent of other events and of the overall mental dynamics of the interactants. In this way, communication has ended up being viewed as taking place on a common ground [16, 17, 18, 19] made up of the set of utterances produced by the interactants up to the present time, possibly with their presuppositions and implicatures. Utterances are material joint actions, emerging from the intertwinement of the partial actions that are produced by each participant: such partial actions have neither structure nor sense if taken in isolation, but acquire both structure and sense as they are interwoven with the corresponding actions produced by the partner(s). The intrinsic structure of communication thus consists largely in the construction, management and maintenance of the common ground. The ability to move on a ground which is in common with a partner is then viewed as a particularly important feature of the more general human capacity for intersubjectivity. We thus arrive at one of the possible meanings of this notion: here, intersubjectivity is defined as the capability to share and bring about joint collaborative actions with a partner [20, 21, 22, 23]. These acceptations of communication and intersubjectivity give rise to some problems. The first is that they only apply to interactions which take place in copresence (at least virtually, if telephone conversations are to be included in the picture) and in which all participants have equal rights of intervention. This, however, is not always the case.
Human beings can communicate beyond the barriers of time and space: they leave notes and write documents for someone else to read in an elsewhere and an elsewhen which they are often unable to foresee, they give lectures where it is considered impolite for someone in the audience to interrupt the speaker, they broadcast television news where such interruption simply cannot occur, they send messages in bottles, and so on. We do not want to deny the importance of face-to-face interactions, or the reasonable hypothesis that they were the first communicative mode to evolve in our species; yet, it would be a mistake to define human communication by looking at their local features. It might be objected that, when communicating unidirectionally, we somehow simulate or impersonate the participation of the audience: this is likely very close to what really happens, but, since such impersonation seldom, if ever, manifests itself as material actions of simulated co-participation, we are left again with the need to describe communication in terms of its underlying mental dynamics, and not of the material actions that may or may not represent its behavioral counterpart.

The general point here is that communication has to be understood primarily as a mental phenomenon, rather than a material one. Suppose that one morning, while Bob is dressing to go to work, Ann looks out of the window and says:

(3) Ann: It looks like it's going to rain

and that Bob then decides to take an umbrella with him. Bob's problems are: should I consider what Ann has done as communicative, for a start? And why has she said that? What stance should I take toward what she has meant? By necessity,



his interpretation of Ann's action as a suggestion to take an umbrella will be uncertain: for all he knows, she might have meant something very different, like "don't bother to water the flowers before going to work." To make things worse, Ann might have done something much more ambiguous to the same effect, like moving the curtain away from the window so as to let Bob see a cloudy sky, or cranking up the volume of the television during the weather forecast. Sometimes, even a non-action can have a highly communicative value and thus become a communicative action proper:

(4) Ann: I love you so much, Bob.
Bob remains silent and keeps eating his soup.

None of these actions or of their effects is reducible to purely material terms. Understanding an utterance is a matter of abduction: basically, it is a diagnostic process whereby we reconstruct a meaning starting from scarce and often ambiguous hints, and this process is a mental one. That it is grounded in the individual's interactions with the environment and with other individuals neither makes it less mental nor eliminates the need to consider the individual mind as the proper object of investigation of psychology. In general, no list of behaviors with their contexts of occurrence may substitute for a mentalistic theory of the mental dynamics involved in their generation.

6.2.3 Communication as mindreading

Another interesting stream of research on communication, more mentalistically oriented than conversation analysis and studies of the common ground, substantially identifies communication with mindreading (for theoretical reasons, we prefer this neutral term to the more classical "Theory of mind"). This approach traces back to Grice's analysis of non-natural (that is, Intentional, or communicative) meaning [24].
Grice defines communication as an overt interaction between two (or more) agents, one meaning something by a certain action in a certain context and the other(s) inferring from the observation of that action to its presumed communicative meaning. Communicative meaning is the effect that the first agent overtly intends to achieve on the partner's mental processes. Let us reconsider the episode outlined in (3) above: Ann says "It looks like it's going to rain", and Bob takes an umbrella with him while going out. In Grice's account, as spelled out by Strawson [25], this is a case of (successful) communication iff Ann, by her utterance, (i) intends to induce Bob to take an umbrella with him, (ii) intends Bob to recognize intention (i), (iii) intends such recognition to be (part of) Bob's reason for taking an umbrella with him, and Bob recognizes Ann's intentions (i)-(iii) in his turn. When Grice wrote his seminal paper, the expression "Theory of mind" [26] had not yet been invented, nor did there exist a corresponding research area; yet, his account is fully compatible or straightforwardly identifiable with the idea that human communication is largely or exclusively based upon mindreading. Several theories of communication are founded on this assumption (e.g., [27], at least as revised in [28]). In [29], for example, an agent's actions in dialogue result from the interaction of her cognitive dynamics (that is, basically, the mental states



that she entertains moment by moment) with those that she ascribes to her partner(s). A communicating agent's subjectively viewed situation (see Section 3) includes the partner's presumed mental states; her actions consist in speech acts. While it is impossible to know the details of each other's respective situation and mental states, agents must be able to understand at least an outline of them. Mindreading thus gains a crucial role in communication, the other key element of which is the capability to plan and produce speech acts so appropriately that the partner's mental states are modified as desired. Some researchers are very critical of the notion of a Theory of mind: e.g., Gallagher [30] and Gallagher and Hutto [31] reject it on the basis of their phenomenological approach to intersubjectivity. This appears to also imply a rejection of the idea that human communication builds upon mindreading. However sympathetic with these perspectives and proposals, we do not feel that the notion of mindreading is completely devoid of usefulness [32], at least as long as it is not cast in classical cognitive or modular terms. We can imagine Bob wondering whether Ann actually wanted to suggest that he take an umbrella when going to work or that he not water the flowers before going. In general, it is normal for humans to ask themselves and others explicit questions about someone's "real" thoughts and feelings and to look for rational answers to them. We agree that mindreading heavily leaks into a narrative experience, but we do not think that narration and theorization should necessarily be antagonistic notions. The real problem, as was the case with cooperation, is whether a theory of communication can be built upon mindreading. There can be no doubt that we sometimes resort to mindreading in communication (as there can be none that we often materially cooperate with our partners in the management of common ground during face-to-face interactions).
However, it is hard to believe that, each time a colleague or a student of ours says "hello" upon meeting us in the corridor, we remain unable to understand the meaning of that utterance until we have reconstructed what that person's mental states might have been when she uttered it. Another argument against the view that communication builds on mindreading comes from developmental considerations. The discussion on the ontogenesis of mindreading has a long and articulated history that we will not attempt to summarize here (but see [32, 33]). However, most empirical data currently available agree that infants are incapable of reading minds at least during their first 9-12 months of life. If mindreading were a necessary component of human communication, or of social cognition in general, infants younger than that would turn out to be incapable of communicating with their caregivers and of understanding the communication that the caregivers address to them [34]. This is impossible, because it would prevent them from participating in the interpersonal dynamics that are necessary for their development as persons and as members of the human species and of their cultural community. To divide the human capability of intersubjectivity into components or into logically, ontogenetically or phylogenetically successive phases [35, 36] does not help with this problem, because Grice's theory and its descendants identify communication with what is anyway the most evolved component of mindreading or the final phase of its development, that is, the capacity to explicitly form beliefs about and to reason upon a partner's mental states.



Nor, for the reasons we have already discussed, would the problem be solved by grounding early communication in the material interactions that the infant has with the caregivers [37, 38, 39].

6.2.4 Sharedness in communication

In Grice's account [24], as outlined above, the brief episode in (3) is a true instance of (successful) communication if and only if Ann, by uttering that sentence, entertains a certain set of intentions (i)-(iii) regarding Bob's mental states and Bob entertains a matching set of beliefs regarding Ann's intentions. However, this account lends itself to certain counterexamples (concerning in particular keyhole recognition) that can only be avoided if Ann also entertains an intention (iv) that her intention (ii) be recognized, an intention (v) that her intention (iv) be recognized, and so on, and if Bob entertains the corresponding set of beliefs [25, 40]. This leads into an infinite regress whereby, for any n-th intention that the agent entertains, it is always necessary that she also entertain an (n + 1)-th intention that that intention be recognized, and that the partner recognize all such intentions. This is obviously impossible for principled and practical reasons. A solution to this problem has been proposed by Airenti, Bara and Colombetti [11], who define common knowledge as a primitive, circular mental state type which they call shared belief: an agent shares the belief that p with a partner if she believes that p and that the partner shares the belief that p with her. Communication (that is, conversational cooperation; see above) is a joint activity that takes place in the space that an agent shares with a partner.
So defined, shared belief is a mental state like any other [41]: it is subjective (that is, one-sided: no collective mind is required), primitive (that is, irreducible to private beliefs), and representational (that is, relative to the viewpoint of the agent who entertains it, and not to that of the partner or to "objective truth"). An agent has neither the need nor the possibility to know what is "objectively" shared with a partner. Being ascriptional, shared belief does not require fancy abilities like telepathy or an endless circularity of reciprocal confirmations; nor does it require or allow any more reference to "objective" facts in the external world than ordinary beliefs do. It may happen that I take p to be shared with you, whereas you do not believe p or do not take p to be shared with me. The failure of a (supposedly) shared belief may give rise to different kinds of problems, exactly like the failure of a private belief, but does not create more cognitive or epistemological difficulties than the latter does. Sharedness is in the agent's mind, not in the world. The meaning of a communicative action, and even its communicative nature, is therefore, from the standpoint of the addressee, a matter of ascription. That is, Bob may wrongly take Ann's behavior as communicative when it is not, or vice versa, or take it as communicating that q while Ann meant to communicate that p. This account captures the overt and circular nature of communication in a psychologically plausible way. Sharedness is an agent's ability to construe her own mental states as mutually known to a partner. This is the starting point of communicative interaction, which may then be viewed as the progressive modification of the mental ground that each participant shares with the partner.
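The contrast between the open-ended hierarchy of intentions and the circular definition of shared belief can be set side by side in symbols. What follows is only a minimal sketch in a notation assumed here for illustration: the operators INT, BEL, REC and SH, the indices A (Ann) and B (Bob), and the numbering I_n are our shorthand, not a formalism taken from the text, and the first formula renders the general "n-th intention" schema rather than the specific conditions (i)-(v).

```latex
% Assumed notation (illustrative shorthand only):
%   \mathrm{INT}_x\,\varphi    agent x intends that \varphi
%   \mathrm{BEL}_x\,\varphi    agent x believes that \varphi
%   \mathrm{REC}_y\,\varphi    partner y recognizes that \varphi
%   \mathrm{SH}_{xy}\,\varphi  x takes \varphi to be shared with y
%   A = Ann, B = Bob, I_n = Ann's n-th intention

% Regress: every intention calls for a further intention
% that it be recognized, so the hierarchy never closes.
\[
  I_{n+1} \;=\; \mathrm{INT}_A\bigl(\mathrm{REC}_B(I_n)\bigr)
  \qquad \text{for every } n \ge 1
\]

% Shared belief as a primitive, circular state: A believes that p
% and that B shares the belief that p with her.
\[
  \mathrm{SH}_{AB}\,p \;\equiv\;
  \mathrm{BEL}_A\bigl(p \wedge \mathrm{SH}_{BA}\,p\bigr)
\]

% Substituting the same schema for \mathrm{SH}_{BA}\,p unfolds one level.
\[
  \mathrm{SH}_{AB}\,p \;\equiv\;
  \mathrm{BEL}_A\Bigl(p \wedge
  \mathrm{BEL}_B\bigl(p \wedge \mathrm{SH}_{AB}\,p\bigr)\Bigr)
\]

% One-sidedness: A's state entails neither the fact nor B's state.
\[
  \mathrm{SH}_{AB}\,p \;\not\Rightarrow\; p,
  \qquad
  \mathrm{SH}_{AB}\,p \;\not\Rightarrow\; \mathrm{SH}_{BA}\,p
\]
```

On this reading, the regress requires infinitely many distinct states, whereas the circular definition requires a single state whose nested levels can be unfolded on demand. The last line restates in symbols that sharedness is ascribed from one agent's viewpoint and may therefore fail.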



6.2.5 Communicative competence in the infant and the adult

The solution that we have advanced elsewhere [32, 33, 34] to the problem of infant communication is to employ a reformulation of Airenti, Bara and Colombetti's notion of shared belief to account for communication in the first months of life in such a way that children are viewed as fully human from the beginning, although apparently incapable of reading minds. Our proposal is to view communication as an innate competence, one component of which (namely, the ability to share) is present at birth, albeit in an early version, while another (namely, mindreading) appears at a later age. On our account, infant communication in the absence of mindreading is then possible if the child construes all of her mental states as shared with the caregivers. This is in agreement with the empirical evidence that the infant is incapable of understanding that other individuals have mental states of their own that are qualitatively similar but not necessarily identical to those that she entertains. While the classical interpretation of these data is that she must therefore live alone in a subjective world of which she is, at best, the only inhabitant endowed with a mind, ours is that she lives instead in an ever-social world where everybody simply and directly knows her feelings and thoughts. In her perspective, all of her experiential states would be intrinsically public, that is, shared with the individuals that surround her. An infant thus has no private, non-social mental states; to her, intersubjectivity and communication are a plain state of the world rather than a local, transient occurrence. This only requires a primitive recognition of agency, a capability that, according to the relevant literature, can be safely ascribed to infants not older than a few weeks [42, 43, 44, 45].
An older child's or an adult's ability not to construe all of her mental states as shared is made possible by the later development of the capability to differentiate one's own mental states from those that may be ascribed to the partner. Mindreading then builds on the latter development. The idea that sharedness is a primitive capability of the human mind has a certain amount of empirical support and contributes to founding a view of communication as a faculty, or competence, in its own right [46]. It is crucial to note here that, on this account, this ability is a mental one. Sharedness is mental: like everything mental, it is reflected in the individual's actions, but cannot be recovered from the empirical or material levels alone.

6.3 Human agency

An agent is a conscious organism who lives in a dynamic situation and strives to make it more to her liking; the situation is a subjective, open, and continually revised interpretation of the environment [29, 46]. An agent's mind consists in a flow of consciousness, that is, in a flow of subjective, meaningful representations. For our current purposes we, like other researchers [47, 48, 49], conceive of terms like mind, consciousness, representation, semantics, and Intentionality as synonymous. In the case of the human species, the mind consists in a flow of meaningful representations of the agent herself immersed in and interacting with her subjective environment as it is, was, or could be.



The agent's subjective situation is a dynamical landscape of meanings; meanings are opportunities for actions (affordances, in Gibson's terms [50, 51]). Representations neither need nor can be faithful to the real world. They are active constructions that the agent makes of an ultimately unknowable reality, by superimposing on it a subjective ontology [52] that may comprise different types of objects, relations, events, and actions. An agent's subjective ontology and the representations into which it is embedded need only be compatible with the external reality, whatever its ultimate nature may be. An agent's subjective ontology and representations result from her phylogenetic and ontogenetic history as well as from the reading that she makes of the current situation. That is, there is no a priori catalogue of discrete, pre-defined entities given once and for all and kept in a repository from which they are extracted and employed when needed; instead, the mind is continually re-created in the agent's here and now [53, 54].

More specifically, the human mind has some interesting properties. Our mental life is structured along two interwoven dimensions that may be called experience and description, or narration [55, 56]. Every experience of ours incorporates a description that arises from it and allows for its form, structure and sense, and that results from a mix of (fragments of) logical, causal, and psychological explanations, retrograde reconstructions and anterograde projections, linguistic labeling and redescriptions, narrative integrations and so on. Actually, it is imprecise to say that experience and description are interwoven: they structure and determine each other in a circular way, so much so that it is impossible to keep them separate, except for descriptive purposes. Phenomenically, they are one and the same thing.
The idea that human cognition is such a complex but unitary dynamics traces back in modern science at least to Michotte's demonstration that his subjects "directly" perceived causality even when there was none and incorporated it into their visual experience, to the point of being unable not to do so [57]. Causality thus becomes one of the crucial structural components of the subjective ontology of the human species and therefore of the world we perceive.1 When we see or think of a car we cannot help ascribing to it features that go beyond its mere visual appearance: under normal conditions, we can recognize it as an artificial object, namely a machine of sorts, we can assign a linguistic label to it, we know what it is for and how to use it, we have a sort of memory or bodily image of what it feels like to drive it or to travel in it, we have at least a rough idea of its material structure and monetary value, and so on. Our knowledge about the car is not distinct from our visual perception or imagery of it: like its shape or color, it is an ineliminable part of our perception of it. Just as we cannot possibly see the car with a shape different from the one it appears to have, we cannot see it without recognizing it as a car, knowing that it is made for driving, and so on; and, exactly like its color or its shape, the knowledge that we have of it arises in and from the interaction that we have with it as well as from other encounters with or descriptions of cars that we may have come across in the past.

1 In Michotte's experiments, subjects who were shown, for example, cartoons depicting pairs of abstract forms moving could not help interpreting them in terms of, e.g., the triangle "pursuing" the square, which was "waiting for it" or "fleeing from it", and so on. In today's terms, what Michotte was actually exploring was the human perception of Intentional causation, that is, mindreading.



Still more pervasively, our knowledge of cars modifies a whole range of activities and creates new ones. Our perception of what places can be reached within a certain time, how, and with how much effort, changes, and so do our representations of the territory within which we act or of our professional or social activities. Even our perception of the physical space that our body occupies changes when we drive. In general, our knowledge of cars modifies our representation of the world and of ourselves in the world. Such modification is not supplementary to a supposedly "basic functioning" of our mind: there is no way I can divide my experience of driving into an experience of me-without-car, plus a car with no experiential connotations, plus a superordinate description of the whole business. There exists instead the complex experience of me-in-the-car-in-the-street involved in the complex activity of driving while narrating to myself what is happening, how, and why, and what has happened immediately before, and what is going to happen next and what I can do about it.

Actions are the external counterpart of these mental dynamics. When I meet a friend, I can rejoice, smile, and shake hands with him. This happens because I represent and narrate the whole situation in which I find myself as characterized by certain features, to which I react by forming certain emotions, desires and intentions. This leads me to engage in a social activity: in the end, what happens is that I walk toward my friend, smiling and offering to shake hands with him. An agent's cognitive dynamics across time thus results from the interaction of her mind/body with the subjective internal and the external environments. The specific patterns with which this happens are rooted in her phylogenetic and ontogenetic history, as well as in her current interests and feelings. What can be said in general terms is that they depend on the worldviews that the agent maintains.
Worldviews are frameworks of interpretation that provide for the meaning that a certain situation and its current features have for a certain agent at a certain time. For example, my intention to rise from this chair, go to the fridge and take a beer only makes sense because it is part of my current worldview that I am sitting, that I could use a beer, that there is one in the refrigerator, that the floor that lies between me and the refrigerator will sustain me while I walk, that I will be able to open the refrigerator and to recognize and grab the beer can, and so on. An agent's engagement in an activity is sensibly understood only against the background provided by such worldviews [53, 54, 58]. Worldviews need not be fully represented for an agent to represent, narrate and engage in an activity; indeed, they typically are not. We usually do not even conceive of the possibility that the floor of our kitchen is not as solid as it seems; nonetheless, it is because we take its solidity for granted that we can engage in the beer-taking activity. Still, as adult human beings we can always focus on some features of the worldviews in which we are currently engaged and possibly reason upon them or verbalize them: but this is a mental and social activity in itself, in which language and education (and, in general, ontogeny) play a key role, and is not necessarily part of the beer-taking activity. Indeed, most of the time we drink beer without feeling any need to verbalize the worldviews that underlie such activity and,



even when we decide to do so, we can only focus on a small subset of the features of our worldviews.2

6.4 Communication

What, then, is communication in the human species, and what role does sharedness play in it? There are, in our view, three such roles. Firstly, sharedness has to be part of the current worldview, that is, of the background within which we participate in communication, producing and interpreting the relevant actions. Secondly, it may be a mental state or part thereof, that is, something which is present to the agent's awareness and which the agent can reason upon and verbalize. Thirdly, it plays a manifest role in the artifacts that we can materially use to communicate. Let us examine these roles in more detail.

The first is that sharedness has to be part of the communicating agent's worldview. Our communicative acts take place within the framework provided by sharedness, without our being necessarily aware of it. When we engage in a casual conversation with a colleague in the elevator that is bringing us to our floor, we do not focus on sharedness, but on the actual topic of conversation. Under normal conditions, we are not even aware that there is an issue of sharedness at play here. Yet, we speak Italian, even if both of us also speak English and French, we use kind words and a gentle voice, we trade references to previous experiences we had together, we laugh about other colleagues, and so on: all feats that we accomplish without even realizing that they are possible and meaningful only because we take it for granted that our interlocutor and we share similar knowledge, memories, feelings, mental dynamics, etc. Our partner does exactly the same. Neither has access to the other's mind, neither is likely to really question the status of sharedness, and yet communication flows smoothly. And, of course, the same happens when we write a paper (although this is a much more troublesome activity) or a message to put in a bottle and cast into the wide ocean.
Sharedness is part of the worldview that we are adopting, and can thus provide the framework of interpretation within which the various communicative acts that we exchange acquire their meaning. However, it is not necessarily part of our conscious states, or even of our engagement in and representation and narration of that conversation. Yet, we can always focus on sharedness and reason upon it or verbalize it, e.g., when we realize that a breakdown has occurred in conversation. When sharedness is actually present to our mind, that is, when we become aware that our partner and we are moving or failing to move on a shared mental ground, then an analysis in terms of private and shared beliefs may be appropriate. Analyses in terms of mental states have been standard practice in theoretical studies of communication for half a century, at least since Grice's paper [24] which we have repeatedly mentioned (see [46] for a discussion of the structure and

2 Our notion of worldviews may resemble Searle's (1983) notion of Background. While we have no space to discuss the similarities and differences, we have opted for a different label because we think that there are certain difficulties inherent to Searle's Background which we do not want to inherit here. Our notion of worldviews is meant to stand on an autonomous ground.



import of mental states talk in this area). Interestingly, even areas traditionally as far removed from phenomenology or from a holistic conception of cognition as could be, like classical cognitive science and artificial intelligence, have found themselves in need of adopting a BDI (Belief-Desire-Intention [59]) paradigm for the study of private and social action, including communication.

Communicative meanings are the material counterparts of sharedness, and this is the third role that such notion plays in our analysis.3 Independently of their material appearance, communicative meanings are virtual artifacts that are produced and function in the mental space that the interactants share. Thanks to the experience/description dynamics we have outlined, and to the capability of understanding and manipulating such dynamics, humans are capable of summarizing their situations (whether actual or not, and even insincerely) into partial descriptions and then possibly of acting so that, given the grounds provided by sharedness, the mental dynamics of other humans are properly modified. This is independent of whether it occurs with language, gestures, or even sheer silence, or of the time and space that may separate the interlocutors. Mindreading only enters the process when the partner happens to wonder what the actress's mental dynamics really were as she produced a certain communicative meaning. A rewording of this idea might be as follows: communicative meanings are reifications of the actress's situation that are externalized in a form that may become public knowledge of all the parties involved. When everything works, such public knowledge interferes with the mental dynamics of the partner(s), modifying them in the direction that the actress desired. Communication thus takes place when an actress overtly tries to interfere with some other agent's situation.
"Overtly" means that a partial comprehension of the actress's situation is intentionally shown to the partners and thus made part of their situation. With communication, part of each agent's situation is subject to the others' scrutiny and partial control. This is only possible in a species whose members are capable (i) of sharedness in the different forms we have outlined, and (ii) of externalizing a description of appropriately chosen features of their situation. Communicative meanings are partial (and not necessarily sincere) descriptions of the actress's mental dynamics, overtly reified and externalized so that the partners' mental dynamics are modified. The partner may or may not materially cooperate with this operation: in face-to-face conversations this typically (but not necessarily) happens, but in other situations it does not; yet communication takes place and can be successful in the latter case as well as in the former. During this activity, the public knowledge at play and the peculiar nature of the human mind allow the participants to "zoom in" and "out" on their respective mental states and worldviews (as well as, of course, on the actual topics of communication). Thus emerges the "choreographic" nature of communication. Such choreography, however, is seldom planned as such; rather, it emerges from the relative commonality and predictability of the participants' respective mental dynamics, as well as from social customs and conventions.

3 We have already argued that "material", in this context, may have a peculiar meaning: see the example in (4) and the relevant discussion.



6.5 Acknowledgments

Research and scholarly activities for this paper were funded by the Fondazione Cassa di Risparmio di Torino (CRT). We are grateful to Francesca Morganti for her endless patience and, together with Antonella Carassa, Livia Colle and Marianna Vallana, for several discussions about the topics dealt with herein.

6.6 References
[1] C.E. Shannon & W. Weaver, The mathematical theory of communication. Urbana, IL: University of Illinois Press, 1949.
[2] L. Wittgenstein, Philosophische Untersuchungen (Philosophical investigations). Oxford: Blackwell, 1953.
[3] H.P. Grice, Studies in the way of words. Cambridge, MA, and London: Harvard University Press, 1989.
[4] J.L. Austin, How to do things with words. London: Oxford University Press, 1962 (2nd ed. revised by J.O. Urmson & M. Sbisà, 1975).
[5] J.R. Searle, Speech acts: An essay in the philosophy of language. London: Cambridge University Press, 1969.
[6] J.R. Searle, Expression and meaning. Cambridge: Cambridge University Press, 1979.
[7] H.P. Grice, Logic and conversation. In P. Cole & J.L. Morgan (Eds.), Syntax and semantics, vol. 3: Speech acts. New York: Academic Press, 1975.
[8] P.R. Cohen & C.R. Perrault, Elements of a plan-based theory of speech acts. Cognitive Science, 3, 177-212, 1979.
[9] J.F. Allen & C.R. Perrault, Analyzing intention in utterances. Artificial Intelligence, 15, 143-178, 1980.
[10] P.R. Cohen, J. Morgan & M.E. Pollack (Eds.), Intentions in communication. Cambridge, MA: MIT Press, 1990.
[11] G. Airenti, B.G. Bara & M. Colombetti, Conversation and behavior games in the pragmatics of dialogue. Cognitive Science, 17, 197-256, 1993.
[12] H. Sacks, E.A. Schegloff & G. Jefferson, A simplest systematics for the organization of turn-taking in conversation. Language, 50, 696-735, 1974.
[13] E.A. Schegloff, G. Jefferson & H. Sacks, The preference for self-correction in the organization of repair in conversation. Language, 53, 361-382, 1977.
[14] E.A. Schegloff, The relevance of repair to syntax-for-conversation. In T. Givón (Ed.), Syntax and semantics 12: Discourse and syntax. New York: Academic Press, 1979.
[15] E.A. Schegloff, Repair after next turn: The last structurally provided defense of intersubjectivity in conversation. American Journal of Sociology, 97, 1295-1345, 1992.
[16] H.H. Clark & D. Wilkes-Gibbs, Referring as a collaborative process. Cognition, 22, 1-39, 1986.
[17] H.H. Clark & E.F. Schaefer, Contributing to discourse. Cognitive Science, 13, 259-294, 1989.
[18] H.H. Clark, Arenas of language use. Chicago, IL: University of Chicago Press, 1992.
[19] H.H. Clark, Using language. Cambridge: Cambridge University Press, 1996.
[20] M. Tomasello, The cultural origins of human cognition. Cambridge, MA & London: Harvard University Press, 1999.
[21] M. Tomasello & H. Rakoczy, What makes human cognition unique? From individual to shared to collective intentionality. Mind and Language, 18, 121-147, 2003.
[22] I. Brinck & P. Gärdenfors, Co-operation and communication in apes and humans. Mind and Language, 18, 484-501, 2003.
[23] I. Brinck, The role of intersubjectivity for intentional communication. In T. Racine, C. Sinha, J. Zlatev & E. Itkonen (Eds.), The shared mind: Perspectives on intersubjectivity. Amsterdam: Benjamins, in press.
[24] H.P. Grice, Meaning. The Philosophical Review, 67, 377-388, 1957.
[25] P.F. Strawson, Intention and convention in speech acts. The Philosophical Review, 73, 439-460, 1964.
[26] D. Premack & G. Woodruff, Does the chimpanzee have a theory of mind? Behavioral and Brain Sciences, 1, 512-526, 1978.
[27] D. Sperber & D. Wilson, Relevance. Communication and cognition. Oxford: Blackwell, 1986.


M. Tirassa and F.M. Bosco / On the Nature and Role of Intersubjectivity

[28] D. Wilson & D. Sperber, Pragmatics and modularity. In S. Davis (Ed.), Pragmatics. A reader. Oxford: Oxford University Press, 1991.
[29] M. Tirassa, Mental states in communication. Proceedings of the 2nd European Conference on Cognitive Science (ECCS '97). Manchester, UK, 1997.
[30] S. Gallagher, The practice of mind: Theory, simulation, or interaction? Journal of Consciousness Studies, 5-7, 83-108, 2001.
[31] S. Gallagher & D. Hutto, Understanding others through primary interaction and narrative. In T. Racine, C. Sinha, J. Zlatev & E. Itkonen (Eds.), The shared mind: Perspectives on intersubjectivity. Amsterdam: Benjamins, in press.
[32] M. Tirassa, F.M. Bosco & L. Colle, Rethinking the ontogeny of mindreading. Consciousness and Cognition, 15, 197-217, 2006.
[33] M. Tirassa, F.M. Bosco & L. Colle, Sharedness and privateness in human early social life. Cognitive Systems Research, 7, 128-139, 2006.
[34] F.M. Bosco & M. Tirassa, Sharedness as an innate basis for communication in the infant. Proceedings of the 20th Annual Conference of the Cognitive Science Society. Mahwah, NJ: Erlbaum, 1998.
[35] S. Baron-Cohen, Mindblindness. Cambridge, MA: MIT Press, 1995.
[36] P. Gärdenfors, Evolutionary and developmental aspects of intersubjectivity. In H. Liljenström & P. Århem (Eds.), Consciousness transitions: Phylogenetic, ontogenetic and physiological aspects. Amsterdam: Elsevier, in press.
[37] J.S. Bruner, Formats of language acquisition. American Journal of Semiotics, 1, 1-16, 1982.
[38] G. Airenti, Dialogue in a developmental perspective. Proceedings of the 6th Conference of the International Association for Dialogue Analysis. Tübingen: Niemeyer, 1998.
[39] G. Airenti, The development of the speaker's meaning. In C. Florén Serrano, C. Inchaurralde Besga & M.A. Ruiz Moneva (Eds.), Applied linguistics perspectives: Language learning and specialized discourse. Zaragoza: Anubar, 2004.
[40] S.R. Schiffer, Meaning. Oxford: Oxford University Press, 1972.
[41] B.G. Bara & M. Tirassa, A mentalist framework for linguistic and extralinguistic communication. Proceedings of the 3rd European Conference on Cognitive Science. Roma: Istituto di Psicologia del Consiglio Nazionale delle Ricerche, 1999.
[42] C. Trevarthen, Descriptive analyses of infant communicative behavior. In H. Schaffer (Ed.), Determinants of infant behavior. London: Academic Press, 1977.
[43] L. Murray & C. Trevarthen, Emotional regulation of interaction between two-month-olds and their mothers. In T.M. Field & N.A. Fox (Eds.), Social perception in infants. Norwood, NJ: Ablex, 1985.
[44] A.M. Leslie, ToMM, ToBy, and Agency: Core architecture and domain specificity. In L.A. Hirschfeld & S.A. Gelman (Eds.), Mapping the mind. Domain specificity in cognition and culture. Cambridge: Cambridge University Press, 1994.
[45] D. Premack, The infant's theory of self-propelled objects. Cognition, 36, 1-16, 1990.
[46] M. Tirassa, Communicative competence and the architecture of the mind/brain. Brain and Language, 68, 419-441, 1999.
[47] J.R. Searle, The rediscovery of the mind. Cambridge, MA: MIT Press, 1992.
[48] F. Varela, A science of consciousness as if experience mattered. In S.R. Hameroff, A.W. Kaszniak & A.C. Scott (Eds.), Toward a science of consciousness: The first Tucson discussions and debates. Cambridge, MA: MIT Press.
[49] F. Varela, E. Thompson & E. Rosch, The embodied mind. Cognitive science and human experience. Cambridge, MA: MIT Press, 1991.
[50] J.J. Gibson, The theory of affordances. In R.E. Shaw & J. Bransford (Eds.), Perceiving, acting, and knowing. Hillsdale, NJ: Erlbaum, 1977.
[51] J.J. Gibson, The ecological approach to visual perception. Boston, MA: Houghton Mifflin, 1979.
[52] M. Tirassa, A. Carassa & G. Geminiani, A theoretical framework for the study of spatial cognition. In S. Ó Nualláin (Ed.), Spatial cognition. Foundations and applications. Amsterdam and Philadelphia: Benjamins, 2000.
[53] A. Carassa, F. Morganti & M. Tirassa, Movement, action, and situation: Presence in virtual environments. Proceedings of the 7th Annual International Workshop on Presence. Valencia, Spain: Editorial Universidad Politécnica de Valencia, 2004.
[54] A. Carassa, F. Morganti & M. Tirassa, A situated cognition perspective on presence. Proceedings of the 27th Annual Conference of the Cognitive Science Society. Mahwah, NJ: Erlbaum, 2005.
[55] V.F. Guidano, Complexity of the Self: A developmental approach to psychopathology and therapy. New York: Guilford, 1987.



[56] V.F. Guidano, The Self in process. Toward a post-rationalist cognitive therapy. New York: Guilford, 1991.
[57] A. Michotte, La perception de la causalité. Louvain: Éditions de l'Institut Supérieur de Philosophie, 1946.
[58] W.J. Clancey, The conceptual nature of knowledge, situations and activity. In P.J. Feltovich, K.M. Ford & R.R. Hoffman (Eds.), Expertise in context. Cambridge, MA: AAAI Press/MIT Press, 1997.
[59] A. Rao & M. Georgeff, An abstract architecture for rational agents. Proceedings of KR 92: The 3rd International Conference on Knowledge Representation and Reasoning. San Mateo, CA: Morgan Kaufmann, 1992.


Enacting Intersubjectivity. F. Morganti et al. (Eds.). IOS Press, 2008. © 2008 The authors. All rights reserved.


Enacting Interactivity: The Role of Presence

Giuseppe RIVA

Abstract: The chapter presents a conceptual framework that links the enaction of our intentions to the understanding of other people's intentions through the concept of Presence, the feeling of being and acting in a world outside us. Specifically, the chapter suggests that humans develop intentionality and Self by prereflexively evaluating agency in relation to the constraints imposed by the environment (Presence): they are present if they are able to enact their intentions in an external world. This capacity also enables them to go beyond the surface appearance of behavior to draw inferences about other individuals' intentions (Social Presence): others are present to us if we are able to recognize them as enacting beings. Both Presence and Social Presence evolve in time, and their evolution is strictly related to the three-stage model of the ontogenesis of Self introduced by Damasio (Proto-Self, Core Self, Autobiographical Self). Moreover, we can identify higher levels of Presence and Social Presence associated with higher levels of intentional granularity: the greater the complexity of the expressed and recognized intentions, the higher the level of Presence and Social Presence experienced by the Self. In this framework, motor intentions and mirror neurons are at the basis of the intentional chain, but full intentional granularity requires the activity of higher cortical levels.

Contents
7.1 Introduction
7.2 The simulation approach and the arguments against it
7.3 What is agency
7.4 From intention to agency: the role of presence
7.5 The evolution of presence, intentions and self
7.6 Conclusions
7.7 Acknowledgments
7.8 References


G. Riva / Enacting Interactivity: The Role of Presence

7.1 Introduction
A central objective of contemporary cognitive science is the explanation of Social Cognition, the information-processing system that enables us to engage in social behavior. Specifically, social cognition addresses how people process social information: its encoding, storage, retrieval, and use in social situations. An important step towards understanding how we handle social information came from the recent discovery of neuronal resonance processes activated by the simple observation of others. Rizzolatti and colleagues found that a functional cluster of premotor neurons (F5c-PF) contains mirror neurons, a class of neurons that are activated both during the execution of purposeful, goal-related hand actions and during the observation of similar actions performed by another individual [1, 2].
The general framework outlined by these results was used by Simulation Theorists (for example, Lawrence Barsalou, Vittorio Gallese, Alvin Goldman, Jane Heal, Susan Hurley, Marc Jeannerod, Guenter Knoblich and Margaret Wilson) to support their view: the mirror system instantiates a simulation of transitive actions that is used to map the goals and purposes of others' actions [3, 4]. As clearly explained by Wilson and Knoblich [5], this is the outcome of an implicit/covert, subpersonal process: "The various brain areas involved in translating perceived human movement into corresponding motor programs collectively act as an emulator, internally simulating the ongoing perceived movement... The present proposal suggests that, in tasks requiring fast action coordination, the emulator derives predictions about the future course of others' actions, which could be integrated with the actions one is currently planning." (pp. 468-469).
In this chapter our aim is twofold: (a) we will outline three general arguments against the covert/implicit simulation approach, and (b) we will try to address them within a general framework that links the enaction of our intentions to the understanding of other people's intentions. Specifically, we suggest that humans develop intentionality and Self by evaluating agency in relation to the constraints imposed by the environment (Presence): they are present if they are able to enact their intentions in an external space. This capacity also enables them to go beyond the surface appearance of behavior to draw inferences about other individuals' intentions (Social Presence): others are present to us if we are able to recognize them as enacting beings.

7.2 The simulation approach and the arguments against it
Even though the covert/implicit simulation approach is gaining momentum within cognitive science, different authors have raised arguments against it. There are three main arguments:
- mirror neurons are not enough to explain social cognition;
- the covert simulation is not a simulation but a perceptual elicitation;
- the covert simulation is not a simulation but a sensory forward prediction.



The first argument is based on a simple consideration: we are able to mind-read the beliefs, desires, and intentions of others, and such mind reading is our primary and pervasive way of understanding their behavior. How are mirror neurons able to provide the richness required for representing a subject's social intention [6]? An interesting discussion of this topic, with questions and answers from both sides, appeared in the interdisciplinary conference "What do mirror neurons mean?", available online.
Simulationists usually answer this question by underlining the role played by the imitation process in understanding behaviors. Meltzoff, in his life-long research on infant imitation, found that newborns as young as 42 minutes old demonstrate successful facial imitation. Moreover, he found that 12- to 21-day-old infants can imitate four different adult gestures: lip protrusion, mouth opening, tongue protrusion and finger movement. Interestingly, the newborn's first response to seeing a facial gesture is the activation of the corresponding body part [7]: it is as if young infants isolate what part of their body to move before how to move it (organ identification). To explore the neural correlates of this ability, Chaminade, Decety and Meltzoff [8] designed a functional neuroimaging experiment. The results show that, when subjects imitated either the goal or the means to achieve it, overlapping activity was found in the right dorsolateral prefrontal area and in the cerebellum.
The main criticism of the possible role of imitation in understanding behaviors comes from Gergely and Csibra [9, 10]. Gergely and colleagues showed that a novel response (illuminating a box by touching it with the head), imitatively learned from the demonstration of a human model, is retained by infants in spite of the availability and production of more readily accessible and rational response alternatives (the use of the hands) that produce the same effect [11].
This suggests that imitative learning of novel actions is a qualitatively different process in humans than the imitative copying that has been demonstrated in several other animal species. Specifically, it suggests the existence of some specific processes selecting what to imitate.
The second argument against covert simulation was recently raised by Shaun Gallagher [12]. According to this author, the neuronal resonance processes allowed by mirror neurons instantiate a form of enactive social perception (a common bodily intentionality that is shared by the perceiving subject and the perceived other) that is not a simulation. As underlined by Gallagher: "The nature of the resonance processes involved in such encounters makes our perception of other conspecifics different from our perception of objects and instruments. But it does not make our perception and understanding of others the result of an implicit simulation. In effect, simulation is a personal-level concept that cannot be legitimately applied to subpersonal processes." (p. 363).
The last argument, raised by Csibra and Gergely [10, 13], is strictly related to the previous one. These authors claim that the subject already sees the meaning of the other's actions because the neuronal resonance processes (action mirroring) are generated by some form of action reconstruction (teleological reasoning). In brief, Csibra and Gergely [13] suggest that the resonance processes are not retrodictive (they do not recover the intention that generated the action) but predictive (they emulate the action needed to achieve a hypothesized goal).
In sum, in spite of the growing neuroscientific evidence that humans are endowed with a mirror system, there is no shared vision of how our brain makes use of this system. Is it really used for the development of social cognition skills? The real question, however, is whether there is a different account that can avoid these objections. We turn now to the construction of a possible alternative account, starting from the analysis of the phenomenology of agency.

7.3 What is agency
As we have seen, neurobiological models of the mirror neuron system often state that action understanding is based on mapping the surface properties of observed actions onto the observer's motor system. However, different authors (for example, Wood et al. [14]) suggest that action understanding must also involve a mechanism that evaluates action means in relation to goals, and places this analysis into a broader context that entails constraints imposed by the current environmental situation. Following this suggestion, we will start our discussion from a deeper analysis of the phenomenology of agency.

7.3.1 Agency: from intention to action and self
If actions have to be evaluated in relation to their goals, and it is possible to identify different intentional forms [15, 16], it is also possible to categorize actions according to their underlying intentions. This was one of the main efforts of the Activity Theory, a psychological approach that aimed to understand humans through an analysis of the genesis, structure and processes of their activities [17]. The Activity Theory is the result of a larger effort to develop a new psychology based on Marxist philosophy, initiated by a group of revolutionary Russian psychologists (Vygotsky, Leont'ev and Luria) in the 1920s and 1930s [18]. For these authors any activity is motivated toward the solution of a problem or purpose (object), and mediated by tools (artifacts) in collaboration with others (community). In particular, Leont'ev [19] distinguished, within the general activity of the subject, three different levels (see Figure 1) related to the different objects driving it:
- Activity is the highest level: the direct answer to a specific objective of the subject. The activity of the subject moves toward the object of a specific need and terminates when it is satisfied. Specifically, an objective is a process characterizing the activity as a whole. For example, in reference to Figure 1, the activity is to obtain a Ph.D. in Psychology. Any objective (e.g. helping anorectic girls) is closely related to a motive (e.g. the need for self-actualization), and both have to be considered in the analysis of an activity.
- Each activity is then translated into reality through a specific Action or a set of Actions. Each action is a process performed with conscious thought and effort, planned and directed towards achieving a goal. In reference to Figure 1, the activity "obtain a Ph.D." is translated into a set of actions: going to the library to search the sources, preparing an index, discussing it with the tutor, etc. Each action can then be split into sub-actions, each related to a sub-goal: searching for books on eating disorders, writing the first chapter outline, etc.
- Actions and sub-actions are developed through Operations: if actions are connected to conscious goals, operations are related to behaviors performed automatically. In reference to Figure 1, the operation of taking notes in an exercise book is done automatically, without a conscious focus on the movement of the fingers. All operations (e.g. the movements of the fingers to guide the pen) are, however, oriented by some conditions: specific constraints and affordances related to the characteristics of a given tool (such as the size of the paper or the shape of the pen) that influence the outcome of the operation.
In sum, any human activity is directed toward a specific object. Moreover, it is possible to identify three different levels of human activity (Activity, Action, Operation; see Figure 1) according to their specific object (Motive, Goal, Condition). Further, any activity level can move both up and down (e.g., an Operation can become an Action) according to learning and environmental conditions.

Figure 1: The structure of agency
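The three-level structure summarized above can also be written down as a small tree. The following is only an illustrative sketch: the names `Level`, `Node`, `walk` and `promote` are hypothetical, and the placement of note-taking under the library visit is an assumption made for the example rather than something stated by the Activity Theory or by Figure 1.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, List


class Level(Enum):
    """Each level of activity is paired with the kind of object driving it."""
    ACTIVITY = "motive"
    ACTION = "goal"
    OPERATION = "condition"


@dataclass
class Node:
    name: str
    level: Level
    children: List["Node"] = field(default_factory=list)

    def walk(self) -> Iterator["Node"]:
        """Yield this node and every node below it, top-down."""
        yield self
        for child in self.children:
            yield from child.walk()


def promote(node: Node) -> None:
    """Move an Operation up to the Action level (levels are not fixed)."""
    if node.level is Level.OPERATION:
        node.level = Level.ACTION


# The Ph.D. example from the text, arranged hypothetically.
notes = Node("take notes in an exercise book", Level.OPERATION)
library = Node("go to the library to search the sources", Level.ACTION, [notes])
index = Node("prepare an index", Level.ACTION)
phd = Node("obtain a Ph.D. in Psychology", Level.ACTIVITY, [library, index])

for node in phd.walk():
    print(f"{node.level.name:<9} driven by a {node.level.value}: {node.name}")
```

Calling `promote(notes)` models the upward movement mentioned in the text: an automatic operation becomes a consciously guided action again.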

The structure of agency suggested by the Activity Theory has many similarities with the Dynamic Theory of Intentions presented by Pacherie [16, 20]. According to this author, it is possible to identify three different categories or forms of intentions using their different roles and contents (see Figure 1): distal intentions (D-intentions), proximal intentions (P-intentions) and motor intentions (M-intentions):
- D-intentions (Future-directed intentions). These intentions are terminators of practical reasoning about ends and have conceptual and descriptive contents. They also act both as intra- and interpersonal coordinators, and as prompters of practical reasoning about means and plans. D-intentions almost overlap with objectives as defined by the Activity Theory: in the activity described in Figure 1, helping anorectic girls is a D-intention.
- P-intentions (Present-directed intentions). These intentions are responsible for high-level (conscious) forms of guidance and monitoring. In more detail, they have to ensure that the imagined actions become current through situational control of their unfolding. P-intentions are similar to goals as defined by the Activity Theory: in the activity described in Figure 1, preparing the dissertation is a P-intention.
- M-intentions (Motor intentions). These intentions are responsible for low-level (unconscious) forms of guidance and monitoring: we may not be aware of them and have only partial access to their content. Further, their contents are not propositional. As before, M-intentions are quite similar to conditions as defined by the Activity Theory: in the activity described in Figure 1, the motor representations required to move the pen are M-intentions.
In sum, each intentional level has its own role: the rational (D-intentions), situational (P-intentions) and motor (M-intentions) guidance and control of action. Moreover, as suggested by the Activity Theory, they form an intentional cascade [16, 20]: higher intentions generate lower intentions.
Activity Theory also suggests that human activity should be analyzed in the context of development. Specifically, Vygotsky [21, 22] states that internalization and externalization are the dialectical mechanisms behind the development of the Self. On one side, external activity transforms internal cognitive processes (internalization). On the other side, knowledge structures and moments of internal activity organize and regulate external social processes (externalization). It is interesting to note that the three-level structure of agency suggested by the Activity Theory is very close in certain respects to the three-stage model of the ontogenesis of Self introduced by Damasio (Figure 2).
This author distinguishes between a preconscious precedent of Self and two distinct notions of self-consciousness [23, 24]:
- the Proto Self: a coherent collection of neural patterns that map, moment by moment, the physical state of the organism;
- the Core Self: a transient entity which is continuously generated through encounters with objects;
- the Autobiographical Self: a systematic record of the more invariant properties that the organism has discovered about itself.
In this vision, the basis for a conscious Self is a feeling state that arises when organisms represent a non-conscious Proto Self in the process of being modified by objects. In essence, the sense of self depends on the creation of a second-order mapping, in certain brain regions (brainstem nuclei, hypothalamus, medial forebrain and insular and somatosensory cortices), of how the Proto Self has been altered [23]. However, it is only the Autobiographical Self that generates the subjective experience of possessing a transtemporal identity.



Figure 2: From self to agency

7.4 From intention to agency: the role of presence
Integrating the previous theories, the goal of this section is to outline a conceptual framework directly linking Self, intentions and activity through the concept of Presence, the feeling of being and acting in a world outside us. One key assumption guiding this attempt is that the three levels of Self identified by Damasio can be directly connected (see Figure 2) to specific intentional forms and activities (intentional granularity). In more detail, we suggest that humans develop intentionality and Self by prereflexively evaluating their agency in relation to the constraints imposed by the environment: they are present if they are able to enact their intentions. This capacity also enables them to go beyond the surface appearance of behavior to draw inferences about other individuals' intentions: others are present to us if we are able to recognize them as enacting beings.
The next sections develop these points. In Section 7.4.1 we will introduce the concept of Presence by describing the link between action and perception. Section 7.4.2 will introduce the phenomenology of Presence, differentiating between Presence-as-process and Presence-as-feeling. In Section 7.4.3 we will discuss the concept of Proto Naked Intentionality, the innate human ability to recognize M-intentions within the perceptual field. Finally, in Section 7.4.4 we will introduce Social Presence, a cognitive process that evaluates intentions using the same predictive model used by Presence.

7.4.1 The concept of presence
In its more general use, the term Presence has referred to a widely reported sensation experienced during the use of virtual reality or other media [25-27].
However, a growing number of researchers consider Presence a neuropsychological phenomenon, evolved from the interplay of our biological and cultural inheritance, whose goal is to produce a strong sense of agency and control [28-33]: Presence as the feeling of being and acting in a world outside me.
To understand the relationship between Presence and action we have to start from the link between percept and behavior: recent neuropsychological research has shown that the contents of the subject's perception guide action in space and locate the subject in the perceived world [34, 35]. In other words, as suggested previously by Piaget (assimilation) and Gibson (affordance), we conceive places in terms of the actions we could take towards them: the subject does not have separate knowledge of the place's location relative to him/her, of what he/she can do in it, and of his/her purposes. Extending this vision, Waskan [36] suggests that we represent phenomena by thinking in terms of the mechanisms by which the phenomena may be produced.
An example can help in understanding this point. Retrieving an occluded object (e.g. when we lift a book to retrieve a pen from under it) is an action taken on the basis of a belief about where the pen is located relative to the Self. In sum [36], "one cannot see a place as being there1 rather than there2 without knowing what it would be to act there1 rather than there2." (p. 170, our italics). It follows that to know that the pen exists when it is occluded is a matter of knowing what can be done to make the pen visible. Moreover, if I want to grab the pen, its spatial position will be represented in terms of the movements needed to reach for it. Further, its shape and size will be represented in terms of the type of handgrip it affords. In other words [36], humans "harbor and manipulate specific, intrinsic, cognitive models of complex, inter-dimensional, worldly constraints" (p. 195).
Recently Proffitt [37] provided experimental support for this vision: his data showed that, under conditions of constant visual stimulation, the apparent dimensions of surface layout expand and contract with changes in the energetic costs associated with intended actions. In sum, the explicit awareness of spatial layout varies not only with relevant optical and ocular-motor variables, but also as a function of the costs associated with performing intended actions. This experimental result is backed by the discovery of two different visual systems [38]:
- Vision for Action. It extracts from the visual stimuli the information used to build the motor representations that effect rapid visuo-motor transformations;
- Vision for Semantical Perception. It allows the identification and recognition of objects and scenes.
In sum, the subject locates himself/herself in an external space according to the actions he/she can perform in it. In other words, the subject is present in a space if he/she can act in it. Moreover, the subject is present in the space, real or virtual, in which he/she can act.
According to this vision, Presence has a simple but critical role in our everyday experience: the control of agency (enaction of intentions) through the unconscious separation of internal and external [39, 40]. Within this view, Presence is defined as the non-mediated (prereflexive) perception of successfully transforming an intention into action (enaction) within an external world [41]. The recent research of Haggard and Clark [42, 43] on voluntary and involuntary movements provides direct support for the existence of a specific cognitive process binding intentions with actions. In their words [43]: "Taken as a whole, these results suggest that the brain contains a specific cognitive module that binds intentional actions to their effects to construct a coherent conscious experience of our own agency." (p. 385).



7.4.2 The phenomenology of presence
From a phenomenological viewpoint, it is critical to distinguish between Presence-as-process and Presence-as-feeling. Presence-as-process is the continuous activity of the brain in separating internal and external within different kinds of afferent and efferent signals. As clarified by Russell [44], and in agreement with Gallagher: "Action-monitoring is a subpersonal process that enables the subjects to discriminate between self-determined and world-determined changes in input. It can give rise to a mode of experience (the experience of being the cause of altered inputs and the experience of being in control) but it is not itself a mode of experience." (p. 263).
From the computational viewpoint, this is achieved through a forward-inverse model:
- first, the agent produces the motor command for achieving a desired state given the current state of the system and the current state of the environment;
- second, an efference copy of the motor command is fed to a forward dynamic model that generates a prediction of the consequences of performing this motor command;
- third, the predicted state is compared with the actual sensory feedback. Errors derived from the difference between the desired state and the actual state can be used to update the model and improve performance.
As a result, when we move, much of what we perceive as action is tagged to our intention to move rather than to our perception of what has happened as a result of the movement. For instance, Fourneret and Jeannerod [45] have shown that, in a reaching task, we are more aware of where we direct the movement of the arm and hand (and where it appears to go) than of where the hand actually moves. For this reason, Presence-as-feeling (the non-mediated, prereflexive perception that the agent's intentions are successfully enacted) is not separated from the experience of the subject but is directly related to it.
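The three steps of the forward-inverse model listed in this section can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions that are not part of the chapter: a one-dimensional reaching task, a linear "body" whose true gain (`true_gain`) differs from the agent's internal estimate (`gain_estimate`), and a simple proportional update rule with a hypothetical `learning_rate`. All names are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ForwardInverseModel:
    gain_estimate: float = 1.0  # hypothetical internal estimate of the body's gain
    learning_rate: float = 0.5  # hypothetical adaptation rate

    def inverse(self, state: float, desired: float) -> float:
        """Step 1: motor command expected to bring `state` to `desired`."""
        return (desired - state) / self.gain_estimate

    def forward(self, state: float, command: float) -> float:
        """Step 2: predict the outcome from an efference copy of the command."""
        return state + self.gain_estimate * command

    def update(self, command: float, predicted: float, actual: float) -> float:
        """Step 3: compare prediction with feedback and adapt the model."""
        error = actual - predicted
        if command != 0.0:
            self.gain_estimate += self.learning_rate * error / command
        return error


def body(state: float, command: float, true_gain: float = 0.8) -> float:
    """Stand-in for the body and environment (an assumption of this sketch)."""
    return state + true_gain * command


def run_trials(model: ForwardInverseModel, target: float, trials: int) -> List[float]:
    """Repeat a reach from rest to `target`; return the prediction errors."""
    errors = []
    for _ in range(trials):
        command = model.inverse(0.0, target)
        predicted = model.forward(0.0, command)
        actual = body(0.0, command)
        errors.append(model.update(command, predicted, actual))
    return errors


model = ForwardInverseModel()
errors = run_trials(model, target=1.0, trials=10)
print("first error:", errors[0], "last error:", errors[-1])
print("learned gain:", model.gain_estimate)
```

In this sketch the predicted state coincides with the desired state, because the inverse model is the exact inverse of the forward model; the prediction error is therefore also the gap between the desired and the actual outcome, and it shrinks from trial to trial as the internal estimate approaches the true gain.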
Presence-as-feeling corresponds to what Heidegger [46] defined as "the interrupted moment of our habitual standard, comfortable being-in-the-world". In fact, a higher level of Presence-as-feeling is experienced by the Self as a better quality of action and experience [32]. Moreover, the agent directly perceives only the variations in the level of Presence-as-feeling: breakdowns and optimal experiences [41].
At this point we can argue that it is the feeling of Presence that provides the agent with feedback about the status of its activity: the agent perceives the variations in the feeling of Presence and tunes its activity accordingly. Specifically, the agent tries to overcome any breakdown in its activity and searches for engaging and rewarding activities (optimal experiences).

7.4.3 Proto naked intentionality: the innate ability to recognize M-intentions
In the previous section we suggested that Presence allows the subject to monitor the enaction of his/her intentions. However, how can we recognize intentions in others: how can we distinguish between a blink and a wink?


G. Riva / Enacting Interactivity: The Role of Presence

There is a large body of evidence indicating that infants, even in the first months of life, show a special sensitivity to communication and participate in emotional sharing with their caregivers [47]. Trevarthen [48, 49] argues that an infant is conscious, from birth, of others' subjectivity: he/she is conscious of others' mental states and reacts in communicative, emotional ways so as to link each other's subjectivity. Meltzoff goes further [7, 50-52], proposing the existence of a biological mechanism allowing infants to perceive others as like them at birth. Extending this vision, Tirassa and colleagues [53] argue that infants are in a particular state that they define as sharedness: the infant's capability to take it for granted that the caregiver is aware of his/her mental states and will act accordingly. In this vision the infant considers his/her own mental states as mutually and overtly known to the caregiver.

A more radical position was recently suggested by Jeannerod and Pacherie [54]. In their view infants have a direct ability (naked intentionality) to recognize intentional behaviors in their perceptual field. Specifically, these intentions are naked, that is, not directly attributed to a subject: "Our contention is that this [premotor] cortical network provides the basis for the conscious experience of goal-directedness, the primary awareness of intentions, but does not by itself provide us with a conscious experience of Self- or Other-agency. We can be aware of an intention, without by the same token being aware of whose intention it is... something more than the sole awareness of a naked intention is needed to determine its author." (p. 140).

However, other scholars have argued against this position. For instance, Gallagher [12] argued that: "Phenomenologically (experientially) intentions in almost all cases come already fully clothed in agent specification. The 'who' question does not come up at the level of experience, because the neural systems have already decided the issue. The wonderful thing about the 'Who' system is that it is completely neurological and subpersonal." (p. 358). Further, Legrand [55] underlines that: "Mechanisms of identification and attribution are necessary in order to disambiguate 'naked intentions' and attribute the action/intention to an identified agent. However, this implies focusing exclusively on consciousness of the agent-as-object leaving aside its foundation: the primary experience of oneself as an agent-as-subject, at a pre-reflective level." (p. 475).

In general, we agree with both remarks. It is true that our direct perception is highly reliable in discriminating between Self and non-Self. Further, it is true that this discrimination is completely neurological and sub-personal. Finally, we agree that the experience of the agent-as-subject remains prior to any intentional process of self-identification. Nevertheless, we take a related but different position. Following Jeannerod and Pacherie [54] we believe that infants have a direct ability to recognize intentional motor behaviors in their perceptual field. However, there are two critical differences between our position and the one presented by these authors:
- Only M-Intentions are naked at birth, because they are the only ones available at that time.
- It is through Presence that neonates differentiate between internal and external intentions, between their actions and those of others.

In sum, infants have naked proto-intentionality: a primitive and innate mental state type, which can be characterized in the following terms: to be able to recognize a motor intention (M-intention) without being aware of whose intention it is. This position is not so far from what was suggested by Meltzoff and Brooks [56]: "Evidently, infants construe human acts in goal-directed ways. But when does it start? We favor the hypothesis that it begins at birth. The hypothesis is not that neonates represent goal directedness in the same way as adults do. In fact, neonates probably begin by coding the goals of pure body acts and only later enrich the notion of goals to encompass object directed acts." (p. 188). Moreover, it is through Presence, through the development of a common spatial and temporal framework with external objects [57], that the agent becomes a self, able to differentiate between internal and external intentions/actions. However, the emergence of the Self also leads to the recognition of the Other as another intentional Self.

7.4.4 From presence to social presence

Even if Presence allows the identification of the Other as another intentional Self, we need a new cognitive process (Social Presence), different from but directly connected to Presence, that tracks the behavior of the Other in order to understand his/her intentions.
In fact, naked proto-intentionality allows infants to detect intentionality (they recognize that an M-intention is being enacted) but neither to detect higher-level intentions (they do not recognize D-intentions and P-intentions) nor to identify the motives of motor behaviors (they do not recognize why the specific M-intention is being enacted). In more detail, we define Social Presence as the non-mediated (prereflexive) perception of an enacting Other within an external world. As for Presence, we distinguish between Social-Presence-as-process and Social-Presence-as-feeling.

Social-Presence-as-process is the continuous activity of the brain in identifying Others' intentions within the perceptual field. It can thus be described as a sophisticated form of monitoring of others' actions that is transparent to the Self but critical for its social abilities. Following Csibra and Gergely [10], we suggest that this process is not retrodictive (it does not recover the intention that generated the action) but predictive (it emulates the action needed to achieve a hypothesized goal). From the computational viewpoint, it follows the same approach used by Presence-as-process:
- first, the agent recognizes the motor command, the current state of the other agent and the current state of the environment;
- second, an efference copy of the motor command is fed to a forward dynamic model that generates a prediction of the consequences of performing this motor command;
- third, the predicted state is compared with the actual sensory feedback. Errors derived from the difference between the predicted state and the actual state can be used to update the model and improve performance.

Supporting this vision, Oztop and colleagues [58] showed that the motor modules of the observer can be used in a predictive mode to infer the mental state of the actor. According to their model, mirror neurons can be involved in the sensory forward prediction of goal-directed movements, which are activated both for mental simulation during action observation and for feedback-delay compensation during movement. Recently, Kilner and colleagues [59] introduced a predictive coding framework for mirror neurons on the basis of a statistical approach known as empirical Bayesian inference. Within this scheme, the most likely cause of an observed action can be inferred by minimizing the prediction error at all levels of the cortical hierarchy that are engaged during action observation. From an evolutionary viewpoint this approach has two strengths. First, it can be seen as the brain's attempt to minimize the free energy induced by a stimulus by encoding its most likely cause [59]. Second, the recognition of others' intentions using a forward model allows interpretation without prior experience: as long as an intentional movement or behavior is in the repertoire of the Self, it will be interpretable without any training.

Social-Presence-as-feeling is instead the non-mediated perception of others' intentions. The concept of Social-Presence-as-feeling is similar to the concept of intentional attunement suggested by Gallese [60, 61]: our capacity to prereflexively identify with others.
In fact, Social-Presence-as-feeling is not separated from the experience of the subject but is related to the quality of his/her social interactions. The Self experiences Social-Presence-as-feeling reflexively only when the quality of his/her experience is modified during a social interaction: according to the level of Social Presence experienced by the subjects, they will experience intentional opacity on one side, and communicative attuning and synchrony on the other [62].

7.5 The evolution of presence, intentions and self

A key assumption of the model we have just presented is a strict link between intentions, Self and Presence. Here we add a broader claim: Presence and Social Presence evolve in time, and their evolution is strictly related to the evolution of the Self. Specifically, following the three-stage model of the ontogenesis of Self (Proto-Self, Core Self, Autobiographical Self) proposed by Damasio [24], we can identify higher levels of Presence and Social Presence associated with higher levels of intentional granularity.

Figure 3: The evolution of self, presence and social presence

As shown in Figure 3, the higher the complexity of the enacted and recognized intentions, the higher the level of Presence and Social Presence experienced by the Self. In proto naked intentionality the structure of the intention includes action and goal only. When the Self experiences the highest level of Presence and Social Presence, he/she is able to express, enact and recognize complex intentions including Subject, Action, Goal, Object, Way of Doing and Motive. In sum, the enaction and recognition of high-level intentions (D-Intentions) requires higher levels of Presence and Social Presence. In the next two sections we introduce them (for a broader and more in-depth description of the layers and their interaction see [39, 41]).

7.5.1 The layers of presence

Even if Presence is a unitary feeling, recent neuropsychological research has shown that, on the process side, it can be divided into three different layers/subprocesses, phylogenetically different and strictly related to the evolution of Self [24]:
- Proto Presence (Self vs. non-Self; M-Intentions);
- Core Presence (Self vs. present external world; P-Intentions);
- Extended Presence (Self relative to present external world; D-Intentions).

More precisely, we can define Proto Presence as the process of internal/external separation related to the level of perception-action coupling (Self vs. non-Self). The more the organism is able to couple perceptions and movements correctly, the more it differentiates itself from the external world, thus increasing its probability of surviving. Proto Presence allows the enaction of M-Intentions only.

Core Presence can be described as the activity of selective attention made by the Self on perceptions (Self vs. present external world): the more the organism is able to focus on its sensorial experience by leaving the remaining neural processes in the background, the more it is able to identify the present moment and its current tasks, increasing its probability of surviving. Core Presence allows the enaction of M-Intentions and P-Intentions only.

The role of Extended Presence is to verify the relevance to the Self of experienced events in the external world (Self relative to the present external world). The more the Self is present in relevant experiences, the more it will be able to reach its goals, increasing the possibility of surviving. Following the Sperber and Wilson approach [63], an input is relevant when its processing yields a positive cognitive effect: a worthwhile difference to the Self's representation of the world. Only with Extended Presence is the agent able to enact all three levels of intentions.

7.5.2 The layers of social presence

The study of infants and the analysis of their ability to understand and interact with people suggest that Social Presence, too, on the process side includes three different layers/subprocesses, phylogenetically different but mutually inclusive:
- Proto Social Presence (there is an other intentional Self);
- Interactive Social Presence (the intention of the Other is toward the Self);
- Shared Social Presence (the Self and the Other share the same intention).

More precisely, we can define Proto Social Presence as the process allowing the identification of other intentional selves in the phenomenological world (there is an other intentional Self).
The more the Self is able to identify other selves, the greater the possibility of starting an interaction, thus increasing its probability of surviving. Proto Social Presence allows the recognition of M-Intentions only.

Interactive Social Presence can be described as the process allowing the identification of communicative intentions in other selves (the intention of the Other is toward the Self). The more the Self is able to identify a communicative intention in other selves, the greater the possibility of starting an interaction, thus increasing its probability of surviving. Interactive Social Presence allows the recognition of M-Intentions and P-Intentions only.

Finally, the role of Shared Social Presence is to allow the identification of intentional congruence and attunement in other selves (the Self and the Other share the same D-intention). The more the Self is able to identify intentional attunement in other selves, the greater the possibility of conducting an interaction, thus increasing its probability of surviving.
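The cumulative relation between the layers and the intention levels they support can be summarized in a small lookup sketch. The layer identifiers and function names are hypothetical labels introduced only for this illustration. The text states the limits of the first two social layers explicitly, while treating Shared Social Presence as covering D-intentions is our reading of the statement that the Self and the Other share the same D-intention.

```python
from enum import IntEnum


class Intention(IntEnum):
    """Intention levels used in the chapter, ordered from lowest to highest."""
    M = 1  # M-intentions (motor)
    P = 2  # P-intentions
    D = 3  # D-intentions


# Highest intention level that each layer supports (Sections 7.5.1 and 7.5.2).
# The string keys are hypothetical identifiers, not terms from the chapter.
PRESENCE_LAYERS = {
    "proto": Intention.M,
    "core": Intention.P,
    "extended": Intention.D,
}
SOCIAL_PRESENCE_LAYERS = {
    "proto_social": Intention.M,
    "interactive_social": Intention.P,
    "shared_social": Intention.D,  # assumption: inferred from the shared D-intention
}


def can_enact(layer: str, intention: Intention) -> bool:
    """True if the given Presence layer allows enacting this intention level."""
    return intention <= PRESENCE_LAYERS[layer]


def can_recognize(layer: str, intention: Intention) -> bool:
    """True if the given Social Presence layer allows recognizing this level."""
    return intention <= SOCIAL_PRESENCE_LAYERS[layer]
```

Because the levels are ordered, each layer includes the capabilities of the layers below it, which mirrors the description of the layers as mutually inclusive.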

7.6 Conclusions

In this chapter we tried to show that the concepts of Presence (the non-mediated, prereflexive perception of successfully transforming an intention into action, i.e. enaction, within an external world) and Social Presence (the non-mediated perception of an enacting Other within an external world) can offer a conceptual framework for understanding the link between the enaction and the recognition of intentions. Through Presence, the agent prereflexively controls his/her action through a forward-inverse model: the prediction of the action is compared with perceptual inputs to verify its enaction. Through Social Presence, the agent prereflexively recognizes and evaluates the action of others using the same forward-inverse model: the prediction of the action is compared with perceptual inputs to verify its enaction.

Both Presence and Social Presence evolve in time, and their evolution is strictly related to the evolution of the Self. Following Damasio's three-level model of Self (Proto-Self, Core Self, Autobiographical Self) we can identify higher levels of Presence and Social Presence associated with higher levels of intentional granularity. In this framework, motor intentions are at the basis of the intentional chain but inherit their goal from higher-level intentions. In other words, mirror neurons have a direct role in the enaction and recognition of M-intentions only.

On one side, mirror neurons are activated in P-intentions and D-intentions only within the intentional/activity chain generated by high-level intentions. Recently, Cheng and colleagues [64] provided initial empirical support for this vision. They used a functional magnetic resonance experiment to demonstrate that motivation can influence activity in the human mirror-neuron system. They state: "[The results] indicate that the motivational state of the organism affects neural systems involved in perception-action coupling mechanism. We speculate that the signals arising from the neural systems involved with drive (orbitofrontal cortex) and motivation (amygdala) enhance the activity in the mirror-neuron system to prepare the organism to behave." (p. 1983).
On the other side, mirror neurons are not directly involved in the recognition of P-Intentions and D-Intentions. As recently shown by Brass and colleagues [65], the description of the goal (P-intention) of an observed action (the operation of a light switch with the knee) is not encoded by the mirror neuron system. However, as predicted by our framework, the mirror neuron system encodes its conditions (M-intention), the short-term intentions necessary to enact the goal. Within this view, signals encoding higher-level attributes of an observed action are probably expressed by the activity in higher cortical levels, whereas those encoding lower-level attributes, such as the goal and the kinematics of the movement, may be expressed in lower cortical levels.

Finally, the prediction of others' intentions is strictly related to the enaction of my own: I can predict what I can enact. Strong experimental support for this claim comes from a recent study by Calvo-Merino and colleagues [66] comparing dancers and non-dancers. In their study, the dancers' mirror neurons showed more activity when they saw movements they had been trained to perform than when they observed movements they had not been trained to perform. Moreover, the mirror system in the non-dancers showed appreciably less activity while watching the videos than either of the dancers' mirror systems.

Obviously, this chapter has its limitations: the framework introduced here is still in progress, and some of the claims presented require additional theoretical work and empirical confirmation. Nevertheless, quite independently of the intricacies of terminology and conceptualizations, we hope that the Presence framework will help to disentangle the variety of claims and theories that characterizes intersubjectivity research.

7.7 Acknowledgments

The present work was supported by the Italian MIUR FIRB programme (Project IVT2010, Immersive Virtual Telepresence (IVT) for Experiential Assessment and Rehabilitation, RBIN04BC5C) and by the European Union IST Programme (Project PASION, Psychologically Augmented Social Interaction over Networks, IST-2005-027654).

7.8 References
[1] V. Gallese, L. Fadiga, L. Fogassi & G. Rizzolatti, Action recognition in the premotor cortex. Brain, 119, 593-609, 1996.
[2] G. Rizzolatti, L. Fadiga, V. Gallese & L. Fogassi, Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3, 131-141, 1996.
[3] V. Gallese, Embodied simulation: From neurons to phenomenal experience. Phenomenology and the Cognitive Sciences, 4, 23-48, 2005.
[4] L.W. Barsalou, Situated simulation in the human conceptual system. Language and Cognitive Processes, 18, 513-562, 2003.
[5] M. Wilson & G. Knoblich, The case for motor involvement in perceiving conspecifics. Psychological Bulletin, 131(3), 460-473, 2005.
[6] P. Jacob & M. Jeannerod, The motor theory of social cognition: A critique. Trends in Cognitive Sciences, 9, 21-25, 2005.
[7] A.N. Meltzoff & M.K. Moore, Imitation of facial and manual gestures by human neonates. Science, 198, 702-709, 1977.
[8] T. Chaminade, A.N. Meltzoff & J. Decety, Does the end justify the means? A PET exploration of the mechanisms involved in human imitation. Neuroimage, 15, 318-328, 2002.
[9] G. Gergely & G. Csibra, The social construction of the cultural mind: Imitative learning as a mechanism of human pedagogy. Interaction Studies, 6, 463-481, 2005.
[10] G. Csibra & G. Gergely, Social learning and social cognition: The case for pedagogy. In Y. Munakata & M.H. Johnson (Eds.), Processes of change in brain and cognitive development. Attention and performance XXI (pp. 249-274). Oxford University Press: Oxford, 2006.
[11] G. Gergely, H. Bekkering & I. Kiraly, Rational imitation in preverbal infants. Nature, 415(6873), 755, 2002.
[12] S. Gallagher, Simulation Trouble. Social Neuroscience, 2(3-4), 353-365, 2007.
[13] G. Csibra & G. Gergely, Obsessed with goals: Functions and mechanisms of teleological interpretation of actions in humans. Acta Psychologica, 124, 60-87, 2007.
[14] J.N. Wood, D.D. Glynn, B.C. Phillips & M.D. Hauser, The perception of rational, goal-directed action in nonhuman primates. Science, 317(5843), 1402-5, 2007.
[15] J. Searle, Intentionality: An essay in the philosophy of mind. New York: Cambridge University Press, 1983.
[16] E. Pacherie, Toward a dynamic theory of intentions. In S. Pockett, W.P. Banks & S. Gallagher (Eds.), Does consciousness cause behavior? (pp. 145-167). MIT Press: Cambridge, MA, 2006.
[17] V. Kaptelinin & B. Nardi, Acting with Technology: Activity Theory and Interaction Design. Cambridge, MA: MIT Press, 2006.
[18] B. Nardi (Ed.), Context and consciousness: Activity theory and Human-Computer Interaction. MIT Press: Cambridge, MA, 1996.
[19] A.N. Leontjev, Problems of the Development of Mind. Moscow: Progress, 1981.
[20] E. Pacherie, The phenomenology of action: A conceptual framework. Cognition, in press: doi:10.1016/j.cognition.2007.09.003.
[21] L.S. Vygotsky, Mind in society: The development of higher psychological processes. Harvard University Press: Cambridge, MA, 1978.
[22] L.S. Vygotsky, Thought and language. Cambridge, MA: MIT Press, 1965.
[23] R.J. Dolan, Feeling the neurobiological self. Nature, 401, 847-848, 1999.
[24] A. Damasio, The Feeling of What Happens: Body, Emotion and the Making of Consciousness. San Diego, CA: Harcourt Brace and Co, Inc, 1999.
[25] J.S. Steuer, Defining virtual reality: Dimensions determining telepresence. Journal of Communication, 42(4), 73-93, 1992.
[26] M. Slater & S. Wilbur, A framework for immersive virtual environments (FIVE): Speculations on the role of presence in virtual environments. Presence: Teleoperators and Virtual Environments, 6(6), 603-616, 1997.
[27] W. Wirth, T. Hartmann, S. Bocking, P. Vorderer, C. Klimmt, H. Schramm, T. Saari, J. Laarni, N. Ravaja, F.R. Gouveia, F. Biocca, A. Sacau, L. Jancke, T. Baumgartner & P. Jancke, A Process Model of the Formation of Spatial Presence Experiences. Media Psychology, 9(3), 493-525, 2007.
[28] G. Riva, F. Davide & W.A. IJsselsteijn (Eds.), Being There: Concepts, effects and measurements of user presence in synthetic environments. In G. Riva & F. Davide (Eds.), Emerging Communication: Studies on New Technologies and Practices in Communication. IOS Press: Amsterdam, 2003.
[29] J.A. Waterworth & E.L. Waterworth, Focus, Locus, and Sensus: The three dimensions of virtual experience. Cyberpsychology and Behavior, 4(2), 203-213, 2001.
[30] G. Mantovani & G. Riva, "Real" presence: How different ontologies generate different criteria for presence, telepresence, and virtual presence. Presence: Teleoperators and Virtual Environments, 8(5), 538-548, 1999.
[31] T. Schubert, F. Friedmann & H. Regenbrecht, The experience of presence: Factor analytic insights. Presence: Teleoperators and Virtual Environments, 10(3), 266-281, 2001.
[32] P. Zahorik & R.L. Jenison, Presence as being-in-the-world. Presence: Teleoperators and Virtual Environments, 7(1), 78-89, 1998.
[33] J.A. Waterworth & E.L. Waterworth, The meaning of presence. Presence-Connect, 3(2), 2003. Online: worthFeb1020031217.html.
[34] M. Matelli & G. Luppino, Parietofrontal circuits for action and space perception in the macaque monkey. Neuroimage, 14(1 Pt 2), S27-32, 2001.
[35] A. Postma, Space: from perception to action. Acta Psychologica, 118(1-2), 1-6, 2005.
[36] J. Waskan, Models and Cognition. Cambridge, MA: MIT Press, 2006.
[37] D.R. Proffitt, Embodied Perception and the Economy of Action. Perspectives on Psychological Science, 1(2), 110-121, 2006.
[38] P. Jacob & M. Jeannerod, Ways of seeing: The scope and limits of visual cognition. Oxford: Oxford University Press, 2003.
[39] G. Riva, J.A. Waterworth & E.L. Waterworth, The Layers of Presence: a bio-cultural approach to understanding presence in natural and mediated environments. Cyberpsychology & Behavior, 7(4), 405-419, 2004.
[40] G. Riva, Virtual Reality and Telepresence. Science, 318(5854), 1240-1242, 2007.
[41] G. Riva, Being-in-the-world-with: Presence meets Social and Cognitive Neuroscience. In G. Riva, M.T. Anguera, B.K. Wiederhold & F. Mantovani (Eds.), From Communication to Presence: Cognition, Emotions and Culture towards the Ultimate Communicative Experience. Festschrift in honor of Luigi Anolli (pp. 47-80). IOS Press: Amsterdam, 2006.
[42] P. Haggard & S. Clark, Intentional action: conscious experience and neural prediction. Consciousness and Cognition, 12(4), 695-707, 2003.
[43] P. Haggard, S. Clark & J. Kalogeras, Voluntary action and conscious awareness. Nat Neurosci, 5(4), 382-5, 2002.
[44] J.A. Russell, Agency: Its role in mental development. Hove: Erlbaum, 1996.
[45] P. Fourneret & M. Jeannerod, Limited conscious monitoring of motor performance in normal subjects. Neuropsychologia, 36(11), 1133-1140, 1998.
[46] M. Heidegger, Unterwegs zur Sprache. Neske: Pfullingen, 1959.
[47] M. Legerstee, Infants' sense of people: Precursors to a Theory of Mind. Cambridge: Cambridge University Press, 2005.
[48] C. Trevarthen, The neurobiology of early communication: Intersubjective regulations in human brain development. In A.F. Kalverboer & A. Gramsbergen (Eds.), Handbook on brain and behavior in human development. Kluwer Academic Publishers: Dordrecht, The Netherlands, 2001.
[49] C. Trevarthen & K. Aitken, Infant intersubjectivity: Research, theory and clinical applications. Journal of Child Psychology and Psychiatry, 42, 3-48, 2001.
[50] A.N. Meltzoff, W. Prinz, G. Butterworth, G. Hatano, K.W. Fischer, P.M. Greenfield, P. Harris & D. Stern (Eds.), The imitative mind: Development, evolution, and brain bases. Cambridge University Press: Cambridge, 2002.
[51] A.N. Meltzoff & J. Decety, What imitation tells us about social cognition: a rapprochement between developmental psychology and cognitive neuroscience. Philosophical Transactions of the Royal Society, 358, 491-500, 2003.
[52] A.N. Meltzoff, Origins of theory of mind, cognition and communication. Journal of Communication Disorders, 32, 251-269, 1999.
[53] M. Tirassa, F.M. Bosco & L. Colle, Rethinking the ontogeny of mindreading. Consciousness and Cognition, 15, 197-217, 2006.
[54] M. Jeannerod & E. Pacherie, Agency, simulation and self-identification. Mind & Language, 19(2), 113-146, 2004.
[55] D. Legrand, Naturalizing the Acting Self: Subjective vs. Anonymous Agency. Philosophical Psychology, 20(4), 457-478, 2007.
[56] A.N. Meltzoff & R. Brooks, "Like me" as a building block for understanding other minds: Bodily acts, attention and intention. In B.F. Malle, L.J. Moses & D.A. Baldwin (Eds.), Intentions and Intentionality: Foundations of social cognition (pp. 171-191). MIT Press: Cambridge, MA, 2001.
[57] A. Revonsuo, Inner Presence: Consciousness as a Biological Phenomenon. Cambridge, MA: MIT Press, 2006.
[58] E. Oztop, D. Wolpert & M. Kawato, Mental state inference using visual control parameters, 22, 129-151, 2005.
[59] J.M. Kilner, K.J. Friston & C.D. Frith, The mirror-neuron system: a Bayesian perspective. Neuroreport, 18(6), 619-23, 2007.
[60] V. Gallese, Intentional Attunement: The mirror system and its role in interpersonal relations. Interdisciplines, 1, 2004. Online:
[61] V. Gallese, The roots of empathy. The shared manifold hypothesis and the neural basis of intersubjectivity. Psychopathology, 36, 171-180, 2003.
[62] L. Anolli, R. Ciceri & G. Riva (Eds.), Say not to Say: New perspectives on miscommunication. Emerging Communication: Studies on New Technologies and Practices in Communication, ed. G. Riva and F. Davide. IOS Press: Amsterdam, 2002.
[63] D. Sperber & D. Wilson, Relevance: Communication and Cognition (2nd Edition). Oxford: Blackwell, 1995.
[64] Y. Cheng, A.N. Meltzoff & J. Decety, Motivation modulates the activity of the human mirror-neuron system. Cerebral Cortex, 17(8), 1979-86, 2007.
[65] M. Brass, R.M. Schmitt, S. Spengler & G. Gergely, Investigating Action Understanding: Inferential Processes versus Action Simulation. Current Biology, 17(24), 2117-21, 2007.
[66] B. Calvo-Merino, D.E. Glaser, J. Grezes, R.E. Passingham & P. Haggard, Action observation and acquired motor skills: an fMRI study with expert dancers. Cerebral Cortex, 15(8), 1243-9, 2005.


She drops the paddle and picks up one of the lobsters by the tail. Laughing, she shoves it at Alvy who jerks backward, squeamishly.
ALVY: Don't give it to me. Don't!
ANNIE: (Hysterically) Oooh! Here! Here!
ALVY: (Pointing) Look! Look, one crawled behind the refrigerator. It'll turn up in our bed at night. (They move over to the refrigerator; Alvy moves as close to the wall as possible as Annie, covering her mouth and laughing hysterically, teasingly dangles a lobster in front of him) Will you get outta here with that thing? Jesus!
ANNIE: (Laughing, to the lobster) Get him!
ALVY: (Laughing) Talk to him. You speak shellfish! (He moves over to the stove and takes the lid off a large steamer filled with boiling water) Hey, look, put it in the pot.
ANNIE: (Laughing) I can't! I can't put it in the pot. I can't put a live thing in hot water.
ALVY: (Overlapping) Gimme! Gimme! Let me do it! What-What's he think we're gonna do, take him to the movies?
Woody Allen - Annie Hall, 1977

Enacting Intersubjectivity
F. Morganti et al. (Eds.)
IOS Press, 2008
© 2008 The authors. All rights reserved.


Stages in the Development of Perceptual Intersubjectivity

Abstract. We offer a model of perceptual intersubjectivity (PI), the phenomenon of two or more subjects focusing their attention on the same external target. The model involves two types, symmetric and asymmetric PI, and three levels, synchronous (SPI), coordinated (CPI) and reciprocal (RPI), defined on the basis of the observable behavior of the participants in (non-verbal) social interactions. We hypothesize that the three levels correspond to stages in the development, and possibly the evolution, of human perceptual intersubjectivity, and provide support for this through an empirical study of adult-infant interactions in two species of great apes (chimpanzees and bonobos) and in human beings. The results showed conspicuous and apparently qualitative differences between the human and nonhuman subjects, and clear developmental patterns in the human data. Thus our analysis may contribute to the ultimate goal of understanding the nature and development of human cognitive specificity, in line with the goals of the collaborative project Stages in the Evolution and Development of Sign Use (SEDSU) [1].

Contents
8.1 Introduction
8.2 A model of perceptual intersubjectivity
8.3 An empirical study
8.4 Conclusions
8.5 Acknowledgment
8.6 References


J. Zlatev et al. / Stages in the Development of Perceptual Intersubjectivity

8.1 Introduction

Intersubjectivity can be defined, most generally, as the sharing of states and processes of consciousness between two or more subjects [2]. Different forms of intersubjectivity can be distinguished on the basis of the most prominent form of consciousness involved (perceptual, affective, or reflective) [3] or on the basis of their intrinsic complexity [4-6]. An important value of such analyses is that they can help us understand the development and evolution of human intersubjectivity. The goal of the present chapter is to provide one such analysis, focusing on the phenomenon of perceptual intersubjectivity, more commonly known under the label "joint attention". Though widespread, the latter term is rather ambiguous: for some authors it refers to the general case in which two or more subjects perceive the same target [7], while for others it applies to more specific reciprocal states in which the subjects are also aware that they perceive the same target [8]. Most often only visual attention has been described in the literature, but implicitly the descriptions have been thought to generalize to other modalities.

Building on previous work [5,6,9-15] we define perceptual intersubjectivity (PI) as the process in which two or more subjects focus their attention on the same external target. Like most others we focus on the visual modality, but formulate our definitions in a way that makes them applicable to other modalities such as hearing and touch as well. In Section 8.2, we identify different levels of PI on the basis of the complexity of the interaction between the subjects. These levels build on each other cumulatively, and it is therefore possible to hypothesize that they correspond to developmental and/or evolutionary stages [1].
In order to test this hypothesis we provide operational definitions of the different levels of PI, thereby making the model applicable to empirical data involving human beings and non-human primates, and thus allowing it to be empirically assessed. One such assessment is described in Section 3, where we apply the model to data from adult-infant interactions in great apes (chimpanzees and bonobos) and human beings. In Section 4 we summarize our proposed stage-based model of perceptual intersubjectivity.

8.2 A model of perceptual intersubjectivity

8.2.1 General Definitions

In the most general terms, perceptual intersubjectivity (PI) can be defined as the phenomenon of two or more subjects focusing their attention on the same external target. Individual PI episodes may be individuated in terms of their targets, which are present in the immediate context shared by the participants of the interaction. Targets can be objects, events, spatial locations (e.g., a certain place to go to), or directions (e.g., a way in which to go). The term "object" should be understood in a wide sense to refer to any animate or inanimate entity that occupies a position in space-time, e.g. a toy or a person. A PI episode may have one and only one target.1 Two major types of PI episodes can be distinguished:

1 See Section 3.2 for a more precise operational definition of the notion of a PI episode.



Symmetric: when the target has already been noticed by both (or all) subjects [6]. In discussing symmetric PI, we refer to the participants as "subjects".
Asymmetric: when the target is initially noticed by only one of the subjects, and the other subject subsequently aligns her attention with the first subject's attention [9-12]. We refer to the subject who has initially noticed the target as the "sender", and to the one who focuses her attention on the target as a consequence of the sender's behavior as the "receiver".2

The relationship between these two types merits further research, but this is not our present focus. Rather, we will concentrate on the asymmetric type, especially in the study described in Section 3. Nevertheless, we maintain that the two types have parallel levels or stages, and we capture this parallelism in the presentation below.

Prior to describing the different levels of symmetric and asymmetric PI, we provide definitions of the central terms that will appear in the descriptions. The following four are basic behaviors, which can be observed more or less directly:
Attention-focusing: the sender's or receiver's prolonged attention to a target;
Attention-turning: the sender's change in attention-focusing from target to receiver (or vice versa); in the case of visual attention this amounts to gaze alternation;
Attention-getting: the sender's behavior directed at the receiver, apparently causing the receiver to turn her attention toward the sender;
Attention-contact: the sender's and receiver's focused attention on each other's attentional state; in the case of visual attention this amounts to mutual gaze.

On this basis, we can define the following more complex behavior:
Referential behavior: the sender's behavior while attention-focusing on the target, apparently causing the receiver to turn her attention toward the target.
Referential behavior can be either:
Communicative: performed relative to the attentional status of the receiver, with the goal of affecting her behavior; or
Non-communicative: performed in order to manipulate an object, reach a location etc., and not with the goal of affecting the receiver's behavior.

Non-communicative referential behavior may of course be communicative from the perspective of the receiver, but it is not intentionally communicative for the sender. Intentional communication [16] is indispensable for higher levels of intersubjectivity, and cannot be reduced to observable behavior or described in purely causal terms short of behaviorism, which, as is well known, is a blind alley. On the other hand, intentional communication is not something private and unobservable, since the intentions it involves are behaviorally manifest. Attention-getting, attention-turning and persistence are markers of intentional communication, as stated in a classical definition:

2 We use these terms for ease of reference and not in their information-theoretical senses. In order to avoid clumsy gender-neutral expressions like "he or she" or "(s)he", we refer to the sender in the masculine and the receiver in the feminine.



"Intentional communication is a signalling behavior in which the sender is aware, a priori, of the effect that the signal will have on his listener, and he persists in that behavior until the effect is obtained or failure is clearly indicated. The behavioral evidence that permits us to infer the presence of communicative intentions includes (1) alternation in eye gaze contact between the goal and the intended listeners, (2) augmentations, additions, and substitution of signals until the goal has been obtained, and (3) changes in the form of the signal towards abbreviated and/or exaggerated patterns that are appropriate only for achieving a communicative goal." [17, p. 39]

Other, more specific behavioral manifestations apply to the most characteristic form of non-verbal intentional communication, pointing, which can be distinguished from its non-communicative (in the above sense) counterpart, reaching, as follows:
Pointing: the extension of the hand (with or without the index finger outstretched) or the goal-directed movement of the head and/or some other body part towards the target, in order to affect another subject's behavior towards the target.
Reaching: the extension of the subject's arm(s) and hand(s) in the direction of the target, with the hand and fingers formed so as to grasp the target as it is approached, and the grip adjusted as the distance to the target decreases. The action is performed irrespective of the attention of another subject.

Although pointing gestures may look similar to reaching, the two can be distinguished on closer examination. If the sender is not persistently trying to decrease the distance between himself and the target, and the reach and the grip of the hand are not adjusted so as to fit the target, then the behavior should be classified as an instance of pointing (by the definition above), rather than reaching.
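The pointing/reaching criterion stated above can be written down as a small decision rule. The following Python sketch is only an illustration: the class and field names are hypothetical, and the "unclear" outcome for mixed observations is an assumption, since the text only specifies the two clear cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArmExtension:
    """A coder's observations of one arm/hand extension (hypothetical fields)."""
    approaches_target: bool  # sender persistently tries to reduce the distance to the target
    grip_fitted: bool        # reach and grip are adjusted so as to fit the target


def classify_extension(obs: ArmExtension) -> str:
    """Apply the pointing/reaching criterion to one observed extension."""
    if not obs.approaches_target and not obs.grip_fitted:
        # Neither marker of grasping is present: pointing by the definition above.
        return "pointing"
    if obs.approaches_target and obs.grip_fitted:
        # Both markers of grasping are present: reaching.
        return "reaching"
    # Mixed evidence is not covered by the text; flagging it is an assumption.
    return "unclear"


print(classify_extension(ArmExtension(approaches_target=False, grip_fitted=False)))
```

Keeping an explicit "unclear" outcome leaves ambiguous cases visible to the coder instead of forcing them into one of the two categories.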
8.2.2 Levels of perceptual intersubjectivity

Based on the definitions given above, we distinguish three general levels of PI, with symmetric and asymmetric counterparts. The first level is synchronous PI (SPI), and consists in the subjects simply synchronizing their actions in time and space while performing similar individual actions relative to a perceptual target. Synchronous PI is not a communicative behavior, in the sense that the action is not performed with the goal of affecting the behavior of another subject, though of course it may do so inadvertently. The second level is coordinated PI (CPI) and consists of the subjects adjusting their actions relative to a perceptual target. On this level the subjects' actions are intentionally calibrated in time and space and are communicative in the sense explicated above. The third level is reciprocal PI (RPI), and is achieved by the subjects mutually matching their actions relative to a perceptual target. On this level each action is intentionally adjusted in space and time to the actions of the other subject. Similarly to CPI, the action is communicative, but the interaction between the subjects is still more complex: each subject will perform his or her actions in response to those performed by the other subject, with the result that the actions



will be either similar, as in imitation, or complementary, as in turn-taking [5]. In the remaining part of this section we specify more clearly each one of these three levels, with respect to the two major types of PI: symmetric and asymmetric.

Level 1: Synchronous Perceptual Intersubjectivity (SPI)

In the case of symmetric SPI, the target T has independently captured subject A's and subject B's attention and caused both to focus their attention on it (Figure 1). An example would be when T belongs to a category of similar intrinsic value for both A and B, such as food or danger [4].

Figure 1. Symmetric SPI

On the other hand, asymmetric SPI can be characterized by the following stereotypical sequence (Figure 2):
1. A focuses his attention on T, possibly reaching towards T.
2. B's attention is attracted by (1).
3. B turns her orientation to T, with the result that both A and B focus their attention on T.

An example is an infant's reaching toward T, causing a caregiver to notice T, and possibly to offer it to the infant. But even simpler behaviors without reaching, such as attentional contagion, would qualify as belonging to this level, e.g. when a goat A turns its attention to a significant target (food) located behind another goat B, and this causes B to look towards A, and then to turn its attention to the target by following the direction of A's attention [18].


Figure 2. Asymmetric SPI

Level 2: Coordinated Perceptual Intersubjectivity (CPI)

In the case of symmetric CPI, T has already been noticed by A and B. In addition, A directs his attention to B's attention-focusing on T, and B directs her attention to A's attention-focusing on T. In contrast to Level 1, we have here second-order attention for both participants: both perceive that the other perceives T (Figure 3).



Figure 3. Symmetric coordinated perceptual intersubjectivity (CPI)

An example of symmetric CPI is the following situation of social referencing: A is an infant, B is an adult, and T is of ambiguous value. By checking whether B is paying attention to T and looking for indications of positive or negative reactions on the part of B, A can adjust his attitude to T. Further, by monitoring A's attention and attitude towards T, B can check if A is behaving appropriately towards T.

Asymmetric CPI can be divided into two sub-types, according to whether or not the sender turns his attention to and focuses on the receiver during the interaction: (a) simple CPI, in which the sender ostensively attends to the target and engages in communicative referential behavior towards it, but without turning his attention to the receiver; (b) complex CPI, in which the sender turns his attention to the receiver, and possibly draws her attention to himself and his behavior (attention-getting). Thus, simple CPI can be characterized by the following stereotypical sequence of behaviors (Figure 4):
1. A focuses his attention on T.
2. A ostensively attends to T and engages in communicative referential behavior towards T.
3. B notices (2).
4. B turns her attention to T.


Figure 4. Asymmetric simple coordinated perceptual intersubjectivity (CPI)

Instances of simple CPI are simpler forms of imperative pointing, where an infant points toward an object without turning his attention to the adult. Note that the novel behavior that distinguishes simple coordinated from synchronous PI is step 2, the communicative referential behavior, such as manifest attention (e.g., gaze) or pointing. Complex CPI may be characterized by the following stereotypical sequence (Figure 5):
1. A focuses his attention on T.
2a. A ostensively attends to T and engages in communicative referential behavior towards T.
2b. A turns his attention towards B.
3. B notices (2).
4. B turns her attention to T.


Figure 5. Asymmetric complex coordinated perceptual intersubjectivity (CPI)

Examples of this are typical cases of imperative pointing, in which a child makes sure that the adult is attending before performing the pointing gesture. Note that the crucial behavior that distinguishes complex from simple CPI is step 2b, in which the infant turns his attention towards the adult during the interaction.

Level 3: Reciprocal Perceptual Intersubjectivity (RPI)

In symmetric RPI, A not only attends to B's attention to T and vice versa (as in symmetric CPI), but attends to B's attending to his (A's) attention, and vice versa. On this level we have third-order attention [6]. The following example may illustrate the phenomenon: a child and an adult play a game of hiding toys. The child sees the hidden toy, smiles, and then looks at the adult and sees that the adult sees that he has seen the toy. Both acknowledge this (verbally). See Figure 6, where only the third-order attention of the child (A) is shown.

Figure 6. Symmetric reciprocal perceptual intersubjectivity (RPI)

Asymmetric RPI is characterized by the following stereotypical sequence (Figure 7):

1. A focuses his attention on T.
2. A engages in attention-getting relative to B (optional).
3. B notices (2), and focuses her attention on A.
4. A and B establish attention contact.
5. A ostensively attends to T and/or engages in communicative referential behavior towards T.
6. B turns her attention to T.




Figure 7. Asymmetric reciprocal perceptual intersubjectivity (RPI)

Examples of asymmetric RPI are typical cases of declarative pointing. Note that the novel behavior distinguishing RPI from CPI is step 4, attention-contact, which in the case of visual attention corresponds to mutual gaze.

8.2.3 Summary

In this section we have provided a level-based analysis of perceptual intersubjectivity, where each consecutive level is of higher complexity than the previous one. In the case of symmetric PI, where the target has already been noticed by the participants of the interaction, this complexity can be defined as first-order attention (Level 1), second-order attention (Level 2), and third-order attention (Level 3). In the type of PI which we call asymmetric, because initially only one of the subjects has noticed the target and the other does so due to the referential behavior of the first, the different levels are defined by sequences of behaviors. In this type of PI the increased complexity is reflected by the fact that each higher level subsumes the previous ones, and also includes crucial novel behaviors. Thus, on Level 1 the behavior of the sender is not (intentionally) communicative (it is not directed towards the attention of the receiver). In contrast, on Level 2 (coordinated PI) the sender engages in various forms of communicative referential actions such as pointing. On Level 2.1 the sender engages in ostensively manifest behaviors, but does not turn his attention to the receiver to check if his action has been noticed. In contrast, on Level 2.2 such attention-turning occurs. Finally, Level 3 adds attention-contact, during which the subjects simultaneously attend to each other's attentional states, which in the visual modality corresponds to mutual gaze. The different types and levels of perceptual intersubjectivity may thus be distinguished on the basis of overt behaviors and their sequencing.
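As an illustration of how the asymmetric levels summarized above build on each other, the following Python sketch maps three observed sender behaviors onto a level. It is a simplified reading of the summary and not the coding scheme used in the study: the class and field names are hypothetical, and the sketch assumes that the episode has already been identified as asymmetric PI, i.e. that the receiver ends up attending to the target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SenderBehavior:
    """Behaviors observed in the sender during one asymmetric PI episode (hypothetical fields)."""
    communicative_reference: bool  # e.g. pointing or ostensive attention to the target
    attention_turning: bool        # sender turns his attention to the receiver
    attention_contact: bool        # sender and receiver attend to each other (mutual gaze)


def classify_level(b: SenderBehavior) -> str:
    """Return the level label ("1", "2.1", "2.2" or "3") for one episode."""
    if not b.communicative_reference:
        # Level 1: the sender's behavior is not intentionally communicative.
        return "1"
    if b.attention_contact:
        # Level 3 adds attention-contact to communicative reference.
        return "3"
    if b.attention_turning:
        # Level 2.2: communicative reference plus attention-turning.
        return "2.2"
    # Level 2.1: communicative reference without attention-turning.
    return "2.1"


print(classify_level(SenderBehavior(True, False, False)))
```

Checking the higher levels first among the communicative cases reflects the cumulative structure of the model: an episode is assigned the highest level whose novel behavior is observed.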
The rationale for distinguishing the levels in terms of overt behaviors is to construct a global model of perceptual intersubjectivity that can be applied to empirical data of adult-infant interactions in different species, cultures, and at different ages, such as the data made available by the SEDSU project [1]. This is necessary in order to substantiate our hypothesis that the levels that we have identified correspond to developmental and possibly also evolutionary stages, i.e., they will be observed to different degrees in different periods of children's development, and will differ between apes (and possibly the



common ape-human ancestor) and human beings. This hypothesis was tested in the study described in the following section.

8.3 An empirical study

8.3.1 Data and hypotheses

We analyzed the following sets of data, which were made available by our collaborative research within the SEDSU project.

Two video recordings, of an infant bonobo (Luiza, age 10 months) and an infant chimpanzee (Lobo, age 19 months), collected by Mathias Osvath at MPI Leipzig, approx. 60 minutes each. Additionally, two video recordings, of Luiza (at the age of 13 months) and the chimpanzee Kara (age 7 months), 8:30 minutes each, recorded by Josep Call and his assistants.
Six video recordings from a Thai/Swedish video-linked corpus, involving three Swedish children (BEL, TEA and HAR) and three Thai children (JOM, JAM and CHE), when these children were approx. 18 months old.
Two video recordings of two Swedish children aged 12 months: ALI (recorded by Mats Andrén, 15 minutes) and TEA (recorded by Ulla Richtoff, 23 minutes).

As is obvious from this description, with the exception of the two data points each from TEA and the bonobo, the data was not longitudinal. However, given that the data was, in broad terms, cross-sectional, and the different PI levels are of increasing complexity, we could formulate three hypotheses to test whether these correspond to developmental and evolutionary stages.

H1: PI episodes of Level 3 (Reciprocal PI) will be attested predominantly among the 18-month-old children.
H2: PI episodes of Level 2 (Coordinated PI) will be observed among the 18-month-old children and the 12-month-old children.
H3: PI episodes of Level 1 (Synchronous PI) will be the only form of perceptual intersubjectivity found in the ape data, perhaps with occasional instances of Level 2.

8.3.2 Operationalization of the model

The definitions of the different types and levels of PI presented in Section 2 were intended to be empirically attestable and applicable to both human and non-human subjects.
However, in order to be able to use them as the basis for a coding scheme for the study it was necessary to specify them further. The general guiding principle was to be conservative, i.e. to have operational definitions which preferably under-interpret rather than over-interpret the observational data, especially the data from the 18-month-old children. The reason for this is that these children have already made their entrance into language, and language can substitute for many other forms of intersubjective behavior, including mutual gaze and gesturing [7-8]. Although the children's utterances are in essence a form of communicative referential behavior, we explicitly decided not to code them as such, since that would have placed the behavior of these children on an uneven footing with that of pre-linguistic children and apes. Furthermore, especially with the 12-month-old children, it is not easy to distinguish verbalization from vocalization. Hence, in the



operational definitions offered below, we treat vocalization as a form of attention-getting (cf. Section 2) but not as any of the other crucial behaviors (attention-turning, attention-contact, communicative referential behavior).

The first specification, compared to the definitions of asymmetric PI in Section 2, is that we analyzed only cases in which the sender was the infant (human or ape) and the receiver was the interacting adult (parent or some other individual). Furthermore, as mentioned in the introduction, we intend the definitions given below to apply only to the asymmetric variety of PI, the main reason being that it was much easier to individuate the PI episodes for this type than for the symmetric type.

The beginning of a new PI episode was marked by the introduction of a new target: an object, event, location or direction that received focal attention from one or both of the subjects. Therefore, a PI episode by definition includes one and only one target, and the introduction of another target defines the end of the previous episode and the start of a new one. What operationally counts as a new target was based on visible behavioral contrasts in the interaction. New targets were judged to occur when:

(1) There was a shift in the infant's attention to a target that was altogether outside his earlier focus of attention.
(2) There was a shift in the infant's attention to a target which was more or less within the earlier focus of attention, but a visible shift of attention was observable in both the infant's and the adult's behavior, such as:
a. An object is singled out in contrast to several possible others. Example: playing with building blocks; although both participants are already attending to the block-building in general, the infant focuses his attention on a specific block while picking it up and thereby introduces a new target. In addition to this, the adult also visibly redirects her attention to this specific object.
b. A part of an object is singled out in contrast to the object as a whole. Example: playing with a toy telephone; although the toy telephone is already the focus of the infant's attention, the infant shifts his attention to a specific part of the telephone such as the mouthpiece. In addition to this, the adult also visibly redirects her attention to this specific part of the telephone.
c. An object is moved and in this process its new location constitutes a new target of attention. Example: the infant is holding a glass of milk and the glass is within his focus of attention; but then the infant puts it down outside the current visual field of the adult, who then needs to shift attention to this new location.

Since we were primarily interested in classifying the behavior of the infant, when new targets were altogether outside the focus of the infant's previous attention (case 1) it was not of crucial importance whether the target was within the visual field of the adult, as long as the infant could not see this. However, in cases (2a), (2b) and (2c) it was of crucial importance whether there was also a slight adjustment in the attention of the adult. Otherwise it was impossible to establish that these more subtle kinds of new targets really were established as common to both parties. Since it was not possible to distinguish the infant's attention to the target from attention-turning when the target was the adult herself, these targets/episodes were



excluded from analysis. In other words, the analysis deals with triadic and not dyadic engagements. Finally, we concentrated on visual attention, thus reducing attention to gaze. In the case of Level 3 (RPI) we required that the infant and adult engage in mutual attention (gaze) within the infant's turn, i.e. prior to the adult verbally commenting on the target. Given these qualifications, we could operationally define the different levels of asymmetric PI as shown in Table 1.
Level   Term                        Operational definition
1       Synchronous PI
2.1     Coordinated PI - simple
2.2     Coordinated PI - complex
3       Reciprocal PI

Table 1. Operational definitions of the levels of perceptual intersubjectivity used for the empirical study
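The episode-individuation rule described in this section (one target per episode, a new target closing the previous episode, and episodes whose target is the adult herself being excluded) can be sketched in Python as follows. This is only an illustration under stated assumptions: the observation format, the target labels, and the "adult" label are hypothetical, and the judgment of what counts as a new target is assumed to have been made by the coder beforehand.

```python
from itertools import groupby

# Hypothetical label a coder would use when the infant's target is the adult herself.
ADULT_LABEL = "adult"


def segment_episodes(observations):
    """Group time-ordered (time_in_seconds, target_label) observations into episodes.

    Consecutive observations with the same target form one episode; a change of
    target ends the previous episode and starts a new one. Episodes whose target
    is the adult are dropped, since only triadic engagements are analyzed.
    """
    episodes = []
    for target, group in groupby(observations, key=lambda obs: obs[1]):
        times = [time for time, _ in group]
        if target == ADULT_LABEL:
            continue
        episodes.append({"target": target, "start": times[0], "end": times[-1]})
    return episodes


# Made-up observations, for illustration only.
coded = [(0, "block"), (2, "block"), (5, "adult"), (7, "phone"), (9, "phone"), (12, "block")]
for episode in segment_episodes(coded):
    print(episode)
```

Filtering out the adult-as-target stretches after grouping means that a return to an earlier target counts as a new episode, in line with the rule that the introduction of another target ends the previous episode.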

8.3.3. Analysis and results

All PI episodes were identified in the data according to the criteria outlined above. Coding was performed by the first and third author, and in case of uncertainty the second author was consulted as well, until consensus between the three authors was reached. Unclear examples were excluded. This resulted in a total of 190 PI episodes, divided by the different video recordings (data points) as shown in Table 2. The results strongly suggest that asymmetric PI is a human speciality, not in the sense that it is unique to our species, but in the sense that it is much more frequent in human infant-adult interactions, and consequently typical for human beings. In approximately 2 hours and 18 minutes of data, the apes engaged in only 5 asymmetric PI episodes. In contrast, the 2 hours and 8 minutes of human data contained a total of 185 instances.



Data point        Age (months)   Length (minutes)   # PI episodes
Kara (chimp)      7              8:30               2
Lobo (chimp)      19             60                 1
Luiza (bonobo)    10             60                 1
Luiza (bonobo)    13             8:30               1
TEA (Swedish)     12             23                 26
ALI (Swedish)     12             15                 24
TEA (Swedish)     18             15                 26
HAR (Swedish)     18             15                 18
BEL (Swedish)     18             15                 33
JAM (Thai)        18             15                 11
JOM (Thai)        18             15                 23
CHE (Thai)        18             15                 24

Table 2. Total number of PI episodes per data point
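The totals reported in the text can be recomputed from Table 2. The Python sketch below transcribes the table (entering the 8:30 recordings as 8.5 minutes) and sums recording time and episodes for the ape and human data; the episodes-per-hour rate is a derived figure added here for comparison and is not reported in the chapter.

```python
# Rows transcribed from Table 2: (data point, age in months, length in minutes, PI episodes).
# Recordings listed as 8:30 are entered as 8.5 minutes.
TABLE_2 = [
    ("Kara (chimp)", 7, 8.5, 2),
    ("Lobo (chimp)", 19, 60.0, 1),
    ("Luiza (bonobo)", 10, 60.0, 1),
    ("Luiza (bonobo)", 13, 8.5, 1),
    ("TEA (Swedish)", 12, 23.0, 26),
    ("ALI (Swedish)", 12, 15.0, 24),
    ("TEA (Swedish)", 18, 15.0, 26),
    ("HAR (Swedish)", 18, 15.0, 18),
    ("BEL (Swedish)", 18, 15.0, 33),
    ("JAM (Thai)", 18, 15.0, 11),
    ("JOM (Thai)", 18, 15.0, 23),
    ("CHE (Thai)", 18, 15.0, 24),
]

APES = {"Kara (chimp)", "Lobo (chimp)", "Luiza (bonobo)"}


def summarize(rows):
    """Return (total minutes, total episodes, episodes per hour) for the given rows."""
    minutes = sum(row[2] for row in rows)
    episodes = sum(row[3] for row in rows)
    return minutes, episodes, 60.0 * episodes / minutes


ape_rows = [row for row in TABLE_2 if row[0] in APES]
human_rows = [row for row in TABLE_2 if row[0] not in APES]

ape_minutes, ape_episodes, ape_rate = summarize(ape_rows)
human_minutes, human_episodes, human_rate = summarize(human_rows)

print(f"apes:   {ape_minutes:.0f} min, {ape_episodes} episodes, {ape_rate:.1f} per hour")
print(f"humans: {human_minutes:.0f} min, {human_episodes} episodes, {human_rate:.1f} per hour")
```

On these figures the ape recordings sum to 137 minutes with 5 episodes and the human recordings to 128 minutes with 185 episodes, i.e. roughly 2.2 versus 86.7 episodes per hour.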

Furthermore, the three hypotheses (Section 3.1) were almost surprisingly well confirmed, as shown in Figure 8. RPI (Level 3) episodes were not limited to the interactions of the 18-month-old children, but they were proportionally more frequent there than for the two 12-month-olds. CPI episodes (Level 2) occurred in the data of both groups of children, but were altogether absent (along with RPI episodes) in the ape data. All 5 instances of asymmetric PI episodes initiated by the infant apes were cases of SPI (Level 1), thereby confirming hypothesis 3.
[Figure 8: bar chart. Vertical axis: percentage of PI episodes (0-100). Groups: apes (7-19 months), 12 months, 18 months. Bars: L1, L2, L3.]


Figure 8. Percentages of asymmetric PI episodes by level (1, 2 and 3) in the three sets of data. Total number of episodes: 5 for apes, 50 for 12-month old children, and 135 for 18-month old children

8.3.4. Discussion

The results of the study supported our hypothesis that the different levels of PI in our model, at least of the asymmetric variety, correspond to developmental and evolutionary stages. The three ape infants studied (in the four data points) and their



interacting adults engaged in a surprisingly low number of PI episodes, and all of these were of the simplest type, Level 1 (SPI), which did not involve intentional communication. Even for the adult, this type does not imply more complex processes than attentional contagion.3 Of course, this does not exclude the possibility of higher-level processing. The complete absence of any complex type of PI episodes in the ape data, irrespective of the differences in the ages of the ape infants (7-19 months), supports the analysis of SPI as being qualitatively different from the higher-level types, and indicates a corresponding difference between Pan and Homo. At the same time, we need not interpret this as a matter of inability of apes to engage in more complex types of intersubjectivity, since we know from previous research that in captivity (adult) apes do engage in intentional communication with human subjects [19]. Nevertheless, the differences were so conspicuous that we believe that they reflect a qualitative difference in the nature of ape and human social interactions: human infants (and young children) engage in communicative referential behavior on a regular basis, while ape infants do not. Furthermore, since the differences between the pre-verbal and just-verbal children were relatively minor (as reflected in the minor differences in CPI (Level 2) in Figure 8 for the two groups), this seems to be a feature of human social interactions that is independent of and more basic than language. In terms of the Mimesis Hierarchy (MH) model [14-15] this difference can be interpreted as due to differences in dyadic and especially triadic bodily mimesis.

[Figure 9: chart. Vertical axis: PI episodes per minute (0 to 1). Horizontal axis: age (TEA12, TEA18). Series: L1, L2.1, L2.2, L3.]

Figure 9. Number of PI episodes per minute for TEA at 12 and 18 months.

This conclusion is also supported by the only piece of direct human developmental evidence in our data: while SPI (Level 1) episodes predominate over all the others for the child TEA at 12 months, occurring roughly once a minute in the interaction, they decrease to 0.4 per minute at 18 months, while CPI (Level 2) episodes rapidly increase (Figure 9). Concerning our first hypothesis (H1), that RPI episodes would be attested predominantly in the 18-month-old group, the results were less clear-cut. Indeed, there was a higher proportion of RPI episodes in that group, but the 12-month-old
3 Attentional contagion appears to be supported by a specialised neural mechanism [20]. The attention system immediately reacts to the perceivable re-orientation of the body, head, or gaze, or all of these, of other subjects, and will cause the receiver of the signal to turn her attention unless the behaviour is inhibited. Thus, attentional contagion can be said to occur on a subpersonal processing level, i.e., a level that cannot be accessed by conscious awareness.



group had only 2 children, and there was considerable individual variation between the children in both groups. Also the distinction between simple and complex CPI on the basis of the presence of attention-turning (towards the adult) in the latter, but not the former, seemed questionable. As Figure 10 shows, there was a higher proportion of simple CPI (Level 2.1) in the older group.

[Figure 10: bar chart. Vertical axis: percentage of PI episodes (0-60). Groups: 12 months, 18 months. Bars: L1, L2.1, L2.2, L3.]

Figure 10. Percentages of different levels (and sub-levels) of PI episodes for the 12-month and 18-month human children

On many occasions the older children pointed to a target and verbalized (sometimes even using the appropriate term in referring to an object, e.g. CHE pointing to a cartoon figure in a book and saying "Woodie!"). Since there was no attention-turning to the adult, the episode was coded as an instance of Level 2.1. But if one takes the whole situational context into account (the parent, child and guest sitting on the floor and repeatedly naming different toys and pictures), it is hardly surprising that the child does not check to see if the adult is paying attention, since the interaction conforms to a pattern of many similar ones, and in a way the child can take it for granted that the adult is paying attention, given the lack of any evidence to the contrary. One could say that the attention of the other is part of the common ground [21]. Indeed, it seemed that the children directed their attention to the adults when for some reason, e.g. a silence on the part of the adult, it was not clear (to the child) that the adult was paying attention. Thus, in sum, our particular coding scheme and data do not support treating Level 2.1 and Level 2.2 as developmentally distinct.

While it was part of our methodology to be conservative, as pointed out, it seemed in quite a few cases that we were forced to under-interpret the children's behavior, since language was not allowed to be coded as communicative referential behavior. Thus a number of /d/ utterances by TEA at 12 months, which did appear to be communicative (and approximating the neuter deictic pronoun in Swedish), were not treated as such, and thus the corresponding PI episodes were coded as Level 1 rather than Level 2.
We may have been too conservative in this case, since /d/ can be argued to be communicatively referential not merely because of the corresponding deictic pronoun, but because it is formed by a protrusion of the tongue which is analogous to pointing, and has been argued by some to be even a developmental precursor to it [22]. It is characteristic that this data point (of 23 minutes) did not include a single case of true pointing (though a few cases of reaching). Clearly, this is a topic that needs to be further investigated.



8.4 Conclusions

In this chapter we have offered a model of perceptual intersubjectivity (PI) in terms of two types (symmetric and asymmetric) and three levels (synchronous, coordinated and reciprocal). We concentrated on the asymmetric type and showed how the three levels formed a complexity hierarchy, with each successive level including additional behaviors on the part of the sender and matching responses from the receiver. Interpreting the sender as the infant initiating the PI episode, we showed how these levels can be given a developmental interpretation, corresponding to a possible sequence of stages of development, distinguished by the child's progressive understanding of the attentional state of the interacting adult. In sum, the foremost contribution of this chapter to the theme of enacting intersubjectivity consists of the systematic specification of levels of perceptual intersubjectivity in terms of observable behaviors for the purpose of analyzing social interaction, thereby connecting the individual and the social dimensions.

The model resulted in a coding scheme of operational definitions, which was applied to infant-adult interactions in great apes (two chimpanzees and one bonobo), and human beings divided in two age groups (12- and 18-month-old children). The results showed conspicuous differences between the two species (Pan and Homo sapiens), which we took to be qualitative, and therefore as a possible contribution to the ultimate goal of understanding human cognitive specificity. Still, since we concentrated on the visual modality (audio data from the apes was in practice unavailable) we need to take the results of our study with some caution. Nevertheless, the results offered support to our developmental (and to some extent evolutionary) interpretation of the different levels of PI.
The three hypotheses formulated prior to any data analysis were confirmed, though it should be pointed out that the definitions of the levels, especially the operational ones in Section 3, were further specified after preliminary analysis of the data. At the same time, changes to the definitions were by no means introduced in order to offer post hoc support for our hypotheses, but to be able to code the interactions as unambiguously as possible. This forced us to exclude the children's utterances as a form of communicative referential behavior, in order not to privilege the verbal children over the pre-verbal ones and the apes. The downside of this is that our model somewhat underestimates the role of language, as well as the capacity for sharing a common ground without overt indications of this. The upside is that we managed to define the different levels in terms of observable behaviors. We view this as an achievement in a field which is rife with debate on rich versus lean interpretations of the underlying capacities. In conclusion, our model of perceptual intersubjectivity, building on and further developing our previous research, can be said to have passed the test of empirical assessment, and can therefore be regarded as a useful conceptual and theoretical tool for conducting further analyses.


8.5 Acknowledgements

We wish to express our gratitude to Ulla Richtoff, Mathias Osvath and Josep Call for supplying us with parts of the data, and to the collaborative project Stages in the Evolution and Development of Sign Use (SEDSU), supported by the EU FP6 program under the call What it means to be human, for providing the framework and resources necessary for conducting this research.

8.6 References
[1] J. Zlatev & SEDSU-Project, Stages in the Evolution and Development of Sign Use (SEDSU). In A. Cangelosi, A. D. M. Smith & K. Smith (Eds.), The Evolution of Language: Proceedings of the 6th International Conference (EVOLANG6) (pp. 379-388). New Jersey: World Scientific, 2006.
[2] J. Zlatev, T. Racine, C. Sinha & E. Itkonen, The Shared Mind: Perspectives on Intersubjectivity. Amsterdam/Philadelphia: Benjamins, in press.
[3] T. Honderich, Radical externalism. Journal of Consciousness Studies, 13 (7-8), 3-13, 2006.
[4] S. Bråten, On Being Moved: From Mirror Neurons to Empathy. Amsterdam/Philadelphia: Benjamins, 2007.
[5] I. Brinck, The role of intersubjectivity for the development of intentional communication. In J. Zlatev, et al. (Eds.), The Shared Mind: Perspectives on Intersubjectivity. Amsterdam: Benjamins, in press.
[6] J. Zlatev, The co-evolution of intersubjectivity and bodily mimesis. In J. Zlatev, et al. (Eds.), The Shared Mind: Perspectives on Intersubjectivity. Amsterdam: Benjamins, in press.
[7] G. Butterworth, Pointing is the royal road to language for babies. In S. Kita (Ed.), Pointing: Where Language, Culture and Cognition Meet. Mahwah, NJ: Lawrence Erlbaum, 2003.
[8] M. Tomasello, The Cultural Origins of Human Cognition. Cambridge, Mass.: Harvard University Press, 1999.
[9] I. Brinck, Attention and the evolution of intentional communication. Pragmatics & Cognition, 9 (2), 255-272, 2001.
[10] I. Brinck, The pragmatics of imperative and declarative pointing. Cognitive Science Quarterly, 3/4, 429-446, 2004.
[11] I. Brinck, Joint attention, triangulation and radical interpretation: A problem and its solution. Dialectica, 58 (2), 179-205, 2004.
[12] I. Brinck & P. Gärdenfors, Co-operation and communication in apes and humans. Mind & Language, 18 (5), 484-501, 2003.
[13] J. Zlatev, Meaning = Life + (Culture): An outline of a biocultural theory of meaning. Evolution of Communication, 4 (2), 253-296, 2003.
[14] J. Zlatev, T. Persson & P. Gärdenfors, Bodily Mimesis as the "Missing Link" in Human Cognitive Evolution, LUCS 121. Lund: Lund University Cognitive Studies, 2005.
[15] J. Zlatev, From protomimesis to language: evidence from primatology and social neuroscience. Journal of Physiology, Paris, in press.
[16] P. Grice, Meaning. Studies in the Way of Words (pp. 213-223). Harvard: Harvard University Press, 1989.
[17] E. Bates, The Emergence of Symbols. Cognition and Communication in Infancy. New York: Academic Press, 1979.
[18] J. Kaminski, J. Riedel, J. Call & M. Tomasello, Domestic goats, Capra hircus, follow gaze direction and use social cues in an object choice task. Animal Behaviour, 69, 11-18, 2005.
[19] D. A. Leavens & W. D. Hopkins, The whole hand point: The structure and function of pointing from a comparative perspective. Journal of Comparative Psychology, 113 (4), 417-425, 1999.
[20] K. Chawarska, A. Klin & F. Volkmar, Automatic attention cuing through eye movement in 2-year old children with autism. Child Development, 74 (4), 1108-1122, 2003.
[21] H. Clark, Using Language. Cambridge: Cambridge University Press, 1996.
[22] S. A. Williams, Study of the Occurrence and Functions of da in a Very Young Bilingual Child. Lottbek: Verlag an der Lottbek, 1992.

Enacting Intersubjectivity, F. Morganti et al. (Eds.), IOS Press, 2008. © 2008 The authors. All rights reserved.


Intersubjective Enactment by Virtue of Altercentric Participation Supported by a Mirror System in Infant and Adult
Abstract. Human newborns demonstrate a readiness to mirror facial expressions and gestures, and both infants and adults frequently manifest participant mirroring of what their companions are doing, as will be illustrated and explained in this chapter. Sometimes they imitate or re-enact what they have seen being done. Sometimes they concurrently co-enact what the companion is doing, as if they were virtual co-authors of the companion's doing, and sometimes they pre-enact slightly in advance what the companion is about to do or say, as if coming to the companion's virtual aid, for example when spectators at a sports arena lift their legs as the high-jumper is about to jump, or when the spoon-feeder opens his or her own mouth as the spoon is pushed into the opening mouth of the patient. Illustrations will be offered of infants who reciprocate feeding or even spoon-feeding before their first birthday, thus demonstrating their learning by imitative re-enactment by virtue of participant perception of their feeders' acts of feeding. The above and other illustrations of participant perception are specified in terms of the inborn capacity for other-centred participation, and indicated to be supported by a mirror neuron system adapted in hominin phylogeny to subserve learning to cope and take care by (m)other-centred participation. This facilitates the ontogenetic path to speech in the culture into which the infant is born and will be shown to open a window to altruism in young children, exemplified by some three-year-old orphans. The ontogenetic path from primary to tertiary intersubjective enactment is specified to go by way of embodied simulation to verbal conversation, with its reciprocal and participant characteristics, by virtue of simulation of mind.

Contents
9.1 Introduction to layers of intersubjective attunement and enactment in ontogeny
9.2 On newborns' imitation and protoconversation in the first weeks and months
9.3 On infants having learned by altercentric participation to feed their feeder
9.4 How altercentric participation opens a window to altruism in young children
9.5 From bodily simulation of others' acts to simulation of conversation partners' mind
9.6 On the neurosocial support by a mirror system decentred in phylogeny
9.7 Summary and conclusions
9.8 References


S. Braten / Intersubjective Enactment by Virtue of Altercentric Participation

9.1 Introduction to layers of intersubjective attunement and enactment in ontogeny

Recent infancy research findings, revealing the capacity for intersubjective attunement from birth, have replaced earlier theoretical views of infants as a-social and ego-centric with a new understanding of infant capacity for interpersonal communion and learning by other-centred participation [1]. In his new introduction to the paperback edition of The Interpersonal World of the Infant, Daniel Stern declares that recent evidences suggest that, probably from the beginning of life, infants have the capacity for what Braten (1998) terms alterocentric participation or what Trevarthen has long called primary intersubjectivity [2, p. xx].

Today, based on empirical findings during the last four decades, we are able to distinguish different layers of intersubjective attunement in early human development arising from the foundations of infant intersubjectivity which Trevarthen was the first to define in the 1970s [3], and which throw new light upon imitative re-enactment and learning in infancy as well as upon steps from embodied simulation of actions to simulation of mind. Such steps adhere to these layers of intersubjective attunement, with the first operative from birth and continuing throughout life to support the higher-order layers:

(I) Primary intersubjective attunement [3,4] in a reciprocal subject-subject format of protoconversation and interpersonal communion, exhibited in the first weeks and months of life, announces the kind of mutual mirroring and turn-taking which we find also in mature verbal conversation.

(II) Secondary intersubjective attunement [5] in a triangular subject-subject-object format involving shared attention and altercentric participation [1,2] in the object-oriented movements of one another. This begins between 6 and 9 months of age with co-operative use of objects of joint emotional referencing and imitative learning from altercentric participation in object-handling movements, inviting circular re-enactment: learning by imitation to manipulate objects and to reciprocate caregivers' acts.

(III) Tertiary intersubjective understanding [6,7] in conversational and narrative speech, entailing predication and a sense of verbal or narrative self [2] and other in first-order modes of symbolic communication (from about 18 to 24 months). Second-order understanding of others' minds and emotion (theory or simulation of mind from 3 to 6 years) opens the way for perspective-taking and emotional absorption, even in fictional others [8], and for simulation of conversation partners' minds [9,10].

Thus, a major point here is that such higher-order achievements are supported by capacities and competencies unfolding in the lower-order layers, which continue to be operational and supportive throughout life. I shall now go into some detail about the operating characteristics pertaining to the various layers.

9.2 On newborns' imitation and protoconversation in the first weeks and months

Already in the first weeks of life human infants have been found to be capable of mutual subject-subject attunement in an immediate sense. This is documented by

film and video records and analyses. For example, 6- to 8-week-olds have been found to engage in proto-conversation with their mothers, exhibiting finely tuned inter-coordination of movements and mirroring of expressions, entailing the kind of turn-taking and rhythmic synchrony which we may recognize in verbal conversation partners later in ontogeny [11].

9.2.1 How an audience reacts when watching a video of a newborn's imitation

In the first weeks after birth infants have been documented by experimental studies to imitate a variety of gestures, such as tongue protrusion, brow motions, head rotation, finger movements, gestural features used to express surprise, delight and boredom, and vocal (vowel) productions. Most dramatic, perhaps, is the video documentation in 1983 by Kugiumutzakis [12] of how neonates in the first hour after birth attempt to come up with a semblant response, matching his facial gestures, such as tongue protrusion or a wide mouth opening. When I show Kugiumutzakis' video record on the screen to an audience, for example a video clip of a 25-minute-old girl who, exposed to his wide mouth opening, prepares to imitate him, some people in the audience unwittingly reveal by their own wide mouth opening their virtual participation in what the newborn is trying to do [13]. When they watch this newborn girl preparing to come up with a wide mouth opening movement resembling what Kugiumutzakis had just been doing with his mouth, I have photo records of how some in the audience open their own mouths. This is not imitative re-enactment on the part of the audience. This is pre-enactment or co-enactment, because people in the audience open their mouths slightly in advance of, or concurrently with, the little girl's opening of her mouth -- as if to help her to achieve this tremendous feat. Being acutely aware of what the little girl is trying to prepare for, they unwittingly open their mouths before she manages to do so. 
And when I return to the speaker's platform and point out what some of them have done, laughter breaks out and they become conscious of their own pre- or co-enactment. I can then explain how their unwitting pre- and co-enactment came about by their other-centred participation, as if being a virtual co-author of the newborn's impressive feat of re-enactment.

9.3 On infants having learned by altercentric participation to feed their feeder

In the middle or second half of the first year objects of shared attention come into focus, and with them the possibility of re-enactment of object manipulation demonstrated by the caregiver. This entails a triangular subject-subject-object format involving shared attention and participation in the object-oriented movements of the model, for example, when the nine-month-old in Meltzoff's deferred imitation experiment takes after what the experimenter did the previous day, pushing a button on a box, emitting a sound or a light [14].

9.3.1 When eleven-month-olds reciprocate their caregivers' (spoon)feeding

Add another month or two and even more complex feats of learning by imitative re-enactment can occur. In 1996 it was video- and photo-documented that, for instance, a baby boy (11 3/4 months), when allowed to take the spoon in his own hand,


could reciprocate his big sister's spoon-feeding, and even opened his own mouth in the process [15, 16] (cf. Fig. 1 (middle right)). When infants reciprocate in this manner they demonstrate that, while having been previously spoon-fed, they have not just participated by receiving and eating the food, but have actually virtually partaken in the caregiver's spoon-feeding from the caregiver's stance. This entails altercentric participation, just like what occurs when the caregiver unwittingly opens his or her mouth when the baby opens the mouth to receive the food. Their circular re-enactment of what they have experienced as recipients of spoon-feeding shows that the infants must have been able to participate in the feeder's movements from the feeder's stance -- the very reverse of what is seen from an outside, egocentric stance in such face-to-face situations. In order for infants to be able to reciprocate the spoon-feeding they must have been able to virtually partake in their caregivers' previous spoon-feeding activity as if they were co-authors of the feeding, even though their caregivers have been the actual authors of the feeding.

9.3.2 Definition of altercentric participation

Regard the mouths of the infant feeders in Figure 1 (top and middle illustrations). Notice how they are opening their own mouths as their companions open their mouths to receive the food offered, and notice how the Yanomami girl (middle, left) tightens her lips as her big sister's mouth closes on the morsel. What you see revealed here, like what you yourself may unwittingly exhibit when feeding a child or a patient, is taking a virtual part in the patient's intake of the food, as if participating in the other's eating from the other's stance, or virtually helping the other to grasp by mouth the food offered. These are instances of what I have identified and termed altercentric participation [1,17]. 
As the very reverse of perception of facing other subjects from an ego-centric perspective, other-centered participation entails the empathic capacity to identify with the other in a virtual participant manner that evokes co-enactment or shared experience as if being in the other's bodily centre. Thus I define altercentric participation as Ego's virtual participation in Alter's act as if Ego were a virtual co-author of the act or being virtually hand-guided from Alter's stance. This is sometimes unwittingly manifested overtly, for example, when lifting one's leg when watching a high jumper, or when opening one's own mouth when putting a morsel into another's mouth (and differs from perspective-taking mediated by conceptual representations of others). Stern sees such other-centred participation as the basic intersubjective capacity that makes imitation, empathy, emotional contagion, and identification possible [18, p.242]. And what is more, we may add, when you are not just watching the other about to perform something, but wishing for the other to succeed in whatever he or she is doing, you will tend to show by your own accompanying muscle movements your virtual participation in the other's effort as if you were a co-author of the other's doing.

Adam Smith [19] had noticed how spectators watching a French line dancer would sometimes wriggle and move their own bodies as if helping the dancer to keep his balance as he walked on the slack line. He saw this as a manifestation of what he termed sympathy (which is not a bad term when considering the Greek roots of sym for "joint" and pathos for "feelings, passion, or suffering"). So, what the spectators were unwittingly doing, or what the members of the audience did when watching the video of what the newborn girl was trying to do, or what the feeder often unwittingly does, such as opening the mouth and then tightening the lips when the patient prepares to grasp the afforded food by mouth, is to exhibit by their muscle activation pre-movements and co-movements (termed Mitbewegungen by Eibl-Eibesfeldt [20]). In this manner they show that they are taking a virtual part in what the other is trying to do, as if sharing the bodily centre of the other's muscular activity. Such participant perception of the other's move entails altercentricity -- the very reverse of egocentricity.

9.3.3 On intersubjective re-enactment, co-enactment, and pre-enactment

When the feeder opens his or her own mouth while in the process of feeding the patient, this is not imitation in the sense of re-enactment, but rather an anticipating or concurrent move in the sense of pre- or co-enactment. Various modes of virtual intersubjective enactment may thus be seen to differ in terms of the temporal relation between the observed act and the enactment: occurring after the observed act has occurred (inviting re-enactment), occurring concurrently with the observed act (inviting co-enactment), or even anticipating the act to be observed (inviting anticipatory pre-enactment). All three modes may successively be evoked in the infant learning to reciprocate spoon-feeding: Firstly, when being subjected to spoon-feeding, the infant may be virtually co-enacting the spoon-feeding as if being a virtual co-author of the feeding. 
Secondly, when taking after the feeder by reciprocating spoon-feeding, the infant re-enacts the spoon-feeding previously experienced. Thirdly, if the infant opens his or her own mouth while offering the spoonful to the caregiver's mouth, the infant is pre-enacting the caregiver's intake of the afforded food (cf. the appendix on p. 135 in Bråten (ed.) [21]). Learning by imitative re-enactment can hardly be accounted for in terms of perspective-taking in a social-cognitive sense, but rather in an e-motive and participatory sense of more primitive subjective experience in felt immediacy, evoking temporal feeling flow patterns, what Stern [18] terms vitality contours, that are shared by the model and the infant learner. The same applies to the one-and-a-half-year-old who is capable of reading intention, e.g. imagination of unrealized or incomplete efforts inviting simulated completion, such as in Meltzoff's (1995) behavioral re-enactment design, in which 18-month-olds successfully realize a novel target act from watching the experimenter failing to pull a dumbbell apart [22]. We shall return to this later.


Figure 1. Feeding situations and food grasping inviting participant mirroring [23]. (Top) From a day-care centre in Italy (drawing after photo recording by C. P. Edwards 1997). (Middle left) Yanomami girl feeding her big sister (based on photo by Eibl-Eibesfeldt [20]). (Middle right) Norwegian boy (11 3/4 months) reciprocating his sister's spoon-feeding (based on photos by Bråten [15]). (Bottom) Macaque (with electrodes attached to pre-motoric cells in the brain) in an experimental situation at the Institute of Human Physiology, University of Parma, in which the macaque first sees the experimenter grasping a morsel, and then grasps the morsel on its own. In both cases there occurs discharge of the same pre-motoric cells (which were later aptly termed mirror neurons) (drawing adapted from video presentation by Rizzolatti at a conference in Delmenhorst, June 2000 [24]). This will be turned to later in the chapter. The drawings in this figure have been made by the author and used in his textbook in Norwegian [23, p.41], for which he has copyright of figures (see also his drawings of the keynote experiment on mirror neurons in three pertinent collective volumes [17, p.122; 10, p.281; 21, p.5]).

9.4 How altercentric participation opens a window to altruism in young children

As the very reverse of perception of facing subjects from an ego-centric perspective, other-centered participation entails the empathic capacity to identify with the other in a virtual participant manner that evokes co-enactment or shared experience as if being in the other's bodily centre. This may invite in children a general proclivity towards prosocial and even altruistic behaviour. Here is the proposition:

(i) By virtue of the innate capacity for other-centered participation in the patient's distress or felt need, as if experiencing that from the patient's center, there is a natural proclivity in the child to feel concern and sometimes attempt to help the patient, perhaps even at the child's own expense, if situational and motoric resources permit.

If helping occurs at one's own expense, then this would by definition entail altruism. Does this apply to the previous examples of infants feeding others? Not quite, and only if those infants would have preferred to reserve the food afforded for themselves. In the case of the Norwegian boy (Fig. 1 (middle right)), that certainly did not apply. True, he reciprocated his sister's spoon-feeding, but only until the sweet dessert; that he kept to himself, with no more sharing.

9.4.1 Circular re-enactment of care-giving from e-motional memory

From previously being spoon-fed by his caregivers, however, he had learnt to (take delight in) spoon-feeding others in return, and to do so before his first birthday. Such an impressive early feat of cultural learning entails that nature has been at play: an innate capacity for imitative learning even of care-giving, which now permits specification in terms of other-centred participation:

(ii) Care-giving situations, which may appear to be unilateral activities, should be re-defined and seen as reciprocal activities, in virtue of the infant's taking a virtual part in what the caregiver does and thereby learning from altercentric participation in that very care-giving.

Such learning entails a kind of procedural memory or, as I would specify it, an e-motional memory. The composite term e-motional combines the folk sense of "being moved by" and the root sense "out-of-motion". By e-motional memory, then, I mean here the affective remembrance -- which is not conceptual and may not be conscious -- of having virtually moved with Alter's movements, leaving Ego with a characteristic vitality contour and implicit memory of the virtual co-enactment which may be evoked for re-enactment in similar situations. 
In an environment affording care, the infant gets recurrent opportunities to not just be subjected to care but to feel to be virtually co-enacting such care-giving, inviting circular re-enactment from e-motional memory of such care-giving if and when others in need or distress reactivate in the child feelings semblant of the form


of bodily self-feelings evoked in situations in which the infant has experienced care-giving, and hence activating circular re-enactment of care. Then others in need or distress may invite caring efforts resembling the caring afforded by others earlier in infancy, from e-motional memory (or what Fogel terms participative memory [25]) of having virtually participated in that care-giving.

(iii) The kind of caretaking frequently experienced by the infant in virtue of altercentric participation provides a basis for circular re-enactment of that kind of caretaking towards other children in need or distress.

This fits with studies revealing how the quality of the care-giving background appears to play a role in children's reactions towards others in need: those from a nurturing and caring background are most likely to help and offer comfort to other children in need or distress [26, 27]. Altercentric participation is at play in a twofold way here: first, by the part it plays in learning from caregivers who have left the child with an e-motional or participatory memory of care-giving; second, by the way in which altercentric participation may be elicited by others in need or distress, thereby activating circular re-enactment of care-giving offered to them.

9.4.2 Cases of altruism exhibited by orphans rescued from horrible circumstances

Thus, sensitive caretaking frequently experienced by the infant in the reciprocal mode of felt immediacy may come to provide a basis for circular re-enactment of semblant kinds of caretaking towards other children in need or distress. However, the innate capacity for participant perception may also give rise to empathy and affordance of care in a more immediate sense, even in the absence of model learning. 
Anna Freud reports about three-year-old orphans rescued from Nazi death camps during the Second World War who, in spite of having been deprived of care-giving nurture, afford one another care [28], probably by virtue of the inborn capacity for participant perception triggering empathic identification. Perhaps most impressive, in view of their gruesome and depriving backgrounds, was the way they behaved towards one another at mealtimes: handing food to the companion was more important than having food oneself. Here is one example, occurring one month after their arrival in October:

John [3 years 11 months] cries when there is no cake left for a second helping for him. Ruth [3 years 7 months] and Miriam [3 years 3 months] offer him what is left of their portions. While John eats their pieces of cake, they pet him and comment contently on what they have given him. [28, p.175]

Their content and commenting behaviour suggests that they take delight in his eating. Their empathic identification is indicative of other-centred participation, which probably gave rise to their altruistic act.

9.5 From bodily simulation of others' acts to simulation of conversation partners' mind

We shall now turn to the path from embodied simulation of movements or attempted moves to mental simulation of other minds. First we shall consider the behavioural re-enactment cases of 18-month-olds who demonstrate that they can realize the unrealized target from merely watching someone aiming for the target. Then we shall turn to conversation partners' simulation of mind. Both cases will be accounted for in terms of simulation circuits evoked by altercentric participation.

9.5.1 When an 18-month-old realizes the model's failure to pull a dumbbell apart

In Meltzoff's behavioral re-enactment design 18-month-olds are face-to-face with the experimenter, who fails to pull the dumbbell apart. When handed the dumbbell, the child pulls it apart, usually with a triumphant smile. Here is demonstrated the child's capacity to read the model's intention [22]. But there is more involved, which may be specified in terms of altercentric participation: from having virtually participated in the model's effort, evoking simulated completion of the attempted act, there is circular re-enactment by the child, successfully realizing the target act [17]. Goldman supports this account [29]. He is an early advocate of the simulation version of theory of mind, i.e. that rather than constructing a theory of others' minds from which children's understanding of others' understanding is derived, children simulate the other's (mis)understanding processes in a more direct sense. 
In a joint paper with one of the discoverers of mirror neurons, Vittorio Gallese, Goldman makes the point that when the same muscular activity is activated in an observer as the activity utilized by the observed, and such matching muscular activity accompanies mirror neuron system activation in the observer, this lends support to the simulation version of theory of mind [30], as opposed to the theory construction version, which assumes that the child constructs a theory of the other's mind, deriving its understanding of the other's understanding from that construct. This alternative conception allows us to specify an ontogenetic path from object-oriented secondary intersubjectivity, entailing embodied simulation of object-oriented acts, to tertiary intersubjectivity, opening windows to simulation of conversation partners' mind.

9.5.2 Listeners' altercentric perception and interlocutors' simulation of one another's acts

In an interview with Gallese and myself on the mirror neuron system's implications for intersubjectivity and social cognition [31], we both reply in the positive to the question whether there is a path in child development from bodily to mental simulation. As I put it: "It appears to be a path of learning by imitation entailing virtual (other) participation, i.e. other-centred participation in the sense of being a virtual co-author of what the model or patient is doing" [31, p.100]. In the above case of realizing an unrealized attempted target act, there is a mirroring mechanism in operation which resembles and probably supports processes also at work in verbal conversation, paralleling the conversational efficiency demonstrated in verbal dialogues later in ontogeny.


From about 3 to 6 years of age, children manifest meta-understanding of others' understanding, entailing second-order mental understanding of thoughts and emotions in self and other in virtue of recursive mental simulation of mental processes in others -- beginning with discovery of deceit and attribution of false beliefs, and with co-narrative fictional constructions with peers, and enabling the child who is listening to a story to take the point of view of the main character or protagonist. For example, in an Oxford study, Rall and Harris find that when 3- and 4-year-olds are asked to retell fairytales, such as the one about Cinderella, they manage the best recall when the verbs in the stories listened to are consistent with the stance of the protagonist with whom they identify. While the children have trouble when the verbs in the stories told are used from the reverse perspective, at odds with their perspective-taking, their recall is more accurate for verbs such as come and bring, go and take, if used in a way that is spatially consistent with the point of view of the main protagonist, inviting their altercentric participation in Cinderella's slippers, as it were [32]. This pertains to the qualitative leap to children's simulation of mind, correlating with their verbal and conversational ability and entailing second-order understanding of others' thoughts and emotions. It seems reasonable to assume that a mirror system for matching or simulating others' acts may afford a precursory and nurturing path to simulation of other minds [6, 30]. For example, when conversation partners complete one another's aborted statements, such sentence completion may be accounted for in terms of a cybernetic model of simulation in conversation partners, put forward in the early 1970s [9, 10] -- the first model to articulate the simulation version of theory-of-mind approaches in psychology and philosophy. 
You may be listening to a conversational partner who is in the process of making a verbal utterance and who, before the utterance is completed, appears to hesitate or to be at a loss for the right words, and without hesitation you supply the words, completing the utterance of the speaker. This is analogous to the spoon-feeding situation, in which the mouth movements of the feeder (infant or adult) reflect the corresponding mouth movements of the one being fed: we may see a parallel here to the virtual participation exhibited by partners in verbal conversation, co-enacting one another's complementary acts and sometimes completing one another's utterances by virtue of simulating the production, much as the toddler does in the above behavioural re-enactment design. As I phrase it today in terms of altercentric participation, when you find yourself more or less unwittingly completing what your conversation partner is about to say, you overtly manifest your participant perception of the other's speech act, simulating what the other is about to say as if you were a virtual co-author enabled by an altercentric mechanism, supported by an other-centred mirror system decentred in phylogeny to subserve preverbal and verbal conversational efficiency.

9.6 On the neurosocial support by a mirror system decentred in phylogeny

When verbal conversation partners show by their overt behaviour that they simulate one another's complementary processes, by virtue of altercentric participation in the partner's executed speech act and understanding, they parallel to a certain



extent lower-order processes already exhibited earlier in ontogeny, for example by infants' altercentric participation in their caregivers' enactment as if they had been hand-guided from the caregiver's stance and were a co-author of the care-giving.

9.6.1 The discovery of mirror neurons

Parts of the neurophysiological support of such feats have now been revealed by the discovery of a mirror system in the human brain [24, 33, 34, 35]. Mirror neurons were first found in macaque monkeys to discharge both when the macaque observes the experimenter grasping a piece of food and when the monkey grasps the piece itself [33, 34, 35] (as illustrated in Fig. 1 (bottom)). Further experimental evidence [33, 35, 36] shows that such a system also exists in humans, in the brain region that contains Broca's area (which not only serves speech, but appears to become active during execution and imagery of hand movements and tasks involving mental rotation of the hand). Identifying such a mirror neuron system, which enables observed enactment to be matched to semblant, internally generated enactment in the observer of that enactment, Rizzolatti and Arbib [33] refer to Liberman's motor theory of speech perception [37], which implies a close link between the production and perception of speech. This is consistent with what is portrayed in Bråten's conversation model of how the listener takes part in the speaker's production process [9, 10], which would presuppose the operational subservience of such a mirror system, affording some clues to the evolution of language [33, 36, 38, 39].

9.6.2 Questions about evolution

One pertinent question concerning the phylogenesis of learning by other-centred participation is this: At what critical period in hominid evolution would infant learning by altercentric participation have afforded a distinct selective advantage?
Or to put the question otherwise, inviting a tentative reply: Deprived as they were by their parents' bipedalism of the body-clinging advantage enjoyed by young offspring of apes, how could hominid offspring learn to cope and take care from the distant gesticulations and articulations of an instructing mother? The reply I have offered is this: In order to compensate for the loss of the protective and instructive back-riding mode of actually moving with their mother's body, they would have had to depend on face-to-face modes of communion and on cultural learning by virtually moving with observed models. Mother-infant pairs capable of protoconversation and joint visual attention would have had a selective advantage in both contributing to and drawing upon an emerging protolanguage cultural environment, in particular before the invention of baby-carrying slings. Richard Leakey attributes such body slings to Homo erectus in this scenario: "[w]e see a small human group, five adult females and a cluster of infants and youths. They are athletic in stature, and strong. They are chattering loudly, some of their exchanges obvious social repartees, some the discussion of today's plans [...] to gather plant foods [...]. Three of the females are now ready to leave, naked apart from an animal skin thrown around the shoulders that serves the dual role of baby carriers and, later, food bag." [40, pp. 93-97]



Now, assuming that this attribution of chattering, as well as baby slings, is warranted, it is likely that earlier in hominid evolution, prior to the invention of such baby slings, hominid offspring incapable of back-riding could have been faced with extinction were they not able to learn face-to-face and from a distance to cope and take care. Thus, I have proposed this hominin infant decentration hypothesis: Compensating for the loss of the body-clinging advantage that enables offspring of other primates to perceive and learn without having to transcend their own body-centered perspective shared with the carrying mother, those hominin offspring able to learn to cope and take care by (m)other-centered perception of distal vocalizing and gestural articulation would have had a selective advantage and a contributing impact [38, 39]. In her article on pre-linguistic evolution in early hominins, Dean Falk has hypothesized that hominin mothers would have had to adopt new foraging strategies that entailed maternal silencing, reassuring, and controlling the behaviours of physically distant offspring who did not have the possibility of clinging to the mother's body. Thus, she claims, mothers using prosodic and gestural markings and attending vigorously to their offspring would be strongly selected for [41]. As I have pointed out in a commentary, Falk's hominin mother-infant model presupposes an emerging infant capacity to perceive and learn to cope and take care from a distance [39].
A necessary condition may have been an emerging infant capacity to perceive, understand, and learn by (m)other-centred participation from the gestures and vocalizations afforded by the vigilantly attending mothers. In her response to commentaries, Falk stresses the significance of my hominin infant decentration hypothesis because it specifies how mirror neurons could have been of major importance during the period of evolution when hominin infants lost the ability to ride clinging to their mothers' backs [41, p.532] and before the invention of baby slings. Unlike in back-riding offspring of other primates, which have no need to de-center their own body-centered perspective, a mirror neuron system may have been adapted in hominin infants to subserve the kind of (m)other-centered mirroring we now see manifested by human infants even in face-to-face situations, which actually entails mirror reversal when re-enactment occurs.

9.6.3 Defective mirror system in autism

Such phylogenetic adaptation of the mirror neuron system by way of decentration enables the modern human infant who is watching and imitating others performing face-to-face to carry out a perceptual mirror reversal, shifting from other-centred participant perception of what the model is doing, as if experienced from the model's centre, to the own body-centred frame of orientation required for own execution of the imitative re-enactment [10, 17, 42]. While ordinary children in virtue of altercentric perception can do what the other is doing when seen face-to-face, children with autism who understand and comply with the invitation "Do as I do" have problems in face-to-face situations, probably due to a defective mirror system. For example, when the model is raising his arms, the subject with autism, incapable of altercentric mirroring, may compare the inside of the model's hands with his own hands and then raise his own hands with the palms inwards [4, 43].



9.7 Summary and Conclusions

In this chapter, intersubjective enactment in the anticipatory, concurrent and imitative senses has been distinguished and illustrated with reference to layers of intersubjective attunement in ontogeny [6, 7], as distinguished by infancy research findings in the last four decades [17, 21], and with a focus on infant learning by altercentric participation in what the model is doing in face-to-face situations, as if being a virtual co-author of the model's doing [1, 2]. Imitative re-enactment in such face-to-face situations actually requires a mirror reversal on the part of the infant learner, found difficult by subjects with autism: the learner must engage in other-centred participant perception of what the model is doing, as if experienced from the model's centre, and then shift to own body-centred execution of the imitative re-enactment. With reference to the discovery of mirror neurons, the neurosocial support by an altercentric mirror system that must have been decentred in phylogeny has been indicated. It affords the innate foundations of the impressive feats of infant re-enactment in early ontogeny, as succinctly described in this chapter and as documented inter alia by the seminal findings presented in three recent collective source volumes [17, 21, 44]. Examples have been given of different modes of intersubjective enactment in the format, respectively, of pre-enactment, co-enactment, and re-enactment, when the differences in time between the observed (model) act and participant enactment are taken into account.

9.8 References
[1] S. Bråten, Infant learning by altercentric participation. In S. Bråten (Ed.), Intersubjective Communication and Emotion in Early Ontogeny (pp. 105-124). Cambridge: Cambridge University Press, 1998.
[2] D.N. Stern, Introduction to the paperback edition. In D.N. Stern, The Interpersonal World of the Infant (pp. xi-xxxix). New York: Basic Books, 2000 / London: Karnac, 2003.
[3] C. Trevarthen, Communication and cooperation in early infancy: A description of primary intersubjectivity. In M.M. Bullowa (Ed.), Before Speech (pp. 321-347). New York: Cambridge University Press, 1979.
[4] C. Trevarthen, The concept and foundations of infant intersubjectivity. In S. Bråten (Ed.), Intersubjective Communication and Emotion in Early Ontogeny (pp. 15-46). Cambridge: Cambridge University Press, 1998.
[5] C. Trevarthen & P. Hubley, Secondary intersubjectivity. In A. Lock (Ed.), Action, Gesture, and Symbol (pp. 183-229). London: Academic Press, 1978.
[6] S. Bråten, Intersubjective communion and understanding: development and perturbation. In S. Bråten (Ed.), Intersubjective Communication and Emotion in Early Ontogeny (pp. 272-382). Cambridge: Cambridge University Press, 1998b.
[7] S. Bråten & C. Trevarthen, From infant intersubjectivity and participant movements to simulation and conversation in a cultural common sense. In S. Bråten (Ed.), On Being Moved: From mirror neurons to empathy (pp. 21-33). Amsterdam/Philadelphia: John Benjamins Publ. Co., 2007.
[8] P. Harris, Fictional Absorption: Emotional response to make-believe. In S. Bråten (Ed.), Intersubjective Communication and Emotion in Early Ontogeny (pp. 336-353). Cambridge: Cambridge University Press, 1998.
[9] S. Bråten, Coding Simulation Circuits during Symbolic Interaction. In Proceedings of the 7th International Congress on Cybernetics 1973 (pp. 327-336). Namur: Association Internationale de Cybernétique, 1974.
[10] S. Bråten, Altercentric perception by infants and adults in dialogue: Ego's virtual participation in Alter's complementary act. In M. Stamenov & V. Gallese (Eds.), Mirror Neurons and the Evolution of Brain and Language (pp. 273-294). Amsterdam/Philadelphia: John Benjamins Publ. Co., 2002.



[11] M.C. Bateson, Mother-infant exchanges: The epigenesis of conversational interaction. In D. Aronson & R.W. Rieber (Eds.), Developmental Psycholinguistics and Communication (pp. 110-113). New York: New York Academy of Sciences, 1975.
[12] G. Kugiumutzakis, Neonatal imitation in the intersubjective companion space. In S. Bråten (Ed.), Intersubjective Communication and Emotion in Early Ontogeny (pp. 63-88). Cambridge: Cambridge University Press, 1998.
[13] S. Bråten, Participant perception of others' acts. Virtual otherness in infants and adults. Culture & Psychology, 9 (3), 261-276, 2003.
[14] A.N. Meltzoff, Infant imitation and memory: nine-month-olds in immediate and deferred tests. Child Development, 59, 217-225, 1988.
[15] S. Bråten, Infants demonstrate that care-giving is reciprocal. Centre for Advanced Study Newsletter no. 2 (November), 2, 1996.
[16] S. Bråten, Altercentric infants and adults: On the origins and manifestations of participant perception of others' acts and utterances. In S. Bråten (Ed.), On Being Moved: From mirror neurons to empathy (pp. 111-135). Amsterdam/Philadelphia: John Benjamins Publ. Co., 2007.
[17] S. Bråten (Ed.), Intersubjective Communication and Emotion in Early Ontogeny. Cambridge: Cambridge University Press, 1998.
[18] D.N. Stern, The Present Moment in Psychotherapy and Everyday Life. New York: Norton, 2004.
[19] A. Smith, The Theory of Moral Sentiments, 6th ed. 1790. Oxford: Clarendon Press, 1976.
[20] I. Eibl-Eibesfeldt, Die Biologie des menschlichen Verhaltens. Seehamer Verlag, 1997.
[21] S. Bråten (Ed.), On Being Moved: From mirror neurons to empathy. Amsterdam/Philadelphia: John Benjamins Publ. Co., 2007.
[22] A.N. Meltzoff & M.K. Moore, Infant intersubjectivity: broadening the dialogue to include imitation, identity and intention. In S. Bråten (Ed.), Intersubjective Communication and Emotion in Early Ontogeny (pp. 47-62). Cambridge: Cambridge University Press, 1998.
[23] S. Bråten, Dialogens speil i barnets og språkets utvikling. Oslo: Abstrakt Forlag, 2007.
[24] G. Rizzolatti, L. Craighero & L. Fadiga, The mirror system in humans. Presented at the Delmenhorst conference on mirror neurons, June 2000. In M. Stamenov & V. Gallese (Eds.), Mirror Neurons and the Evolution of Brain and Language (pp. 37-59). Amsterdam/Philadelphia: John Benjamins Publ. Co., 2002.
[25] A. Fogel, Remembering Infancy: Accessing our Earliest Experiences. In G. Bremner & A. Slater (Eds.), Theories of Infant Development (pp. 204-230). Cambridge: Blackwell, 2004.
[26] L.E. Berk, Child Development, 3rd ed. Boston: Allyn and Bacon, 1989.
[27] C. Zahn-Waxler, M. Radke-Yarrow & R. King, Child rearing and children's prosocial initiations towards victims in distress. Child Development, 50, 319-330, 1979.
[28] A. Freud (with S. Dann), An Experiment in Group Upbringing. The Psychoanalytic Study of the Child, 6, 127-168, 1951, reprinted in The Writings of Anna Freud, 4 (pp. 163-229). New York: International Universities Press, 1973.
[29] A. Goldman, Imitation, mind-reading, and simulations. In S. Hurley & N. Chater (Eds.), Perspectives on imitation: From neuroscience to social science, 2 (pp. 79-93). Cambridge, MA: The MIT Press, 2005.
[30] V. Gallese & A. Goldman, Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences, 2 (12), 493-501, 1998.
[31] S. Bråten & V. Gallese (Interviewed by the Impuls editors L.T. Westlye & T. Weinholdt), On mirror neurons systems' implications for social cognition and intersubjectivity. Impuls, 58 (3), 97-107, 2004.
[32] J. Rall & P. Harris, In Cinderella's Slippers: Story Comprehension from the Protagonist's Point of View. Developmental Psychology, 36, 202-208, 2000.
[33] G. Rizzolatti & M. Arbib, Language within our grasp. Trends in Neurosciences, 21 (5), 188-193, 1998.
[34] P.F. Ferrari & V. Gallese, Mirror neurons and intersubjectivity. In S. Bråten (Ed.), On Being Moved: From mirror neurons to empathy (pp. 73-87). Amsterdam/Philadelphia: John Benjamins Publ. Co., 2007.
[35] L. Fadiga & L. Craighero, Clues on the origin of language: From electrophysiological data on mirror neurons and motor representations. In S. Bråten (Ed.), On Being Moved: From mirror neurons to empathy (pp. 101-110). Amsterdam/Philadelphia: John Benjamins Publ. Co., 2007.
[36] L. Fogassi & V. Gallese, The neural correlates of action understanding in non-human primates. In M. Stamenov & V. Gallese (Eds.), Mirror Neurons and the Evolution of Brain and Language (pp. 13-35). Amsterdam/Philadelphia: John Benjamins Publ. Co., 2002.
[37] A.M. Liberman, Haskins Laboratories Status Report on Speech Research, 113, 1-32, 1993.



[38] S. Bråten, Beteiligte Spiegelung. Alterzentrische Lernprozesse in der Kleinkindentwicklung und der Evolution. In U. Wenzel, B. Bretzinger & K. Holz (Eds.), Subjekte und Gesellschaft (pp. 139-169). Weilerswist: Velbrück Wissenschaft, 2003.
[39] S. Bråten, Hominin Infant Decentration Hypothesis: Mirror neuron system adapted to subserve mother-centred participation. Commentary. Behavioral and Brain Sciences, 27 (4), 508-509, 2004.
[40] R. Leakey, The Origin of Humankind. London: Phoenix, 1995.
[41] D. Falk, Prelinguistic evolution in early hominins: Whence motherese? (incl. Author's response to commentaries, pp. 139-169). Behavioral and Brain Sciences, 27 (4), 491-541, 2004.
[42] A. Billard & M. Arbib, Mirror neurons and the neural basis for learning by imitation: Computational modeling. In M. Stamenov & V. Gallese (Eds.), Mirror Neurons and the Evolution of Brain and Language (pp. 243-351). Amsterdam/Philadelphia: John Benjamins Publ. Co., 2002.
[43] A. Whiten & J. Brown, Imitation and the reading of other minds. In S. Bråten (Ed.), Intersubjective Communication and Emotion in Early Ontogeny (pp. 260-282). Cambridge: Cambridge University Press, 1998.
[44] M. Stamenov & V. Gallese (Eds.), Mirror Neurons and the Evolution of Brain and Language. Amsterdam/Philadelphia: John Benjamins Publ. Co., 2002.


Enacting Intersubjectivity. F. Morganti et al. (Eds.). IOS Press, 2008. © 2008 The authors. All rights reserved.



The Self-Other Distinction: Insights from Self-Recognition Experiments

Abstract. Recent neuroscientific studies of self-awareness have focused on how the self compares to representations of other people, on the ability to represent and attribute mental states, and on the ability to represent how the external world would appear from other viewpoints. Social cognitive neuroscience tends to emphasize the shared properties of self and others across several dimensions, such as the shared properties of actions, bodies and sensations, rather than the asymmetries between self and other. In the present chapter, we put forward the hypothesis that the experience and representation of one's own body may underpin the distinction between the self and other agents. In every inter-action, there are both private and public states and signals represented in the brain of the agent and the observer. Private signals refer to centrally generated action representations such as intentions, efferent signals (e.g. efference copy, motor commands), and re-afferent signals such as proprioception. Public signals originate from observable sensory events, both re-afferent and ex-afferent, such as visual and auditory signals that may refer to bodies, objects or complex patterns of motor behaviour. How are these signals used to disambiguate the identity of bodies and the origin of actions? By focusing on recent experiments on self-recognition, we propose that the experience of one's actions, which depends largely on the processing of efferent information, may function as a unifying element that structures a coherent representation of the bodily self, as distinct from other agents.

Contents
10.1 Introduction
10.2 On the primacy of the body
10.3 On motor and sensory signals
10.4 From signals to the experience of one's own body
10.5 Who is the agent?
10.6 A working example: self-recognition studies
10.7 Towards an implicit measure of self-recognition?
10.8 Conclusions
10.9 Acknowledgments
10.10 References


M. Tsakiris / The Self-Other Distinction: Insights from Self-Recognition Experiments

10.1 Introduction

With the advent of social cognitive neuroscience, recent studies of self-awareness have focused on how the self compares to representations of other people [1], on the ability to represent and attribute one's own and other people's mental states [2], and on the ability to represent how the external world would appear from other viewpoints [3]. However, the question of how the self can be distinguished from other people, what we would call the self-other distinction, has not been fully addressed. Most studies tend to emphasize the shared properties of self and others across several dimensions, such as the shared properties of actions, bodies and sensations (for a review see [4]), rather than the asymmetries between self and other. In the present chapter, we put forward the hypothesis that the experience and representation of one's own body may underpin the distinction between the self and other agents. For our purposes, the self will be treated as the minimal sense of owning a body and the actions originating from that body [5]. This minimal self is a physical entity which exists in a physical world and has physical effects via its physicality [6, p. 50]. As such, the minimal self is predominantly an embodied acting self.

10.2 On the primacy of the body

There are several unique components in the experience of one's own body that demonstrate the existence of an intimate link between the body and the self. For example, contrary to the perception of an object, which can be perceived from different perspectives or even cease to be perceived, we experience "the feeling of the same old body always there" [7, p. 242]. When I decide to write something, I do not need to look for my hand in the same way that I have to look for a pen or a piece of paper. Does this permanent presence make the body special? Merleau-Ponty wrote: "[...] It is particularly true that an object is an object in so far it can be moved away from me, and ultimately disappear from my field of vision. Its presence is such that it entails a possible absence. Now the permanence of my own body is entirely different in kind [...] Its permanence is not a permanence in the world, but a permanence on my part." [8, p.90]. The fact that the body is always present suggests that body-awareness is not like any other form of object-awareness, because the body is an object that normally never leaves me. The body is also a unique perceptual entity by virtue of the versatile ways in which it is perceived. Bodies are perceived from the outside (e.g. vision), but my body is also perceived from the inside (e.g. proprioception, interoception). Proprioceptive sense is often conceptualized as the sense of the self par excellence, precisely because no one else can feel my hand moving in the same way I feel it moving from the inside. The fact that the body is perceived from within guarantees an immediate first-personal mode of presentation of bodily experiences. More importantly, one's own body is the only object in the world that can be freely moved according to one's own will. "[...] Body is an organ of the will, the one and only Object which, [...], is moveable immediately and spontaneously and is a means for producing a mediate spontaneous movements in other things [...]" [9, 38, p.152]. The simple fact that we are capable of action with and sensation



in our bodies is sufficient to distinguish the relation we have with our bodies from our relations with other objects [10]. At the experiential level, the body imposes a point of view of the world [8]. It is the mere fact of embodiment that defines a certain point de vue for the embodied self, because it is thanks to the presence of the body, and its position in space that every relation between the self and the world is made possible. In that sense, the bodily self can be thought of as a perspectival source from where all actions emanate and to where all experiences are returned [6]. In addition, both the effectors that materialize our intended actions and the sensory organs that provide our perceptual experiences of the world are the constitutive elements of the lived body. Almost all human activity involves voluntary movements and sensory experiences. Both action and perception are made possible through central motor signals and peripheral sensory signals that are ever present. As agents, we act upon the world with our body, and at the same time we experience ourselves, and the world through the same body. We communicate our intentions to the world through the motor signals that are conveyed into voluntary bodily movements, and we understand the world through the interpretation of sensory signals. In short, the body is an intentional arc between the agent and the world [8], a channel of meaningful communication between the self and the world. Having established this intimate relation between the body and the self, it then becomes an empirical question to characterize the functional properties of the bodily self. A preliminary approach to this question can be given by investigating the physiological signals that are used to constitute the bodily self, and possibly distinguish it from other bodies. 
10.3 On motor and sensory signals

Two main kinds of physiological signals are used to inform the representation of one's body: the centrally generated motor (or efferent) signals, and the peripheral sensory (or afferent) signals. Efferent signals are the centrally generated signals that control every voluntary movement. A key concept in the motor control literature is that of an efference copy. The concept of efference copy was first described as an "effort of will" by Helmholtz [11]. In fact, the idea of an effort of will was the answer to Helmholtz's question regarding our visual experience of the world. When we move our eyes, the retinal image of a perceived object is displaced. Similarly, in the case where we keep our eyes still but perceive a moving object, the retinal image of this object is again displaced. The critical question is how the CNS distinguishes a sensation that is due to the activity of the organism itself from one that is due to external activity. Helmholtz initially suggested that whenever we make eye movements, the effort of will, that is, the voluntary effort to produce the eye movement, provides critical predictive information about the sensory outcome of the eye movement that will follow. In the 20th century, Helmholtz's idea was further developed into the concept of an efference copy. Whenever a motor command is issued in the motor cortex, a copy of this command is generated in parallel [12,13]. This information can be used for perceptual compensation, and can help identify the source of the movement (i.e. self vs. non-self). Von Holst and Mittelstaedt suggested that during voluntary eye movements, an efference copy can be used by visual or motor areas of the brain to predict the sensory outcome of the descending motor command, and therefore anticipate the self-generated stimulation (i.e. the sensory feedback originating from the eye movement itself). More recently, the idea of an efference copy has been generalised to the operation of the motor system as a whole, and is no longer restricted to the operation of the oculomotor system. Thus, an efference copy is thought to be generated whenever a motor command that precedes a self-generated movement is issued. This efference copy can then be used by the internal predictive models of the motor system in order to generate accurate predictions about one's own actions [14]. On the other hand, afferent signals are the sensory peripheral signals that can be either the effect of self-generated stimulation (re-afferent) or of externally generated stimulation (ex-afferent). Taken together, the afferent peripheral signals seem to support an ecological self-awareness [15], in the sense that they provide information about the body and the world within which the body is situated, since information about one's body cannot be perceived in isolation from the environment. According to Gibson, each act of perception contains both propriospecific information about the self (i.e. re-afferent) and exterospecific information about the distal environment (i.e. ex-afferent): "Egoreception accompanies exteroception, like the other side of the coin. One perceives the environment and coperceives oneself" [15, p.126]. It has been suggested that afference, and especially proprioception, provides us with the phenomenal content of our bodily self-awareness, because proprioceptive information unambiguously pertains to the self [16].
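The role of the efference copy described above can be illustrated with a toy sketch. It assumes, purely for illustration, a linear forward model that maps a motor command onto a predicted sensory shift, and a fixed tolerance for deciding whether the remaining signal counts as externally generated; the function names, the gain and the tolerance are hypothetical and are not taken from the studies cited here.

```python
def forward_model(motor_command, gain=1.0):
    """Predict the sensory shift expected from a motor command (the efference copy).

    The linear gain is an illustrative assumption, not an empirical value.
    """
    return gain * motor_command


def classify_afference(motor_command, sensed_shift, tolerance=0.5):
    """Compare the sensed shift with the prediction derived from the efference copy.

    The residual is the part of the afferent signal not explained by the
    self-generated movement (a stand-in for ex-afference). The tolerance
    is a hypothetical threshold chosen for this sketch.
    """
    predicted = forward_model(motor_command)
    residual = sensed_shift - predicted
    source = "self" if abs(residual) <= tolerance else "external"
    return {"predicted": predicted, "residual": residual, "source": source}


# Eye movement of 5 units with a matching image shift: fully predicted.
print(classify_afference(motor_command=5.0, sensed_shift=5.0))
# No eye movement, but the image shifts by 5 units: nothing was predicted.
print(classify_afference(motor_command=0.0, sensed_shift=5.0))
```

In this sketch the same sensed shift is attributed to the self in the first call and to an external source in the second, which is the disambiguation that the efference copy is proposed to support.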
However, the meaning of afferent signals for perception and behaviour is ambiguous, precisely because the afferent signals can be either self-generated or externally generated. Recent theories of motor control have shown how an interaction between the efference copy and sensory inflow may reduce this ambiguity. In the case of a self-generated action, intentions and efferent information can predict the consequent multisensory signals produced by one's own movement. This prediction is thought to take place in the internal models of the motor system [14]. We do not normally experience the efferent and afferent components separately. Instead, we have a general awareness of our bodily actions that involves both components. However, the efferent and the afferent signals may support different functions, and may give rise to distinct forms of body-awareness. In fact, recent neuroscientific and phenomenological approaches to selfhood [10, 17, 18] distinguish between two aspects of bodily self-consciousness: the sense of agency and the sense of ownership.

10.4 From physiological signals to the experience of one's own body

Sense of agency is the sense of intending and executing an action [5], a sense of oneself as an actor or a sense that one's actions are one's own [6]. In agency, the self is experienced as the source of the experience of the acting, suggesting that the relationship between the self and the action is not simply causal, because that would imply that the agent can be separated from the action. This stance implicitly



suggests that awareness of action cannot be separated from agency, at least not under normal circumstances [6]. The feeling that the body I inhabit is mine and always with me is called body-ownership. This feeling is a fundamental element of the phenomenal experience of my body. Moreover, ownership refers to the sense that I am the experiencing subject, that my body is the site where the sensory experience takes place, and that it is my body that experiences a certain sensation, whether self-generated or externally generated [5,19]. Thus, the sense of body-ownership is present when I move voluntarily, but also when an externally generated somatic sensation is experienced by me (e.g. passive movement), and also when my body is at rest. The raw basis of body-ownership may be provided by the epistemologically private experience that I have of my body from within (e.g. as provided by the proprioceptive sense), by the body schematic control of movement, and by multisensory integration of body-related sensory signals (e.g. vision of touch and touch). Following these operational definitions, the sense of agency involves a strong efferent component, because actions are centrally generated. On the other hand, the sense of body-ownership involves a strong afferent component, because the content of body-awareness originates mostly from the plurality of multisensory peripheral signals. An important phenomenological observation is that the sense of body-ownership is present not only during voluntary actions, but also during externally or passively generated experiences. In contrast, only voluntary actions, or actions that are experienced as voluntary, should produce a sense of agency. To give an example, when I voluntarily move my hand, I have a sense of agency by identifying my intention to move as the source of the movement, and a sense of ownership by identifying the moving hand as mine.
However, if someone else moves my hand, I do not have a sense of agency over the hand movement, yet I retain a sense of ownership of the moving hand as being mine. It is therefore important to ask what exactly the sense of agency adds to the sense of ownership, and more importantly how agency can be used to address the self-other distinction. Recent studies (for a review see [19]) have provided valuable insights into how we experience and represent our bodies in body-ownership and agency, but they have also raised important methodological and epistemological questions.

10.5 Who is the agent?

Several questions regarding the nature of self-specific body- and action-representations were raised with the discovery of mirror neurons in the macaque brain. The properties of mirror neurons suggest that both self-generated and observed actions, as well as the experience and observation of sensory events, activate overlapping neural networks [4]. These common activations reflect shared representations of actions and bodies that are agent-neutral, arguing against a special representation of one's own body. In every inter-action, there are both private and public states and signals represented in the brain of the agent and/or the observer. Private signals refer to centrally generated action representations such as intentions, efferent signals (e.g. efference copy, motor commands), and re-afferent signals such as proprioception. Public signals originate from observable sensory events, both re-afferent and ex-afferent, such as visual and auditory signals that may refer to bodies, objects or complex patterns of motor behaviour. How are these signals used to disambiguate the identity of bodies and the origin of actions?

The predictive function of the motor system and the resulting anticipation of sensory inflow have been well documented in the literature across different experimental paradigms [14]. However, the link between the operation of the internal models of the motor system and the conscious awareness of action is still debated [20]. A critical issue in this debate relates to the question of the conscious experience of agency. It is not clear which signal(s) or state variable(s) of the motor system give rise to the conscious experience of agency. Accumulating evidence suggests that we are not aware of the actual motor commands or motor parameters of our actions [20]. This unawareness of the actual motor commands was nicely demonstrated by Fourneret and Jeannerod [21] in a replication of the ingenious experiment by Nielsen on volition [22]. Participants were asked to draw lines in a sagittal direction on a digital tablet using a stylus. When tracing a line on the tablet, the subjects could see through the mirror a red line appearing on the computer screen in exact coincidence with the displacements of the tip of the stylus on the tablet. The output of the graphic tablet was processed by the computer using a simple algorithm for adding a linear directional bias. When the bias was set to the right (e.g. at 15°), a line traced in the sagittal direction on the tablet appeared to the subject to deviate to the right at an identical angle. Subjects were able to correct for the introduced bias, and managed to trace lines that appeared to be sagittal. However, when asked after each trial either to report their movement verbally or to reproduce it, it became evident that they were unaware of the corrections they produced during the experimental trials [21].
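The logic of the directional bias can be illustrated with a short numerical sketch. The original algorithm is described only as "a simple algorithm for adding a linear directional bias", so the rotation used here, the coordinate convention (+y sagittal, +x to the right) and the function names `apply_angular_bias` and `deviation_from_sagittal` are assumptions made for illustration, not the authors' implementation.

```python
import math


def apply_angular_bias(dx, dy, bias_deg):
    """Rotate a stylus displacement (dx, dy) clockwise by bias_deg degrees.

    Assumed convention: +y is the sagittal direction (away from the body),
    +x is to the right, and a positive bias deviates the displayed line
    to the right.
    """
    theta = math.radians(bias_deg)
    shown_dx = dx * math.cos(theta) + dy * math.sin(theta)
    shown_dy = -dx * math.sin(theta) + dy * math.cos(theta)
    return shown_dx, shown_dy


def deviation_from_sagittal(dx, dy):
    """Signed angle (degrees) between a displacement and the sagittal axis.

    Positive values mean a rightward deviation.
    """
    return math.degrees(math.atan2(dx, dy))


# A purely sagittal hand movement is displayed as deviating by the bias angle.
shown = apply_angular_bias(0.0, 1.0, 15.0)
displayed_angle = deviation_from_sagittal(*shown)

# To make the displayed line look sagittal, the hand must move in the
# opposite direction by the same angle (here, 15 degrees to the left).
hand_dx = math.sin(math.radians(-15.0))
hand_dy = math.cos(math.radians(-15.0))
corrected = apply_angular_bias(hand_dx, hand_dy, 15.0)
corrected_angle = deviation_from_sagittal(*corrected)
```

Under these assumptions, a line that appears sagittal on the screen implies a hand movement that deviates from the sagittal axis by the size of the bias, which is the correction that subjects produced without being able to report it.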
A theoretical implication of this study is that there seems to be a two-level coding of action-related information [23]. The 1st level codes the sensory and motor signals that are used for the control and monitoring of movements. According to Georgieff and Jeannerod [23], these signals are not made available to consciousness, and therefore they are not the ones used for conscious judgments of actions. The 2nd level coding of action-related information represents the public aspects of action, such as the observable effects of the action (see also [24]), whereas the 1st level represents the private aspects, such as the efference copy, the motor command, and the sensory feedback. The 2nd level becomes especially important when we adopt a public view of action.

The public view of action-representations is based on the ideomotor theory put forward by James [7]. The basic hypothesis of the ideomotor approach is that actions are coded in terms of the perceptual events resulting from them. Therefore, in action generation, the actual movement is governed by a representation of the goal of the action, which could be agent-neutral. Similarly, in action perception, the generated representations attempt to detect the intended goal. Thus, both one's own and other people's actions are coded in a common way (see the common coding theory [25, 26]). Similarly, perceived events (i.e. perceptions) and to-be-produced events (i.e. actions) are commonly represented by an integrated network of cognitive structures called event-codes (for a review see [25, 26]). With regard to the issue of agency, according to the common coding theory, there are neither quantitative nor qualitative differences in the generation and processing of these common representations that would enable the a priori attribution of the source of the action (i.e. agency), thus allowing a clear-cut



distinction between self and other. Knoblich and Flach [27], in an experiment on action prediction, where participants had to predict the outcome of either self- or other-generated actions (e.g. throwing darts), found an authorship effect in correctly predicting the outcome of self-generated actions. In the light of this evidence, they acknowledge that one problem of the common coding theory is that "[...] first-person and third-person information cannot be distinguished on a common-coding level" [27, p. 468]. The authorship effect reported by Knoblich and Flach could be accounted for by the fact that the motor system that perceived the action during the prediction task was the same motor system that generated the action. Thus, the matching process between first-person perspective (i.e. producing the effect) and third-person perspective (i.e. observing the effect) was even more complete, leading to more accurate predictions. Nevertheless, according to the common coding theory, it remains unclear what the functional role of the first-person perspective in action generation and perception could be: "In any case, we see no indication of privileged access to 1st person knowledge, that is, to knowledge referring to the mental preparation of the upcoming action and arising before the fact. Rather, like any other event, both the physical action itself and its mental antecedents appear to be perceived after the fact. The mental representation seems to follow the physical event it represents." [25, p. 149]. According to this public view of action-generation and perception, agency of action is not intrinsically embedded in the generation of the action. Instead, agency of action is the result of an attribution process that takes place at the observational level of public aspects of action that happen after the action itself. The same could be true of the self-other distinction.
Jeannerod and colleagues [23, 28] have argued for the necessity of a specialized neural system that would discriminate between the self and the other, and thus provide the sense of agency. The function of this "who" system is to answer the question "who made the action?", in other words, who was the agent. The necessity of the "who" system is justified by the fact that several kinds of action representations are independent of the agent who is performing them. It has been shown that the representations of both self-generated and observed actions activate overlapping neural networks [29]. These common activations share representations of actions that are agent-neutral [30]. According to the shared representations model, the "who did it?" question can be answered in computational terms only by disentangling the non-overlapping areas that are active during self- and other-actions. Within this framework, even intentions seem to be agent-neutral: "It could be the case either that intentions [...] are impersonal representations or that, although their form is <agent, action, goal>, the agent parameter can be left unspecified" [31, p. 139]. Neither the intention of the acting subject, nor the translation of the intention into an efference copy and a motor command, suffices for the experience of agency. Thus, for the "who" system, the default mode of operation seems to be "no agent". This line of argument implies that the sense of agency arises as a post-action reconstructive meta-representation, and that this meta-representation would be necessary for efficient self-other distinction. The "who" system seems to be strongly committed to a representational model of agency and self-consciousness, and thus, the problem is no longer that of being the agent, but rather that of knowing who the agent is. In this sense, the model ignores all the processes that precede the execution of intentional actions, and



instead focuses on the perception of action as an objective manifestation of "naked intentions" [31]. On a strong view of the attributional perspective on agency, conscious agency could only be "the mind's best trick" [32]: an after-the-fact, perhaps illusory, ownership of the intention to move. If shared representations are the brain's basic model, then the "who" system is needed in order to reconstruct the representation of an agentic self.

In effect, the shared representations model and the "who" system raise an epistemological problem, because they leave no room for a phenomenally or epistemologically special self. However, on the experiential level, the sense of agency seems to presuppose a subjective point of view, a 1st person perspective, and in addition the sense of agency has to be distinguished from a judgment of agency [33]. By refuting the very possibility of an intrinsic link between intention, efference action, and perception of one's body, it is impossible to provide an ecological account of agency. The acting body is perceived not only from the outside (e.g. vision), but also from within (e.g. proprioception), and it is therefore experienced in an epistemologically immediate fashion. Moreover, efferent signals are present only when an action is self-generated, and thus, they could in principle code in an intrinsic way the origin of the action. It may be possible that the sense of agency is a phenomenological correlate of neural or functional signatures that are unique to voluntary actions, and that such signatures may actually construct rather than reconstruct the conscious sense of agency. On this hypothesis, agency is not embedded in the public aspects of action, but may arise as an intrinsic property of action-execution or even action-generation processes (for a review see [19, 34]).
Converging evidence suggests that the sense of agency depends upon the processing of efferent signals that precede the action itself, and that such signals intrinsically modulate the time-awareness of action, the sensory processing of re-afferent events, and action-attribution [19, 35].

10.6 A working example: self-recognition

A working example that may be used to elucidate this tension between private and public signals, and between shared and self-specific representations, and to provide some critical insights for the self-other distinction, is the self-recognition of bodily movement. Recent research on self-recognition distinguishes between two related computational problems: the problem of action-recognition and the problem of self-recognition. In action-recognition, the brain must distinguish between afferent information generated by our own movements, and afferent information that is externally imposed. Action-recognition may involve unconscious operation of internal predictive models of the motor system [34], while self-recognition appears to be a specific cognitive process typically involving conscious experience [36]. Self-recognition, in the current context, involves deciding whether a visual stimulus shows one's own body or not. Thus, self-recognition is also possible in the absence of any movement or action, for example by purely morphological features. However, we often use voluntary movements as a means of self-recognition. This fact by itself suggests a hierarchical relation between action-recognition and self-recognition: voluntary action can aid self-recognition only if one can be sure that the viewed resulting body movements were caused by one's own voluntary action.

In most studies of self-recognition, participants see a body-part, which may or may not be related to their own body. The task is to judge whether what they see is their own body or not. The information available to support this judgment is systematically varied across conditions, for example by moving the hand [36, 37], by introducing delays between the movement and the visual feedback [38], or by rotating the hand image [39]. Self-recognition requires the monitoring and integration of various sources of information such as intention, motor command and somatic perception in a short time-window. Only a few studies have explicitly investigated the link between voluntary movement and action-recognition [40, 41], while the specific contribution of efferent signals for self-recognition has been under-investigated (see Table 1).

Summary of Action-Recognition Studies
Fourneret & Jeannerod, 1998
Participants: Normal subjects
Experimental manipulation: Angular bias
Visual feedback: Display of the line drawn by the subjects
Manipulation of efference: No
Results: Subjects automatically compensate for the introduced bias, but they are unaware of these corrections when bias < 15°.

Farrer et al., 2003b
Participants: Normal subjects & deafferented patient GL
Experimental manipulation: Angular bias
Visual feedback: Computer-reconstructed image of a hand
Manipulation of efference: Yes
Results: Normal subjects: differences between active and passive movement were significant only for bias > 30°. GL was significantly more impaired.

MacDonald & Paus, 2003
Participants: Normal subjects
Experimental manipulation: Temporal delays
Visual feedback: CyberGlove
Manipulation of efference: Yes
Results: rTMS over left superior parietal lobule impaired the detection of asynchrony for active but not for passive movement.

Summary of Self-Recognition Studies

Daprati et al., 1997
Participants: Schizophrenics & controls
Experimental manipulation: Visual feedback: 1. Own hand; 2. Other's hand/same movement; 3. Other's hand/different movement
Visual feedback: Video display of 1 hand
Manipulation of efference: No
Results: Schizophrenics were significantly impaired in condition 2.

Sirigu et al., 1999
Participants: Parietal patients & controls
Experimental manipulation: Visual feedback: 1. Own hand; 2. Other's hand/same movement; 3. Other's hand/different movement
Visual feedback: Video display of 1 hand
Manipulation of efference: No
Results: Parietal patients were significantly impaired in condition 2.

Van den Bos & Jeannerod, 2003
Participants: Normal subjects
Experimental manipulation: Visual feedback: 1. Rotation of hand location on screen (0°, 90°, -90°, 180°); 2. Movement (same, different, no movement)
Visual feedback: Video display of 2 hands (performing same / different / no movement)
Manipulation of efference: No
Results: For same movements, self-recognition performance was influenced by the rotation of the hand image.

Table 1. A summary of recent experiments on action- and self-recognition



The summary of studies presented in Table 1 shows that only two action-recognition studies have dissociated efferent from afferent information, while none of the self-recognition studies presented above have examined the distinctive roles of efferent and afferent information. Daprati, Sirigu and colleagues [36, 37] investigated the self-recognition of simple and complex gestures in schizophrenic and in parietal patients respectively, using identical experimental designs. Participants were instructed to perform simple or complex self-generated movements (extension of one or two fingers), without a direct visual image of their hand. Participants could see in a mirror in front of them (a) their own hand, or (b) the experimenter's hand performing the same movement as the participant's hand, or (c) the experimenter's hand performing a different movement from the participant's hand. Participants were asked to judge whether they saw their hand or not. Consistent results from both experiments revealed that both patients and controls performed almost perfectly when they saw their own hand, and when they saw the experimenter's hand performing a different movement. This suggests that the detection of a mismatch between visual and proprioceptive/efferent information is a relatively easy task, even for patients who display impaired awareness of action [34]. However, both schizophrenics and parietal patients were significantly worse, compared to controls, when they saw the experimenter's hand performing the same movement as them. In this critical condition, they said that they saw their own hand, whereas in fact they saw the experimenter's hand. In other words, participants tended to misattribute the experimenter's hand to themselves. In all these studies [36, 37, see also 39], the performed movements were self-generated, that is, participants had both efferent and afferent signals available for comparison against the visual feedback.
Efferent information was not dissociated from proprioceptive information, and therefore the relative contributions of these two kinds of information for explicit self-recognition were not clarified. Results showed a significant impairment in the self-recognition performance of schizophrenic and parietal patients when these groups saw someone else's hand performing the same movement as they did. In fact, patients misattributed the viewed hand to themselves. What can account for the enhanced performance of normal participants? In other words, which factor enabled normal participants to distinguish between self and other more efficiently? These studies cannot conclusively answer whether normal subjects integrated afferent information alone (visual and proprioceptive feedback) in a more efficient way, that is, an integration of both public and private signals, or whether they used fine-grained efferent information for their self-recognition judgments. According to Jeannerod [30], one main conclusion of these studies is that action cues are used when distinctive movements are made (e.g. in the different movement condition), and that afferent signals (i.e. vision and proprioception) are used when action cues are ambiguous (e.g. in the same movement condition). In these studies, the movements performed by the subjects were always self-generated, and therefore across conditions, both efferent and afferent information were present. To that extent, these studies did not quantify the specific contribution of efferent information for self-recognition, over and above multisensory integration. Moreover, the paradigm of the rubber hand illusion [42, 43] suggests that if only afferent information were present or used for self-recognition, then the viewed hand would always be attributed to the self, provided that vision and



proprioception were synchronised. In such cases, a dominance of vision, which is based on the perception of public states, would be the main cue for self-recognition. Thus, it may be hypothesized that for highly reliable self-other discrimination, visuo-proprioceptive congruence may not be sufficient. The specific contribution of efference to self-recognition can only be addressed by implementing a situation where visuo-proprioceptive information is kept congruent and maintained constant, while efference is systematically manipulated.

This manipulation was implemented in a recent self-recognition experiment. Tsakiris et al. [44] investigated the specific role of efferent information for self-recognition. Subjects experienced a passive extension of the right index finger via a lever, either as an effect of moving their left hand (self-generated action), or imposed externally by the experimenter (externally generated action). The visual feedback was manipulated so that subjects saw either their own right hand ("view own hand" condition) or someone else's right hand ("view other's hand" condition) undergoing the same passive displacement of the right index finger. Thus, across all trials, subjects experienced a passive displacement of their right index finger. In one block, this passive displacement was self-generated, and in another block, the same passive displacement was externally generated. In half of the trials, subjects saw their own right hand, and in the other half, subjects saw someone else's hand. Participants judged whether the right hand they saw was theirs or not. In that experiment, unlike other self-recognition studies [36, 37, 39], efferent information was selectively manipulated because the right hand's displacement could be effected either by the participant or by the experimenter.
In the former case, participants had two kinds of information about the passive displacement of the right hand: efferent information from the left hand that caused the displacement of the right hand, and also afferent information from the right hand itself. Overall, performance was significantly better when the passive displacement of the right index finger was self-generated across both viewing conditions (i.e., viewing self and other). Self-recognition was significantly more accurate when subjects themselves were the authors of the action, even though visual and proprioceptive information always specified the same posture, and despite the fact that subjects judged the somatic effect of an action and not the action per se. In fact, even when subjects saw their own hand, they were significantly better at correctly recognizing it as their own when they produced the passive displacement themselves than when the passive displacement was externally generated. This significant difference suggests that efference can also improve the comparison and integration of private (e.g. proprioception) and public (e.g. vision) signals, because these were the same in both the self-generated and externally-generated conditions while participants were looking at their own hand. In the critical condition where participants saw someone else's right hand and the displacement of their right hand was externally generated, they incorrectly attributed the viewed hand to themselves in 55% of the trials. When the passive displacement was self-generated and they saw someone else's hand, incorrect attribution to self occurred in only 38% of the trials. The difference between these two conditions shows the specific role of efferent information in the accuracy of self-recognition. Therefore, efferent information clearly contributes to the ability to match proprioceptive and visual representations of a remote bodily effect. The observed efferent advantage could occur for two reasons.
First, efferent information might provide an advantage in monitoring the timing of sensory events. In the case of a self-generated action,



forward models of the motor system use the efferent information so as to generate a prediction about the anticipated sensory feedback [14]. Second, efference might modulate the on-line comparison between vision and proprioception by providing detailed temporal and kinematic information, and integrating these signals in posterior parietal areas [37, 41]. The results suggest that afferent-driven body-awareness alone may not be sufficient for reliable explicit self-recognition. Similarly, even when there is a perfect match between proprioception and vision, efference does provide a significant advantage for their integration. Self-recognition, in the sense of correctly recognizing a visual object or event as "me" or "mine", seems to depend largely on efference and agency. This is consistent with recent experiments on action recognition and prediction, where an agentive effect was observed in recognizing and predicting actions that were performed by the participants themselves, when compared to actions performed by other agents (for a review, see [45]). This finding also suggests that efferent information is important for self-recognition and the self-other distinction, and not only for motor control. The distinctive role of efference in self-recognition suggests that central efferent signals have high predictive power, allowing the correct detection of appropriate afferent signals that pertain to one's self, and can therefore be used to distinguish between the self and others. It has been suggested [46] that a basic computational mechanism that implements this function may also underpin higher cognitive abilities such as perspective taking and mental state attribution, and that right temporo-parietal areas may underpin this basic computation.

10.7 Towards an implicit self-recognition measure?

A methodological confound present in almost all the self-recognition studies is the use of explicit measures of self-processing.
Participants in self-recognition studies are asked to explicitly recognize the identity of a moving hand they see on a screen in front of them, which could be theirs or not. The experience that participants have during these tasks does not do justice to the actual experience (or even representation) that one has of one's own body, because we rarely represent explicitly and reflectively our sense of embodied selfhood [47]. A recent study [48] showed that the primary motor cortex forms an agent-specific, not neutral, representation of observed actions. Observing another agent acting facilitates the observer's motor system [49], whereas observing one's own actions tends to suppress the excitation of the motor system [48]. This novel finding implies that the motor system may be sensitive to representations of other agents as qualitatively different from the self, and as such, it may underpin a distinction between self and other, thus providing an important addition to the self-other equality of the mirror system. Further studies should investigate whether this low-level sensorimotor representation might underpin a form of pre-reflective self-consciousness, and whether and how it may be used to build up a conscious sense of agency and a sense of self, as distinct from other agents.



10.8 Conclusions

We constantly feel, see and move our body, and have no doubt that it is our own. Correct demarcation of the physical body's boundaries seems to be essential for goal-directed action, for our sense of who we are and for our successful interaction with other agents. It has been proposed that the experience of one's body and related sensory events is characterized by a sense of body-ownership, and that actions generated by one's own body are characterized by a sense of agency. Converging evidence suggests that the sense of agency is efferent-driven, whereas the contents of body-ownership are predominantly afferent in their origins [19, 43]. This effect of efference is not surprising, since our main way of being-in-the-world is to voluntarily act on it, rather than passively perceiving it. In this sense, bodily self-awareness is not simply another form of object consciousness. Models of self-awareness that over-emphasize the shared self-other representations ignore the mere fact that my body is not so much an object of perception, but rather is given to me as a subject, and that agency actually structures the experience of one's body. The sense of body-ownership and the sense of agency may underpin a minimal model of the self as distinct from other agents. This model would process efferent and afferent signals to inform and update representations of the body and structure its experience. Perhaps this self-model would be a prerequisite for higher cognitive abilities, such as perspective taking and action understanding.

10.9 Acknowledgments

Bial Foundation Research Grant 165/06 to MT.

10.10 References
[1] D. M. Amodio & C. D. Frith, Meeting of minds: the medial frontal cortex and social cognition. Nature Reviews Neuroscience, 7(4), 268-77, 2006.
[2] R. Saxe & N. Kanwisher, People thinking about thinking people. The role of the temporo-parietal junction in "theory of mind". Neuroimage, 19, 1835-42, 2003.
[3] K. Vogeley, M. May, A. Ritzl, P. Falkai, K. Zilles & G. R. Fink, Neural correlates of first-person perspective as one constituent of human self-consciousness. Journal of Cognitive Neuroscience, 16(5), 817-27, 2004.
[4] G. Rizzolatti & L. Craighero, The mirror-neuron system. Annual Review of Neuroscience, 27, 169-192, 2004.
[5] S. Gallagher, Philosophical concepts of the self: implications for cognitive sciences. Trends in Cognitive Sciences, 4, 14-21, 2000.
[6] A. J. Marcel, The sense of agency: awareness and ownership of actions and intentions. In: J. Roessler and N. Eilan (Eds.), Agency and Self-Awareness, Oxford University Press, 2003.
[7] W. James, The Principles of Psychology. Cambridge, London: Harvard University Press, 1890, 1981.
[8] M. Merleau-Ponty, The Phenomenology of Perception (C. Smith, trans.). London, NY: Routledge, 1962.
[9] E. Husserl (R. Rojcewicz & A. Schuwer, trans.), Ideas Pertaining to a Pure Phenomenology and to a Phenomenological Philosophy. Second Book: Studies in the Phenomenology of Constitution. Dordrecht: Kluwer Academic Publishers, 1952/1989.
[10] J. L. Bermúdez, A. Marcel & N. Eilan (Eds.), The Body and the Self. Cambridge, MA: MIT Press, 1995.
[11] H. Helmholtz, Science and culture: Popular and philosophical essays. Chicago: University of Chicago Press, 1995.



[12] R. W. Sperry, Neural basis of the spontaneous optokinetic response produced by visual inversion. Journal of Comparative and Physiological Psychology, 43, 482-489, 1950.
[13] E. von Holst & H. Mittelstaedt, Das Reafferenzprinzip. Wechselwirkungen zwischen Zentralnervensystem und Peripherie. Naturwissenschaften, 37, 464-476, 1950.
[14] D. M. Wolpert & J. R. Flanagan, Motor prediction. Current Biology, 11, R729-R732, 2001.
[15] J. J. Gibson, The Ecological Approach to Visual Perception. Boston: Houghton Mifflin, 1979.
[16] J. L. Bermúdez, The Paradox of Self-Consciousness. Cambridge, MA: MIT Press, 1998.
[17] T. T. Kircher & A. David (Eds.), The Self in Neuroscience and Psychiatry. Cambridge University Press, 2003.
[18] D. Zahavi (Ed.), Exploring the Self: Philosophical and Psychopathological Perspectives on Self-experience. Advances in Consciousness Research 23. Amsterdam-Philadelphia: John Benjamins Publishing Company, 2000.
[19] M. Tsakiris & P. Haggard, Experimenting with the acting self. Cognitive Neuropsychology, 22, 387-407, 2005.
[20] M. Jeannerod, Motor Cognition. Oxford: Oxford University Press, 2007.
[21] P. Fourneret & M. Jeannerod, Limited conscious monitoring of motor performance in normal subjects. Neuropsychologia, 36, 1133-1140, 1998.
[22] T. I. Nielsen, Volition: a new experimental approach. Scandinavian Journal of Psychology, 4, 225-230, 1963.
[23] N. Georgieff & M. Jeannerod, Beyond Consciousness of External Reality: A "Who" System for Consciousness of Action and Self-Consciousness. Consciousness and Cognition, 7, 465-477, 1998.
[24] C. D. Frith, Consciousness is for other people. Behavioral and Brain Sciences, 18, 682-683, 1995.
[25] W. Prinz, Perception and action planning. European Journal of Cognitive Psychology, 9, 129-154, 1997.
[26] B. Hommel, J. Müsseler, G. Aschersleben & W. Prinz, The Theory of Event Coding (TEC): a framework for perception and action planning. Behavioral and Brain Sciences, 24(5), 849-78; discussion 878-937, 2001.
[27] G. Knoblich & R. Flach, Predicting the effects of actions: interactions of perception and action. Psychological Science, 12, 467-472, 2001. [28] Vignemont de F & P. Fourneret, The sense of agency: a philosophical and empirical review of the "Who" system. Consciousness and Cognition, 13,1-19, 2004. [29] J. Grezes & J. Decety, Functional anatomy of execution, mental simulation, observation, and verb generation of actions: a meta-analysis. Human Brain Mapping, 12, 1-19, 2001. [30] M. Jeannerod, The mechanism of self-recognition in humans. Behavioural Brain Research, 142, 1-15, 2003. [31] M. Jeannerod & E. Pacherie, Agency, Simulation and Self-identification. Mind & Language, 19,113-146, 2004. [32] D. M. Wegner, The mind's best trick: how we experience conscious will. Trends in Cognitive Sciences, 7, 65-69, 2003. [33] M. Synofzik, G. Vosgerau & A. Newen, Beyond the comparator model: A multifactorial two-step account of agency. Consciousness & Cognition [Epub ahead of print], 2007. [34] S. J. Blakemore, D. M. Wolpert & C. D. Frith, Abnormalities in the awareness of action. Trends in Cognitive Sciences, 6, 237-242, 2002. [35] P. Haggard, Conscious intention and motor cognition. Trends in Cognitive Sciences, 9(6), 290-5, 2005. [36] E. Daprati, N. Franck, N. Georgieff, J. Proust, E. Pacherie, J. Dalery & M. Jeannerod, Looking for the agent: an investigation into consciousness of action and self-consciousness in schizophrenic patients. Cognition, 65, 71-86, 1997. [37] A. Sirigu, E. Daprati, P. Pradat-Diehl, N. Franck & M. Jeannerod, Perception of self-generated movement following left parietal lesion. Brain, 122 ( Pt 10), 1867-1874. 1999. [38] N. Franck, C. Farrer, N. Georgieff, M. Marie-Cardin, J. Dalery, T. d'Amato & M. Jeannerod, Defective recognition of one's own actions in patients with schizophrenia. American Journal of Psychiatry, 158, 454-459, 2001. [39] Van den Bos & M. Jeannerod, Sense of body and sense of action both contribute to selfrecognition. 
Cognition, 85, 177-187, 2002. [40] C. Farrer, N. Franck, J. Paillard & M. Jeannerod, The role of proprioception in action recognition. Consciousness and Cognition, 12, 609-619, 2003. [41] P. A. MacDonald & T. Paus, The role of parietal cortex in awareness of self-generated movements: A transcranial magnetic stimulation study. Cerebral Cortex, 13, 962-967, 2003. [42] M. Botvinick & J. Cohen, Rubber hands 'feel' touch that eyes see. Nature, 391, 756, 1998.



[43] M. Tsakiris & P. Haggard, The Rubber Hand Illusion Revisited: Visuotactile Integration and Self-Attribution. Journal of Experimental Psychology: Human Perception and Performance, 31, 80-91, 2005.
[44] M. Tsakiris, P. Haggard, N. Franck, N. Mainy & A. Sirigu, A specific role for efferent information in self-recognition. Cognition, 96, 215-231, 2005.
[45] G. Knoblich & R. Flach, Action identity: evidence from self-recognition, prediction, and coordination. Consciousness and Cognition, 12(4), 620-32, 2003.
[46] J. Decety & C. Lamm, The role of the right temporoparietal junction in social interaction: How low-level computational processes contribute to meta-cognition. The Neuroscientist (in press).
[47] D. Legrand, Naturalizing the acting self: subjective vs. anonymous agency. Philosophical Psychology, 20(4), 457-478, 2007.
[48] S. Schütz-Bosbach, B. Mancini, S. M. Aglioti & P. Haggard, Self and other in the human motor system. Current Biology, 16, 1830-4, 2006.
[49] L. Fadiga, L. Fogassi, G. Pavesi & G. Rizzolatti, Motor facilitation during action observation: a magnetic stimulation study. Journal of Neurophysiology, 73, 2608-11, 1997.


Enacting Intersubjectivity
F. Morganti et al. (Eds.)
IOS Press, 2008
© 2008 The authors. All rights reserved.



Mirror Games
Wolfgang PRINZ
Abstract. It is sometimes claimed that individuals come to shape their own minds through looking into the mirror of others (Social Mirroring). Social mirroring has two sides to it: mirroring (individual 1 mirrors individual 2) and understanding being mirrored (individual 2 understands that s/he is being mirrored by individual 1). Social mirroring comes in various guises, arising from different modes of mirroring and different modes of communication. In this chapter I argue that two basic requirements must be fulfilled for social mirroring to work, a functional and a social one. The functional requirement refers to the operation of representational devices with mirror-like properties ("mirrors inside"). The social requirement refers to discourses and practices for using and exploiting mirrors inside in social interaction ("mirror games" and "mirror policies").

Contents
11.1 Others as mirrors for self
11.2 Varieties of mirroring
11.3 Mirrors inside
11.4 Mirror games
11.5 Mirror policies
11.6 References

11.1 Others as mirrors for self

In his Theory of Moral Sentiments, which first appeared in 1759, the Scottish philosopher Adam Smith raised, among many other things, the issue of how people come to understand and appraise their own conduct. Let me start with a short quotation summarizing his view on how an individual perceives his/her own actions: "... these are objects which he cannot easily see, which naturally he does not look at, and with regard to which he is not provided with [a] mirror. [That mirror] is placed in the countenance and behaviour of those he lives with [...]; and it is here that he first views the propriety and impropriety of his own passions, the beauty and deformity of his own mind." (Smith, 1759/1976, p. 110).

The notion here is that individuals come to perceive and understand themselves through mirroring themselves in others, that is, by understanding how their conduct is perceived, received, and understood by others. What this suggests is that social mirrors can take for individuals a similar role as physical mirrors do: Both help them to perceive themselves in the same way as others perceive them [1-3].

The notion of social mirroring is widespread in the social sciences. In this chapter I take a look at social mirrors from a cognitive science perspective. In the cognitive science domain this notion has recently been discovered, or rediscovered, in various branches, such as Neurophysiology [4-7], NeuroImaging [8-12], Cognitive Psychology [3, 13, 14], Developmental Psychology [15-18] and Social Psychology [19-21]. Notably, the discovery of mirror neurons and mirror systems in the monkey and the human brain has given rise to a growing literature on the possible role of mirror-like devices for self-recognition and social interaction [22-25]. In most of these research traditions and their associated literatures the concept of mirror is used as a metaphor that stands for close functional relationships between action perception and production.

In this chapter I raise the issue of whether there is anything serious behind the metaphorical use of the notion of mirroring in the context of self-recognition and self-reflection. I will argue that social mirroring can indeed play an important role for the formation of the self, provided that mirrors outside are met by mirrors inside. By mirrors outside I refer to social mirrors that individuals encounter in their environments. By mirrors inside I refer to mirror-like representational devices operating inside their minds. These two kinds of mirrors, I suggest, interact with each other in ways that give rise to the formation of the mental self.

11.2 Varieties of mirroring

Social mirroring has two sides to it: that of the target individual T whose acting is being mirrored, and that of the mirror individual M who is mirroring T's acting.
For the target individual, T, the mirror individual, M, provides a living mirror that exists in her environment in the same way as physical mirrors do. In the following I discuss in what ways M can mirror T and how T can find her own action mirrored through M's action. To answer these questions, it may be useful to draw two distinctions, one between two basic modes of mirroring (reciprocal vs. complementary), and another one between two modes of communication (embodied vs. symbolic).

11.2.1 Modes of mirroring

In the most fundamental form of social mirroring, T sees her own action imitated, or replicated, by M (reciprocal mirroring). In a setting like this, the other (M) acts as a mirror for self (T) in a more or less literal sense. Social mirrors are of course fundamentally different from physical mirrors. Even if M attempts to provide as-perfect-as-possible copies of T's acting, those copies will always be delayed in time, and their kinematics will never be as perfectly correlated with T's acting as specular images are. Obviously, the mirror-like appearance of M's action will become even poorer when M does not even try to provide a perfect copy of T's action (or, perhaps, even provides a systematically distorted one).

Reciprocal mirroring can only work if these distortions are limited. We can only speak of reciprocal mirroring as long as T is in a position to recognize and understand M's acting as a delayed copy of her own preceding acting. As long as this condition is fulfilled, we may leave it open what the grain size of appropriate action units and the magnitude of acceptable delays may be. Hence, the constitutive feature of reciprocal mirroring is T's understanding of M's action as a copy of T's own preceding action.

A slightly different form of social mirroring arises when T sees her own action continued and carried on by M rather than replicated (complementary mirroring). In a setting like this, the other (M) does not act as a mirror in the strict sense of reflecting self's own preceding action but only in the loose sense of continuing that action in a meaningful way. This is, of course, entirely different from what physical mirrors do. Still, what complementary mirroring has in common with reciprocal mirroring is (1) that M's action is strongly contingent upon T's preceding action, and (2) that this contingency needs to be perceived and understood by T. In this case, too, the reach of mirroring goes as far as T is in a position to assess M's doing as a meaningful continuation of her own doing.

11.2.2 Modes of communication

The examples considered so far draw on what we may call mirroring through embodied communication. It starts with T acting in a particular way; then M, upon perceiving T's acting, starts replicating or continuing that action, and eventually that replication/continuation is perceived and somehow understood by T. Communication is here embodied in the sense that it relies on T's and M's competence for both production of own action and perception of foreign action. Such embodied mirroring does not require a language system in which the two communicate. It does not even require explicit intentions to communicate something to someone else on either side. The sole requirement is that competent perceivers/actors meet and interact.
However, this does not mean that embodied mirroring relies on primitive representational resources. Though it does not require language, it does, in fact, require a smart machinery for action production and action perception.

Routines for embodied mirroring play an important role in interactions between young infants and their caretakers. Babies and their mothers will often find themselves involved in what has been called protoconversational interactions, i.e. interactions involving mutual imitation and continuation of actions and emotional expressions and taking turns in this funny game from time to time. Such interactions have been extensively studied, particularly with regard to the development of imitation and its underlying mechanisms. Most of these studies focus on the baby's production, but not on her perception of imitative action [18, 26-30]. In other words, this work views the baby in the role of individual M (who mirrors mother's actions) but not in the role of individual T (who perceives herself being mirrored by mother). The latter is, however, precisely the perspective that one needs to adopt in order to understand how social mirroring can contribute to the formation of the self. Unfortunately, literature on this perspective is scarce. Sensitivity to being imitated has only occasionally been studied in babies [18, 31-33]. Quite surprisingly, a recent study has demonstrated such sensitivity in macaques as well [34].

More familiar to adults is action mirroring through symbolic communication. T acts in a particular way, and M, upon perceiving T's acting, starts talking about T's acting, and that verbal account is finally perceived and understood by T as referring to her own preceding acting. In a setting like this, M's verbal account of T's acting can vary not only along the dimension of replication/continuation but also along the dimension of description/explanation/evaluation. In any case such symbolic mirroring is dependent on the two individuals' competences for the production and perception of spoken language. M communicates to T a verbal message concerning T's action, and that message is then decoded and understood by T.

Competences for production and perception of spoken language may thus be necessary conditions for symbolic mirroring to work, but they are certainly not sufficient. On top of speaking and listening to each other, the two individuals need to share a conceptual framework for the description and explanation of action. They need to draw on a shared action ontology that entails a common understanding of what actions are, how they can be parsed and individuated, and how physical action can be explained through foregoing mental action. This is precisely what folk psychology delivers: a common-sense framework for the description and explanation of action to which we resort when we reflect and communicate about what people are doing and why they do what they do [35-39].

11.3 Mirrors inside

What kinds of representational resources does our target individual T need to have in order to be in a position to capitalize on M's mirroring for building up a representation of self? Evidently the mere fact of being mirrored from the outside will not do the job by itself. Pet owners, for instance, will often entertain mirror conversations with their cats and dogs all day long without any obvious consequences for the animals' mental architectures. Human babies seem to be different in that respect. They do exploit social mirrors for shaping and, in fact, for making their minds. What, then, do humans have that cats and dogs do not have?
Humans have mirrors inside. Mirrors inside are representational devices that help them to exploit what mirrors outside afford. Basically, these devices serve to couple perception and action. But they do so in a special way, allowing for the operation of similarity between what comes in and what goes out.

11.3.1 Design principles

How do these mirror devices work and how do they interact with mirrors outside? Here is the functional problem to be solved. Consider individual T, watching what M is doing. Suppose that M will occasionally mirror T, but that, for most of the time, M will be doing something else. This raises the problem of how T can tell mirroring from non-mirroring in M's actions. As long as this problem is unsolved, T will not be in a position to capitalize on what the social mirror facing her affords. Mirror devices solve this problem by virtue of two basic design principles, common coding and distal reference.

The notion of common coding posits a shared representational domain for perception and action. Common coding implies that the same representational resources are used for both planning and control of own action and perception of foreign action. In other words, tokens of own action will get their entries in that space on exactly the same dimensions as tokens of foreign action [14, 40-43]. Common coding makes it possible both to perceive and produce similarity between own action and foreign action. This has important implications for either of our two model individuals, M, the producer, and T, the perceiver of similarity. As concerns production, M's mirroring of T's acting will rely on production of own action that resembles perceived foreign action. Conversely, as concerns perception, T's understanding of the mirror nature of M's action will rely on the perception of foreign action that resembles previous self-produced action. Common coding is thus a prerequisite for the mirror game between the two to work.

How can representations of own and foreign action be commensurate? The key feature here is distal reference. Distal reference is fairly obvious on the perceptual side [44-46]. What we see and what we hear are neither patterns of sensory stimulation nor patterns of brain activation. Instead, we perceive objects and events in the environment: distal events rather than proximal stimuli or even central activations. No less obvious is distal reference on the action side. For instance, when we plan to hammer a nail into the wall, that planning does not refer to muscle contractions or to activations in the motor cortex. Instead, it refers to the planned action and its intended outcome in the environment [47, 48].

Distal reference has two important implications: efficiency and publicity. In virtue of distal reference, perceptual representations are efficient in the sense of representing environmental events in a way that satisfies the needs for successful interaction with them. Likewise, goal representations are efficient in the sense of effectuating the actions required to reach the pertinent goals [49]. The other implication is publicity.
In virtue of distal reference, mental representations for perception and action control are public in the sense of representing events in a way that satisfies the needs for successful communication about them. They always refer to public events in the environment.

Together, these two design principles make up mirrors inside. These mirrors work both ways: to produce own action resembling perceived foreign action and to perceive foreign action resembling own action. Their operation is based on priming through similarity: perceived foreign action will prime corresponding own action, and likewise own action will prime the perception of corresponding foreign action.

11.3.2 Embodied and symbolic devices

So much for design principles. How are mirrors instantiated inside the human mind? This question brings us back to the two basic modes of communication: embodied and symbolic. Embodied devices operate on implicit procedural knowledge for the perception and control of bodies and actions. This knowledge is likely to be contained in representational structures that build on innate resources. Conversely, symbolic devices operate on explicit declarative knowledge about bodies and actions. That knowledge is contained in representational structures that build on acquired, language-based resources.

Without going into much detail, let me briefly mention what I mean by these devices. On the embodied side we may discern mirror devices like body schemes, action schemes, and, perhaps, emotion schemes. As I have argued elsewhere [3], the representational capacities of these devices are, from the outset, shared between perception and production and, hence, between others and self. One may even claim that they are first developed for others and then projected back to self, a view that poses a challenge to the widely accepted notion that knowledge of self is the natural fundament for knowledge of others.

On the symbolic side individuals have a rich conceptual framework for action identification, comprehension, and evaluation at their disposal, a framework that forms the core of their folk-psychology beliefs about the mental dynamics of human action. That framework gets acquired and continuously shaped in language-based interaction and communication. From the outset, it equally applies to both others and self. And again, some would argue that this framework, too, gets first developed for understanding what others do, and only later becomes applied to planning one's own doings.

The notion of embodied and symbolic mirrors opens a fascinating research agenda on how these devices emerge, how they work, and how they get shaped through embodied and symbolic forms of learning and communication. Yet, here I will not address this agenda. For the rest of the chapter I take it for granted that embodied and symbolic mirror devices are in place and examine how they are used in mirror games.

11.4 Mirror games

Mirror devices give a promise that cannot always be fulfilled. For instance, for individuals like Robinson Crusoe who live in isolation, devices like body or action schemes cannot fulfill their mirror function. To fulfill the promise, two basic conditions must be met. One is that other individuals need to be around. This is what Friday's advent affords: mirrors inside need to be complemented by mirrors outside. The other is that the two individuals need to interact in particular ways. This is what their reciprocal acting and talking affords: they need to engage in mirror games. Mirror games are, in other words, social practices designed to confront mirrors inside with mirrors outside. We may discern two basic kinds of such games, symbolic and embodied.
While symbolic games rely on reciprocal talking about action, embodied games rely on reciprocal acting. Here I use the terms attribution discourses and mirror practices for symbolic and embodied games, respectively.

Attribution discourses: Attribution discourses provide culturally standardized schemes of interpretation of, and communication about, human conduct. These discourses attribute to individuals a mental configuration centred around a self. Such discourses permeate our daily life at several levels, most prominently, for instance, when we use psychological common sense to explain people's actions. Folk psychology is based upon the idea of a subject having an explicit, lifelong identical self at its core. Discourses about morals and rights are no less relevant when they identify the self as an autonomous source of decisions to act.

Such discourses are often embedded in narrative discourses of various kinds. Fictional stories in books and movies are packed with talk about willing and behaving. We tell stories to our children in order to explain to them just what it means to be a person. We thereby provide them with two tools. One is the explicit semantics of the culture in which they live: its customs and practices, values and standards, myths and legends. The other is the implicit syntax of its folk psychology, which specifies how human agents function, what they think and do, how their thinking is related to their doing, and how they are rewarded or punished for their doings, be it in heaven or on earth.

Now, when agents in social groups organize their mutual interaction and communication in a way that each one expects all the other co-agents to also have a self, every one of the agents, new arrivals too, is confronted with a situation that already provides a role for her in the shape of a self. Awareness of foreign ascriptions to oneself induces self-ascriptions, and the agent becomes accustomed to the role of a self ascribed to her by others. A person thinks of herself as others think of her.

Mirror practices: While attribution discourses rely on exchange of declarative knowledge about action, mirror practices rely on interactions based on procedural knowledge for action perception and production. In early infancy embodied mirroring is the only game in town. For caretakers the practice of reciprocating or continuing the baby's doings is common and widespread, perhaps even a human universal. For babies these games seem to be of crucial importance for tuning in with, and becoming attached to, others, as well as laying the ground for perceiving and understanding themselves like others.

In no way are embodied mirror games limited to interactions between caretakers and infants, however. They also apply to interactions among grown-ups. For instance, an individual may cross his arms behind his head while facing another individual doing the same (reciprocation). Likewise an individual may take up another individual's work (say washing a car) when the other is temporarily withdrawn (continuation). In the same way individuals may accompany other individuals' acting through pertinent facial and bodily gestures, thereby commenting on that acting in a non-verbal format. As a rule such action-based mirroring is not really cultivated as a social practice.
Individuals will often have no explicit intention to communicate anything to others and they may not even be aware of what they are doing. Their mirroring reflects automatized habits [20, 50], and sometimes these habits are even considered to be inappropriate conduct that ought to be suppressed. Still, from the viewpoint of the others, these implicit habits have exactly the same consequences as explicit practices: They let people perceive and receive their own doing through the mirror of somebody else.

There are two perspectives here. One is related to the experience that others are/act like self. This aspect of the game has been shown to be a crucial factor for the formation of social bonding and coherence [51, 52]. The other relates to the converse experience that self is/acts like others. This aspect of the game, which has to date received less attention, may in fact prove to be a crucial factor for the formation of the self in the first place [53, 54]. By engaging in mirror games, people make capital out of their capacity to understand mentality and agency in others for construing mentality and agency in themselves. In a way, then, mirror games exploit others for building selves.

11.5 Mirror policies

We should not think of mirror games as pieces of interaction that get automatically started when people meet each other, but rather as being embedded in what one could call mirror policies. By this term I refer to traits, states, and strategies that may govern individuals' readiness to engage and become engaged in mirror games. We may discern two basic dimensions on which mirror policies vary.

One concerns the conditions under which an individual is prone to imitate others and/or become imitated by others. As recent evidence suggests [55-57], even newborns may, at times, not only be prepared to imitate certain gestures, but even to provoke imitative responses by their caretakers. Mirroring and being mirrored are thus controlled for them by their proneness to become engaged in the game.

The other dimension of mirror policies concerns selectivity. Individuals may in fact be quite selective in playing mirror games. For instance, they may mirror some kinds of behaviours, but not others. They may engage in mirror games under certain circumstances, but not under others. And, most importantly, they may be selective with respect to the target individuals whom they grant their mirroring. They may be prone to mirror certain individuals, but refuse to mirror others. For instance, they may tend to mirror their kids, their folks, and perhaps their peers, but perhaps not, or to a much lesser degree, strangers, disabled individuals, or elderly people. We can, therefore, think of each mirror individual as entertaining an implicit list of target individuals with whom s/he is prone to engage in mirror games, and of each target individual as being included in some individuals' target lists, but excluded from some other individuals' lists.

This way mirror policies act to induce both social assimilation and dissimilation. Assimilation is based on the dialectics of mirroring and perceiving being mirrored. Likewise, dissimilation is based on the dialectics of refusing to mirror and perceiving being refused. In principle, mirror policies rely on both symbolic discourses and embodied practices.
In any case embodied practices will add to the various sorts of symbolic discourses through which social relations are established and maintained in the first place.

11.6 References
[1] A. Smith, The theory of moral sentiments, 1. Oxford, UK: Clarendon Press, 1759/1976.
[2] W. Prinz, F. Försterling & P. Hauf, Of minds and mirrors. An introduction to the social making of minds. Interaction Studies, 6, 1-19, 2005.
[3] W. Prinz, Mirrors for embodied communication. In G. Knoblich & I. Wachsmuth (Eds.), Embodied communication. Oxford, UK: Oxford University Press, in press.
[4] G. Rizzolatti, The mirror neuron system and imitation. In S. Hurley & N. Chater (Eds.), Perspectives on imitation: From neuroscience to social science. Mechanisms of imitation and imitation in animals, 1, (pp. 55-76). Cambridge, MA: MIT Press, 2005.
[5] G. Rizzolatti, L. Craighero & L. Fadiga, The mirror system in humans. In M. I. Stamenov & V. Gallese (Eds.), Mirror neurons and the evolution of brain and language, (pp. 37-59). Amsterdam: John Benjamins, 2002.
[6] G. Rizzolatti, L. Fadiga, L. Fogassi & V. Gallese, From mirror neurons to imitation: Facts and speculations. In A. N. Meltzoff & W. Prinz (Eds.), The imitative mind: Development, evolution, and brain bases, (pp. 247-266). Cambridge, UK: Cambridge University Press, 2002.
[7] M. Jeannerod, Neural simulation of action: A unifying mechanism for motor cognition. NeuroImage, 14, S103-S109, 2001.
[8] M. Iacoboni, Understanding others: Imitation, language, and empathy. In S. Hurley & N. Chater (Eds.), Perspectives on imitation: From neuroscience to social science. Mechanisms of imitation and imitation in animals, 1, (pp. 77-99). Cambridge, MA: MIT Press, 2005.
[9] M. Iacoboni, I. Molnar-Szakacs, V. Gallese, G. Buccino, J. C. Mazziotta & G. Rizzolatti, Grasping the intentions of others with one's own mirror neuron system. PLoS Biology, 3, e79, 2005.




Enacting Intersubjectivity F. Morganti et al. (Eds.) IOS Press, 2008 © 2008 The authors. All rights reserved.



Early Ontogeny of Action Perception and Control

Abstract. Perception and interpretation of goal-directed behaviour is one of the crucial social-cognitive skills in human cognition. At a very early age, infants begin to perceive and interpret human actions as goal-directed. This early ability is often viewed as an important precursor of intentional understanding and, even more importantly, of later Theory of Mind development. A controversially discussed question is how infants' abilities to perceive and understand goal-directed human actions are interrelated with their competence to perform the same behaviour. There is ample evidence that in adults, perception and production of an action share a common representational ground where planned actions are represented in the same format as perceived events [e.g., Common Coding Principle, 1, 2]. However, studies on the development of this interrelation have yielded contradictory results. The present chapter integrates findings from studies investigating perception, production, and imitation of goal-directed actions and discusses them in the light of existing hypotheses and theories on the development of action perception and production.

Contents
12.1 Introduction
12.2 Interrelation of action perception and action control in development
12.3 Conclusion
12.4 References

12.1 Introduction

Research in developmental psychology has repeatedly shown that infants have, from an early age, an amazing knowledge of their surrounding environment [3, 4]. Infants as young as 2.5 months are able to represent the continued existence of a hidden object, that is, they understand that the object continues to exist after it has disappeared [3, 5-7]. At around the same age, human infants understand that objects are solid and cannot move through one another [3, 8, 9]. Spelke [4] concludes from these and other studies that young infants have systematic knowledge about three principles in the domain of physics: continuity (objects move on connected, unobstructed paths), cohesion (objects move as connected, bounded units), and contact (objects affect one another's motion if and only if they touch). Spelke introduced the term "Core Knowledge" because, according to her, this initial


M.M. Daum et al. / Early Ontogeny of Action Perception and Control

knowledge is innate. This early representation of the physical world is important because human perceptual-motor skills are, by their biological nature, the result of a long evolutionary history, which always was and still is constrained by the laws of physics. However, the environment in which a human infant grows up is not only physical but also social. From the very first day of life, and even before, infants act and interact in a social world. Thus, just as it is important to learn that objects continue to exist even if they are no longer visible, it is even more important to learn how to control one's own actions and to come to understand and interpret actions performed by others. The present chapter gives an overview of the ontogeny of action understanding in the first years of life and of how this development might be related to the development of action control.

12.1.1 What constitutes an action?

To study action understanding and control, a clear definition is required of what constitutes an action as opposed, for example, to a simple body movement. Actions differ from body movements in their intentional character, that is, actions are directed towards an intended goal. As a consequence, it is important for both theoretical considerations and practical experimental planning to distinguish the two constituents of an action: the movement and the goal. At about 18 weeks of age, infants start to perform simple goal-directed actions like reaching for and grasping stationary or even slowly moving objects [10]. As in adults, infants' reaching movements are predictive. Arm and hand movements are initiated before the target is within reaching distance, and the reaching movement is geared ahead of the object's momentary position toward a future interception position [11-13]. Thus, very early, infants are already able to dissociate movement and goal in their actions. This distinction becomes even more apparent in tasks in which infants have to differentiate between means and ends.
Infants begin to pass simple versions of a means-end task (e.g., pulling a cloth to retrieve a toy) around the age of 6 months [14, 15]. The competence to produce means-end behaviour becomes increasingly systematic over the following months, with infants aged 9 to 12 months being able to produce spontaneous means-end behaviour [16-19]. Interestingly, other important skills such as motor competence, locomotion, and social cognition develop at the same age [20].

12.1.2 History of research on human action

The study of human action has a long history in adult research, which can be traced back to the 19th century. The ideomotor theory, which can be seen as the first cognitive approach to action control, assumes that goal representations, which this approach views as functional anticipations of action effects, play a crucial role in action control [21-24]. Intentional action requires a goal, which is seen as an anticipatory representation of the expected action effects. This idea has more recently been taken up by Prinz and colleagues in the Common Coding approach [1, 2, 25]. The primary assumption of the Common Coding approach is that action perception and action control share a common representational ground where planned actions are represented in the same format as perceived events. Whereas separate coding accounts need to postulate transformations to explain how



coordination between the action system and the perceptual system is achieved, the Common Coding account tells a much simpler story. Event representations that are common to perception and action make transformations between perceptual and motor information unnecessary. Empirical support for such an approach comes from different domains, for example, from studies on the timing of movements [e.g., 26], on stimulus-response compatibility [e.g., 27], on bimanual coordination [28], and on action perception [29]. This evidence shows that the Common Coding approach offers a powerful framework for the interpretation of action production as well as action perception [30; for a related assumption of common representations, see 31].

12.1.3 Perception of goal-directed action

In recent years, the perception and understanding of goal-directed action has increasingly been identified as one of the most crucial social-cognitive skills in cognitive development [32-34] and in the field of human cognition per se [25]. Already at the age of 5 to 6 months, infants begin to perceive and interpret actions as goal-directed [35-40]. In Woodward's [36] seminal studies, infants were habituated to a grasping action towards one of two objects. In the test phase, in which the positions of the two objects were switched, 6-month-old infants demonstrated a stronger novelty response to the hand grasping a new object along the old motion path than to the hand grasping the same object at a new position. To be interpreted as goal-directed by 6-month-olds, the action has to be performed by a human agent [36, 39, 41, 42; for an exception, see 38]. Moreover, the action has to be either familiar to the infants [e.g., a grasping action, 40] or it has to result in a salient action effect [i.e., a salient change in the object's state, 39, 41-43].
Infants at this age are even able to encode the goal of an uncompleted action, that is, an action presented without the actual achievement of the goal [35]. Thus, by the age of 6 months, infants differentiate in their reasoning between human action and object motion, and they can encode the goal-directed actions of others. A few months later, at 9 months of age, infants' abilities to understand others' actions extend to computer-animated displays, e.g., a ball performing rational vs. non-rational movement patterns [44, 45]. According to Gergely and colleagues, infants apply a non-mentalistic action interpretation system, the teleological stance, which also takes into account the context of the external goal and the situational constraints in order to interpret and understand actions [46, 47]. Interestingly, the same action pattern can already be interpreted by 6-month-olds if the action is performed not by an inanimate object but by a human [48]. At the age of 9 to 11 months, infants are able to parse observed sequences of continuous everyday actions along intention boundaries [49, 50]. Such action analysis is central to inferring intentions, and at natural breakpoints the links between action and intention are especially strong [51]. At the end of the first year, infants are able to infer goals from a variety of cues like gaze direction, emotional expression, and pointing [52-55], and they have a broad notion of what counts as an agent. At this age, infants' ability to detect agents can no longer be reduced to the perception of humans, of objects that are perceptually similar to humans, or of objects that display self-propulsion; it is now based on the detection of goal-directedness [56].



These early competencies of action perception improve further during the second year of life. In this period, infants and toddlers develop a more sophisticated understanding of other persons' actions and their corresponding mental states. They begin to understand other persons as intentional. That is, they not only know that others pursue goals persistently, which is the case for 9-month-olds [57], but they also understand that others choose specific means to obtain goals [58, 59]. Infants also start to infer the goals of other persons from their actions in varying situations. In his seminal study, Meltzoff [60] showed that 18-month-olds could infer and imitate an adult's intended act even from watching attempts in which the adult model failed to achieve the end of the intended action. Subsequent studies demonstrated that 15-month-olds are also able to infer the goals of actions performed by a stuffed orang-utan operated by a puppeteer [61]. In a replication of the Meltzoff [60] study, 12-month-olds did not frequently imitate goal-directed actions that were unsuccessful [62]. Thus, by the age of around 15 months, children are able to complete an observed intended action instead of just copying the surface features of failed attempts. Similarly, at the age of 14 months, infants reproduce an action done on purpose but not an action done by accident [63], and they understand that identical actions may have different goals depending on the context [32]. These early abilities to perceive, understand, and imitate various kinds of intentional actions can be considered an important precursor for the understanding of others' intentions and, more importantly, for the later development of a Theory of Mind [64-66]. Theory of Mind (ToM) is the ability to interpret other people as having beliefs, desires, and intentions that are different from one's own. This is an essential basis for humans to communicate with each other in a meaningful way [67, 68].
A ToM (as assessed by a standard, verbal false belief task) was long considered not to be present before 4 years of age [69]. However, in recent studies using looking paradigms, Southgate, Senju, and Csibra [70] showed that 25-month-old infants already displayed some extent of ToM when tested in a nonverbal implementation of the false belief task, and Onishi and Baillargeon [71] found such understanding even in infants as young as 15 months. Evidence from recent longitudinal studies suggests a close link between early social-cognitive skills like action understanding and later social cognition [72-74]. In 14-month-olds, for example, decrement of attention during habituation to human intentional action was significantly correlated with ToM abilities and predicted later preschool mentalistic construal of persons [73]. A similar relation has been shown between 6-month-olds' habituation to a goal-directed action and their later performance in false belief tasks [74]. Furthermore, infants' intentional understanding in an imitation task at 14 and 18 months was related to their use of internal state language at 32 months, which in turn predicted their concept of intention tested at the age of four years [75, 76].

12.2 Interrelation of action perception and action control in development

The previous section has shown that infants display a sophisticated level of understanding of their social world at a very early age. However, a



question that remains open is how the perception and understanding of actions are related to infants' own competence to perform the same actions. In adults, the interplay of action perception and production is extensively described in the theoretical framework of the Common Coding Principle [1, 2] mentioned above. This account assumes a bidirectional influence of action and perception, which has found support in adult research with many different paradigms. On the one hand, perceived events can have an impact on planned and executed actions if perceived events and planned actions share common features, as has been shown, for example, using imitation paradigms [77, 78]. Brass and colleagues [77] used a task in which participants had to initiate, as fast as possible, a particular finger gesture while watching either the same or a different gesture performed by a hand on a computer screen. Participants initiated the required gesture much faster when the same gesture was shown on the screen than when a different gesture was shown. On the other hand, planned or executed actions can also have an impact on the perception of events if they share common features [e.g., 79-82]. Hamilton and colleagues [80], for example, found that actively lifting a box altered perceptual judgments of weight: an observed box was judged to be heavier when subjects were lifting a light box, and lighter when they were lifting a heavy box. These findings of a common representation have received support from research on the neural basis of such shared representations, the mirror neuron system. Mirror neurons are neurons that fire both when an action is perceived and when it is performed. They were first discovered in the premotor area (F5) of the macaque brain [83, 84].
Brain imaging research in humans has provided similar evidence for common brain regions subserving the perception and production of actions [85, 86], empathy [87], language acquisition [88], and Theory of Mind [89]. In the field of developmental psychology, however, there is disagreement about whether a mirror neuron system and, thus, a possible interrelation of action perception and control is already in place at birth or in early infancy [for a detailed overview, see 90], and about the developmental direction of this interrelation. An important issue discussed in the literature is whether, in infants, the understanding of oneself as an agent precedes the understanding of others as agents or vice versa [see, e.g., 91]. There are two main hypotheses on how the interrelation between action perception and action production develops. The first hypothesis (action first) states that infants come to understand others' actions based on an understanding of their own actions and a competence to produce their own actions [92, 93]. The second hypothesis (perception first) suggests that infants' understanding of their own actions is based on the understanding of other people's actions [e.g., in imitation, 94] and that infants are able to understand actions that they are not yet able to produce themselves. There is empirical evidence in support of both hypotheses. These findings and their implications are discussed in the following sections.

12.2.1 Action first: perception grounded in action

Evidence for the action first hypothesis first came from constructivist theories based on the work of Jean Piaget [95]. In the sensorimotor stage during the first two years of life, infants and toddlers construct a representation of the world



through self-motivated action in and interaction with the surrounding world [96-98]. The hypothesis has received support from different directions. First, support comes from findings on the development of infants' manual skills, in which infants increased their attention to auditory and visual properties of objects as this information became useful for guiding new actions [99]. Second, support for the action first hypothesis came from the famous neonatal imitation studies by Meltzoff and Moore [100]. They discovered that newborns can imitate facial acts and concluded that imitation, and the neural machinery that underlies it, begets an understanding of other minds [92, p. 56]. Third, it has been proposed that the motor system is used to emulate observed actions. Via covert imitation, the motor system maps the perceived actions of others onto one's own action repertoire. This emulation of others' actions helps to generate predictions about future events and, thus, to understand underlying goals [101-103]. Empirical evidence comes from several studies on infants' understanding of goal-directed actions. Using a means-end task with 8- and 12-month-old infants, Schlesinger and Langer [104] demonstrated that 8-month-old infants were already able to respond to the causal structure of a means-end sequence in their actions; however, only at the age of 12 months did this causal structure influence infants' expectations about a perceived means-end event. The results of two recent studies show that 6- to 9-month-old infants only seem to be able to understand actions that they are able to perform themselves [105, 106]. Longo and Bertenthal [106] looked at perseverative search errors of 9-month-olds in a task comparing covert imitation of ipsilateral and contralateral reaches. Infants at this age tend to use the ipsilateral hand when reaching for an object [ipsilateral bias in reaching during early development, 107].
In the imitation task, infants imitated an action more often if this action was already in their action repertoire (ipsilateral reach) than if it was not yet in their repertoire (contralateral reach). This general view is supported by findings indicating that prior action experience can alter and facilitate subsequent action perception [108, 109].

12.2.2 Perception first: action grounded in perception

The perception first hypothesis, which has received supporting evidence from neonativist accounts [3, 110], assumes that infants' cognition is first expressed in their perceptions and that, therefore, infants' understanding of their own actions is based on the understanding of other people's actions. Empirical support for this assumption comes, for example, from studies on infants' search behaviour [e.g., 17, 111, 112]. Studies testing 8-month-old infants have shown that when an object is hidden in a location A and then in a location B, infants tend to search in the wrong location A (A-not-B error). Baillargeon and Graber [111] used a looking-time paradigm to examine 8-month-olds' ability to remember the location of a hidden object. In this paradigm, infants looked longer when an actor's hand retrieved a toy from an inconsistent position than when the actor retrieved the toy from a consistent position, where the toy had actually been hidden. These results indicate that the ability of 8-month-old infants to remember the location of a hidden object is far better than their performance in the A-not-B search task suggests. More evidence comes from a similar set of studies investigating infants' and toddlers' knowledge about solidity [3, 113]. Three-month-old infants were presented with a rolling ball,



which disappeared behind an occluder. On test trials, an obstacle was placed on the track behind the occluder, visible above the occluder. The occluder was then removed, and the infants were presented with either a consistent event, in which the ball was resting in front of the obstacle, or an inconsistent event, in which the ball seemed to have passed through the obstacle and thus violated the rules of solidity. Infants looked reliably longer at the inconsistent event than at the consistent event [3]. Thus, already at the age of 3 months, infants seem to detect incongruences with physical laws. However, when children at the age of 2 to 3 years had to actively search for a ball rolled behind an occluder, most of the 2- and 2.5-year-olds did not perform above chance [113]. The ability to visually differentiate between consistent and inconsistent events is not lost in toddlers: when tested in a looking-time paradigm, the same toddlers succeeded in detecting impossible outcomes of a person or puppet searching for an object, but failed in an active search task [114, 115]. Here again, there is a dissociation between rather sophisticated knowledge about solidity and continuity in infants and rather poor search performance in toddlers. Recent evidence from our own research with 6- and 9-month-old infants further supports this hypothesis. In a study on the understanding of goal-directed but uncompleted actions, infants were able to encode the goal of an action only when they perceived the action from an allocentric perspective (as performed by another person) but not from an egocentric perspective, similar to the perspective from which they perceive their own manual actions [35]. This study also found no difference between infants' ability to encode ipsilateral and contralateral reaching movements. The ability to perform contralateral reaching movements, however, seems to develop at a later age [107].
Furthermore, 6-month-olds are already able to encode the goal of a grasping action towards objects of different sizes from the aperture size of the actor's hand during the grasp [116]. This ability seems to be independent of infants' ability to anticipatorily adjust the aperture size of their own hands to the size of a target object, which emerges only at the age of 9 months [117]. In a recent study, 6-month-olds' abilities to perceive and perform a simple means-end task were tested in both an action perception and an action production task [14]. In the action perception version, a preferential looking paradigm was used in which infants were shown an actor performing support-pulling behaviour with an expected and an unexpected outcome. In the action production version, infants had to pull a cloth to retrieve a toy. Results showed that in the perception task, infants discriminated between expected and unexpected outcomes of the pulling action. This perceptual ability was independent of their actual competence to perform means-end behaviour in the action production task. Finally, a study examining infants' understanding of tool-use actions showed that 9-month-old infants are able to interpret an action performed by a mechanical device (a claw) as goal-directed if they have been shown that the device is operated by a human hand [33]. Infants at this age are not yet able to intentionally use a claw as a tool, but they were able to interpret an action performed with the claw. Similar results were reported by Bertenthal and Longo [118], showing the same effect in an A-not-B error task. In sum, these findings suggest a close link between action perception and action production; however, they do not support the common-sense view that infants' understanding of their own actions is a precondition for the understanding of other people's actions.



12.3 Conclusion

To sum up, the research reviewed in this chapter has shown that infants start at a very early age to successfully interpret others' goal-directed actions. This early understanding of others' actions provides a fundamental basis for the understanding of others' mental states. It was furthermore shown that perception and control of an action are mechanisms that are deeply intertwined from very early on, similar to adults. The neural mechanisms underlying this early understanding of the surrounding social world seem not to be fundamentally different from those of adults, as research on the interrelation of action perception and control shows both an earlier development of action control compared to action perception and vice versa. From the research reported, one can conclude that cognitive development does not seem to depend only on experience with one's own actions but also on shared experiences with other persons. An early presence of a mirror neuron system, or of a common representation of perception and action as described earlier for adults, might allow the causal influence between action perception and control to be bidirectional. A strong and bidirectional interrelation already in very early infancy may serve as an extremely powerful engine in the development of social understanding [65, 119, 120]. Not being restricted to one direction of influence extends the possibilities to acquire knowledge about the surrounding physical and social world. This hypothesis of a parallel development of action perception and control has received recent support from a study conducted by Sommerville and Woodward [121], in which infants aged 10 months either succeeded in both the perception and the action version of a means-end support task or failed in both tasks. For a broader overview of a possible theoretical background of the parallel development of action perception and control, see Aschersleben [122].
In short, as human beings, we need to coordinate our actions with others, and we need to understand other people's actions to interact with them. It is thus of great relevance to learn how to control one's own actions and to come to understand and interpret actions performed by others. In the present chapter we showed that this intersubjectivity is deeply rooted very early in infancy.

12.4 References
[1] W. Prinz, A common coding approach to perception and action. In O. Neumann & W. Prinz (Eds.), Relationships between perception and action, (pp. 167-201). Berlin: Springer-Verlag, 1990.
[2] W. Prinz, Perception and action planning. European Journal of Cognitive Psychology, 9, 129-154, 1997.
[3] E. S. Spelke, K. Breinlinger, J. Macomber & K. Jacobson, Origins of knowledge. Psychological Review, 99, 605-632, 1992.
[4] E. S. Spelke, Initial knowledge: Six suggestions. Cognition, 50, 431-445, 1994.
[5] R. Baillargeon, Object permanence in 3 1/2- and 4 1/2-month-old infants. Developmental Psychology, 23, 655-664, 1987.
[6] T. Wilcox, L. Nadel & R. Rosser, Location memory in healthy preterm and fullterm infants. Infant Behavior and Development, 19, 1996.
[7] R. Baillargeon, E. S. Spelke & S. Wasserman, Object permanence in five-month-old infants. Cognition, 20, 191-208, 1985.
[8] S. J. Hespos & R. Baillargeon, Infants' knowledge about occlusion and containment events: A surprising discrepancy. Psychological Science, 12, 141-147, 2001.
[9] S. J. Hespos & R. Baillargeon, Reasoning about containment events in very young infants. Cognition, 78, 207-245, 2001.



[10] C. von Hofsten & K. Lindhagen, Observations on the development of reaching for moving objects. Journal of Experimental Child Psychology, 28, 158-173, 1979.
[11] R. K. Clifton, D. W. Muir, D. H. Ashmead & M. G. Clarkson, Is visually guided reaching in early infancy a myth? Child Development, 64, 1099-1110, 1993.
[12] C. von Hofsten, Predictive reaching for moving objects by human infants. Journal of Experimental Child Psychology, 30, 369-382, 1980.
[13] C. von Hofsten, Catching skills in infancy. Journal of Experimental Psychology: Human Perception and Performance, 9, 75-85, 1983.
[14] M. M. Daum, W. Prinz & G. Aschersleben, Means-end behavior in young infants: The interplay of action perception and action production. Manuscript submitted for publication, 2007.
[15] P. Willatts, Development of means-end behavior in young infants: Pulling a support to retrieve a distant object. Developmental Psychology, 35, 651-667, 1999.
[16] E. Bates, V. Carlson-Luden & I. Bretherton, Perceptual aspects of tool using in infancy. Infant Behavior and Development, 3, 127-140, 1980.
[17] A. Diamond, Development of the ability to use recall to guide action, as indicated by infants' performance on AB. Child Development, 56, 868-883, 1985.
[18] I. C. Uzgiris & J. M. Hunt, Assessment in infancy: Ordinal scales of psychological development. Chicago: University of Illinois Press, 1975.
[19] P. Willatts, Development of problem-solving strategies in infancy. In D. F. Bjorklund (Ed.), Children's strategies: Contemporary views of cognitive development, (pp. 23-66). Hillsdale, NJ: Lawrence Erlbaum Associates, 1990.
[20] J. J. Campos, et al., Travel broadens the mind. Infancy, 1, 149-219, 2000.
[21] A. G. Greenwald, Sensory feedback mechanisms in performance control: With special reference to the ideo-motor mechanism. Psychological Review, 77, 73-99, 1970.
[22] W. James, The principles of psychology. New York: Holt, 1890.
[23] R. H. Lotze, Medicinische Psychologie oder Physiologie der Seele. Leipzig: Weidmann'sche Buchhandlung, 1852.
[24] W. Prinz, Ideomotor action. In H. Heuer & A.-F. Sanders (Eds.), Perspectives on perception and action, (pp. 47-76). Hillsdale, NJ: Lawrence Erlbaum Associates, 1987.
[25] B. Hommel, J. Müsseler, G. Aschersleben & W. Prinz, The Theory of Event Coding (TEC): A framework for perception and action planning. Behavioral and Brain Sciences, 24, 849-937, 2001.
[26] G. Aschersleben, Temporal control of movements in sensorimotor synchronization. Brain and Cognition, 48, 66-79, 2002.
[27] B. Hommel, The cognitive representation of action: Automatic integration of perceived action effects. Psychological Research, 59, 176-186, 1996.
[28] F. Mechsner, D. Kerzel, G. Knoblich & W. Prinz, Perceptual effects of bimanual coordination. Nature, 414, 69-73, 2001.
[29] G. Knoblich & R. Flach, Predicting the effects of actions: Interactions of perception and action. Psychological Science, 12, 467-472, 2001.
[30] A. N. Meltzoff, Elements of a developmental theory of imitation. In A. N. Meltzoff & W. Prinz (Eds.), The imitative mind: Development, evolution, and brain bases, (pp. 19-41). New York: Cambridge University Press, 2002.
[31] A. N. Meltzoff, The centrality of motor coordination and proprioception in social and cognitive development: From shared actions to shared mind. In G. J. P. Savelsbergh (Ed.), The development of coordination in infancy, (pp. 463-496). Amsterdam: Free University Press, 1993.
[32] M. Carpenter, J. Call & M. Tomasello, Twelve- and 18-month-olds copy actions in terms of goals. Developmental Science, 8, F13-F20, 2005.
[33] T. Hofer, P. Hauf & G. Aschersleben, Infant's perception of goal-directed actions performed by a mechanical device. Infant Behavior and Development, 28, 466-480, 2005.
[34] A. L. Woodward & J. A. Sommerville, Twelve-month-old infants interpret action in context. Psychological Science, 11, 73-77, 2000.
[35] M. M. Daum, W. Prinz & G. Aschersleben, Encoding the goal of an object-directed but uncompleted reaching action in 6- and 9-month-old infants. Developmental Science, in press.
[36] A. L. Woodward, Infants selectively encode the goal object of an actor's reach. Cognition, 69, 1-34, 1998.
[37] M. Legerstee, J. Barna & C. DiAdamo, Precursors to the development of intention at 6 months: Understanding people and their actions. Developmental Psychology, 36, 627-634, 2000.
[38] Y. Luo & R. Baillargeon, Can a self-propelled box have a goal? Psychological reasoning in 5-month-old infants. Psychological Science, 16, 601-608, 2005.



[39] B. Jovanovic, et al., The role of effects for infant's perception of action goals. Psychologia, in press.
[40] A. L. Woodward, Infants' ability to distinguish between purposeful and non-purposeful behaviors. Infant Behavior and Development, 22, 145-160, 1999.
[41] T. Hofer, P. Hauf & G. Aschersleben, Infant's perception of goal-directed actions on television. British Journal of Developmental Psychology, 25, 485-498, 2007.
[42] T. Hofer, A. Hohenberger, P. Hauf & G. Aschersleben, The impact of maternal interaction style on infant action understanding. Infant Behavior and Development, 31, 115-126, 2008.
[43] I. Király, B. Jovanovic, W. Prinz, G. Aschersleben & G. Gergely, The early origins of goal attribution in infancy. Consciousness & Cognition: An International Journal, 12, 752-769, 2003.
[44] G. Csibra, G. Gergely, S. Biro, O. Koos & M. Brockbank, Goal attribution without agency cues: The perception of 'pure reason' in infancy. Cognition, 72, 237-267, 1999.
[45] G. Gergely, Z. Nadasdy, G. Csibra & S. Biro, Taking the intentional stance at 12 months of age. Cognition, 56, 165-193, 1995.
[46] G. Gergely & G. Csibra, Teleological reasoning in infancy: The infant's naive theory of rational action: A reply to Premack and Premack. Cognition, 63, 227-233, 1997.
[47] G. Csibra & G. Gergely, The teleological origins of mentalistic action explanations: A developmental hypothesis. Developmental Science, 1, 255-259, 1998.
[48] K. Kamewari, M. Kato, T. Kanda, H. Ishiguro & K. Hiraki, Six-and-a-half-month-old children positively attribute goals to human action and to humanoid-robot motion. Cognitive Development, 20, 303-320, 2005.
[49] M. M. Saylor, D. A. Baldwin, J. A. Baird & J. LaBounty, Infants' on-line segmentation of dynamic human action. Journal of Cognition and Development, 8, 113-128, 2007.
[50] D. A. Baldwin, J. A. Baird, M. M. Saylor & M. Clark, Infants parse dynamic action. Child Development, 72, 708-717, 2001.
[51] D. A. Baldwin & J. A. Baird, Action analysis: A gateway to intentional inference. In P. Rochat (Ed.), Early social cognition: Understanding others in the first months of life, (pp. 215-240). Mahwah, NJ: Lawrence Erlbaum Associates, 1999.
[52] A. L. Woodward & J. J. Guajardo, Infants' understanding of the point gesture as an object-directed action. Cognitive Development, 17, 1061-1084, 2002.
[53] A. T. Phillips, H. M. Wellman & E. S. Spelke, Infants' ability to connect gaze and emotional expression to intentional action. Cognition, 85, 53-78, 2002.
[54] B. Sodian & C. Thoermer, Infants' understanding of looking, pointing, and reaching as cues to goal-directed action. Journal of Cognition and Development, 5, 289-316, 2004.
[55] A. L. Woodward, Infants' developing understanding of the link between looker and object. Developmental Science, 6, 297-311, 2003.
[56] Y. A. Shimizu & S. C. Johnson, Infants' attribution of a goal to a morphologically unfamiliar agent. Developmental Science, 7, 425-430, 2004.
[57] T. Behne, M. Carpenter, J. Call & M. Tomasello, Unwilling versus unable: Infants' understanding of intentional action. Developmental Psychology, 41, 328-337, 2005.
[58] C. Schwier, C. van Maanen, M. Carpenter & M. Tomasello, Rational imitation in 12-month-old infants. Infancy, 10, 2006.
[59] N. Zmyj, M. M. Daum & G. Aschersleben, The development of rational imitation in 9- and 12-month-old infants. Manuscript submitted for publication, 2007.
[60] A. N. Meltzoff, Understanding the intentions of others: Re-enactment of intended acts by 18-month-old children. Developmental Psychology, 31, 838-850, 1995.
[61] S. C. Johnson, A. Booth & K. O'Hearn, Inferring the goals of a nonhuman agent. Cognitive Development, 16, 637-656, 2001.
[62] F. Bellagamba & M. Tomasello, Re-enacting intended acts: Comparing 12- and 18-month-olds. Infant Behavior & Development, 22, 277-282, 1999.
[63] M. Carpenter, N. Akhtar & M. Tomasello, Fourteen- through 18-month-old infants differentially imitate intentional and accidental actions. Infant Behavior and Development, 21, 315-330, 1998.
[64] J. H. Flavell, Theory-of-mind development: Retrospect and prospect. Merrill-Palmer Quarterly, 50, 274-290, 2004.
[65] M. Tomasello, Having intentions, understanding intentions, and understanding communicative intentions. In P. D. Zelazo, J. W. Astington & D. R. Olson (Eds.), Developing theories of intention: Social understanding and self-control, (pp. 63-75). Mahwah, NJ: Erlbaum, 1999.
[66] H. M. Wellman & A. T. Phillips, Developing intentional understandings. In B. F. Malle (Ed.), Intentions and intentionality: Foundations of social cognition, (pp. 125-148). Cambridge, MA: The MIT Press, 2001.



[67] D. Premack & G. Woodruff, Does the chimpanzee have a theory of mind? Behavioral and Brain Sciences, 1, 515-526, 1977.
[68] H. Wimmer & J. Perner, Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children's understanding of deception. Cognition, 13, 103-128, 1983.
[69] A. M. Leslie, Developmental parallels in understanding minds and bodies. Trends in Cognitive Sciences, 9, 2005.
[70] V. Southgate, A. Senju & G. Csibra, Action anticipation through attribution of false belief by 2-year-olds. Psychological Science, 18, 587-592, 2007.
[71] K. H. Onishi & R. Baillargeon, Do 15-month-old infants understand false beliefs? Science, 308, 255-258, 2005.
[72] H. M. Wellman, S. Lopez-Duran, J. LaBounty & B. Hamilton, Infant attention to intentional action predicts preschool theory of mind. Developmental Psychology, in press.
[73] H. M. Wellman, A. Phillips, S. Dunphy-Lelii & N. LaLonde, Infant social attention predicts preschool social cognition. Developmental Science, 7, 283-288, 2004.
[74] G. Aschersleben, T. Hofer & B. Jovanovic, The link between infant attention to goal-directed action and later theory of mind abilities. Developmental Science, in press.
[75] K. M. Olineck & D. Poulin-Dubois, Imitation of intentional actions and internal state language in infancy predict preschool theory of mind skills. European Journal of Developmental Psychology, 4, 14-30, 2007.
[76] K. M. Olineck & D. Poulin-Dubois, Infants' ability to distinguish between intentional and accidental actions and its relation to internal state language. Infancy, 8, 91-100, 2005.
[77] M. Brass, H. Bekkering & W. Prinz, Movement observation affects movement execution in a simple response task. Acta Psychologica, 106, 3-22, 2001.
[78] B. Stürmer, G. Aschersleben & W. Prinz, Correspondence effects with manual gestures and postures: A study of imitation. Journal of Experimental Psychology: Human Perception and Performance, 26, 1746-1759, 2000.
[79] B. Repp & G. Knoblich, Action can affect auditory perception. Psychological Science, 18, 6-7, 2007.
[80] A. Hamilton, D. Wolpert & U. Frith, Your own action influences how you perceive another person's action. Current Biology, 14, 493-498, 2004.
[81] P. Wühr & J. Müsseler, Time course of the blindness to response-compatible stimuli. Journal of Experimental Psychology: Human Perception and Performance, 27, 1260-1270, 2001.
[82] A. Schubö, W. Prinz & G. Aschersleben, Perceiving while acting: Action affects perception. Psychological Research, 68, 208-215, 2004.
[83] V. Gallese, L. Fadiga, L. Fogassi & G. Rizzolatti, Action recognition in the premotor cortex. Brain, 119, 593-609, 1996.
[84] G. Rizzolatti, L. Fadiga, V. Gallese & L. Fogassi, Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3, 131-141, 1996.
[85] J. Decety & J. Grèzes, Neural mechanisms subserving the perception of human actions. Trends in Cognitive Sciences, 3, 172-178, 1999.
[86] M. Iacoboni, et al., Cortical mechanisms of human imitation. Science, 286, 2526-2528, 1999.
[87] L. Carr, M. Iacoboni, M. C. Dubeau, J. C. Mazziotta & G. L. Lenzi, Neural mechanisms of empathy in humans: A relay from neural systems for imitation to limbic areas. Proceedings of the National Academy of Sciences of the United States of America, 100, 5497-5502, 2003.
[88] M. A. Arbib, From monkey-like action recognition to human language: An evolutionary framework for neurolinguistics. Behavioral and Brain Sciences, 28, 105-125; discussion 125-167, 2005.
[89] J. H. Williams, A. Whiten, T. Suddendorf & D. I. Perret, Imitation, mirror neurons and autism. Neuroscience and Biobehavioral Reviews, 25, 287-295, 2001.
[90] J.-F. Lepage & H. Theoret, The mirror neuron system: Grasping others' actions from birth? Developmental Science, 10, 513-529, 2007.
[91] P. Hauf & W. Prinz, The understanding of own and others' actions during infancy: "You-like-Me" or "Me-like-You"? Interaction Studies, 6, 429-445, 2005.
[92] A. N. Meltzoff, Imitation and other minds: The "Like Me" hypothesis. In S. Hurley & N. Chater (Eds.), Perspectives on imitation: From neuroscience to social science, 2: Imitation, human development, and culture, (pp. 55-77). Cambridge, MA: MIT Press, 2005.
[93] C. Moore & V. Corkum, Social understanding at the end of the first year of life. Developmental Review, 14, 349-372, 1994.
[94] M. Tomasello, A. C. Kruger & H. H. Ratner, Cultural learning. Behavioral and Brain Sciences, 16, 495-552, 1993.
[95] J. Piaget, The child's conception of the world. Totowa, NJ: Littlefield, Adams, 1929.



[96] J. Piaget, The origins of intelligence in children. New York: International Universities Press, 1952.
[97] G. Butterworth, Development in infancy: A quarter century of empirical and theoretical progress. In C. A. Hauert (Ed.), Developmental psychology: Cognitive, perceptuo-motor, and neuropsychological perspectives, (pp. 183-190). Amsterdam: North-Holland, 1990.
[98] J. Langer, From acting to understanding the comparative development of meaning. In W. F. Overton & D. Palermo (Eds.), The nature and ontogenesis of meaning, (pp. 191-214). Hillsdale, NJ: Erlbaum, 1994.
[99] M. A. Eppler, Development of manipulatory skills and the deployment of attention. Infant Behavior and Development, 18, 391-405, 1995.
[100] A. N. Meltzoff & M. K. Moore, Imitation of facial and manual gestures by human neonates. Science, 198, 74-78, 1977.
[101] R. Grush, The emulation theory of representation: Motor control, imagery, and perception. Behavioral and Brain Sciences, 27, 377-396, 2004.
[102] M. Wilson & G. Knoblich, The case for motor involvement in perceiving conspecifics. Psychological Bulletin, 131, 460-473, 2005.
[103] I. M. Thornton & G. Knoblich, Action perception: Seeing the world through a moving body. Current Biology, 16, R27-R29, 2006.
[104] M. Schlesinger & J. Langer, Infants' developing expectations of possible and impossible tool-use events between ages 8 and 12 months. Developmental Science, 2, 195-205, 1999.
[105] T. Falck-Ytter, G. Gredebäck & C. von Hofsten, Infants predict other people's action goals. Nature Neuroscience, 9, 878-879, 2006.
[106] M. R. Longo & B. I. Bertenthal, Common coding of observation and execution of action in 9-month-old infants. Infancy, 10, 43-59, 2006.
[107] J. Bruner, Eyes, hand, and mind. In D. Elkind & J. H. Flavell (Eds.), Studies in cognitive development: Essays in honor of Jean Piaget, (pp. 223-235). Oxford: Oxford University Press, 1969.
[108] J. A. Sommerville, A. L. Woodward & A. Needham, Action experience alters 3-month-old infants' perception of others' actions. Cognition, 96, B1-B11, 2005.
[109] P. Hauf, G. Aschersleben & W. Prinz, Baby do-baby see! How action production influences action perception in infants. Cognitive Development, 22, 16-32, 2007.
[110] A. M. Leslie, The necessity of illusion: Perception and thought in infancy. In L. Weiskrantz (Ed.), Thought without language. Oxford: Clarendon, 1988.
[111] R. Baillargeon & M. Graber, Evidence of location memory in 8-month-old infants in a nonsearch AB task. Developmental Psychology, 24, 502-511, 1988.
[112] R. Baillargeon, J. DeVos & M. Graber, Location memory in 8-month-old infants in a non-search AB task: Further evidence. Cognitive Development, 4, 345-367, 1989.
[113] N. E. Berthier, S. DeBlois, C. Poirier, M. Novak & R. Clifton, Where's the ball? Two- and three-year-olds reason about unseen events. Developmental Psychology, 36, 394-401, 2000.
[114] B. Hood, V. Cole-Davies & M. Dias, Looking and search measures of object knowledge in preschool children. Developmental Psychology, 39, 61-70, 2003.
[115] C. Mash, R. K. Clifton & N. E. Berthier, Two-year-olds' event reasoning and object search. 13th Biennial International Conference on Infant Studies, Toronto, Ontario, Canada, 2002.
[116] M. M. Daum, M. Vuori, W. Prinz & G. Aschersleben, Encoding the goal of an object-related grasping action in 6- and 9-month-old infants. Manuscript submitted for publication, 2007.
[117] C. von Hofsten & L. Rönnqvist, Preparation for grasping an object: A developmental study. Journal of Experimental Psychology: Human Perception and Performance, 14, 610-621, 1988.
[118] B. I. Bertenthal & M. R. Longo, Common coding of the observation and execution of goal-directed actions. Poster presented at the 2007 Biennial Meeting of the Society for Research in Child Development (SRCD), Boston, MA, USA, March 29-April 1, 2007.
[119] A. N. Meltzoff, Imitation as a mechanism of social cognition: Origins of empathy, theory of mind, and the representation of action. In U. Goswami (Ed.), Blackwell handbook of childhood cognitive development, (pp. 6-25). Malden, MA: Blackwell Publishers, 2002.
[120] A. L. Woodward, J. A. Sommerville & J. J. Guajardo, How infants make sense of intentional action. In B. F. Malle & L. J. Moses (Eds.), Intentions and intentionality: Foundations of social cognition, (pp. 149-169). Cambridge, MA: The MIT Press, 2001.
[121] J. A. Sommerville & A. L. Woodward, Infants' sensitivity to the causal features of means-end support sequences in action and perception. Infancy, 8, 119-145, 2005.
[122] G. Aschersleben, Early development of action control. Psychology Science, 48, 405-418, 2006.

Enacting Intersubjectivity, F. Morganti et al. (Eds.), IOS Press, 2008. © 2008 The authors. All rights reserved.



The Role of Joint Commitment in Intersubjectivity

Abstract. Since the beginning of the 1980s, cognitive scientists have shown increasing interest in a range of phenomena, processes and capacities underlying human interaction, collectively referred to as intersubjectivity. The goal of this line of research is to give an account of the various forms of human interaction, and in particular of the affective, attentional and intentional determinants of joint activity. The main thesis we develop in the paper is that authors interested in intersubjectivity have so far neglected, or at least undervalued, an important aspect of joint activity, namely the essentially normative character of collective intentionality. Our approach to joint activity is mainly based on Margaret Gilbert's theory of plural subjects. Gilbert's general idea is that joint activities should be regarded as activities carried out by individuals who stand to one another in a special relation, called joint commitment, which has an intrinsically normative nature. As we shall try to show, the concept of a joint commitment is a powerful tool for explaining certain specific features of joint activities. In the paper we first point out certain explanatory inadequacies of the current models of intersubjectivity, and contend that such inadequacies stem from a failure to appreciate the fundamental role of normativity in collective intentionality. We then briefly sketch Gilbert's theory of plural subjects, introduce the concept of a joint commitment, and discuss some lines along which a psychology of plural subjects may be developed.

Contents
13.1 Introduction
13.2 Intersubjectivity and deontic normativity
13.3 Joint commitment
13.4 Steps to a psychology of plural subjects
13.5 Conclusions
13.6 References


A. Carassa et al. / The Role of Joint Commitment in Intersubjectivity

13.1 Introduction

Since the beginning of the 1980s, cognitive scientists have shown increasing interest in a range of phenomena, processes and capacities underlying human interaction, collectively referred to as intersubjectivity. The view advocated by these scientists is remarkably different from the one developed within the more traditional Theory of Mind approaches, in either the Theory Theory or the Simulation Theory version. Through the contributions of several authors [1-10], a novel view of human interaction is being developed that is compatible with state-of-the-art knowledge on the phylogenesis and ontogenesis of interaction capacities, with the analysis of human experience worked out by phenomenologists, and with recent findings in the neurosciences. The goal of this line of research is to give an account of the various forms of human interaction, and in particular of the affective, attentional and intentional determinants of joint activity. Indeed, joint activity has long been a major issue for the social sciences and for analytical philosophy. Broadly speaking, the relevant theories can be classified into two groups: the first comprises theories that attempt to give a summative account of joint activity, reducing it to the same building blocks underlying individual activity; the second includes nonsummative theories, which claim that joint activity requires certain special types of mental representations, often referred to as collective intentionality [11]. Most authors currently interested in intersubjectivity support some form of nonsummative account. Observational and experimental results on non-human primates, human adults, and human children suggest that humans possess specific mental capacities, which enable forms of joint activity that are unavailable to other primate species. A complete and coherent view of such capacities, however, is still beyond the state of the art.
In this paper we aim to contribute to the construction of such a view. Our main thesis is that authors interested in intersubjectivity have so far neglected, or at least undervalued, an important aspect of joint activity, namely the essentially normative character of collective intentionality. Our approach to joint activity is mainly based on Margaret Gilbert's theory of plural subjects [12-15]. Gilbert's general idea is that joint activities should be regarded as activities carried out by plural subjects, which can be viewed as sets of individual subjects who stand to one another in a special relation, named joint commitment, that has an intrinsically normative nature. As we shall try to show, the concept of a joint commitment is a powerful tool for explaining certain specific features of human joint activities. This article is structured as follows. In Section 13.2 we point out certain explanatory inadequacies of the current models of intersubjectivity, and contend that such inadequacies stem from a failure to appreciate the fundamental role of normativity in collective intentionality. In Section 13.3 we briefly sketch Gilbert's theory of plural subjects, and introduce the concept of a joint commitment. In Section 13.4 we discuss some lines along which a psychology of plural subjects may be developed. Finally, in Section 13.5 we draw some conclusions and outline some directions for future research.



13.2 Intersubjectivity and deontic normativity

Since Trevarthen's distinction between primary and secondary intersubjectivity [1], it has become customary to differentiate among types of intersubjectivity. For example, Stern [16] distinguishes between interaffective, interattentional, and interintentional sharing of experiences, and his distinction is taken up by other authors, for example Ingar Brinck [17]. Gärdenfors [5] advocates a similar position, but adds a fourth component, that is, representing the beliefs and knowledge of others. In general, the different components of intersubjectivity are taken to be stratified in levels, from both an evolutionary and a developmental point of view. One of the leading themes of this area of research is to characterise human intersubjectivity with respect to the intersubjectivity of non-human primates, singling out the developmental phases at which specifically human structures and processes appear. Here we shall comment on a few works that we find representative of this approach.

In a paper on "What makes human cognition unique", Tomasello and Rakoczy [18] compare the impact on human social cognition of two key developmental moments, the first at about one year of age and the second at about four years. In the authors' terminology, the first ontogenetic step brings in shared intentionality, that is, the children's ability to establish self-other equivalence, to take different perspectives on things, and to reflect on and provide normative judgement on their own cognitive activities (p. 123). The second ontogenetic step, which comes after several years of continuous interaction, especially linguistic interaction, with other persons, brings in collective intentionality, which culminates in the comprehension of cultural institutions based on collective beliefs and practices such as money and marriage and government.
While it is obvious that the second ontogenetic step is uniquely human, Tomasello and Rakoczy contend that a fundamental qualitative difference between human and non-human primates is already brought in by the first step, which lays the foundations that make the second step possible. One important aspect whose emergence leads from the first to the second developmental moment is normativity. Here we need to comment on this term, because it is used with different meanings, one of which is essential to our proposal. In the paper we are considering, the authors distinguish between original and derived normativity (p. 127). Original normativity is in fact coextensive with intentionality: every intentional state, as such, has conditions of satisfaction, and can therefore succeed or fail [19]. An intentional action, for example, may achieve or fail to achieve its purpose, and a belief may be true or false. Given that intentional states are the same thing as (mental) representations, we call this kind of normativity representational. Derived normativity has to do with the collectively accepted functions of artefacts. A fork is for bringing solid food to one's mouth, a switch is for turning the light on and off, and so on: functions are normative in the sense that they tell us how an artefact ought to be used. We call this kind of normativity functional. Besides representational and functional normativity, however, there is a third important kind of normativity, which we call deontic. Deontic normativity has to do with obligations and rights, in particular with directed obligations and rights, that is, the obligations and rights that a subject has relative to other subjects. Deontic normativity is often believed to come about only with complex cultural products like legal systems, regulations, contracts and the like. On the contrary, we shall defend the idea that a form of deontic normativity is already there in every kind of joint activity, being a constitutive component of collective intentionality. If this is the case, representational and functional normativity, although essential for human cognition, are not sufficient to account for the normativity of collective intentionality.

A second paper we want to discuss here is Brinck and Gärdenfors's work on cooperation and communication in apes and humans [6]. The authors argue that non-human primates are incapable of future-directed cooperation, which concerns new goals that lack fixed value and requires symbolic communication and context-independent representations of means and goals (p. 484). In this paper, Brinck and Gärdenfors remark that one of the key aspects of cooperation, that is, the guarantee of proper compensation for one's efforts, becomes hazardous with future-directed cooperation. As the authors put it, "in the case of as yet imaginary goals, compensation becomes much more of a venture than a safe strategy" (pp. 488-489). Brinck and Gärdenfors consider cooperation within a game-theoretical framework. Much of their argument is based on the difficulty of developing reliable expectations about the other's behaviour; expectations are regarded as a purely informational phenomenon, and there is little concern for the normative component of interaction. Toward the end of the paper, the authors turn their attention to aspects of cooperation that involve deontic normativity, like feelings of shame and the expectation of sanctions from the rest of the group in response to defective behaviour. This line of thought, however, is not pursued to the point of considering future-directed cooperation as a form of interaction intrinsically driven by deontic normativity.
As Brink and Grdenfors remark, the core problem of future-directed cooperation is that it will be difficult to make estimates concerning the behaviour of other agents on the basis of previous experience, since the situation is new and unknown (p. 499). We shall argue in the rest of this paper that providing a sound basis for estimating the future behaviour of other agents is the primary function of joint commitments. Another relevant work is Grdenforss article on the cognitive and communicative demands of cooperation [4], where the author presents a table of different forms of cooperation, at least three of which (Commitment and contract, Cooperation based on conventions, The cooperation of Homo oeconomicus; p. 20) seem to us to involve deontic normativity. Among the demands of these forms of cooperation a special place is given to symbolic communication, while the role of deontic normativity is ignored. For example, it is said (p. 14) that to promise something only means that you intend to do it. On the other hand, when you commit yourself to a second person to do an action, you intend to perform the action in the future, the other person wants you to do it and intends to check that you do it, and there is joint belief concerning these intentions and desires [20]. Unlike promises, commitments can thus not arise unless the agents achieve joint beliefs and have anticipatory cognition. Two criticisms can be made to this position. The first is that promising creates obligations, and is not limited to letting someone else know what one intends to do (see for example [21]). The second is that committing to a second person to do an action cannot be analysed only in terms of epistemic and volitional states like beliefs, desires, and intentions. So, on the one hand to promising is committing oneself; on the other

A. Carassa et al. / The Role of Joint Commitment in Intersubjectivity


hand, there is more to commitment than achieving joint beliefs and having anticipatory cognition.

In a series of important works, Hannes Rakoczy investigates children's ability to construct and exploit social reality. In [22] the author interprets young children's pretend play as an example of cooperative activities involving the collective definition of fragments of social reality (understood along the lines of Searle's account [23]). Rakoczy's interpretation of pretend play comes very close to the concept of joint commitment that we shall discuss in the following sections: in Rakoczy's words, "a we-intention essentially involves some basic form of commitment to acting together, analogous to the individual commitment of actors in solitary actions, but different in that not only my own desires and intentions provide reasons for further intentions and actions, but now the collaborator's actions and intentions provide reasons for me to act accordingly in the course of the joint action" (p. 120). Still, it seems to us that the deontic nature of joint commitment is not fully appreciated. As a consequence, commitments are regarded, somewhat vaguely, as quite minimally involving an appreciation of normative inferential (reason giving) relations between collaborator's and own actions and the willingness to respect these relations in the pursuit of acting together successfully (p. 120). We believe that the best way to characterise such normative inferential (reason giving) relations is to regard them as deontic relationships (i.e., directed obligations, rights, and entitlements) generated by joint commitments.

The discussion we have carried out so far suggests that deontic normativity may indeed be a fundamental component of human interaction. If this is the case, we believe, theories of intersubjectivity will have to grant deontic normativity the room it deserves.
In the rest of this paper we shall try to make an initial contribution in this direction, starting from a concise introduction to Gilbert's concept of a plural subject.

13.3 Joint commitment

Gilbert's theory of joint activities is centred on the concept of a plural subject and on the closely related normative notion of a joint commitment. The importance of normative concepts in general, and of commitment in particular, for understanding human interactions was recognised long ago. For example, in their pioneering book Winograd and Flores [24] wanted to counteract the forgetfulness of commitment that pervades much of the discussion (both theoretical and commonplace) about language (p. 76). In argumentation theory, commitment-based models have been proposed and discussed since the concept of a commitment store was introduced by Hamblin [25] and later developed by Walton and Krabbe [26]. Very recently, John Searle [27] has advocated a view of human language in which deontic normativity is regarded as a basic constitutive component, side by side with representative power and syntactic compositionality.

In the current landscape, Gilbert's theory is unique in placing deontic normativity at the very heart of collective intentionality. Gilbert's idea is that all genuinely collective phenomena (like joint activities, collective beliefs, group feelings, social conventions, and so on) involve a normative component, called joint commitment, that turns the set of interacting subjects into a plural subject.



The idea of a plural subject may sound metaphysically suspicious, but in fact it is nothing more than a group of individuals bound by a joint commitment. In turn, for a group of individuals to be bound by a joint commitment it is necessary and sufficient for them to entertain certain mental representations. What it means for a group of individuals to be jointly committed to doing X (or believing X, or feeling X, and so on) is explained by Gilbert in several books and papers (see in particular [13], Part III; [14], Chapter 4; and [15], Chapter 7). Below we briefly describe the main features of this important concept.

A subject may be individually committed to do X, for example as a result of a personal decision: such a decision may be rescinded, but until this happens the subject remains committed to do X. Being committed to do X is a reason (although not a sufficient cause) for the subject to do X; however, in the individual case the subject is the only owner of the commitment, and can rescind the commitment as he or she pleases.

Contrary to individual commitments, a joint commitment is a commitment of two or more subjects, which we shall call the parties of the joint commitment, to engage in a common enterprise as a single body. Taken together, a number of subjects jointly committed to do X form a plural subject of doing X. The main difference between individual and joint commitments is that joint commitments are not separately owned by their parties: they are, so to speak, collectively owned by all parties at the same time.

Joint commitments may arise as a result of an agreement. However, explicit agreements are not necessary: according to Gilbert, what is necessary and sufficient to create a joint commitment, and thus to set up a plural subject, is that it is common knowledge among all parties that every party is ready to engage in some joint enterprise.
Such common knowledge may derive from explicit agreements, but also from less structured communicative exchanges and, in many cases, from shared understanding of a culturally meaningful context. Let us consider a few examples. Ann may say to Bob, "I'm going for a walk, would you like to come?" If Bob answers, "Yes, sure!", then it will be common knowledge between Ann and Bob that they are both ready to engage in a walk together, and this suffices to create a joint commitment to have a walk together. In certain situations, such as a dinner party, it will be common knowledge among all participants (without the need for specific communicative exchanges) that all parties are ready to carry out certain kinds of joint activities, like chatting or dancing, with the other participants. Indeed, joint commitments are much more common in human interaction than one may think. Even an apparently unilateral promise, like Bob saying to Ann "I promise to come visit tomorrow evening", if accepted by Ann, creates a joint commitment, because while Bob is now obliged to do what he promised, Ann is obliged to stay at home and welcome Bob.

For our current purpose, the main feature of joint commitments is that they generate deontic relationships, like directed obligations and the correlative rights and entitlements. (A directed obligation is an obligation that a subject, the debtor of the obligation, owes to another subject, the creditor of the obligation. Every directed obligation brings about a correlative right of the creditor against the debtor.) If n subjects are jointly committed to do something, then every subject is obligated to all other subjects to do his or her part of the joint activity, and has the right that all other subjects do their parts. It is characteristic of joint commitments that all such obligations are created simultaneously, and are interdependent in the sense that if



one of the parties fails to fulfill one of his or her obligations, then the joint commitment is violated. What exactly this amounts to depends on a variety of circumstances, including the number of members of the plural subject. In particular, in the case of two parties the violation of an obligation by one of them rescinds the joint commitment.

According to Gilbert, every genuine case of joint activity is an activity carried out by a plural subject, and thus involves joint commitments. It is important to understand that such commitments are not imposed on the parties from the outside, but are internal to the joint activity. For example, when a group of people engage in a game, we do not need to assume that there is some external source of obligations that compels the participants to follow the rules of the game: rather, engaging in a game together is by itself a source of obligations.

Our brief presentation of plural subjects and joint commitments raises a number of important issues: What is the function of joint commitment? To what kind of things can people jointly commit? What kinds of joint commitments are involved in joint activities? What kinds of cognitive processes underlie joint commitment? How do people make and maintain joint commitments? From what age are humans able to participate in joint commitments? Some of these questions are logical, in the sense that they concern the function and structure of joint commitments, and some are psychological, in the sense that they directly concern human mental capacities. In the two following sections we shall submit some initial answers to the previous questions.

13.4 Steps to a psychology of plural subjects

13.4.1 The function of joint commitments

At least since Aristotle, human beings have been understood as rational animals. If we construe the concept of a reason broadly enough, humans are not the only rational species on Earth.
But, based on the experimental evidence collected so far, it is generally accepted that humans are the only species that can deploy a very specific type of rationality, that is, the ability to plan their future. Given that anticipatory planning is one of the distinctive features of Homo sapiens [28], it is not surprising that so much attention has been devoted to it by scholars of disciplines like cognitive psychology, philosophy of mind, economics, and artificial intelligence. The function of future-directed intentions, or prior intentions in Searle's terminology [19], has been analysed, among others, by Michael Bratman [29], who stresses their characteristic role of coordinating practical reasoning. Indeed future-directed intentions, organised into complex plans, allow human subjects to reason within stable tracks directed to specific purposes, thus avoiding the risk of being misled by fluctuating motivations.

From an analysis of the current literature, it seems that most authors do not find it problematic to extend the stabilising function of intentions from individual to joint action: even the most complex forms of cooperation are assumed to require nothing more than the ability to share nested intentions and beliefs. At present, some authors are starting to see that this is not sufficient. For example, introducing contracts as a sophisticated form of human cooperation, Gärdenfors [5] states that If we agree that I shall deliver a hen tomorrow in exchange for the axe you have



given me now, I believe that you believe that I will deliver the hen and you believe that I believe that our agreement will then be fulfilled, etc. Furthermore, a contract depends on the possibility of future sanctions and thus on anticipatory cognition: If I don't deliver the hen, you or the society will punish me for breaching the agreement (p. 20). There is an attempt, here, to reduce deontic normativity to the expectation of punishment, and thus to a purely epistemic phenomenon (a future-directed belief). However, even before we ask ourselves whether this reduction is psychologically plausible, we face a conceptual problem, because the very concept of a punishment is deontic. Indeed a punishment is more than just a cost imposed on the subject by someone else: it is a cost rightly imposed on the subject by someone else.

In our view joint commitments play, in the case of collective activities, a stabilising role analogous to that played by future-directed intentions in the case of individual actions. Joint commitments achieve this function by creating directed obligations, thus decoupling future actions from possibly fluctuating motivations. Consider the following example: by entering a suitable joint commitment, Ann and Bob may form a plural subject of mutual care. While the joint commitment is in force, Ann and Bob will be obliged to carry out appropriate actions, like providing support to each other in difficult situations, and so on. Given the joint commitment, it is not important whether Ann or Bob are continually motivated to support each other: the reason for doing so is now an obligation created by the joint commitment.

13.4.2 The structure of joint commitments

To what kind of things can people jointly commit? Or, in other words, what can be the content of a joint commitment? The most obvious examples of joint commitments concern joint activities.
For example, by jointly committing to have a walk together, Ann and Bob create obligations concerning their future behaviour. But Margaret Gilbert argues that joint commitments are more general: for example, for a group of people to entertain a collective belief means that the group constitutes a plural subject of believing something. A joint commitment to believe, say, that all men are created equal, will carry out its function in much the same way as a joint commitment to do something together: that is, by creating directed obligations to perform appropriate actions, which will be determined case by case in a context-sensitive way. In the "all men are created equal" case, for example, every party of the plural subject is obliged to act accordingly, by treating every person with equity, by reacting to blatant discrimination, and so on. An important feature of Gilbert's non-summative treatment (see for example [13], Chapter 14) is that a plural subject may collectively believe that p even if not all the parties (indeed, in extreme cases, none of them) believe that p. The point with collective belief is not what individuals actually believe, but what their obligations are given their joint commitment to believe something.

Given the significance of affective states in intersubjectivity, it is important to understand whether people can create a plural subject of feeling something. This seems to be the case when, for example, a team is proud of a remarkable achievement, or a group of people are sorry for a distressing event that has befallen a common friend: statements like "We are proud of being the first to land on Mars"



or "We are so sorry your house burnt to ashes" reveal that the feeling of pride or sorrow is attributed to a plural subject. Analogously to the case of collective beliefs, a joint commitment to feeling something will carry out its function by creating obligations to perform appropriate actions, independently of whether the parties actually have the relevant feeling.

Recently, Margaret Gilbert suggested that joint attention, too, is a plural subject phenomenon [30]. The idea is that joint attention is best understood in terms of a joint commitment to attend as a body to some particular in the environment of the parties (p. 7). According to Gilbert, joint attention requires mutual recognition, which in turn presupposes common knowledge of co-presence.

Joint commitments thus appear to be a pervasive aspect of intersubjectivity. From the point of view of a theory of intersubjectivity, it is necessary to understand the relationships between joint commitments and psychological states. In Table 1 we propose a systematic view of all such states. We first classify psychological states into affective, attentional, and intentional states. Here the term "intentional" is to be understood as a synonym of "representational", in line with the philosophical theory of intentionality: perceptions, beliefs, desires, and intentions are all examples of intentional states. We consider purely affective and attentional states as psychological states of a single individual: a distinction between individual and interpersonal states can be drawn only for intentional states, because interpersonality is achieved through representations.

Intentional states are classified as individual or interpersonal. Examples of individual intentional states are intending to do something (in the future), intentionally doing something (right now), perceiving something, desiring something, and so on.
Interpersonal intentional states are, by definition, those intentional states of a subject whose content involves psychological states of other subjects. There are basically three ways in which a psychological state of an individual may become interpersonal.

The first way is through perception: a subject may directly perceive an affective, attentional, or intentional state of another subject. Indeed, the possibility of directly perceiving psychological states of another subject (inclusive of intentional states) is an important tenet of current theories of intersubjectivity (see for example [31]).

The second way in which a psychological state may become interpersonal is through sharing: a shared state, in our terminology, is a state that is "out in the open" (to adopt the felicitous expression used by Gilbert to describe situations of common knowledge [13]) but to which there is no joint commitment. Again, the shared state may be affective, attentional, or intentional. As an example of a shared attentional state that involves no joint commitment, consider two criminals trying to kill each other and standing a few meters apart, with the only gun at their disposal lying on the ground right between the two of them. In this situation there is shared attention to the other and to the gun, but of course the two criminals do not form a plural subject of paying attention to the other and to the gun.

Finally, a psychological state may be interpersonal by being joint (or collective). By this we mean that the relevant subjects are jointly committed to entertaining such a state. As we have already remarked, the content of a joint commitment may be any affective, attentional, or intentional state.



individual
    affective:    subject A has emotion X
    attentional:  subject A attends to object X
    intentional:  subject A intends to do X, does X (intentionally), perceives X, believes X, desires X, etc.

interpersonal, perceived
    affective:    subject A perceives that subject B has emotion X
    attentional:  subject A perceives that subject B attends to object X
    intentional:  subject A perceives that subject B intends to do X, etc.

interpersonal, shared
    affective:    it is out in the open for subjects A and B that one of them (or both of them) has emotion X
    attentional:  it is out in the open for subjects A and B that one of them (or both of them) attends to object X
    intentional:  it is out in the open for subjects A and B that one of them (or both of them) intends to do X, etc.

interpersonal, joint
    affective:    A and B are jointly committed to have emotion X (as a body)
    attentional:  A and B are jointly committed to attend to object X (as a body)
    intentional:  A and B are jointly committed to intend X, do X, etc. (as a body)

Table 1. A classification of psychological states.

Only some intersubjective processes involve shared intentional states, and an even smaller fraction involve joint intentional states. Among these, however, we find a very significant category of intersubjective processes, that is, joint activities, which in particular presuppose joint intentions.

It is important not to confuse our distinction between shared and joint psychological states with other kinds of distinctions, such as the one between coordination, collaboration, and cooperation. All types of joint activities involve some kind of dependency between the actions performed by the different parties as part of the joint activity. The difference between coordination, collaboration, and cooperation concerns what we could call the degree of coupling: while in the case of coordination, typically based on a loosely synchronised execution of individual plans, coupling is kept to a minimum, cooperation involves a very high degree of coupling, achieved through the collective execution of a common plan. However, all such types of joint activities involve joint commitments, even if their contents will be different for different kinds of joint activities.

Suppose for example that Ann and Bob decide to have dinner together at Bob's apartment at 8 pm. As both of them are very busy, they will separately buy some ready-made food: Ann will get the entrées and the wine, and Bob will take care of the main course. Ann and Bob are now bound by a joint commitment that generates at least the following obligations: that Ann gets the entrées and the wine, goes to Bob's apartment around 8 pm, and then has dinner with Bob; that Bob gets the main course, is at his apartment around 8 pm, and then has dinner with Ann. The first part of the joint activity, when Ann and Bob separately get the food, has a very low degree of coupling; in spite of this, however, there is a genuine joint commitment binding Ann and Bob to act as agreed.



13.4.3 Cognitive requirements

A plural subject is a group of people bound by a joint commitment. In turn, the members of the group are bound by a joint commitment if, and only if, they have certain psychological states. But what kind of psychological states are involved? As we have remarked in Section 2, many authors agree that some form of commitment is essential at least for the most complex types of joint activity. There is, however, no attempt to explain what it takes for a subject to commit to a course of action.

Given that joint commitments involve deontic normativity, it is tempting to consider them as a case of moral thought. This, however, may not prove a fruitful approach, because the deontic relationships produced by a joint commitment appear to be different from moral obligations. In our opinion, a major difference between moral obligations and the obligations of joint commitments is that, contrary to the former, the latter are intentionally created by people. To clarify the difference, suppose that Bob, motivated by his moral conviction that one should care for the ill, agrees with an elderly neighbour of his that he will soon visit her at the hospital. While one may dispute whether visiting his neighbour was really a moral obligation of Bob's, there is no doubt that after promising Bob is obliged to do so. Even if Bob changes his mind about the moral obligation of caring for the ill, he will still be obliged, because he freely committed his will by making an agreement.

In any case, it is clear that the ability to enter into joint commitments presupposes the ability to understand obligations, rights, entitlements, and the like. We believe that such ideas cannot be reduced to non-deontic psychological states, like beliefs and intentions. Being obliged to do X is more than just expecting that if one does not do X something bad will happen.
Suppose for example that Bob, together with a group of clients of the local branch of his bank, is caught in a robbery and is ordered by a masked man to sit on the floor and stay still. Bob knows that something bad will happen if he tries to escape, and in some sense of the word we can actually say that he is obliged to sit on the floor and stay still. However, this obligation cannot be considered as a deontic relationship between Bob and the masked criminal.

The problem of finding suitable primitives to which all deontic ideas can be reduced has long been considered in such fields as the philosophy of law and deontic logic. In [23], John Searle defends the idea that all deontic relationships can be defined in terms of one primitive, like for example obligation. This means that any being capable of entertaining thoughts of the kind "I am obliged to ..." would be able to represent all deontic relationships. A different approach, developed for the first time by Anderson in the field of deontic logic [32], is to reduce deontic notions like obligation and right to a lower-level concept, like violation. To understand this idea, suppose again that Ann and Bob agreed that Bob will visit Ann at her summer cottage next Sunday. Bob, in particular, is now obliged to Ann to go to Ann's cottage next Sunday. This idea may take the following form: "If I do not go to Ann's cottage next Sunday, then I commit a violation against Ann." What seems to be sufficient to have joint commitments is therefore a concept of directed violation, that is, of a violation relative to some individual.

A different approach is taken by Margaret Gilbert, who proposes to understand the obligations of joint commitments in terms of owing (see [15], Chapter 11).



The general idea is that once a joint commitment has been created, every party owes certain actions to the other parties; symmetrically, all parties own, even if they do not yet possess, the actions that are owed to them. It may indeed be the case that the concept of owing can be reduced to the more primitive notion of violation we have previously introduced. But it may also be the other way round: the concept of owing may be a psychological primitive, on which more complex aspects of social cognition are based. In any case, we think that only empirical research may settle this issue.

Whether joint commitments are based on a primitive notion of directed violation, or on a primitive notion of owing, it would be extremely interesting to discover at what age human beings are capable of building the relevant representations. Since the publication of Kohlberg's pioneering paper on moral stages [33], much research has been carried out on the development of moral reasoning, but situations of joint commitment have not been a primary concern. Monika Keller and colleagues [34] reported on some experiments in which children were asked to reason about situations in which an agreement between a child and his mother was either fulfilled or violated, and found that even children of about three years of age were able to correctly detect situations of agreement violation. Experiments of this kind, though, rather than testing whether children are able to engage in joint commitments in the first person, test the children's ability to reason about third-person situations of joint commitment. Moreover, due to the cognitive complexity of the experimental task, such experiments can be run only on children of at least three years of age. However, recent literature on the early development of sociality (like [18, 22, 35]) suggests that certain fundamental social abilities show up considerably earlier.
Recently Maria Gräfenhain and colleagues [36] reported on an experiment aimed at identifying the presence of joint commitments in social play contexts. The preliminary results show that the deontic implications of joint commitment begin to emerge at two years, and are clearly established by three years of age. Of course, further research is needed before we have a clear picture of the ontogeny of joint commitment.

13.4.4 The life cycle of plural subjects

Like everything on earth, plural subjects have a beginning, a period of life, and an end: that is, a life cycle. Describing all possible life cycles of plural subjects is beyond the scope of this article. In what follows we shall just sketch a few important points.

As we have already remarked in Section 2, the joint commitment that constitutes a plural subject may be created through an explicit agreement or may come to exist as an implicit consequence of the parties' interaction. For example, at a dancing party two persons may just start dancing together without prior agreement: the joint activity they engage in will imply a joint commitment to dance together at least for a while. Margaret Gilbert suggests that the necessary and sufficient condition for a group of people to form a plural subject is that it is "out in the open" (i.e., common knowledge) that all members of the group are ready to engage in some common enterprise. Often, the readiness to engage in the common enterprise will mature through a more or less lengthy phase of negotiation.



A plural subject exists as long as the underlying joint commitment is in force. During this period the parties of the plural subject are bound by a network of deontic relationships, produced by the joint commitment in a context-dependent way. Such deontic relationships may be classified into two classes: basic and derivative. The basic deontic relationships are the directed obligations, rights, entitlements, and so on that are directly related to carrying out the common enterprise. For example, if Ann and Bob agreed that Bob will visit Ann at her summer cottage next Sunday, then Ann is obliged to Bob to be at her summer cottage next Sunday, Bob has the correlative right against Ann that Ann be at her summer cottage next Sunday, Bob is entitled to go to Ann's summer cottage next Sunday, and so on. The derivative deontic relationships concern the management of the joint commitment in the face of violations by the parties of the plural subject. For example, in case Ann is not at her cottage next Sunday, Bob has the derivative entitlement to rebuke her; or, if after their agreement Ann discovers it will be impossible for her to be at her summer cottage next Sunday, she has the derivative obligation to tell Bob and to provide a suitable justification.

A plural subject may come to an end in many different ways. In some cases, the underlying joint commitment will have a well-defined deadline: consider for example the joint commitment of moving a table together, which terminates when the action is completed. In other cases the deadline will be only vaguely defined, and consequently the termination of the joint commitment will require some form of explicit or implicit negotiation.
As an example, consider the joint commitment of going for a walk together: given that a walk is a vague concept, sooner or later the parties will start negotiating the end of the common enterprise, for example by saying "I'm starting to feel tired now" or "I'm afraid I have to go back now, I have to dress for dinner". A plural subject may also come to an end due to a violation by one of the parties. In the case of two parties, a violation by one of them is sufficient to wipe out the joint commitment, thus freeing the other party of all obligations. With more than two parties the situation is more complex, and we shall not try to deal with it here.

13.5 Conclusions

In this article we have argued that joint activities involve a particular form of deontic normativity, which, following Margaret Gilbert, we call joint commitment. Joint commitments arise when a number of subjects make it overt that they are ready to engage in a common enterprise, and generate deontic relationships (directed obligations, rights, and entitlements) among these subjects. By creating such deontic relationships, joint commitments play an essential role in stabilising interaction, which is particularly relevant to anticipatory planning.

More work needs to be done before we can form a satisfactory picture of the deontic normativity of joint commitments as part of the general phenomenon of human intersubjectivity. Below we mention some issues that seem to us to be important. At the theoretical level, we think that the relationship between joint commitments and moral obligations is in need of clarification. Intuitively, the deontic normativity of joint commitments appears to be distinct from moral



normativity. However, what exactly this difference amounts to, and what the relationships between commitment and morality are, is still unclear.

At the empirical level, there seem to be at least four areas in which it would be interesting to carry out experimental work. First, research on the ontogenesis of joint commitment, which as we have seen has already started, may contribute to our understanding of the development of sociality; moreover, considering results in the light of the available literature on moral development may help to understand the relationships between the normativity of commitments and moral normativity. Second, the analysis of adult interactions may clarify important aspects of the life cycle of plural subjects and the relationships between joint commitments and what we have called the degree of coupling of collective activities. Third, the analysis of narratives may shed light on the affective side and on the first-person perspectives of joint commitment. Finally, it would be interesting to find out how certain types of cognitive and/or relational disorders, due to brain injuries or neurological disorders, influence the human capacity to engage in joint commitments.

13.6 References
[1] C. Trevarthen, Communication and cooperation in early infancy: A description of primary intersubjectivity. In M. Bullowa (Ed.), Before speech: The beginning of interpersonal communication, (pp. 321347). Cambridge: Cambridge University Press, 1979. [2] M. Tomasello, M. Carpenter, J. Call, T. Behne & H. Moll, Understanding and sharing intentions: The origins of human cognition. Behavioral and Brain Sciences, 28, 675735, 2005. [3] S. Gallagher, How the body shapes the mind. New York: Oxford University Press, 2005. [4] P. Grdenfors, The cognitive and communicative demands of cooperation. In T. RnnowRasmussen, B. Petersson, J. Josefsson & D. Egonsson (Eds.), Hommage Wlodek: Philosophical papers dedicated to Wlodek Rabinowicz, 2007. Online: http://www. [5] P. Grdenfors, Evolutionary and developmental aspects of intersubjectivity. In H. Liljenstrm & P. rhem (Eds.), Consciousness transitions: Phylogenetic, ontogenetic and physiological aspects, Amsterdam: Elsevier, to appear. Online: [6] I. Brink & P. Grdenfors, Co-operation and communication in apes and humans. Mind & Language, 18, 484501, 2003. [7] J. Zlatev, Whats in a schema? Bodily mimesis and the grounding of language. In B. Hampe (Ed.), From perception to meaning: Image schemas in cognitive linguistics, (pp. 313342). Berlin: Mouton. [8] S. Brten (Ed), On being moved: From mirror neurons to empathy. Amsterdam/Philadelphia: John Benjamins, 2007. [9] S. Hurley & N. Chater (Eds), Perspectives on imitation: From cognitive neuroscience to social science, 1. Cambridge, MA: MIT Press, 2005. [10] S. Hurley & N. Chater (Eds), Perspectives on imitation: From neuroscience to social science, 2. Cambridge, MA: MIT Press, 2005. [11] D. Tollefsen, Collective intentionality, The Internet Encyclopedia of Philosophy, 2004. Online: [12] M. Gilbert, On social facts. London and New York: Routledge, 1989. [13] M. Gilbert, Living together: Rationality, sociality, and obligation. Lanham: Rowman & Littlefield, 1996. [14] M. 
Gilbert, Sociality and responsibility: New essays in plural subjet theory. Lanham: Rowman & Littlefield, 2000. [15] M. Gilbert,. A theory of political obligation. Oxford: Clarendon Press, 2006. [16] D. N. Stern, The interpersonal world of the infant. New York: Basic Books, 1985. [17] I. Brinck, The role of intersubjectivity for the development of intentional communication. In J. Zlatev, T. Racine, C. Sinha & E. Itkonen (Eds), The shared mind: Perspectives on

A. Carassa et al. / The Role of Joint Commitment in Intersubjectivity


[18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31]

[32] [33] [34] [35]


intersubjectivity, Amsterdam: John Benjamins, to appear. Online: publicationfiles/pp103.doc M. Tomasello & H. Rakoczy, What makes human cognition unique? From individual to shared to collective intentionality. Mind & Language, 18, 121147, 2003. J. R. Searle, Intentionality: An essay in the philosophy of mind. Cambridge: Cambridge University Press, 1983. B. Dunin-Kepliz & R. Verbrugge, A tuning machine for cooperative problem solving. Fundamenta Informatica, 21, 10011025, 2001. J. R. Searle, Speech acts: An essay in the philosophy of language. Cambridge: Cambridge University Press, 1969. H. Rakoczy, Pretend play and the development of collective intentionality. Cognitive Systems Research, 7, 113127, 2006. J. R. Searle, The construction of social reality, New York: Free Press, 1995. T. Winograd & F. Flores, Understanding computers and cognition: A new foundation for design. Norwood: Ablex, 1986. C. L. Hamblin, Fallacies. London: Methuen, 1970. D. N. Walton & E. C. W. Krabbe, Commitment in dialogue: Basic concepts of interpersonal reasoning. Albany: State University of New York Press, 1995. J. R. Searle, What is language: Some preliminary remarks. In S. L. Tsohatzidis, John Searles philosophy of language: Force, meaning, and mind, (pp. 1548). Cambridge: Cambridge University Press, 2007. A. Gulz, The planning of action as a cognitive and biological phenomenon, Lund: Lund University Cognitive Studies 2, 1991. M. E. Bratman, Intention, plans, and practical reason, Cambridge, MA: Harvard University Press, 1987. M. Gilbert, Mutual recognition, common knowledge, and joint attention. In T. RnnowRasmussen, B. Petersson, J. Josefsson & D. Egonsson (Eds.), Hommage Wlodek: Philosophical papers dedicated to Wlodek Rabinowicz, 2007. Online: http://www. fil. S. Gallagher & D. Hutto, Understanding others through primary interaction and narrative practice. In J. Zlatev, T. Racine, C. Sinha & E. 
Itkonen (Eds), The shared mind: Perspectives on intersubjectivity, Amsterdam: John Benjamins, to appear. Online: ~gallaghr/gall&Hutto07.pdf A. R. Anderson, A reduction of deontic logic to alethic modal logic. Mind, 67, 100103, 1958. L. Kohlberg, Moral stages and moralization: The cognitive developmental approach. In T. Lickona (Ed.), Moral development and behavior: theory, research, and social issues, (pp. 3153). New York: Holt, Rinehart & Winston, 1976. M. Keller, M. Gummerum, X.-T. Wang, & S. Lindsay, Understanding perspectives and emotions in contract violation: Development of deontic and moral reasoning. Child Development, 75, 614 635, 2004. H. Rakoczy, Play, games, and the development of collective intentionality. In M. Sabbagh & C. Kalish (Eds.), New directions in child and adolescent development, Special issue on Conventionality, to appear. Online: rakoczy_2007_pretense_conventions.pdf M. Grfenhain, T. Behne, M. Carpenter & M. Tomasello, Young childrens understanding of joint action and joint commitment in social play contexts. Poster presented at the 2007 Biennial Meeting of the Society for Research in Child Development (SRCD), Boston (MA), March 26 April 1, 2007.



One little cat in the corner,
Washing her cute little face;
One little cat comes to catch her;
Two little cats run a race!
(One 'cat' sits on floor. Second 'cat' joins in on third line. Two 'cats' chase each other during musical interlude).

Two little cats in the corner,
Trying to round up a mouse!
One cat comes in from the barnyard;
Three little cats in the house!
(Two 'cats' sit on floor. Third 'cat' joins in on third line. 'Cats' frolic and play during musical interlude).

Three little cats on the doorstep,
Warming themselves in the sun;
One cat comes up from the cellar;
Four little cats having fun!
(Three 'cats' sit on floor. Fourth 'cat' joins in on third line. 'Cats' frolic and play during musical interlude).

Four little cats by the window,
Watching the stars twinkle bright,
One cat jumps out of the basket;
Five little cats say goodnight!
(Four 'cats' sit on floor. Fifth 'cat' joins in on third line. 'Cats' snuggle close and fall asleep during slower musical interlude).

Jessie Norton, Counting Kitties, 2002


Enacting Intersubjectivity
F. Morganti et al. (Eds.)
IOS Press, 2008
© 2008 The authors. All rights reserved.



Joint Action in Music Performance

Abstract. Ensemble musicians coordinate their actions with remarkable precision. The ensemble cohesion that results is predicated upon group members sharing a common goal: a unified concept of the ideal sound. The current chapter reviews research addressing three cognitive processes that enable individuals to realize such shared goals while engaged in musical joint action. The first process is auditory imagery; specifically, anticipating one's own sounds and the sounds produced by other performers. The second process, prioritized integrative attention, involves dividing attention between one's own actions (high priority) and those of others (lower priority) while monitoring the overall, integrated ensemble sound. The third process relates to adaptive timing, i.e., adjusting the timing of one's movements in order to maintain synchrony in the face of tempo changes and other, often unpredictable, events. The way in which these processes interact to determine ensemble coordination is discussed.

Contents
14.1 Introduction
14.2 Ensemble cohesion & shared goals
14.3 Anticipatory auditory imagery
14.4 Prioritized integrative attention
14.5 Adapting to others' action timing
14.6 Relations between imagery, attention, & adaptive timing
14.7 Conclusions
14.8 Acknowledgements
14.9 References

14.1 Introduction

In musical contexts within all known cultures and most echelons of society therein, temporally precise inter-individual synchronization can be observed among instrumentalists and dancers, and between performers and audience members. This type of synchrony is unique to humans, not by virtue of its precision (chorusing crickets and frogs are masterfully coordinated [1]) but rather due to the flexibility with which it is rendered. Human synchronization is a creative affair. It can be achieved through the use of different effectors (such as hands, feet, hips, shoulders, and heads), it can result in a seemingly infinite number of temporal structures (by coordinating rhythms with varying levels of complexity), and it is characterized by rapid adaptation to tempo changes in familiar and unfamiliar


P.E. Keller / Joint Action in Music Performance

musical styles (for example, when dancing to the music of a foreign culture). Nowadays people even engage in musical synchronization via the Internet [2].

The current chapter is concerned with the cognitive processes that enable humans to coordinate their actions with the remarkable precision and flexibility that can be observed during musical joint action, i.e., musical activity involving more than one participant. Although these processes are most likely recruited to some degree regardless of whether the activity is clearly overt, such as in instrumental performance and dancing, or more covert, such as in listening, the focus here will be on music performance by trained individuals.

In musical ensembles, performers engage in mutually coupled, affective exchanges that are mediated by instrumental sounds and expressive body gestures. Ideally, the entrainment underlying such activity should not only result in the coordination of sounds and movements, but also of mental states. Thus, in accordance with enactive approaches to social cognition [3], performers intentionally and actively participate in making sense of the music so that its meaning is shared among co-performers and communicated to audience members. This interactive form of enaction requires each performer to be sensitive to the subjective states expressed by his or her co-performers. Musical joint action therefore exercises the human predisposition for intersubjectivity [4] on grounds where meaning is essentially ineffable, highly embodied, and usually make-believe (in the sense that a musician does not need to be sad to play mournfully).

Consider a pair of pianists playing a duet. How do they coordinate their actions with sufficient precision to produce complex sound patterns that, far from being mechanically regular, are exquisitely and purposefully structured in time? The ability to synchronize in this way obviously relies upon considerations apart from the technical command of one's instrument. To produce a cohesive ensemble sound, the pianists must hold a common goal: a shared representation of the ideal sound.

This chapter begins by discussing ensemble cohesion and shared musical goals, and then goes on to describe research addressing three specific ensemble skills that are assumed to enable performers to achieve such goals. These core ensemble skills, which are rooted in cognitive processes that most likely facilitate joint action more generally [5, 6], are anticipatory auditory imagery, prioritized integrative attention (a form of divided attention), and adaptive timing. The chapter ends by considering how these ensemble skills interact to determine the quality of ensemble cohesion during musical joint action. New data from a piano duet study will be introduced for illustrative purposes at this later stage.

14.2 Ensemble cohesion & shared goals

Ensemble musicians usually aim to interact in a manner that is conducive to producing a coherent musical entity. The term ensemble cohesion refers to how well separate instrumental parts gel together to form such an auditory Gestalt. Ensemble cohesion is predicated upon the musicians sharing a common performance goal, that is, a unified conception of the ideal sound. The formation of shared musical goals may be grounded in the automatic tendency for people engaged in joint action to develop mental representations of each other's tasks [6]. However, it is assumed here that additional effort is required in the case of musical joint action.

The richness and specificity of performance goals vary as a function of the musical context (e.g., goals are more highly resolved and consequently have fewer degrees of freedom in scripted music than in improvised music [7]). Highly specific performance goals, which are the norm in Western art music, are established while preparing a musical piece for performance through both individual private practice (a mixture of playing one's instrument, listening to recordings, and studying musical scores) and collaborative rehearsal with other group members. During collaborative rehearsal, the formation of performance goals is governed by a mixture of social, conventional, and pragmatic considerations [8-11]. For example, factors such as personality can influence communication among group members, social stereotypes can determine how the opinions of various instrumentalists are weighted (soloists or those playing melodic instruments often seem to have the last word), and the size of the group can determine how leadership is distributed among ensemble members, ranging from egalitarian piano duos, through democratic mixed chamber groups, to autocratic regimes where a conductor is expected to impregnate an entire orchestra with his or her performance goal. In any case, once performance goals are established, they reside in memory as idealized mental representations of the sounds constituting the musical piece. Performance goals embody a performer's intentions and expectations about how his or her own sound and the overall ensemble sound should be shaped dynamically over time. With such goals in mind, musicians develop performance plans (usually during private practice) that guide the motor processes involved in translating the goal representations into appropriate body movements [12-15].

It seems reasonable to assume that ensemble cohesion will vary according to how well performance goals related to the overall sound are matched across ensemble members. Factors that may compromise the quality of this match include difficulties associated with memorizing the details of complex musical textures, and biases that result from individual differences in stylistic preference and the fact that each musician envisages the overall sound from the unique perspective of his or her individual performance plan. Importantly, though, the degree to which goal representations are shared is not the only determinant of ensemble cohesion. Performance goals must be realized (via the execution of performance plans) under the real-time demands and vagaries of live musical interaction. Three ensemble skills that are purported to enable performers to accomplish this (anticipatory auditory imagery, prioritized integrative attention, and adaptive timing) are considered next in turn.

14.3 Anticipatory auditory imagery

Ensemble performance requires each musician to anticipate his or her sounds and the sounds produced by other musicians. It is assumed here that these forms of anticipation involve mixtures of auditory and motor imagery, and that such top-down anticipatory processes coevolve with bottom-up expectancies generated on the basis of the perception of actual sounds (see [16, 17]). It is through the generation of auditory and motor images that musicians activate internal representations of performance goals and plans. While engaged in such imagery, the auditory component is most likely paramount in the performer's


phenomenology: It is what an individual has in mind while playing. Indeed, accomplished musicians often express the opinion that greater performance excellence can be attained by imagining the ideal sound than by concentrating on motor aspects of performance (once the requisite technical skills have been acquired, of course) [18]. This notion sits comfortably with the ideo-motor approach to voluntary action. The central tenet of the ideo-motor approach is that actions are triggered automatically by the anticipation of their intended distal effects [19, 20]. As William James pointed out, a singer needs to think only of the perfect sound in order to produce it ([19], p. 774).

Anticipatory auditory imagery can facilitate the accurate performance of one's own part in at least three ways. First, such imagery may prime appropriate movements via functional links between auditory and motor brain regions that have developed through experience playing a musical instrument [21-24]. Second, auditory imagery may assist performers in meeting precise temporal goals, such as a steady tempo, by stabilizing motor control processes [25, 26]. Third, anticipatory auditory imagery may facilitate rapid performance by enabling thorough action preplanning. The degree to which performers engage in anticipatory auditory imagery during such planning increases with increasing musical experience [27]. Thus, although James' singer needed only to think of the ideal sound in order to produce it, he probably required a considerable amount of practice before being able to conjure such thoughts accurately and reliably.

In case excessive private practice has made James' singer lonely, let us place him in a choir. For James' singer to coordinate with his fellow choristers, it is necessary for him to predict what they will do, and, even more crucially, exactly when and how they will do it. The typical degree of asynchrony in musical ensembles (around 30-50 ms [28, 29]) is far smaller than would be expected if musicians were sheepishly reacting to the sounds of an individual serving as the leader. Instead, ensemble musicians make predictions about events in other parts by using auditory imagery to simulate the ongoing productions of their co-performers. This process was investigated recently in a study of piano duet performance [30]. Expert pianists were required to record one part from several unfamiliar piano duets, and then to play the complementary part in time with either their own or others' recordings after a delay of several months. It was assumed that pianists would be able to simulate upcoming events best in their own recordings because in this case the simulation is being carried out by the same cognitive/motor system (with all its idiosyncratic constraints) that generated the events in the first place. This was indeed the case: Pianists were more accurate at synchronizing with their own recordings than with others' recordings.

The task of coordinating the anticipatory auditory images required to guide one's own actions and simultaneously predict the outcomes of others' actions may be accomplished by multiple, tightly coupled internal models instantiated in the central nervous system. A distinction has been drawn between forward and inverse internal models in the field of movement neuroscience [31]. Both types of model are capable of learning to represent transformations between motor commands and sensory events based on experience with specific sensorimotor contingencies (e.g., the command to lower a finger in a particular manner, feeling the finger move against a piano key, and hearing a tone with particular qualities). The cerebellum has been identified as a likely seat of such learning [32]. The

difference between forward and inverse models lies in the direction of the sensorimotor transformation. Forward models represent the causal relationship between efferent motor signals (which issue from the supplementary motor area (SMA) to the primary motor cortex) and their ultimate effects on the body and the environment. Forward models have been ascribed roles in controlling one's own actions and in perceiving and understanding the actions of others. When used to guide one's own actions, forward models facilitate the efficiency of motor control processes by allowing movement errors to be corrected on the basis of predicted sensory feedback prior to the arrival of actual feedback [31]. In the context of action observation, it has been claimed that forward models allow the observer to simulate another individual's behavior and thereby predict its future course [33, 34]. Forward models may recruit the so-called mirror system to some degree in doing so. On the basis of findings that similar premotor cortical activation patterns arise when an individual carries out an action and when the individual sees and/or hears somebody else performing the action, the frontal-parietal mirror system has been heralded as a key brain network mediating social interaction [35-38]. It has recently been shown that the mirror system resonates most strongly with actions that belong to the observer's own behavioral repertoire while listening to music or viewing dance [39-41].

Musical joint action may capitalize on both of the above functions of forward models. On this view, forward models representing one's own performance promote stable motor control by allowing movement errors to be corrected on the basis of anticipated auditory feedback, while forward models representing the actions of one's co-performer(s) assist in predicting the what, when, and how of upcoming auditory synchronization targets. The main difference between these two proposed classes of forward model lies in the nature of the efferent motor signals and tactile and proprioceptive feedback that they represent. Forward models of one's own performance presumably represent information about the specific movements associated with manipulating a particular musical instrument, whereas forward models of others' performances do not necessarily represent such specific movement-related information. Indeed, musicians in mixed ensembles readily synchronize with instruments that they cannot themselves play (which may have implications for the nature of the mirror system's involvement in musical joint action). Hence, the movement-related information represented by forward models of others' musical performances may be limited to relatively general, instrument-independent forms of body motion (e.g., swaying, rocking, and expressive gesturing) as well as vocal and articulatory activity that could potentially approximate others' sounds. Consistent with this notion, Ricarda Schubotz [42] has proposed that forward models run rudimentary simulations based on partial sensorimotor information when an observer is not capable of producing a perceived event sequence, and, moreover, vocal and articulatory loops in the lateral premotor cortex are engaged when predicting upcoming events in sequences whose structural properties are represented best in terms of musical parameters such as rhythm and pitch.

Inverse models sit opposite forward models. Traditionally, they represent sensorimotor transformations from desired action outcomes to the motor commands that give rise to these outcomes [32]. When playing music, the process


of activating performance goal representations via auditory imagery can be considered to be akin to running inverse models. It is assumed here that (as with forward models) musical joint action recruits two classes of inverse model, one dealing with the performance of one's own part, and the other dealing with particular parts played by co-performers or the whole ensemble texture (depending on structural aspects of, and familiarity with, the music). The main distinction between these two classes is that inverse models representing one's own part are associated with rehearsed performance plans endowed with the power to trigger instrument-specific motor commands, whereas inverse models representing other instrumental parts (or whole textures) are impotent in this regard. Although the generation of auditory images is mediated in both cases by a motor-related brain network incorporating the SMA and premotor cortex (in conjunction with the secondary auditory cortex) [43-45], appropriate motor commands for action are transmitted from the SMA to the primary motor cortex only on the basis of information from inverse models related to one's own part. Nevertheless, inverse models representing others' parts are not superfluous because without them the intended relation between parts (which musicians invest much time in learning) would be lost. Indeed, the correct performance of one's own part is usually defined in terms of the relation between one's part and other parts, as, for example, when the pianist assigned to the secondo part in a duo may be required to play less loudly than the pianist playing the primo part. In this case, the inverse model for the secondo pianist's own part requires access to an inverse model representing the primo part in order to suggest motor commands that result in less forceful movements (hence softer sounds) than those being executed by the primo pianist.

Pairing inverse models of others' performances with corresponding forward models would facilitate efficient motor control by allowing corrections to be made on the basis of the anticipated relation between parts rather than in response to the perception of actual discrepancies between one's own and others' actions. Such paired internal models are featured in MOSAIC-based models of social interaction [46], where "other" inverse models provide input to "other" forward models, and thereby influence predictions about upcoming likely states in one's co-actors. Paired forward-inverse models of others' parts would also be useful in the context of music because they would allow one performer to imagine another's style of playing in his or her absence, as is presumably done during private practice geared towards preparing for an ensemble performance. Thus, paired forward and inverse models that support motor learning and control in the context of one's own actions may, in the case of musical joint action, be coupled with a second class of paired forward-inverse models specializing in anticipating others' sounds. To function properly during musical joint action, the entire system of internal models would naturally need to be kept in tune with changes in the auditory scene via actual sensory feedback. The availability of such feedback is modulated by attention.

14.4 Prioritized integrative attention

There is usually a lot to contend with during musical joint action. In ensembles, individual musicians are not only responsible for producing their own parts

correctly, but they must also maintain awareness of the relationship between their parts and parts played by others. It has been argued that prioritized integrative attention is the optimal strategy to meet such multiple-task demands [47, 48]. Prioritized integrative attention involves dividing attention between one's own actions (high priority) and those of others (lower priority) while monitoring the overall ensemble sound. This attentional strategy is assumed to facilitate ensemble cohesion by allowing musicians to adjust their performances based on the online comparison of mental representations of the ideal sound (i.e., the performance goal) and incoming perceptual information about the actual sound. Prioritized integrative attention is related to the social cognitive concept of joint attention [49] to the extent that multiple performers attend consensually to the overall ensemble sound or to a common subset of sounds (such as when musicians playing accompanimental roles pay attention to a soloist).

A confluence of Mari Riess Jones' dynamic attending theory [50, 51] and ideas related to Daniel Kahneman's [52] conception of fluctuations in autonomic arousal has led to the proposal that metric frameworks may drive prioritized integrative attention during musical joint action [48]. Metric frameworks are cognitive/motor schemas that comprise hierarchically arranged levels of pulsation, with pulses at the beat level nested within those at the bar level in simple n:1 integer ratios such as 2:1 (duple meter), 3:1 (triple), or 4:1 (quadruple) [53]. Metric pulsations are experienced as regular series of internal events, with every nth event perceived to be accented, i.e., stronger than its neighbors. March, waltz, and salsa music support different types of rhythmic movement coordination partly because each best fits within a different metric framework: duple, triple, and quadruple, respectively.
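The nesting of beat-level pulses within bar-level pulses can be made concrete with a minimal sketch. It assumes the simplest reading of the n:1 ratios described above, in which the first of every n beats carries the accent. The names `METERS`, `metric_accents`, and `render` are hypothetical and introduced only for illustration; they do not come from the cited literature.

```python
# Illustrative sketch only: a metric framework reduced to two nested
# pulse levels (beat and bar) related by an n:1 integer ratio.

# Beats per bar for the three meters named in the text (assumed mapping).
METERS = {"duple": 2, "triple": 3, "quadruple": 4}


def metric_accents(beats_per_bar, n_bars):
    """Return one flag per beat: True where the beat coincides with a
    bar-level pulse (the accented beat), False otherwise."""
    if beats_per_bar < 1 or n_bars < 0:
        raise ValueError("beats_per_bar must be >= 1 and n_bars >= 0")
    total_beats = beats_per_bar * n_bars
    return [beat % beats_per_bar == 0 for beat in range(total_beats)]


def render(accents):
    """Show accented beats as 'X' and unaccented beats as 'x'."""
    return " ".join("X" if accented else "x" for accented in accents)
```

For instance, `render(metric_accents(METERS["triple"], 2))` gives `X x x X x x`, a waltz-like grouping in which every third beat is accented.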
Metric frameworks facilitate rhythmic perception and action by encouraging listeners and performers to allocate their attentional resources in accordance with periodicities underlying the music's temporal structure [51, 54, 55]. In ensemble performance, metric frameworks may modulate the amount of attention that is available at a particular point in time (via arousal mechanisms) and the amount of attention that is actually invested at this time (via dynamic attending processes) in a manner that is conducive to the flexibility required to integrate information from different sources while tending to a high priority part [48]. Metric resource allocation schemes could thus promote ensemble cohesion by allowing performers to use a common attentional template to accommodate the different surface details of their individual parts.

Support for the hypothesis that metric frameworks play a role in prioritized integrative attention comes from studies designed to capture the cognitive and motor demands of ensemble performance using perception- and production-based behavioral tasks. For instance, in a listening task [54], musicians were required simultaneously to memorize a target (high priority) part and the overall aggregate structure (resulting from the combination of parts) of short percussion duets. Recognition memory for both aspects of each duet was found to be influenced by how well the target part and the aggregate structure could be accommodated within the same metric framework. Analogous results were obtained in a rhythmic canon study that required percussionists to produce memorized rhythm patterns while listening to different patterns, which also had to be subsequently reproduced.

Prioritized integrative attention can be conceptualized as a hybrid mode of attention that occupies the middle ground of a continuum between two pure


modes: selective attention and nonprioritized integrative attention. The former involves focusing on one instrumental or vocal part to the exclusion of others, whereas the latter involves focusing on the aggregate structure that emerges when all parts are combined with equal weight. Ensemble performance may require individuals to roam the middle ground of the selective-integrative attention continuum to deal with changes in the momentary demands of their own parts and the structural relationship between their own and others' parts in terms of musical parameters such as pitch, rhythm, timbre (instrumental tone color), and balance (relative loudness).

Selective and nonprioritized integrative attention, in addition to standard divided attention (which involves focusing on all parts without necessarily gauging the relation between them), have been investigated in a number of studies relevant to multipart musical listening. The results of these studies suggest that the structural relationship between parts can affect the deployment of attention even when this relationship is not directly relevant to the task at hand (e.g., detecting specific target sounds in one or more parts) [56-58]. Considerable attentional skill may be required to overcome such bottom-up perceptual grouping constraints while engaged in musical joint action. Indeed, proficiency in the use of metric frameworks to guide prioritized integrative attention may be a hallmark of expert ensemble performers and listeners. The degree to which prioritized integrative attending skills generalize to other forms of joint action is presently unknown, although the notion seems plausible. Neuroimaging studies have found that manipulations of attentional strategy in the context of multipart musical listening influence activity in frontal-parietal (including the SMA/pre-SMA and premotor cortex) and temporal regions implicated in attention, working memory, and motor imagery across a variety of domains [59, 60].

14.5 Adapting to others' action timing

The most fundamental requirement of performance-based musical joint action is the temporal coordination of one's own movements and sounds with those of others. To satisfy this requirement, individuals must constantly adjust the timing of their movements in order to maintain synchrony in the face of expressively motivated deviations in local tempo (rubato), large-scale tempo changes, and other often unpredictable events. Such adaptive timing requires flexible internal timekeepers, i.e., interval generators [61] or oscillatory processes [62] that control the temporal aspects of perception and action. Although issues concerning the instantiation of timekeepers in the brain are far from settled [63], steadily accumulating evidence points towards the involvement of distributed neural circuits comprising motor- and imagery-related areas including the SMA/pre-SMA, premotor regions, the superior temporal gyrus, the basal ganglia, the thalamus, and the cerebellum [64-70]. In musical contexts, the pulsations associated with metric frameworks are driven by hierarchically arranged timekeepers. Oscillatory brain activity that is consistent with metric hierarchies has been detected using electrophysiological techniques with high temporal resolution [71]. The cerebellum may contribute to such oscillatory patterns by entraining the firing rates of neural populations in segregated cortical areas [72]. To enable the production of the non-isochronous rhythms that characterize music, timekeeper networks may recruit prefrontal brain regions that have been implicated in working memory and attention [73, 74].

Musical joint action requires timekeepers in separate individuals to be synchronized, or coupled, with one another. Such coupling is achieved via error correction processes that adjust each individual's timekeeper(s) based on discrepancies between the timing of the individual's actions and those of his or her co-performers. Two independent error correction processes subserve adaptive timing: period correction, which refers to an adjustment of the duration of the timekeeper interval or oscillator period, and phase correction, which refers to an adjustment to the way in which the sequence of pulses generated by one timekeeper is aligned against the sequence of pulses generated by another timekeeper. Period correction is required only when there is an obvious change in tempo. Phase correction, on the other hand, is needed constantly because timing discrepancies are inevitable. Note, however, that the resultant asynchronies should not be viewed in a negative light. Music sounds dull without them. Moreover, somewhat paradoxically, there is evidence that asynchronies facilitate, rather than interfere with, covert attentional entrainment and overt movement coordination in musical contexts [55, 75, 76].

Detailed theoretical models of phase and period correction have been developed [77-79], and the distinction between the two processes is supported by findings in various fields. Relevant behavioral research has typically employed experimental paradigms that require isolated individuals to produce movements (e.g., finger taps) in time with computer-controlled pacing sequences (see [80, 81] for comprehensive reviews by Bruno Repp). Such studies have yielded results indicating that phase correction takes place automatically (at least at tempi faster than about 60 beats per minute [82]), whereas period correction requires conscious awareness and attention [83, 84].
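The distinction between the two error correction processes can be made concrete with a small simulation. The following Python sketch is an illustration only: it assumes a noise-free, first-order linear timekeeper loosely in the spirit of the models cited above [77-79], and it does not reproduce any of them. The function names, the gains `alpha` and `beta`, and the 500 ms and 450 ms intervals are hypothetical choices rather than values taken from any study. The sketch follows a tapper through a pacing sequence that abruptly speeds up, with and without period correction.

```python
def onsets_from_intervals(intervals, start=0.0):
    """Turn a list of inter-onset intervals (ms) into onset times (ms)."""
    times = [start]
    for interval in intervals:
        times.append(times[-1] + interval)
    return times


def simulate_tapping(onsets, period, alpha, beta):
    """Return the asynchrony (tap time minus pacing onset, in ms) at each onset.

    alpha: phase correction gain (hypothetical parameter)
    beta:  period correction gain (hypothetical parameter)
    """
    tap = onsets[0]  # assumption: the first tap coincides with the first onset
    asynchronies = []
    for onset in onsets:
        asynchrony = tap - onset
        asynchronies.append(asynchrony)
        # Phase correction: shift only the next tap, leaving the period untouched.
        tap += period - alpha * asynchrony
        # Period correction: change the timekeeper interval itself.
        period -= beta * asynchrony
    return asynchronies


# Hypothetical pacing sequence: 10 intervals of 500 ms, then 40 intervals of 450 ms.
pacing = onsets_from_intervals([500.0] * 10 + [450.0] * 40)

phase_only = simulate_tapping(pacing, period=500.0, alpha=0.5, beta=0.0)
with_period = simulate_tapping(pacing, period=500.0, alpha=0.5, beta=0.1)

print(round(phase_only[-1], 3), round(with_period[-1], 3))
```

Under these assumptions, phase correction alone leaves a constant asynchrony after the tempo change (the simulated tapper keeps lagging behind the faster sequence by a fixed amount), whereas adding period correction lets the asynchrony shrink back towards zero. Real tapping data are noisy, so this sketch conveys only the qualitative difference between the two processes.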
Phase correction is more effective with auditory than with visual sequences [85], which highlights its importance in musical synchronization. The results of developmental research suggest that full functionality emerges earlier for phase correction than for period correction in human ontogeny [86, 87], and comparative observations have led to the claim that non-human animals that display group synchrony are only capable of phase correction [88]. Finally, neuroscientific work suggests that phase correction is primarily a cerebellar function while period correction calls upon an additional corticothalamic network that includes the basal ganglia and prefrontal regions [73, 89-91].

During musical joint action, ensemble cohesion may vary as a function of the sensitivity of ensemble members to each other's use of error correction. In a recent study [92], musically trained individuals were required to synchronize finger taps with auditory sequences presented by a computer that was programmed to implement varying degrees of error correction in a manner that was either cooperative (i.e., aimed at reducing asynchronies) or uncooperative (aimed at increasing asynchronies). Analyses of the humans' behavior under these conditions suggested that they engaged in fairly constant, moderate amounts of phase correction so long as the computer was cooperative. When the computer was uncooperative, the humans engaged in more vigorous phase correction, which appeared to be supplemented by intermittent period correction in some situations (most notably when the computer did not implement period correction, and therefore was able to maintain its own stable tempo). To the extent that these findings generalize to ensemble performance, automatically applied phase correction should be sufficient to maintain synchrony in the face of expressive timing deviations. However, when it is difficult to anticipate upcoming expressive timing because the stylistic idiosyncrasies of other ensemble members, or the music itself, are unfamiliar, the performer has the option of intentionally increasing the gain of phase correction and/or engaging strategically in intermittent period correction.

Related work has shown that strategic timekeeper adjustments can be used to stabilize challenging modes of sensorimotor coordination. In a study that required antiphase (off-beat) coordination with an external beat sequence [93], it was found that musicians were able to counteract the compelling tendency to fall onto the beat by engaging in regular phase resetting based on metric structure (which was induced either by physical accents in the pacing sequence or by the instruction to imagine such accents when they were in fact absent).

Although most research that is relevant to adaptive timing during musical joint action has been conducted using paradigms involving isolated individuals moving in synchrony with computer-controlled sequences, inroads have been made into the realm of real, temporally precise interpersonal coordination. Outside the music domain, the dynamics of interpersonal coordination (e.g., during conversation) have been investigated under conditions that vary in terms of the degree to which coupling is intentional and whether it is mediated via visual and/or auditory channels [94-98]. Intriguing electrophysiological work in this vein has revealed that oscillatory neural activity in the mirror system distinguishes whether or not two people's rhythmic finger movements are coordinated when they are in visual contact [99]. In the music domain, visually mediated coordination has been investigated in research aimed at identifying the kinematic features of a conductor's gestures that musicians use as a basis for synchronization [100]. Coordination via the auditory channel has been addressed recently in finger tapping studies that are directly relevant to adaptive timing. Preliminary results from one such study suggest that each individual in a pair compensates for timing errors produced by their partner, as well as their own errors, when tapping alternately in time with an external beat sequence [101]. Such mutual error correction could serve to make multiple ensemble performers sound as one. Related work addressing the impact of social and developmental factors on interpersonal synchronization is also underway [102, 103].
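The idea that two parties jointly determine how quickly an asynchrony is corrected, whether both cooperate or one works against the other, can be illustrated with a minimal sketch. The Python code below is a caricature under stated assumptions, not the procedure of [92] or [101]: both agents share the same tempo, there is no timing noise, each applies only phase correction, and the function name and gain values are hypothetical. A negative gain stands in for an uncooperative partner that pushes away from synchrony.

```python
def coupled_asynchronies(initial_asynchrony, gain_a, gain_b, n_taps):
    """Asynchrony (A's tap minus B's tap, in ms) across n_taps joint taps.

    gain_a, gain_b: phase correction gains of agents A and B (hypothetical).
    A positive gain moves an agent towards its partner; a negative gain
    moves it away, which is how an uncooperative partner is modeled here.
    """
    asynchrony = initial_asynchrony
    series = [asynchrony]
    for _ in range(n_taps - 1):
        shift_a = -gain_a * asynchrony  # A adjusts its next tap towards B
        shift_b = gain_b * asynchrony   # B sees the opposite sign and adjusts towards A
        asynchrony = asynchrony + shift_a - shift_b
        series.append(asynchrony)
    return series


# Agent A plays the role of the human, agent B the role of the computer.
cooperative = coupled_asynchronies(40.0, gain_a=0.4, gain_b=0.3, n_taps=10)
uncooperative = coupled_asynchronies(40.0, gain_a=0.4, gain_b=-0.3, n_taps=10)
compensated = coupled_asynchronies(40.0, gain_a=0.9, gain_b=-0.3, n_taps=10)

for label, series in [("cooperative", cooperative),
                      ("uncooperative", uncooperative),
                      ("compensated", compensated)]:
    print(label, [round(value, 2) for value in series[:4]])
```

In this simplified setting only the sum of the two gains matters: the asynchrony is multiplied by (1 - gain_a - gain_b) on every tap. An uncooperative partner therefore slows convergence unless the other agent raises its own gain, which is one way to picture the more vigorous phase correction described above.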

14.6 Relations between imagery, attention, & adaptive timing

Anticipatory auditory imagery, prioritized integrative attention, and adaptive timing must act in concert rather than in isolation during musical joint action. In this section, the results of a new study that investigated how the mechanisms underlying these three ensemble skills interact to determine coordination in piano duos are briefly reported. The body movements of seven pairs of expert pianists were recorded using a motion capture system while they performed unfamiliar duets on a pair of MIDI pianos. Analyses of the pianists' movements revealed that anterior-posterior body sway was more strongly correlated in some pairs than in others. These differences between pairs provided an index of musical synchronization that was both reliable (i.e., constant across contrasting musical pieces and independent of whether or not pianists were in visual contact) and valid (i.e., body sway coordination was negatively correlated with the degree of asynchrony between sounds, which was calculated from the MIDI recordings).

Several months after recording the duets, the same 14 pianists were invited back individually to complete experimental tasks designed to assess their abilities in anticipatory auditory imagery, prioritized integrative attention, and adaptive timing. The tasks were borrowed from previous studies addressing these cognitive processes. The anticipatory auditory imagery task, which involved the production of rhythmic movement sequences with predictable compatible or incompatible auditory effects (see [25]), yielded an index reflecting the vividness of imagery for upcoming musical sounds. The prioritized integrative attention task (see Experiment 1 in [54]) yielded an index of the strength of the relationship between prioritized integrative attending and metric structure. The adaptive timing task, which involved finger tapping in time with computer-controlled auditory sequences (see [84]), assessed the speed and completeness of adaptation to tempo changes. Although anticipatory auditory imagery, prioritized integrative attention, and adaptive timing indices were not strongly correlated with one another across individual pianists, the three indices combined well to predict the observed differences in body sway coordination between pairs of pianists (see Figure 1).

Figure 1. Scatter plot showing the relationship between body sway coordination (ranging from good to poor on the horizontal axis) and indices of abilities related to three ensemble skills, namely anticipatory auditory imagery, prioritized integrative attention, and adaptive timing (ranging from low to high on the vertical axis), for seven pairs of pianists. For imagery and attention, each data point represents the mean score for a pair of pianists. For timing, each data point represents the higher of the two scores from a pair. Note that all measures were normalized (hence the units are arbitrary) so that they could be plotted in the same range.
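The two pair-level measures mentioned above, the correlation between partners' body sway and the asynchrony between their sounds, can be sketched as follows. This Python sketch is an assumption-laden illustration: the chapter does not state how the measures were computed, so a zero-lag Pearson correlation and a mean absolute onset asynchrony are used here as plausible stand-ins, and all function names and signals are hypothetical (the sway traces are synthetic sine waves, not recorded data).

```python
import math


def pearson(x, y):
    """Zero-lag Pearson correlation between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def mean_abs_asynchrony(onsets_a, onsets_b):
    """Mean absolute difference (ms) between matched note onsets of two players."""
    if len(onsets_a) != len(onsets_b) or not onsets_a:
        raise ValueError("onset lists must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(onsets_a, onsets_b)) / len(onsets_a)


# Synthetic anterior-posterior sway traces (arbitrary units), for illustration only.
sway_a = [math.sin(0.2 * t) for t in range(100)]
sway_b = [math.sin(0.2 * t - 0.3) for t in range(100)]        # similar sway, slight lag
sway_c = [math.sin(0.31 * t + 1.0) for t in range(100)]       # unrelated sway rhythm

coordination_ab = pearson(sway_a, sway_b)
coordination_ac = pearson(sway_a, sway_c)
print(coordination_ab > coordination_ac)
```

A validity check of the kind described in the text would then correlate the sway index with the asynchrony measure across pairs, again using `pearson`, and would expect a negative coefficient.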



Interestingly, the integrity of these predictions did not necessarily rely upon the inclusion of indices from both members of a pair. Four statistical models that differed in terms of the indices that they included were considered. Two models included indices from both members of each pair, either averaged or differenced, and two included indices from just a single member, either the pianist with the highest or the lowest score on each index. The models based on averaged indices and maximum scores accounted for comparably high amounts (each over 90%) of the variance in body sway coordination (while the remaining models were less predictive). Thus, good coordination required at least one member of a pair to have relatively good ensemble skills. Adaptive timing stood out in this regard when the relation between each skill and coordination was considered separately. Here the maximum score from a pair was a stronger determinant of coordination than the averaged score. This may reflect the tendency for individuals to adopt roles as leaders and followers during ensemble performance [10]. Coordination in duos may be good to the extent that the follower is able to anticipate and adapt to the leader's expressive timing nuances while the leader concentrates on shaping the music rather than on adaptive timing. Indeed, the results of the cooperative/uncooperative computer study [92] described earlier are consistent with the notion that sensorimotor synchronization is facilitated by such an asymmetry in the coupling between two parties in a dyad. Although strong conclusions should not be drawn based on observations from just seven pairs of pianists, the results of this study suggest that it is worthwhile to pursue a model of musical joint action with anticipatory auditory imagery, prioritized integrative attention, and adaptive timing at its core. (It should be noted that alternative models with predictors such as sensitivity to the compatibility between movements and actual rather than anticipated sounds, prioritized integrative attending in contexts lacking clear metric structure, and synchronization accuracy in the absence of tempo changes were tested, but they did not fare well.)

The precise nature of the relationship between the cognitive processes in the proposed model remains to be specified. Previous studies examining the relationship between anticipatory imagery and attention outside the music domain have shown that the preparatory activation of sensory areas via imagery boosts neural responses to attended stimuli [104]. Furthermore, the results of work on the relationship between attention and internal timing mechanisms suggest that such preparatory baseline shifts in attention can come to occur in a self-sustained, periodic manner [50, 51]. It is assumed here that anticipatory auditory imagery facilitates prioritized integrative attention similarly during musical joint action, and that timing mechanisms assist by regulating the relationship between imagery and attentional processes both within and between individuals. Specifically, anticipatory auditory imagery and prioritized integrative attention are linked through the use of common timekeepers to drive forward and inverse internal models within an individual. The error correction processes that mediate adaptive timing may then ensure that these time-locked internal models are coupled between individuals engaged in musical joint action. Overlap in the brain areas subserving imagery, attention, and timing, with the SMA/pre-SMA, premotor regions, and the cerebellum being prominent in this regard, is broadly consistent with this sketch. Reviews of the neuroscience-of-music literature have identified these areas (among others such as the superior temporal gyrus and the sensorimotor cortex) as being of central importance in meeting the sequencing, timing, and sensorimotor integration needs that arise during music perception and production [105, 106]. Individual differences in ensemble expertise may be related to the degree of entrainment between the different neural populations comprising such a core network and the additional brain regions it recruits during musical joint action.
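The four ways of turning two individual scores into one pair-level predictor, as compared earlier in this section, can be written down compactly. The Python sketch below is illustrative only: the function name and the score values are hypothetical, the "differenced" predictor is assumed to be an absolute difference (the text does not specify its sign), and no regression is fitted because the study's data are not reproduced here.

```python
def pair_predictors(score_a, score_b):
    """Combine two pianists' scores on one index into four pair-level predictors."""
    return {
        "averaged": (score_a + score_b) / 2,      # both members, mean
        "differenced": abs(score_a - score_b),    # both members, assumed absolute difference
        "maximum": max(score_a, score_b),         # single member: higher scorer
        "minimum": min(score_a, score_b),         # single member: lower scorer
    }


# Hypothetical normalized index values for the two pianists in three duos.
duos = [(0.2, 0.8), (0.5, 0.5), (0.9, 0.7)]
table = [pair_predictors(a, b) for a, b in duos]

for duo, predictors in zip(duos, table):
    print(duo, predictors)
```

Each model in the comparison would then regress body sway coordination on one of these four predictor types for each of the three skill indices. The first hypothetical duo shows why the choice matters: one strong and one weak partner yield a middling average but a high maximum.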

14.7 Conclusions

Musical joint action showcases the human capacity for temporally precise yet flexible interpersonal coordination. These qualities are exemplified in musical ensembles. Ensemble cohesion requires individual performers to (1) share common goal representations of the ideal sound, and (2) possess a suite of ensemble skills (basic cognitive processes relating to anticipatory auditory imagery, prioritized integrative attention, and adaptive timing) that enable these goals to be realized. Additional considerations, including social factors, knowledge of the music, and familiarity with the stylistic tendencies of one's co-performers, may impact upon ensemble cohesion by affecting these three basic processes. Thus, imagery, attention, and adaptive timing may come to modulate the mutual awareness and interpretation of co-performers' actions, thereby setting the stage for joint enaction and intersubjectivity. The proposed mechanisms underlying anticipatory auditory imagery, prioritized integrative attention, and adaptive timing include coupled forward and inverse internal models, metric schemas that modulate autonomic arousal and the intensity of attentional focus, and internal timekeepers capable of automatic and intention-based forms of error correction. It is a challenge for future research to delve deeper into the issue of how these mechanisms interact to determine the quality of musical coordination. Pursuing this challenge should prove that musical joint action is a fruitful domain in which to investigate the cognitive processes and neural mechanisms that support interactive enaction and intersubjectivity.

14.8 Acknowledgements

The preparation of this chapter was made possible by support from the Max Planck Society and grant H01F-00729 from the Polish Ministry of Science and Higher Education. The data reported in Section 14.6 of the chapter are from an ongoing study of musical joint action in various types of ensembles from different cultures. I thank Mirjam Appel, Wenke Moehring, Janne Richter, Nadine Seeger, and Kerstin Traeger for running the experiments, and Henrik Grunert and Andreas Romeyke for technical assistance.



14.9 References
[1] B. Merker, Synchronous chorusing and human origins. In N. L. Wallin, B. Merker & S. Brown (Eds.), The origins of music (pp. 315-327). Cambridge, Mass: The MIT Press, 2000.
[2] C. Bartlette, D. Headlam, M. Bocko & G. Velikic, Effect of network latency on interactive musical performance. Music Perception, 24, 49-62, 2006.
[3] H. De Jaegher & E. A. Di Paolo, Participatory sense-making: An enactive approach to social cognition. Phenomenology and the Cognitive Sciences, 6, 485-507, 2007.
[4] C. Trevarthen & K. J. Aitken, Infant intersubjectivity: Research, theory, and clinical applications. Journal of Child Psychology & Psychiatry, 42, 3-48, 2001.
[5] G. Knoblich & N. Sebanz, The social nature of perception and action. Current Directions in Psychological Science, 15, 99-104, 2006.
[6] N. Sebanz, H. Bekkering & G. Knoblich, Joint action: Bodies and minds moving together. Trends in Cognitive Sciences, 10, 70-76, 2006.
[7] B. Schögler, Studying temporal co-ordination in jazz duets. Musicae Scientiae, Special Issue 1999-2000, 75-91, 1999-2000.
[8] J. Davidson & E. C. King, Strategies for Ensemble Practice. In A. Williamon (Ed.), Enhancing musical performance. Oxford: Oxford University Press, 2004.
[9] J. Ginsborg, R. Chaffin & G. Nicholson, Shared performance cues in singing and conducting: A content analysis of talk during practice. Psychology of Music, 34, 167-192, 2006.
[10] E. Goodman, Ensemble performance. In J. Rink (Ed.), Musical performance: A guide to understanding (pp. 153-167). Cambridge: Cambridge University Press, 2002.
[11] A. Williamon & J. Davidson, Exploring co-performer communication. Musicae Scientiae, 6, 53-72, 2002.
[12] R. Chaffin, G. Imreh & M. Crawford, Practicing perfection: Memory and piano performance. Mahwah NJ: Erlbaum, 2002.
[13] A. Gabrielsson, The performance of music. In D. Deutsch (Ed.), The psychology of music (2nd ed.) (pp. 501-602). San Diego, CA: Academic Press, 1999.
[14] C. Palmer, Music performance. Annual Review of Psychology, 48, 115-138, 1997.
[15] C. Palmer & P. Q. Pfordresher, Incremental planning in sequence production. Psychological Review, 110, 683-712, 2003.
[16] P. Janata, Neurophysiological mechanisms underlying auditory image formation in music. In R. I. Godøy & H. Jørgensen (Eds.), Elements of Musical Imagery (pp. 27-42). Lisse: Swets & Zeitlinger Publishers, 2001.
[17] P. Janata & K. Paroo, Acuity of auditory images in pitch and time. Perception & Psychophysics, 68, 829-844, 2006.
[18] W. H. Trusheim, Audiation and mental imagery: Implications for artistic performance. The Quarterly Journal of Music Teaching and Learning, 2, 139-147, 1993.
[19] W. James, Principles of psychology. New York: Holt, 1890.
[20] W. Prinz, Ideo-motor action. In H. Heuer & A. F. Sanders (Eds.), Perspectives on perception and action (pp. 47-76). Hillsdale, NJ: Lawrence Erlbaum, 1987.
[21] M. Bangert, T. Peschel, M. Rotte, D. Drescher, H. Hinrichs, G. Schlaug, et al., Shared networks for auditory and motor processing in professional pianists: Evidence from fMRI conjunction. NeuroImage, 15, 917-926, 2006.
[22] U. Drost, M. Rieger, M. Brass, T. Gunther & W. Prinz, When hearing turns into playing: Movement induction by auditory stimuli in pianists. Quarterly Journal of Experimental Psychology, 58, 1376-1389, 2005.
[23] J. Haueisen & T. R. Knösche, Involuntary motor activation in pianists evoked by music perception. Journal of Cognitive Neuroscience, 13, 786-792, 2001.
[24] M. Lotze, G. Scheler, H. R. Tan, C. Braun & N. Birbaumer, The musician's brain: Functional imaging of amateurs and professionals during performance and imagery. NeuroImage, 20, 1817-1829, 2003.
[25] P. E. Keller & I. Koch, The planning and execution of short auditory sequences. Psychonomic Bulletin & Review, 13, 711-716, 2006.
[26] P. E. Keller & B. H. Repp, Multilevel coordination stability: Integrated goal representations in simultaneous intra-personal and inter-agent coordination. Manuscript submitted for publication, 2007.
[27] P. E. Keller & I. Koch, Action planning in sequential skills: Relations to music performance. Quarterly Journal of Experimental Psychology, 61, 275-291, 2008.
[28] R. A. Rasch, Synchronization in performed ensemble music. Acustica, 43, 121-131, 1979.



[29] L. H. Shaffer, Timing in solo and duet piano performances. Quarterly Journal of Experimental Psychology, 36A, 577-595, 1984.
[30] P. E. Keller, G. Knoblich & B. H. Repp, Pianists duet better when they play with themselves: On the possible role of action simulation in synchronization. Consciousness & Cognition, 16, 102-111, 2007.
[31] D. M. Wolpert & Z. Ghahramani, Computational principles of movement neuroscience. Nature Neuroscience, 3, 1212-1217, 2000.
[32] D. M. Wolpert, R. C. Miall & M. Kawato, Internal models in the cerebellum. Trends in Cognitive Sciences, 2, 338-347, 1998.
[33] M. Jeannerod, Motor cognition: What actions tell the self. Oxford, UK: Oxford University Press, 2006.
[34] M. Wilson & G. Knoblich, The case for motor involvement in perceiving conspecifics. Psychological Bulletin, 131, 460-473, 2005.
[35] J. Decety & J. Grèzes, The power of simulation: Imagining one's own and other's behavior. Brain Research, 1079, 4-14, 2006.
[36] V. Gallese, C. Keysers & G. Rizzolatti, A unifying view of the basis of social cognition. Trends in Cognitive Sciences, 8, 396-403, 2004.
[37] V. Gazzola, L. Aziz-Zadeh & C. Keysers, Empathy and the somatotopic auditory mirror system in humans. Current Biology, 16, 1824-1829, 2006.
[38] I. Molnar-Szakacs & K. Overy, Music and mirror neurons: from motion to 'e'motion. Social Cognitive and Affective Neuroscience, 1, 235-241, 2006.
[39] B. Calvo-Merino, D. E. Glaser, J. Grèzes, R. E. Passingham & P. Haggard, Action observation and acquired motor skills: an FMRI study with expert dancers. Cerebral Cortex, 15, 1243-1249, 2005.
[40] B. Haslinger, P. Erhard, E. Altenmüller, U. Schroeder, H. Boecker & A. O. Ceballos-Baumann, Transmodal sensorimotor networks during action observation in professional pianists. Journal of Cognitive Neuroscience, 17, 282-293, 2005.
[41] A. Lahav, E. Saltzman & G. Schlaug, Action representation of sound: Audiomotor recognition network while listening to newly acquired actions. Journal of Neuroscience, 27, 308-314, 2007.
[42] R. I. Schubotz, Prediction of external events with our motor system: towards a new framework. Trends in Cognitive Sciences, 11, 211-218, 2007.
[43] A. R. Halpern, R. J. Zatorre, M. Bouffard & J. A. Johnson, Behavioral and neural correlates of perceived and imagined timbre. Neuropsychologia, 42, 1281-1292, 2004.
[44] I. G. Meister, T. Krings, H. Foltys, B. Boroojerdi, M. Müller, R. Töpper & A. Thron, Playing piano in the mind - an fMRI study on music imagery and performance in pianists. Cognitive Brain Research, 19, 219-228, 2004.
[45] R. J. Zatorre, A. R. Halpern, D. W. Perry, E. Meyer & A. C. Evans, Hearing in the mind's ear: A PET investigation of musical imagery and perception. Journal of Cognitive Neuroscience, 8, 29-46, 1996.
[46] D. M. Wolpert, K. Doya & M. Kawato, A unifying computational framework for motor control and social interaction. Philosophical Transactions of the Royal Society B, 358, 593-602, 2003.
[47] P. Keller, Attending in complex musical interactions: The adaptive dual role of meter. Australian Journal of Psychology, 51, 166-175, 1999.
[48] P. E. Keller, Attentional resource allocation in musical ensemble performance. Psychology of Music, 29, 20-38, 2001.
[49] N. Eilan, C. Hoerl, T. McCormack & J. Roessler (Eds.), Joint attention: Issues in philosophy and psychology. Oxford, UK: Oxford University Press, 2005.
[50] M. R. Jones & M. Boltz, Dynamic attending and responses to time. Psychological Review, 96, 459-491, 1989.
[51] E. W. Large & M. R. Jones, The dynamics of attending: How we track time varying events. Psychological Review, 106, 119-159, 1999.
[52] D. Kahneman, Attention and effort. Englewood Cliffs, NJ: Prentice-Hall, 1973.
[53] J. London, Hearing in time: Psychological aspects of musical meter. Oxford: Oxford University Press, 2004.
[54] P. E. Keller & D. K. Burnham, Musical meter in attention to multipart rhythm. Music Perception, 22, 629-661, 2005.
[55] E. W. Large & C. Palmer, Perceiving temporal regularity in music. Cognitive Science, 26, 1-37, 2002.
[56] E. Bigand, S. McAdams & S. Forêt, Divided attention in the listening of polyphonic music. International Journal of Psychology, 35, 270-278, 2000.



[57] E. J. Crawley, B. E. Acker-Mills, R. E. Pastore & S. Weil, Change detection in multi-voice music: The role of musical structure, musical training, and task demands. Journal of Experimental Psychology: Human Perception & Performance, 28, 367-378, 2002.
[58] J. A. Sloboda & J. Edworthy, Attending to two melodies at once: the effect of key relatedness. Psychology of Music, 9, 39-43, 1981.
[59] P. Janata, B. Tillmann & J. J. Bharucha, Listening to polyphonic music recruits domain-general attention and working memory circuits. Cognitive, Affective, & Behavioral Neuroscience, 2, 121-140, 2002.
[60] M. Satoh, K. Takeda, K. Nagata, J. Hatazawa & S. Kuzuhara, Activated brain regions in musicians during an ensemble: a PET study. Cognitive Brain Research, 12, 101-108, 2001.
[61] A. M. Wing, Voluntary timing and brain function: an information processing approach. Brain and Cognition, 48, 7-30, 2002.
[62] G. Schöner, Timing, clocks, and dynamical systems. Brain and Cognition, 48, 31-51, 2002.
[63] R. B. Ivry & R. Spencer, The neural representation of time. Current Opinion in Neurobiology, 14, 225-232, 2004.
[64] J. L. Chen, V. B. Penhune & R. J. Zatorre, Interactions between auditory and dorsal premotor cortex during synchronization to musical rhythms. NeuroImage, 32, 1771-1781, 2006.
[65] J. A. Grahn & M. Brett, Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience, 19, 893-906, 2007.
[66] C. Liégeois-Chauvel, I. Peretz, M. Baba, V. Laguitton & P. Chauvel, Contribution of different cortical areas in the temporal lobes to music processing. Brain, 121, 1853-1867, 1998.
[67] J. M. Mayville, K. J. Jantzen, A. Fuchs, F. L. Steinberg & J. A. S. Kelso, Cortical and subcortical networks underlying syncopated and synchronized coordination revealed using fMRI. Human Brain Mapping, 17, 214-229, 2002.
[68] O. Oullier, K. J. Jantzen, F. L. Steinberg & J. A. S. Kelso, Neural substrates of real and imagined sensorimotor coordination. Cerebral Cortex, 15, 975-985, 2005.
[69] S. M. Rao, D. L. Harrington, K. Y. Haaland, J. A. Bobholz, R. W. Cox & J. R. Binder, Distributed neural systems underlying the timing of movements. Journal of Neuroscience, 17, 5528-5535, 1997.
[70] M. H. Thaut, Neural basis of rhythmic timing networks in the human brain. Annals of the New York Academy of Sciences, 999, 364-373, 2003.
[71] T. P. Zanto, J. S. Snyder & E. W. Large, Neural correlates of rhythmic expectancy. Advances in Cognitive Psychology, 2, 221-231, 2006.
[72] M. Molinari, M. G. Leggio & M. H. Thaut, The cerebellum and neural networks for rhythmic sensorimotor synchronization in the human brain. Cerebellum, 6, 18-23, 2007.
[73] P. A. Lewis, A. M. Wing, P. A. Pope, P. Praamstra & R. C. Miall, Brain activity correlates differentially with increasing temporal complexity of rhythms during initialization, synchronization, and continuation phases of paced finger tapping. Neuropsychologia, 42, 1301-1312, 2004.
[74] K. Sakai, O. Hikosaka, S. Miyauchi, R. Takino, T. Tamada, N. K. Iwata & M. Nielsen, Neural representation of a rhythm depends on its interval ratio. Journal of Neuroscience, 19, 10074-10081, 1999.
[75] M. J. Hove, P. E. Keller & C. L. Krumhansl, Sensorimotor synchronization with chords containing tone-onset asynchronies: The role of P-centers. Perception & Psychophysics, 69, 699-708, 2007.
[76] J. A. Prögler, Searching for swing: Participatory discrepancies in the jazz rhythm section. Ethnomusicology, 39, 21-54, 1995.
[77] J. Mates, A model of synchronization of motor acts to a stimulus sequence. I. Timing and error corrections. Biological Cybernetics, 70, 463-473, 1994.
[78] D. Vorberg & H.-H. Schulze, A two-level timing model for synchronization. Journal of Mathematical Psychology, 46, 56-87, 2002.
[79] D. Vorberg & A. Wing, Modeling variability and dependence in timing. In H. Heuer & S. W. Keele (Eds.), Handbook of perception and action, 2 (pp. 181-262). London: Academic Press, 1996.
[80] B. H. Repp, Sensorimotor synchronization: A review of the tapping literature. Psychonomic Bulletin & Review, 12, 969-992, 2005.
[81] B. H. Repp, Musical synchronization. In E. Altenmüller, M. Wiesendanger, & J. Kesselring (Eds.), Music, motor control, and the brain (pp. 55-76). Oxford, UK: Oxford University Press, 2006.
[82] K. Takano & Y. Miyake, Two types of phase correction mechanism involved in synchronized tapping. Neuroscience Letters, 417, 196-200, 2007.



[83] B. H. Repp, Processes underlying adaptation to tempo changes in sensorimotor synchronization. Human Movement Science, 20, 277-312, 2001.
[84] B. H. Repp & P. E. Keller, Adaptation to tempo changes in sensorimotor synchronization: Effects of intention, attention, and awareness. Quarterly Journal of Experimental Psychology, 57A, 499-521, 2004.
[85] B. H. Repp & A. Penel, Auditory dominance in temporal processing: New evidence from synchronization with simultaneous visual and auditory sequences. Journal of Experimental Psychology: Human Perception and Performance, 28, 1085-1099, 2002.
[86] T. Eerola, G. Luck & P. Toiviainen, An investigation of pre-schoolers' corporeal synchronization with music. In M. Baroni, A. R. Addessi, R. Caterina & M. Costa (Eds.), Proceedings of the 9th International Conference on Music Perception & Cognition (pp. 472-476). Bologna: Bononia University Press, 2006.
[87] J. Provasi & A. Bobin-Bègue, Spontaneous motor tempo and rhythmical synchronization in 2 1/2- and 4-year-old children. International Journal of Behavioral Development, 27, 220-231, 2003.
[88] J. C. Bispham, Rhythm in music: What is it? Who has it? And why? Music Perception, 24, 125-134, 2006.
[89] K. Lutz, K. Specht, N. J. Shah & L. Jäncke, Tapping movements according to regular and irregular visual timing signals investigated with fMRI. NeuroReport, 11, 1301-1306, 2000.
[90] P. Praamstra, M. Turgeon, C. W. Hesse, A. M. Wing & L. Perryer, Neurophysiological correlates of error correction in sensorimotor synchronization. NeuroImage, 20, 1283-1297, 2003.
[91] K. M. Stephan, M. H. Thaut, G. Wunderlich, W. Schicks, B. Tian, L. Tellmann, et al., Conscious and subconscious sensorimotor synchronization: Prefrontal cortex and the influence of awareness. NeuroImage, 15, 345-352, 2002.
[92] B. H. Repp & P. E. Keller, Sensorimotor synchronization with adaptively timed sequences. Manuscript submitted for publication, 2007.
[93] P. E. Keller & B. H. Repp, Staying offbeat: Sensorimotor syncopation with structured and unstructured auditory sequences. Psychological Research, 69, 292-309, 2005.
[94] Z. Néda, E. Ravasz, Y. Brechet, T. Vicsek & A.-L. Barabási, The sound of many hands clapping. Nature, 403, 849-850, 2000.
[95] A. De Rugy, R. Salesse, O. Oullier & J.-J. Temprado, A neuro-mechanical model for interpersonal coordination. Biological Cybernetics, 94, 427-443, 2006.
[96] R. C. Schmidt, C. Carello & M. T. Turvey, Phase transitions and critical fluctuations in the visual coordination of rhythmic movements between people. Journal of Experimental Psychology: Human Perception and Performance, 16, 227-247, 1990.
[97] M. J. Richardson, K. L. Marsh & R. C. Schmidt, Effects of visual and verbal information on unintentional interpersonal coordination. Journal of Experimental Psychology: Human Perception and Performance, 31, 62-79, 2005.
[98] K. Shockley, M. Santana & C. A. Fowler, Mutual interpersonal postural constraints are involved in cooperative conversation. Journal of Experimental Psychology: Human Perception and Performance, 29, 326-332, 2003.
[99] E. Tognoli, J. Lagarde, G. C. De Guzman & J. A. S. Kelso, The phi complex as a neuromarker of human social coordination. Proceedings of the National Academy of Sciences, 104, 8190-8195, 2007.
[100] G. Luck & P. Toiviainen, Ensemble musicians' synchronization with conductors' gestures: an automated feature-extraction analysis. Music Perception, 24, 189-200, 2006.
[101] L. Nowicki, P. E. Keller & W. Prinz, The influence of another's actions on one's own synchronization with music. Poster presented at the 15th Meeting of the European Society for Cognitive Psychology. Marseille, France, 29 August - 1 September, 2007.
[102] T. Himberg, Co-operative tapping and collective time-keeping - differences of timing accuracy in duet performance with human or computer partner. In M. Baroni, A. R. Addessi, R. Caterina & M. Costa (Eds.), Proceedings of the 9th International Conference on Music Perception & Cognition (p. 377). Bologna: Bononia University Press, 2006.
[103] S. Kirschner & M. Tomasello, Joint drumming: The social origins of sensorimotor synchronization in young children. Poster presented at the Conference on Language and Music as Cognitive Systems, 11-13 May 2007, Cambridge, UK, 2007.
[104] J. Driver & C. Frith, Shifting baselines in attention research. Nature Reviews Neuroscience, 1, 147-148, 2000.
[105] P. Janata & S. T. Grafton, Swinging in the brain: shared neural substrates for behaviors related to sequencing and music. Nature Neuroscience, 6, 682-687, 2003.
[106] R. J. Zatorre, J. L. Chen & V. B. Penhune, When the brain plays music. Auditory-motor interactions in music perception and production. Nature Reviews Neuroscience, 8, 547-558, 2007.


Enacting Intersubjectivity
F. Morganti et al. (Eds.)
IOS Press, 2008
© 2008 The authors. All rights reserved.



Filling the Gap: Dynamic Representation of Occluded Action

Wolfgang PRINZ, Gertrude RAPINETT
Abstract. In this chapter we examine the time course of dynamic-action representations using an experimental paradigm for studying partially occluded action. To address this issue we focus on transitions between perceptual mechanisms (taking care of representing action before and after occlusion), and substitute mechanisms for simulation (taking care of representing the action during occlusion). Does simulation just carry on old processes or initiate new ones? We discuss first results concerning the impact that features of unoccluded action segments make on the representation of occluded segments. These results suggest that action simulation is a creative process, creating novel invisible actions rather than extrapolating visible actions. Observers thus fill the gap by creating something new, not by carrying on something old.

Contents
15.1 Introduction
15.2 Paradigm and basic observations
15.3 Linear extrapolation
15.4 Starting from scratch
15.5 Taking goals into account
15.6 Conclusions
15.7 References

15.1 Introduction

Visual occlusion is a commonplace thing. When we look around in our environment, many things and events are spatially and temporally occluded. For instance, the persons we are talking to may partially be occluded by a table in front of them and, as they leave the room through a door and then reappear through another door, they may even be entirely occluded for some time. Still, we as observers have a clear sense of their physical presence while they are partially or completely invisible.


W. Prinz and G. Rapinett / Filling the Gap: Dynamic Representation of Occluded Action

The issue of occlusion (i.e. what may happen behind the occluder and how we can know what is happening) has stimulated the imagination of artists for a long time. One of the reasons may be that, in the case of occlusion, perceptual representation is replaced by some other kind of representation: re-presentation in a true and more literal sense. For instance, in a famous series of paintings with mysterious titles such as "La condition humaine" or "La belle captive", the Belgian painter René Magritte offered a variety of scenes in which a landscape is partially occluded by a painting showing the very segment of the landscape that it occludes. The message here seems to be that the painting does not really occlude things but makes them visible in a special way. In any case it seems that the perceiver/painter is capable of representing the hidden scenery in a way that is virtually equivalent to perceiving it.

More recently the cognitive neuroscience of action representation has also shown interest in the use of occluders. For instance, it has been demonstrated that neurons in both the frontal and the temporal lobe continue responding for a while to the particular actions to which they are attuned when these actions disappear behind an occluder [1, 2].

In this chapter we study mechanisms for the representation of occluded action in human observers. How do they fill the invisible gap that elapses between the visible parts of the action? What kinds of representational mechanisms are involved? In more theoretical terms we may rephrase a situation like this in terms of an interaction between regular perceptual mechanisms (that take care of representing the event segments before and after occlusion) and substitute mechanisms (that take care of representing the event during occlusion). These substitute mechanisms simulate what is happening during occlusion.
In the following we use the term simulation to refer to the operation of those substitute mechanisms. Obviously, unlike action perception, which draws on external resources (derived from actual stimulation), action simulation draws on internal resources (derived from stored knowledge).

The concept of simulation has in recent years become one of the key notions in research on human intersubjectivity. It plays a major role in research on Theory-of-Mind and on Action Perception. In both domains the concept of simulation expresses the notion that individuals have non-conceptual and non-inferential ways of understanding what other individuals are thinking, feeling, intending or doing. The claim entailed in the notion of simulation is that they do it by putting themselves into the others' shoes, thereby re-enacting their mental states and physical actions [3].

In this chapter we take a closer look at the functional underpinnings of action simulation. Here we use this term in a way that is neutral with respect to any further theoretical claim. For instance, we do not want to imply any claim concerning the kinds of representational modalities that may be involved in simulation, that is, whether we should think of these mechanisms in terms of visual, kinaesthetic, motor and/or semantic representations [4-9]. Instead, the questions on which we focus in this chapter address transitions and functional relationships between perception and simulation. To what extent does simulation carry on old processes or start new processes? To what extent does it rely on old representations versus creating novel ones?

The default answer to these questions is already contained in Magritte's paintings: it is all the same. Perceptual substitution, or simulation, is like perception proper. There is no way to distinguish between the painted landscape on the occluder and the real landscape behind it. According to this view simulation has precisely the same effect as regular perception. A conservative view like this gets support from numerous studies on amodal perception, configurational completion, and virtual contours. These studies address the issue of how observers perceive scenes and configurations that are partially occluded. How do they know what the scenery behind their backs looks like (amodal perception)? How do they know what the woods and meadows in the background that are occluded by houses and trees in the foreground look like (configurational completion)? And how and in which sense do they 'see' invisible contours in a Kanizsa triangle (virtual contours)? At least for the last two cases theorists have insisted that representations of occluded parts of stimulus displays are no less accurate and no less real than representations of unoccluded parts and segments, suggesting the conclusion that perceptual substitution is subserved by the same functional machinery as perception proper [10-12].

Here we raise the issue of to what extent a view like this also applies to the representation of dynamic events that are occluded for some time, as often occurs in natural settings. With dynamic occlusion there are always two transitions: one from perception to simulation, and another back from simulation to perception. A setting like this is different from stationary occlusion, where perception and simulation coexist in time. Since dynamic occlusion requires switching back and forth between perception and simulation, it offers an opportunity to separate them in time and study how they are related to each other.
Evidence from various studies supports the notion that action simulation may be based on representational resources that are dynamic in the sense of representing the ongoing action as it unfolds in time [5, 13, 14], and some have claimed that the dynamic features of those representations can be traced back to contributions from the motor system [7, 9, 15-17]. For instance, real-time properties of dynamic representations of occluded action have recently been demonstrated in a study by Graf & Prinz [14]. In their study observers perceived brief videos of point-light actions, followed by an occluder and a static posture, and observers were required to judge whether the test stimulus depicted an appropriate continuation of the action. Prediction performance was best when occluder time and movement gap corresponded, i.e. when the test posture was a continuation of the segment that matched the occluder duration in real-time. From these findings we may conclude that action simulation relies on dynamic representations that unfold in real-time. How, then, is real-time simulation related to real-time perception? Can we stick with the default view that simulation is like perception or do we have reasons to challenge it?



Figure 1. Experimental setting as seen from the observer's perspective. On each trial the person behind the occluder transported the teapot from home to target position. To prepare stimulus movies for the experiments, we took video recordings from this setup and modified them accordingly (see below).

15.2 Paradigm and basic observations

To tackle these questions we developed a paradigm that allows us to study the impact that features of unoccluded action segments have on the representation of occluded segments. In our paradigm observers watched an individual sitting behind a table and facing them. On each trial the individual performed a transport action that started on the left-hand side of the table and ended up on the right-hand side (as seen from the observer's perspective). For instance, the individual picked up a teapot on the left and transported it to the right in order to pour tea into a cup. From the observer's perspective the central segment of the transport was always occluded: at some point the transporting hand, which was initially visible on the left-hand side, disappeared behind a piece of cardboard mounted on the table and then, after some time delay, reappeared on the right-hand side. As Figure 1 illustrates, the occluder also occluded the acting person him/herself.

Though the occluder is itself a spatially extended object (i.e. the cardboard), the occlusion of the transport action can be specified in terms of both spatial and temporal characteristics. From the observer's view the transporting hand disappears and reappears at certain locations, and simultaneously it disappears and reappears at certain points in time. The spatio-temporal coordinates of the point of disappearance are highly predictable, whereas the coordinates of the point of reappearance are less so. Our task was designed to capitalize on observers' temporal uncertainty with respect to the point of reappearance. On each trial they watched an instance of a full action (i.e. unoccluded initial segment/occluded middle segment/unoccluded final segment), and their task was to judge whether the transporting hand reappeared (i) too early, (ii) just-in-time, or (iii) too late from behind the right-hand edge of the occluder.
The design of experimental blocks and sessions followed the Method of Constant Stimuli (cf. Figure 2).



Figure 2. Sketch of the paradigm. a: Schematic trajectory reflecting the transport movement from left to right; b: Part of the trajectory is hidden behind the occluder; c: Method of Constant Stimuli: on each trial we offered observers one out of 17 possible continuations of the trajectory on the right-hand side. The task was to judge whether the reappearance from behind the occluder occurred just in time, too early or too late; d: Typical finding: positive time error, based on just-in-time judgements: in order to be perceived as being just-in-time, the time of reappearance had to be shifted by a positive time error Δt.

For any transport action that we recorded, we prepared a set of 17 stimulus movies. One of those movies showed the 'true' continuation of the transport as provided by the acting person in the original recording, whereas the other 16 showed points of reappearance that were later (ten movies) or earlier (six movies) than the 'true' point, in steps of 40 ms. (The asymmetry between late and early points of reappearance reflected our basic findings, as will become apparent below.) This procedure yields three frequency distributions over stimulus values, one for each of the three judgments. For the studies to be reported here we concentrate on the means of the distributions of the just-in-time judgments, without taking the other two distributions into account.

In the exploratory studies on which we focus here we relied on natural variation of unconstrained action. Stimulus movies for a given condition were always prepared from six original recordings. These original recordings were taken from two acting individuals (one male, one female) who each performed three replications of the same transport action. As a result, a total of 2 × 3 × 17 = 102 stimulus movies was prepared for each condition (six original recordings × 17 stimulus versions, as required by the Method of Constant Stimuli).
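The stimulus set and the summary measure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' analysis code: the function names are ours, and the just-in-time counts are hypothetical placeholders rather than data from the experiments.

```python
# Sketch of the Method of Constant Stimuli summary described in the text.
# The counts below are hypothetical placeholders, NOT data from the chapter.

STEP_MS = 40   # step between stimulus versions (from the text)
N_EARLY = 6    # versions reappearing earlier than the 'true' point
N_LATE = 10    # versions reappearing later than the 'true' point


def stimulus_offsets(step_ms=STEP_MS, n_early=N_EARLY, n_late=N_LATE):
    """Return reappearance offsets (ms) relative to the 'true' point."""
    return [k * step_ms for k in range(-n_early, n_late + 1)]


def mean_just_in_time(offsets, counts):
    """Mean of the just-in-time frequency distribution over offsets (ms)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no just-in-time judgments recorded")
    return sum(o * c for o, c in zip(offsets, counts)) / total


offsets = stimulus_offsets()
# Hypothetical just-in-time counts, one per stimulus version.
counts = [0, 0, 0, 1, 2, 4, 6, 8, 9, 8, 6, 4, 2, 1, 0, 0, 0]
time_error_ms = mean_just_in_time(offsets, counts)

# Movies per condition: 2 actors x 3 replications x 17 versions.
movies_per_condition = 2 * 3 * len(offsets)
```

With these made-up counts the distribution is centred on the version delayed by two steps, so the sketch yields a positive time error; the real value of course depends on the observers' judgments.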



Obviously, since this paradigm draws on natural variation of unconstrained action, it leaves a number of potentially important parameters uncontrolled. Still, there was one observation that we replicated over and over again in an extended series of experiments: a constant positive time error in the judgments of the times of reappearance.

We piloted various versions of the task. For instance, in one version, the acting person grasped a mug with a power grip, transported it from left to right and finally put it down at the target position. In another version the acting person grasped a spoon with a precision grip and then transported it from left to right, too. In the pilot experiments we ran both versions randomly intermixed. The basic observation that we made for both versions was a marked positive time error: the mean of just-in-time judgements was regularly obtained for stimulus movies in which the time of reappearance was postponed by about 40-120 ms (relative to the "true" point of reappearance in the original recording). In other words, in order to be perceived as just-in-time, the final unoccluded segment of the action had to start 20-120 ms later than in the original recording. Conversely, the original recording was regularly judged to be "too early".

A constant time error of this magnitude is perhaps not a surprising finding in itself. Further inquiry can approach it in two ways. One is to regard it as a phenomenon that needs to be explained in itself: How does the error arise and why is it positive and not negative? The other approach is to regard it as a tool for exploring more general functional questions. For instance, if one considers the time error a signature of simulation, one may use it as a means for studying how simulation is related to perception proper.

15.3 Linear extrapolation

How does the time error arise and what does it tell us about the representation of occluded action? An obvious account that we considered first is based on the notion of linear extrapolation. If we assume that the occluded part of the actions shown in our movies can be roughly approximated by a linear function (as Figure 2 implies), we have two obvious candidates for the source of the time error: slope and intercept (see Figure 3). One possibility is that it arises from inappropriate speed of extrapolation. According to this view the simulated movement is (for some unknown reason) slower than the perceived movement would be. The other possibility is that the error arises from some constant operation for switching between perception and simulation. For instance, as illustrated in Figure 3, it may reflect initial switching costs for getting the extrapolation started. Note that these two explanations are not mutually exclusive: the error could also reflect a mixture of both a slope and an intercept effect.
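To make the two accounts concrete, they can be written as two small functions of the occluded duration. This is a minimal sketch under our own assumptions: the slowdown factor, the switching cost and the occluded durations are hypothetical values chosen only for illustration, and the function names are not taken from the chapter.

```python
# Sketch of the two linear accounts of the positive time error.
# All parameter values are hypothetical and chosen only for illustration.


def slope_error(occluder_time_ms, slowdown=1.2):
    """Slope account: the simulation runs 'slowdown' times slower than the
    perceived movement, so the error grows with the occluded duration."""
    return occluder_time_ms * (slowdown - 1.0)


def intercept_error(occluder_time_ms, switch_cost_ms=80.0):
    """Intercept account: a fixed switching cost that does not depend on
    the occluded duration."""
    return switch_cost_ms


short_ms, long_ms = 200.0, 400.0  # hypothetical occluded durations
slope_pred = [slope_error(t) for t in (short_ms, long_ms)]
intercept_pred = [intercept_error(t) for t in (short_ms, long_ms)]
```

Under the slope account the predicted error scales with the time spent behind the occluder, whereas under the intercept account it stays flat; a mixture of both would simply add the two terms.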



Figure 3. Two possible accounts for the positive time error. a: Error arises from slope; b: Error arises from intercept.

Figure 4. Testing the slope vs. the intercept account. a: Occluder width: According to the slope account, Δt should increase with occluder width, whereas occluder width should not matter for the intercept account; b: Movement speed: According to the slope account, Δt should increase as movement speed decreases, whereas movement speed should not matter for the intercept account.

Slope vs. intercept: which one is true, or how much do these two explanations contribute to the resulting time error? To address this question we ran an experiment in which we manipulated two independent factors and studied their impact on the time error (cf. Figure 4). One factor was occluder width. We reasoned that, if the error arises from inappropriate slope, it should monotonically increase with occluder width. Conversely, if it arises from a fixed intercept, it should be independent of occluder width, i.e. the time it needs to travel behind the occluder. The other factor was movement speed. Here we reasoned that, if the error arises from inappropriate slope, the resulting error should be larger the slower the movement is, i.e. the more time the travel behind the occluder requires. Conversely, if it arises from a constant intercept it should be independent of movement speed.

We combined these two manipulations in a factorial design. There were two occluder widths and two movement speeds. Occluder widths were blocked, whereas movement speeds were randomized within blocks. Occluder width was manipulated through the use of two different pieces of cardboard (large vs. small). Movement speed was manipulated indirectly by using two different kinds of actions that went along with different (mean) transport speeds: placing vs. pouring. For both of these actions the initial scene consisted of a teapot on the left-hand side (home position), a mug on the right-hand side (target position), and the occluder in-between. In one condition the acting person grasped the teapot on the left, transported it to the right and placed it in front of the mug. In the other condition s/he grasped it on the left, transported it to the right and then started pouring tea into the mug. In both conditions the movie ended when the teapot reached the target position, i.e. when it was either placed in front of the mug, or when the pouring started (i.e. the pouring itself was not shown).

Pilot work had shown that these two actions differed not only in terms of what happens after the teapot reappears from behind the occluder (i.e. placing vs. pouring) but also in terms of what happened before. When the teapot was transported for the sake of pouring, the transport was much slower than when it was transported for the sake of just placing it. This relationship seems to reflect an impact of the ultimate goal on the kinematics of the preceding transport. It may be interesting in itself, but here we used it as a means for manipulating transport speed. As a manipulation check we recorded mean occluder times, i.e. the mean true times required for the transport between the two edges of the occluder in the original recordings. As can be seen from Table 1, there was in fact a substantial difference between occluder times for placing and pouring, no less substantial than the difference between large and small occluders.

How is the constant time error affected by these manipulations? The results are shown in Table 2. As can be seen, the magnitude of the time error is strongly affected by both occluder width and transport speed. Therefore the first conclusion that comes to mind is that the intercept account must be refuted.
As discussed above it would have predicted identical time errors for all four conditions. However the slope account must be refuted as well, since results are for both factors opposite to predictions. As discussed above the slope account predicts small errors for small occluders and large errors for large occluders; the results show the opposite. In the same vein, the slope account predicts small errors for fast movements and large errors for slow movements; results show the opposite again.
                          Occluder Size
Speed of Transport        small     large
fast (placing)            186       386
slow (pouring)            247       733

Table 1. Transport task: Mean occluder times (in ms) as a function of occluder size (small/large) and speed of transport (fast/placing vs. slow/pouring).



                          Occluder Size
Speed of Transport        small     large
fast (placing)            141       115
slow (pouring)            69        18

Table 2. Transport task: Mean constant time errors (in ms) as a function of occluder size and speed of transport.

It therefore seems that we need to forget about both the intercept and the slope account and, perhaps, abandon the notion of continuous extrapolation altogether. Our findings suggest that something else may be going on.

15.4 Starting from scratch

The notion of continuous extrapolation implies strong links between perception and its substitute, simulation. The basic idea is that perceptual mechanisms extract parameters from the initial movement segment that are then used to parameterize the substitute mechanism accordingly, to the effect that simulation takes over and carries on what perception has begun. Is there any alternative to this view? As indicated above, one could perhaps think of a less conservative picture, claiming that when the action disappears behind the occluder, simulation starts something new rather than carrying on something old. For instance, one could think of the simulation system initiating a novel goal-directed action. That action would be goal-directed in the sense that its parameters are derived from both the initial segment and the final segment in which the action's goal is eventually attained. The idea here is that the substitute mechanism, rather than carrying on the action seen before, starts a new action toward the same goal.

In order to understand what the possible implications of an approach along these lines may be, we need to start from a slightly more realistic picture of the actions and their kinematics (Figure 5). The modified picture takes into consideration that the transport relies on a biological movement that follows a simple law: it accelerates at the beginning and decelerates in the end. (The picture is of course still highly schematic in that it maintains the idea that the intermediate segment of the movement is approximately linear.)



Figure 5. An alternative view. a: A more realistic picture of the trajectory of the transport movement: as every biological movement, it accelerates when leaving the home position, and decelerates when arriving at the target position; b: Starting from scratch: the simulation trajectory reflects a goal-directed action of its own; c: With this scheme time errors may be larger for small than for large occluders; d: Likewise, time errors may be larger for fast than for slow movements (for details see text).

Figure 5b illustrates what starting a novel goal-directed action could mean under these conditions. The course of the simulation trajectory captures the idea that, when the stimulus disappears behind the occluder, the simulation mechanism starts from scratch, initiating a novel goal-directed action. This has two implications. One is that the trajectory of the simulated internal movement follows the same basic law for biological movements that the perceived external movement obeys, i.e. it accelerates in the beginning and decelerates in the end. The other is that the internal movement is programmed to meet the external movement at target. Combining these two assumptions yields a simulation trajectory that, relative to the 'true' trajectory, is initially much slower, but then becomes gradually faster and catches up.

With this scheme in mind we can readily explain why small occluders go along with larger time errors than large occluders do and perhaps also why fast movements go along with larger errors than slow movements. As concerns occluder width, it is obvious that for small occluders the point of reappearance will fall in the slow initial period of the simulation trajectory. Conversely, when the occluder is large enough, the point of reappearance will already be close to the point of convergence between the true and the simulated trajectory (Figure 5c).

As concerns movement speed, predictions are less clear since we do not know whether the parameters for the simulation trajectory are estimated from local or from global information. Local information is provided by the trajectory of the initial unoccluded action segment on each given trial. Global information is available from past information, i.e. from integrating local information over a number of previous trials. When fast and slow movements are randomly intermixed, as was the case in the present experiment, it may not be possible to derive reliable estimates from local information, and simulation may rely to a greater extent on global information, i.e. on averages derived over a number of trials. As Figure 5d shows, this would predict a pattern of time errors that is in line with our findings, i.e. with larger time errors for fast than for slow movements. Of course, we need further experiments to test this post-hoc explanation: experiments in which movement speed is parametrically controlled and kept constant within blocks of trials.
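The occluder-width argument can be illustrated with a small numerical sketch. The chapter only states that biological movements accelerate at the start and decelerate at the end; the minimum-jerk polynomial used here, the normalized positions of the occluder edges, and all function names are our own assumptions for illustration, not the authors' model.

```python
# Sketch of the 'starting from scratch' idea with a minimum-jerk profile.
# ASSUMPTION: the minimum-jerk polynomial and all numeric values below are
# illustrative choices; the chapter does not specify a movement law.


def min_jerk(tau):
    """Normalized minimum-jerk position for normalized time tau in [0, 1]."""
    return 10 * tau**3 - 15 * tau**4 + 6 * tau**5


def inverse_min_jerk(s, tol=1e-9):
    """Normalized time at which min_jerk reaches position s (bisection)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if min_jerk(mid) < s:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def time_error(x_disappear, x_reappear, duration=1.0):
    """Simulated minus true arrival time at the reappearance position.

    The true movement runs from position 0 to 1 in 'duration'. The simulated
    movement starts at rest at the point of disappearance and is timed to
    meet the true movement at the target (position 1, time 'duration').
    """
    t_dis = duration * inverse_min_jerk(x_disappear)
    t_true = duration * inverse_min_jerk(x_reappear)
    frac = (x_reappear - x_disappear) / (1.0 - x_disappear)
    t_sim = t_dis + (duration - t_dis) * inverse_min_jerk(frac)
    return t_sim - t_true


err_small = time_error(0.3, 0.5)  # narrow occluder (hypothetical edges)
err_large = time_error(0.3, 0.8)  # wide occluder (hypothetical edges)
```

Under these assumptions both errors come out positive and the error for the narrow occluder exceeds that for the wide one, which is the qualitative pattern sketched in Figure 5c. The magnitudes depend entirely on the assumed profile and edge positions.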

15.5 Taking goals into account

Further support for the idea that simulation trajectories are affected by action goals comes from an experiment in which we manipulated the implied duration of the act performed at target. In this experiment the acting person transported a teapot from left to right and, after reaching the target location on the right-hand side, started pouring tea either into a cup or into a mug placed at that location. As before, the movie stopped when the pouring started, i.e. the pouring itself was never shown. We ran this experiment in order to study whether the (implied) duration of the act of pouring has any impact on the course of the simulation trajectory. Would observers' knowledge of the fact that, on average, it takes longer to fill a mug than a cup have any impact on the time error at the edge of the occluder?

Figure 6. Scheme illustrating how the time course of goal attainment may affect the kinematics of the simulation trajectory (see text for explanation).



The logic of the underlying reasoning is illustrated in Figure 6. If it takes longer to fill a mug than a cup (and if observers know this), then the duration of the (implied) act of pouring could have an impact on the estimated time of attaining the target. For instance, if one assumes that this estimate is equally affected by both the visible onset and the implied offset of that act, one would have to expect that the estimated time at target is earlier for the cup than for the mug condition. If so, this difference should be reflected in a corresponding difference of the time errors recorded at the edge of the occluder.

Again, we ran the experiment with two occluder widths, small and large. As shown in Table 3, results were in line with our expectations. For both occluder sizes the time error was clearly larger for mugs than for cups. (We also replicated our previous finding that small occluders yield larger errors than large occluders do.) We take this finding as preliminary support for the notion that action simulation relies on representational mechanisms that take action goals into account. At this point this conclusion is limited to the time domain. With respect to that domain we may conclude that simulated trajectories are modulated by the expected time course of goal attainment.

Since mugs and cups were randomly presented, the difference in time errors must this time rely on local information derived from each given trial. This is different from the previous experiment, for which we argued that local information may be too unreliable when the task is randomized. However, in that experiment pertinent local information could only be derived from the initial unoccluded segment of the transport, whereas in the present experiment pertinent local information can be accessed throughout the trial (i.e. the mug or the cup that is visible from the beginning through the end of the trial).
This may explain why we see an impact of local information in the present study, but not in the previous one.
Target Object    Occluder Width
                 small    large
Mug              113      91
Cup              73       41

Table 3. Pouring task: Mean constant time errors (in ms) for two target objects (mug/cup) and two occluder widths (small/large).
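
As a quick arithmetic check on Table 3, the following sketch recomputes the two contrasts described in the text: the mug-minus-cup difference within each occluder width, and the small-minus-large difference within each target object. The values are the table's means in ms; the dictionary layout and the function names are illustrative assumptions and are not part of the original study.

```python
# Mean constant time errors (ms) from Table 3, keyed by (target, occluder width).
# The data layout and helper names are hypothetical, chosen for illustration only.
TIME_ERRORS_MS = {
    ("mug", "small"): 113,
    ("cup", "small"): 73,
    ("mug", "large"): 91,
    ("cup", "large"): 41,
}


def target_effect(width):
    """Mug-minus-cup difference in mean time error for one occluder width."""
    return TIME_ERRORS_MS[("mug", width)] - TIME_ERRORS_MS[("cup", width)]


def width_effect(target):
    """Small-minus-large difference in mean time error for one target object."""
    return TIME_ERRORS_MS[(target, "small")] - TIME_ERRORS_MS[(target, "large")]


if __name__ == "__main__":
    for width in ("small", "large"):
        print(f"mug - cup ({width} occluder): {target_effect(width)} ms")
    for target in ("mug", "cup"):
        print(f"small - large ({target}): {width_effect(target)} ms")
```

Both mug-minus-cup differences come out positive (40 ms and 50 ms), as do both small-minus-large differences (22 ms and 32 ms), which matches the pattern described in the text. These are differences of means only and say nothing about statistical reliability.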

15.6 Conclusions

What, then, does the positive time error at the edge of the occluder tell us about relationships and transitions between perception and simulation? Although our exploratory studies still need further confirmation, we may draw some conclusions to guide further studies.

First, the notion of linear extrapolation works neither in terms of slope nor in terms of intercept. Clearly, this questions the underlying intuition that internal substitute mechanisms take over and carry on what external perceptual mechanisms have begun. Second, what our findings suggest instead is that internal substitute mechanisms initiate novel goal-directed actions that start from scratch. Their trajectories are intrinsically non-linear, obeying the laws of biological motion. Third, the computation of these substitute trajectories seems to take action goals and the time of their attainment into account. Goal-related information may either be locally derived from the ongoing trial, or globally derived from integration over a number of trials.

These conclusions are, at this point, based on the specific task we have used. An important issue that needs to be addressed in future research is how task-specific our findings are. Since our task requires observers to deliver explicit judgments concerning the timing of the action, it may perhaps provoke them to initiate novel actions rather than rely on linear extrapolations. Will the same mechanism apply when judgments refer to spatial rather than temporal aspects of the action, or even when no explicit judgments are required at all? At this point we cannot rule out the possibility that simulation may rely on two independent mechanisms (an automatic 'conservative' routine that relies on extrapolation and a controlled 'creative' routine that relies on action generation) and that the relative contributions of the two routines depend on task demands. For instance, one could think of the extrapolation routine as a default mechanism that gets automatically triggered whenever an action gets occluded, whereas the controlled routine takes over if and when explicit judgments are required. Therefore, the evidence from our paradigm, though it speaks in favour of controlled creation of novel actions, certainly does not rule out the existence of an automatic routine for extrapolation of old action.

As a final remark we should keep in mind that the time error cannot reveal anything about the issue of representational modalities involved in action perception and simulation. The notion that novel actions start from scratch is entirely neutral with respect to the issue of representational modalities. That novel action could be represented in the very same modalities that are already involved in the foregoing perceptual representation (say, visual, kinaesthetic, motor, etc.), or it could go along with switching from one to the other or even replacing one by the other. The mere fact that perception gets replaced by simulation does not in itself imply that new representational modalities come into play. This issue also needs addressing by further research.

15.7 References

[1] M. A. Umiltà, E. Kohler, V. Gallese, L. Fogassi, L. Fadiga, C. Keysers & G. Rizzolatti, I know what you are doing. A neurophysiological study. Neuron, 31, 155-165, 2001.
[2] T. Jellema & D. I. Perrett, Perceptual history influences neural responses to face and body postures. The Journal of Cognitive Neuroscience, 15, 961-971, 2003.
[3] J. Dokic & J. Proust, Simulation and knowledge of action. Amsterdam: John Benjamins, 2002.
[4] S.-J. Blakemore & J. Decety, From the perception of action to the understanding of intention. Nature Reviews Neuroscience, 2, 561-567, 2001.
[5] V. Gallese, Embodied simulation: From neurons to phenomenal experience. Phenomenology and the Cognitive Sciences, 4, 23-48, 2005.
[6] R. Grush, The emulation theory of representation: Motor control, imagery, and perception. Behavioral and Brain Sciences, 27, 377-442, 2004.
[7] W. Prinz, What re-enactment earns us. Cortex, 42, 515-517, 2006.
[8] G. Rizzolatti, L. Fogassi & V. Gallese, Neurophysiological mechanisms underlying the understanding and imitation of action. Nature Reviews Neuroscience, 2, 661-670, 2001.
[9] M. Wilson & G. Knoblich, The case for motor involvement in perceiving conspecifics. Psychological Bulletin, 131, 460-473.
[10] G. Kanizsa, Margini quasi-percettivi in campi con stimolazione omogenea. Rivista di Psicologia, 49, 1955.
[11] W. Metzger, Laws of seeing (L. Spillmann, M. Wertheimer & S. Lehar, Trans.). Cambridge: MIT Press, 2006.
[12] A. Michotte, G. Thinès & G. Crabbé, Les compléments amodaux des structures perceptives. In Studia Psychologica. Louvain: Institut de Psychologie de l'Université de Louvain, 1964.
[13] M. Jeannerod, Neural simulation of action: A unifying mechanism for motor cognition. NeuroImage, 14, S103-S109, 2001.
[14] M. Graf & W. Prinz, Predicting point-light actions in real-time. NeuroImage, 36, T22-T32, 2007.
[15] S.-J. Blakemore & C. Frith, The role of motor contagion in the prediction of action. Neuropsychologia, 43, 260-267, 2005.
[16] G. Csibra, Action mirroring and action understanding: An alternative account. In P. Haggard, Y. Rossetti & M. Kawato (Eds.), Sensorimotor foundations of higher cognition. Oxford, UK: Oxford University Press, 2007.
[17] D. M. Wolpert & J. R. Flanagan, Motor prediction. Current Biology, 11, R729-R732, 2001.

Enacting Intersubjectivity, F. Morganti et al. (Eds.), IOS Press, 2008. © 2008 The authors. All rights reserved.



The Role of the Face in Intersubjectivity, Emotional Communication and Emotional Experience; Lessons from Moebius Syndrome
Jonathan COLE
Abstract. The importance of the face and facial expression in enactive intersubjectivity is explored by reference to the experience of those with Moebius Syndrome. This rare, congenital condition affects the brain stem, leading to a variety of impairments, of which the cardinal ones are an absence of movement of the muscles of facial expression and an absence of lateral movement of the eyes. Those with Moebius have no facial expression and have difficulty changing the direction of eye gaze. Narratives from several people with Moebius are given. For some, their impairments in facial expression lead to problems in interpersonal relatedness and in both emotional communication and emotional experience itself. Embodied facial expressions seem to have a large role in interpersonal communication of emotion; without such exchanges the development of emotional experience itself may be difficult.

Contents
16.1 Introduction
16.2 Thinking happy
16.3 Losing control
16.4 Collection of bits
16.5 Emotion and music
16.6 Learning to feel
16.7 Conclusions
16.8 References


J. Cole / Lessons from Moebius Syndrome

16.1 Introduction

Two necessary conditions for the development of intersubjectivity seem to be individual, unique embodied identifiers and interaction between individuals, that is, movement. In humans the most obvious unique identifier is the face. Movement, in turn, is often described in terms of instrumental or locomotor action which may, or may not, be designed to be intersubjective. However, there are other embodied actions, many of which are often not attended to, which nevertheless are designed primarily for, and have profound effects on, intersubjectivity and the perception of self: gesture and facial expression.

The face and facial expression are so central to intersubjectivity, embodiment and emotional communication that their roles are difficult to understand unless one considers those who live without them. In this review, some of the experiences of those with a specific congenital impairment, Moebius Syndrome, which leaves subjects without the ability to move the muscles of facial expression from birth, are given. The narratives reveal something of the link between embodied enactive facial expression and intersubjectivity, and between that expression and emotional experience itself.

In a recent book I explored the relationship between the face and the self through the extended narratives of those who have a fractured relation between these two; those whose faces have been called into question [1]. One man who went blind progressively as an adult tried to remember his loved ones by remembering their faces, as though photographs on a wall. He became most depressed not when he was finally blind but when these visual memories faded and he had no way of representing others. He was saved by a progressive shift from the visual to the auditory sense. The painfulness of this loss, however, was emphasised by another who had gone blind in his 20s; he thought that the lucky ones were blind from birth. Another man, blind from birth and who had never seen faces, said that for him others always were their voices. His selfhood, he said, resided in voice.

The importance of the social dimension of faces was shown again and again in those with facial disfigurement¹. Often people with a facial disfigurement would have no problem in their use of the face, in their eyesight, in facial movement and expression, or in eating or speech, but their lives were made miserable and incomplete by the responses of some people they met. Some felt stigmatised to such an extent that they lost confidence, so that buying food, or a bus ticket, or going to the cinema, was beyond them and they retreated to an almost asocial existence. They showed the fundamental nature of the face for the self-other relationship. As Merleau-Ponty wrote, 'I exist in the facial expression of the other' [2]. I gauge my success by the feedback of those around me, and if those people, particularly new people, look away from me, and my appearance, then my self-perception and selfhood suffer.

¹ Here the use of the term disfigurement for a visible difference follows the use adopted by the UK organization which supports those with facial problems, Changing Faces. It is a compromise between current usage and the need to de-stigmatise, and remove the dysfunction, abnormal and pathological attitude to the way others live.

Within these groups of people there was one main division: between those born with the condition, like some with blindness, and those who have a change in their embodiment, say a disfigurement. The latter seemed more able to describe their
journeys compensating for, and coming to terms with, their altered selves. Those with congenital conditions, and who had known no other existence, seemed less able to unpick the seamless relationships between self and face, and between self and others.

Two chapters within the book focussed on a rare and fascinating condition, Moebius Syndrome. This is usually a developmental, sporadic genetic condition of a part of the brain, lying between the spinal cord and the rest of the brain, called the rhombencephalon. Though sometimes, more correctly, called a sequence rather than a syndrome, because not all with it have exactly the same problems, the cardinal features are lack of functioning of cranial nerves VI and VII, which control abduction or outward movement of the eyes and movement of the muscles of facial expression. Other variable features include conjugate gaze disturbances, tongue malformation, hand and feet disorders, difficulties in swallowing and speech, dental problems and a small lower jaw, as well as clumsiness and poor coordination [3]. Some suggest an excess incidence of learning difficulties and/or autism too, though this remains uncertain. The most obvious aspect of the condition, however, is that people with Moebius have no movement, or very little movement, in their faces, meaning that facial expression is absent.

In About Face I spoke with several adults with Moebius who described their lives and their problems in some detail. They seemed to suggest that one large deficit was in emotional expression and experience. This seemed to take two, polar opposite, forms: absent or reduced emotion and, when emotion broke through, an inability to control it or express it appropriately. The people I spoke with are called James, Clare and Duncan in the book.

16.2 Thinking happy

James, in his 50s, does not remember any awareness of being different until he moved out of the family circle aged eight to go to the village school. He was very late going because of Moebius.

I don't think it had occurred to me inside the family environment that there was anything particularly unusual about me. What made me realise I was different was the questioning about my funny face. At the age of eleven, when I went to the grammar school, I used to be asked why do I cry when I eat? I still do and have to wipe my face. I did, however, begin to become aware of difficulties in communicating with people. For instance in those early years at grammar school some people didn't understand me. If I put my hand up in class because I knew the answer the teacher wouldn't ask me. I felt neglected.

He had speech therapy and was bright enough to go to Cambridge, where he read religion, having wanted to become an ordained vicar for some time. I asked him how he viewed his face and his self in those days. He answered tangentially.

I have a notion which has stayed with me over much of my life, that it is possible to live in your head, entirely in my head. Whether that came out of my facial problem I don't know. I was very introspective. I divided people into two
categories: those who didn't want to have anything to do with me for various reasons and those who did. I think I had a low idea of self worth. I haven't related these things to my face particularly and that's why I haven't been speaking about it. I just haven't focused anything on the face. I had feelings of low self-esteem and loneliness and isolation in company, where I wasn't with anyone particularly, and I had the feeling, say, at a long table during a meal, say in my Cambridge College, that the conversation divided around me and I was left on my own to eat my food and I was happy to do that, but not really happy. These feelings I have lived with, in my head. I always found it difficult to break in. It is only very recently that the whole area of non-verbal communication has even come to my attention. I know now that since I put out a reduced range of signals I receive back a similarly reduced range. Is he going to know me today, is he going to speak to me today, as I approach someone? As I go about in the street I see people coming towards me and I can tell if they're going to get ready to speak to me if I speak to them, but it's taken me a long time to latch onto that fact.

He was a very good and placid child, which reflected his reduced emotional range. When he met his wife, he told me: 'I think initially I was thinking I was in love with her. It was some time later when I realised that I really felt in love.' He became a parish priest and for many years hid behind his vestments, which gave him a social role. In his 40s, however, his mother died and this triggered a period of depression and self-doubt. He approached people in the Church and tried to make them understand but it was difficult.

I had never previously talked to anyone very much. I love the Church of England and there are fine men amongst its leaders but they don't really want to know. They may ask you how you are and it's OK if you say, "Very well," but if you say, "I'm about to drop this tea tray," they don't know what to do. Or they give you the impression that they'd say, "Well drop it and then we can do something. But before then we're powerless." I did go through a period in the late '80s when I was quite desperate and I would describe it in terms of "I am going to drop this tea tray". Perhaps to get attention, but I didn't really want to drop it; I just wanted someone to listen and do something. Those problems I didn't explicitly relate to the problems of the Moebius Syndrome. But they do relate to the more general need to find myself, which in turn relates to the Moebius. Perhaps I did turn my back on self expression. Perhaps self expression was beyond me. It was just quite enough to get by day to day. I think there's a lot of dissociation. But I think I get trapped in my mind or my head. I sort of think happy or I think sad, not really saying or recognising actually feeling happy or feeling sad. Perhaps I have had a difficulty in recognising that that which I'm putting a name to is not a thought at all but a feeling; maybe I have to intellectualise mood. I have to say this thought is a happy thought and therefore I am happy. I think also that I have a fear of being out of control with emotions, feeling
something that I can't manage. I have also found it very difficult to communicate feelings throughout my life, whether as a child or with my wife, though I think I am getting better at it now. I don't really know how I communicate happiness or sadness. That's a very hard question. Some people cry when they're sad. I don't. I sometimes felt that I would like to be able to cry but you see I am not really able to cry, my tears can come but there's nothing else. My tears only flow when I eat. I am afraid of such feelings. I try and shut them off.

Recently, in his 50s, he began to explore his feelings more. Whereas before he had almost repressed the idea that his problems reflected the Moebius, now he is coming to terms with it.

I now realize that some things which may have been due to the condition I felt were just down to me. Rather than saying that the condition has made life difficult I have been saying I have made life difficult. It was my fault. I have failed. One of the things I think that's happening now is that I have a sense of becoming freer; freer in the sense of becoming more myself, not playing a role. I certainly wanted to try and explore me behind the mask of the priesthood. If you say where does 'me' now reside, I think I am slowly coming out of my head. I am not sure I can locate where I am but I don't think I am entirely in my head or even my mind. I have an expression of living 'a life of the mind,' but I do accept that the mind is not easily able to communicate its thoughts or even its feelings. I think I was out of touch with my feelings, or I suppressed a lot of them.

16.3 Losing control

Clare was in her 30s² when we met. Though I had arranged to talk with her, her parents were there and it was her mother who answered most.

² People with Moebius tend not to have many wrinkles and so their ages are difficult to judge.

She's always been very highly strung. If she was sitting down or in a pram and I'd walk out of the room she'd scream. She screamed a lot. If I didn't take her with me everywhere she'd scream; if I went upstairs she had to come with me; she went everywhere with me. We didn't have a night's sleep for four years; she used to scream and wake us up. Thinking back, I realise it was because she couldn't shut her eyes [people with Moebius cannot move their eyelids much and so cannot shut their eyes]³. She may have woken up and been frightened of the dark. Maybe if we'd had a small light in her room it may have been better. To wake up in the pitch black and be unable to see anything must have been terrifying.

³ Clare told me that, 'My first childhood memory is of having my teeth out at 3, a big black mask came over my face. Even now if somebody goes towards me I can't bear it.' Unable to close her eyes she must have just lain there till she passed out, seeing everything.

If she did not talk until she was five or so, I wondered what was going on in her mind.

I imagine she thought a lot, I don't know. When I was in the room she
followed me with her eyes, and her head of course⁴, but as soon as I left the room she'd yell. In really stressed situations she'd lose control. She couldn't cope any more and fall down, kicking out, spluttering and shouting. Once she ended up in Casualty. She's had two EEGs to see whether these episodes are epileptic.

⁴ Those with Moebius have to move their heads to look around since they cannot move their eyes much.

Earlier that year she had a severe episode, whilst her mother was in hospital. Previously she had been seen by a psychiatrist about her emotional outbursts. This time she walked round to the GP, apparently calmly, and asked to go to the local psychiatric hospital. Clare said:

I remember, though I try and block it out of my mind. I know what's going on. I can't cope with assertive situations.

She stopped, so I tried to help her out, suggesting that whereas some people might say something's awful, or even bloody awful, and then get angry, she might find it difficult to express emotions until they boil over, that she might go from nothing to a meltdown. She agreed:

I've always felt it difficult to express how I feel. I know it's only in the last couple of years that, say, at church, when I meet someone, I just say "I can't smile" and then it becomes easier. For 30 years or so I used to put it to the back of my mind. Now I'm beginning to be aware more of it.

I wondered if she felt that the 'up and down' parts of her emotions were different to others. I asked if she remembered getting excited at Christmas or birthdays.

Not really.

It was a similar story from the mother of a small boy with Moebius that I saw, Duncan. The highlights of a normal childhood seemed to pass him by. His mother said:

I remember his fifth birthday party, he was sat in his high chair and went to sleep; it was just like another day for him. He didn't want to know, he didn't want to play. He doesn't really get excited on birthdays, even his own. It is difficult to know when he's having fun. When he comes home from school we don't know how he's feeling, we have to ask him. Everything is questions and answers. He has always been a very placid child. He never really gets angry, never really appears upset. I wish I had taken more photographs. Because he never did anything and you usually take milestones, I never took them. He always sits back and listens and stores things for later, much more reflective. We always cuddle him but it's true that probably because he's so thoughtful and reflective our approach to him is less spontaneous. I used to cuddle him but he never really cuddled back. Now I
still cuddle him because he's my baby but he just sits there saying "I'm too old for this now Mum".

All these experiences I had gathered for About Face. In that I tried to weave a thesis about the face as evolving to express more complex emotional states as we became increasingly social and needed to look at others more closely. The face, I suggested, was crucial in the development of the individual, as a unique identifier, but also in the way in which emotions were communicated and indeed developed. But I was always concerned whether the experiences of those with Moebius that I had seen were typical though, of course, even if not they would still be important. So, with a friend and colleague who actually lives with Moebius, Henrietta Spalding, I have been seeking the experiences of others who live with Moebius [4].

16.4 Collection of bits

The next interviews with people with Moebius are of interest not only for their memories of childhood, but because they can reflect on how they have changed over the years. Though Moebius is congenital and so people know no other condition, these individuals are able to look back with some insight. Celia is a woman who remembers her time as a young girl, both before school and during those first few years of education.

I did not do ballet, horse riding etc., I did hospitals and operations. I had the eye doctor and the foot doctor and a speech therapist, who I don't remember, and a face doctor.

She was never aware of not seeing before these ops but then, as she says, she cannot see well even now.

My limitations were a fact of life. Not being able to see the blackboard, or not being able to see someone over there. I have, or had, a squint and astigmatism. The shape of the Moebius eye is also different and I cannot move the eyes or move the head so easily, my muscle being not so well developed. Crossing a road is still difficult. I cannot judge when a car over there is going to get to me. I cannot measure distance and moving, the velocity or whatever is out. As a child I could never catch a ball.

As well as having the talipes, her feet were also painful; she never told anyone. No one asked.

When I was 7 I stopped walking because the feet were so bad and I had to go to school in a wheelchair. I don't remember learning to walk. After some surgery I could walk but I never told anyone I was in pain. People don't ask little children. I always remember that as an adult I have had pain, but I don't remember pain as a child.



The myriad of conditions (feet, mouth, eyes, skin, and other ailments too) and the countless visits to doctors and therapists had an unusual effect on the way Celia viewed herself.

I never thought I was a person; I used to think I was a collection of bits. I thought I had all these different doctors to look after all the different bits. At half term other children would go off camping or swimming courses, I would see the doctors, this one, then that one. Celia was not there; that was a name people called the collection of bits. I did not like my feet; I liked my spirit because I was strong as a child. I liked my brain; I knew I had a brain. I loved reading and read very early on. I liked that bit. I could think and dream and imagine. I had an IQ test which was very high. I was bright, so I didn't worry about the rest. Even though I was a collection of bits I always knew there was something strong inside that I had a mental dialogue with, but it was not the physical body; it was very separate from the physical.

In his famous Meditations, Descartes, exploring certainty and the nature of identity, doubted his embodiment but could not conceive of existence without mind; this was where his famous dictum, 'I think therefore I am', originated. Celia here seems to be making a similar disjunction between herself as a whole, thinking being and her imperfect body; Celia was a Cartesian child. She would have an internal dialogue with herself in her thoughts and imagination. In contrast her speech with others was about matters of fact.

At 5 the only talking I could do was big, about operations, say to doctors; I could only talk to adults, about my bits, not about me. I could also talk about books. Adults were my friends, not children. I just could not do playing with the other kids.

School wasn't bad in the early years though. She loved learning and liked to lose herself in the routine. Learning fed Celia like nothing else.

I did not express emotion. I am not sure that I felt emotion, as a defined concept. At my birthday parties I did not get excited. There were people around excited, but I followed what they did. I don't think I was happy, or even had the concept of happiness, as a child. I was saddened by being in pain or having horrid things like a blood test. Sometimes I would cry but even that would almost be a delayed reaction. I would have been sad so long the tears would come as I did not know what to do.

Tears would come when she could not deal with a situation. Once, a mum at a birthday party told her to close her mouth when eating. She did not know what to say, so she cried so she could leave the party.

I did not know about an emotional world. I thought it might be related to my legs. I knew that being happy was something I couldn't do. Everyone told me I couldn't smile. I never got excited at Christmas. I watched others being excited. I verbalised it but not in an emotional way. I knew things were not as they should have been, even though I did know how they could have been different. I was the
eternal happy ending girl. Even though things were pretty grim, as a child you don't have a choice, you cannot just stop. I did not realise I was unhappy, I just was. Now, as an adult, I will say, 'This is nice!' when I see something I like, just as you smile. Then, with adults, I would have a conversation but with children I was a bystander. Children had another language, a word language, a body language, a facial language. They run around and jump up and down and I could not do that because my legs did not work and because of my lack of balance.

16.5 Emotion and music

Interestingly, for two other young women with Moebius music was an important way to explore emotions. Music, uniquely perhaps, captures and imposes emotional experience. Though it presumably evolved to be social, accompanying dance etc., it also has an abstract, almost asocial canon, usually classical music, which allowed these two to explore emotion through it, on their own and without outward sign. Eleanor is now in her 30s, but she too remembers her time as a child with Moebius, and mentioned two things: her playing and experiencing of music, and a novel experience she had with a friend.

Sometimes, if you are in one state of mind, you need others to know. Emotion came for me when I played the piano. We always had a piano and I had lessons from age 6. By 13 I was quite competent and I found that my fingers unleashed emotion and expression in me, even though I did not know what they were. I would play one piece again and again in various ways; happy, sad, cheeky, all jumbled up inside. Musical notes and pattern imposed a mood though not always the mood I felt. I might have been in one mood, but another would come out through my fingers, there were channels of all sorts of different things inside me. There were some sounds, often chords, which did give a feeling of pathos or tragedy; some seemed to sum up my pain.

She seemed to be expressing and exploring moods through music even before she knew what these moods were. An artist might start with a palette with just grey and then, suddenly, red, blue, green are there to play with. As she played with them so she began to experience them too. Then, by playing with the colours, she understood what they meant. Eleanor agreed:

Yes, I had to learn the palette without the feeling initially and then map feeling on. I grew up with music and heard different tones, even though I was not fully aware of emotions. Since I did not have the language, or the words, for feelings, the music and my fingers would convey them. Often what was conveyed was real pain. They could really say it. At that time, everything was in the fingers. I had no body language.


J. Cole / Lessons from Moebius Syndrome

She went on to talk about an old family friend and the simple yet profound effects of physical contact where previously it had been sparse.

Peggy loved me as a child. I always had the adult thing with her and we would talk about how her young daughter behaved and, since I was only 14 myself, it was strange. Once I was with her during holidays and I was near the edge, being so unhappy with myself, I tried to speak with her. But, in the end, I told a cousin instead. She obviously told this friend and there was a real shift in our relationship. Peggy just gave me a hug; no one had given me a hug before. We would spend hours talking. I found a way to articulate things to her (being different, being excluded) and when she gave me the hug I was so shocked to know what physical warmth was, to be approached through the body not through the intellect. This embodied experience was extraordinary, and the next day it startled me out of my wits. There is a whole new something out there I knew nothing about.

Music might have awakened emotional experience but there was something about the social and about an embodied experience which was essential too. Later she went to university, where she learnt far more than academic matters.

I learnt in the first year to mix with people. I learnt about experiences, interests and cultures I had not met. I heard of abortions, single parent families and of poorer families. I just started talking to people. For the first time I had the choice of friends. I would go to a group, look around and talk to someone with the same interests and you'd have common experience and could start to talk. For the first time my identity was not Moebius, though I did not know what it was; that had not yet become defined. For a long time all my efforts had been to get to university. Now I was there, all the others were with their hair, make-up, clothes, interests; I was not sure what I was. I did not really have anything to wear, but it didn't matter.
Over the months I bought clothes like everyone else to wear, hippie clothes, and I started to develop a character. It was maybe artificial, but I could design my own. Most people's evolve as they grow up; mine I picked. I wanted to be someone who people would like. Before, people had not liked me, so I wanted to be sweet, gentle and likeable. I did not want to be radical; at the time there were lots like that. I did not want to stand out. I just wanted to be non-offensive, reliable, and so that is what I became. That sort of person meant that I did attract people towards me. By the time I left university I was renowned for knowing everyone and everyone knowing me. How did I go from that first night, not knowing how to interact, to that? I learnt body language, interaction, to be comfortable with myself. I think I developed a broad circle of friends, because I found I could express different things with different people. I did not find a single, really intimate friend to whom I could tell everything but my series of friends fulfilled different needs.

Eleanor learnt to become a character not as a child, naturally, but as a student, through trial and error, acute observation and a formidable will to escape from the straitjacket of Moebius. But still it was not entirely clear that she was capable of taking into herself and feeling what she saw in those around her and aped. This is certainly the experience of another with Moebius, called Lydia.



16.6 Learning to feel

Lydia is in her 50s. It was at university in the 60s that she first found she was a social success, very like Eleanor. People seemed not to care about Moebius and she learnt, from watching people, for the first time, how to be social. But, looking back, she realises that at this time she was not actually feeling much. She could gesture and sympathise but, to an extent, she was mimicking. It was later, when she moved to Spain and to another culture, that she really began her emotional catch-up.

I do not think I had emotion when I was a child but now I have it. How did I get it? It was in Spain. I learnt Spanish in two months but, more, they are so theatrical in their emotional expression. The body language I had learnt and used at university could be exaggerated in Spain, using the whole body to express one's feelings. Over here in England it would be over the top, but there it was fine and because of this I learnt to feel within me. At Oxford I had learnt a lot of imitating and mirroring and copying but had not, to a very profound depth, had the feeling. I had been using it to conform and because if I did it I got the response, but I, myself, wasn't feeling. But in Spain everyone is so dramatic. If something awful happens then the world is coming to the end, and if fine you party all night. If sad you burst into tears and then go off to the pub. I had gathered all these skills in language and gesture and then in Spain I could just be me. Because of the cultural up-regulation of feeling in gesture I learnt to feel. I am not sure how I mapped gesture and feeling onto my body, but I was starting to feel then. I could feel really ecstatic, happy, for the first time ever. Before, without the expression, I had found feeling difficult. Once in Spain I certainly had the means, the channel and the vehicle, and the feeling. Before, my thought was frigid or cold. I needed the continuation of a thought into real-time expression within the body.
Darwin said that an emotional feeling can either be expressed, continued and become exaggerated or, if not expressed, reduced and lost. It is as though it has to be in the continual present, continually expressed to be continually experienced. That was how it was for me. I was an intellectual at university. In Spain I experienced emotion. As a child I used to play a musical instrument with emotional expression, but the emotion did not really come from within. I could not let it out. Now, once I could express, there was no stopping me.

Lydia's new experiences were not, of course, within her or about her alone. They emerged within a rich social, cultural world. In a place where emotions were communicated publicly more than in the UK she learnt, somehow, to experience as well as imitate feelings.

When you live and share emotion together then you all experience it together. I met a man with Moebius who lived in Sweden and he was one of the saddest people. He was completely wooden, with no body language, like a puppet. I met him in Italy; I hope he learnt something from them. If I had not lived in Spain I don't know how I would have turned out.



In Spain, ironically, though they use gesture a lot, they talk in a much more monosyllabic way than here. They are not as musical in speech. My voice is melodious and I had begun to use it to control people's response. So, in Spain, they all talked about my voice and loved it. That was a Eureka moment. My voice could be my thing, my tool, my vehicle of expression. The voice was the link from me to other people for feeling and for emotion. The language and words don't do it. I had those in Oxford. It was the voice, the melody going with the embodied gesture, that completed the circle.

16.7 Conclusions

These narratives are individual and from a small number of people, but do suggest that some with Moebius experience profound impairments with emotional experience. Until such experiences are quantified in larger groups it is not possible to know precisely how common they are. But even seen in a few, one must consider the possible explanations. It is possible that associated with Moebius there is a constitutional impairment of emotional experience, say a defect in amygdala/insula emotional resonance. However, the fact that people with the condition learn to feel later suggests that this is unlikely. One small boy with Moebius once looked up at his father and asked, "Why can't I be happy?", clearly associating facial movement with the emotional experience. We do receive internal feedback of muscle and skin movement in the face; might this contribute to our own mood and emotion? William James developed what has come to be known as the James-Lange Theory of Emotion, that we are happy because we smile rather than the reverse [5]. More recently Fridlund reviewed the evidence for this 'facial feedback hypothesis' and found none of it convincing [6]. He used examples which were necessarily limited: Bell's palsy, in which one side of the face is paralysed, and drug-induced temporary muscular paralysis. Neither provided evidence for changed mood and emotion following reduced facial movements, but both were unusual situations and unlikely to reflect natural functioning. Adelmann and Zajonc were much more convinced, and quoted experiments in which subjects posed emotion whilst looking at a scene or film and rating it as sad or happy [7]. Those posing a smile rated the film funnier than those posing a grimace. In one ingenious paper the subjects viewed a cartoon with a pen in their mouth, either between the lips (so preventing smiling) or between the teeth (so enabling, even facilitating, smiling). The latter group found the cartoon funnier.
All such studies have many drawbacks, not least that a posed smile is hardly natural, but this is not to say that there may not be some contribution from feedback from the body to mood and emotion. When making a posed smile we are aware of the facial expression changing; when we smile naturally we are not explicitly aware of facial movement, though it may still have an implicit effect. This of course would be missing in those with Moebius. Another important contribution may come from social and developmental factors. As children we learn to express emotions which are usually larger and more labile than in adults. One of the things we do in childhood is to observe the effects of emotional outbursts and expressions from the response of those around us,



particularly our parents. If we cannot express, in the body and on the face, then maybe our capacity to experience these emotions does not develop either. Certainly Clare and, to an extent, James were not able to express smaller, more subtle emotions and found these difficult to experience. For others emotional experience itself seems to have either not developed, or atrophied, or lain dormant for years. Interpersonal relationships may be more difficult to make and develop due to the somatic effects of Moebius. Without gaze control or facial expression and often with delayed language, drooling, and poor motor control it must be more difficult to engage with others and so develop a social existence. In the end the precise reasons for this emotional impairment may be multifactorial, with the somatic absence of facial expression and the effects this has on development and socialization being of most relevance. But from the narratives above what is clear is that for a range of emotion to be experienced it has to be expressed and communicated, and that these have to be through embodied action. Those with Moebius who learn to express do so through voice and prosody, through words and through gesture, to circumvent their facial immobility. Emotional experience appears to be linked to emotional communication via conversations between people. The development of intersubjectivity may therefore require both unique means of identifying individuals and means for these individuals to communicate their feelings. The roles of the face and of embodied action through facial expression in these become apparent in the narratives of those who live without facial mobility.

16.8 References
[1] J. Cole, About Face. Cambridge, MA and London: The MIT Press, 1998.
[2] M. Merleau-Ponty, The Primacy of Perception. Evanston: Northwestern University Press, 1964.
[3] H.T.F.M. Verzijl, B. van der Zwaag, J.R.M. Cruysberg & G.W. Padberg, Moebius syndrome redefined: a syndrome of rhombencephalic maldevelopment. Neurology, 61, 327-333, 2003.
[4] J. Cole & H. Spalding, Still Faces; living without facial expression. Oxford: Oxford University Press, forthcoming.
[5] W. James, The Principles of Psychology. New York: Dover, 1950.
[6] A.J. Fridlund, Evolution and facial action in reflex, social motive and paralanguage. Biological Psychology, 32, 3-100, 1991.
[7] P.K. Adelmann & R.B. Zajonc, Facial efference and emotion. Annual Review of Psychology, 40, 249-280, 1989.


Enacting Intersubjectivity, F. Morganti et al. (Eds.), IOS Press, 2008. © 2008 The authors. All rights reserved.



Autism During Adolescence: Rethinking the Development of Intersubjectivity

Abstract. The goal of this chapter is to theoretically explore intersubjectivity when social development goes awry. More specifically, it will explore intersubjectivity and autism by combining work with infant intersubjectivity, anthropological dimensions of socialness, and sociocultural positions on mediated action and sense of self in order to conceptualize how social changes during adolescence might transfigure mutual engagement. A sociocultural framework that focuses on mediated action is outlined as one way that individual mental functioning and social phenomena come together. The idea that routines become cultural, symbolic tools and social others often function as animate tools that support late emerging intersubjectivity is developed within the paper. Case study material of one child with autism, as reported by the mother, is used to illustrate how intersubjectivity can be constructed by means of these tools after years of disruptive communication and social engagement.

Contents
17.1 Introduction
17.2 Statement of the problem
17.3 Autism from the inside out
17.4 Intersubjectivity and autism
17.5 Socialness and mediated action
17.6 Cultural tools and the sense of self
17.7 Conclusions
17.8 References

17.1 Introduction

The importance of intersubjectivity for the development of communication has a long history in the social sciences. Those taking a developmental perspective often situate its beginnings in the biological substrates of humanness. For example, Bråten [1] used the notion of presentational immediacy to suggest that infants come prepared for mutual attunement, and Trevarthen [2] postulated that infants are equipped with primary intersubjectivity that allows them upon birth to enter into mutual engagement with social others. The goal of this chapter is to


F. Hagstrom / Autism During Adolescence: Rethinking the Development of Intersubjectivity

theoretically explore intersubjectivity when social development has gone awry by rethinking the manifestations of this during adolescence. Focusing on autism as a challenge for the development of intersubjectivity, case study material will be used in conjunction with theoretical contributions from the literatures on infant intersubjectivity, anthropological dimensions of socialness, and sociocultural positions on mediated action and sense of self to question and transfigure theoretical perspectives about changes in mutual engagement during adolescence.

17.2 Statement of the problem

It can be asked why fundamental positions on intersubjectivity are going to be revisited during the adolescent period using individuals with autism. There are several reasons. First, there is a long-standing debate in the literature about whether children with autism develop theory of mind (ToM) [3]. Although a ToM deficit was once accepted as part of the syndrome, scholars are increasingly making the argument that failure in ToM tasks is more about the tasks than about the thinking of the child. This claim and counterargument have been made by Trevarthen and Aitken [4] as well as Ochs et al. [5], and they legitimize perspectives on atypical development that differ from the prevailing deficit literature. Second, regardless of the failure of children on ToM tasks and other measures of social understanding, parents continue to report that their children, even those who are severely autistic and non-verbal, intentionally communicate and convey a wide range of emotions during interactions. This signals a difference between the evidence of science, in the narrow sense, and lived experience. And finally, fundamental positions on intersubjectivity are being revisited because they provide the theoretical evidence for making decisions about life trajectories when development is atypical. Societal demands change during adolescence for parents and children as the transition from formal schooling to life beyond this is planned.
Therefore, adolescence becomes not just a biological reality but a social shift point. This necessitates rethinking what children with less than perfect neurophysiologic status can and do accomplish in order to interact with others. Acknowledging the forms that mutual engagement may take can enrich developmental theory and at the same time substantiate a push toward possible futures rather than relegate adolescents with autism to stagnant living situations. The main points of this paper will be to raise questions about present notions of the development of intersubjectivity when mutual attunement seems not to exist (early on) for parent-child dyads; to theorize about the role of social others and cultural tools as mediators for constructing intersubjectivity; and to illustrate points of intersubjective achievement when cultural tools and social others as animate tools become the units of analysis [6]. In order to shed new light on the development of intersubjectivity, a sociocultural framework that focuses on mediated action [7, 8] will be outlined as one way that individual mental functioning and social phenomena come together. The idea that routines become cultural, symbolic tools and social others often function as animate tools that support late emerging intersubjectivity will be developed through case study material within the paper.



17.3 Autism from the inside out

In the latter half of the 20th century, the accepted view of the cause of autism changed from insufficient maternal nurturing to a neurologically based developmental disorder. Today it is considered one of the most severe behavior disorders of childhood and, regardless of causation, is manifested by social-communicative isolation [9]. Scientific investigations provide insights into autism as a disorder. Rather than a review of the extensive literature on autism, this paper uses parent description as the basis for rethinking developmental changes in intersubjectivity. Parent descriptions and stories provide insight about the everyday experience of living with autism as well as the ways that they recognize and guide their child as a person within the autistic experience. One such description comes from Donna (personal communication) when telling the story of her son Terry's emerging autism.

Terry was a beautiful, healthy and happy baby. All this changed when he was about 2-1/2 years old. Rather than progressing, his development began to slide backwards. Within six months Terry lost all the words he had been speaking and no longer made attempts to get attention or interact with others. He began to bite and hit himself and began to use repetitive actions with little meaning. By age three Terry was diagnosed as severely autistic, and as a child who would never speak or be able to learn.

Donna's story contains themes common to the stories many parents tell about their children with autism. Parents claim that in the beginning, at birth and in the early months of infancy, they were in love with their babies and their babies were responsive, engaged and in love with them. Joy, laughter, smiles, gazing and touching were shared between mother and baby.¹ In other words, from the perspective of the mother, the baby was normal. While there are variations among story tellers, sometime between the first and third year of life a change occurs.
Some children become repetitive; some children don't talk; some children lose developed skills. These common stories of the beginnings of autism share a social-communicative theme. There is a communication breakdown that may or may not include specific language delay but does impact the social relation of the child within the routines of the family and, to an even greater extent, the relation of the child to the social world outside of the family. For most parents the years that follow the onset of autism in their children are filled with struggles to understand the syndrome and at the same time interpret their child's unique ways of understanding and communicating to other family members, social service workers and teachers. Confronted with the medical advice to institutionalize her first child, Donna began to learn about autism and worked to keep her child not only at home but cared for within the health and educational systems. As for many parents, living with and learning about autism plunged Donna into theories about development and hoped-for changes in intersubjectivity that would settle questions about Terry's future. Would it be one lived with inadequate social skills and poor interpersonal communication or one shared with others?

¹ "Mother" will be used throughout this paper for ease of reading.



17.4 Intersubjectivity and autism

A functional definition of autism that can assist with thinking through issues of intersubjectivity may well be that it is at the same time both a neurodevelopmental disorder and a social disorder [5]. As a neurodevelopmental disorder, sensory systems, language and symbolic systems, and cognitive systems are disrupted [10]. As a social disorder, both interpersonal communication and functioning in the social world are impacted. Ochs et al. [5] disembedded the concomitant social issues into two dimensions when conducting ethnographic research with families of children with high functioning autism or Asperger's. They proposed that social can be understood as 1) interpersonal socialness and 2) socio-cultural socialness. The first describes communication that depends on the everyday shared use and interpretation of language. Children with autism vary in how much and how well they can use language, and even when they do use language, difficulty interpreting the messages of others can persist. This is often the focus of intervention and educational programs, since children with autism don't seem to get what is said and often say the wrong or inappropriate things to others. The more interesting case made by Ochs et al. [5] is the parsing out of a second dimension of socialness. They used Hymes' [11] notion of language as being context sensitive knowledge, as well as Lave and Wenger's [12] work on situated learning and legitimate peripheral participation, to arrive at the notion of socio-cultural socialness. This is connected to expectations about knowing, perceiving and engaging in conduct appropriate to specific social practices. Recognizing two kinds of socialness allowed Ochs et al. [5] to describe the social interactiveness of children and families in ways that went beyond the skills of the individuals.
Their position is particularly interesting with regard to intersubjectivity because, as they say, "Human beings are socialized to recognize and implement social practices, including their own and others' expected roles, stances, and comportment, all of which require socio-cultural perspective." Donna's stories about her work with Terry in early childhood illustrate this point. According to Donna,

I kept Terry at home and began working with him in conjunction with home services. There was little change in Terry's language skills. He never learned to speak like he was before autism, but did learn to use a few basic pictures for communication. He never stopped biting himself (and on occasions others). He remained a picky eater, and he required assistance for even the most basic of personal needs, usually running or screaming when others tried to meet these. We, all of us at home, just kept being consistent, demanding that he take part in life around him and that he communicate.

As can be seen, Donna assigned social expectations and communicative roles to Terry and established a picture communication system as well as routines so he could participate in the everyday life of the home. These actions are consistent with Ochs et al.'s [5] two dimensions of socialness. The work of both Bråten and Trevarthen provides developmental links to the above position on socialness. Bråten [1] made the case for mutual attunement by suggesting that infants enter their social worlds at birth with a self-organizing system that allows for presentational immediacy that does not require or rely on linguistic modes of communication. This point is relevant for the later discussion



of adolescent intersubjectivity in the case of autism, as is Trevarthen's [12] stance that preverbal intuitive communication is fundamental to the development of thinking and talking in culturally specific ways. Here Trevarthen is elaborating on his long maintained position that infants at birth have primary intersubjectivity that results in shared emotions with adults and supports the development of secondary (socially mediated) intersubjectivity [4]. He argues that collective human understanding, and hence secondary intersubjectivity, results from emotional and sympathetic experiences with others that arise out of continuous communication and construction of meaning. His writing on child motives for engagement and his broadening of the criteria of intersubjectivity to include emotional empathy and coordinated communicative expressions, not necessarily linguistic in nature, are particularly pertinent for thinking through the ways intersubjectivity might change over the course of a life lived with autism [4,12]. The positions of Bråten and Trevarthen provide essential points that can support conclusions about intersubjectivity that may be real to parents but not to the occasional professional when children have severe forms of autism. This could apply to Donna's perspective. She was steadfast in her view that Terry knew her, loved her, and was responsive to her. While outside observation might do little to support this, the theoretical positions of both Bråten and Trevarthen validate the possibility that Donna and Terry could share mutual attunement and emotions that might look very different to those outside the lived experience of autism. To outsiders Terry, according to Donna, simply looked very autistic and mentally deficient. This leads to the question: is there an additional theoretical perspective that can be used for analysis of Terry's situated actions that support intersubjectivity?

17.5 Socialness and mediated action

Ochs et al.
[5] in part based their distinctions of socialness on the works of Vygotsky [14] and scholars aligned with the fundamentals of his theory. One such scholar, Wertsch [14,15], used Vygotsky's account of intermental and intramental functioning to differentiate between social as individuals interacting in an immediate context and social as the cultural, historical and institutional means by which an individual acts. He referred to this as mediated action since any action produced by an individual, either in isolation or with others, is inherently social because the material by which the action is accomplished is always a social and, thereby, a cultural derivative. Hagstrom [8] used Wertsch's mediated action as well as his discussion of cultural tools [16] to organize a functional individual system (FIS) framework for working with clinical populations, including autism. Three interdependent systems, the physical, cultural and social, are understood as continuously influencing each other during the developmental process. Thus rather than focus simply on the physical differences associated with autism, teachers, social service workers and parents can think about how things created and passed down by people to people (cultural), when appropriated via interactions with others (social), become the way that any individual, including a child with autism, acts in and on his everyday world. This action, regardless of biological/neurological determinants, remains social and cultural. In other words, it remains, in Wertsch's words, mediated action. Hagstrom [8], consistent with



Vygotskian and Wertschian theory, included three kinds of tools in the FIS framework: tangible, mental and animate. Tangible tools, as actual objects or artifacts, and mental tools, as symbolic, are widely discussed in the sociocultural literature. Animate tools have not received the same attention even though the notion that an individual might function as a tool for another person was considered by Vygotsky [17] and further elaborated by Zinchenko [18]. It is a particularly integral tool whose use is constrained by culture for children with various disabilities including autism. Donna, when talking about her approaches to working with Terry as a school age child, provides an example of mediated action.

Each summer I work with Terry to keep him going, building skills, and most of all to decrease his self-abuse. He knows about rewards because I use a counter that clicks. Each click means that he has stayed with an activity. These can be as simple as continuing to eat a new food to using his picture communication board to answer questions.

Donna's approach to working with her son reflected a cultural, historical and institutional paradigm that mediated her actions. Donna could not have used it nor could we as readers understand it outside of the cultural frame that organized her actions and expectations for responses. A tools analysis [8] of the following excerpt makes it clear that the routines upon which activities were based, as well as the reward system, reflect values of a particular world (cultural) view. The journal entry does not reflect the use of animate tools, probably not because this form of assistance was not used but rather because noting it was not part of the treatment paradigm that organized the sessions. This is a journal entry for one morning's work:

Used picture communication for requesting, 10 clicks; verbal imitation of basic vowels and consonants, 15 clicks; facial massage and blowing on party favors, 25 clicks.
Used a five-token card to go for a car ride.

As can also be seen, documentation of social interaction between the participants or between the participants and the surrounding social world is missing from this activity description. The importance of social others in acquiring and using cultural tools is not part of the mind set used to organize these tasks or important to the documented results. The point being made here about cultural tools and mediated action becomes clearer by contrast. Six months after the journal entry above, Donna attended a workshop that stressed social interaction when working with children with autism. This is her first journal entry for Terry, now 14 years of age, following that workshop.

Sunday will be six weeks that I've been working with social interaction. It has been a blast rather than a struggle! He gets my undivided time for thirty minutes every night. Whatever he is doing, I join in with. I don't make him do anything for that thirty minutes. In return, I enjoy my son and he gets a hug, a kiss and no struggle about going to bed.

F. Hagstrom / Autism During Adolescence: Rethinking the Development of Intersubjectivity


This journal entry describes the general technique, which has as its goal recognizing the person with autism as first and foremost a person and then joining that person's activity rather than imposing a pre-established objective on the activity. This thirty minutes of interaction comes with a cultural history of therapeutic use that is far different from the earlier representation of parental intervention. The description of Terry's activities this time focuses on social interactional skills, again reflecting how the perception of what is important has been shifted by the cultural (historical and institutional) frames of meaning.

Here are some things we see Terry doing that are definitely a change (and all with no rewards or clicks). He is doing all kinds of different things at the park instead of just swinging. And he's approaching other kids. Maybe giving them a big fat kiss is an awkward way to say hello, but the fact that he is noticing other children is awesome! He sat on the front porch, just sat there, for 30 whole minutes, just relaxing without getting up or running away. He is starting to show jealousy of things his sister gets to do. For example, she sleeps with me and last night he arrived at my bedroom door with his blanket and pillow. He communicated the unfairness by bringing symbols of bedtime downstairs! It was a first, but did we ever let him try to communicate his way, using things his way?

Considering cultural tools as the means by which individuals move through everyday life helps us better understand the needs and challenges of persons with disruptive communication, such as that associated with autism. First, understanding the tools used (i.e., tangible, symbolic, animate) allows for the adjustment of communication.
Second, describing a person's use of the tools for social as interpersonal action and/or social as socio-cultural situated action reveals the network of meanings (i.e., cultural dimension) that are being deployed by participants in situated learning. And lastly, the selection, manipulation and social negotiation of communication via tools allow others to see the agency of a thinking self in action.

17.6 Cultural tools and the sense of self

Tools are important in everyday life as ways to attune to others. This can include what the individual brings to the communicative interchange as knowledge and skills as well as what is made available within such interactions by means of materials and the participants' manipulations of these, which may well involve the assistance of others. To be a thinking self in a particular moment is to act in that moment with an array of tools identifiable to at least one other person involved in the action. These tools are acquired from/within culture, and for the child and family with autism, within what over time has emerged as the working culture of the home. The identifiable commonalities are recognized as the home culture, which involves ongoing, dynamic interpersonal creation. As new tools are created in the service of shared engagement, culture at the local level changes. As culture changes, new tools are needed. As a result of this change, the tool kit of the individual may have similar kinds of tools (consistent with the notion of culture resulting from the actions of people living and working in groups/communities) yet the kit has the possibility for constant expansion. Therefore, it can be said that one's own sense of self, as well as the sense that others have of the individual as a self, emerges out of communicative consistencies that are interpersonally woven together over time via action mediated by cultural tools [19]. This results in a persistent sense of self for the individual, even an individual with autism, as well as for those around him. This sense of self is an expectation frame for how a person will act with others or in a given situation. So it is important to do two things. One, theoretically recognize that the thinking of one's self or others as selves is a mediated construction which need not be linguistic. Two, and at the same time, accept that individuals achieve a pervasive sense of this self that guides actions and participation with others in everyday life, even when both interpersonal and socio-cultural situated socialness is violated.

The notion of a sensed self is pivotal to the discussion of intersubjectivity and is fundamental to socialness. Specifically, because tools are vital to the ways we each negotiate everyday life, culture reflected in and through the use of tools is fundamental to the social perception of an individual as a persistent self. The forms (or tools) used by individuals during social interpersonal action permit others to sense the self behind the act even when this socialness in the course of a sociocultural situation violates conventions and expectations of comportment. This self action need not be symbolic but can rather be mediated by cultural tools such as the enactment of shared routines or the negotiation of space or time via animate tools. Such forms of tool use for interpersonal action would be consistent with Bråten's [1] mutual attunement and Trevarthen's [4,12] broadened position on intersubjectivity.
The appropriation of routines and the rhythm of the interactions associated with this become a communicative tool for individuals with disruptive communication and for the social others with whom they are communicatively engaged. In the course of typical development for a child who is neurodevelopmentally atypical, the building of predictable social interpersonal action depends on mutual attunement. This mutual attunement, according to so many of the mothers of children with autism, is present at birth and during the early months of the infant's life. Perhaps it is this that they use and build on to interpret, as animate tools, the actions and communications of their children to others as the children attempt to negotiate Ochs et al.'s [5] second dimension of socialness.

A last entry authored by Donna on an email distribution list is a fitting example of the public emergence of sense of self. Terry's reported acts reflect the social engagement he has had with his family and the use of social routines and actions as a tool. His situated action demonstrates attunement and mutual engagement with a new other that is reflective of continued changes in mediated intersubjectivity as an adolescent.

Last night, here at my house, two of my amazing colleagues and I hosted our first "Supper Club with Autism". It's for teens with autism who use augmentative communication. Goal: socialization, attention, functional communication, cooking, serving, and cleaning skills, etc. It was scary because we weren't sure how the kids would take it. I was fairly certain Terry, my 15 year old with autism and who uses a Dynavox and some speech, would be alright...just a little hyper probably. But we were pretty uncertain about the other teen who this summer began learning to use the Picture Exchange Communication System (PECS). This was a brand new experience for him, and we just wanted him to know how welcome he was and how much we wanted him there. But that wasn't going to be easy due to the nature of the disorder. Well, he was pretty scared when his mother first left, and he was escalating with sounds and signs of potential abusive behavior towards anyone who came near him. So we adults sat down and got quiet to minimize sensory stimulation while this kiddo soaked it all in. We hoped he'd modulate. He was still squealing pretty hard and you could tell he was stressed out. I don't know what the other adults were thinking...but at that moment I was thinking something like... "UUgh, and your stupid ideas! This is awful!" And then a miracle appeared in the form of two autistic boys. Terry, who had been off in his own world until that point, suddenly came out of nowhere and plopped himself down right beside (and slightly on top) of this other guy who was so upset. Terry put his arm around him and patted his head and face. He was trying to tell him it would all be okay!!! He was reaching out to comfort another human being! And he KNEW what it was like to feel scared when something is different from the routine. It was precious and it answered so many questions life poses. We all held our breath, wondering how the other teen, who can be prone to violent outbursts, would take Terry's physical gestures. And here is the beautiful part.... he seemed a little shocked, but then he appeared to understand and he calmed down and even smiled. It was no small wonder friends. If you know anything about autism you will realize that both boys climbed a giant mountain, one in the giving and one in the receiving of comfort.

17.7 Conclusions

Certainly the goal of early and persistent education of children with autism is to maximize their opportunity to have possible futures rather than stagnant lives.
But because assessments and interventions are permeated with developmental perspectives that index specific child behaviors as representative of social engagement, other evidence of intersubjectivity may be overlooked by decision-making professionals. Therefore, understanding the essence of intersubjectivity rather than its discrete behavioral manifestations becomes critical within the literature, so that a person-centered view will over time infiltrate scientific and cultural perspectives. The main points of this paper were to raise questions about present notions of the development of intersubjectivity when mutual attunement seems to disappear or be nonexistent; to theorize about the role of social others and cultural tools as mediators for constructing intersubjectivity; and to illustrate points of intersubjective achievement when cultural tools and social others as animate tools become the units of analysis [6, 8]. The work of the parent and the child, when viewed from this perspective, may shed light not only on differences inherent to the individual reaching to achieve socialness when their atypical development is typical (in a lived sense for them), but also on the necessity of mutual reaching for communication and willingness to be an animate tool in order to achieve intersubjectivity.



The mother who fights institutionally situated socio-cultural action by maintaining that her child is a thinking, feeling individual can present little in the way of empirical data to support her position. And her fight is on two borders: interpersonally, to push the child's continued development, and social situationally, to push for recognition of educational viability when few or none of the rules of comportment are realities for the child. Part of the claim in this paper is that intersubjectivity may be experienced differently by those involved in everyday situations and may not be observable if static ways of looking at intersubjectivity are imposed on the novelty of living with autism. Focusing on socialness from Ochs et al.'s [5] two dimensions, the mutuality of emotional and communication engagement between mothers and their children [12], and the use of cultural tools as mediated action [8,14] can assist in examining intersubjectivity as a life-long journey for those with autism.

17.8 References
[1] S. Bråten, The virtual other in infants' minds and social feelings. In A. H. Wold (Ed.), The Dialogical Alternative (pp. 77-97). Oslo: Scandinavian University Press, 1992.
[2] C. Trevarthen, Communication and cooperation in early infancy: A description of primary intersubjectivity. In M. Bullowa (Ed.), Before Speech (pp. 22-45). Cambridge: Cambridge University Press, 1979.
[3] S. Baron-Cohen, H. Tager-Flusberg & D. Cohen (Eds.), Understanding other minds: Perspectives from developmental cognitive neurosciences, 2nd edition. New York: Oxford University Press, 2000.
[4] C. Trevarthen & K. J. Aitken, Infant intersubjectivity: research, theory, and clinical applications. Journal of Child Psychology and Psychiatry, 41 (1), 3-48, 2001.
[5] E. Ochs, T. Kremer-Sadlik, K. G. Sirota & O. Solomon, Autism and the social world: an anthropological perspective. Discourse Studies, 147-183, 2004.
[6] F. Hagstrom, The contribution of routine practices to communication and mind. In U. Satterlund-Larsson (Ed.), Socio-cultural theory and methods: An anthology. Trollhattan: University Trollhattan/Uddevalla, 2001.
[7] J. V. Wertsch, Mind as action. New York: Oxford University Press, 1998.
[8] F. Hagstrom, Including identity in clinical practice. In F. Hagstrom & B. B. Shadden (Eds.), Topics in Language Disorders, 3 (3), 2005.
[9] L. E. Berk, Infants, children, and adolescents, 6th edition. Boston: Pearson Education, Inc., 2008.
[10] F. F. Ferri, Ferri's clinical advisor. Philadelphia: Elsevier Mosby, 2005.
[11] D. Hymes, On communicative competence. In J. B. Pride & J. Holmes (Eds.), Sociolinguistics (pp. 269-285). Harmondsworth: Penguin, 1972.
[12] C. Trevarthen, An infant's motives for speaking and thinking in the culture. In A. H. Wold (Ed.), The Dialogical Alternative (pp. 99-137). Oslo: Scandinavian University Press, 1992.
[13] L. S. Vygotsky, Thinking and speech. In R. W. Rieber & A. S. Carton (Eds.), The collected works of L. S. Vygotsky (N. Minick, Trans.). New York: Plenum Press, 1987.
[14] J. V. Wertsch, P. Tulviste & F. Hagstrom, A sociocultural approach to agency. In E. A. Forman, N. Minick & C. A. Stone (Eds.), Contexts for learning: Sociocultural dynamics in children's development (pp. 336-356). New York: Oxford University Press, 1993.
[15] J. V. Wertsch, Collective remembering. New York: Oxford University, 2002.
[16] J. V. Wertsch & L. J. Rupert, The authority of cultural tools in a sociocultural approach to mediated agency. Cognition and Instruction, 11 (3 & 4), 227-239, 1994.
[17] L. S. Vygotsky, Mind in society: The development of higher psychological processes. Cambridge: Harvard University Press, 1978.
[18] V. P. Zinchenko, Vygotsky's ideas about units of analysis for the analysis of mind. In J. V. Wertsch (Ed.), Culture, communication, and cognition: Vygotskian perspectives (pp. 94-118). New York: Cambridge University Press, 1985.
[19] B. B. Shadden, F. Hagstrom & P. Koski, Neurogenic communication disorders: Life stories and the narrative self. San Diego, CA: Plural Press, in press.

Enacting Intersubjectivity
F. Morganti et al. (Eds.)
IOS Press, 2008
© 2008 The authors. All rights reserved.


Andrén, M. 117
Aschersleben, G. 175
Bosco, F.M. 81
Braten, S. 133
Brinck, I. 117
Carassa, A. vii, 187
Cole, J. 237
Colombetti, M. 187
Daum, M.M. 175
De Jaegher, H. 33
Di Paolo, E. 33
Hagstrom, F. 251
Keller, P.E. 205
Leavens, D.A. 65
Lindblom, J. 49
Morganti, F. vii, 3, 187
Prinz, W. 165, 223
Racine, T.P. 65
Rapinett, G. 223
Rezzonico, G. v
Riva, G. vii, 97
Sinigaglia, C. 17
Susswein, N. 65
Tirassa, M. 81
Tsakiris, M. 149
Wereha, T.J. 65
Ziemke, T. 49
Zlatev, J. 117
Zmyj, N. 175
