
Vol 8, Nos 3-4 (2013): Empirical Musicology Review

Special Issue on Music and Shape: Perception and Theory

Table of Contents
EDITOR'S NOTE
Introduction to Special Issue on Music and Shape: Perception and Theory
Daniel Leech-Wilkinson, Mats B. Küssner 161

ARTICLES
Shaping and Co-Shaping Forms of Vitality in Music: Beyond Cognitivist and Emotivist
Approaches to Musical Expressiveness
Jin Hyun Kim 162-173
Cross-Cultural Representations of Musical Shape
George Athanasopoulos, Nikki Moran 185-199
Tonality: The Shape of Affect
Mine Doğantan-Dack 208-218

COMMENTARIES
Response to Jin Hyun Kim - Dynamics of Musical Expression
Mine Doğantan-Dack 174-177
Empirical Aesthetics, Computational Cognitive Modeling, and Experimental Phenomenology:
Methodological remarks on “Shaping and Co-Shaping Forms of Vitality in Music: Beyond
Cognitivist and Emotivist Approaches to Musical Expressiveness” by Jin Hyun Kim
Uwe Seifert 178-184
Visual Representations of Music in Three Cultures: Commentary on Athanasopoulos and Moran
Siu-Lan Tan 200-203
Musical Objects, Cross-Domain Correspondences, and Cultural Choice: Commentary
on “Cross-Cultural Representations of Musical Shape” by George Athanasopoulos and Nikki Moran
Zohar Eitan 204-207
Tonality and the Cultural
Daniel Leech-Wilkinson 219-222
Shape Cognition and Temporal, Instrumental and Cognitive Constraints on Tonality.
Public Peer Review of “Tonality: The Shape of Affect” by Mine Doğantan-Dack.
Rolfe Inge Godøy 223-226
Empirical Musicology Review Vol. 8, No. 3-4, 2013

Special Issue on Music and Shape


DANIEL LEECH-WILKINSON
King’s College London

MATS B. KÜSSNER
King’s College London

THIS is the third issue of the three-part volume on relationships between music and shape. The
earlier issues on the themes of Pedagogy and Performance (Vol. 8, No. 1), and Motion Shapes
(Vol. 8, No. 2) are complemented here by a focus on Perception and Theory. Topics addressed
include cross-cultural representations of musical shapes from the UK, Japan and Papua New
Guinea; the evolutionary origins of tonality as a system for the dynamic shaping of affect; and
how shaping and co-shaping of ‘forms of vitality’ in music gives rise to aesthetic experience.


Shaping and Co-Shaping Forms of Vitality in Music:
Beyond Cognitivist and Emotivist Approaches to Musical
Expressiveness
JIN HYUN KIM
University of Oldenburg

ABSTRACT: Over the last three decades, there has been an increasing number of
empirical studies on how music conveys and induces emotional expressiveness,
revolving around both the longstanding discourse over compositional and performance
features related to recognized or felt emotions, and more recent interest in
(neuro)psychological mechanisms underlying emotions induced by music. However,
the question of how expressive forms of music are shaped and co-shaped within the
ongoing process of music-making and music perception has received little
investigation. This paper focuses on the expressive forms of music that the
developmental psychologist Daniel N. Stern refers to as ‘forms of vitality’, discussing
how they are (co)shaped and give rise to aesthetic experience of music. The aim is the
development of a theoretical framework allowing for a new research perspective on
musical expressiveness—taking into account the aesthetic experience of music—in
relation to the process of (co)shaping forms of vitality in music. Further, a hypothesis
for and methodologies of empirical research fitting into this theoretical framework are
considered, expanding the schema beyond cognitivist and emotivist approaches to
musical expressiveness.

Submitted 2013 January 27; accepted June 28.

KEYWORDS: forms of vitality, musical expressiveness, aesthetic experience of music

TO develop a theoretical framework for empirical investigation of musical expressiveness moving beyond
the question of how music represents or induces emotions—as characterized in cognitivist versus emotivist
approaches to musical expressiveness (Scherer & Zentner, 2001)—this paper considers expressive forms of
music as “forms of vitality.” The term comes from the developmental psychologist Daniel N. Stern; it is
used here to characterize musical expressiveness as fundamental to the experiencing of vitality, i.e. living
(Stern, 2010), rather than musical (performance) features conceived of as correlates of specific emotions
perceived or felt by listeners. According to Stern, forms of vitality, shaped dynamically, allow for the co-
shaping of the (aesthetic) experience of others’ expression—which can extend to an aesthetic object. Hence,
this discussion is directed toward the question of how and to what extent aesthetic experience of musical
expressiveness is related to the shaping and co-shaping of forms of vitality in music, and suggests
hypothetical empirical studies with the potential to address this question.

RESEARCH ON MUSIC AND EMOTION: SOME CRITICAL DISCUSSIONS

Recently, empirical research on music and emotion has flourished, with much interest in the topic in both
empirically and theoretically oriented scholarship. The topos of “music as a language of emotions,” dating
to the Romantic Era, has resurfaced in music research after the long 20th-century dominance of formalist
aesthetics; this renewed interest comes in conjunction with recent scientific focus on emotion, which has
long been disregarded in cognition research. In psychology-centered interdisciplinary research on emotion,[1]
music is increasingly used as a stimulus for empirical-experimental studies investigating the recognition
or induction of emotion.


Cognitivist v. Emotivist Approaches: Object–Subject Dichotomy

Psychological and neuroscientific experiments on emotions related to music engage, through their
hypotheses and experimental designs, the cognitivist versus emotivist philosophical theories on musical
expressiveness recently debated among Anglo-American analytic philosophers (for a detailed discussion
see Davies, 2003; 2010). As a result, concepts of emotions perceived and felt by listeners—emotions
represented and induced through music—have become established in empirical-experimental research on
music and emotion (Juslin & Sloboda, 2001; 2010a).
Both cognitivist and emotivist philosophical theories on musical expressiveness—e.g. contour
theory (Peter Kivy), resemblance theory (Stephen Davies), and persona theory (Jerrold Levinson)—
approach musical expressiveness in terms of perceptual qualities, rather than treating that expressiveness as
the result of symbolization. The main difference between cognitivism and emotivism lies in their respective
conceptions of the source of experienced musical expressiveness. In cognitivism, musical expressiveness is
ascribed to musical features recognizable to listeners regardless of felt emotions that may or may not
accompany recognition of musical expressiveness. Here, music itself is conceived of as the source of
musical expressiveness. In contrast, emotivism emphasizes the emotions as felt by listeners and triggered
by musical features; accordingly, musical expressiveness cannot be experienced without felt emotions, and
therefore the subjective experience of musical expressiveness accompanied by emotions is the source of
that musical expressiveness.
The differing approaches reflect the opposing positions of objectivist and subjectivist theories on
aesthetic judgment and enjoyment as rooted in Western aesthetics. According to objectivist aesthetics,
objects, whether works of art or not, are a source of aesthetic judgment and enjoyment. For instance, the
Pythagorean theory of proportions teaches that an object composed proportionally is aesthetic due to its
objective properties. Subjectivist aesthetics, by contrast, views the aesthetic subject as the source of aesthetic
judgment and enjoyment, as in Shaftesbury’s theory of taste, which claims that aesthetic beauty is only in the
mind: an object is not beautiful by virtue of its properties, but rather because the mind that forms this object
is itself beautiful (Cooper, 2001, pp. 225-226).
This separation between object and subject of aesthetic perception and appreciation, fundamental
to cognitivist versus emotivist theories on musical expressiveness, is reflected in the opposing concepts of
perceived versus felt emotions as commonly applied in empirical research on music and emotion.
Depending on the chosen approach, the experimental question becomes either 1) which kind of emotion is
represented through music/perceived during listening to music, or 2) which kind of emotion is induced
through music/felt during listening to music. The experimental participants must consciously distinguish
two kinds of emotional processing taking place in relation to musical stimuli: recognition of represented
emotion, and feeling of induced emotion. In some studies, experimental participants are asked to identify
one or the other; others require that they consciously attend to both.

Separation Between Activity and Passivity

In addition to the dichotomy between aesthetic object and subject, a separation between emotional
expression (activity) and emotion perception or arousal (passivity) is also present both in cognitivist and
emotivist aesthetic theories regarding musical expressiveness and in empirical research on music and
emotion. Emotional expression is ascribed to the composer’s or performer’s music-making, which brings
her/his emotions, or imaginings of another's emotions, into the work of music during composition or
performance. Listeners are meant to react passively to this music with emotions perceived or aroused by the
musical expression encoded in or performed into the corresponding score. Recognition or arousal of
emotions is hence attributed to the process of music perception.
As a result, music-making as a process of expressing emotions, and music listening as a process of
perceiving expressed emotions or emotion arousal, are treated as separate processes in which expressed and
perceived or aroused emotions are conceived of as communicable—although the question of what
mechanisms underlie this communication has not been investigated in much detail (Juslin, 2005).
Moreover, most empirical research using survey methods and measurements of neural or other
physiological activities has focused on the processes of music listening, as it is difficult to investigate the
ongoing processes of music-making using these methods; the methods for investigation intervene in the
subject of that investigation. Both the theoretical premise differentiating emotional expression on a
productive side from emotion perception or arousal on a receptive side, and the methodological constraints
of empirical research on music and emotion, have led to the dominance of studies on emotion perception or
arousal during music listening.


Expressive Musical Features

Early empirical studies on music and emotion have focused on musical features considered correlates of
various categories of emotions, reported by and/or measured through experimental subjects as either
perceived or felt. Those musical features have been analyzed in composed music (cf. Gabrielsson &
Lindström, 2001; 2010) and musical performances; the musical features of the latter comprise not only
acoustic but also bodily-gestural features (Dahl & Friberg, 2004; 2007). Some studies (Gabrielsson &
Juslin, 1996; Gabrielsson & Lindström, 1995; Juslin, 1997) claim that music is emotionally expressive
because of the correlations between musical acoustic features and perceived or felt emotions, whether basic
or complex emotions:[2] for instance, in the correlations between happiness and musical features such as a
fast mean tempo, high sound level, small tempo and sound level variability, bright timbre, fast tone attack,
sharp contrasts in duration, and rising microintonation, among others (cf. Juslin & Timmers, 2010, Figure
17.2).
It is, however, worth noting that several musical features identified as correlated with certain
emotions are dynamic forms, encompassing temporal fluctuations (e.g. limited or broad timing variability,
final ritardando) or changes of pitch and intensity (e.g. rising pitch contour, broad volume variability),
rather than static forms. Yet the question of how musical expressiveness emerges in the course of
music-making and music perception is rarely addressed.[3] Instead, most studies have focused on the
relationship between the musical features of a short excerpt from a larger piece of music and emotions
induced through or perceived in that excerpt as measured or reported at a given moment.[4] To propose
hypotheses for empirical studies investigating the dynamic nature of musical expressiveness, it therefore
appears necessary to develop a theoretical framework providing a new research perspective on musical
expressiveness.

EXPRESSIVE FORMS OF MUSIC: FORMS OF VITALITY

In its attention to dynamic expressive forms of music that can be assumed to relate to musicians’ or
listeners’ experiences,[5] the concept of “forms of vitality” recently introduced by the developmental
psychologist Daniel N. Stern deserves consideration. Stern claims that the experience of vitality—in the
context of psychology, as related to the experience of being alive, and applied to music, dance, theater and
cinema—is grounded “in physical action and traceable mental operations” and “inherent in the act of [both
physical and mental] movement” (Stern, 2010, p. 9). This grounding creates a sense of time, space, force,
and directionality in the human mind (Stern, 2010, p. 4). In turn, the experience of vitality serves as a basis
for experiencing others’ vitality, regardless of whether others’ movements are observed or not.
The experience of vitality corresponds to what Stern (1985) calls “vitality affects”, such as a
sense of surging, fading away, sudden explosiveness, and fleetingness. Stern (2010, p. 4) contends that not
only sentient beings but also time-based arts allow for the experience of vitality, and therefore move us.
Five dynamic events, the “fundamental dynamic pentad,” give rise to the experience of vitality: movement,
force, time, space, and directionality; as a whole, these result in a “Gestalt” (Stern, 2010, pp. 4-5). For Stern
(2010, p. 5), vitality is a Gestalt that “emerges from the theoretically separate experiences of movement,
force, time, space, and intention”. Hence, it is illogical to analyze vitality in terms of each separate element.
Rather, forms of vitality are conceived of as being immediately grasped from the fundamental dynamic
pentad; for instance, dynamic forms of vitality such as exploding or fading can be comprehended in terms
of a graph consisting of a time axis and an intensity (force) axis. Forms of vitality are concerned with the
“how” rather than the “what” of felt experiences; regardless of the content—whether emotions, thoughts or
actions—that felt experiences may have, specific kinds of experience are constituted through dynamic
forms of vitality. According to Stern (2010), forms of vitality are “part of episodic memories” (p. 11), and
serve as “the most fundamental of all felt experience” (p. 8) when dealing with one’s own movements and
those of others; this takes place in the experience of time-based arts.
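
The idea that a form of vitality can be read off a graph with a time axis and an intensity (force) axis can be made concrete with a small sketch. The function name, the three contour names, and the formulas behind them are assumptions made purely for illustration; they are not taken from Stern, who gives no mathematical definition of these forms.

```python
import math


def vitality_contour(kind, n=11, rate=5.0):
    """Sample an illustrative intensity-over-time contour on the interval [0, 1].

    The three shapes are hypothetical stand-ins chosen for this sketch:
    'fading' decays exponentially, 'surging' rises with increasing speed,
    and 'exploding' rises abruptly and then levels off.
    """
    if n < 2:
        raise ValueError("need at least two samples")
    shapes = {
        "fading": lambda t: math.exp(-rate * t),
        "surging": lambda t: t ** 2,
        "exploding": lambda t: 1.0 - math.exp(-rate * t),
    }
    if kind not in shapes:
        raise ValueError(f"unknown contour: {kind}")
    # Each sample is a (time, intensity) pair, i.e. one point on the graph.
    return [(i / (n - 1), shapes[kind](i / (n - 1))) for i in range(n)]


if __name__ == "__main__":
    for name in ("exploding", "surging", "fading"):
        levels = [round(level, 2) for _, level in vitality_contour(name, n=6)]
        print(name, levels)
```

In this sketch the same two axes carry all three forms; only the way intensity unfolds over time differs, which mirrors the emphasis on the "how" rather than the "what" of felt experience.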
Interestingly, some aspects of the shaping of music related to mental experience have already been
discussed in detail by the musicologist Friedrich von Hausegger at the end of the 19th century and the
music pedagogue Alexander Truslit at the beginning of the 20th century. Delineating the causal relation
between bodily aroused states and vocalizations, Hausegger discusses dynamic forms of sound, which are
experienced as an expression of mental states, in his seminal monograph “Music as Expression (Die Musik
als Ausdruck)” (1887). A shaped sound depends on retaining the body’s aroused state, which, rather than
allowing the affected muscle to remain in consistent contraction, makes it move toward periodic activity;
this influences the resulting duration, intensity, height, depth, and temporal order of vocalization
(Hausegger, 1887, pp. 10-11, 62). Hausegger contends that shaped vocal sounds are not only experienced as
expressions of others’ aroused states, but also give rise to the “co-sense (Mitempfindung)” of arousal (p.
42). He also considers this kind of phenomenon in the context of non-sentient phenomena such as music
and dance.
In the monograph “Shaping and Movement in Music (Gestaltung und Bewegung in der Musik)”,
Truslit (1938) tackles the coupled relationship between the shaping of music and musical experience. The
shaping of music is regarded as fundamental to the musical experience, which takes place during both
music-making and music perception; the latter is characterized by the listeners’ “co-shaping (mitgestalten)”
of music (Truslit, 1938, p. 20) through their inward experience of movement (p. 27). Basing the shaping of
a sound on its duration and intensity, Truslit conceives of movement as the primordial element being
shaped. Movement in music is shaped by dynamics—gradations of sound intensity changing the volume of
sound as perceived—in conjunction with agogics—temporal changes of sound causing its deceleration or
acceleration within the given overall temporal structure—resulting together in spatio-temporal contours of
music. According to Truslit, dynamics and agogics act as fundamentals of the process of musical shaping.
Truslit’s “dynamo-agogics” can therefore be regarded as analogous to Stern’s “forms of vitality” as applied
to music.
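
The interplay of dynamics and agogics described above can be sketched as two curves applied to the same sequence of notes. Everything in the following example is a hypothetical model chosen for illustration (the function name, the sine-shaped intensity arch, the quadratic slowing, and all parameter values); it is not a reconstruction of Truslit's own curves.

```python
import math


def shape_phrase(n_notes=8, base_duration=0.5, ritardando=0.5, peak_level=1.0):
    """Return one (onset, duration, level) tuple per note of a shaped phrase.

    Hypothetical model for this sketch only: levels follow a sine arch
    (crescendo, then decrescendo), and durations lengthen toward the end
    of the phrase (a final ritardando).
    """
    if n_notes < 2:
        raise ValueError("need at least two notes")
    notes = []
    onset = 0.0
    for i in range(n_notes):
        position = i / (n_notes - 1)  # 0.0 at the first note, 1.0 at the last
        # Agogics: stretch each note's duration more as the phrase closes.
        duration = base_duration * (1.0 + ritardando * position ** 2)
        # Dynamics: intensity rises to mid-phrase and falls away again.
        level = peak_level * math.sin(math.pi * (i + 0.5) / n_notes)
        notes.append((onset, duration, level))
        onset += duration
    return notes
```

Calling `shape_phrase()` yields eight notes whose onsets drift progressively later than a strictly even pulse would place them, while their levels trace an arch. Taken together, the two curves give a single contour in time and intensity rather than two independent parameters.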
Common to Stern, Hausegger, and Truslit, whose considerations can be subsumed under the notion of
“forms of vitality,” is that 1) a shape of expression has a spatio-temporal dynamic form of movement,
which consists of different elements yet is irreducible; 2) this expressive shape or its shaping is not
decoupled from the experience of expression, which is understood in a broader sense than emotions; 3) that
shape/shaping gives rise to its covert co-shaping by others, which leads to a co-sense of experience among
the group; and 4) points 1, 2, and 3 are also valid for a non-animate phenomenon, such as music. This raises
the question of to what extent the concept of “forms of vitality” can contribute to research on musical
expressiveness. Crucially, this concept allows for an investigation of the way (“how”) the experience of
musical expressiveness is coupled with the shaping of expressive forms of music, whereas most recent
studies on music and emotion have focused on the content (“what”) of musical expressiveness—for
instance, emotions, whether basic (Fritz et al., 2009; Juslin & Laukka, 2004) or complex (Scherer, 2004;
Zentner, Grandjean, & Scherer, 2008)—and/or the mechanisms (“why”) of emotions induced by music
(Juslin & Västfjäll, 2008; Juslin, Västfjäll, & Lundqvist, 2010). Furthermore, expressive musical features
can be considered in terms of their holistic dynamic forms, rather than as the sum of their separable elements.
Understanding “musical expressiveness” in the sense of the experience of expression—in
conjunction with forms of vitality in music—shifts the focus of research on musical expressiveness to the
aesthetic experience of music, a special form of musical experience not yet investigated in much detail
within the scope of music and emotion research.[6] In the following section, discussion is directed toward
the question of how the aesthetic experience of music can be addressed in relation to the dynamic nature of
musical expressiveness being shaped and co-shaped in the course of music-making and music perception.

AESTHETIC EXPERIENCE OF MUSIC

According to John Dewey, an American philosopher and psychologist, an experience becomes an aesthetic
experience when: 1) “the material experienced runs its course to fulfillment” (Dewey, 1934, p. 36),
reaching “an inclusive and fulfilling close” (p. 58) “through ordered and organized movement” (p. 40)
deemed “dynamic” (p. 57); and 2) such experience is rounded out into “a single coherent experience” (p.
56) with a satisfying emotional quality (cf. pp. 39-40). Dewey characterizes aesthetic experience as
“emotional,” yet does not consider emotions to be separate things labeled joy, sorrow, hope, etc. (cf. p. 43);
rather, he conceives of emotions as “qualities, when they are significant, of a complex experience that
moves and changes” (p. 43). In Dewey’s view of aesthetic experience, the dynamic quality of emotional
experience again comes into play, rather than the discrete categories of the experience’s content, which can
merely be recognized.
Dewey (1934, p. 48 ff.) examines the shaping of the artist’s experience, which he characterizes as
the interplay between doing and undergoing—a simultaneous moment of both retracing what has already
been done (artistically executed) and of anticipating what to do next. Hence, artistic experience,
traditionally considered an artist’s executive expression, is regarded as being coupled with the aesthetic
experience conventionally assigned to beholders of artworks. Based on this reciprocal and cumulative
relationship between doing and undergoing, an artist shapes and reshapes their production process toward
fulfillment or perfection, and in this way makes the experience provided by the work of art an aesthetic
experience. Such a union of doing and undergoing also underlies the perception and appreciation of a work
of art. According to Dewey (1934, p. 54), this kind of receptive act “accumulating toward objective
fulfillment” goes beyond passivity; he contends that, when perceiving and appreciating an artwork, the
beholder (re-)creates the artist’s experience of the process of dynamic organization in the work (cf. p. 56).
Taking into account Stern’s and Dewey’s considerations, research on musical expressiveness which
directs its focus toward aesthetic experience of music requires as a starting point an aesthetic theory doing
justice to the dynamic process of shaping the experience of expression, and to the process of co-shaping
that accompanies music perception and appreciation. The aesthetics of empathy (“Einfühlung”), developed
in the context of psychological aesthetics at the end of the 19th century, seems appropriate to this purpose,
especially in light of some recent neuroaesthetic approaches to investigating the neural mechanisms of the
reciprocal relationship between doing and undergoing that are supposed to underlie the empathic aesthetic
experience co-shaped while watching dance (Calvo-Merino, Glaser, Grezes, Passingham, & Haggard, 2005;
Calvo-Merino, Jola, Glaser, & Haggard, 2008; Jola, Abedian-Amiri, Kuppuswamy, Pollick, & Grosbras,
2012; Jola, Davis, & Haggard, 2011) and beholding the visual arts (Di Dio & Gallese, 2009; Freedberg &
Gallese, 2007; Gallese, 2010).

Aesthetics of Empathy (‘Einfühlung’)

“Einfühlung,” the original German term for empathy, was first used to denote an implicit and immediate
process related to an aesthetic object; later, it referred to an interpersonal process unmediated by the
rational mode of thinking. This is akin to Stern’s “forms of vitality”, as it can be used for both a person’s
behavior and artistic expressive forms. Moreover, “empathy” does not only refer to (shared) emotions, but
also to felt experience, either inter-subjectively shared or taking place between an aesthetic object and
subject; or, more precisely, experience felt into (“ein-fühlen”) others or an aesthetic object. Others’
expressive behavior or expressive forms of an aesthetic object, which can be characterized in terms of
Stern’s “forms of vitality,” are considered the source of empathy. Describing the mental process of
empathy, George W. Pigman claims that “I see an expression and begin to imitate it; the expression calls
forth in me the corresponding psychic experience; my psychic experience is then ‘felt into’ (‘eingefühlt’)
the expression” (Pigman, 1995, p. 242).
According to the aesthetics of empathy, an aesthetic object is perceived as expressive neither due
purely to its formal property—as contended in recent cognitivist theories on musical expressiveness—nor
based on the imagining of a virtual subject of expression behind this object—as claimed in Jerrold
Levinson’s persona theory (Levinson, 2006), an emotivist take on musical expressiveness. Rather, the
perception of expressiveness results from the beholders/listeners pouring themselves (including their mood)
into the aesthetic object while observing/listening to that object. Following Theodor Lipps (1900, p. 416,
translated by the author), “I am inwardly in or by an object, not somehow, but with my own personal
quality”; the “experienced quality of myself appears as a certainty of the aesthetic object” (p. 417). The
aesthetic perception and appreciation that exists within the beholder/listener is projected onto the object,
which is regarded as expressive to the extent that the beholder/listener feels (cf. Lipps, 1903). The aesthetic
object and one’s own feelings experienced in conjunction with this object cannot be strictly separated.
Moritz Geiger characterizes an (aesthetic) experience based on empathy as being unified between “the
other I and my own I” (Geiger, 1911, p. 37, translated by the author). According to his interpretation of the
term “Einfühlung,” the “I” observing an object and the “I” felt into the object do not exist separately; my
perceptions and feelings as triggered by the aesthetic object are not related to another person’s expression,
but rather to the objectified expression of myself, and more precisely to the correspondence between “the
objectified and ongoing (actual) personality” (Lipps, 1900, p. 420). This allows for a “unity of activity and
objectivity” (p. 422).
Lipps (1903) introduces, as a basic mechanism of aesthetic empathy, kinaesthetic simulation of
motor action, which he calls “inner doing” (p. 186) or “inner imitation” (p. 191)—more precisely, “my
experienced doing” (p. 187). This kind of action simulation is increasingly assumed to underlie the
observation of action executed by others, taking up recent findings on neural mirror mechanisms matching
action observation and execution (Rizzolatti & Sinigaglia, 2007). Kinaesthetic simulation of motor action is
used as a hypothesis in a considerable number of neuroscientific experiments on the understanding of
action, emotion, and language, among others, and more recently on (tactile) sensation (Keysers & Gazzola, 2009) and
aesthetic experience (Calvo-Merino et al., 2008; Di Dio & Gallese, 2009). The philosopher Gregory Currie
has very recently pointed out that the hypothesis of simulative processes (more precisely, processes of
motor simulation) underlying (aesthetic) empathy, which may not be accessible to consciousness,
may be the most significant point to be taken from Lippsian theory on aesthetic empathy (Currie, 2011).
Returning to the idea that expressive forms of music are conceived of as shaped forms of vitality giving rise
to the (aesthetic) experience of musical expressiveness, the simulative processes underlying aesthetic
empathy may also be assumed to be dynamic, i.e. changing in time, paralleling the process of co-shaping
experience. Hence, the kinaesthetic image accompanying the listener’s motor simulation—which underlies
this co-shaping process—can be considered as unfolding in the course of aesthetic perception and
appreciation of music, “tak[ing] shape in response to and in anticipation of the events” of a work of music
(Currie, 2011), being shaped dynamically toward fulfillment.

SUGGESTIONS FOR EMPIRICAL STUDIES ON MUSICAL EXPRESSIVENESS

Thus far this paper has established that expressive forms of music and the aesthetic experience of musical
expressiveness—including the latter’s underlying processes of kinaesthetic simulation—are closely related
to each other; each is dynamic, characterized as unifying shaping and co-shaping as well as action and
perception. This final section offers suggestions for empirical studies appropriate to the dynamic nature of
musical expressiveness, which cannot be grasped in its totality by either a cognitivist or emotivist approach.
Empirical research on musical expressiveness, directed toward the relationship between the (co)shaping
processes of forms of vitality in music, the aesthetic experience of musical expressiveness, and its
underlying simulative processes of motor reactions, should deal not only with the process of music
perception, but also with that of music-making.
Developing an empirical study able to cope with the complexity of this relationship first requires
discussion of methodological considerations. The simulative processes of motor reactions, assumed to
underlie the aesthetic experience of musical expressiveness, can be most thoroughly investigated using
neuroimaging methods to trace the neural networks involved in the music-making experienced by
musicians as aesthetically expressive; the possibility of measuring neural activation over mediofrontal
regions and the motor cortex is based on the results of current research on neural networks involved in
empathy (Decety, 2011). Neuroimaging techniques requiring the test participant to lie in a scanner,
however, are inappropriate for investigating the process of music-making: the confines of the scanner make
for unnatural bodily posture and limited movement, and the noise in the scanner would prevent the participant
from concentrating on musical-auditory events. Electroencephalography (EEG) does not allow musicians
to move their heads freely, as it is difficult to remove motor artifacts from EEG signals. Functional near-
infrared spectroscopy (fNIRS) technology, as explored in a pilot study carried out by the author and
colleagues using NIRScout, a portable NIRS system, imposes minimal physical constraints on the
participant (when playing piano or a string instrument); nonetheless, current neuroimaging techniques
require many more improvements to be reliable tools for investigating the process of music-making. On the
whole, the question of what the dynamic processes of kinaesthetic simulation underlying the process of
musical shaping towards fulfillment (and the co-shaping process taking place in music perception as well)
look like can be most efficiently addressed by investigating the neurodynamic processes involved in music-
making and music perception; such an approach does more justice to the temporal process of (co)shaping a
piece of music than current structure- or function-oriented neuroscientific methods.
The issue of whether and how aesthetic experience can be made accessible by measuring the
neural activity or other bodily activities leads to further methodological considerations, concerning the
relationship between first-person and third-person data. As the first-person aesthetic experience is
conceived of as “subjective,” most experimental and empirical research at present has approached
phenomenal experience from the “objective” third-person perspective. The first-person perspective has
often been included in experimental design only “in a very weak sense” (Gallagher & Sørensen, 2006, p.
120)—namely, that while the participants are asked to report their experiences during the experiment,
vocally or by pushing buttons, in most cases they are also instructed to focus on experimental stimuli rather
than their experiences. This leaves the relationship between such experimentally induced first-person
perspectives, phenomenal experience, and third-person data an open question (Northoff & Heinzel, 2006).
An examination of the relationship between first- and third-person perspectives is conceived of as
only possible when based on first-person experiences; this requires the development of new first-person
methodologies. Developed in France from the mid-1990s onward as a research program,
neurophenomenology attempts to incorporate philosophical-phenomenological approaches into
neuroscientific research (Petitmengin, Baulac, & Navarro, 2006;
Petitmengin, Navarro, & Le Van Quyen, 2007; Lutz, Lachaux, Martinerie, & Varela, 2002; Varela, 1996),
exploring the possibility of mutual influence between experimental participants’ neural activities and the
structures of their first-person experiences. A technique of phenomenological interview (Petitmengin, 2006;
2007; 2011) offers access to ostensibly subjective experience correlated with neural activations, allowing
for the detection of preictal symptoms preceding epileptic seizures (Petitmengin et al., 2006). The current
research on first-person experience using the techniques of the phenomenological interview aims at
producing a model for the (generic) structures of experience, especially temporal structures, from interview
data (Petitmengin, 2006; 2011), which can then be incorporated into analysis of the (temporal) structures of
the third-person data of neural activities. Neurophenomenological approaches are well-suited for empirical
study of the relationship between the aesthetic experience of musical expressiveness emerging in the
process of music-making and its underlying simulative processes of motor reactions. While in-depth
interview techniques collect general descriptions of musicians’ experiences, including their beliefs and
opinions, the phenomenological interview techniques allow for retrospection directly related to the
aesthetic experience of musical expressiveness that has just taken place during the music-making; this is
recalled, for instance, in terms of specific perception modalities, bodily attention and conscious images, etc.
Furthermore, the process of music-making is not disrupted by a post-performance phenomenological
interview. The first-person descriptions of one subject are compared to those of other subjects, attempting to
find the generic (especially temporal) structures of experience, which are then related to—rather than
reduced to—the (especially temporal) structures analyzed from the third-person data of neural activities.
Developing a neurophenomenological study to investigate the aesthetic experience of music, however,
remains highly challenging, given that temporal structures from neuroimaging data can be analyzed most
efficiently when using a neurodynamic approach, whereas at present structure- and function-oriented
neuroscientific approaches are dominant. The current state of phenomenological interview techniques also
requires further development to investigate effectively the (generic temporal structures of) aesthetic
experience of music.
The next methodological step to be considered is whether and how the forms of vitality in music
being shaped in the process of music-making can be related to the structures of experience delineated by a
neurophenomenological study. Up until now, musical performance features have largely been analyzed in
relation to discrete emotional categories, and the dynamic expressive forms of music have rarely been
related to the (co)shaping process unifying doing and undergoing. With regard to the process of music
perception, several experiments call for the participants to realize this co-shaping process in an observable
or measurable way, e.g. by drawing a continuous line while listening to music (Godøy, Haga, & Jensenius,
2006; Küssner, Prior, Gold, & Leech-Wilkinson, 2012; Truslit, 1938). Taking into account the thesis
developed in this paper that forms of vitality in music shaped by a musician lead to their co-shaping by a
listener, the listeners’ co-shaping processes—in conjunction with continuous response methodology (cf.
Schubert, 2010) concerning musical expressiveness as experienced by listeners during the process of a
musician’s music-making—can be incorporated into the investigation of the process of music-making.
A hypothesis of the empirical study on musical expressiveness to be developed beyond a cognitivist
and emotivist approach might be—as in the author’s 2010 case study (Kim, Demey, Moelants, & Leman,
2010)—that musical performance by professional musicians would be neither an act of mere expression of
the performer’s or an imagined person’s emotions, nor an act of representation of emotions; rather, it would
be a process of going along with music, an empathic devotion of the self to the music in shaping its
production toward fulfillment or perfection, largely based on automatic processes available through
embodied knowledge of the piece of music and guided by action-perception loops. To test this hypothesis,
different conditions of music performance can be compared. In a case study, the author and colleagues
(Kim et al., 2010) used two conditions of music performance which may have an impact on the empathic
process underlying that performance: a sight-reading performance and a practised performance of a single
piece not previously performed by the experimental participants. To relate the aesthetic experience of
musical expressiveness to the process of shaping forms of vitality in music towards an inclusive and
fulfilling close, different conditions of music performance to be compared may include a musical passage
analyzed and interpreted by the musician as musically fulfilled and another, unfulfilled passage from the
same piece of music.
To understand the dynamic nature of musical expressiveness, further research might be directed toward
the role of episodic memories in the experience of musical expressiveness on the one hand and the
time- and anticipation-based aspect of the perception and apprehension of musical expressiveness on the
other. With regard to the latter, the musicologist and theorist Leonard B. Meyer’s seminal work “Emotion
and Meaning in Music” (1956) serves as a point of departure in terms of forms of vitality in music and
emotional experience. A more psychologically-based approach to expectation in music perception has
recently been discussed in the monograph “Sweet Anticipation: Music and the Psychology of
Expectation” (2006) by the cognitive musicologist David Huron. In cognitive neuroscience of music, a
recent experiment on intra-musical meaning has used hypotheses based on the expectation deriving from
inner musical logic (Koelsch, 2011), following previous similar approaches to musical syntax (Koelsch,
2009; Koelsch, Fritz, Schulze, Alsop, & Schlaug, 2005; Tillmann, 2005; Tillmann, Janata, & Bharucha,
2003). Empirical studies appropriate to addressing the ways in which the (aesthetic) experience of musical
expressiveness emerges in the process of (co)shaping forms of vitality in music can be designed by working
from these theories and the results of related empirical studies.

NOTES

[1] The disciplinary distribution of scholarship in research on music and emotion is well represented in the
two major volumes on this topic (Juslin & Sloboda, 2001; 2010a). See also several research institutes
focusing on emotion, such as the Geneva Emotion Research Group, part of a multidisciplinary centre for
the affective sciences at the University of Geneva/Switzerland, and the Cluster of Excellence “Languages
of Emotion” at the Freie Universität Berlin/Germany.

[2] Recent debate about what kind of emotions music induces has introduced the term "musical emotions,"
an abbreviation for emotions induced by music (Juslin & Laukka, 2004; Juslin & Sloboda, 2010a; Zentner,
Grandjean, & Scherer, 2008). Zentner, Grandjean, and Scherer (2008) claim that musical emotions consist
of emotional categories subtler than those of basic emotions, whereas Juslin and Laukka (2004) and Fritz et
al. (2009) hold that music induces basic emotions.

[3] For a detailed discussion see Kim (2010).

[4] In most neuroscientific experiments on music perception and cognition, the excerpts from a piece of
music used as stimuli are less than one minute in duration. This very limited duration is due to current
neuroimaging techniques and analysis methods, which are oriented toward neurocognitive structures or
functions rather than based on neurodynamic approaches. Empirical research on music and emotion has
only very recently taken up methodological considerations concerning the measurement of experimental
participants’ continuous emotional reactions to stimuli in behavioral experimental design (cf. Schubert,
2010).

[5] In the 1950s, the philosopher Susanne K. Langer contended that an isomorphism exists between the
structure of music and the structure of feeling, claiming that expressive forms of music should be conceived
of as forms of ‘feeling’—“in its broadest sense, meaning everything that can be felt, from physical
sensation, pain and comfort, excitement and repose, to the most complex emotions, intellectual tensions, or
steady feeling-tones of a conscious human life” (Langer, 1960 [1957], p. 249).

[6] Juslin and Sloboda (2010b) point out that investigating the aesthetic experience of music is a
desideratum in music and emotion research.

REFERENCES

Calvo-Merino, B., Glaser, D.E., Grezes, J., Passingham, R.E., & Haggard, P. (2005). Action observation
and acquired motor skills: An fMRI study with expert dancers. Cerebral Cortex, Vol. 15, No. 8, pp.
1243-1249.

Calvo-Merino, B., Jola, C., Glaser, D.E., & Haggard, P. (2008). Towards a sensorimotor aesthetics of
performing art. Consciousness and Cognition, Vol. 17, No. 3, pp. 911-922.

Cooper, A. (Third Earl of Shaftesbury) (2001). Characteristics of Men, Manners, Opinions, Times.
Indianapolis: Liberty Fund.

Currie, G. (2011). Empathy for objects. In: A. Coplan & P. Goldie (Eds.), Empathy: Philosophical and
Psychological Perspectives. Oxford: Oxford University Press, pp. 82-97.

Dahl, S., & Friberg, A. (2004). Expressiveness of musician's body movements in performances on
marimba. In: A. Camurri & G. Volpe (Eds.), Gesture-Based Communication in Human-Computer
Interaction: 5th International Gesture Workshop, GW 2003, Genova, Italy, April 15-17, 2003, Selected
Revised Papers. LNAI 2915. Berlin/Heidelberg: Springer, pp. 479-486.

Dahl, S., & Friberg, A. (2007). Visual perception of expressiveness in musicians' body movements. Music
Perception, Vol. 24, No. 5, pp. 433-454.

Davies, S. (2003). Themes in the Philosophy of Music. Oxford: Oxford University Press.

Davies, S. (2010). Emotions expressed and aroused by music: Philosophical perspectives. In: P.N. Juslin &
J. Sloboda (Eds.), Handbook of Music and Emotion: Theory, Research, and Applications. Oxford: Oxford
University Press, pp. 15-43.

Decety, J. (2011). Dissecting the neural mechanisms mediating empathy. Emotion Review, Vol. 3, No. 1, pp.
92-108.

Dewey, J. (1934). Art as Experience. New York: Penguin.

Di Dio, C., & Gallese, V. (2009). Neuroaesthetics: A review. Current Opinion in Neurobiology, Vol. 19, No.
6, pp. 682-687.

Freedberg, D., & Gallese, V. (2007). Motion, emotion, and empathy in esthetic experience. Trends in
Cognitive Sciences, Vol. 11, No. 5, pp. 197-203.

Fritz, T., Jentschke, S., Gosselin, N., Sammler, D., Peretz, I., Turner, R., Friederici, A.D., & Koelsch, S.
(2009). Universal recognition of three basic emotions in music. Current Biology, Vol. 19, No. 7, pp.
573-576.

Gabrielsson, A., & Juslin, P.N. (1996). Emotional expression in music performance: Between the
performer’s intention and the listener’s experience. Psychology of Music, Vol. 24, No. 1, pp. 68-91.

Gabrielsson, A., & Lindström, E. (2001). The influence of musical structure on emotional expression. In:
P.N. Juslin & J. Sloboda (Eds.), Music and Emotion: Theory and Research. Oxford: Oxford University
Press, pp. 223-248.

Gabrielsson, A., & Lindström, E. (2005). Emotional expression in synthesizer and sentograph performance.
Psychomusicology, Vol. 14, pp. 94-116.

Gabrielsson, A., & Lindström, E. (2010). The role of structure in the musical expression of emotions. In:
P.N. Juslin & J. Sloboda (Eds.), Handbook of Music and Emotion: Theory, Research, and Applications.
Oxford: Oxford University Press, pp. 367-400.

Gallagher, S., & Sørensen, J.B. (2006). Experimenting with phenomenology. Consciousness and Cognition,
Vol. 15, No. 26, pp. 119-134.

Gallese, V. (2010). Mirror neurons and art. In: F. Bacci & D. Melcher (Eds.), Art and the Senses. Oxford:
Oxford University Press, pp. 441-449.

Geiger, M. (1911). Über das Wesen und die Bedeutung der Einfühlung. In: F. Schumann (Ed.), Bericht über
den IV. Kongreß für experimentelle Psychologie in Innsbruck vom 19. bis 22. April 1910. Leipzig: Barth,
pp. 29-73.

Godøy, R.I., Haga, E., & Jensenius, A.R. (2006). Exploring music-related gestures by sound-tracking: A
preliminary study. In: K. Ng (Ed.), Proceedings of the COST287-ConGAS 2nd International Symposium on
Gesture Interfaces for Multimedia Systems, 9-10 May 2006, Leeds, UK, pp. 27-33.

Hausegger, F.v. (1887). Die Musik als Ausdruck. Vienna: Carl Konegen.

Huron, D. (2006). Sweet Anticipation: Music and the Psychology of Expectation. Cambridge, MA: MIT
Press.

Jola, C., Davis, A., & Haggard, P. (2011). Proprioceptive integration and body representation: insights into
dancers’ expertise. Experimental Brain Research, Vol. 213, Nos. 2-3, pp. 257-265.

Jola, C., Abedian-Amiri, A., Kuppuswamy, A., Pollick, F., & Grosbas, M.H. (2012). Motor simulation
without motor expertise: Enhanced corticospinal excitability in visually experienced dance spectators. PLoS
ONE, Vol. 7, No. 3, p. e33343.

Juslin, P.N. (1997). Emotional communication in music performance: A functionalist perspective and some
data. Music Perception, Vol. 14, No. 4, pp. 383-418.

Juslin, P.N. (2005). From mimesis to catharsis: Expression, perception, and induction of emotion in music.
In: R. MacDonald, D.J. Hargreaves, & D. Miell (Eds.), Musical Communication. Oxford: Oxford
University Press, pp. 85-115.

Juslin, P.N., & Laukka, P. (2004). Expression, perception, and induction of musical emotions: A review and
a questionnaire study of everyday listening. Journal of New Music Research, Vol. 33, No. 3, pp. 217-238.

Juslin, P.N., & Sloboda, J.A. (Eds.) (2001). Music and Emotion: Theory and Research. Oxford: Oxford
University Press.

Juslin, P.N., & Sloboda, J.A. (Eds.) (2010a). Handbook of Music and Emotion: Theory, Research, and
Applications. Oxford: Oxford University Press.

Juslin, P.N., & Sloboda, J.A. (2010b). The past, present, future of music and emotion research. In: P.N.
Juslin & J.A. Sloboda (Eds.), Handbook of Music and Emotion: Theory, Research, and Applications.
Oxford: Oxford University Press, pp. 933-955.

Juslin, P.N., & Timmers, R. (2010). Expression and communication of emotion in music performance. In:
P.N. Juslin & J.A. Sloboda (Eds.), Handbook of Music and Emotion: Theory, Research, and Applications.
Oxford: Oxford University Press, pp. 453-489.

Juslin, P.N., & Västfjäll, D. (2008). Emotional response to music: The need to consider underlying
mechanisms. Behavioral and Brain Sciences, Vol. 31, No. 5, pp. 559-575.

Juslin, P.N., Västfjäll, D., & Lundqvist, L. (2010). How does music evoke emotions? Exploring the
underlying mechanisms. In: P.N. Juslin & J.A. Sloboda (Eds.), Handbook of Music and Emotion: Theory,
Research, and Applications. Oxford: Oxford University Press, pp. 605-642.

Keysers, C., & Gazzola, V. (2009). Expanding the mirror: Vicarious activity for actions, emotions and
sensations. Current Opinion in Neurobiology, Vol. 19, No. 6, pp. 666-671.

Kim, J.H. (2010). Towards embodiment-based research on musical expressiveness. In: S. Flach, D.
Margulies, & J. Söffner (Eds.), Habitus in Habitat I: Emotion and Motion. Bern: Peter Lang, pp. 245-260.

Kim, J.H., Demey, M., Moelants, D., & Leman, M. (2010). Performance micro-gestures related to musical
expressiveness. In: S.M. Demorest, S.J. Morrison, & P.S. Campbell (Eds.), Proceedings of the 11th
International Conference on Music Perception and Cognition. Seattle, Washington: Causal Productions, pp.
827-833.

Kivy, P. (2001). Introduction to a Philosophy of Music. Oxford: Clarendon Press.

Koelsch, S. (2009). Music-syntactic processing and auditory memory: similarities and differences between
ERAN and MMN. Psychophysiology, Vol. 46, No. 1, pp. 179-190.

Koelsch, S. (2011). Towards a neural basis of processing musical semantics. Physics of Life Reviews, Vol.
8, No. 2, pp. 89-105.

Koelsch, S., Fritz, T., Schulze, K., Alsop, D., & Schlaug, G. (2005). Adults and children processing music:
an fMRI study. NeuroImage, Vol. 25, No. 4, pp. 1068-1076.

Küssner, M.B., Prior, H.M., Gold, N.E., & Leech-Wilkinson, D. (2012). Getting the shapes "right" at the
expense of creativity? How musicians' and non-musicians' visualizations of sound differ. In: E.
Cambouropoulos, C. Tsougras, P. Mavromatis, & K. Pastiadis (Eds.), Proceedings of the 12th International
Conference on Music Perception and Cognition, 23-28 July 2012, Thessaloniki, Greece, p. 561.

Langer, S.K. (1960 [1957]). Expressiveness and symbolism. From Problems of Art (1957). In: M. Rader
(Ed.), A Modern Book of Esthetics: An Anthology (3rd Edition). New York: Holt, Rinehart and Winston, pp.
248-258.

Levinson, J. (2006). Musical expressiveness as hearability-as-expression. In: M. Kieran (Ed.),
Contemporary Debates in Aesthetics and the Philosophy of Art. Malden, MA/Oxford: Blackwell, pp.
192-204.

Lipps, T. (1900). Aesthetische Einfühlung. Zeitschrift für Psychologie und Physiologie der Sinnesorgane,
Vol. 22, pp. 415-450.

Lipps, T. (1903). Einfühlung, innere Nachahmung, und Organempfindungen. Archiv für die gesamte
Psychologie, Vol. 1, pp. 185-204.

Lutz, A., Lachaux, J.P., Martinerie, J., & Varela, F. (2002). Guiding the study of brain dynamics by using
first-person data: Synchrony patterns correlate with ongoing conscious states during a simple visual task.
Proceedings of the National Academy of Sciences, Vol. 99, No. 3, pp. 1586-1591.

Meyer, L.B. (1956). Emotion and Meaning in Music. Chicago/London: The University of Chicago Press.

Northoff, G., & Heinzel, A. (2006). First-Person Neuroscience: A new methodological approach for linking
mental and neuronal states. Philosophy, Ethics, and Humanities in Medicine, Vol. 1, No. 3.

Petitmengin, C. (2006). Describing one’s subjective experience in the second person: An interview method
for the science of consciousness. Phenomenology and the Cognitive Sciences, Vol. 5, Nos. 3-4, pp. 229-269.

Petitmengin, C. (2007). Towards the source of thoughts: The gestural and transmodal dimension of lived
experience. Journal of Consciousness Studies, Vol. 14, No. 3, pp. 54-82.

Petitmengin, C. (2011). Describing the experience of describing? The blind spot of introspection. Journal of
Consciousness Studies, Vol. 18, No. 1, pp. 44-62.

Petitmengin, C., Baulac, M., & Navarro, V. (2006). Seizure anticipation: Are neurophenomenological
approaches able to detect preictal symptoms? Epilepsy & Behavior, Vol. 9, No. 2, pp. 298-306.

Petitmengin, C., Navarro, V., & Le Van Quyen, M. (2007). Anticipating seizure: Pre-reflective experience
at the center of neurophenomenology. Consciousness and Cognition, Vol. 16, No. 3, pp. 746-764.

Pigman, G.W. (1995). Freud and the history of empathy. The International Journal of Psycho-Analysis, Vol.
76, No. 2, pp. 237-256.

Rizzolatti, G., & Sinigaglia, C. (2007). Mirrors in the Brain: How Our Minds Share Actions, Emotions, and
Experience. Oxford: Oxford University Press.

Scherer, K.R. (2004). Which emotions can be induced by music? What are the underlying mechanisms?
And how can we measure them? Journal of New Music Research, Vol. 33, No. 3, pp. 239-251.

Scherer, K.R., & Zentner, M. R. (2001). Emotional effects of music: production rules. In: P.N. Juslin & J.A.
Sloboda (Eds.), Music and Emotion: Theory and Research. Oxford: Oxford University Press, pp. 361-387.

Schubert, E. (2010). Continuous self-report methods. In: P.N. Juslin & J.A. Sloboda (Eds.), Handbook of
Music and Emotion: Theory, Research, and Applications. Oxford: Oxford University Press, pp. 223-253.

Stern, D.N. (1985). The Interpersonal World of the Infant. New York: Basic Books.

Stern, D.N. (2010). Forms of Vitality: Exploring Dynamic Experience in Psychology, the Arts,
Psychotherapy, and Development. New York: Oxford University Press.

Tillmann, B. (2005). Implicit investigations of tonal knowledge in nonmusician listeners. Annals of the New
York Academy of Sciences, Vol. 1060, pp. 100-110.

Tillmann, B., Janata, P., & Bharucha, J. (2003). Activation of the inferior frontal cortex in musical priming.
Cognitive Brain Research, Vol. 16, No. 2, pp. 145-161.

Truslit, A. (1938). Gestaltung und Bewegung in der Musik. Berlin: Chr. Friedrich Vieweg.

Varela, F.J. (1996). Neurophenomenology. Journal of Consciousness Studies, Vol. 3, No. 4, pp. 330-349.

Zentner, M., Grandjean, D., & Scherer, K.R. (2008). Emotions evoked by the sound of music:
Characterization, classification, and measurement. Emotion, Vol. 8, No. 4, pp. 494-521.


Response to Jin Hyun Kim


Dynamics of Musical Expression
MINE DOĞANTAN-DACK
Middlesex University, London, UK

ABSTRACT: This commentary focuses on three aspects of the experience of musical
expressiveness as a dynamically emergent phenomenon: the nature of the experience itself, the
nature of the dynamic forms of music, and the process of shaping music expressively in the act
of music-making.

Submitted 2013 May 15; accepted 2013 May 30.

KEYWORDS: musical expression, forms of vitality, musical movement, kinaesthesis

IN the wake of the collapse of the Cartesian philosophy of mind in Western thought, much progress has
been made in neuroscience over the last three decades in unravelling the nature and mechanisms of
mental phenomena. Arguably the most powerful premise to emerge from this research is the
inadequacy of dualistic conceptions that have severed the mind from the body, thinking from emotions,
and the self from the rest of the world in mainstream philosophical and psychological thought for the
greater part of the 20th century. Recent research on the human brain carried out through advanced
technological means provides overwhelming evidence to dissolve decisively these conceptual dualities,
and to support the hypotheses that ‘the mind’ as the site of mental phenomena is fully embodied
(Damasio, 2000; Edelman, 2007); that affective experiences including emotions constitute an integral
background for cognitive processes (Damasio, 1994, 2000; LeDoux, 1999; Sousa, 1987); and that
subjectivities are grounded in inter-subjective experiences (Bråten, 1998, 2009; Stern, 1985;
Trevarthen, 1993). Concomitant with these developments has been a general shift of epistemological
emphasis in contemporary psychology towards a concern with dynamic aspects of experience, such
that conscious experience is regarded as grounded in the kinesthetic awareness of the dynamic
processes unfolding in and with the body as it moves (Sheets-Johnstone, 2011; Stern, 1985). I therefore
welcome Jin Hyun Kim’s timely contribution which – in keeping with recent research in neuroscience
and psychology – proposes a dynamic view of musical expression as an emergent phenomenon rooted
in fundamental felt experiences associated with the movements of our own bodies and of other
dynamic events. In her article, Kim identifies these basic felt experiences as ‘forms of vitality’,
borrowing a notion from developmental psychologist Daniel Stern (2010).
In this commentary I should like to address three questions that are fundamental for Kim’s
proposed theoretical framework and also for research in musical expression. These are: 1) what does an
experience of musical expressiveness consist of?; 2) what exactly are forms of vitality in relation to
music and musical experiences?; and 3) what is involved in shaping music expressively? But first, a
note on the originality of Kim’s approach would be useful: while, to my knowledge, the conceptual
coupling of music making (performer) and perception (listener) in the emergence of the experience of
musical expressiveness has not been explored before, the individual components of the proposed
theoretical framework are not entirely novel in empirical music psychological research. For instance,
“how musical expressiveness emerges in the course of music-making”, i.e. as a dynamic process, which
is one of the central questions Kim poses, has been a core concern in research on expressive music
performance. Indeed, some of this research deals with the origins of the dynamic processes, e.g.
changing tempos and dynamic intensities, observed in the acoustical characteristics of musical
performances, positing the dynamics and constraints of motor actions as just such an origin (e.g. Todd,
1992). Leech-Wilkinson’s research on performance styles (2009) is also largely concerned with the
dynamic nature and features of expressive music making. The idea that listening is not a passive
reception of aural impressions but an active process involving bodily participation, which is another
aspect of the framework proposed by Kim, has also been previously articulated in research (Cox, 2011;
Cross, 2010; Godøy, 2003). Finally, while the introduction of the term “forms of vitality” into a
discussion of musical expressiveness is a novel feature of her approach, as Kim conceptually aligns
forms of vitality with the theory of dynamo-agogics proposed by Alexander Truslit, one can find
precedents in expressive performance research also for this kind of conceptualization of musical
expressiveness (e.g. Repp, 1996).
What does an experience of musical expressiveness consist of? I am very supportive of Kim’s
attempt to broaden the conceptual space of musical expressiveness beyond the expression of emotions,
understood as discrete experiential categories with labels such as sadness, happiness, fear, anger, etc.
Defined in this way, I do not think emotion is necessary for the creation and reception of
expressiveness in music, and psychological research has arguably put too much emphasis on emotions
in relation to other kinds of greatly varied experiential qualities that accompany the experience of
musical expressiveness. What is necessary, however, for anyone – performer or listener – to experience
music as expressive is a certain valorized felt affect, which does not necessarily lead to a named
emotion. Kim’s emphasis on the idea that music is regarded as expressive to the extent that the
individual experiences it in terms of – or, through the lens of – an affective component is thus very
apposite. For purposes of research, it is advisable not to treat musical expression as an ontological sub-
category of emotional expression, as this pushes a wide range of other experiential phenomena related
to musical expression that do not represent emotions out of the conceptual range of the researcher
from the start. I am not convinced, however, that the experience of musical expression needs to be
treated necessarily as an aesthetic experience, “a special form of musical experience not yet
investigated in much detail within the scope of music and emotion research” in Kim’s words. While
some experiences of music’s expressiveness can indeed have the characteristics of an aesthetic
experience as defined by John Dewey and adopted by Kim, it is not an empirically established fact, nor
is it conceptually necessary, that all experiences of musical expressiveness be aesthetic in nature: for
example, humans can react to a short, fleeting moment of music they hear through a valorized affective
response, where the “moment” thus experienced does not “reach an inclusive and fulfilling close” or
“accumulate toward objective fulfillment” as in Dewey’s theory of aesthetic experience. In reality,
there exists a great variety of listening practices that do not correspond to the normative or idealized
listening mode – “fixated listening” (Biddle, 2011) – promoted in musicology. What is needed is
extensive research on how different listening modes influence the reception as well as the creation of
expressiveness in musical practices. Perhaps the most serious issue regarding the theoretical framework
Kim proposes, however, is the lack of cultural considerations: research indicates that the
expressiveness of performed music is contingent on historical-cultural circumstances (Leech-
Wilkinson, 2009), and that there are learned, idiomatic affective responses that humans display when
engaging with music as part of their acquired cultural behaviour (e.g., Solís, 2004). We do not
automatically respond to “foreign” musical idioms through such learned characteristic responses.
Furthermore, descriptions people give of the effects of musical expressiveness are consistently
mediated through culture-specific values (e.g., Feld, 1990), and I, for one, would be interested in seeing
Kim’s proposed theoretical framework develop in this direction, i.e. to incorporate the culture-specific
evaluative responses that are an essential part of the ways people talk about their experiences of
musical expressiveness.
What exactly are forms of vitality in relation to music and musical experiences? In her
opening statement, Kim writes that her article “considers expressive forms of music as ‘forms of
vitality.’” Although the article does not make explicit where these forms inhere or whence they emerge,
one gleans from the text that “expressive forms of music are conceived of as shaped forms of vitality”,
implying that they emerge during the act of music making, i.e. in performance. According to Stern
(2010, p. 20), “concerning art, it is obvious that vitality dynamics is a fundamental aspect of
performance in the time-based arts”, yet the idea of forms of vitality as emergent in the act of music
making confronts us with a kind of complexity that is not evident, for example, in dance performance.
Given that forms of vitality are related to qualities associated with movement, the particular forms of
vitality shaped by a dancer as she moves, thus creating “the dance”, are one and the same as the
movements of the dancer’s body: in this regard, the dance and the dancer are one. In music, however,
there are two different kinds of forms of vitality that are shaped concurrently, yet are not identical:
those constituted by the movements of “the music”, i.e. dynamic forms that emerge as we perceive
tonal relationships unfold in time, and those constituted by our perception of the movements of the
performer’s body. We are far from understanding how exactly these two distinct dynamic sources of
musical expressiveness might be related, giving rise to a unified experiential Gestalt. To give one
example, in the Finale of Beethoven’s Piano Trio Op. 1 No. 3 in C minor, an expressively awe-
inspiring moment unfolds between measures 170-172: in measure 170, the music cadences in F major
with a crotchet-long root-position tonic chord in the piano, and an F in the violin. The piano part
rests for the remainder of measure 170 and throughout measure 171, while the violin sustains the F from measure 170 to
the middle of measure 172, when the piano enters with an arpeggiated root-position D-flat major chord:
physically, the pianist is still, or poised through measure 171, i.e. does not display any notable external
movement between mm. 170 and 172, and the violinist is only moving to sustain one note – in terms of
the forms of vitality that are being shaped through the bodies of the performers, there is minimal
physical activity, and hence minimal movement, force or directionality displayed in physical space.
With the onset of measure 172, however, we realize that between measures 170 and 172, a magical
musical movement has taken place in pitch space, with a definite sense of force, temporal unfolding
and directionality, to a location that is not exactly very near: a movement embodied as a particular form
of vitality within a particular kind of tonal pitch space. The experience of the expressiveness of this
passage emerges from both kinds of forms of vitality unfolding simultaneously. Furthermore, it is not
difficult to see that due to the different gestural affordances of each kind of musical instrument, the
emergent forms of vitality giving rise to experiences of expressiveness would be different when a given
melody is played on a piano and on a flute, for example, even as the dynamic qualities of the melodic
movement might remain identical (I say “might” here since the same notes played on the same kind of
instrument can acquire different forms of vitality when played in different tempos, etc.). It is precisely
the dynamic relationship between our experience of the movements in physical space of humans
making music and the movements of music in tonal space that needs to be investigated and
incorporated into Kim’s theoretical framework if the concept of “forms of vitality” is to be applied
effectively in the context of musical expressiveness, and the mechanisms responsible for its emergence
are to be investigated. In other words, we need extensive research to understand what the forms of
vitality emerging from his/her bodily movements in making music feel like for the performer; how
these relate to the experience of emergent ‘musical’ forms of vitality generated by the experience of
tonal relationships; and how both of these relate to the listener’s felt experience of forms of vitality as a
unified Gestalt in an experience of musical expressiveness. Furthermore, since in different musical
cultures, pitch spaces are configured differently, creating different motional affordances, the cultural-
specificity of musical forms of vitality should be part of any research agenda in musical
expressiveness. In this connection, the distinction Sheets-Johnstone (2011, p. 305) makes between
“form values” and “animate values” could be a useful tool, since “unlike form values, in the strict sense
of morphology, [animate values] are (or can be) differently modified by culture.”
What is involved in shaping music expressively in the act of music-making? According to Kim,
the act of music-making, as well as the act of listening, both involve a process of “kinaesthetic
simulation”. It is not difficult to understand how the listener’s experience of expressiveness may
involve kinaesthetic images accompanying motor simulation: there is evidence in research, some of
which is also cited by Kim, suggesting that such a process indeed takes place in engaging with music as
listeners (Godøy, 2010; Leman, 2010). However, it is not entirely clear what it is exactly that the
performer simulates: is it the experienced dynamic relationships between the tones, i.e. musical
movement? If so, is there a one-to-one correspondence between musical movement and kinaesthetic
movement? Such important questions are not addressed in Kim’s article, leaving the proposed
theoretical framework conceptually in need of refinement. Furthermore, researchers need to be wary of
drawing facile analogies between the activity of making music and that of listening: while listening
may indeed involve a certain co-shaping process, we are far from being able to assert confidently, on the
basis of research, that there is a substantial similarity in the bodily and affectively felt qualities of the
forms of vitality as experienced by the person actively producing the musical sounds and the person
observing/listening to this process of shaping. It is not a clearly established fact that the intense
physicality of making music constitutes a difference only in degree in comparison to the physicality of
listening to music: performers, unlike listeners, come to know and represent the expressive details of
the music they play kinaesthetically. Information regarding expressiveness is imprinted in their
musculature, and committed to long-term memory, unlike in the case of non-performing listeners.
Hence, extensive research is needed to further articulate the exact nature of the co-shaping process Kim
puts forward as part of the emergent experience of musical expressiveness.
Jin Hyun Kim presents a theoretical approach to musical expressiveness that promises to go
beyond the mainstream conceptualization of the phenomenon as a subspecies of emotional expression.
Her orientation towards the inter-subjectively shared and understood forms of vitality as a basis for
exploring expressiveness in music is a novel approach. Empirical investigations of her hypotheses will
no doubt reveal much about the mysteries and magic of the universal human capacity for expressive
music-making.

REFERENCES

Biddle, I. (2011). Listening, consciousness, and the charm of the universal: What it feels like for a
Lacanian. In: D. Clarke & E. Clarke (Eds.), Music and Consciousness: Philosophical, Psychological,
and Cultural Perspectives. New York: Oxford University Press, pp. 65-78.

Bråten, S. (Ed.) (1998). Intersubjective Communication and Emotion in Early Ontogeny. Cambridge:
Cambridge University Press.

Bråten, S. (2009). The Intersubjective Mirror in Infant Learning and Evolution of Speech. Philadelphia,
PA: John Benjamins Publishing.

Cox, A. (2011). Embodying music: Principles of mimetic hypothesis. Music Theory Online, Vol. 17,
No. 2. URL: www.mtosmt.org/issues/mto.11.17.2/mto.11.17.2.cox.html

Cross, I. (2010). Listening as covert performance. Journal of the Royal Musical Association, Vol. 135,
Supplement 1, pp. 67-77.

Damasio, A. (2000). The Feeling of What Happens: Body, Emotion and the Making of Consciousness.
London: Vintage.

Damasio, A. (1994). Descartes’ Error. New York: Harper Collins.

Edelman, G. (2007). Second Nature: Brain Science and Human Knowledge. London: Yale University
Press.

Feld, S. (1990). Sound and Sentiment: Birds, Weeping, Poetics, and Song in Kaluli Expression.
Philadelphia: University of Pennsylvania Press.

Godøy, R.I. (2010). Gestural affordances of musical sound. In: R.I. Godøy & M. Leman (Eds.),
Musical Gestures: Sound, Movement and Meaning. New York: Routledge, pp. 103-125.

Godøy, R.I. (2003). Motor-mimetic music cognition. Leonardo, Vol. 36, No. 4, pp. 317-319.

LeDoux, J. (1999). The Emotional Brain: The Mysterious Underpinnings of Emotional Life. London:
Phoenix.

Leech-Wilkinson, D. (2009). The Changing Sound of Music: Approaches to Studying Recorded
Musical Performance. London: CHARM.
URL: http://www.charm.kcl.ac.uk/studies/chapters/intro.html

Leman, M. (2010). Music, gesture and the formation of embodied meaning. In: R.I. Godøy & M.
Leman (Eds.), Musical Gestures: Sound, Movement and Meaning. New York: Routledge, pp. 126-153.

Repp, B. (1996). Dynamics of expressive piano performance: Schumann’s “Träumerei” revisited.
Journal of the Acoustical Society of America, Vol. 100, No. 1, pp. 641-650.

Sheets-Johnstone, M. (2011). The Primacy of Movement. Philadelphia: John Benjamins Publishing.

Solís, T. (Ed.) (2004). Performing Ethnomusicology: Teaching and Representation in World Music
Ensembles. Berkeley: University of California Press.

Sousa, R.D. (1987). The Rationality of Emotion. Cambridge, MA: MIT Press.

Stern, D. (1985). The Interpersonal World of the Infant: A View from Psychoanalysis and
Developmental Psychology. New York: Basic Books.

Stern, D. (2010). Forms of Vitality: Exploring Dynamic Experience in Psychology and the Arts.
Oxford: Oxford University Press.

Todd, N. (1992). The dynamics of dynamics: A model of musical expression. Journal of the Acoustical
Society of America, Vol. 91, No. 6, pp. 3540-3550.

Trevarthen, C. (1993). The self born in intersubjectivity: An infant communicating. In: U. Neisser
(Ed.), The Perceived Self: Ecological and Interpersonal Sources of Self-Knowledge. New York:
Cambridge University Press, pp. 121-173.

Empirical Aesthetics, Computational Cognitive Modeling, and
Experimental Phenomenology:
Methodological remarks on “Shaping and Co-Shaping Forms
of Vitality in Music: Beyond Cognitivist and Emotivist
Approaches to Musical Expressiveness” by Jin Hyun Kim
UWE SEIFERT
Institute of Musicology, University of Cologne

ABSTRACT: The core ideas of the proposed framework for empirical aesthetics are
interpreted as focusing on processes, interaction, and phenomenological experience.
This commentary first touches on some methodological impediments to developing
theories of processing and interaction, and emphasizes the necessity of computational
cognitive modeling using robots to test the empirical adequacy of such theories.
Further, the importance of developing and integrating phenomenological methods into
current experimental research is stressed, using experimental phenomenology as
reference. Situated cognition, affective computing, human-robot interaction research,
computational cognitive modeling and social and cultural neuroscience are noted as
providing relevant insight into the empirical adequacy of current theories of cognitive
and emotional processing. In the near future these fields will have a stimulating impact
on empirical aesthetics and research on music and the mind.

Submitted 2013 June 14; accepted 2013 June 17.

KEYWORDS: situated cognition, affective computing, action-oriented approaches

INTRODUCTION

WITH “Shaping and Co-Shaping Forms of Vitality in Music: Beyond Cognitivist and Emotivist
Approaches to Musical Expressiveness” Dr. Jin Hyun Kim proposes a conceptual framework for empirical
research on aesthetics, in particular the aesthetics of music. The topic of the proposal accords with current
discourse on aesthetics, emotions, consciousness, and empathy in philosophy, psychology, and the
neurosciences (Bacci & Melcher, 2011; Coplan & Goldie, 2011; Schellekens & Goldie, 2011; Shimamura &
Palmer, 2012). It is also related to current research on music, meaning, gesture, movement (Godøy &
Leman, 2010; Gritten & King, 2011), and communicative musicality (Malloch & Trevarthen, 2009).
At present, the meaningfulness of music is mainly discussed with a focus on sound, emotions, and
aesthetic emotions (Koelsch, 2013; Sander, 2013; Silvia, 2005, 2009; Robinson, 2009, 2010). Kim’s
analysis of current discussions in research on musical emotions and aesthetics in philosophy, psychology,
and cognitive neuroscience reveals a general empiricist stance associated with an epistemological subject-
object dualism leading to a methodological activity-passivity dichotomy in empirical research, in particular
in experimental design; in response to these epistemological and methodological problems, she offers a
framework extending such conceptualizations of music and aesthetics. The proposal’s core idea is that
musical expressiveness and aesthetic experience are shaped or formed during a continuous process of
realization, i.e. during actions of production and reception. Musical expressiveness, forms of vitality, and
shaping/co-shaping are introduced as key concepts. “Musical expressiveness” indicates that the proposed
approach to musical aesthetics encompasses appreciation as rooted in a subject, i.e. a person's phenomenal
experience, and an object, i.e. musical sound. “Forms of vitality” provide a means to reconcile the
epistemological subject and object of expressiveness, i.e. phenomenological experience and musical
sounds. The meaningfulness of aesthetic experience is created––shaped and co-shaped––by humans in a
process of continuous realization; the interactive processes that are the shaping and co-shaping of forms of
vitality constitute the basis for (re)creating meaningful aesthetic experiences leading to musical
expressiveness.

Notably, in general, this framework is in accordance with genetic epistemology (Piaget,
1970/1983) and with a schema-based action-oriented approach to action, music, and language (Arbib, 2003,
2013). Furthermore, Kim proposes developing an approach to experimental research in
neurophenomenology that deals with aesthetic experience from a first-person perspective. The conception
of aesthetic experience based on shaping and co-shaping is a promising basis for the development of a
feasible empirical research strategy as well as ideas for corresponding experimental designs. In total, Kim’s
proposal entails research on interaction, processes, and phenomenological experience. To this end, a theory
of processing and interaction in connection with a theory of consciousness is necessary, which, to my mind,
presents three further requirements for empirical aesthetics: 1) Integration of computational cognitive
modeling in addition to the development of experimental methods for studying mental processes (Bower &
Clapper, 1989); 2) computational models of emotional processes related to music and aesthetics; and 3) a
methodology for phenomenology in empirical research and experiments.

DYNAMICS OF INTERACTION: EMPIRICAL ADEQUACY AND COMPUTATIONAL
COGNITIVE MODELING

Efforts at analyzing and explaining interactions and processes, as well as testing proposed models and
theories, require computational cognitive modeling (Sun, 2008). In terms of an action-oriented approach to
music and aesthetics, it is necessary to give empirical evidence for a functional architecture consisting of
schemas, and to strive for an explanation of their dynamics; those dynamics can be studied by investigating
processes, which are realized by mechanisms carrying out specific operations. These mechanisms are
biophysically observable phenomena implementing operations; operations––e.g., attention, empathy,
emotion, memory––are hypothetical constructs, theoretical concepts to be related to biophysical processes
or structures. Despite several theoretical problems concerning the explanation of the concept of
computation, these operations—as parts of a functional architecture and potential processes in time—are
currently best conceived of as (effective) procedures, i.e. mathematical and logical descriptions (of
empirical phenomena) subsumable under the concept of Turing-machine computability.
My claim, then, is that computational cognitive modeling using robots is necessary to test the
empirical adequacy of theories of processing and interaction. Computational cognitive modeling provides a
feasibility test for hypotheses on the realizability of processes and interactions implied by a certain model.
If computational modeling is used in connection with robots, the feasibility test becomes a test of empirical
adequacy––real-time exhibition of a task in our world. Successfully dancing a sarabande, for example,
could be a robot’s task in testing the empirical adequacy of a dynamic systems approach to attention and
rhythm perception in music (Large, 2010; Port, 2003) using computational cognitive modeling with a robot.
In general, successfully testing a model in a scenario involving robots is evidence of its real-world
realizability, and the model not only serves as a summary of empirical data with some predictive capacity
that relies only on plausibility arguments for its empirical relevance but becomes an empirically confirmed
hypothesis. Another more general advantage of computational cognitive modeling is that the metaphor
“cognition as computational process or computation” often used in (cognitive) psychology and the
neurosciences becomes a scientifically relevant concept providing theoretically and empirically convincing
explanations. In addition, computational cognitive modeling using robots provides a bridge to frameworks
for more rigorous analysis of theories and formal descriptions of processes and interactions (cf. Wells,
2006). These formal theories or concepts in turn might be used metaphorically as heuristics in empirical
research and their empirical adequacy tested by computational cognitive modeling.
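
As a rough illustration of what such a feasibility test might look like at its very smallest scale, the following Python sketch lets a single oscillator with phase and period correction track an isochronous pulse. It is a toy linear error-correction scheme, not the neurodynamic model of Large (2010) nor any model proposed in the literature cited here; the function name `entrain`, the gain values, and the stimulus are all assumptions introduced purely for illustration.

```python
def entrain(onsets, initial_period, phase_gain=0.5, period_gain=0.2):
    """Track onset times (in seconds) with a phase- and period-correcting oscillator.

    Hypothetical toy model: after each onset, the oscillator shifts its next
    prediction (phase correction) and adjusts its period (period correction)
    in proportion to the asynchrony between the onset and its prediction.
    Returns the final period and the list of asynchronies.
    """
    period = initial_period
    expected = onsets[0] + period          # first predicted beat
    asynchronies = []
    for onset in onsets[1:]:
        error = onset - expected           # negative if the onset came early
        asynchronies.append(error)
        period += period_gain * error      # period correction
        expected += phase_gain * error + period  # phase correction, then step
    return period, asynchronies


# Assumed stimulus: 40 onsets at 0.5 s intervals; the oscillator starts too slow.
pulse = [0.5 * i for i in range(40)]
final_period, asynchronies = entrain(pulse, initial_period=0.6)
print(round(final_period, 3), abs(asynchronies[-1]) < 1e-3)
```

If the asynchronies shrink toward zero, the hypothesized correction scheme is at least realizable for this stimulus. Running a comparable loop on a robot's sensor input in real time, with a task such as dancing to the music, would be the far more demanding test of empirical adequacy argued for above.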
As long as such functional architectures have not been successfully tested under real-world
conditions—specifically, implemented with robots or virtual agents interacting with our world—their
empirical adequacy remains unsettled and their scientific validity open to question. Computational
cognitive modeling using robots is thus essential for testing the empirical adequacy of mathematical
models, explicit theories, and formal models of processes and interaction; to my mind, it is at present the
most appropriate method for testing the empirical adequacy of theories of processing and interaction, as
well as crucial in the development of such theories in the first place.

AESTHETICS: AFFECTIVE COMPUTING AND EMOTIONAL PROCESSES

In aesthetics and music––and cognition in general (Pessoa, 2009)––the role of emotions cannot be
neglected. In developing a theory of aesthetic or music appreciation which takes the dynamics of processes
and interactions into account, it is important to study emotions as modes of operations instead of entities
(Fellous, 2006). The existing important models of emotional processing, Scherer (2010) and Sander (2013),
provide hypotheses about functional architectures; however, the alleged interactions of their components
and time courses are merely metaphorical descriptions of the processes involved. Their empirical adequacy
as models or theories of processing thus needs to be tested in future research.
To get a clearer view of the dynamics of musical and aesthetic emotions, experimental research is
necessary, but insufficient by itself. Testing current theories based on conceptual analysis and empirical
evidence—in pursuit of developing ideas about the dynamics of aesthetic emotions as feelings, their role in
social interaction, and their cultural embedding—requires computational modeling of emotional processes.
Much work remains to be done towards this end, in general and in research on music in particular. In music
research, computational cognitive modeling is still in its infancy; recent surveys also reveal that
computational models of musical emotion processing are lacking (Eerola, 2012; Purwins, Grachten,
Herrera, Hazan, Marxer, & Serra, 2008; Purwins, Herrera, Grachten, Hazan, Marxer, & Serra, 2008;
Temperley, 2013). The complexity of the situation is further heightened because for emotional processing in
aesthetics and music, the role of phenomenological consciousness, social interaction, and cultural
situatedness needs to be taken into account.[1] A schema theoretic and action-oriented approach to
empirical aesthetics and music needs to be extended to research on social schemas to enable this (Seifert,
2011).
Several first proposals for principles or guidelines concerning computational modeling of emotion,
in particular emotion generation and effect, exist (Fellous & LeDoux, 2005; Gunes & Schuller, 2013;
Hudlicka, 2011; Marsella, Gratch, & Petta, 2010; Roesch, Korsten, Fragopanagos, Taylor, Grandjean, &
Sander, 2011). New insights into and stimulation of research on aesthetics and music are offered by action-
oriented approaches and situated cognition (Robbins & Aydede, 2009; Seifert, 2011; Seifert et al., 2013),
affective computing (Gökçay & Yildirim, 2011; Scherer, Bänziger, & Roesch, 2010), human-robot
interaction (Dautenhahn, 2007), and musical robotics (Solis & Ng, 2011)––or more generally, research on
social human-agent interaction, in particular in New Media Art and entertainment—in connection with
cultural and social cognitive neuroscience (Han, Northoff, Vogeley, Wexler, Kitayama, & Varnum, 2013) as
well as phenomenological approaches in experimental research (cf. Albertazzi, 2013).

AESTHETIC EXPERIENCE: PHENOMENOLOGICAL DESCRIPTIONS AND
EXPERIMENTAL RESEARCH

Empirical research on aesthetics, music, and emotion cannot avoid taking consciousness and experience,
i.e. a first-person perspective or the phenomenological mind (Jackendoff, 1987), into account (Clarke &
Clarke, 2011; Leman, 2008; Northoff, 2012; Shimamura & Palmer, 2012; Tsuchiya & Adolphs, 2007).
Although there is an enormous number of techniques available in the current methodology of psychology,
an explicit methodology of experiential descriptions to study consciousness and phenomenal experience is
lacking; in experimental and cognitive psychology and a fortiori in cognitive neuroscience there are almost
no methods taking the first-person perspective into consideration (Gadenne, 1996, p. 13).
Kim is well aware of such methodological desiderata and proposes a neurophenomenological
approach, which might offer the possibility of linking descriptions of phenomenological experiences with
neuroscientific data of their neural substrates. To advance methodological thinking and develop new
experimental designs in current music research and aesthetics, a discussion about introspection,
phenomenological methods and practical applications in experimental contexts seems necessary. Some
remarks on experimental phenomenology are here presented to point to possible topics for future
discussion, and to add ideas about the role of phenomenology in cognitive science and empirical research
(cf. Petitot, Varela, Pachoud, & Roy, 1999).
Rainer Mausfeld (2011; 2013) advocates experimental phenomenology in research on perception
and draws attention to the importance of careful phenomenological descriptions in investigating
fundamental principles of perception. Such descriptions should precede theoretical and experimental
considerations, and “achievements must be not contaminated with physical or physiological
considerations” (Mausfeld, 2013, p. 95). In accordance with Mausfeld and to ensure greater reliability of
introspective reports in neuroscientific investigations, Shallice and Cooper (2011, p. 443) propose initially
using phenomenological descriptions, followed by “correlat[ion] with cognitive neuroscience evidence.”
They conclude that the subject’s expertise is relevant to the task, and that degree of expertise “should
depend on the type of task.” This raises methodological questions for experimental studies investigating
aesthetic experience: What tasks might be relevant to investigate aesthetic experience, and thus should be
used in experimental design? Depending on the task, what degree of expertise should the subjects have?
How could that degree be determined? What are relevant experiences to focus on?

Experimental phenomenology requires methodological reconsiderations about experimental
research (cf. Kubovy, 1999, p. 347). These reconsiderations might take phenomenological descriptions, such
as those of the (philosophical and psychological) phenomenology in the Husserlian tradition, into account
(Bischof, 2009; Dreyfus, 1982; Petitot et al., 1999). For example, such descriptions could be used to modify
or refine experimental and modeling studies. Considerations in experimental phenomenology can be found
in Gestalt or act psychology as developed by Carl Stumpf, Vittorio Benussi, Liliana Albertazzi, Giovanni
Vicario, and Paolo Bozzi, to name just a few. In empirical research investigating mental processes,
phenomenological reports might also be used to reconstruct the conceptual framework determining the
(cultural and social) constraints involved in structuring aesthetic processing. Then, such reports might offer
the possibility of discovering or at least giving some hints about relevant general structures, entities, or
operations involved in the functional architecture underlying actual mental processing.

CONCLUDING REMARKS

The previous remarks outline the challenging and highly complex problems in the study of empirical
aesthetics focusing on processes and interactions, and the consequent urgent need to develop appropriate
concepts, tools and empirical methods. New theoretical concepts and research methods for the acquisition
of relevant data and tools for analysis are required to advance empirical aesthetics conceptually and
empirically as the theory of processing and interaction. While advancing the field calls for an integration of
experimental research and computational cognitive modeling, such modeling of emotional processes in
aesthetics and music at present is lacking. Moreover, empirical aesthetics has to address methodological
problems concerning research on the phenomenal mind, such as phenomenal consciousness, second-order
consciousness, intentionality, and first-person perspective. As a result, computational modeling of
emotional processes is particularly essential.
Computational cognitive modeling, situated cognition, affective computing, musical robotics, and
human-robot interaction––in connection with experimental phenomenology, and social and cultural
neuroscience––will provide new data and insights and thus will, in the near future, have a stimulating
impact on empirical aesthetics, music research, and research on the (individual and collective) mind. A joint
endeavor of theorists, experimentalists, and modelers from different disciplines will be necessary to deal
with these challenges and possibilities.

NOTES

[1] For a brief discussion about the possibility of feelings in animals and animats, i.e. machines, cf. Cruse
(2009, pp. 185-190).

REFERENCES

Albertazzi, L. (Ed.) (2013). Handbook of Experimental Phenomenology: Visual Perception of Shape,
Space, and Appearance. Chichester: Wiley.

Arbib, M.A. (2003). Schema theory. In: M.A. Arbib (Ed.), The Handbook of Brain Theory and Neural
Networks. Cambridge, MA: MIT Press, pp. 993-998.

Arbib, M.A. (Ed.) (2013). Language, Music, and the Brain: A Mysterious Relationship. Cambridge, MA:
MIT Press.

Bacci, F., & Melcher, D. (Eds.) (2011). Art and the Senses. Oxford: Oxford University Press.

Bischof, N. (2009). Psychologie: Ein Grundkurs für Anspruchsvolle. Stuttgart: Kohlhammer.

Bower, G.H., & Clapper, J.P. (1989). Experimental methods in cognitive science. In: M.I. Posner (Ed.),
Foundations of Cognitive Science. Cambridge, MA: MIT Press, pp. 245-300.

Clarke, D., & Clarke, E. (Eds.) (2011). Music and Consciousness: Philosophical, Psychological, and
Cultural Perspectives. Oxford: Oxford University Press.

Coplan, A., & Goldie, P. (Eds.) (2011). Empathy: Philosophical and Psychological Perspectives. Oxford:
Oxford University Press.

Cruse, H. (2009). Neural Networks as Cybernetic Systems. 3rd and Revised Edition. Bielefeld: Brain, Minds
& Media.

Dautenhahn, K. (2007). Socially intelligent robots: dimensions of human-robot interaction. In: N. Emery,
N. Clayton, & C. Frith (Eds.), Social Intelligence: From Brain to Culture. Oxford: Oxford University Press,
pp. 313-351.

Dreyfus, H.L. (Ed.) (1982). Husserl, Intentionality, and Cognitive Science. Cambridge, MA: MIT Press.

Eerola, T. (2012). Modeling listeners’ emotional response to music. Topics in Cognitive Science, Vol. 4,
No. 4, pp. 607-624.

Fellous, J.-M. (2006). A mechanistic view of the expression and experience of emotion in the arts (book
review of Deeper Than Reason: Emotion and Its Role in Literature, Music, and Arts by Jenefer Robinson,
Oxford: Oxford University Press, 2005). American Journal of Psychology, Vol. 119, No. 4, pp. 668-674.

Fellous, J.-M., & LeDoux, J.E. (2005). Toward basic principles for emotional processing: what the fearful
brain tells the robot. In: J.-M. Fellous, & M.A. Arbib (Eds.), Who Needs Emotions? The Brain Meets the
Robot. Oxford: Oxford University Press, pp. 79-115.

Gadenne, V. (1996). Bewußtsein, Kognition und Gehirn – Einführung in die Psychologie des
Bewußtseins. Bern: Hans Huber.

Godøy, R.I., & Leman, M. (Eds.) (2010). Musical Gestures: Sound, Movement, and Meaning. New York:
Routledge.

Gökçay, D., & Yildirim, G. (Eds.) (2011). Affective Computing and Interaction: Psychological, Cognitive
and Neuroscientific Perspectives. Hershey: Information Science Reference.

Gritten, A., & King, E. (Eds.) (2011). New Perspectives on Music and Gesture. Farnham: Ashgate.

Gunes, H., & Schuller, B. (2013). Categorical and dimensional affect analysis in continuous input: Current
trends and future directions. Image and Vision Computing, Vol. 31, No. 2, pp. 120-136.

Han, S., Northoff, G., Vogeley, K., Wexler, B.E., Kitayama, S., & Varnum, M.E.W. (2013). A cultural
neuroscience approach to the biosocial nature of the human brain. Annual Review of Psychology, Vol. 64,
pp. 335-359.

Hudlicka, E. (2011). Guidelines for designing computational models of emotion. International Journal of
Synthetic Emotions, Vol. 2, No. 1, pp. 26-79.

Jackendoff, R. (1987). Consciousness and the Computational Mind. Cambridge, MA: MIT Press.

Koelsch, S. (2013). Music and emotion. In: J. Armony & P. Vuilleumier (Eds.), The Cambridge Handbook
of Human Affective Neuroscience. Cambridge: Cambridge University Press, pp. 286-303.

Kubovy, M. (1999). Gestalt psychology. In: R.A. Wilson & F.C. Keil (Eds.), The MIT Encyclopedia of the
Cognitive Sciences. Cambridge, MA: MIT Press, pp. 346-349.

Large, E.C. (2010). Neurodynamics of music. In: M.R. Jones, R.R. Fay, & A.N. Popper (Eds.), Music
Perception. New York: Springer, pp. 201-232.

Leman, M. (2008). Embodied Music Cognition and Mediation Technology. Cambridge, MA: MIT Press.

Malloch, S., & Trevarthen, C. (Eds.) (2009). Communicative Musicality: Exploring the Basis of Human
Companionship. Oxford: Oxford University Press.

Marsella, S., Gratch, J., & Petta, P. (2010). Computational models of emotion. In: K.R. Scherer, T.
Bänziger, & E. Roesch (Eds.), A Blueprint for Affective Computing: A Sourcebook and Manual. Oxford:
Oxford University Press, pp. 21-41.

Mausfeld, R. (2011). Intrinsic multiperspectivity: Conceptual forms and the functional architecture of the
perceptual system. In: W. Welsch, W.J. Singer, & A. Wunder (Eds.), Interdisciplinary Anthropology:
Continuing Evolution of Man. Berlin: Springer, pp. 19-54.

Mausfeld, R. (2013). The attribute of realness. In: L. Albertazzi (Ed.), Handbook of Experimental
Phenomenology: Visual Perception of Shape, Space, and Appearance. Chichester: Wiley.

Northoff, G. (2012). From emotions to consciousness: a neuro-phenomenal and neuro-relational approach.
Frontiers in Psychology: Emotion Science, Vol. 3, No. 303, pp. 1-17.

Pessoa, L. (2009). Cognition and emotion. Scholarpedia, Vol. 4, No. 1, p. 4567.

Petitot, J., Varela, F.J., Pachoud, B., & Roy, J.-M. (Eds.) (1999). Naturalizing Phenomenology: Issues in
Contemporary Phenomenology and Cognitive Science. Stanford, CA: Stanford University Press.

Piaget, J. (1970/1983). Piaget's theory. In: P. Mussen (Ed.), Handbook of Child Psychology, Vol. I. 4th
Edition. New York: Wiley.

Port, R.F. (2003). Dynamical systems hypothesis in cognitive science. In: L. Nadel (Ed.), Encyclopedia of
Cognitive Science. Hoboken, NJ: Wiley, pp. 1-6.

Purwins, H., Grachten, M., Herrera, P., Hazan, A., Marxer, R., & Serra, X. (2008). Computational models
of music perception and cognition II: Domain-specific music processing. Physics of Life Reviews, Vol. 5,
No. 3, pp. 169-182.

Purwins, H., Herrera, P., Grachten, M., Hazan, A., Marxer, R., & Serra, X. (2008). Computational models
of music perception and cognition I: The perceptual and cognitive processing chain. Physics of Life Reviews,
Vol. 5, No. 3, pp. 151-168.

Robbins, P., & Aydede, M. (2009). A short primer on situated cognition. In: P. Robbins & M. Aydede (Eds.),
The Cambridge Handbook of Situated Cognition. Cambridge: Cambridge University Press, pp. 3-10.

Robinson, J. (2009). Aesthetic emotions (philosophical perspectives). In: D. Sander & K.R. Scherer (Eds.),
The Oxford Companion to Emotion and the Affective Sciences. Oxford: Oxford University Press, pp. 6-9.

Robinson, J. (2010). Emotional responses to music: What are they? How do they work? And are they
relevant to aesthetic appreciation? In: P. Goldie (Ed.), The Oxford Handbook of Philosophy of Emotion.
Oxford: Oxford University Press, pp. 651-680.

Roesch, E.B., Korsten, N., Fragopanagos, N.F., Taylor, J.G., Grandjean, D., & Sander, D. (2011).
Biological and computational constraints to psychological modeling of emotion. In: P. Petta, C. Pelachaud,
& R. Cowie (Eds.), Emotion-Oriented Systems: The Humaine Handbook. Berlin: Springer, pp. 47-62.

Sander, D. (2013). Models of emotion: the affective neuroscience approach. In: J. Armony & P. Vuilleumier
(Eds.), The Cambridge Handbook of Human Affective Neuroscience. Cambridge: Cambridge University
Press, pp. 5-54.

Schellekens, E., & Goldie, P. (Eds.) (2011). The Aesthetic Mind: Philosophy and Psychology. Oxford:
Oxford University Press.

183
Empirical Musicology Review Vol. 8, No. 3-4, 2013

Scherer, K.R. (2010). The component process model: Architecture for a comprehensive computational
model of emergent emotion. In: K.R. Scherer, T. Bänziger & E.B. Roesch (Eds.), Blueprint for affective
computing: A sourcebook. Oxford: Oxford University Press, pp. 47-70.

Scherer, K.R., Bänziger, T., & Roesch, E.B. (Eds.) (2010). Blueprint for Affective Computing: A
Sourcebook. Oxford: Oxford University Press.

Seifert, U. (2011). Investigating the musical mind: situated cognition, artistic human-robot interaction
design, and cognitive musicology. Transhumanities, Vol. 4, No. 1, pp. 149-161.

Seifert, U., Verschure, P.F.M.J., Arbib, M.A., Cohen, A.J., Fogassi, L., Fritz, T., Kuperberg, G., Manzoli, J.,
& Rickard, N. (2013). Semantics of internal and external worlds. In: M.A. Arbib (Ed.), Language, Music
and the Brain: A Mysterious Relationship. Cambridge, MA: MIT Press.

Shallice, T., & Cooper, R.P. (2011). The Organization of Mind. Oxford: Oxford University Press.

Shimanura, A.P., & Palmer, S.E. (Eds.) (2012). Aesthetic Science: Connecting Minds, Brains, and
Experience. Oxford: Oxford University Press.

Silvia, P.J. (2005). Emotional responses to art: From collation and arousal to cognition and emotion. Review
of General Psychology, Vol. 9, No. 4, pp. 342-357.

Silvia, P.J. (2009). Aesthetic emotions (psychological perspectives). In: D. Sander & K.R. Scherer (Eds.),
The Oxford Companion to Emotion and the Affective Sciences. Oxford: Oxford University Press.

Solis, J., & Ng, K. (Eds.) (2011). Musical Robotics and Interactive Multimedia Systems. Berlin: Springer.

Sun, R. (Ed.) (2008). The Cambridge Handbook of Computational Psychology. Cambridge: Cambridge
University Press.

Temperley, D. (2013). Computational models of music cognition. In: D. Deutsch (Ed.), The Psychology of
Music. 3rd Edition. New York: Academic Press, pp. 327-368.

Tsuchiya, N., & Adolphs, R. (2007). Emotion and consciousness. Trends in Cognitive Sciences, Vol. 11, No.
4, pp. 158-167.

Wells, A.J. (2006). Rethinking Cognitive Computation: Turing and the Science of Mind. New York:
Palgrave.

184
Empirical Musicology Review Vol. 8, No. 3-4, 2013

Cross-cultural representations of musical shape


GEORGE ATHANASOPOULOS
University of Edinburgh

NIKKI MORAN
University of Edinburgh

ABSTRACT: In cross-cultural research involving performers from distinct cultural
backgrounds (U.K., Japan, Papua New Guinea), we examined 75 musicians’
associations between musical sound and shape, and saw pronounced differences
between groups. Participants heard short stimuli varying in pitch contour and were
asked to represent these visually on paper, with the instruction that if another
community member saw the marks they should be able to connect them with the
sounds. Participants from the U.K. group produced consistent symbolic
representations, which involved depicting the passage of time from left-to-right.
Japanese participants unfamiliar with English language and western standard notation
provided responses comparable to the U.K. group’s. The majority opted to use a
horizontal timeline, whilst a minority of traditional Japanese musicians produced
unique responses with time represented vertically. The last group, a non-literate Papua
New Guinean tribe known as BenaBena, produced a majority of iconic responses
which did not follow the time versus pitch contour model, but highlighted musical
qualities other than the parameters intentionally varied in the investigation, focusing on
hue and loudness. The participants’ responses point to profoundly different ‘norms’ of
musical shape association, which may be linked to literacy and to the functional role of
music in a community.

Submitted 2013 February 20; accepted 2013 June 27.

KEYWORDS: music and shape, cross-cultural, comparative, notation, literacy

“Music confirms what is already there in society and culture, and adds nothing more than patterns of
sound.” Blacking (1973, p. 54).

INTRODUCTION

IN 1972, Deregowski posed an influential question when he asked: “Do pictures offer us a lingua franca
for inter-cultural communication?” (Deregowski, 1972, p. 82). The explanations offered by his cross-
cultural comparisons (and the studies whose evidence he reviewed) have since been critiqued for an
ethnocentrism that was prevalent in contemporary academia (Layton, 1981). But at the very least,
Deregowski demonstrated that the perception of three-dimensional objects is influenced by cultural
learning: that the depiction of perspective as one culture knows it is not the aesthetic preference of all
cultures. The theory of cultural relativity of shape runs counter to prototype theory, which holds that the
perception of certain ‘natural categories’ (Rosch, 1973) follows universal principles. Rosch’s pioneering
research on the perception of shape proposed the psychological primacy of particular categories—namely,
square, triangle and circle. However, results from a replication of Rosch’s original study by Robertson,
Davidoff and Shapiro (2002), with members of the North Namibian Himba tribe, demonstrated
idiosyncratic shape categorization, while a further study by Nisbett (2003) also supports the view that shape
perception is culturally contingent. Comparing responses from North American and Chinese participants,
Nisbett (2003) found differences in the way that the two groups described pictures, with the former group
attending to dominant foreground objects and the second group tending towards a more holistic description.

The cultural relativity of shape brings up fascinating questions with regard to musical
representation. The relationship between musical sound and shape is a complex matter. Musicians who are
enculturated in what can broadly be referred to as a western classical performance practice are so familiar
with musical notation that it is common to refer to the score as “the music”. In comparison, most culturally-
distinct musical practices rely far less on textual representation, if they rely on it at all. And yet, the
capacity to etch or otherwise make marks acting as signs is a defining characteristic of human culture.
Given the strong relationship of symbolic representation with literacy in all cultures (Schmandt-Besserat,
1992), the effect of the acquisition of literacy is an obvious factor that seems likely to influence an
individual’s mode of visual representation of musical sounds (Athanasopoulos, Moran, & Frith, 2011). In
this paper, however, we focus our discussion on the way in which culturally-specific perceptions of shape
and time may affect the representation of musical sound.

Theoretical Background

REPRESENTATION OF TIME

The spatial representation of time is not culturally equivalent. Space-time metaphors are highly
conventional, apparently dependent upon ‘time-moving’ or ‘ego-moving’ perceptions of the passing of time
(Gentner, Imai, & Boroditsky, 2002). However, all written languages must follow a particular direction on
the page, and this aspect seems to be linked to the representation of time in two dimensions (Fuhrman &
Boroditsky, 2010; Mitchell, 2004; Zwaan, 1965). In fact, evidence suggests that culturally specific spatial
representations inform judgements about time even in non-linguistic tasks (Boroditsky, 2001, 2011). For
example, Mandarin speakers appear to think of the passage of time as movement in a vertical direction
(top-to-bottom), while speakers of English as a first language seem more likely to conceive of a horizontal,
left-to-right movement. Similarly, Fuhrman and Boroditsky (2010) reported that Hebrew participants’
spatial representation of time was consistent with direction of writing, passing in a horizontal right-to-left
manner, while Zwaan (1965) noted that Dutch and Israeli participants located the ‘past’ on the left-hand
side and right-hand side of the page respectively. The visual representation of time in timelines also tends
to follow the cultural convention for written language (Mitchell, 2004). For example, Japanese comics, in
which the story traditionally flows in a vertical right-to-left manner, are mirrored horizontally before
translations are printed for European and American markets (Farago, 2007).
In contrast to visual art, music takes place in and through time. In musical performance, time is
apportioned and manipulated; time can be said to be the universal raw material of musical process
(Blacking, 1973, pp. 105-111). The decisions taken by an individual in order to create a visual
representation of musical sounds must, then, be informed not only by culturally specific ontologies of
music, but also by culturally specific notions of time.

MUSIC AND SHAPE

Ethnomusicologist Steven Feld conducted extensive research with the Papua New Guinean Kaluli,
documenting and interpreting the metaphorical relationships that underpin aesthetic concepts essential to
Kaluli song and poetics. Feld argues that Kaluli music theory, in terms of “logical patterns of symbolic
material” (Feld, 1982, p. 16), is inextricable from its function, namely “to activate and bring forth
meaningful social relations through structural expression” (Feld, 1982, p. 16). His work reveals the extent
to which cultural organization and behaviour influence both the conceptualization of, and discourse on, the
abstraction of musical sound into symbolic patterns. Feld’s work provides a reminder that the concept of
music (and its theorisation) common to most empirical music research is not, in fact, a universal given.
However, specific acoustic parameters are commonly associated with certain visual metaphors, and
empirical research supports various cross-modal matches—for example, pitch height is commonly
associated with relative height on the vertical axis (Casasanto, Philips, & Boroditsky, 2003) and fast tempo
may be associated with fast movement (Eitan & Granot, 2006). In other investigations, loudness has also
been empirically associated with verticality (Eitan, Schupak, & Marks, 2010) and length (Carello,
Anderson, & Kunkler-Peck, 1998). However, as Walker (1987b) and Sadek (1987) indicate, these
mappings are not found cross-culturally. Using a forced-choice design, Walker (1987b) asked participants
to match various auditory stimuli to one of four visual metaphors, predicting associations of pitch
height with vertical placement on an x–y axis; timbre with pattern-sign; loudness with size; and duration
with length represented horizontally across the x–y axis. Responses were collected from six different groups
of participants, including four Canadian Indian groups, and two urban control groups comparing musically
competent and musically naïve participants. Walker (1987b) and a replication study with Egyptian
participants by Sadek (1987) both indicated differences according to musical training and cultural
background. A questionnaire study conducted by Prior (2010) also indicated that background culture may
lead to different perceptions of musical shape. Küssner and Leech-Wilkinson (in press) found a link
between participants’ musical training and the visual representations that participants generated using a
real-time drawing paradigm.
The latter study aside, little research to date has examined what happens when adult participants
are given free rein to create visual representations of musical sounds. Undertaking just this task, Tan and
Kelly (2004) used whole musical compositions as stimuli, requesting musically trained and untrained US
college participants to “make any marks” in order to represent the sounds visually. The musically trained
participants demonstrated a tendency to create representations aligned on an x–y axis (henceforth referred
to as Cartesian), representing time in a horizontal, left-to-right fashion, and depicting aspects of the musical
surface (primarily pitch) on the vertical axis, in accordance with findings described elsewhere
(Athanasopoulos & Moran, 2012; Athanasopoulos, Moran, & Frith, 2011). However, Tan and Kelly (2004)
also found that musically untrained participants tended to provide pictorial representations through images
or pictures telling a story. Küssner (2013) obtained comparable results from British participants.
Similar to the method of Tan and Kelly (2004), Bamberger (2005) and Barrett (2005) designed
studies that permitted their participants to create notational systems of their own “invention” without being
restricted to the pitch versus time model which most organised notational systems follow. In these two
investigations, participants (mainly children) were asked to provide a means of invented notation for a
familiar tune so that one of their peers would be able to perform the tune, without further guidance from the
researcher. This resulted in a number of participants mimicking familiar notational models, while others
attempted to develop executive models of notation that fitted the task. These tended to prioritise aesthetic
concerns: for example, participants included performance guidelines for playing techniques, but
disregarded specific note duration values. In a separate free-drawing investigation with Japanese children as
participants (Adachi, 1997), a considerable proportion of responses included onomatopoeia and linguistic
script to depict information about attack rate.
Beyond the investigations mentioned above, other studies to have deployed free representation
techniques have focused on questions related to cognitive development, examining children’s responses
(Reybrouck, Verschaffel, & Lauwerier, 2009; Verschaffel, Reybrouck, Janssens, & Van Dooren, 2010). As
far as we are aware, no existing cross-cultural research has used a free-drawing task with adult musicians. It
seems inevitable that cultural difference will result in varied depictions of time and shape in response to
musical sound stimuli, but in what ways? What is the effect of cultural background on an individual’s two-
dimensional representation of musical sound?

COMPARATIVE FREE-DRAWING STUDY


To respond to this question, this study compares the freely-drawn responses of performers from distinct
cultural backgrounds to musical sounds whose pitch contours were systematically varied.[1] Nettl (1985)
notes that western musical practices may transform traditional societies and may replace or modify existing
norms. Therefore, this study draws on evidence collected through remote fieldwork, in order to meet with
participants with the least exposure to western culture. In total, 75 individuals represented three culturally
distinct groups: Edinburgh, U.K. (March/April 2011); Tokyo and Kyoto, Japan (May/June 2010); and the
BenaBena villages in Papua New Guinea (August 2010).[2]
Japanese traditional music is distinct from western musical culture in that it deploys various
elaborate notational systems that do not follow the standard Western notation format. Rather, the systems
are either alphabetic (where one symbol stands for one pitch) or executive (directions as to how to create a
sound, similar to guitar tablatures). Meanwhile, the BenaBena of Papua New Guinea do not use any written
system of communication—neither for verbal literacy, nor for musical notation—and are relatively
secluded from urbanisation.[3]
For these participant groups, the existing literature and research evidence point to the following
hypotheses:

• Individuals literate in western standard notation (WSN) or Japanese traditional notation (JTN) will
depict variations in pitch taking a Cartesian approach, using orthogonal axes to show time versus
pitch, thus representing sequential occurrence (Küssner & Leech-Wilkinson, in press; Tan &
Kelly, 2004).
• A substantial number of literate participants may deploy written words to describe sound events
(Adachi, 1997).
• In the absence of the common point of reference provided by general (or musical) literacy, non-
literate individuals will depict variation in pitch idiosyncratically with regard to the sequential
occurrence of events in time.
• As indicated by Feld (1982), symbolic representation of sound events may exist at a metaphorical
level in non-literate cultures: this metaphorical aspect may be reflected in an internally consistent
system of representation through visual responses.

Research Design

Using a quasi-experimental design, three participant groups heard twelve musical stimuli that varied in
pitch contour (Up, Down, Peak or Valley), as detailed in the Method section below, and drew responses in
which they depicted the presented sound events.
The groups were as well-matched as was possible in the circumstances of remote fieldwork, taking
account of approximate number, experience as a performing musician, and gender split. The age of
participants varied, but all participants were adults in this study and so acted as mature representatives of
their particular musical tradition. The very concept of a dictation exercise—which in some ways describes
the task—is very common when learning western music and WSN, but it does not exist in Japanese
traditional music, nor for the BenaBena. The groups cannot be considered to be matched with their
counterparts in terms of experience in representing sound visually. However, the open-ended nature of the
task was designed to compensate for some of these limitations, the payback being the richness and diversity
of the data.

DEVELOPMENT OF THE SOUND STIMULI

The sound stimuli were developed according to certain constraints, in order that the same stimuli could
reasonably be used for all three groups. Auditory stimuli using unfamiliar or ‘unmusical’ timbres or
patterns were likely to disrupt participants’ responses. A synthesized flute sound was selected to represent
pitch variations, as flutes in some form are common to all three musical cultures.
The musical examples consisted of very basic melodic patterns. Papua New Guinean music from
the Highland provinces does not follow elaborate melodic patterns other than the pentatonic scale,
favouring rhythmical complexity instead. Thus the pitch relations of the melodic examples were limited to
fourths, fifths and octaves and used simple contours: in our terms, either rising (Up), rising-falling (Peak),
falling-rising (Valley), or falling (Down), as illustrated in Figure 1.
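
The design described above can be sketched in code. The following Python fragment is a minimal illustration, not the authors' materials: it assumes that the three stimuli within each contour type correspond to the three interval sizes (fourth, fifth, octave), and the starting pitch `BASE_PITCH` and the function name `contour_pitches` are hypothetical. Only the turning points of each contour are listed, not the notated sequences shown in Figure 1.

```python
# Illustrative sketch only. Assumptions (not stated in the article):
# one stimulus per interval size within each contour, and an arbitrary base pitch.
INTERVALS = {"fourth": 5, "fifth": 7, "octave": 12}  # interval sizes in semitones
BASE_PITCH = 60  # hypothetical starting pitch, given as a MIDI note number


def contour_pitches(contour: str, semitones: int, base: int = BASE_PITCH) -> list[int]:
    """Return the turning points of a contour as MIDI note numbers."""
    low, high = base, base + semitones
    shapes = {
        "Up": [low, high],             # rising
        "Down": [high, low],           # falling
        "Peak": [low, high, low],      # rising-falling
        "Valley": [high, low, high],   # falling-rising
    }
    return shapes[contour]


# Four contours crossed with three interval sizes gives twelve stimuli.
stimuli = [
    (contour, name, contour_pitches(contour, size))
    for contour in ("Up", "Down", "Peak", "Valley")
    for name, size in INTERVALS.items()
]

for contour, name, pitches in stimuli:
    print(f"{contour:<6} {name:<6} {pitches}")
```

Crossing four contours with three interval sizes is one simple way to arrive at twelve stimuli; the actual sequences may differ in length and register.
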
The stimuli were developed using Sibelius 6 software (Avid, 2009), exported as MIDI files at a
tempo of 60 beats per minute, and produced using Digidesign Pro Tools 8 (DigiDesign, 2009). They were
recorded in MP3 format and replayed to the participants with a Samsung K5 MP3 player through the in-
built (slide-out) stereo speakers. Headphones were not used at all for any group, since they were likely to be
an obstacle to participation for the BenaBena cohort, who view all music-related activities as communal
and would not be familiar with using them. The participants were seated approximately one metre from the
sound source, and the stimuli were reproduced at 52 dB SPL. The British and Japanese groups participated
indoors, but by necessity the study took place outdoors with the BenaBena.

Fig. 1. Test stimuli consisted of 12 sound sequences representing four different pitch contours: Up, Down,
Peak, and Valley. The sequences were presented in the same fixed, quasi-random order to participants in all
three groups.

Method

PARTICIPANTS

The first group (Group A) consisted of 25 musicians of British nationality and cultural background (mean
age = 23.5 years; SD = 4.2 years; range: 19-37 years, 10 males, 15 females; 21 right-handed, 4 left-handed).
The mean age for starting a musical instrument was 6.7 years (SD = 1.7 years, r: 5-11 years), and mean
duration of performing a musical instrument was 15.7 years (SD = 4.4 years, r: 11-32 years). All
participants were acquainted with WSN. 28% were also acquainted with guitar tablature, 20% with jazz
chord notation and 12% had performed or composed music using graphic scores. One participant (4%) also
knew a traditional notational system used to transcribe Irish folk melodies.
The second group (Group B) consisted of 24 Japanese musicians with minimal or no knowledge of
WSN (mean age = 47.6 years; SD = 26.4 years; r: 18-87 years; 11 males, 13 females; 23 right-handed,
1 left-handed). The mean age for starting a musical instrument was 18.6 years (SD = 12.3 years, r: 3-40 years), while the
mean duration of performing a musical instrument was 32.6 years (SD = 18.5 years, r: 8-63 years). All
participants were acquainted with a form of JTN, while 28% claimed to be in a position to recognize WSN
as a form of notation when they saw it but were unable to use it, and one participant (4%) was aware of the
existence of graphic scores, though he had never used one in performance. It should be noted that it is
practically impossible to find Japanese musicians with no exposure at all to WSN.
The third group (Group C) consisted of BenaBena tribesmen and women of the Eastern highlands
region of Papua New Guinea, unfamiliar with any literary or notational script. 26 musicians (mean age by
estimation = 57.1 years; SD = 10.5 years; r: 35-80 years; 15 males, 11 females; 26 right-handed, none
left-handed) participated in the investigation. Handedness was established by asking participants which
hand they used for farming tools common in the Highlands of Papua New Guinea, such as
machetes or a cangkul (hoe). The mean age for starting a musical instrument (from self-estimated
responses) was 15.5 years (SD = 4.2 years, r: 10-25 years), while the mean duration of performing a musical
instrument was 42.2 years (SD = 10.5 years, r: 23-62 years). The numbers may not be accurate; performers
might not have been actively performing music throughout this span. Participants provided responses for
their engagement in music-making and participation in “sing-sings” and traditional community ceremonies.
All but one participant reported having taken part in sing-sings since they were very young children.
Recruited participants were from the BenaBena tribe and came from six hamlets (Keni, Logo, Sifu, Opeks,
Siopeks, Moweto). It should be noted that music among the BenaBena tribe is a highly communal activity,
which does not separate active performers from a non-participating audience. Some individuals are
regarded as the best singers or dancers, and may be called out to perform in group activities involving
music and dance, but this does not exclude others from participating. Therefore, all BenaBena are
considered musicians for the purposes of this study.
The first author conducted this fieldwork and worked with Groups B and C with the assistance of
local translators. For Group B, translators were recruited with the assistance of the Kyoto City University of
the Arts and the Tokyo Geijutsu Daigaku. For Group C, the local schoolteacher, one tribe member who was
a college student at the University of Goroka, and the first author’s host, were all proficient in English and
assisted with translations. For more detail, see Athanasopoulos (2013).

PROCEDURE

Participants were exposed to 20 trials, of which 12 varied according to the four pitch contour categories
(Up, Down, Peak, and Valley: see Figure 1).[4] The remaining eight stimuli featured combinations of the
four categories presented here. Each sound event had a total duration of four seconds, and was repeated
after a four-second pause before the next trial. Participants could start drawing after the first stimulus
presentation. The overall time-limit to provide a visual response to the sound event was 16 seconds. If
participants were not able to provide a response within this time, they proceeded to the next trial.
In preparation, the order of presentation of all 20 stimuli was initially randomised; this
order was then held constant in its presentation to participants.
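
As an illustration of this trial structure, the sketch below builds a list of 20 trial labels (12 contour stimuli and 8 combination stimuli), shuffles it once, and reuses that order for every participant. The labels, the seed value, and the function names are hypothetical; the study does not report how the randomisation was carried out.

```python
import random

CONTOURS = ("Up", "Down", "Peak", "Valley")


def build_trials() -> list[str]:
    """Return 20 hypothetical trial labels: 12 contour trials plus 8 combinations."""
    contour_trials = [f"{contour}-{i}" for contour in CONTOURS for i in (1, 2, 3)]
    combination_trials = [f"Combination-{i}" for i in range(1, 9)]
    return contour_trials + combination_trials


def fixed_order(seed: int = 2010) -> list[str]:
    """Shuffle the trials once with a fixed seed so the order is reproducible."""
    trials = build_trials()
    random.Random(seed).shuffle(trials)  # a private generator leaves global state untouched
    return trials


# The same order is presented to every participant in every group.
presentation_order = fixed_order()
for position, label in enumerate(presentation_order, start=1):
    print(position, label)
```

Seeding a private generator is one way to keep a randomised order constant across sessions; storing the shuffled list in a file would serve the same purpose.
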
Participants were asked to represent the sound on paper in such a way that if another member of
their community saw their marks they should be able to connect them with the sound. Responses were
drawn on A4 graph paper using ball-point pens. Before data collection began, participants were offered up
to four trial runs using three sound stimuli drawn randomly from the database. On average, participants
made use of two of the potential four trials.
The BenaBena participants were not accustomed to the particular fine-motor skills associated with
holding a pen and providing responses on paper. In order to communicate the concept behind the task it
was necessary to draw a comparison between the acts of etching (with which all individuals were familiar)
and drawing. As preparation for the study, the group were invited to draw any image from their
surrounding environment on plain white paper using thick coloured markers, and then with ball-point pens.
After the free-drawing investigation took place, all participants were interviewed about their experience of
the task so as to record information that could contribute to the interpretation of their drawn responses.
The participants’ responses were classified according to three categories, based on their method of
internal organisation and reference, and on the apparent representation of events in time (i.e. use of
Cartesian pitch-versus-time axes) (Athanasopoulos, Moran, & Frith, 2011; Küssner & Leech-Wilkinson, in
press; Tan & Kelly, 2004):
• Symbolic Cartesian (SC). Reference to sound events through abstract symbols. Pitch contour
and time represented spatially on orthogonal axes. In this system, one symbol relates to
another, and need not signify anything beyond this internal system. For example, note heads in
WSN are abstract, symbolic representations of the sound event.
• Iconic Cartesian (IC). Reference to sound events through drawings that attempt to imitate the
event in some respect as stand-alone icons. As opposed to abstract symbols, icons (in this
classification) attempt to indicate directly and analogously some aspect of the sound event for
which they stand. Icons and time represented spatially on orthogonal axes.
• Iconic (I). Reference to sound events through drawings that attempt to imitate the event in
some respect as stand-alone icons.
Further to these classifications, responses were separated according to their method of representing
events in time:
• Left-to-Right (L-R), imitating WSN and script.
• Top-to-Bottom (T-B), imitating a majority of JTN systems and script.
• Neither L-R nor T-B.
Classification was carried out by the first author in consultation with two independent raters.
Where no majority decision was reached, the first author’s decision was taken forward. The classification
terms were not disclosed to participants, who drew their responses freely without such instruction.
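
One possible reading of this procedure is a majority vote among three classifications, with the first author's label used when all three differ. The sketch below encodes that reading; the function name `resolve_label` and the three-vote interpretation are assumptions rather than a description of the authors' actual workflow.

```python
from collections import Counter

CATEGORIES = {"SC", "IC", "I"}  # Symbolic Cartesian, Iconic Cartesian, Iconic


def resolve_label(first_author: str, rater_a: str, rater_b: str) -> str:
    """Return the majority label, falling back to the first author's label."""
    votes = [first_author, rater_a, rater_b]
    unknown = set(votes) - CATEGORIES
    if unknown:
        raise ValueError(f"Unknown category label(s): {sorted(unknown)}")
    label, count = Counter(votes).most_common(1)[0]
    # With three votes, a majority means at least two matching labels.
    return label if count >= 2 else first_author


print(resolve_label("SC", "IC", "IC"))  # the two raters agree with each other
print(resolve_label("SC", "IC", "I"))   # no majority, first author's label is kept
```

The same function could be applied separately to the direction-of-time labels (L-R, T-B, neither) by swapping the category set.
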

Results

Responses were classified according to the apparent internal organisation as described above, as either SC,
IC or I. Cartesian responses were further classified as either L-R (left to right), T-B (top to bottom), or as
neither L-R nor T-B.
The results for each individual were examined to find the degree of reliability in their
representational approach, both within contour-type and across all the stimuli. Results were then compared
across groups. The classifications are illustrated below in Figures 2, 3 and 4, showing examples of each
category.

Fig. 2. Examples of three different Symbolic Cartesian (SC) systems: i. Drawn by a participant from Group
A (British); ii, iii. Drawn by participants from Group B (Japanese). For i and ii, time is represented
horizontally L-R, and pitch variation is shown through vertical variation. No elements of WSN have been
used. Example iii represents the passing of time using a vertical axis, T-B. Pitch variations are represented
by the inclinations of the strokes.

Fig. 3. Examples of Iconic Cartesian (IC) representation by a British participant from Group A (iv), and by
a Japanese participant (Group B) (v). Although the drawings follow Cartesian representation (L-R), they
are not internally consistent, and are therefore classed here as Iconic Cartesian (IC), not Symbolic Cartesian
(SC).

Fig. 4. This example (vi) illustrates the category of Iconic (I). The BenaBena participant has used this
single image to represent all pitch stimuli (Up, Down, Peak and Valley). The depiction of the sound events
does not involve a timeline. According to the participant at the follow-up interview, all of the flute sounds
were represented by the lines drawn above.

RELIABILITY OF RESPONSES

According to the prescribed categories, individuals were entirely consistent in their own approach to the
depiction of the sound events, both within and between contour types. For all groups (British, Japanese and
BenaBena), individual participants maintained one method of representation throughout: see Table 1.

Table 1. Individual participants’ preferred style of representation for the four different types of pitch
contour: Symbolic Cartesian (SC), Iconic Cartesian (IC) or Iconic (I). NB: Responses were entirely
congruent for the three different stimuli used within each contour group.
             Group A (n=25)     Group B (n=24)     Group C (n=26)
Contour      SC    IC    I      SC    IC    I      SC    IC    I
Up           24     1    0      19     4    1       4     3   19
Down         24     1    0      19     4    1       4     3   19
Peak         24     1    0      19     4    1       4     3   19
Valley       24     1    0      19     4    1       4     3   19

All but one of the British participants (24 out of 25) and the majority of Japanese participants (19
out of 24) used a consistent symbolic mode of representation of the pitch stimuli, presenting abstract
symbols on a timeline. Of the British participants, one-quarter employed WSN in some way: three
participants provided responses entirely through WSN, and a further three incorporated some WSN
elements. Two of the British participants used written text directions in addition to their drawn response.
One Japanese participant apparently experienced a dilemma with a small number of his responses,
which he subsequently crossed out. Had they been included in the analysis, they would have been classed
as iconic. He explained that, because the instruction was to represent the sound so that another member
of his community could connect the marks with it, these responses would not have been clear.
The BenaBena participants provided the largest variety across the three categories of response. The
majority of responses (19 out of 26) were classified as iconic, using stand-alone icons depicting sound
events and no obvious (linear) portrayal of time. However, four responses were classed as consistently
symbolic and used a timeline, and three were classed as consistently iconic and also used a timeline.
Drawing on the post-test interview data, the nineteen iconic responses could be further separated into two
categories. Twelve responses (slightly less than half of all the BenaBena responses) showed some internal
consistency as a system: in terms of organisation, participants attempted to draw iconic representations of
the sound event’s source (a flute), rather than the sound event itself. They deemed pitch variations to be
unimportant; rather, according to self-reports from the post-task interviews, they focused on elements of
timbre or perceived variations in loudness (though we attempted to minimize the latter by keeping the
amplitude constant; Athanasopoulos, 2013, p. 238). The remaining seven iconic respondents
(approximately one-quarter of all BenaBena participants) did not appear to demonstrate this level of
organisation.
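
As a cross-check on the proportions quoted in this section, the counts in Table 1 can be converted to within-group percentages. The following Python sketch is illustrative only and is not part of the original analysis: the names TABLE_1 and style_shares are hypothetical, and the counts are simply copied from Table 1.

```python
# Counts copied from Table 1 (identical for all four contour types).
# The group labels and the helper name below are illustrative assumptions.
TABLE_1 = {
    "A (British)": {"SC": 24, "IC": 1, "I": 0},
    "B (Japanese)": {"SC": 19, "IC": 4, "I": 1},
    "C (BenaBena)": {"SC": 4, "IC": 3, "I": 19},
}


def style_shares(counts):
    """Return each style's share of a group's participants, in percent."""
    total = sum(counts.values())
    return {style: round(100 * n / total, 1) for style, n in counts.items()}


if __name__ == "__main__":
    for group, counts in TABLE_1.items():
        # Print group label, group size, and percentage per style.
        print(group, sum(counts.values()), style_shares(counts))
```

On these counts, iconic responses make up roughly 73% of Group C, while symbolic Cartesian responses make up 96% of Group A and roughly 79% of Group B, which matches the "majority" and "all but one" wording used in the text.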

DIRECTION OF TIME IN THE RESPONSES

All Cartesian (SC or IC) responses are set out in Table 2 below. Of the British participants (Group A), all
used a Cartesian (SC or IC) approach to represent events in time. The unanimously preferred timeline was
horizontal L-R.
Regarding the traditional Japanese group (Group B), two participants who provided SC responses opted for
a vertical representation of time. One Japanese participant provided responses that did not follow Cartesian
representation.

Table 2. Directionality of time as represented by SC and IC responses by group. Direction is either
Left-to-Right (L-R) or Top-to-Bottom (T-B).

Direction of time       Group A: WSN    Group B: JTN    Group C: None
Left-to-Right (L-R)     25              21              7
Top-to-Bottom (T-B)     0               2               0

Finally, the BenaBena (Group C) presented the majority of their responses through iconicity and
not symbolism, and without apparently depicting the sound events as occurring in time. However, seven
respondents did deploy a Cartesian system and all chose to use a horizontal timeline.
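
The two tables can be checked against each other: each group's SC and IC counts in Table 1 should sum to the number of timeline responses listed for that group in Table 2. The sketch below is a hypothetical consistency check under that assumption, not something the authors report running; all names in it are illustrative.

```python
# Cartesian responses per group: SC + IC counts copied from Table 1.
CARTESIAN = {"A": 24 + 1, "B": 19 + 4, "C": 4 + 3}

# Timeline directions per group, copied from Table 2.
DIRECTION = {
    "A": {"L-R": 25, "T-B": 0},
    "B": {"L-R": 21, "T-B": 2},
    "C": {"L-R": 7, "T-B": 0},
}


def tables_agree(cartesian, direction):
    """Check, per group, that Cartesian counts equal the timeline counts."""
    return {g: cartesian[g] == sum(direction[g].values()) for g in cartesian}


def left_to_right_share(direction):
    """Percentage of all timeline responses that run left to right."""
    total = sum(sum(d.values()) for d in direction.values())
    ltr = sum(d["L-R"] for d in direction.values())
    return round(100 * ltr / total, 1)


if __name__ == "__main__":
    print(tables_agree(CARTESIAN, DIRECTION))
    print(left_to_right_share(DIRECTION))
```

The counts agree for all three groups, and 53 of the 55 timeline responses overall run left to right.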

DISCUSSION
Several points arise in relation to the initial hypotheses. These are addressed in turn, followed by a more
expansive discussion of the results.
• Individuals literate in WSN or JTN will depict variations in pitch taking a ‘Cartesian’ approach,
using orthogonal axes to show time versus pitch, thus representing sequential occurrence (Tan &
Kelly, 2004; Küssner & Leech-Wilkinson, in press).
Regardless of cultural background, almost all literate participants in this study provided systems of
representation characterised by a Cartesian (orthogonal) depiction of time versus pitch in a free-drawing
paradigm, supporting this hypothesis. The one literate participant who did not do so is a Noh/Kabuki
singer-actor. In the subsequent interview, she reported that she felt the stimuli had a kinetic quality, and she
held that her resulting representations aimed at demonstrating this, regardless of whether they lacked
clarity. Within the categories used for classification in this study, her responses were deemed iconic, but it
is arguable that since she consciously deviated from the explicit directions for the task, her response should
be disregarded.
• A substantial number of literate participants may deploy written words to describe sound events
(Adachi, 1997).
A minority of the literate participants—two out of forty-nine—provided responses using text
(words). They only used the words to create directives that accompanied their (SC) drawings, contradicting
this hypothesis.
• In the absence of the common point of reference provided by general (or musical) literacy, non-
literate individuals will depict variation in pitch idiosyncratically with regard to the sequential
occurrence of events in time.
As expected, the majority of participants unfamiliar with notational systems provided iconic responses
without a timeline, supporting this hypothesis and in accordance with Tan and Kelly’s (2004) findings
regarding participants unfamiliar with musical notation. Tempered with the reports from the participant
interviews, however, the findings from this study suggest that nearly half of those participants who created
idiosyncratic systems did aim to create internally consistent representations.
• Since symbolic representation of sound events may exist at a metaphorical level in non-literate
cultures as indicated by Feld (1982), this may be reflected in an internally consistent method of
representation through visual responses.
Following the discussion of the previous point, non-literate individuals did not deploy internally
consistent symbolic systems in the manner of their literate counterparts, but still produced a considerable
proportion of internally consistent iconic responses (12 out of 26 participants). We discuss this further
below.

Further findings

Other findings beyond these predictions revealed that all participants, regardless of cultural background,
literacy or type of musical training, were able to provide coherent responses with varying levels of
organisation in a free-drawing investigation; and regardless of group, all participants who completed these
tasks were highly consistent in their manner of depicting the sound events and variations.
All musicians were able to provide invented notational systems in order to depict sound. This
suggests that everyone has the ability to associate musical sounds with some form of visual representation.
Although many participants were sceptical about the value of the task presented to them, all (with three
exceptions due to mistranslation) were able to provide graphic representations of sound, even where the
idea to represent sound visually was an alien concept, as was the case with the BenaBena.
The resulting representations are surprisingly similar between groups of different cultural
backgrounds, due to the wide adoption of the SC method of representation. A large number of the entire
participant population (regardless of origin or type of musical training) tended to represent sound in a
linear, left-to-right axial representation resembling analogue notational systems, with time located
horizontally. The linear representation of music may have its roots in literacy, since the latter provides
participants with a timeline of reference and an axis on which to put responses. However, this does not
account for the small number of BenaBena who used the IC mode of representation, depicting the passage
of time horizontally. One could argue that this minority may have been responding to what they thought
that the Western investigator perceived as ‘correct’, mimicking a script that they may have seen elsewhere
(perhaps even by watching the hands of the investigator as he took occasional notes). An alternative theory
may examine the idea of cross-cultural tendencies towards the linear representation of musical sounds.
Further investigation is required in this area. From the results of this study, then, the conclusion should be
drawn that the visual representation of music is surprisingly consistent across cultures, and that Cartesian
representation is the dominant mode.
However, we return to the notable proportion of responses which were iconic in nature and did not
make use of a linear timeline. The majority of participants who did not follow the left-to-right horizontal
path of representation were the non-literate BenaBena, who opted for an abstract pictorial method which
did not depict time. Additionally, we have the responses of the two Japanese master musicians unfamiliar
with WSN, whose interviews suggest that they are ideologically opposed to what they consider a
threatening expansion of western culture (Athanasopoulos, 2013, pp. 167-169). This suggests that
sociological factors may well influence—or overrule—participants’ first responses to the task. The
responses and interview with the BenaBena thus provide a unique insight into the opposing perspective,
with the example of a non-literate community’s first encounter with a novel form of musical
communication. (This was a benign encounter whose potential consequences the researcher considered and
attempted to mitigate (see Ethical Considerations in Athanasopoulos, 2013)).
Responses from the United Kingdom and Japan classified as iconic were idiosyncratic in nature
and represent a minority in relation to the number of symbolic responses within the same groups. The
BenaBena’s pictorial approach was more predictable. We attribute the consistency of their responses to
their approach to the task. First, after the initial introduction of the idea to the BenaBena, the participants
would discuss the notion of sound representation among themselves trying to reach some sort of consensus
before the task took place. Though they were not permitted to look at each other’s responses, participants
would openly discuss their answers and debate the appropriateness of their responses. This mode of
collaborative engagement was as open to Groups A and B as it was to C, but it only took place—it only
seemed to be required—for the BenaBena.
Subsequent interviews revealed that the participants were not attempting to indicate variations
within the stimuli, but rather attempted to adhere strictly to the instruction given to them: “Produce
responses so that if another member of the community saw them they would be able to link them to the
sound events”. Thus, they completed the task for its communicative function. Usually after discussion with
their peers, they invented–collaboratively–an iconic method of representation appropriate for the task.
Participants attempted to indicate that a flute was playing through iconic representation of the sound,
without attempting to indicate specific variations between the sound events (see Figure 4). When
questioned if all the sound events were the same (regardless of contour variation), the participant
responded, “Yes, it is still a flute playing” (Athanasopoulos, 2013, p. 239). When asked about perceived
differences regarding the values of specific sound events that followed a pattern (up, down, peak or valley)
the participant replied by asking why these differences should matter, since it is always a flute playing,
regardless of these variations (Athanasopoulos, 2013, p. 239). This particular style of response was, in fact,
replicated throughout further sections of the task (attack rate and duration), whose results are not discussed
in this particular article. For example, in order to depict variations in attack rate, participants deployed
circles to stand for drums, while lines next to or on the circles indicated the action of hitting the drum, or
the number of strokes (Athanasopoulos & Moran, 2012). Small variations in the length of lines next to the
circles indicated perceived variations in loudness.

CONCLUSION

At the moment of its creation, the iconic representation of music for a communicative function must follow
society’s norms. Conceivably, this may in certain circumstances gradually give rise to symbolic systems.
But in order for such musical communication to take place in the first instance, members of a community
must reach agreement on the salient aspects of what is being perceived. The BenaBena participants’ debates
made the task possible by creating a relevant context for the functional (communicative) task in hand. They
produced responses which attempted to signal common points of reference without recourse to an existing
bank of symbols or through the structure of Cartesian organisation. The BenaBena’s purely functional
approach to the task highlights the specific nature of WSN, whose role in the established canon of Western
art music–and relationship with the concept of the musical work–embodies both ideological and aesthetic
concerns (Goehr, 2007). Whilst WSN is ubiquitous, we are reminded that the underlying principles with
which it is associated are not universal.
On the topic of universal expression and comprehension of musical meaning, there is little
agreement amongst scholars in ethnomusicology, music psychology or in semiotics. In this study, three
culturally diverse groups of participants provided empirical data with which to contemplate one small
aspect of the topic, using a free-drawing paradigm. The results point to literacy as the most significant
factor in predicting the type of representation used to depict musical sounds. Literate participants
unfamiliar with music notation still tended to deploy a timeline to mark the progression of sounds in time.
The responses of the non-literate BenaBena participants, who were also non-music-notational, were most
striking in their distinctive approach to musical shape association. Despite the unfamiliarity of the task, the
participants quickly defined appropriate iconic references to meet the communication criteria of the task.
These culturally appropriate references focused on musical parameters that seemed either to matter more to
participants, or were deemed least ambiguous. As a result, variations between sound events within a
specific category of musical stimuli (such as pitch variations) were put aside, in order to maintain the
clarity of the symbolic (icon) reference.

NOTES

[1] A note on “culture”. In a recent special issue of this journal (Meaning and Entrainment in Language and
Music, Volume 7), Widdess draws on cultural anthropologist Maurice Bloch’s (1998) definition of
culture: “That which needs to be known in order to operate reasonably effectively in a specific human
environment” (Bloch, 1998, p. 4). This definition is consonant with our own usage, and we appreciate
Widdess’ elaboration that much of what gives a musical culture its identity depends on knowledge that is
acquired and deployed in non-linguistic ways (Widdess, 2012, p. 88). Meanwhile, Cross reminds the reader
that academic discourse on musical culture typically refers only to constructs which are continually
“refashioned” in relation to one another (Cross, 2012, p. 95); culture, therefore, is not a reliable concept for
the faithful description of permanently or objectively recognisable “domains” (Cross, 2012, p. 95). We
acknowledge this argument too, recognising the reductive and desensitising effect of over-use of the term.
This project, however, uses the concept of “culture” to make comparisons that can highlight particular
elements of difference and diversity in individuals’ responses to specific elements of musical sound.

[2] The study and fieldwork were carried out as part of the first author’s doctoral research, involving more
than 120 participants from 5 groups from three distinct cultural backgrounds (U.K., Japan, Papua New
Guinea) in Edinburgh (U.K.), Tokyo & Kyoto (Japan), and Port Moresby and the BenaBena villages in
Papua New Guinea. The preliminary results of this study are published as conference proceedings
(Athanasopoulos, 2013; Athanasopoulos & Moran, 2012; Athanasopoulos, Moran, & Frith, 2011). The
research and fieldwork were funded by the Onassis Foundation, Greece, the Great Britain Sasakawa
Foundation, and the University of Edinburgh.

[3] This is relative. Missionaries have now reached even the most remote tribes in Papua New Guinea. The
world that Feld (1982) described no longer exists; within two generations Papua New Guinean societies
have been radically transformed.

[4] The doctoral study from which these findings are drawn examined participant responses to three sets of
20-trial auditory stimuli varying on pitch, duration and attack rate for the free-drawing investigation.
Participants also responded to a further 12 auditory stimuli (varying on pitch and attack rate) for an
additional forced-choice study.

REFERENCES
Adachi, M. (1997). Japanese Children's Use of Linguistic Symbols in Depicting Rhythm Patterns. In:
Proceedings of the 4th International Conference on Music Perception and Cognition. Montreal, Canada:
McGill University, pp. 413-418.

Athanasopoulos, G. (2013). Scoring Sounds: the Visual Representation of Music in Cross-Cultural
Perspective. PhD dissertation, Reid School of Music, University of Edinburgh.

Athanasopoulos, G., & Moran, N. (2012). Pictorial notations of pitch, duration and tempo: A musical
approach to the cultural relativity of shape. In: Proceedings of the 12th International Conference on Music
Perception and Cognition. Thessaloniki, Greece, p. 69.

Athanasopoulos, G., Moran, N., & Frith, S. (2011). Literacy makes a difference: A cross-cultural study on
the graphic representation of music by communities in the United Kingdom, Japan and Papua New Guinea.
Paper presented at the 2011 Biennial Meeting of the Society for Music Perception and Cognition (SMPC).
Rochester, NY, USA.

Avid Technology. (2009). Sibelius 6 Software. http://www.sibelius.com/home/index_flash.html

Bamberger, J. (2005). How the conventions of music notation shape musical perception and performance.
In: D. Miell, R. MacDonald, & D. J. Hargreaves (Eds.), Musical communication. New York: Oxford
University Press, pp. 143-170.

Barrett, M. (2005). Representation, cognition and communication: Invented notation in children's musical
communication. In: D. Miell, R. MacDonald, & D.J. Hargreaves (Eds.), Musical communication. New
York: Oxford University Press, pp. 117-142.

Blacking, J. (1973). How Musical is Man? University of Washington Press (reprinted 1990).

Bloch, M. (1998). How We Think They Think: Anthropological Studies in Cognition, Memory and Literacy.
Boulder: Westview Press.

Boroditsky, L. (2001). Does language shape thought? Mandarin and English speakers’ conceptions of time.
Cognitive Psychology, Vol. 43, No. 1, pp. 1-22.

Boroditsky, L. (2011). How language shapes thought. Scientific American, Vol. 304, pp. 62-65.

Casasanto, D., Phillips, W., & Boroditsky, L. (2003). Do we think about music in terms of space?
Metaphoric representation of musical pitch. In: R. Alterman & D. Kirsh (Eds.), Proceedings of the 25th
Annual Conference of the Cognitive Science Society. Boston, MA: Cognitive Science Society, p. 1323.

Cross, I. (2012). Musics, cultures and meanings: Music as communication. Empirical Musicology Review,
Vol. 7, Nos. 1-2, pp. 95-97.

Carello, C., Anderson, K.L., & Kunkler-Peck, A.J. (1998). Perception of object length by sound.
Psychological Science, Vol. 9, pp. 211-214.

Deregowski, J.B. (1972). Pictorial perception and culture. Scientific American, Vol. 227, No. 5, pp. 82-88.

DigiDesign (2009). Digidesign Pro Tools 8 Software. Avid Technology.
http://www.avid.com/US/products/family/Pro-Tools

Eitan, Z., & Granot, R.Y. (2006). How music moves: Musical parameters and listeners’ images of motion.
Music Perception, Vol. 23, No. 3, pp. 221-248.

Eitan, Z., Schupak, A., & Marks, L.E. (2010). Louder is Higher: Cross-Modal Interaction of Loudness
Change and Vertical Motion in Speeded Classification. In: K. Miyazaki, Y. Hiraga, M. Adachi, Y.
Nakajima, & M. Tsuzaki (Eds.), Proceedings of the 10th International Conference on Music Perception
and Cognition (ICMP10), Causal Productions, Adelaide (2008), (10 pp).
http://www2.tau.ac.il/InternetFiles/Segel/Art/UserFiles/file/Proceedings%2010.pdf

Farago, A. (2007). Interview: Jason Thompson. The Comics Journal, September 30, 2007.

Feld, S. (1982). Sound and Sentiment: Birds, Weeping, Poetics and Song in Kaluli Expression.
Philadelphia, PA: University of Pennsylvania Press.

Fritz, T., Jentschke, S., Gosselin, N., Sammler, D., Peretz, I., Turner, R., Friederici, A., & Koelsch, S.
(2009). Universal recognition of three basic emotions in music. Current Biology, Vol. 19, No. 7, pp. 573-
576.

Fuhrman, O., & Boroditsky, L. (2010). Cross-cultural differences in mental representations of time:
Evidence from an implicit non-linguistic task. Cognitive Science, Vol. 34, No. 8, pp. 1430-1451.

Gentner, D., Imai, M., & Boroditsky, L. (2002). As time goes by: Evidence for two systems in processing
space-time metaphors. Language and Cognitive Processes, Vol. 17, No. 5, pp. 537-565.

Goehr, L. (2007). The Imaginary Museum of Musical Works, Oxford: Oxford University Press.

Küssner, M.B. (2013). Music and shape. Literary and Linguistic Computing. Advance Access published
January 15, 2013: 10.1093/llc/fqs071.

Küssner, M.B., & Leech-Wilkinson, D. (in press). Investigating the Influence of Musical Training on
Cross-Modal Correspondences and Sensorimotor Skills in a Real-Time Drawing Paradigm. Psychology of
Music.

Küssner, M.B., Prior, H.M., Gold, N., & Leech-Wilkinson, D. (2012). Getting the shapes ‘right’ at the
expense of creativity? How musicians’ and non-musicians’ visualizations of sound differ. Proceedings of
the 12th international conference on Music Perception and Cognition, Thessaloniki, Greece, p. 121.

Layton, R. (1981). The Anthropology of Art (2nd Edition). Cambridge: Cambridge University Press.

Mitchell, M. (2004). The Visual Representation of Time in Timelines, Graphs and Charts. In: Australian &
New Zealand Communication Association Conference. http://epublications.bond.edu.au/hss_pubs/107

Nettl, B. (1985). Western Impact on World Music: Change, Adaptation, and Survival. New York: Schirmer
Books.

Nisbett, R. (2003). The Geography of Thought: How Asians and Westerners Think Differently - And Why.
New York: Free Press.

Prior, H.M. (2010). Links between music and shape: Style-specific, language-specific, or universal? 1st
International Colloquium on Universals in Music: Data, issues, perspectives, Université de Provence, Aix,
France.

Reybrouck, M., Verschaffel, L., & Lauwerier, S. (2009). Children's graphical notations as representational
tools for musical sense-making in a music-listening task. British Journal of Music Education, Vol. 26, No.
2, pp. 189-211.

Roberson, D., Davidoff, J., & Shapiro, L. (2002). Squaring the circle: The cultural relativity of good shape.
Journal of Cognition and Culture, Vol. 2, No. 1, pp. 29-53.

Rosch, E.H. (1973). Natural categories. Cognitive Psychology, Vol. 4, No. 3, pp. 328-350.

Sadek, A.A.M. (1987). Visualization of musical concepts. Council for Research in Music Education,
Bulletin No. 91, pp. 149-154.

Schmandt-Besserat, D. (1992). How Writing Came About. Austin: University of Texas Press.

Tagg, P. (1993). Universal music and the case of death. Critical Quarterly, Vol. 35, No. 2, pp. 54-98.

Tan, S., & Kelly, M. (2004). Graphic representations of short musical compositions. Psychology of Music,
Vol. 32, No. 2, pp. 191-212.

Tarasti, E. (2002). Signs of Music: A Guide to Musical Semiotics. New York: Mouton de Gruyter.

Verschaffel, L., Reybrouck, M., Janssens, M., & Van Dooren, W. (2010). Using graphical notations to
assess children’s experiencing of simple and complex musical fragments. Psychology of Music, Vol. 38,
No. 3, pp. 259-284.

Walker, A.R. (1987a). Some differences between pitch perception by children of different cultural and
musical backgrounds. Council for Research in Music Education, Bulletin No. 91, pp. 166-170.

Walker, A.R. (1987b). The effects of culture, environment, age and musical training on choices of visual
metaphors for sound. Perception and Psychophysics, Vol. 42, No. 5, pp. 491-502.

Widdess, R. (2012). Music, meaning and culture. Empirical Musicology Review, Vol. 7, Nos. 1-2, pp. 88-
94.

Zwaan, E.W.J. (1965). Links en rechts in waarneming en beleving (Left and right in visual perception as a
function of the direction of writing). Doctoral Thesis, Rijksuniversiteit Utrecht, The Netherlands. Cited in:
Winn, W. (1994), Contributions of perceptual and cognitive processes to the comprehension of graphics.
In: Schnotz, W., & Kulhavy, R.W. (Eds.), Comprehension of Graphics. Amsterdam: North-Holland,
pp. 3-27.

Visual Representations of Music in Three Cultures:
Commentary on Athanasopoulos and Moran
SIU-LAN TAN
Kalamazoo College

ABSTRACT: Athanasopoulos and Moran (2013) examined visual representations of
brief melodic sequences (solo synthesized flute playing rising, falling, peak, and valley
pitch contours) by British participants familiar with western standard notation,
Japanese participants familiar with Japanese traditional notation, and participants from
the BenaBena tribe in Papua New Guinea who were unfamiliar with any literary or
notational script. This commentary discusses the method, analysis, and implications of
the findings, within the context of a multidirectional gain/loss perspective of the
acquisition of skills in human development, as applied to musical notation.

Submitted 2013 May 12; accepted 2013 May 27.

KEYWORDS: musical notation, invented notation, enculturation, melodic contour

IN an ambitious paper, Athanasopoulos and Moran (2013) ask: What is the effect of cultural background on
an individual’s two-dimensional representation of musical sound? To address this, they played brief
excerpts of synthesized flute sequences, varying in pitch to express (what may be described as) rising,
falling, peak, and valley contours, and asked participants to “represent the sound on paper in such a way
that if another member of their community saw their marks they would be able to connect them to the
sounds”. The participants were experienced musicians in the sense that they had all played and performed
on musical instruments for many years. They were drawn from three culturally distinct groups: British
participants living in Edinburgh (Scotland) who were all familiar with western standard notation (WSN),
Japanese participants in Tokyo and Kyoto familiar with Japanese traditional notation (JTN) though with some
exposure to WSN, and participants in the BenaBena tribe living in Papua New Guinea who were unfamiliar
with any literary or notational script.
Among the main findings: Almost all British participants (24/25) and the majority of Japanese
participants (19/24) used representations that expressed time on the horizontal axis, and variations in pitch
on the vertical axis – in line with the conventions of WSN. (Further, left-to-right depiction of time was
employed by 46 of the 48 British and Japanese participants who provided time-ordered representations,
despite JTN’s conventional top-to-bottom organization of time). In contrast, only 4 out of 26 BenaBena
participants’ visual representations followed this scheme. Indeed, neither dimension used by the majority of
the British and Japanese participants appeared to be a focus in most BenaBena participants’ representations.
Neither (i) organizing the sounds with respect to linear chronological ordering nor (ii) attention to
variations in pitch apparently served as salient dimensions in the majority of BenaBena participants’
representations. These are quite intriguing findings in an area for which we have as yet little cross-cultural
empirical work.
The challenges of conducting a cross-cultural study in three countries are significant, especially
where remote fieldwork is involved. The authors are to be commended for thoughtfully adapting the
procedure in culturally sensitive ways (careful selection of instrument and pitch relations for stimuli,
avoiding the use of headphones, using etching to link to an unfamiliar task for the BenaBena group,
practice with pens, and emphasis on ecological validity). The inclusion of post-procedure interviews added
a layer of information not often reported in standard research studies, and yet important to any investigation
about meaning-making. However, some limitations in methodology should be noted:
(i) Average ages of participants varied widely: 23.5 years for the British group, and 47.6 and 57.1
years respectively for the Japanese and BenaBena groups. It is possible that cultural differences may be
conflated with cohort effects, and/or with age effects (such as variation in fine motor and sensory
capacities, or cognitive abilities, or other developmental differences). (ii) There is a lack of clarity about
whether participants were run individually, or if groups A, B, and C were run as three whole groups –
raising the question of possible order effects. (iii) There is also the question of “demand characteristics” (or
cues participants may use to infer what the experimenter is interested in). While the drawing task was quite
open, the musical stimuli were carefully controlled to call participants’ attention to what the authors
considered to be a salient dimension. However, this assumption may have led to two different tasks for the
various groups. Most British and Japanese participants read the scenario as a melodic dictation task
whereas the BenaBena’s lack of familiarity with this context allowed them to approach the task in a more
open-ended way, selecting their own salient parameters of sound to represent.
Given that the participants were instructed to provide representations that would enable members
of their community to connect them to the sounds, it would have been interesting to pursue the efficacy of
the markings for matching to the particular pitch contours. What if participants from each group had been
given examples of the most typical notations of each group? How would they have matched the markings
to each contour in the recorded excerpts? Are principles of representation necessarily the same as those
used in deciphering others’ notation? Performing the sounds would also add another layer. What if
participants had been given several samples such as the ones shown in Figures 3 to 5, and asked to play
them? For instance: “This was drawn by somebody else listening to the same music. Play (or sing) these
marks in one or more ways.” Such exploration may lead to discoveries about what dimensions of musical
sound may receive more or less attention in the inevitable gain and loss that comes with the narrowing of
abilities accompanying enculturation into any particular musical tradition, as discussed in the next section.

MULTIDIRECTIONALITY, GAIN AND LOSS

In an influential essay published in 1987, developmental theorist Paul Baltes offered a set of propositions to
characterize the nature of human development across the life span. Two closely interrelated propositions
focused on the “multidirectional” nature of development and the idea of development as “a dynamic and
continuous interplay of gain and loss” (p. 611). These propositions challenged the idea of human
development as a simple linear progression of incremental gains in favor of a more complex view of
multiple directions of change, such that advances in development actually consist of simultaneous
expression of both gain and loss. In the words of Baltes (1987), “It is assumed that any developmental
progression displays at the same time new adaptive capacity as well as the loss of previously existing
capacity. No developmental change during the life course is pure gain” (p. 616).
For example, as they approach one year, infants become increasingly sensitive to fine differences
in the speech sounds of the language(s) to which they are frequently exposed, as reflected by their babbling
which increasingly resembles the sounds of what will become their native language(s). In tandem, however,
by the end of the first year, they are no longer able to discern fine differences between some phonemes not
common in the language(s) around them (such as /l/ and /r/ for infants in Japanese-speaking homes [Kuhl,
Stevens, Hayashi, Deguchi, Kiritani, & Iverson, 2006]). To extend to a musical example, young western
infants discern mistuned pitches equally well in melodies based on the Javanese (Indonesian) pélog scale or
western major/minor scale (Lynch, Eilers, Oller, & Urbano, 1990). However, by age one year western
infants show greater sensitivity to mistuning in melodies based on major/minor scales (Lynch & Eilers,
1992) while becoming less sensitive to mistuning of melodies in other scales.
Further, this multidirectional interplay of gain and loss is also expressed as individuals begin to
master the representational tools of their culture. This is because any representational system is selective
and incomplete in capturing what it represents or communicates. All forms of notation necessarily select
some aspects for attention over others and exclude many more. Western standard notation (WSN)
eventually came to represent precise pitch and time relations by way of diastematic and mensural notation
(Randel, 2003), along with looser (non-calibrated) dynamic and tempo designations; other parameters of
musical sound and performance are not explicitly represented.[1] To adapt a statement from David Olson’s
(1994) provocative book on the conceptual and cognitive implications of writing and reading script to the
realm of music: “The development of a functional way of communicating with visible marks…[is] a
discovery of the representable structures of” music (p. 68).[2]
Looking at Athanasopoulos and Moran’s (2013) study through this frame, the focus is less directed
to a comparison of visual representations of music against a “standard” notation, or solely a comparison
between groups, than to opening up the question of what we may discover from different ways of engaging
with music, and what rich parameters of music receive less attention in the inevitable interplay of gain and
loss that comes with the narrowing of abilities accompanying specialization.


BEYOND PITCH CONTOUR?

Sensitivity to pitch contour is apparent very early in life, in infants’ responsiveness to “prosody” (i.e., the
musical qualities of speech including melody, rhythm, intonation, phrasing etc.). Language researchers (e.g.,
Papoušek, Papoušek, & Symmes, 1991) observed that rising contours in speech tend to capture infants’
attention, falling contours tend to soothe or signal the end of an interactive sequence, and bell-shaped
contours hold infants’ interest and express approval, and that these contours are used similarly across many
tonal and nontonal languages in speech directed to infants. Thus the specific stimuli selected by
Athanasopoulos and Moran (2013) were aptly chosen. Musical notation systems developing in the west
also reflect special attention to melodic contour (Randel, 2003). Even treatises on notation of musical
performance in non-western musical cultures for ethnomusicological purposes (e.g., Abraham &
Hornbostel, 1909) devoted significantly more space to discussions of how to represent musical pitch,
compared to other parameters such as rhythm, tempo, and duration.
Yet researchers (e.g., Walker, 1997) have also found that “in some cultures studied, musical pitch
may not be a predominant feature in the musical [e.g.] vocal behavior examined” (p. 315). Indeed, Walker
found that Australian Aboriginal singing often employed only a single pitch, but exhibited rich variety in
the formants. Thus the essential information in Aboriginal singing is focused on frequency spectra, in
contrast to attention to variations in pitch while holding timbre constant (i.e., maintaining the steady state)
as in much of western opera singing. Athanasopoulos and Moran (2013) made a parallel observation about
some BenaBena participants’ focus on the qualities of the flute sound (e.g., the tone color and fluctuations
in loudness) as opposed to pitch contour in their graphic representations of the brief flute excerpts.
Although he does not ground his approach in discourse on multidirectional gain/loss, Walker (1997) adopts
this spirit, concluding his study with a focus on what we have to learn from this group’s engagement with
music and by extracting specific practical implications for music training and performance: “Notations
based on visual metaphors for tone qualities or vowel pitch [features not notated in WSN] may be useful
perceptual and conceptual aids in the education of musical performers” (p. 343, brackets added). So too,
we have much to learn from Athanasopoulos and Moran’s observations of the BenaBena group’s diverse
responses to the task.
Athanasopoulos and Moran conclude that “While WSN is ubiquitous, we are reminded that the
underlying principles with which it is associated are not universal”. In a previous study, my colleagues and
I examined 50 American college students who had never received any formal or informal training in
western musical notation to study their intuitions about how to read WSN (Tan, Wakefield, & Jeffries,
2009; see also Tan, 2002, for a related study). Most thought that pitch is depicted on the vertical axis
though many believed it is denoted by both note-head and stem. All 50 participants assumed music is read
from left to right, and a majority (41/50) thought that notes spaced closer together are played faster than
notes placed further apart. Participants commonly assumed that a “note” must have a filled circle and stem,
and only 21/50 participants knew a whole note (semibreve) is also a “note”, as most thought it signified
silence or absence of sound. The majority (39/50) thought duration of note length progressed in this fashion
from shortest to longest: whole, half, quarter, eighth notes etc. (semibreve, minim, crotchet, quaver etc.) –
corresponding with the symbol’s complexity – while actually it is the reverse order. These findings dovetail
with Athanasopoulos and Moran’s research as the Tan (2002) and Tan et al. (2009) studies suggest that
many basic conventions of western standard notation are not intuitive, even for college-aged students
immersed in western culture and exposed to many other western notational forms.

CLOSING REMARKS

Overall, I found Athanasopoulos and Moran’s study (part of a more extensive work) to be ambitious and
interesting, yielding intriguing findings in its main analysis and more informal observations. Its participant
pools make a valuable contribution to a growing empirical area that has mainly focused on children’s
invented notations of music. The methodological issues I raised are not uncommon in fieldwork and studies
that extend their reach beyond the typical convenience samples of college students. These issues are less
problematic if the focus is not primarily on comparison to WSN or between groups, but on discovering the
breadth of possibilities of human engagement with musical sound. In contrast to a simpler linear
assumption, the perspective of specialization as involving simultaneous expression of both gain and loss
sheds light on the complexities of the shape of musical enculturation and more broadly, of human
development.


NOTES

[1] For a cogent discussion of the printed musical score as a performance guide and mediator of meaning,
see Hultberg (2002).

[2] The final word in Olson’s quotation in the original context was “speech”.

REFERENCES

Abraham, O., & von Hornbostel, E.M. (1909). Translated by List, G. & E. (1994). Suggested methods for
the transcription of exotic music. Ethnomusicology, Vol. 38, No. 3, pp. 202-216.

Baltes, P.B. (1987). Theoretical propositions of life-span developmental psychology: On the dynamics
between growth and decline. Developmental Psychology, Vol. 23, No. 5, pp. 611-626.

Hultberg, C. (2002). Approaches to musical notation: the printed score as a mediator of meaning in western
tonal tradition. Music Education Research, Vol. 4, No. 2, pp. 185-197.

Kuhl, P.K., Stevens, E., Hayashi, A., Deguchi, T., Kiritani, S., & Iverson, P. (2006). Infants show a
facilitation effect for native language phonetic perception between 6 and 12 months. Developmental
Science, Vol. 9, No. 2, pp. F13-F21.

Lynch, M.P., & Eilers, R.E. (1992). A study of perceptual development for musical tuning. Perception &
Psychophysics, Vol. 52, No. 6, pp. 599-608.

Lynch, M.P., Eilers, R.E., Oller, D.K., & Urbano, R.C. (1990). Innateness, experience, and music
perception. Psychological Science, Vol. 1, No. 4, pp. 272-276.

Olson, D.R. (1994). The world on paper: The conceptual and cognitive implications of writing and reading.
Cambridge: Cambridge University Press.

Papoušek, M., Papoušek, H., & Symmes, D. (1991). The meanings of melodies in motherese in tone and
stress languages. Infant Behavior and Development, Vol. 14, No. 4, pp. 415-440.

Randel, D.M. (2003). Notation. In: D.M. Randel (Ed.), The Harvard dictionary of music. Cambridge, MA:
Belknap, pp. 565-571.

Tan, S.-L. (2002). Beginners’ intuitions about musical notation. College Music Symposium, Vol. 42, pp.
131-141.

Tan, S.-L., Wakefield, E.M., & Jeffries, W.P. (2009). Musically untrained college students’ interpretations
of musical notation: Sound, silence, loudness, duration, and temporal order. Psychology of Music, Vol. 37,
No. 1, pp. 5-24.

Walker, R. (1997). Visual metaphors as music notations for sung vowel spectra in different cultures.
Journal of New Music Research, Vol. 26, No. 4, pp. 315-345.


Musical objects, cross-domain correspondences, and cultural
choice: Commentary on “Cross-cultural representations of
musical shape” by George Athanasopoulos and Nikki Moran
ZOHAR EITAN
Tel Aviv University

ABSTRACT: The target article illustrates deep cross-cultural gaps, involving not only
the representation of musical shape but also the notion of a musical object itself. Yet,
numerous empirical findings suggest that important cross-modal correspondences
involving music and visual dimensions are inborn or learned at infancy, prior to the
acquisition of language and most culture-specific behavior. Drawing on recent
empirical work, the commentary attempts to reconcile this apparent disparity.

Submitted 2013 June 23; accepted 2013 June 26.

KEYWORDS: cross-modal correspondence, musical object, nature vs. nurture

MUSICAL SHAPE AND THE SOUND OBJECT

“CROSS-CULTURAL representations of musical shape” compellingly reminds us that some fundamental
Western notions of music are neither self-evident nor universal – including the notion of the musical object
itself. A deeply ingrained Western tradition (though one forcefully challenged in recent decades) views
musical events and compositions as stand-alone sound objects, which may be detached from any specific
social context and function and thus contemplated and represented abstractly. Such notions, however,
may be completely alien to some non-Western cultures, including that of the BenaBena participants
observed by Athanasopoulos and Moran. For them, “music” is necessarily embedded in specific social contexts,
meanings and functions (Hays, 1986; Feld, 1990); hence, the concept of an independent sound object,
detached from any social or natural production contexts, would make little sense.
Importantly, where the notion of an abstract, context-less musical object makes little sense, so does
that of independent musical “structure.” Thus, structural variables that, for Western musicians, may define
musical patterns and events (such as pitch contour, used by the authors to distinguish stimuli from each
other) may be of little significance in a different culture, where such variables have no role in delineating
socially meaningful sound patterns.
The responses of BenaBena participants to the drawing task in the present experiment clearly
illustrate such cultural differences. “Participants attempted to indicate that a flute was playing through
iconic representation of the sound, without attempting to indicate specific variations between the sound
events … When questioned if all the sound events were the same (regardless of contour variation), the
participant responded, ‘Yes, it is still a flute playing’” (p. 191). It seems, then, that the very notion of an
abstract sound object was indeed alien to BenaBena participants. Instead, they associated the stimuli they
heard with concrete, culturally significant modes of sound production (flute playing, for instance, is central
to traditional “sing-sing” ceremonies of the BenaBena and other traditional Papua New Guinea cultures,
associated with the evocation of ancestral spirits or the assertion of male dominance over women; Hays,
1986; Langness, 1974). Consequently, the contour variations that, from the experimenters’ perspective,
clearly distinguished one stimulus from another were not heeded in the BenaBena graphic representations.
Apparently, these structural variations (though perceptible) were viewed as irrelevant, since they did not
represent any culturally significant distinctions. Thus, stimuli were all “the same” because they did not
differ in the one way participants found to be culturally relevant: their perceived mode of production (flute
playing).
The above interpretation may also suggest why most BenaBena participants did not use spatial
mappings of the musical timeline in their drawings. While space-time mappings differ among cultures in
important ways (Gentner, Imai, & Boroditsky, 2002; Fuhrman & Boroditsky, 2010; Núñez & Cooperrider,
2013), the tendency to represent temporal relationships, such as temporal order or duration, by some spatial
relationships seems universal. Furthermore, such mappings are expressed in the BenaBena cultural milieu,
using both language metaphors and bodily gestures. Núñez, Cooperrider, Doan, and Wassmann (2012), for
instance, observed consistent use of downhill-uphill metaphors and gestures to refer to past and future events,
respectively; and the well-known Kaluli waterfall metaphors provide another intriguing example (Feld,
1990). Yet, while Western participants in the experiment mostly used “Cartesian” time-to-space mappings,
emulating the left-right spatial mapping of the musical timeline in Western musical notation, no spatial
mapping of the order or duration of musical events was observed in most BenaBena drawings. Apparently,
BenaBena participants did not use time-to-space mappings in their drawings, though those were culturally
available, since they did not view the specific temporal structures of the individual stimuli presented to
them as meaningful: devoid of any cultural context, these timelines were “all the same” to them, since “it is
still a flute playing” (p. 191).

CULTURAL CONSTRAINTS AND CROSS-DOMAIN MAPPINGS

Athanasopoulos and Moran’s experiment explicitly and directly required participants to draw upon their
shared cultural resources, and “… represent the sound so that if another member of their community saw
them they should be able to connect them with the sound” (p. 186). When such tasks are presented to
members of one culture with a long-established tradition of graphic sound representation, as well as to
members of another culture that does not use any graphic representation of sound whatsoever (and in
addition may have no clear notion of an independent sound object to be represented), cultural differences in
sound representation would be inevitable.
Yet, such differences do not necessarily imply that cross-cultural, possibly universal tendencies to
correlate sound and visual shape in specific ways are non-existent. In recent decades, diverse studies have
indicated that some cross-modal correspondences involving sound arise independently of cultural practice
(see Eitan, 2013; Spence, 2011, for research overviews). Thus, infants tend to associate pitch direction
(“up” or “down”) with spatial rise and fall (Dolscheid, Hunnius, Casasanto, & Majid, 2012; Jeschonek,
Pauen, & Babocsai, 2013; Wagner, Winner, Cicchetti, & Gardner, 1981; Walker et al., 2010), pitch height
with visual shape (higher pitches are spiky, lower pitches rounded; Walker et al., 2010), pitch height with
thickness (higher pitch is thinner; Dolscheid et al., 2012), and loudness with brightness (Lewkowicz &
Turkewitz, 1980); human adults, pre-verbal children, and even non-human primates (chimpanzees) all
associate pitch height with visual brightness (Ludwig, Adachi, & Matsuzawa, 2011; Mondloch & Maurer,
2004). Though most such studies did not directly involve cross-cultural comparisons, they do suggest that
some cross-modal correspondences do not depend on enculturation, as they emerge at a very early age and
may even be shared by humans and other species.
How can such findings be reconciled with the results of Athanasopoulos and Moran’s experiment,
suggesting that “the iconic representation of music for a communicative function must follow society’s
norms” (p. 191)? At least in part, the answer lies in the differences in purpose—and consequently, in
research questions and experimental procedures—between the present experiment and most cross-modal
studies. The present experiment examined how shared cultural norms governing the association of sound
and shape are expressed by members of the respective cultures. The task it utilizes has an explicit intra-
cultural communicative purpose (“represent the sound on paper in such a way that if another member of
their community saw their marks they should be able to connect them with the sound”; p. 186), and it allows
for—even demands—conscious reflection. In contrast, much cross-modal research elicits automatic,
subconscious responses, often at a pre-attentive level of processing. Furthermore, the dependent measures
in such experiments may be implicit or indirect – measures not directly defined in the experimental task,
which participants are often unaware of or unable to control. Even cross-modal studies utilizing direct,
explicit measures rarely demand that participants intentionally apply cultural norms or codes, or define the
experimental task as a communication undertaking. Thus, while the present experiment suggests how
participants utilize shared cultural resources (if such resources exist) to associate music and shape, it does
not examine whether and how cross-modal correspondences other than those codified by the participants’
culture (such as those revealed by the infant studies mentioned above) may affect perception and behavior.
Indeed, task demands might have attenuated any effects of cross-modal correspondences other than those
generated by culturally-ingrained knowledge and habits.
The main challenge for the comparative study of cross-modal correspondences, in music and
elsewhere, is examining the interactions between correspondences explicitly codified by language and
cultural practice and those whose sources seem independent of culture. Would, for instance,
correspondences such as those between pitch and spatial height or loudness and visual brightness—already
effective in infants a few months old—endure in cultures whose language and other cultural practices do
not express such correspondences? Would cross-modal correspondences fashioned by cultural practice
shape aspects of perception, cognition and action not susceptible to conscious awareness and control?
Examining such questions would require extensive and innovative cross-cultural research, applying
converging methodologies, both implicit and explicit, while investigating diverse perceptual and cognitive
processes and related cultural practices and artifacts.
Currently, very few studies (to which Athanasopoulos and Moran’s work is a welcome and
important addition) have attempted such a challenging undertaking; yet even those few suggest a very
intriguing, complex picture. A recent study (Dolscheid, Shayan, Majid, & Casasanto, 2013), examining
pitch mappings across cultures, provides an interesting example. Dolscheid and her colleagues observed
that speakers of different languages use different metaphors to denote the auditory dimension we call “pitch
height” (Eitan & Timmers, 2010). Thus, while Dutch speakers describe pitch as high or low, Farsi speakers
describe it as thin or thick. The researchers presented Dutch and Farsi participants with different pitches,
concurrently with visual stimuli varying in height or in thickness, and asked them to reproduce each pitch
by singing. The irrelevant visual height information affected the performance of Dutch speakers (who refer
to pitch as “high” or “low”), but not that of Farsi speakers (who refer to pitch as “thin” or “thick”). In
contrast, the irrelevant thickness information affected the performance of Farsi speakers, but not that of
Dutch speakers. In further experiments, Dutch speakers were trained to use thickness metaphors for pitch in
two contrasting ways: similarly to Farsi speakers (higher pitches are “thinner”) and in a “reversed-Farsi”
way (higher pitches are “thicker”). Both groups then performed the pitch reproduction task. While the
group trained with Farsi-like pitch metaphors (high pitch is thinner) was affected by cross-dimensional
thickness interference similarly to native Farsi speakers, the group trained with “reversed-Farsi” metaphors
(high pitch is thicker) demonstrated no effects.
Together, these findings present an intriguing picture of the way cultural practices (here, native
language metaphors) may affect preexisting, possibly universal cross-modal correspondences. Both pitch/
height and pitch/thickness correspondences were revealed in infant studies, suggesting that the source of
both correspondences is not language or other cultural practices (Dolscheid et al., 2012; Jeschonek et al.,
2013; Wagner et al., 1981; Walker et al., 2010). Yet language (and possibly, other cultural practices and
artifacts) may strengthen or attenuate such earlier correspondences, affecting behavior and perception.
Cultural practices, then, do not create cross-modal correspondences, but rather modify the strength of
preexisting correspondences (inborn or acquired through implicit statistical learning) and position them in
culturally specific contexts. Yet, “natural” cross-modal correspondences not adopted by a culture do not
necessarily disappear. They remain latent, and may be induced even through brief training (Eitan and
Timmers (2010), using a different experimental paradigm, reach comparable conclusions).
This model is, of course, tentative and cursory, and needs to be examined through extensive future
research examining diverse cultural settings and psychological mechanisms. Extending the intriguing
findings of “Cross-cultural representations of musical shape” through the use of complementary research
methods, enabling a closer look at the interaction of “nature” and “nurture” across cultures, would surely be
a valuable step forward.

REFERENCES

Dolscheid, S., Hunnius, S., Casasanto, D., & Majid, A. (2012). The sound of thickness: Prelinguistic
infants’ associations of space and pitch. In: N. Miyake, D. Peebles, & R.P. Cooper (Eds.), Proceedings of
the 34th Annual Meeting of the Cognitive Science Society. Austin, TX: Cognitive Science Society, pp.
306-311.

Dolscheid, S., Shayan, S., Majid, A., & Casasanto, D. (2013). The thickness of musical pitch:
Psychophysical evidence for linguistic relativity. Psychological Science, Vol. 24, No. 5, pp. 613-621.

Eitan, Z. (2013). How pitch and loudness shape musical space and motion: New findings and persisting
questions. In: S.L. Tan, A. Cohen, S. Lipscomb, & R. Kendall (Eds.), The Psychology of Music in
Multimedia. Oxford: Oxford University Press, pp. 161-187.

Eitan, Z., & Timmers, R. (2010). Beethoven’s last piano sonata and those who follow crocodiles: Cross-
domain mappings of auditory pitch in a musical context. Cognition, Vol. 114, No. 3, pp. 405-422.

Feld, S. (1990). Sound and Sentiment: Birds, Weeping, Poetics and Song in Kaluli Expression. Philadelphia,
PA: University of Pennsylvania Press.

Fuhrman, O., & Boroditsky, L. (2010). Cross-cultural differences in mental representations of time:
Evidence from an implicit non-linguistic task. Cognitive Science, Vol. 34, No. 8, pp. 1430-1451.

Gentner, D., Imai, M., & Boroditsky, L. (2002). As time goes by: Evidence for two systems in processing
space-time metaphors. Language and Cognitive Processes, Vol. 17, No. 5, pp. 537-565.

Hays, T.E. (1986). Sacred flutes, fertility, and growth in the Papua New Guinea Highlands. Anthropos, Vol.
81, pp. 435-453.

Jeschonek, S., Pauen, S., & Babocsai, L. (2013). Cross-modal mapping of visual and acoustic displays in
infants: The effect of dynamic and static components. European Journal of Developmental Psychology,
Vol. 10, No. 3, pp. 337-358.

Langness, L.L. (1974). Ritual Power and Male Domination in the New Guinea Highlands. Ethos, Vol. 2,
No. 3, pp. 189-212.

Lewkowicz, D., & Turkewitz, G. (1980). Cross-modal equivalence in early infancy: Auditory-visual
intensity matching. Developmental Psychology, Vol. 16, pp. 597-607.

Ludwig, V.U., Adachi, I., & Matsuzawa, T. (2011). Visuoauditory mappings between high luminance and
high pitch are shared by chimpanzees (Pan troglodytes) and humans. Proceedings of the National Academy
of Sciences of the United States of America, Vol. 108, No. 51, pp. 20661-20665.

Mondloch, C.J., & Maurer, D. (2004). Do small white balls squeak? Pitch–object correspondences in young
children. Cognitive, Affective and Behavioral Neuroscience, Vol. 4, No. 2, pp. 133-136.

Núñez, R., & Cooperrider, K. (2013). The tangle of space and time in human cognition. Trends in Cognitive
Sciences, Vol. 17, No. 5, pp. 220-229.

Núñez, R., Cooperrider, K., Doan, D., & Wassmann, J. (2012). Contours of time: Topographic construals of
past, present, and future in the Yupno valley of Papua New Guinea. Cognition, Vol. 124, No. 1, pp. 25-35.

Spence, C. (2011). Crossmodal correspondences: A tutorial review. Attention, Perception & Psychophysics,
Vol. 73, No. 4, pp. 971-995.

Wagner, S., Winner, E., Cicchetti, D., & Gardner, H. (1981). “Metaphorical” mapping in human infants.
Child Development, Vol. 52, No. 2, pp. 728-731.

Walker, P., Bremner, J.G., Mason, U., Spring, J., Mattock, K., Slater, A., & Johnson, S.P. (2010).
Preverbal infants’ sensitivity to synaesthetic cross-modality correspondences. Psychological Science, Vol.
21, No. 1, pp. 21-25.


Tonality: The Shape of Affect


MINE DOĞANTAN-DACK
Middlesex University, UK

ABSTRACT: The last decade has witnessed an increasing interest in studying music as
it relates to human evolution, leading to the establishment of so-called evolutionary
musicology as a new field of enquiry. Researchers in this field maintain that music
indeed played as crucial a role as the development of language in the evolution of
humankind. The most frequently cited musical phenomena in relation to various
adaptive functions include rhythm, meter, and melodic contour. In this connection, the
universal phenomenon of tonal organisation of pitch in musical systems has received no
attention. This article provides a hypothesis regarding the evolutionary origins of
tonality as a system for the dynamic shaping of affect, and establishes further
connections between music and affective states by proposing a link between the
emergence of tonality and of the human capacity to regulate inter-subjective dynamics
by shaping the course of affect towards stable states. The article also proposes that
tonality provides an archetypal psychological space within which the human ability to
shape different paths towards stable affective states could evolve.

Submitted 2013 March 29; accepted 2013 July 4.

KEYWORDS: tonality, attraction schema, evolutionary musicology

TONALITY AS A UNIVERSAL

THE idea that there exist correspondences between the dynamic principles of heavenly bodies, of our
subjective experiences and of music is a very old one, and not confined to the West. In the earliest
accounts of human music and musicality within ancient mythologies of China, Babylonia and India
(Lippman, 1992, pp. 3-9), the constitution of the cosmos, of humans and of music are integrally related;
and the Pythagorean tradition originating in ancient Greece attributes the powerful effects of music to
the harmonic principles it shares with cosmic order. The similarities humans observe between musical
motion and physical motion, on the one hand, and between musical motion and “inner motion” or
emotion, on the other hand, constitute one of the recurrent themes in the history of Western musical
aesthetics, and many non-Western cultures draw parallels between musical movement and the
progression, regeneration and return perceived in the natural world as well as in psychological
phenomena. Musical thought across different times and places appears to consistently attribute human
musicality to something that goes deeper than the accidents of culture or nurture. Arguably the most
significant development in contemporary music psychology has been the recognition precisely of this
fact, that musical phenomena are indeed rooted in the hard-wired biological, i.e. perceptual, affective
and motor, capacities of our species and not merely in the accidental features of cultural and historical
circumstances.
A landmark in this connection has been the publication of A Generative Theory of Tonal Music
(1983) by Fred Lerdahl and Ray Jackendoff, which identified music theory as an integral branch of
cognitive science. Setting out to provide an account of human musicality by identifying the
psychological principles underlying music perception, Lerdahl and Jackendoff (1983, p. 332) have
argued that music theory, the traditional constructs of which rely upon such principles, “can provide
central evidence toward a more organic theory of mind.” Following the appearance of this influential
work, with which music psychology is thought to have come of age (Sloboda, 2005, pp. 102-103), the
discipline of psychology, which had been dominated by studies of visual perception and language
throughout the twentieth century, placed musical behaviour on a par with other cognitive domains in
exploring the mental capacities and principles of our species. Concurrently with this development,
there has been increasing interest in studying the possible neurobiological and neuropsychological
determinants of music.
The popularity of this research area has been further enhanced by a renewed interest in
musical universals in ethnomusicological studies, which for the greater part of the twentieth century
regarded diversity and non-universality of music as the basis of its disciplinary methodology (Nettl,
2005, pp. 42-49). In this connection, the recent emergence of evolutionary musicology as a field of
enquiry in its own right (Wallin, Merker, & Brown, 2000) represents the culmination of the idea that
behind the rich variety of musics around the world are the hard-wired similarities and features of the
human mind and body that are part of our evolutionary heritage. As scientific investigations in relation
to the evolution of our species make musical phenomena the focus of research with increasing
frequency, empirical findings make it hard, if not impossible, to argue against “the universality of
sensory, perceptual, and cognitive processes [underlying different music systems], independently of the
social and musical culture” (Carterette & Kendall, 1999, p. 727). In the words of Stephen Mithen,
“rather than looking at sociological or historical factors, we can only explain the human propensity to
make and listen to music by recognizing that it has been encoded into the human genome during the
evolutionary history of our species” (Mithen, 2005, p. 1).
While systematically associated with language and other cognitive domains, musical
behaviour – in accordance with one of the most remarkable findings of recent neuroscience – also
displays a neural architecture specific to itself and to our species. Isabelle Peretz has provided
compelling evidence that the neural networks that process language and music are dissociable,
indicating domains that are at least partly independent, and that the capacity for music is not
represented as a single entity in the brain but has different components, such that some of these can
remain intact even when others are impaired. Peretz (2003, p. 192) has written that “neurological
observations have consistently and recurrently suggested that music might well be distinct from other
cognitive functions, in being subserved by specialized neural networks.” The existence of such
specialized brain structures signifies biological foundations for music upon which cultural variations
can arise. Hence, even if “the larger scientific community is largely sceptical about links between
music and biology” (Trehub, 2003, p. 3) – an attitude represented by Steven Pinker's conspicuous
dismissal of music as “auditory cheesecake” (Pinker, 1997, p. 534) – recent work in evolutionary
musicology, which brings together the expertise of comparative musicologists, neuroscientists,
developmental psychologists, anthropologists, archaeologists and even zoologists, provides compelling
evidence concerning the evolutionary significance of music, supporting the view that in the evolution
and survival of humankind the role played by music has been no less crucial than the role played by
language.
To date, most of the research on the evolutionary origins and significance of music has
focused on those features that music shares with other cognitive domains, particularly with dance and
language. Indeed, some of the prominent evolutionary accounts of music have proposed common
origins for speech, music and dance[1] – as well as poetry and pretend play (Molino, 2000).
Consequently, among the various features that make up the phenomenon of music, the most significant
evolutionary roles have been given to meter and rhythm, which are attributes of the human capacity for
both language and music as well as for organized bodily movement.[2] Accordingly, the ability to keep
time and entrain physical movements to an external beat is assumed to have played a major role in the
emergence of coordinated body movements, social bonding and group cohesion. Another feature of
music that has received considerable attention in evolutionary accounts has been pitch contour, which
is part of both musical and linguistic cognition. Processing of pitch contour constitutes the presumed
evolutionary origin of vocal phrase formation and of richly communicative emotional expression
(Brown, 2000).
Within this research profile in evolutionary musicology, certain features of music still remain
neglected. In this article, I explore one of these features, a specifically musical capacity that appears to
have a brain mechanism specially reserved for it: namely, tonality. I ask whether we can claim an
evolutionary basis for the phenomenon of tonality, and whether it represents a biologically adaptive
function in addition to being a cultural artifice in its particular manifestations. Taken in its broadest
sense, “tonality” refers to the hierarchical organization of the pitch material around a single, central
pitch, which is often used to evoke stability and closure. One of the basic functions of tonality is to
shape musical movement by means of the functional hierarchy among the tones. In Western tonal
music, the musical movement as it unfolds is often strongly directed, especially because of a highly
developed and structured harmonic system. While such a harmonic system is absent in non-Western
musics, organizing the pitch material around a central pitch is a musical universal.[3] Here, I use the
term “universal” in the sense of a “statistical universal” (Nettl, 2005, p. 48): as Bruno Nettl has argued,
a practical way of exploring and discussing musical universals is by asking “whether there are features
shared not by all but by a healthy majority of musics. We look for what is extremely common,
substituting the concept of ‘statistical’ universals for what may be described as a ‘true’
universal” (Nettl, 2005, p. 48). While tonality is not necessary for music to exist – think of serial or
electro-acoustic music – humanity evidently has chosen to make it a universal feature of musical
practice.

In the great majority of musical systems, humans use specific scale formations as the basis of
their music. Most significantly, they also assign a functional hierarchy to the members of their scales
such that one of them behaves as a tonal centre. One of the other scale members often has a privileged
relationship to the central tone, and prepares its arrival. Björn Merker has argued in this connection that
“the cross-cultural ubiquity of tonal music, and the ability of listeners to perceive tonality in music
employing unfamiliar scale systems, hints that it may have a deeper significance in the world of human
music” (Merker, 2006, p. 33). The hierarchical ordering of pitches as an abstract construct is a
distinctly musical phenomenon and has no counterpart in other cognitive domains: cognition of
language and the processing of environmental sounds do not rely on the perception or understanding of
a pitch hierarchy. Even though there are similarities between the uses of pitch in musical melodies and
speech intonation, no known language involves a comparable tonal hierarchy. Knowledge of tonal
hierarchy enables listeners to develop expectations for the occurrence of certain pitches, especially
towards the end of a piece of music. Lerdahl and Jackendoff (2006, p. 53) have written that
“psychological explanations alone do not explain why music is organized in terms of a set of fixed
pitches organized [hierarchically] in a tonal space…We conclude that the mind/brain must contain
something more specialized than psychoacoustic principles that accounts for the existence and
organization of tonality.” Indeed, research indicates that tonal cognition – or the tonal encoding of pitch
(Peretz & Morais, 1989) – is neurologically dissociable from pitch discrimination, recognition of
melodic contour, identification of timbre, and cognition of rhythm (Ayotte, Peretz, & Hyde, 2002).
Brain damage can, for example, selectively impair tonal cognition such that some patients, while
having no difficulty in processing pitch variations, are no longer able to judge melodic closure properly
(Peretz & Coltheart, 2003). The existence of a specialized neural architecture behind tonality, which
does not at the same time serve language, dance, or any other human capacity as far as we can tell, is
intriguing and requires an explanation.
In working towards an explanation and some hypotheses, I revisit the ancient idea that
postulates correspondences between the dynamic order of natural phenomena, of emotions and of
music, and focus on the kind of movement generated by the functional hierarchy among the pitches of
a musical system, i.e. tonal movement, since the starting point towards an evolutionary account of
tonality is almost certainly related to its fascinating capacity to create an experience of movement –
more specifically, of return and arrival – and the attendant experience of a spatial-temporal shape. In
this connection, I will consider certain patterns of movement that are observed in the context of
dynamic natural phenomena – patterns that function to stabilize dynamic systems – and will argue that
the representation of such patterns of movement as pitch-based shapes may have had evolutionary
significance for our species. Towards this end, a brief tour within the history of Western music theory
will prove useful.

TONAL MOTION: THEORETICAL AND HISTORICAL PERSPECTIVES

In Western musical thought, theoretical, critical and analytical explorations of tonality have been so
thoroughly dependent on terminology and concepts related to movement that it is hard to imagine how
one can even begin to talk about the phenomenon of tonality in this tradition without any reference to
movement. Exclude all motion words from music theory, and it would be extremely difficult, if not
impossible, to communicate verbally the easily appreciated meaning of such statements as “harmonic
motion from tonic to dominant is functionally distinct from motion from dominant to tonic” (Morgan,
1998, p. 2), or “movement to the dominant domain creates a sense of tonal tension that is subsequently
resolved by the descent back to the tonic” (Gauldin, 1997, p. 256). Western music theory has long been
interested in the nature and source of the movement experience that the tonal features of music
generate in listeners, and employing affect-based conceptualizations and terminology has been a
frequent strategy in accounting for this phenomenon. Such a strategy is certainly not unique to Western
musical thought: as Ian Cross has written, “The evocation of affect and the experience of movement
appear intimately bound to music in many cultures” (Cross, 1999, p. 29) and any evolutionary account
of tonality must take this fact into account. In Western theory, some of the oldest models attribute the
source of tonal movement to musical structures, such as dissonances, and refer to affective experiences
by way of explanation. A fourteenth-century author of contrapuntal theory, for example, speaks of
imperfect intervals “striving to attain” a more perfect interval (Cohen, 2001, p. 16). A theorist from the
fifteenth century writes that a dissonant interval, which is imperfect in comparison to a consonant one,
“ardently burns to attain that perfection” (Cohen, 2001, p. 16). One of the most influential theories
proposed during the twentieth century, i.e. Schenker’s organicist model of tonal music, is indeed a
contemporary version of this anthropomorphic tradition in music theory. Schenker believed that
musical tones have a “life” of their own and behave in accordance with their “will” such that each tone
desires to become the root of a consonant triad. While this tradition, which gives a central role to
dissonant structures in the generation of tonal motion, has occupied an eminent place in Western
musical thought for centuries, it has also been significantly manifest in the music theories of other
cultures. Even though the kinds of pitch combinations that are regarded as dissonances, and the manner
of “resolving” them vary widely from culture to culture, dissonance as a determinant of expectation of
movement to greater stability, as well as the fact of resolution, i.e. the existence of movement patterns
from “restless” pitch structures towards “restful” ones, are universal phenomena (Carterette & Kendall,
1999).
Another frequent strategy to account for tonal motion in Western theory has concerned
establishing conceptual and terminological connections between physical and musical movement. This
strategy turns to notions of inertia, gravity and gravitational fields and to forces of attraction in
explaining the generation of movement in physical and tonal spaces. It is in particular the concept of
attractions – one of the most powerful theoretical tools of modern physics since the sixteenth century –
that has found its way into music theory as tonal attractions in the writings of Jean-Philippe Rameau,
François-Joseph Fétis, Jérôme-Joseph de Momigny, Ernst Kurth, Victor Zuckerkandl, and more
recently Fred Lerdahl and Steve Larson. The consensus in recent theory is that we experience
and understand tonal movement by metaphorically transferring our embodied experience of physical
forces such as gravity into the domain of music. The mechanism we employ in making this transfer is a
basic cognitive capacity, namely cross-domain conceptual mapping, which allows us to conceptualize
one kind of experience in terms of another, “preserving in the target domain the inferential structure of
the source domain” (Lakoff & Johnson, 1999, p. 91). In other words, tonal motion is accounted for in
terms of the source domain of physical motion; yet, it is not entirely clear why and how the source
domain for the concept of attraction should be the physical world as most music theorists assume. I
return to this issue below, but here it is worth noting that historically other writers who have used the
concept of tonal attractions made different kinds of assumptions about it: Momigny, for example,
believed the term “attractions” as used in music theory is not merely a metaphor but refers to a genuine
structural similarity between the planetary and tonal systems. He wrote that “like the attraction
recognized in physics in relation to the inertia of bodies, this attraction acts in inverse relation to
distance: a tone that is only half a step away from the one that has to follow it is much more powerfully
attracted by it, than were it [separated] by a whole step. Here is a new analogy I have discovered in
nature, and that proves the marvellous harmony that reigns among the things least resembling one
another in appearance” (Momigny, 1803-1806, p. 52).[4] Fétis, on the other hand, attributed the
experience of such attractive forces between the tones to inherent organizational principles of the mind.
He argued that the human mind cannot but perceive tonal relationships as based on attractions,
similarly to necessarily perceiving the physical world through the Kantian experiential categories of
space and time. In this context, Fétis explicitly denied acoustical or mathematical properties as
determining tonal motion, and instead talked about a “mysterious law” originating in human cognition
and governing the attraction and motion of sounds.

EVOLUTION OF TONALITY: THE ATTRACTION SCHEMA

While contemporary accounts of tonal movement in terms of tonal attractions appear to explain why
musical pitches are organized within a functional hierarchy, and around a stable pitch, they fail to
explain one of the most important aspects of tonal organization, i.e. the existence of recurring patterns
that lead to the stable pitch. These recurring patterns constitute what are technically known as
cadences. In all world musics, tonal movement preceding the return of the central pitch is structured
and not arbitrary. The stable pitch does not simply reappear, but “returns” following a process of
returning, which involves recurring patterns of movement from instability towards stability. This
feature is so fundamental that one can re-conceptualize tonality as the system for cadencing, made
possible – to be sure – by the existence of a tonal centre. In thinking about the evolutionary origins of
tonality, one has to be able to explain why in all musical cultures, cadences, rather than musical
initiations, have been systematized and formalized. Humans apparently choose to create strongly
memorable patterns for moving from relative tonal instability to stability, rather than for moving from
relative tonal stability to instability. Hence cadences, but not tonal initiations, display universal
features. For instance, Nettl writes that “in the vast majority of cultures most musical utterances tend to
descend at the end, but they are not similarly uniform at their beginnings” (Nettl, 2005, p. 46). Any
account of tonality – evolutionary or not – needs to be able to explain the significance, and the origins,
of such memorable patterns of tonal movement that precede the return of the stable pitch.
In this connection, a revised model of attraction as we find in contemporary physics proves a
useful theoretical tool (Milnor, 1985). According to this model, attractive forces – conceived as
generators of movement patterns – explain the temporal behaviour of dynamical systems. When the
equilibrium of a nonlinear system[5] is disturbed, the system reorganizes itself to reach a stable state. In
such cases, the theory of self-organization posits so-called attractors. Dynamical systems display this
kind of temporal evolution by being attracted to certain dynamical configurations – typically a steady
oscillation that either repels or attracts neighbouring states of the system. The equilibrium state towards
which all other states converge is regarded as an attractor: it draws all possible states of the system to
itself and all possible trajectories come together at that particular configuration. Most significantly, the
attractor in this case is not a place or an object but a dynamic shape or temporal pattern of movement,
referred to as a “limit cycle”. Such attractors are at the heart of most periodic processes observed in
nature. Examples of natural systems that display stable and sustained attractive behaviour of this kind
would be the beating of the heart, the neural discharge in the brain, the circadian rhythms of the 24-
hour period in humans and animals, etc. Limit cycles are fundamental to periodicities, and they are
everywhere in nature.
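
The convergence described above can be illustrated numerically. The following is a minimal sketch, assuming a textbook two-dimensional system whose limit cycle is the unit circle; the system and the names `velocity`, `rk4_step` and `final_radius` are chosen here for illustration only and are not taken from Milnor (1985) or from any model of tonality. Trajectories that start inside or outside the circle are both drawn onto the same steady oscillation.

```python
import math


def velocity(x, y):
    """Illustrative planar system (an assumption, not from the article).

    In polar form it reads dr/dt = r * (1 - r**2), dtheta/dt = 1,
    so the unit circle is an attracting limit cycle.
    """
    r2 = x * x + y * y
    return x - y - x * r2, x + y - y * r2


def rk4_step(x, y, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1x, k1y = velocity(x, y)
    k2x, k2y = velocity(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y)
    k3x, k3y = velocity(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y)
    k4x, k4y = velocity(x + dt * k3x, y + dt * k3y)
    x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
    y += dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
    return x, y


def final_radius(x0, y0, t_end=20.0, dt=0.01):
    """Integrate from (x0, y0) and return the distance from the origin at t_end."""
    x, y = x0, y0
    for _ in range(int(round(t_end / dt))):
        x, y = rk4_step(x, y, dt)
    return math.hypot(x, y)


if __name__ == "__main__":
    # One start well inside the cycle, one well outside it.
    for start in [(0.1, 0.0), (2.0, 0.0)]:
        print(start, "->", final_radius(*start))
```

Both starting points end at a radius very close to 1 while the state keeps rotating. In this sketch the attractor is therefore the circular pattern of movement itself rather than a resting point, which is the sense of "limit cycle" used in the text.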
I propose to conceive of the phenomenon of tonality as a dynamical system that displays
attractive behaviour as described above, in particular as a system for cadencing in terms of the
(quasi-periodic) recurrence of certain pitch-based shapes that draw melodic trajectories to themselves (and harmonic
trajectories, in the case of Western tonal music). It should be noted that what matters here is the very
existence of a structural similarity between tonal practices and other kinds of phenomena rather than
any terminological similarity: not all cultures would use the same kind of terminology and discourse to
account for such structural similarities, but what is intriguing is the existence of a system of creative
practice as music that displays a certain structural similarity to certain dynamic, temporal shapes that
we observe in natural phenomena. In this connection, one would first need to understand the basis on
which humans perceive such structural similarities between natural and musical phenomena: is it the
case that our experience and understanding of movement across different domains in terms of stability
and attractions originate in our observation of these attributes in the physical world of objects?
In this connection, I hypothesize that our recognition and identification of certain movement
patterns in the physical world as a process of return to stability (as in limit cycles) is based on our
capacity to generate and experience such patterns subjectively and intersubjectively in an embodied-
affective manner; this capacity also forms the origins of tonality as I argue below. Movement patterns
that periodically return to stability constitute the structure of an affective schema acquired very early in
life.[6] Infants develop richly communicative psychological experiences and expressive behaviour
before they walk, or talk (Bloom, 1993; Stern, 1981, 1985). The earliest schemas humans develop,
which are affective in nature, concern survival-enhancing interactions with parents and caregivers, and
teach us about orientation, temporal progression, cause and effect, force, goal, and agency. The
increasing dependence of human newborns on caregivers for survival during the course of evolution
made these early interactions take on features that ensured sustained positive affect. It is believed that
the sequences of vocal, facial, and kinesic movements that structure interactions between infants and
caregivers played a central role in the affective – as well as cognitive – evolution of our species. Ellen
Dissanayake has argued that such daily multimodal interactions between infants and caregivers become
“ritualized” as they are repeated over and over again (Dissanayake, 2001, p. 389). These repeated
patterns – involving changing intensities, tempos and shapes of multimodal movements accompanied
by positive affective states – form the structure of arguably the earliest affective schema humans
acquire in life, representing an affective process that is employed to make sense of the world at a very
early stage. For the sake of my argument, I shall call this the attraction schema: one can think of such
movement patterns as an attractor, or a psychological limit cycle towards which various affective
exchanges are directed. The purpose of the multimodal movement patterns forming the basis of the
exchanges between the infant and the caregiver is to provide a stable affective referential state, so that
all negative psychological states can be steered back to it by enacting, and re-enacting, the various
vocal, kinesic, haptic and facial components of the schema as and when required. We can speculate that
during the course of evolution, caregivers began to create memorable pitch patterns as part of the vocal
component of the attraction schema, employing them to return to the same pitch, and thereby marking
the beginnings of tonality. It is crucial to note here that the temporal modulation of the kinesic, haptic,
visual, and vocal components of the attraction schema is controlled by the underlying affective
dynamics and its changing shapes: in other words, the different perceptual and expressive modalities
always modulate congruously because they are all supported by the same affective dynamics. The
practical impossibility of modulating vocal expression from sadness to joy while keeping a sad facial
expression shows that the perceptual-expressive modalities are indeed controlled by an amodal
affective system. As far as the origins of tonality are concerned, as the kinesic and visual components
of the attraction schema moved towards an expression and experience of affective stability, the vocal
component of the schema would have had to follow the dynamics of the affective shaping in the same
direction, i.e. towards stability. In order to test this hypothesis, empirical research can address in detail
the question of modulation of affect through the involvement of different modalities. For instance, by
presenting infants with a pattern of tonal movement that modulates incongruously with the visual and
kinesic components of the caregiver’s ongoing affective communication, e.g. by playing a melody that
moves towards instability while the visual and kinesic patterns of the caregiver move towards stability
and vice versa, the causal relationships between these multimodal patterns presented in different
combinations and the modulations that take place in the infant’s affective state can be established.
What is important as far as tonality is concerned is that in its origins it is not the tones that are
attracted to stable pitches, but it is the affective system that is attracted to stable states through all its
different perceptual and expressive modalities. In other words, tonality is an integral component of the
attraction schema acquired early in life, and its origins are to be located not in tones nor in tonal
perception as such but in certain affective states. I would argue that the emergence of tonal encoding of
pitch can be construed as a pre-linguistic stage in the evolution of modern humans, intimately related to
the evolution of our affective capacities.
As the roots of the attraction schema – and thereby of tonality – are to be found in our pre-linguistic
affective experiences and implicit memories of dynamic patterns that emerge as we interact
with members of our species, I hypothesize that our capacity for recognizing movement patterns that
converge towards stable states in diverse phenomena does not originate in our cognitive understanding
of the physical world of objects and events as recent music theory claims. In other words, the source
domain for the attraction schema is not the behaviour of physical objects or even merely our own
physical movements in the physical world. There is evidence that affective understanding is rooted in
embodied first-person feelings, and not in the mere observation of the actions and gestures of other
agents or of the motions of natural phenomena (Damasio, 1999, p. 343). Accordingly, unless humans
can experience affective states subjectively, their ability to recognize them in other agents is impaired.
Patients with such impairment can still describe the movements they observe accurately in terms of
shape, intensity and rhythm, but cannot attribute any affective content to them. In other words, the
movement patterns do not constitute for them an affectively meaningful unit with a sense of purpose.
To interpret the motions and dynamic shapes in the world as affectively meaningful, the first-person
experience of one’s own embodied feelings appears to be essential. We are able to comprehend and
describe the movements of dynamical systems, such as those of the solar or tonal systems, in terms of
affective concepts – such as attraction – precisely because our affective schemas support such
descriptions. If we did not have access to affective schemas, it is not clear in what sense we would
construe tonal – and even planetary – movement in terms of attractions. The dynamic order of the
heavenly bodies and of music appears to our understanding as constituted through attractive forces
generating stable states only because our affective experiences appear to our consciousness as
dynamically regulated and directed towards stable states periodically.
The attraction schema that I put forward as the evolutionary basis of tonality involves several
important features that need to be emphasized. Firstly, it is relational through and through in that the
multimodal temporal shapes constituting the schema reflect non-linguistic, intersubjective exchanges or
turn-taking. In the earliest stages of life, when the schema is first acquired, affect modulation is
controlled heavily by the caregiver: it is believed that the earliest signs of self-regulation of affect
appear around the age of six months, when infants appear to internalize some of the particulars of the
affective schema “practised” by their caregivers (Thompson, 1994). In this connection, empirical
research is needed to reveal how much tonal singing contributes to the emergence of self-regulation of
affect at this early stage. For example, tests could be designed in order to establish the relationship
between the amount of tonal singing a caregiver presents an infant with from birth onwards and the
onset of self-regulation of affect in the infant; and to compare the effects of differently “weighted”
affective schemas practised by caregivers (e.g. those that put more emphasis on visual affective
exchange in comparison to tonal singing and vice versa) on the length of time infants take to start self-
regulation of affect. Furthermore, existing empirical research on lullabies (e.g., Unyk, Trehub, Trainor,
& Schellenberg, 1992; Trehub, Unyk, & Trainor, 1993; Trehub & Trainor, 1998) can be effectively
extended to test the role of tonality on infants’ lullaby preferences. Lullabies exist in every known
culture and are universally employed to calm infants and induce sleep. By presenting infants with re-
composed lullabies that do not return to the same tonality, empirical tests can explore whether infants
prefer the original to the re-composed lullabies. Significantly, the role of the shape of the return to the
original tonality in lullabies can be studied by presenting infants with lullabies that return to the
original tonality abruptly, i.e. without the temporal pattern of cadencing, as well as with those that
employ the process of returning, and observing which alternative they prefer. In addition, the
experiment by Trehub, Unyk, and Trainor (1993), which provided evidence that adult listeners are able
to identify whether the songs from a foreign culture represent lullabies, can be modified to use re-
composed lullabies that do not return to the same tonality in order to further explore the role and
universality of tonality in infant-directed song. The second point about the attraction schema is that it
concerns an affective episode that extends in time; as such it is experienced as having a trajectory and a
shape. The relationship between the perception and making of spatial shapes and the abstract temporal
shapes that lived phenomena (such as narratives, emotions and music) represent is a complex and
intriguing issue: one mechanism that is put forward to explain this relationship is cross-modal mapping
and cross-domain mapping.
There is extensive research indicating that both static and dynamic stimulations in one
modality can influence the perception of information in another modality. Historically, one of the
earliest theories in this connection was proposed by the Austrian philosopher and psychologist
Christian von Ehrenfels, who is best-known today for his article titled “Über Gestaltqualitäten” (“On
Gestalt Qualities”), published in 1890. The central idea of this work, i.e. that our perceptions contain
“form qualities” or Gestalten, which are not contained in isolated sensations, is often quoted. What is
not so well known about Ehrenfels’ famous article is that it also involves the development of an idea
first proposed by the Austrian physicist and philosopher Ernst Mach on the perception of spatial and
temporal shapes (Mach, 1865). Ehrenfels argued that each experience we have of a Gestalt or form in
any sensory modality is cognized as structurally analogous to the experience of a spatial shape. In
other words, spatial Gestalten serve in his view as references for our comprehension of forms or shapes
in other modalities. An immediate implication of this idea is that concepts related to the perception and
experience of spatial shapes can be applied to shapes extended in time. Indeed, the idea that there are
similarities of form between different fields of experience is one of the most important conclusions of
Ehrenfels’ article. During the twentieth century, various authors including Heinz Werner (1948),
Susanne Langer (1942), Lawrence Marks (1978) and Daniel Stern (1985) have argued along similar
lines for the existence in our minds of abstract “amodal” forms that we utilize in making sense of the
world through different modalities of perception. The attraction schema that I propose, which manifests
itself as felt temporal shapes, is thus also experienced and understood in terms of spatial shapes and
trajectories: I would even argue that the representation of the attraction schema as spatial trajectories
provides the very first step towards experiencing space as existing beyond – and more abstractly than –
the locatedness we sense through the immediate kinaesthetic configurations of our bodies. In other
words, it could be that the attraction schema provides the essential basis for developing a concept of
space that goes beyond the immediate percept of space.

SOME IMPLICATIONS FOR RESEARCH

I have argued so far that the originary function of tonality is to structure the vocal component of the
attraction schema by creating pitch processes experienced as movement towards stability. The capacity
to structure pitch materials in this way has far-reaching consequences as far as the development of
other mental capacities is concerned. These consequences implicate two inter-related areas, both of
which can be regarded as having played significant roles during the evolution of humankind:

1) Emergence of the capacity to experience and structure psychological spaces;


2) Emergence of the capacity to create narrative structures.

Little work has been done on how psychological spaces arise and what their nature is. One of the
first to theorize about these kinds of spaces was the Gestalt psychologist Kurt Lewin, who attempted to
explain human behaviour by reference to psychological spaces governed by driving and hindering
forces (Lewin, 1935, 1936). Such psychological spaces would be organised in terms of trajectories and
dynamic shapes. The human ability to move in psychological spaces, to project oneself into different
hypothetical situations and hypothetical times, and the cognitive flexibility this brings is unmatched in
the animal world, and must have had crucial evolutionary significance. It is possible that tonal
movement structured around a functional pitch hierarchy brought with it the earliest kind of
psychological space with a clear orientation, indicating unambiguously the place of rest and stability.
The secure knowledge that there is a fixed place of stability in this space may have provided humans
with the capacity to imagine mechanisms for steering affect back to it from many diverging states,
using many different kinds of routes. In terms of evolution, the psychological space established by
tonality may have played a crucial role in affective development, by making it possible to regulate
affective states and to steer affect towards stability. Originating in the dyadic relationship of a newborn,
the psychological spaces humans move in become more and more differentiated and complex in adult life.
Consistent and stable referential states are crucial in the development of psychological spaces and the
cadencing that tonality enables functions as a structuring principle in this regard: cadences provide
“islands of consistency” (Stern, 1985, p. 45) around which a psychological landscape can be structured.
The emergence and experience of psychological spaces is closely related to one of the most
significant features of tonality and of the attraction schema, namely its capacity to structure processes
of return and arrival and thereby end-states. There is evidence in research that the representation of
end-states as stable states plays a crucial role in various mental and bodily functions (Aarts & Elliot,
2012; Schmidt & Lee, 2005). Trajectories and dynamic shapes that lead to end-states appear to have
motor, cognitive and affective significance. In daily locomotion, for example, the majority of our
physical interactions with the world are organised as motor processes oriented towards physical targets,
i.e. they are goal or end-state oriented. Such locomotion involves sophisticated abilities such as object
representation, trajectory prediction, etc., and humans constantly negotiate trajectories or temporal
shapes when they move in physical space. There is evidence indicating that the mental representation
of the goal-state drives the motor trajectory, and that the trajectory in turn determines the movement
dynamics; furthermore, there is evidence that goal-oriented locomotion is not planned as a succession
of steps, but as a complete locomotor spatial trajectory driven by the representation of the end-state
(Hicheur, Pham, Arechavaleta, Laumond, & Berthoz, 2007). Cognitive-affective daily tasks also appear
to rely on the representation of end-states. For example, when people make intensity judgements of
their affective experiences, end-intensities (in addition to peak intensities) have the strongest effect on
overall intensity judgment (Kahneman, 1999). There appears to be something fundamental about the
representation of end-states in our physical interaction with the world, as well as in the structuring of
our affective experiences. Perhaps there is, after all, a neurological basis to Shakespeare’s “All’s well
that ends well” motto.
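The weight of end-states in retrospective judgement can be illustrated with a minimal sketch. It assumes a common simplification of the finding cited above, in which the overall judgement is approximated by the mean of the peak and the final intensity; the function names and the sample intensity values are hypothetical and are not taken from Kahneman (1999).

```python
def peak_end_estimate(intensities):
    """Approximate a retrospective judgement as the mean of peak and final intensity.

    This is an illustrative simplification, not a fitted psychological model.
    """
    if not intensities:
        raise ValueError("need at least one intensity sample")
    peak = max(intensities)
    end = intensities[-1]
    return (peak + end) / 2


def mean_intensity(intensities):
    """Plain average over the whole episode, for comparison."""
    return sum(intensities) / len(intensities)


# Two hypothetical episodes that differ only in how they end.
ends_high = [2, 8, 5, 7]
ends_low = [2, 8, 5, 1]

for episode in (ends_high, ends_low):
    print(peak_end_estimate(episode), mean_intensity(episode))
```

In this toy comparison the two episodes share the same peak, so the difference between their estimates comes entirely from the final sample, which is the sense in which the end-state dominates the overall judgement.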
I hypothesize that the representation of a structured return to certain end-states as stable states
is also essential for making sense of temporal experiences during the pre-linguistic stage of
development, and must have been significant also during human evolution. Humans can connect
together and give meaning to an otherwise disconnected set of events only by surveying the past from a
psychologically stable point. Imagine a world where we never experience arrival and return to
psychological stability, where events follow one another in a chaotic fashion; it then becomes difficult,
if not impossible, to conceive how we would assign relational significance and meaning to events.
Without stable points from which to look back on what happened, human memory would not be what it
is. It is only because we can periodically “return” to stable psychological states that we can interpret
past events in the manner of a narrative, with more or less clearly marked departures and arrivals. Time
becomes human time when we can assign narratives to our temporal experiences. The beginning of all
narrative structures can be found in cadencing, i.e. the essence of tonality. In evolutionary terms, the
ability to construct linguistic narratives is built upon this pre-linguistic affective-tonal capacity, which
made it possible for humans to unify the events within a temporal span through the structuring power
of the stabilizing cadence. The evolution of the tonal encoding of pitch, therefore, must have had an
adaptive role in that it created the possibility of giving meaning to temporal experiences of longer and
longer stretches, which underlies the capacity of modern humans to form autobiographical memories
and narratives.
In conclusion, I hope in this article not only to have opened up debate about the possible
biological bases of the “resilience of tonality” (to borrow a term from Adorno), but also to have drawn
attention to the continuity between diverse natural phenomena and our motor-cognitive-affective
experiences, all displaying intriguingly similar temporal, dynamic shapes. Much work, of course,
remains to be done to better understand the correspondences between spatial shapes and the temporal
dynamic course of long-range, lived phenomena such as narratives, emotions and music.

NOTES

[1] See Chapters 16, 17, 18, and 22 in The Origins of Music, N. L. Wallin, B. Merker, & S. Brown
(Eds.), 2001, Cambridge, MA: MIT Press.

[2] Although metric time-keeping is widely considered to have evolved in the context of musicking and
dancing in groups (see, for example, “An Introduction to Evolutionary Musicology” by Brown, Merker
and Wallin in The Origins of Music: 12), the kind of sophisticated metric hierarchy that is observed in
musics all around the world is not a feature of either language or dance. Lerdahl and Jackendoff (2006)
have noted that metrical structure is not widely shared by other cognitive systems, and in that sense
represents a striking contrast with grouping structures, which are observed very commonly in many
different domains.

[3] See for example Chapters 1, 2, 3, 5, 9 and 10 in Analytical Studies in World Music edited by
Michael Tenzer (New York: Oxford University Press, 2006), which includes discussions about tonality
in this broad sense, specifically in Xorasani maqam from Iran, Bulgarian Horo, Flamenco from
Andalusia, music of the Aka from Central Africa, South Indian ragas, and Western classical music
respectively. See also Blacking (1970).

[4] Translation from the French original by the author.

[5] A system whose output is not directly proportional to its input; to put it simply, a nonlinear system
is one whose behavior is not the sum of its parts or their multiple; e.g. fluid flow.

[6] In contemporary psychology, the term “schema” is used to refer to structured information about the
similarities between different experiences. Schemas are “organized sets of memories about sequences
of events or physical scenes and their temporal and spatial characteristics, which are built up as we
notice regularities in the environment” (Snyder, 2000, p. 95). The commonalities shared by different
situations occurring at different times are abstracted so as to become a “memory framework” (Snyder,
2000, p. 95), and to function as a schema. Schemas, which operate unconsciously, shape our
expectations about various phenomena, and guide the processing of new information as well as the
retrieval of information stored in memory. Affective schemas carry pre-conceptual information about
our responses to the environment and to other people.

REFERENCES

Aarts, H., & Elliot, A.J. (2012). Goal-Directed Behavior. New York: Psychology Press.

Ayotte, J., Peretz, I., & Hyde, K. (2002). Congenital amusia: A group study of adults afflicted with a
music-specific disorder. Brain, Vol. 125, No. 2, pp. 238-251.

Blacking, J. (1970). Tonal Organization in the Music of Two Venda Initiation Schools.
Ethnomusicology, Vol. 14, No. 1, pp. 1-54.

Bloom, L. (1993). The transition from infancy to language: Acquiring the power of expression. New
York: Cambridge University Press.

Brown, S. (2001). The ‘musilanguage’ model of music evolution. In: N.L. Wallin, B. Merker, & S.
Brown (Eds.), The Origins of Music. Cambridge, MA: MIT Press, pp. 271-300.

Carterette, E.C., & Kendall, R.A. (1999). Comparative music perception and cognition. In: D. Deutsch
(Ed.), The Psychology of Music. London: Academic Press, pp. 725-791.

Cohen, D.E. (2001). The imperfect seeks its perfection: Harmonic progression, directed motion, and
Aristotelian physics. Music Theory Spectrum, Vol. 23, No. 2, pp. 139-169.

Cross, I. (1999). Is music the most important thing we ever did? Music, development and evolution. In:
S. Won Yi (Ed.), Music, Mind and Science. Seoul National University Press, pp. 1-39.

Damasio, A. (1999). The feeling of what happens: Body and emotion in the making of consciousness.
New York: Harcourt Brace.

Dissanayake, E. (2001). Antecedents of the temporal arts in early mother-infant interaction. In: N.L.
Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music. Cambridge, MA: MIT Press, pp.
389-410.

Gauldin, R. (2006). Harmonic Practice in Tonal Music. New York: W.W. Norton.

Hicheur, H., Pham, Q.-C., Arechavaleta, G., Laumond, J.-P., & Berthoz, A. (2007). The formation of
trajectories during goal-oriented locomotion in humans. The European Journal of Neuroscience, Vol.
26, No. 8, pp. 2376-2390.

Lerdahl, F., & Jackendoff, R. (1983). A Generative Theory of Tonal Music. Cambridge, MA: MIT
Press.

Lerdahl, F., & Jackendoff, R. (2006). The capacity for music: What’s special about it? Cognition, Vol.
100, No. 1, pp. 33-72.

Kahneman, D. (1999). Objective happiness. In: D. Kahneman, E. Diener, & N. Schwarz (Eds.), Well-
being: The Foundations of Hedonic Psychology. New York: Russell Sage Foundation, pp. 4-25.

Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh: The embodied mind and its challenge to
western thought. New York: Basic Books.

Langer, S. (1942). Philosophy in a New Key. Cambridge, MA: Harvard University Press.

Lewin, K. (1935). A Dynamic Theory of Personality. New York: McGraw-Hill.

Lewin, K. (1936). Principles of Topological Psychology. New York: McGraw-Hill.

Lippman, E. (1992). A History of Musical Aesthetics. Lincoln: University of Nebraska Press.

Mach, E. (1865). Bemerkungen zur Lehre vom räumlichen Sehen. Zeitschrift für Philosophie und
philosophische Kritik, Vol. 46, pp. 1-5.

Marks, L.E. (1978). The Unity of the Senses: Interrelations Among the Modalities. New York:
Academic Press.

Merker, B. (2006). Layered constraints on the multiple creativities of music. In: I. Deliège & G.A.
Wiggins (Eds.), Musical Creativity: Multidisciplinary Research in Theory and Practice. Hove:
Psychology Press, pp. 25-41.

Milnor, J. (1985). On the concept of attractor. Communications in Mathematical Physics, Vol. 99, No.
2, pp. 177-195.

Morgan, R.P. (1998). Symmetrical form and common-practice tonality. Music Theory Spectrum, Vol.
20, No. 1, pp. 1-47.

Mithen, S. (2005). The Singing Neanderthals: The Origins of Music, Language, Mind and Body.
London: Weidenfeld & Nicolson.

Molino, J. (2001). Toward an evolutionary theory of music and language. In: N.L. Wallin, B. Merker, &
S. Brown (Eds.), The Origins of Music. Cambridge, MA: MIT Press, pp. 165-176.

Momigny, J.-J. de. (1803-1806). Cours complet d’harmonie et de composition. Paris: chez l’auteur.

Nettl, B. (2005). The study of ethnomusicology: Thirty-one issues and concepts. Urbana: University of
Illinois Press.

Peretz, I. (2003). Brain specialization for music: New evidence from congenital amusia. In: I. Peretz &
R. Zatorre (Eds.), The Cognitive Neuroscience of Music. New York: Oxford University Press, pp.
192-203.

Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience, Vol. 6, No. 7,
pp. 688-691.

Peretz, I., & Morais, J. (1989). Music and modularity. Contemporary Music Review, Vol. 4, pp.
277-291.

Pinker, S. (1997). How the Mind Works. London: Allen Lane.

Schmidt, R., & Lee, T. (2005). Motor Control and Learning: A Behavioral Emphasis. Champaign:
Human Kinetics Publishers.

Snyder, B. (2000). Music and Memory: An Introduction. Cambridge, MA: MIT Press.

Sloboda, J. (2005). Exploring the Musical Mind: Cognition, Emotion, Ability, Function. New York:
Oxford University Press.

Stern, D. (1981). The development of biologically determined signals of readiness to communicate,
which are language ‘resistant’. In: R. E. Stark (Ed.), Language Behaviour in Infancy and Early
Childhood. New York: Elsevier/North Holland, pp. 45-62.

Stern, D. (1985). The Interpersonal World of the Infant. London: Academic Press.

Tenzer, M. (2006). Analytical Studies in World Music. New York: Oxford University Press.

Thompson, R.A. (1994). Emotion regulation: a theme in search of definition. Monographs of the
Society for Research in Child Development, Vol. 59, pp. 25-52.

Trehub, S.E. (2003). Musical predispositions in infancy: An update. In: I. Peretz & R. Zatorre (Eds.),
The Cognitive Neuroscience of Music. New York: Oxford University Press, pp. 3-20.

Trehub, S.E. & Trainor, L.J. (1998). Singing to infants: Lullabies and play songs. Advances in Infancy
Research, Vol. 12, pp. 43-77.

Trehub, S.E., Unyk, A.M., & Trainor, L.J. (1993). Maternal singing in cross-cultural perspective. Infant
Behavior and Development, Vol. 16, No. 3, pp. 285-295.

Unyk, A.M., Trehub, S.E., Trainor, L.J., & Schellenberg, E.G. (1992). Lullabies and simplicity: A
cross-cultural perspective. Psychology of Music, Vol. 20, No. 1, pp. 15-28.

von Ehrenfels, C. (1890). Über Gestaltqualitäten. Vierteljahrsschrift für wissenschaftliche Philosophie,
Vol. 14, pp. 242-292.

Wallin, N.L., Merker, B., & Brown, S. (2001). The Origins of Music. Cambridge, MA: MIT Press.

Werner, H. (1948). Comparative Psychology of Mental Development. Chicago: Follett Publishing.

Tonality and the Cultural


DANIEL LEECH-WILKINSON
King’s College London

ABSTRACT: The emphasis in this response to Mine Doğantan-Dack’s “Tonality: The
Shape of Affect” is on the cultural construction of tonality and the ways in which that
may encourage Western musicians to understand tonality as a fundamental
psychological process. An alternative hypothesis is proposed in which tonality assists
in the effective musical modelling of hunting, offering a means to relive the experience
in social groups.

Submitted 2013 May 15; accepted 2013 June 10.

KEYWORDS: tonality, acculturation, listening, cadence, hunting

TONALITY—in the broad sense of a use of pitch material that tends to favor one pitch-class over the rest,
treating it as a recurring focus—has figured surprisingly little in the growing literature on musical
universals.[1] While usually listed among these, there seems to have been little detailed enquiry into the
features shared among tonal systems; and so the extent to which they are really fundamental to music
perception remains to be reliably assessed. That allows hypotheses as to the evolutionary age of tonality to
settle at many different depths according to the author’s view of what tonality does in our experience of
music and of life. Mine Doğantan-Dack’s highly interesting proposal ranges widely through recent work on
music’s possible role in natural selection and in the modern brain, and does so consistently with other
suggestions relating music to fundamental human traits (Mithen, 2005; Cross, 1999). In that sense it
belongs within the continuing reaction against Pinker’s (1997) more utilitarian view of music as nice but
only superficially useful. For researchers who sense that music goes deeper within them than almost
anything else, this new proposal—that music helps us to learn how to control ourselves (to put it crudely)—
is strikingly attractive. Especially interesting is the emphasis on cadencing and its organization of return but
not departure. Any model for social behavior that tends to lead it towards contentment and resolution is in a
good position to claim to offer selective advantage, especially once conflict becomes weaponized.
Tonality is a complex phenomenon, not just in its handling of pitches within musical practices but
also in its engagement with musical cultures, particularly with sets of beliefs about its identity and role.
Because tonality is so important in Western music theory there may have been a tendency to overrate its
importance in all traditions; and a danger, too—precisely because it has been so extensively theorized—of
missing the huge importance of the cultural in our ideas about tonality. We have come to treat it as an
objective mechanism rather than as one among many possible ways of listening. We have to be constantly
on our guard in this respect, especially in music science because of the strong tendency of empirical work
to assume that it speaks for all.
Much of this Western bias towards tonality as a cognitively dominant principle probably depends
on music notation. Seeing notes as objects rather than experiences encourages the identification of
systematic processes in their patterning, and also the belief that what is in the score wholly specifies
music’s identity and content. And while that belief has gone out of academic fashion in recent decades, a
notion of “the music itself”, contained in the notes, continues to appear in writings and discussions of all
sorts as if it were an obvious given. Music’s uses and associations are, by comparison, trivial issues: for the
Western theorist, music takes little worthwhile meaning other than from the functional relations between its
notes. Beneath a belief in the central role of tonality lies a further notion that tonality itself functions
according to rules derived from nature through acoustics, so that simply sounding a tonal chord, never mind
a conventional chordal progression, satisfies a listener through its natural properties. Tonality quickly
becomes an almost mystical notion, deeply bound up with our biological situation.
This is not to suggest that Doğantan-Dack requires these beliefs as a basis for her theory—far from
it—but only that it is very easy for those of us educated in this tradition, and practising it over a lifetime of
playing and listening, to experience tonality as powerful and even fundamental. And that in turn makes it
easier to see its basic properties as deep-seated. But how much of that is “natural” and how much cultural?
Wong, Roy, and Margulis (2009) found that participants judged music of another culture more tense than
that of their own, and the fact that music can be used so effectively to maintain or even enhance divisions
between groups (Bakagiannis & Tarrant, 2006) reminds us that music’s ability to regulate behavior depends
to a substantial extent on familiarity.
Lullabies are especially interesting in this context, for the features identified by researchers as
universals touch only lightly on tonality through a tendency for falling intervals to predominate (related to
the descending curve of carer-infant vocalizations).[2] On the other hand, lullabies’ universals include quite
specific features of performance practice. From the perspective of Western music it is surprising, to say the
least, that a performance practice should be as hard-wired as a set of relationships between notes. In other
words, what is and is not fundamental about music may be far more varied and surprising than we imagine.
It may be worth mentioning, also, the increasing amount of empirical work on listeners’ ability to
follow tonal and formal processes in Western music (discussed in Leech-Wilkinson, 2012). Findings are
tending to show that listeners do not know or care if a piece ends away from the tonic, so at best only very
local tonal processes are noticed. In that light, it seems uncertain whether the experience of tonality could
have helped in organizing longer and longer temporal experiences. Atonal scores also make emotional
trajectories and journeys through intensification and relaxation, showing (at any rate for this listener) how
powerfully that can be achieved without tonality. Changing textures, rhythms, densities, registers, timbres,
and contours may well be able to do all that is required of a model of affect regulation.
Can we be confident, therefore, that tonality, even in the broadest sense of the term, is that
important an ingredient in musical experience in most cultures? It figures importantly in a number of
musical theories, certainly, but that does not guarantee that it is important in listening (save possibly for
theorists). We would need to see research into that question, involving members of many and contrasting
non-Western cultures, before we could tell whether the perceptual importance attributed to tonality is at all
widespread. It may very well be, but we cannot be confident as yet.
Another important ingredient in Doğantan-Dack’s argument is the experience of music as being
like other kinds of experience, in other words the way in which music so readily seems like styles of
movement or states of mind. This characteristic of music makes it seem deep-rooted because it is easily
mappable. The more varied the things and experiences that music seems to be like, the deeper we imagine it
must go within our brains. There is, of course, a wealth of research showing these kinds of mappings, and
Cross (1999) has argued persuasively for the selective advantages this could have brought. The
phenomenon of large numbers of people seeming to share deep experiences of music while disagreeing
about it in detail is so everyday that it seems hardly to need testing. But how can we feel confident that
music is promiscuously mappable other than through cultural exposure? Fritz et al. (2009) have recently
found Western musical emotions somewhat recognizable by members of a very different culture, but only
somewhat and only for the most basic emotions. Is that enough?
A particularly intriguing idea in Doğantan-Dack’s article is that feeling responses to the dynamic
shapes inherent in cadencing, rather than the physical properties of the world around us (experience of
gravity, and so on), might lie beneath our recognition of attractors including points of tonal arrival. Given
the way we use the word “attract”—which is surely ultimately a word describing human relationships—to
describe effects of gravity, magnetism, electrical charge, and so on, I find this very appealing. But in music
feeling and motion become linked in our bodies through practice: the link is not there from the very
beginning. Beginners’ playing is not emotional or affective: it is just noisy and repetitive. Musical
performance—and is this true also of musical cognition?—becomes affecting and affective later on, after
years of experience. It would be valuable to find out whether tonal closure has such emotional association
for young people or whether it is something that grows over years. It seems quite possible that it grows
gradually, and that as it does so it comes to feel natural (leading, incidentally, to a propensity to accept
hypotheses like this one).
With these caveats in mind, let us propose an alternative hypothesis. Let us suppose, for the sake
of argument, that our response to music that delineates an emotional trajectory (and by no means all world
musics do this) depends much less on tonality than on music’s similarity to the experience of that key
survival skill for our human and long pre-humanoid ancestors, hunting. Hunting, like music, involves a
physical-emotional trajectory of increasing activity, crisis, and relaxation, experienced through increasing
heartbeat, intense thrill, decreasing heartbeat and (when successful) fulfillment; in emotional terms,
anticipation, achievement, satisfaction. Cadencing then models homecoming (“the home key”, we still say,
a place of safety and rest). From a male perspective, one might add, music’s power might seem more likely
to arise from this modeling than from any other, which only goes to emphasize the huge extent to which
analogies like this are gendered and cultured. Clearly music did not enable hunting; rather it could have
enabled the reliving, and in a sense (and perhaps before semantic language) the retelling, in a safe and
social environment, with benefits for sociability and group cohesion of the sort proposed by Cross, of the
most dangerously exciting experience that life had to offer.
I do not offer this as a serious hypothesis, although perhaps it could be made into one. Rather, my
point is that because music maps so promiscuously, a great variety of hypotheses as to music’s evolutionary
advantages can be proposed. With as wide a range of references as we see here, ranging across recent work
in many of the relevant disciplines and drawn together into an elegant argument, a highly attractive case
can be (indeed, has been) made. Since there is no possibility of definitive evidence in such studies, there
may be a case for developing as many competing hypotheses as possible as, in effect, the only route (since
refutation by testing is rarely possible) to find out which survives as the fittest.

NOTES

[1] On musical universals, useful recent studies include Higgins (2006), Morrison and Demorest (2009), and
Laukka, Eerola, Thingujam, Yamasaki, and Beller (2013).

[2] There is further discussion of this in Leech-Wilkinson (2006). Lullaby research is well-summarized and
discussed by McDermott and Hauser (2005).

REFERENCES

Bakagiannis, S., & Tarrant, M. (2006). Can music bring people together? Effects of shared musical
preference on intergroup bias in adolescence. Scandinavian Journal of Psychology, Vol. 47, No. 2, pp.
129-136.

Cross, I. (1999). Is music the most important thing we ever did? Music, development and evolution. In: S.
Won Yi (Ed.), Music, Mind and Science. Seoul National University Press, pp. 1-39.

Fritz, T., Jentschke, S., Gosselin, N., Sammler, D., Peretz, I., Turner, R., Friederici, A. D., & Koelsch, S.
(2009). Universal recognition of three basic emotions in music. Current Biology, Vol. 19, No. 7, pp.
573-576.

Higgins, K.M. (2006). The cognitive and appreciative import of musical universals. Revue internationale
de philosophie, Vol. 4, No. 238, pp. 487-503.
www.cairn.info/revue-internationale-de-philosophie-2006-4-page-487.htm.

Laukka, P., Eerola, T., Thingujam, N. S., Yamasaki, T., & Beller, G. (2013). Universal and culture-specific
factors in the recognition and performance of musical affect expressions. Emotion. Advance online
publication. doi: 10.1037/a0031388

Leech-Wilkinson, D. (2006). Portamento and musical meaning. Journal of Musicological Research, Vol.
25, Nos. 3-4, pp. 233-261.

Leech-Wilkinson, D. (2012). Compositions, scores, performances, meanings. Music Theory Online, Vol. 18,
No. 1. http://www.mtosmt.org/issues/mto.12.18.1/mto.12.18.1.leech-wilkinson.php

McDermott, J., & Hauser, M. (2005). The origins of music: innateness, uniqueness, and evolution. Music
Perception, Vol. 23, No. 1, pp. 29-59.

Mithen, S. (2005). The Singing Neanderthals: The Origins of Music, Language, Mind and Body. London:
Weidenfeld & Nicolson.

Morrison, S.J., & Demorest, S.M. (2009). Cultural constraints on music perception and cognition. Progress
in Brain Research, Vol. 178, pp. 67-77.

Pinker, S. (1997). How the Mind Works. London: Allen Lane.

Wong, P.C.M., Roy, A.K., & Margulis, E.H. (2009). Bimusicalism: the implicit dual enculturation of
cognitive and affective systems. Music Perception, Vol. 27, No. 2, pp. 81-88.

Shape Cognition and Temporal, Instrumental and Cognitive
Constraints on Tonality. Public Peer Review of “Tonality:
The Shape of Affect” by Mine Doğantan-Dack.
ROLF INGE GODØY
University of Oslo, Department of Musicology

ABSTRACT: One of the main aims of Mine Doğantan-Dack’s article is to present a
theory of the evolutionary emergence of tonality in music and its connections with
affective states. In the course of her article, we are presented with several
fundamental issues of music perception, music and evolution, dynamical theory, as
well as music and shape. In this public peer review, I comment on some of these
issues, starting with some thoughts on the relationships between musical features,
shapes and timescales, then continue with some thoughts on tonality, and finally
present some remarks on tonality in relation to instrumental and human constraints.

Submitted: 2013 June 14; accepted 2013 June 17.

KEYWORDS: Shape, tonality, timescales, instrumental and cognitive constraints

MUSICAL FEATURE SHAPES

THERE can be no doubt that concepts of shape are ubiquitous in musical discourse and music
cognition: we use innumerable shape-related metaphors for most (if not all) features of music such as
dynamics, timbre, harmony, pitch, contour, rhythm, texture, tempo, timing, expressivity and affective
qualities. Also, we encounter shapes in various music-related images such as in graphical scores,
composers’ sketches, music analysis illustrations, as well as in more directly signal-based shape images
such as waveforms and spectrograms, and last but not least, as shape images of music-related body
motion. We could thus speak of a widespread and deep-rooted shape cognition in music, as well as in
human reasoning in general, as suggested by some directions in the cognitive sciences, especially by
so-called morphodynamical theory and cognitive linguistics (Godøy, 1997; Petitot, 1985; Thom, 1983).
As shape cognition seems to apply to most musical features as well as to other domains of the
cognitive sciences, it may be tempting to regard shape cognition as amodal, as having a high level of
generality or even abstraction, making possible applications of similar shape categories to qualitatively
rather different domains. We could for instance use the shape expression “flat” to characterize a melody
(i.e. having a small ambit, circling around just a few tones), dynamics (having no increase or decrease
in intensity), timbre (being stationary for the duration of the sound), spectrum (meaning all partials are
of equal amplitude), a musical performance (that it came across as rather bland), or even the overall
emotive effect of an entire musical work (it did not move us at all). The advantage of such shape
cognition applied to various music-related features is obvious: shape cognition is inherently holistic
quite simply because a shape is perceived and conceived “all-at-once” as a geometric unit, and not
piecemeal as a series of individual points or numerical values.
The holistic nature inherent in shape cognition also fits quite well with music-related body
motion in that most human motion is continuous and clearly exhibits shapes, both of motion
trajectories and of quasi-stationary postures. The main idea with what I have previously called a motor-
mimetic approach to music perception and cognition (Godøy, 2003), which in turn is based on so-called
motor theory (Galantucci, Fowler, & Turvey, 2006), is that perception and cognition are active
processes where we trace the shapes of whatever it is that we are perceiving and thinking.

TIMESCALES OF MUSICAL FEATURES

However, this link between shape cognition and sensations of motion also raises questions about
timescales. By engaging in so-called musical imagery (Godøy & Jørgensen, 2001), I can “replay” a
tune in my mind at the same tempo as when I heard it performed “for real”, or I can quickly scan
through the entire melody in a couple of seconds, or I can even imagine the entire melody “in an
instant”, as a shape. This means that we have timescales in music cognition ranging from the very long
to the very short, so we need to consider the different timescales at work here if we wish to relate shape
cognition to perceived, non-abstract features in music.
The timescales of music actually extend from the very short durations of audible vibrations
(from the upper threshold of hearing at approximately 20000 vibration shapes per second) to several minutes
and hours. We may view timescales as continuous between these extremes, but there are significant
qualitative differences in human perception and action at various ranges: what is within the audio range
of 20 to 20000 Hz is neurophysiologically very different from that which is below this range. And that
which can fit within the range of what is commonly accepted as short-term memory of (very
approximately) 0.5 to 5 seconds, is qualitatively quite different from that which is recalled from long-
term memory, such as memories of more extended sections, tunes or whole works. This means that in
musical imagery it is possible to scan through any excerpt or whole work of music at any speed, but in
actual unfolding or “real time” perception we have qualitatively distinct features at different timescales.
This is also the case when it comes to the perception of tonality at different timescales.
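
The qualitative ranges named above can be sketched as a simple classification of durations. This is only an illustrative sketch in Python, assuming the approximate boundaries given in the text (an audio range of 20 to 20000 Hz, and a short-term memory range of very roughly 0.5 to 5 seconds). The function name and the category labels are hypothetical, and the region between the audio range and short-term memory is not characterized in the text, so it is labelled as such rather than filled in.

```python
def timescale_category(duration_s: float) -> str:
    """Map a duration in seconds to one of the rough timescale ranges
    mentioned in the text. Boundaries are approximate assumptions."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    if duration_s < 1 / 20000:
        # Shorter than one period of a 20000 Hz vibration.
        return "beyond upper threshold of hearing"
    if duration_s <= 1 / 20:
        # Periods of vibrations between 20 Hz and 20000 Hz.
        return "audio range"
    if duration_s < 0.5:
        # The text does not characterize this region.
        return "unclassified in the text"
    if duration_s <= 5.0:
        return "short-term memory range"
    return "long-term memory range"


# Example: one period of a 440 Hz tone, a two-second motif, a ten-minute movement.
for d in (1 / 440, 2.0, 600.0):
    print(d, "->", timescale_category(d))
```

The hard boundaries are a simplification: the text treats timescales as continuous, with qualitative differences emerging at various ranges rather than at exact thresholds.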
Western music theory has often had a tendency to sidestep these issues, at least in part because
of notation. A prime example is the idea of reducing large-scale works to more compressed and skeletal
overview images as in Schenkerian analysis, reducing, say, an extended symphony movement to an
Urlinie of a few tones. With such a notation-based paradigm for musical analysis, critical perceptual
distinctions between the local and the global may be lost. As far as I can see, it remains to be
convincingly demonstrated that large-scale formal, including tonal, relationships are perceptually as
decisive as some music theory would like us to believe (see Tillmann and Bigand [2004] for some very
interesting critical remarks on this). The few experimental studies we have of the efficacy of large-scale
forms (e.g., Eitan & Granot, 2008) tell us we should be suspicious of such claims until further notice.
To my knowledge, most of the recent perception-based studies of tonality focus on short timescales,
typically on local cadential contexts, rather than on large-scale (e.g., sonata form) tonal relations.
Additionally, from my knowledge of various ethnomusicological studies, the concept of tonality is
largely determined by instrumental constraints, effectively making local pitch patterns repeated at short
timescales the basis for sensations of tonality. We should thus always consider timescales when we
think about tonality.

NOTIONS OF TONALITY

Considering the vast literature directly or indirectly concerned with notions of tonality in Western
musical thought, it is difficult to extract any single, clear definition of what tonality in music actually
is. For one thing, there are the terminological issues of what is meant by terms such as “tonal”,
“modal”, “free tonal”, “atonal”, “serial”, etc. It seems to me that in the Anglo-Saxon world the term
“tonal” is often seen as synonymous with what may be called “functional harmonic music”, amenable
to the Hugo Riemann type of tonality definitions by chord functions. But in other contexts “tonal”
could be taken to include modal and/or “free tonal” music, as in various strands of neoclassical and
other Western 20th-century music, for example that of Olivier Messiaen or even John Coltrane.
It could be interesting here to remember the more “extended” view of tonality that was
advocated by Paul Hindemith, partly based on acoustic and perceptual principles (Hindemith, 1941).
Although this work is often considered speculative and unsystematic, one interesting idea of
Hindemith’s is that any single interval or constellation of intervals (a chord) will have a more or less
salient root tone, and that any progression of intervals or chords would thus result in some tonal
sensation. Hindemith’s contention was that music may be anti-tonal yet still locally and in passing have
some (weaker or stronger) tonal centre, as he tried to demonstrate with his analysis of an excerpt from
Schönberg’s Piano Piece op. 33a. Hindemith was modern in his view of tonality not just in dissociating
it from past European major-minor-chromatic ways, but also in regarding tonality as a graded, “more or
less” and contextually emergent phenomenon. Interestingly, we have in recent decades seen more
systematic bottom-up, signal-based approaches to tonality as an emergent phenomenon in listening, as
in Leman (1995). Furthermore, such bottom-up approaches suggest that the perception of pitch and
intervals is also dependent on timbral features (Sethares, 2005).

Attempting to take a more universal or “world music” view of tonality, it would make sense to
adopt a combined acoustical-perceptual approach that includes what I call instrumental constraints, i.e.
the production of the sound, the sound’s perceived timbral features, and various practices such as the
tunings and interval sizes used, and also the overall statistical distribution of the various pitches during
the unfolding of the music. The last point is essential as it might capture the phenomenon of tonality-
sensations as a result of the sheer recurrence of certain pitches, independent of tunings, interval sizes,
or modalities (scales), showing that, for example, Messiaen-style recurring central tones create tonal
sensations even in passages that are modally quite diverse. Similarly, the use of drones in instrumental
music, as in Norwegian Hardanger fiddling and several other world musics, may be seen as a prominent
case of tonality, what I would call a landmark-type tonality based on the constraint that the strings (or
pipes, tubes, bells, etc.) stay more or less tuned to the same pitch throughout the performance.
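
The idea that sheer recurrence of certain pitches could yield a graded tonal sensation can be illustrated with a simple count of how often each pitch occurs. The following is a minimal sketch in Python, not a model proposed in this commentary: the function names are hypothetical, pitches are treated as arbitrary labels (so the count is independent of tuning, interval size, or scale), and the relative frequency of the most recurrent pitch is used as a crude "more or less" indicator. Durations and timescales are ignored here, which is itself a strong simplifying assumption.

```python
from collections import Counter


def pitch_recurrence(pitches):
    """Return the relative frequency of each pitch label in a sequence."""
    if not pitches:
        raise ValueError("need at least one pitch")
    counts = Counter(pitches)
    total = sum(counts.values())
    return {pitch: count / total for pitch, count in counts.items()}


def most_recurrent(pitches):
    """Return the most frequent pitch and its relative frequency,
    read here as a rough, graded indicator of a possible tonal centre."""
    distribution = pitch_recurrence(pitches)
    pitch = max(distribution, key=distribution.get)
    return pitch, distribution[pitch]


# Hypothetical excerpt in which one tone keeps returning.
melody = ["D", "A", "D", "E", "D", "F#", "D", "A"]
print(most_recurrent(melody))  # ('D', 0.5)
```

A drone could be represented in the same sketch by adding its pitch label at every time step, which would raise its relative frequency and so mimic its landmark function.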
To summarize, there are some constraints at work here that I think we ought to keep in mind
when considering tonality in its various guises, namely that tonality should

• be seen in relation to timescales
• be seen in relation to instrumental features, in particular the use of drones as landmarks for
tonality
• be seen in relation to timbre, tuning, and interval size
• allow for innumerable interval constellations (all kinds of modal scales)
• be seen as a statistical and contextually emergent “more-or-less” phenomenon.

TONALITY BY LANDMARKS AND EFFECTOR POSITIONS

In summary, and as an overall comment on Mine Doğantan-Dack’s article, it seems to me that known
musical practices suggest that tonality emerges on the basis of constraints: basic physical constraints of
instruments such as the use of drones, as well as of timbre and tunings, and various neurocognitive
constraints at different timescales. In a world music perspective, the use of drones is significant,
effectively becoming landmarks in pitch space to which a great variety of pitches may be related. Also
the frequency of occurrence of a given pitch could be seen as a basis for tonality, meaning that tonality
is an emergent statistical phenomenon.
Related to the landmark function of drones and/or sheer frequency of occurrence of some
tone(s) in instrumental music, it could be tempting to speculate that effector position and effector shape
play a role in sensations of tonality, for example as finger positions on the strings or the tubes, and later
on in the development of musical technologies as position and shape of hands on the keys of an
instrument. We could also speculate that on an even more general cognitive level there could be a
coupling of tonality with spatial position, enabling the gestural representation of pitches (as in sol-fa
practice). Research on spontaneous body motion to music seems to suggest that pitch is the feature of
musical sound most readily rendered and most agreed upon: people regardless of training tend to render
pitches with hands up for “high” pitches, hands down for “low” pitches (Godøy, 2010). However, to
what extent recurrent tonal centers would be rendered by more precise hand positions is something that
remains to be explored.
Lastly, it is essential to consider timescales when thinking about tonality, as there seem to be
quite significant differences between the local and the more large-scale formal relations of tonality.
Concentrating on the local, the combined embodied and instrument-based constraints mentioned above
seem to allow universal frameworks for generating some tonality sensation; yet the diversity in how
tonality is manifest in various musical practices world-wide is quite astonishing.

REFERENCES

Eitan, Z., & Granot, R.Y. (2008). Growing oranges on Mozart’s apple tree: “inner form” and aesthetic
judgment. Music Perception, Vol. 25, No. 5, pp. 397-417.

Galantucci, B., Fowler, C.A., & Turvey, M.T. (2006). The motor theory of speech perception reviewed.
Psychonomic Bulletin and Review, Vol. 13, No. 3, pp. 361-377.

Godøy, R.I. (1997). Formalization and Epistemology. Oslo: Scandinavian University Press.

Godøy, R.I. (2003). Motor-mimetic music cognition. Leonardo, Vol. 36, No. 4, pp. 317-319.

Godøy, R.I. (2010). Gestural affordances of musical sound. In: R.I. Godøy & M. Leman (Eds.),
Musical Gestures: Sound, Movement, and Meaning. New York: Routledge, pp. 103-125.

Godøy, R.I., & Jørgensen, H. (Eds.) (2001). Musical Imagery. Lisse (Holland): Swets & Zeitlinger.

Hindemith, P. (1941). The Craft of Musical Composition. London: Schott.

Leman, M. (1995). Music and Schema Theory: Cognitive Foundations of Systematic Musicology.
Berlin: Springer.

Petitot, J. (1985). Morphogenèse du Sens I. Paris: Presses Universitaires de France.

Sethares, W.A. (2005). Tuning, Timbre, Spectrum, Scale. London: Springer.

Thom, R. (1983). Paraboles et catastrophes. Paris: Flammarion.

Tillmann, B., & Bigand, E. (2004). The relative importance of local and global structures in music
perception. The Journal of Aesthetics and Art Criticism, Vol. 62, No. 2, pp. 211-222.
